2 Estimation and Hypothesis Testing
2.1 Introduction
We look at the classical frequentist hypothesis testing approach in this chapter, and briefly develop Bayesian approaches in the next.

We wish to obtain the mean and variance of data, with confidence in our conclusions. We have assumed so far that the distribution is known: given parameters, we deduce something about data. A more common scenario involves being given data: we must now induce the distribution, checking a variety of different models which attempt to explain how the data would look given specific values for the parameters involved. Statistical inference refers to the estimation of population parameters given a small sample, usually for the purpose of hypothesis testing (i.e. is a parameter within specified bounds? If so, does it confirm or reject some hypothesis?). Estimation is about identifying how the system (model) behaves in untested situations. Asking for $\mu$ and $\sigma$ means asking for point estimates; we could also ask for interval estimates, $\mu \pm$ some multiple of $\sigma$. Knowing the essentials of statistical inference, we can handle ANOVA, regression, contingency tables, etc.

Example: Given a set of marks, do they follow $N(\mu, \sigma^2)$? What are $\mu$ and $\sigma$?

What is the sampling distribution of the mean? Given $X_1, X_2, \ldots, X_n$, estimate $E[X] = \mu$ and $\sigma^2$. Here $\mu = E[X]$ is the population mean and $\bar{X} = (1/n)\sum_{i=1}^{n} X_i$ is the sample mean. There may have been many sets of samples of size $n$, each of which would
have resulted in its own $\bar{X}$ ($\bar{X}_1$, $\bar{X}_2$, etc.). Obviously $\bar{X}$ is itself a random variable and has its own distribution. Therefore, with $\bar{X} = (X_1 + X_2 + \cdots + X_n)/n$,
\[
E[\bar{X}] = \frac{1}{n}\sum_{i=1}^{n} E[X_i] = \frac{1}{n}\sum_{i=1}^{n} \mu = \mu \tag{2.1}
\]
Hence the expectation of the sample mean equals the population mean of each observation: $E[X] = E[\bar{X}] = \mu$. This is convenient: even though each measurement ought to be at or near the true underlying value, by measuring many times and averaging, we get a better way to estimate $\mu$.
Hence
\[
E\left[\frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n}\right] = \sigma^2 - \frac{\sigma^2}{n} = \frac{n-1}{n}\,\sigma^2
\]
If the sample variance were to be defined for one sampling attempt as $S^2(X_i) = \sum_{i=1}^{n}(X_i - \bar{X})^2/n$, then on many such sampling attempts the average (expected) value is $E[S^2(X_i)] = \frac{n-1}{n}\,\sigma^2$.
Estimators: Of all the possible estimators of $\mu$, we have used $\bar{X}$, the minimum variance estimator of $\mu$. Hence if $\bar{X}$ is used, we are on average getting a sample variance slightly lower than the population variance, and the $n/(n-1)$ correction factor is required (can you see this from the above derivation?). Therefore, to get an unbiased estimate of $\sigma^2$, a preferable form of the sample variance is
\[
S^2(X) = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n-1}
\]
This denominator $(n-1)$ is sometimes referred to as the degrees of freedom. In this case the $n$ sampling attempts contribute $n$ degrees of freedom, with one being used to estimate the mean.
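The $n/(n-1)$ correction can be checked numerically. Below is a minimal simulation sketch (the population parameters $\mu = 10$, $\sigma = 2$ and sample size $n = 5$ are hypothetical choices): the divide-by-$n$ estimator should average out near $(n-1)\sigma^2/n$, while the divide-by-$(n-1)$ form should average out near $\sigma^2$.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 10.0, 2.0, 5, 200_000   # hypothetical population and sample size

samples = rng.normal(mu, sigma, size=(reps, n))
xbar = samples.mean(axis=1, keepdims=True)

# Divide-by-n ("one sampling attempt") estimator vs. the unbiased divide-by-(n-1) form.
s2_biased = ((samples - xbar) ** 2).sum(axis=1) / n
s2_unbiased = ((samples - xbar) ** 2).sum(axis=1) / (n - 1)

print("average S^2 with /n     :", s2_biased.mean())    # ~ (n-1)/n * sigma^2 = 3.2
print("average S^2 with /(n-1) :", s2_unbiased.mean())  # ~ sigma^2 = 4.0
```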
Sampling distributions
Given that we expect to have limited data, parameter estimation may be assumed to
fall into one of three scenarios:
1. Estimation of the mean with $\sigma$ known.
2. Estimation of the mean with $\sigma$ unknown.
3. Estimation of $\sigma$.
These situations are discussed in the next three sections.
$t$ follows a $t$ distribution with $n-1$ degrees of freedom. The $100u$th percentile of a $t$ distribution with $d$ degrees of freedom is denoted $t_{d,u}$, with $P(t_d < t_{d,u}) = u$. Thus, $t_{20,0.95}$ is the 95th percentile (upper 5%) of a $t$ distribution with 20 degrees of freedom. The $t$ distribution is a symmetric distribution, and the difference from a $N(0,1)$ distribution is greatest for $n < 30$. Hence, when $\sigma$ is unknown, use the $t$ statistic instead of the $Z$ statistic. This $t$ statistic should follow a $t$ distribution with df $= n-1$. $E[t] = 0$, $\mathrm{Var}[t] > 1$ and $\to 1$ as $n \to \infty$. That is, as $n \to \infty$, $t \to N(0,1)$. For $n > 30$, $N(0,1)$ may be used to approximate the $t$ distribution.

  d      t_{d,0.975}   Z_{0.975}
  4      2.776         1.960
  9      2.262         1.960
  29     2.045         1.960
  60     2.000         1.960
  ∞      1.960         1.960
Interval estimate: For a given $n$, $100(1-\alpha)\%$ of the $t$ statistics should fall between $t_{n-1,\alpha/2}$ and $t_{n-1,1-\alpha/2}$:
\[
t_{n-1,\alpha/2} < \frac{\bar{X} - \mu}{S/\sqrt{n}} < t_{n-1,1-\alpha/2}
\]
which rearranges to
\[
\bar{X} - t_{n-1,1-\alpha/2}\,\frac{S}{\sqrt{n}} < \mu < \bar{X} - t_{n-1,\alpha/2}\,\frac{S}{\sqrt{n}}
\]
But a $t$ distribution is symmetric, so $t_{n-1,\alpha/2} = -t_{n-1,1-\alpha/2}$ and
\[
\bar{X} - t_{n-1,1-\alpha/2}\,\frac{S}{\sqrt{n}} < \mu < \bar{X} + t_{n-1,1-\alpha/2}\,\frac{S}{\sqrt{n}}
\]
and hence
\[
P\left(\bar{X} - t_{n-1,1-\alpha/2}\,\frac{S}{\sqrt{n}} < \mu < \bar{X} + t_{n-1,1-\alpha/2}\,\frac{S}{\sqrt{n}}\right) = 1 - \alpha \tag{2.11}
\]
For $n < 30$, the $100(1-\alpha)\%$ confidence interval of $\mu$ is therefore $\bar{X} \pm t_{n-1,1-\alpha/2}\,S/\sqrt{n}$; for $n > 30$, $t_{n-1,1-\alpha/2} \approx Z_{1-\alpha/2}$.
[Figure 2.2. The t test: density of $t_{n-1}$ with critical values $t_{n-1,\alpha/2}$ and $t_{n-1,1-\alpha/2}$.]
Therefore the $100(1-\alpha)\%$ confidence interval for $\mu$ is $\bar{X} \pm t_{n-1,1-\alpha/2}\,S/\sqrt{n}$. This implies that $100(1-\alpha)\%$ (e.g. 95%) of confidence intervals constructed from samples of size $n$ will have $\mu$ within these bounds. For large $n$ (at least $> 30$, sometimes $n > 200$), approximate the $t$ distribution with the Normal distribution. Hence the $100(1-\alpha)\%$ confidence interval of $\mu$ is $\bar{X} \pm Z_{1-\alpha/2}\,S/\sqrt{n}$.
The width of a confidence interval is $2\,t_{n-1,1-\alpha/2}\,S/\sqrt{n} = f(n, S, \alpha)$. As $n$ increases, the CI width decreases. As $1-\alpha$ increases, the CI width increases, and therefore in increasing $1-\alpha$ from 0.95 to 0.99, a larger CI will be needed. The CI width also increases with increasing $S$. $S$ can be decreased by replacing one measurement with the mean of replicates, for example.
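A minimal sketch of the interval in (2.11), using scipy's $t$ quantiles; the data array below is hypothetical.

```python
import numpy as np
from scipy import stats

x = np.array([9.8, 10.2, 10.4, 9.9, 10.1, 10.6, 9.7, 10.3])   # hypothetical measurements
alpha = 0.05

n = len(x)
xbar = x.mean()
s = x.std(ddof=1)                        # sample SD with the (n-1) denominator
tcrit = stats.t.ppf(1 - alpha / 2, df=n - 1)

half_width = tcrit * s / np.sqrt(n)
print(f"{100*(1-alpha):.0f}% CI for mu: ({xbar - half_width:.3f}, {xbar + half_width:.3f})")
```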
But
\[
S^2 = \frac{\sum_i (X_i - \bar{X})^2}{n-1}
\quad\Rightarrow\quad
\frac{(n-1)S^2}{\sigma^2} = \frac{\sum_i (X_i - \bar{X})^2}{\sigma^2} \sim \chi^2_{n-1}
\]
and hence
\[
S^2 \sim \frac{\sigma^2}{n-1}\,\chi^2_{n-1} \tag{2.13}
\]
Similarly,
\[
\mathrm{Var}[\hat{p}] = \mathrm{Var}\left[\frac{X}{n}\right] = \frac{\mathrm{Var}[X]}{n^2} = \frac{np(1-p)}{n^2} = \frac{p(1-p)}{n} \tag{2.18}
\]
Hence $\hat{p}$ is an unbiased estimator of the population parameter $p$ for any $n$. The standard error of $\hat{p}$ is $\sqrt{pq/n}$ and is estimated (given that we do not know $p$) as $\sqrt{\hat{p}\hat{q}/n}$.
Normal theory method: Assuming that a point estimate of $p$ ($= \hat{p}$) is available, what is the interval estimate? If the Normal distribution may be used to approximate the binomial, then $\hat{p} = X/n = \sum_i X_i/n$ is normally distributed with mean $p$ and variance $pq/n$. For $n$ Bernoulli trials, each with mean $p$ and variance $p(1-p)$, the CLT indicates that for large $n$, $\hat{p} = \bar{X}$ is normally distributed with mean $p$ and variance $\sigma^2/n = pq/n$, i.e. $\hat{p} \sim N(p, pq/n)$. (The number of successes in $n$ Bernoulli trials is $X = n\hat{p}$, so $X$ is a Binomial random variable with parameters $n$ and $p$, and hence $n\hat{p} = X \sim N(np, npq)$.)
The normal approximation to the Binomial distribution was assumed to be valid if $npq \geq 5$. However, here $p$ (and $q$) are unknown. Therefore we estimate $p$ by $\hat{p}$ and $q$ by $1-\hat{p}$, and use the normal approximation to the binomial if $n\hat{p}\hat{q} \geq 5$. To evaluate the $100(1-\alpha)\%$ confidence interval for $p$,
\[
P\left(p - Z_{1-\alpha/2}\sqrt{\frac{pq}{n}} < \hat{p} < p + Z_{1-\alpha/2}\sqrt{\frac{pq}{n}}\right) = 1 - \alpha
\]
i.e.
\[
p - Z_{1-\alpha/2}\sqrt{\frac{pq}{n}} < \hat{p}
\quad\text{and}\quad
\hat{p} < p + Z_{1-\alpha/2}\sqrt{\frac{pq}{n}}
\]
both of which are quadratics in $p$. Rather than solve these, we can use $pq/n \approx \hat{p}\hat{q}/n$ and rearrange the inequalities to give
\[
P\left(\hat{p} - Z_{1-\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} < p < \hat{p} + Z_{1-\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}\right) = 1 - \alpha \tag{2.19}
\]
and hence the $100(1-\alpha)\%$ confidence interval for $p$ is
\[
\left[\hat{p} - Z_{1-\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}},\;\; \hat{p} + Z_{1-\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}\right]
\]
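A sketch of the normal-approximation interval (2.19); the counts ($X = 30$ successes out of $n = 200$) are hypothetical, and the assertion mirrors the $n\hat{p}\hat{q} \geq 5$ rule of thumb above.

```python
import numpy as np
from scipy import stats

x_successes, n = 30, 200          # hypothetical data
alpha = 0.05

p_hat = x_successes / n
q_hat = 1 - p_hat
assert n * p_hat * q_hat >= 5, "normal approximation to the binomial not justified"

z = stats.norm.ppf(1 - alpha / 2)
se = np.sqrt(p_hat * q_hat / n)
print(f"95% CI for p: ({p_hat - z*se:.4f}, {p_hat + z*se:.4f})")
```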
The maximum error of the estimate is
\[
E = |\hat{p} - p| = Z_{1-\alpha/2}\sqrt{\frac{p(1-p)}{n}}
\]
The sample size needed to achieve a $100(1-\alpha)\%$ confidence interval with maximum error $E$ is
\[
n = p(1-p)\left(\frac{Z_{1-\alpha/2}}{E}\right)^2 \tag{2.20}
\]
For $p = q = 1/2$, this reduces to $n = \frac{1}{4}\left(\frac{Z_{1-\alpha/2}}{E}\right)^2$.
Exact method of interval estimation: A $100(1-\alpha)\%$ confidence interval for $p$ is given by $[p_1, p_2]$, where
\[
P(X \geq x \mid p = p_1) = \frac{\alpha}{2} = \sum_{k=x}^{n} \binom{n}{k}\, p_1^k (1-p_1)^{n-k}
\]
\[
P(X \leq x \mid p = p_2) = \frac{\alpha}{2} = \sum_{k=0}^{x} \binom{n}{k}\, p_2^k (1-p_2)^{n-k} \tag{2.21}
\]
The problem in using this is in computing the summations on the right.
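The summations in (2.21) can be delegated to scipy's binomial tail probabilities, with a root-finder searching for the $p_1$ and $p_2$ that make each tail equal to $\alpha/2$. A minimal sketch with hypothetical counts ($x = 4$ out of $n = 100$):

```python
from scipy import stats
from scipy.optimize import brentq

x, n, alpha = 4, 100, 0.05        # hypothetical observed successes and trials

# p1 solves P(X >= x | p1) = alpha/2 ;  p2 solves P(X <= x | p2) = alpha/2
p1 = brentq(lambda p: stats.binom.sf(x - 1, n, p) - alpha / 2, 1e-9, 1 - 1e-9)
p2 = brentq(lambda p: stats.binom.cdf(x, n, p) - alpha / 2, 1e-9, 1 - 1e-9)

print(f"exact 95% CI for p: ({p1:.4f}, {p2:.4f})")   # roughly (0.011, 0.099)
```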
It is more difficult to discriminate between the two means. If $n$ is fixed, the only way to increase $1-\beta$ is to increase $\alpha$ (i.e. move the critical value to the left, in the plot). The only way to decrease $\alpha$ is to decrease $1-\beta$. In the extreme, type I errors may be avoided by always rejecting $H_1$ (i.e. always accepting $H_0$).

There is a practical problem: if $\mu_0$ and $\mu_1$ are fixed ($\mu_1$ may be unknown, but it is assumed constant, and so is considered fixed), then $\beta$ is fixed. Then, when $\alpha$ is chosen, the power may be small. The solution to this is to sample more: if the variances of the sampling distributions are reduced by increasing $n$, the overlap between the curves under $H_0$ and $H_1$ shrinks and the power increases.
[Figure 2.4. Increasing the sample size results in more power.]
Procedure: One sample test for $\mu$: lower one sided test. To test $H_0: \mu = \mu_0$ ($\sigma$ unknown) vs. $H_1: \mu < \mu_0$ ($\sigma$ unknown), using a significance level of $\alpha$, find the test statistic
\[
t = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}
\]
Here $t_{n-1,\alpha}$ is the critical value: if $t < t_{n-1,\alpha}$, $H_0$ is rejected, and if $t \geq t_{n-1,\alpha}$, $H_0$ is accepted.
Method 1: The critical value method of hypothesis testing depends on $\alpha$ (the type I error). The level of $\alpha$ used should depend on the relative importance of type I and type II errors. For fixed $n$, as $\alpha$ increases, $\beta$ decreases, and vice versa. Usually $\alpha = 0.05$.

Method 2: Instead of performing the critical value test at various $\alpha$ (and observing whether $H_0$ is accepted or rejected at each value), perform the test at all $\alpha$ values by obtaining the p value.
p value: The p value is the $\alpha$ level at which no decision can be made between accepting and rejecting $H_0$; it is the $\alpha$ level at which $t$ is the borderline between acceptance and rejection. Hence there is indifference with respect to $H_0$ if $t = t_{n-1,p}$, and the p value $= P(t_{n-1} \leq t)$. Therefore the p value is indicative of the significance level.

  Range                 Implication
  0.01 <= p < 0.05      results are significant
  0.001 <= p < 0.01     results are highly significant
  p < 0.001             results are very highly significant
  p > 0.05              results are not statistically significant
  0.05 < p < 0.1        one may only consider trends

Hypothesis testing clearly indicates p values, which show where significance ends; however, sometimes statistical significance is merely a result of large $n$. A 95% confidence interval gives a range of values for $\mu$ and is hence informative, but it does not reveal anything about significance at a higher confidence level. Hence both p values and a 95% confidence interval for $\mu$ should be computed. The medical community usually uses CI = 95%.
2. p value method: Find the p value ($P(t_{n-1} \leq t)$). Then $H_0$ is rejected if $p < 0.05$ (a statistically significant result), else $H_0$ is accepted. This method is easier than the critical value method and is slightly more informative, because it gives an exact value of $p$ (whereas the critical value method only gives an approximate value of $p$).
[Figure 2.5. The p value.]
Scores example (critical value and p value): $H_0: \mu = 120$ vs. $H_1: \mu < 120$, using $\alpha = 0.05$. Then $t = (\bar{X} - \mu_0)/(s/\sqrt{n}) = (115 - 120)/(24/\sqrt{100}) = -2.08$ and $t_{n-1,\alpha} = t_{99,0.05} = -1.66$. Since $-2.08 < -1.66$ ($t < t_{n-1,\alpha}$), $H_0$ is rejected at a significance level of 0.05. At $\alpha = 0.01$, $t_{99,0.01} = -2.36$ and $t_{n-1,\alpha} < t$, so $H_0$ should be accepted at a significance level of 0.01. The p value is $P(t_{n-1} < t) = P(t_{99} < -2.08) = 0.020$ (which is a statistically significant result).
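A sketch reproducing the scores example from its summary statistics ($\bar{X} = 115$, $s = 24$, $n = 100$, $\mu_0 = 120$):

```python
import numpy as np
from scipy import stats

xbar, s, n, mu0, alpha = 115.0, 24.0, 100, 120.0, 0.05

t = (xbar - mu0) / (s / np.sqrt(n))           # test statistic
t_crit = stats.t.ppf(alpha, df=n - 1)         # lower-tail critical value t_{n-1,alpha}
p_value = stats.t.cdf(t, df=n - 1)            # lower one-sided p value P(t_{n-1} < t)

print(f"t = {t:.2f}, critical value = {t_crit:.2f}, p = {p_value:.3f}")
print("reject H0" if t < t_crit else "accept H0")   # t = -2.08 < -1.66: reject at alpha = 0.05
```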
Scores example (modified): Suppose 10,000 scores were measured, with mean $= 119$ and $s = 24$. Then $t = (\bar{X} - \mu_0)/(s/\sqrt{n}) = (119 - 120)/(24/\sqrt{10000}) = -4.17$, and the p value is $P(t_{n-1} < -4.17) = P(t_{9999} < -4.17)$. But a $t$ distribution with df $= 9999$ is $\approx N(0,1)$, and hence the p value is $\Phi(-4.17) < 0.001$, a very highly significant result. However, from a practical standpoint, $\bar{X} = 119 \approx \mu_0 = 120$, and hence the result is scientifically insignificant even though it is statistically significant. Conversely, results that are not statistically significant may become statistically significant with more sampling.

(As statistics became increasingly applied to policy making by politicians in the late 1800s, a quote, often attributed to Mark Twain, became increasingly popular: "There are three types of lies: lies, damned lies, and statistics.")
2.2.3 One sample t test for the mean (upper one sided)
$H_0: \mu = \mu_0$ vs. $H_1: \mu > \mu_0$, with a significance level of $\alpha$. Using $t = (\bar{X} - \mu_0)/(s/\sqrt{n})$:
If $t > t_{n-1,1-\alpha}$ then $H_0$ is rejected.
If $t \leq t_{n-1,1-\alpha}$ then $H_0$ is accepted.
The p value for this test is $p = P(t_{n-1} > t)$.
2.2.4 One sample test for $\mu$, unknown variance: two sided test
We have assumed a priori knowledge about sidedness: national scores have been assumed to be greater than area scores. If $H_0$ is untrue, we are unsure as to which side of $\mu_0$ the alternative mean may fall on, so a two-tailed test for the mean is required: under the alternative hypothesis, $\mu$ may be either greater or smaller than the $\mu_0$ of $H_0$.
Scores example: $H_0$: area score = national score ($\mu = \mu_0$); $H_1: \mu \neq \mu_0$. The best test depends on $\bar{X}$ (or $t$). We reject $H_0$ if $t$ is either too small or too large: reject $H_0$ if $t < C_1$ or if $t > C_2$, and accept $H_0$ otherwise ($C_1 \leq t \leq C_2$), where $C_1 = t_{n-1,\alpha/2}$ and $C_2 = t_{n-1,1-\alpha/2}$.
Summary: One sample test for $\mu$, unknown variance: two sided test. $H_0: \mu = \mu_0$ vs. $H_1: \mu \neq \mu_0$, with significance level $\alpha$. Compute $t = (\bar{X} - \mu_0)/(S/\sqrt{n})$.
If $|t| > t_{n-1,1-\alpha/2}$, reject $H_0$.
If $|t| \leq t_{n-1,1-\alpha/2}$, accept $H_0$.
The p value may be computed as follows. For $t \leq 0$, $p = 2\,P(t_{n-1} \leq t)$ = twice the left-hand tail area. For $t > 0$, $p = 2\,[1 - P(t_{n-1} \leq t)]$ = the area to the right of $t$ plus the area to the left of $-t$ = twice the right-hand tail area.
For large $n$ ($> 30$), the $t$ distribution percentile $t_{n-1,1-\alpha/2}$ may be approximated by the corresponding percentile of $N(0,1)$, i.e. $Z_{1-\alpha/2}$. The p value may then be computed from $P(N(0,1) < t) = \Phi(t)$.
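With raw data rather than summary statistics, scipy's one sample t test gives the two sided p value directly; the score array below is hypothetical, and the manual computation simply doubles the tail area as described above.

```python
import numpy as np
from scipy import stats

scores = np.array([112, 125, 108, 131, 117, 99, 121, 104, 116, 123])   # hypothetical sample
mu0 = 120.0

t_stat, p_two_sided = stats.ttest_1samp(scores, popmean=mu0)   # two sided by default
print(f"t = {t_stat:.3f}, two-sided p = {p_two_sided:.3f}")

# Equivalent "by hand": twice the tail area beyond |t| under t_{n-1}.
n = len(scores)
p_manual = 2 * stats.t.sf(abs(t_stat), df=n - 1)
print(f"manual two-sided p = {p_manual:.3f}")
```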
One-sided vs. two-sided tests: Usually one-sided tests are used: sample means hopefully fall on one expected side of $\mu_0$. A two-sided test can always be used, since $\mu \neq \mu_0$ is also implied by $\mu < \mu_0$. The two-sided approach is more conservative; we do not have to guess the appropriate side. If only one-sidedness is expected, the one-sided test should be used: it has more power, and hence it is easier to reject $H_0$ with finite samples if $H_1$ is true. DO NOT change from two-sided to one-sided tests AFTER looking at the data.

Use the one sample t test if: there is one variable of interest; the underlying distribution is normal or the CLT is assumed to hold; an inference concerning $\mu$ is required; and $\sigma$ is NOT known.

Two sided one sample Z test: $t$ tests assume that $\sigma$ is unknown. If $\sigma$ is known, then $t$ may be replaced by $Z = (\bar{X} - \mu_0)/(\sigma/\sqrt{n})$, and critical values based on the $t_{n-1}$ distribution may be replaced with the corresponding values of the $N(0,1)$ distribution.

2.2.5 Power of a one sample test for $\mu$
This calculation needs to be done when planning a study. Usually data is not available
and at best a pilot study with a small sample size is performed.
Example: A 10-patient pilot study for a glaucoma drug is performed by measuring intra-ocular pressure (IOP). In the pilot study, the mean IOP decreases on using the drug by 5 mm Hg (standard deviation (SD) of 10 mm Hg). Are 100 people enough for the real study?
Soln: Power = the probability of declaring that the drug makes a difference, with sample size 100, if the true mean IOP drop is 5 mm Hg with SD = 10 mm Hg. If the power turns out to be greater than 80% (you want the power to be at least 80%), the larger study may be performed. Since $\sigma = 10$ mm Hg is known, the one-sided Z test may be used with $H_0: \mu = \mu_0$ vs. $H_1: \mu = \mu_1 < \mu_0$. Then, for a significance level of $\alpha$, $H_0$ is rejected if $Z < Z_\alpha$. Notice that this rejection rule does not depend on the alternative mean $\mu_1$, as long as $\mu_1 < \mu_0$.
The power of a test $= 1 - P(\text{type II error}) = 1 - \beta$: Power $= P(\text{reject } H_0 \mid H_0 \text{ is false})$. For a one sample, lower one-sided test,
\[
\text{Power} = P(Z < Z_\alpha \mid \mu = \mu_1)
= P\left(\frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} < Z_\alpha \,\Big|\, \mu = \mu_1\right)
= P\left(\bar{X} < \mu_0 + Z_\alpha \frac{\sigma}{\sqrt{n}} \,\Big|\, \mu = \mu_1\right)
\]
But under $H_1$, $\bar{X} \sim N(\mu_1, \sigma^2/n)$, and hence the power is
\[
\Phi\left(\frac{\mu_0 + Z_\alpha\,\sigma/\sqrt{n} - \mu_1}{\sigma/\sqrt{n}}\right)
= \Phi\left(Z_\alpha + \frac{(\mu_0 - \mu_1)\sqrt{n}}{\sigma}\right) \tag{2.22}
\]
The power indicates how likely a significant difference would be found given that $H_1$ is true. If the power is small, there is only a small chance of finding a significant difference even if $H_1$ is true (i.e., the true mean is in reality different from the null mean).
[Figure 2.6. Power of a one sample, upper one-sided test.]

Scores example: Assume $\mu_0 = 120$, $\mu_1 = 115$, $\sigma = 24$, $\alpha = 0.05$ and $n = 100$. Then the power is
\[
\Phi\left(Z_{0.05} + \frac{120 - 115}{24/\sqrt{100}}\right) = \Phi(0.438) = 0.669
\]
Hence there is only a 67% chance of detecting a significant difference using a 5% significance level, with sample size 100.

For an upper one-sided test,
\[
\text{Power} = P\left(\bar{X} > \mu_0 + Z_{1-\alpha}\frac{\sigma}{\sqrt{n}} \,\Big|\, \mu = \mu_1\right)
= 1 - P\left(\bar{X} < \mu_0 + Z_{1-\alpha}\frac{\sigma}{\sqrt{n}} \,\Big|\, \mu = \mu_1\right)
= 1 - \Phi\left(Z_{1-\alpha} + \frac{(\mu_0 - \mu_1)\sqrt{n}}{\sigma}\right)
\]
Using $\Phi(-x) = 1 - \Phi(x)$ and $Z_\alpha = -Z_{1-\alpha}$ gives (for $\mu_1 > \mu_0$)
\[
\Phi\left(-Z_{1-\alpha} + \frac{(\mu_1 - \mu_0)\sqrt{n}}{\sigma}\right)
= \Phi\left(Z_\alpha + \frac{(\mu_1 - \mu_0)\sqrt{n}}{\sigma}\right) \tag{2.23}
\]

Summary: power of a one sided one sample Z test: The Z test is used for the mean of a normal distribution with known variance. In general, $H_0: \mu = \mu_0$ vs. $H_1: \mu = \mu_1$.
\[
\text{Power} = \Phi\left(Z_\alpha + \frac{|\mu_0 - \mu_1|\sqrt{n}}{\sigma}\right)
= \Phi\left(-Z_{1-\alpha} + \frac{|\mu_0 - \mu_1|\sqrt{n}}{\sigma}\right)
\]
and hence the power depends on $\alpha$, $|\mu_0 - \mu_1|$, $n$ and $\sigma$.
As $\alpha$ decreases, $Z_\alpha$ decreases and hence the power decreases.
As $|\mu_1 - \mu_0|$ increases, the power increases.
As $\sigma$ increases, the power decreases.
As $n$ increases, the power increases.
A power curve can be drawn through various $\mu_1$, given $\alpha = 0.05$, $\sigma = 24$, $n = 100$, $\mu_0 = 120$.
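A sketch of the one-sided power formula as a small helper, checked against the scores example in the margin ($\mu_0 = 120$, $\mu_1 = 115$, $\sigma = 24$, $\alpha = 0.05$, $n = 100$, power $\approx 0.669$):

```python
import numpy as np
from scipy import stats

def power_one_sided_z(mu0, mu1, sigma, n, alpha=0.05):
    """Power of a one sided one sample Z test: Phi(Z_alpha + |mu0 - mu1| * sqrt(n) / sigma)."""
    z_alpha = stats.norm.ppf(alpha)
    return stats.norm.cdf(z_alpha + abs(mu0 - mu1) * np.sqrt(n) / sigma)

print(f"power = {power_one_sided_z(120, 115, 24, 100):.3f}")   # ~0.669
```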
For a two-sided one sample Z test, $H_0$ is rejected if $Z < Z_{\alpha/2}$ or $Z > Z_{1-\alpha/2}$, so under $\mu = \mu_1$ the power is
\[
\Phi\left(\frac{\mu_0 + Z_{\alpha/2}\,\sigma/\sqrt{n} - \mu_1}{\sigma/\sqrt{n}}\right)
+ 1 - \Phi\left(\frac{\mu_0 + Z_{1-\alpha/2}\,\sigma/\sqrt{n} - \mu_1}{\sigma/\sqrt{n}}\right)
= \Phi\left(Z_{\alpha/2} + \frac{(\mu_0 - \mu_1)\sqrt{n}}{\sigma}\right)
+ 1 - \Phi\left(Z_{1-\alpha/2} + \frac{(\mu_0 - \mu_1)\sqrt{n}}{\sigma}\right)
\]
But $1 - \Phi(x) = \Phi(-x)$, and hence the power is
\[
\Phi\left(Z_{\alpha/2} + \frac{(\mu_0 - \mu_1)\sqrt{n}}{\sigma}\right)
+ \Phi\left(-Z_{1-\alpha/2} + \frac{(\mu_1 - \mu_0)\sqrt{n}}{\sigma}\right)
\]
But $Z_{\alpha/2} = -Z_{1-\alpha/2}$, and therefore the power is
\[
\Phi\left(-Z_{1-\alpha/2} + \frac{(\mu_0 - \mu_1)\sqrt{n}}{\sigma}\right)
+ \Phi\left(-Z_{1-\alpha/2} + \frac{(\mu_1 - \mu_0)\sqrt{n}}{\sigma}\right)
\]
For $\mu_1 < \mu_0$, the second term is usually negligible, and for $\mu_1 > \mu_0$, the first term is negligible. Therefore the power may be approximated by
\[
\Phi\left(-Z_{1-\alpha/2} + \frac{|\mu_0 - \mu_1|\sqrt{n}}{\sigma}\right) \tag{2.24}
\]
The sample size needed for a one-sided test with significance level $\alpha$ and power $1-\beta$ is
\[
n = \frac{(Z_{1-\alpha} + Z_{1-\beta})^2\,\sigma^2}{(\mu_0 - \mu_1)^2} \tag{2.25}
\]
Notice the symmetry: this holds for either a one-sided upper or a one-sided lower test. Notice also that $n$ depends (rather sensitively) on $(\mu_0 - \mu_1)^2$. As $\sigma^2$ increases, $n$ increases; as $\alpha$ decreases, $Z_{1-\alpha}$ increases, hence $n$ increases; as the required power increases ($1-\beta$ increases), $n$ increases; and as $|\mu_0 - \mu_1|$ increases, $n$ decreases. Usually $\mu_0$ is known and $\alpha = 0.05$, power $\geq 80\%$ are specified, while appropriate values of $\mu_1$ and $\sigma^2$ are usually unknown. Some guesses may be made using prior knowledge.
The term $|\mu_0 - \mu_1|$ must be evaluated for scientific significance. A pilot study may be done (with small $n$) to get a feel for $\mu_1$ and $\sigma^2$, usually when the investigator is 'convinced' that $H_1$ is right, i.e. that $\mu = \mu_1$ and not $\mu = \mu_0$ (for example, when alternate evidence indicates that a drug should work).
Sample size determination for a two sided test: For a two-sided test with $\sigma$ known, with desired significance level $\alpha$ and a desired power $1-\beta$, from the power of a two-sided test,
\[
1 - \beta = \Phi\left(-Z_{1-\alpha/2} + \frac{|\mu_0 - \mu_1|\sqrt{n}}{\sigma}\right)
\quad\Rightarrow\quad
\frac{|\mu_0 - \mu_1|\sqrt{n}}{\sigma} = Z_{1-\alpha/2} + Z_{1-\beta}
\]
and hence
\[
n = \frac{(Z_{1-\beta} + Z_{1-\alpha/2})^2\,\sigma^2}{(\mu_0 - \mu_1)^2} \tag{2.26}
\]
Note that $n$ for a two-sided test is greater than the $n$ for a one-sided test (because $Z_{1-\alpha/2} > Z_{1-\alpha}$).
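A sketch covering both sample-size formulas, (2.25) and (2.26); the worked call uses the glaucoma pilot numbers ($|\mu_0 - \mu_1| = 5$, $\sigma = 10$, $\alpha = 0.05$, power 0.80) purely as an assumed illustration.

```python
import math
from scipy import stats

def sample_size_z(mu0, mu1, sigma, alpha=0.05, power=0.80, two_sided=True):
    """n from (2.26) for a two sided test, or (2.25) with Z_{1-alpha} for a one sided test."""
    z_alpha = stats.norm.ppf(1 - alpha / 2) if two_sided else stats.norm.ppf(1 - alpha)
    z_beta = stats.norm.ppf(power)
    n = (z_alpha + z_beta) ** 2 * sigma ** 2 / (mu0 - mu1) ** 2
    return math.ceil(n)

print(sample_size_z(0, 5, 10, two_sided=False))  # one sided: ~25
print(sample_size_z(0, 5, 10, two_sided=True))   # two sided: ~32, larger as expected
```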
2.3 One sample test ($\chi^2$ test) for the variance of a normal distribution (two sided)
$H_0: \sigma^2 = \sigma_0^2$ vs. $H_1: \sigma^2 \neq \sigma_0^2$. If the measurements $X_1, \ldots, X_n$ are random samples, then $S^2$ may be used as an unbiased estimator of $\sigma^2$. If these measurements are from a normal distribution $N(\mu, \sigma^2)$, then $H_0$ implies that
\[
X^2 = \frac{(n-1)S^2}{\sigma_0^2} \sim \chi^2_{n-1}
\]
and therefore
\[
P(X^2 < \chi^2_{n-1,\alpha/2}) = \frac{\alpha}{2} = P(X^2 > \chi^2_{n-1,1-\alpha/2}) \tag{2.27}
\]
Hence $H_0$ may be accepted for $\chi^2_{n-1,\alpha/2} \leq X^2 \leq \chi^2_{n-1,1-\alpha/2}$ and rejected otherwise (note that $X^2 = (n-1)S^2/\sigma_0^2$ under $H_0$).
The p value for the two-sided test depends on whether $S^2 \leq \sigma_0^2$ or $S^2 > \sigma_0^2$. If $S^2 \leq \sigma_0^2$, the p value is twice the area to the left of $X^2$ under a $\chi^2_{n-1}$ distribution. If $S^2 > \sigma_0^2$, the p value is twice the area to the right of $X^2$ under a $\chi^2_{n-1}$ distribution.
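A sketch of the test of (2.27) from summary statistics; the values of $S^2$, $n$ and $\sigma_0^2$ below are hypothetical.

```python
from scipy import stats

s2, n, sigma0_sq, alpha = 12.5, 20, 9.0, 0.05   # hypothetical sample variance, size, null variance

x2 = (n - 1) * s2 / sigma0_sq                   # test statistic ~ chi^2_{n-1} under H0
lo = stats.chi2.ppf(alpha / 2, df=n - 1)
hi = stats.chi2.ppf(1 - alpha / 2, df=n - 1)

# Two sided p value: twice the relevant tail area, as described above.
if s2 > sigma0_sq:
    p = 2 * stats.chi2.sf(x2, df=n - 1)
else:
    p = 2 * stats.chi2.cdf(x2, df=n - 1)

print(f"X^2 = {x2:.2f}, acceptance region = [{lo:.2f}, {hi:.2f}], p = {p:.3f}")
```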
Example: The average prevalence of breast cancer is 2%. The average prevalence of breast cancer in women whose mothers had breast cancer is $4/100 = 4\%$. Is the 4% important with respect to the 2% (i.e. is there a hereditary aspect to breast cancer)? If $p$ is the prevalence rate of cancer among women whose mothers had cancer, we can test $H_0: p = 0.02$ vs. $H_1: p \neq 0.02$.
We will use the sample proportion of cases, $\hat{p}$, assuming that the normal approximation to the binomial is valid ($np_0 q_0 \geq 5$), where $p_0$ is the prevalence rate according to $H_0$ and $q_0 = 1 - p_0$. Then, under $H_0$, $\hat{p} \sim N(p_0, p_0 q_0/n)$. Standardizing $\hat{p}$ using $Z = (\hat{p} - p_0)/\text{standard error} = (\hat{p} - p_0)/\sqrt{p_0 q_0/n}$ gives $Z \sim N(0,1)$ under $H_0$. Therefore,
\[
P(Z < Z_{\alpha/2}) = P(Z > Z_{1-\alpha/2}) = \frac{\alpha}{2}
\]
$H_0$ is rejected if $Z < Z_{\alpha/2}$ or $Z > Z_{1-\alpha/2}$; $H_0$ is accepted if $Z_{\alpha/2} \leq Z \leq Z_{1-\alpha/2}$.
The p value of the test depends on whether $\hat{p} \leq p_0$ or $\hat{p} > p_0$: if $\hat{p} < p_0$, the p value is $2\,\Phi(Z)$; if $\hat{p} \geq p_0$, the p value is $2\,(1 - \Phi(Z))$.
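A sketch of this one sample proportion Z test, following the rejection rule and p-value rule above; the counts (45 cases out of 1000 against $p_0 = 0.02$) are hypothetical and chosen so that $np_0 q_0 \geq 5$ holds.

```python
import numpy as np
from scipy import stats

x, n, p0, alpha = 45, 1000, 0.02, 0.05   # hypothetical: 45 cases out of 1000, H0: p = 0.02

q0 = 1 - p0
assert n * p0 * q0 >= 5, "normal theory method not appropriate"

p_hat = x / n
z = (p_hat - p0) / np.sqrt(p0 * q0 / n)

# Two sided p value, following the rule above.
p_value = 2 * stats.norm.cdf(z) if p_hat < p0 else 2 * stats.norm.sf(z)
z_crit = stats.norm.ppf(1 - alpha / 2)

print(f"z = {z:.2f}, p = {p_value:.4g}, reject H0: {abs(z) > z_crit}")
```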
Power of a one sample, two sided binomial test: The hypotheses are usually written as $H_0: p = p_0$ vs. $H_1: p \neq p_0$. For a specific alternative $p_1$, using the normal theory approach (valid where $np_0 q_0 \geq 5$),
\[
\text{Power} = \Phi\left[\sqrt{\frac{p_0 q_0}{p_1 q_1}}\left(Z_{\alpha/2} + \frac{|p_0 - p_1|\sqrt{n}}{\sqrt{p_0 q_0}}\right)\right] \tag{2.28}
\]
Sample size for a one sample, two sided binomial test: $H_0: p = p_0$ vs. $H_1: p \neq p_0$. Then the sample size $n$ should be
\[
n = \frac{p_0 q_0 \left[Z_{1-\alpha/2} + Z_{1-\beta}\sqrt{(p_1 q_1)/(p_0 q_0)}\right]^2}{(p_1 - p_0)^2} \tag{2.29}
\]
One sided one sample binomial tests: Replace $\alpha/2$ by $\alpha$ in the above two equations (for power and sample size).
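A sketch of (2.28) and (2.29) as helpers; the example call ($p_0 = 0.02$, $p_1 = 0.04$, echoing the breast-cancer illustration) is only an assumed check that the two formulas agree (the power at the computed $n$ should come back near 0.80).

```python
import math
from scipy import stats

def binom_power(p0, p1, n, alpha=0.05):
    """Power of the one sample, two sided binomial test, eq. (2.28)."""
    q0, q1 = 1 - p0, 1 - p1
    z = stats.norm.ppf(alpha / 2)          # Z_{alpha/2} is negative
    arg = math.sqrt(p0 * q0 / (p1 * q1)) * (z + abs(p0 - p1) * math.sqrt(n) / math.sqrt(p0 * q0))
    return stats.norm.cdf(arg)

def binom_sample_size(p0, p1, alpha=0.05, power=0.80):
    """Sample size for the one sample, two sided binomial test, eq. (2.29)."""
    q0, q1 = 1 - p0, 1 - p1
    z_a = stats.norm.ppf(1 - alpha / 2)
    z_b = stats.norm.ppf(power)
    n = p0 * q0 * (z_a + z_b * math.sqrt(p1 * q1 / (p0 * q0))) ** 2 / (p1 - p0) ** 2
    return math.ceil(n)

n_req = binom_sample_size(0.02, 0.04)
print(n_req, f"power at that n: {binom_power(0.02, 0.04, n_req):.3f}")   # ~0.80
```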
Sign test: Normal theory method. $H_0: \Delta = 0$ vs. $H_1: \Delta \neq 0$. If the number of non-zero differences $d_i$ is $n \geq 20$, and the number of $d_i$'s with $d_i > 0$ is $c$, then reject $H_0$ if
\[
c > c_2 = \frac{n}{2} + \frac{1}{2} + z_{1-\alpha/2}\sqrt{\frac{n}{4}}
\quad\text{or}\quad
c < c_1 = \frac{n}{2} - \frac{1}{2} - z_{1-\alpha/2}\sqrt{\frac{n}{4}}
\]
The p value for the sign test (normal theory method) is
\[
p = 2\left[1 - \Phi\left(\frac{c - n/2 - 1/2}{\sqrt{n/4}}\right)\right] \quad \text{if } c \geq n/2
\]
\[
p = 2\,\Phi\left(\frac{c - n/2 + 1/2}{\sqrt{n/4}}\right) \quad \text{if } c < n/2
\]
An alternative formula for the p value is
\[
p = 2\left[1 - \Phi\left(\frac{|C - D| - 1}{\sqrt{n}}\right)\right]
\]
where $C$ = number of $d_i > 0$ and $D$ = number of $d_i < 0$. This is called the sign test because it only looks at the sign of each difference; a large number of samples is assumed.
The sign test is a special case of the one sample binomial test: $H_0: p = 1/2$ vs. $H_1: p \neq 1/2$. Assuming that the normal approximation to the binomial is valid, under $H_0: p = 1/2$, $E[c] = np = n/2$ and $\mathrm{Var}(c) = npq = n/4$, so $c \sim N(n/2, n/4)$.
Example: Two ointments preventing sunburn are to be evaluated. Ointment A is applied on one arm, B on the other, and we measure redness for 45 people. 22 are better off on arm A, 18 on arm B, and 5 are equally well off. Hence there are $n = 40$ untied pairs, $c = 18 < n/2 = 20$, and $z_{0.975} = 1.96$:
\[
c_2 = \frac{n}{2} + \frac{1}{2} + z_{1-\alpha/2}\sqrt{\frac{n}{4}} = \frac{40}{2} + \frac{1}{2} + z_{0.975}\sqrt{\frac{40}{4}} = 26.7
\qquad
c_1 = \frac{n}{2} - \frac{1}{2} - z_{1-\alpha/2}\sqrt{\frac{n}{4}} = 13.3
\]
Since $13.3 \leq c = 18 \leq 26.7$, $H_0$ is accepted using a two-sided test with $\alpha = 0.05$. The p value is
\[
p = 2\,\Phi\left(\frac{18 - 20 + 1/2}{\sqrt{40/4}}\right) = 2\,\Phi(-0.47) = 2 \times 0.3176 = 0.635
\]
This is not statistically significant, and hence both ointments are equally effective.
Other method:
\[
z = \frac{|C - D| - 1}{\sqrt{n}} = \frac{|18 - 22| - 1}{\sqrt{40}} = \frac{3}{\sqrt{40}} = 0.47
\quad\Rightarrow\quad
p = 2\,[1 - \Phi(0.47)] = 0.635
\]
Sign test: Exact method. This is a special case of the one sample binomial test for small samples, with $H_0: p = 1/2$ vs. $H_1: p \neq 1/2$. If $n < 20$ we need to use exact binomial probabilities rather than the normal approximation:
\[
\text{If } c > n/2, \quad p = 2\sum_{k=c}^{n} \binom{n}{k}\left(\frac{1}{2}\right)^n;
\qquad
\text{If } c < n/2, \quad p = 2\sum_{k=0}^{c} \binom{n}{k}\left(\frac{1}{2}\right)^n;
\qquad
\text{If } c = n/2, \quad p = 1.0.
\]
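A sketch of the sign test giving both the normal theory and the exact binomial p values (the exact version uses scipy's `binomtest`, available in recent scipy releases); it is checked against the ointment example, where $C = 18$, $D = 22$, $n = 40$ gives $p \approx 0.635$.

```python
from scipy import stats

def sign_test_p(c, d):
    """Two sided sign test p values: (normal theory, exact binomial)."""
    n = c + d                                   # number of non-zero differences
    # Normal theory method (recommended for n >= 20), with continuity correction.
    if c >= n / 2:
        p_norm = 2 * (1 - stats.norm.cdf((c - n / 2 - 0.5) / (n / 4) ** 0.5))
    else:
        p_norm = 2 * stats.norm.cdf((c - n / 2 + 0.5) / (n / 4) ** 0.5)
    # Exact method: special case of the one sample binomial test with p = 1/2.
    p_exact = min(1.0, stats.binomtest(c, n, p=0.5).pvalue)
    return p_norm, p_exact

print(sign_test_p(18, 22))   # ~ (0.635, 0.636)
```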
Test procedure (Wilcoxon signed-rank test): normal approximation, two sided, significance level $\alpha$. Rank the differences using the procedure provided above. Compute the rank sum $R_1$ of the positive differences. If there are no ties (i.e. no group of differences with the same absolute value),
\[
T = \frac{\left|R_1 - \frac{n(n+1)}{4}\right| - \frac{1}{2}}{\sqrt{\dfrac{n(n+1)(2n+1)}{24}}}
\]
If there are ties, let $t_i$ refer to the number of differences with the same absolute value in the $i$th tied group, and let $g$ be the number of tied groups. Then
\[
T = \frac{\left|R_1 - \frac{n(n+1)}{4}\right| - \frac{1}{2}}{\sqrt{\dfrac{n(n+1)(2n+1)}{24} - \sum_{i=1}^{g}\dfrac{t_i^3 - t_i}{48}}} \tag{2.49}
\]
If $T > z_{1-\alpha/2}$ then reject $H_0$. The p value for the test is $2\,(1 - \Phi(T))$. Use this procedure only if $n \geq 16$; the difference scores are assumed to have an underlying continuous symmetric distribution.
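A sketch implementing $T$ of (2.49) directly from a list of paired differences; the difference array below is hypothetical, and scipy's `wilcoxon` offers a library version of the same test.

```python
import numpy as np
from scipy import stats

def signed_rank_T(diffs):
    """Normal-approximation Wilcoxon signed-rank statistic T of eq. (2.49), with tie correction."""
    d = np.asarray(diffs, dtype=float)
    d = d[d != 0]                                   # drop zero differences
    n = len(d)
    ranks = stats.rankdata(np.abs(d))               # average ranks for tied |d_i|
    r1 = ranks[d > 0].sum()                         # rank sum of the positive differences
    _, counts = np.unique(np.abs(d), return_counts=True)
    tie_term = ((counts ** 3 - counts) / 48).sum()  # correction for tied groups
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term
    T = (abs(r1 - n * (n + 1) / 4) - 0.5) / np.sqrt(var)
    return T, 2 * (1 - stats.norm.cdf(T))

diffs = [3, -1, 2, 2, -4, 5, -2, 1, 3, -3, 6, 2, -1, 4, 5, 2, -2, 3]   # hypothetical differences
print(signed_rank_T(diffs))
```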
Burns example: $n$ = number of non-zero differences $= 22 + 18 = 40 \geq 16$, so we can use the normal approximation. Compute the rank sum for people with $d_i > 0$:
\[
R_1 = 10(7.5) + 6(19.5) + 2(28.0) = 248
\]
Then $E[R_1] = 40 \times 41/4 = 410$ and, corrected for ties,
\[
\mathrm{Var}[R_1] = \frac{40 \times 41 \times 81}{24}
- \frac{(14^3 - 14) + (10^3 - 10) + (7^3 - 7) + (2^3 - 2) + (2^3 - 2) + (3^3 - 3)}{48}
= 5535 - \frac{4092}{48} = 5449.75
\]
\[
\mathrm{sd}(R_1) = \sqrt{5449.75} = 73.82
\]
and hence
\[
T = \frac{|248 - 410| - \frac{1}{2}}{73.82} = \frac{161.5}{73.82} = 2.19
\]
The p value of the test is $2\,[1 - \Phi(2.19)] = 2\,[1 - 0.9857] = 0.029$. The observed rank sum (248) is smaller than the expected rank sum (410), so ointment A does better than ointment B. (Remember that the sign test did not report any significant difference!)
If the test is performed on the negative difference scores instead, with $R_2$ = rank sum of the negative differences, then
\[
R_2 = 4(7.5) + 4(19.5) + 5(28.0) + 1(32.0) + 2(33.5) + 2(33.5) + 3(38.0) + 1(40.0) = 572
\]
It does not matter whether you focus on $R_1$ or $R_2$.
Combine the data from the two groups. Order the values from lowest to highest (or from best to worst). Then assign ranks (best = low rank, etc.). Compute the range of ranks for each group, and then assign the average rank to every observation. The test statistic is the rank sum in the first sample, $R_1$. If $R_1$ is large, the dominant group has poor eyesight.
If the numbers of observations in the two groups are $n_1$ and $n_2$, the average rank of the combined sample is $(1 + n_1 + n_2)/2$. Then, under $H_0$, $E[R_1] = n_1 \times$ the average rank of the combined sample.
Assume that the smaller group is of size at least 10 and that the variable under study has an underlying continuous distribution; then $R_1$ is approximately Normal.
Test procedure (Wilcoxon rank-sum test): normal approximation, two sided, level $\alpha$. Rank all observations as discussed above. Then compute the rank sum $R_1$ in the first sample (the choice of $R_1$ is arbitrary). If there are no ties, compute
\[
T = \frac{\left|R_1 - \frac{n_1(n_1 + n_2 + 1)}{2}\right| - \frac{1}{2}}{\sqrt{\dfrac{n_1 n_2 (n_1 + n_2 + 1)}{12}}}
\]
and if there are any ties, compute
\[
T = \frac{\left|R_1 - \frac{n_1(n_1 + n_2 + 1)}{2}\right| - \frac{1}{2}}{\sqrt{\dfrac{n_1 n_2}{12}\left[(n_1 + n_2 + 1) - \dfrac{\sum_{i=1}^{g} t_i(t_i^2 - 1)}{(n_1 + n_2)(n_1 + n_2 - 1)}\right]}} \tag{2.51}
\]
where $g$ is the number of tied groups and $t_i$ = number of observations in the $i$th tied group. If $T > z_{1-\alpha/2}$ we reject $H_0$. Compute the p value as $p = 2\,[1 - \Phi(T)]$. Use the test only if $n_1, n_2 \geq 10$ and if there is an underlying continuous distribution.
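A sketch of $T$ from (2.51) for two samples with ties; the two score arrays are hypothetical, and scipy's `ranksums`/`mannwhitneyu` provide library equivalents of the same idea.

```python
import numpy as np
from scipy import stats

def rank_sum_T(x1, x2):
    """Normal-approximation Wilcoxon rank-sum statistic T of eq. (2.51), with tie correction."""
    x1, x2 = np.asarray(x1, float), np.asarray(x2, float)
    n1, n2 = len(x1), len(x2)
    combined = np.concatenate([x1, x2])
    ranks = stats.rankdata(combined)                 # average ranks across both groups
    r1 = ranks[:n1].sum()                            # rank sum of the first sample
    _, counts = np.unique(combined, return_counts=True)
    tie_corr = (counts * (counts ** 2 - 1)).sum() / ((n1 + n2) * (n1 + n2 - 1))
    var = n1 * n2 / 12 * ((n1 + n2 + 1) - tie_corr)
    T = (abs(r1 - n1 * (n1 + n2 + 1) / 2) - 0.5) / np.sqrt(var)
    return T, 2 * (1 - stats.norm.cdf(T))

group1 = [2, 3, 3, 4, 5, 5, 5, 6, 6, 7, 7, 8]        # hypothetical scores, n1 = 12
group2 = [4, 5, 6, 6, 7, 7, 8, 8, 8, 9, 9, 10]       # hypothetical scores, n2 = 12
print(rank_sum_T(group1, group2))
```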
Example: Since the minimum sample size is $25 \geq 10$ ($n_1 = 25$, $n_2 = 30$), we use the normal approximation.
\[
R_1 = 5(3.5) + 9(13.5) + 6(25.5) + 3(34) + 2(42.5) = 479
\]
$E[R_1] = 25(25 + 30 + 1)/2 = 700$ (note that $R_1 < E[R_1]$!), and the variance of $R_1$, corrected for ties, is
\[
\mathrm{Var}[R_1] = \frac{25 \times 30}{12}\left[56 - \frac{A}{55 \times 54}\right] = 62.5\left[56 - \frac{5387}{2970}\right] = 3386.74
\]
where
\[
A = 6(6^2 - 1) + 14(14^2 - 1) + 10(10^2 - 1) + 7(7^2 - 1) + 10(10^2 - 1) + 5(5^2 - 1) + 2(2^2 - 1)
\]
Then
\[
T = \frac{|479 - 700| - 0.5}{\sqrt{3386.74}} = 3.79
\quad\Rightarrow\quad
p = 2\,[1 - \Phi(3.79)] < 0.001
\]
The distribution of visual acuity is significantly different between the two groups.
Comments
If $n_1$ or $n_2 < 10$, use a small-sample table of exact significance levels.
The H test, or Kruskal-Wallis test, is the generalization of the U test used to check whether $k$ independent samples come from identical populations.