Lecture_note4
$H_0: \mu = \mu_0$ vs $H_1: \mu \neq \mu_0$
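– Rejecting $H_0$ when $|t|$ is large is equivalent to rejecting $H_0$ when $t^2$ is large. We reject $H_0$ in favor of $H_1$ at significance level $\alpha$ if
$$t^2 = \frac{(\bar{X} - \mu_0)^2}{s^2/n} = n(\bar{X} - \mu_0)(s^2)^{-1}(\bar{X} - \mu_0) > t_{n-1}^2(\alpha/2).$$
Not rejecting $H_0$ at level $\alpha$ is equivalent to the statement that $\mu_0$ lies in the $100(1-\alpha)\%$ confidence interval
$$\bar{x} \pm t_{n-1}(\alpha/2)\frac{s}{\sqrt{n}}.$$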
• A natural generalization of the squared distance above is its multivariate analog
$$T^2 = (\bar{X} - \mu_0)'\left(\frac{1}{n}S\right)^{-1}(\bar{X} - \mu_0) = n(\bar{X} - \mu_0)'S^{-1}(\bar{X} - \mu_0).$$
• If the observed statistical distance $T^2$ is too large, that is, if $\bar{x}$ is “too far” from $\mu_0$, the hypothesis $H_0: \mu = \mu_0$ is rejected.
• $T^2$ is distributed as
$$\frac{(n-1)p}{n-p}F_{p,n-p},$$
where $F_{p,n-p}$ denotes a random variable with an F-distribution with $p$ and $n-p$ degrees of freedom.
To summarize, we have the following:
$$\alpha = P\left[T^2 > \frac{(n-1)p}{n-p}F_{p,n-p}(\alpha)\right] = P\left[n(\bar{X} - \mu)'S^{-1}(\bar{X} - \mu) > \frac{(n-1)p}{n-p}F_{p,n-p}(\alpha)\right]$$
whatever the true $\mu$ and $\Sigma$. Here $F_{p,n-p}(\alpha)$ is the upper $(100\alpha)$th percentile of the $F_{p,n-p}$ distribution.
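The critical value in this summary can be evaluated numerically. The following is a minimal sketch, assuming SciPy is available; the helper name `t2_critical_value` is hypothetical and does not come from these notes.

```python
from scipy import stats

def t2_critical_value(n, p, alpha):
    """Upper-alpha critical value for T^2: ((n-1)p/(n-p)) * F_{p,n-p}(alpha)."""
    if n <= p:
        raise ValueError("need n > p for the F distribution to be defined")
    # stats.f.ppf is the lower-tail quantile, so the upper alpha point is ppf(1 - alpha)
    f_upper = stats.f.ppf(1.0 - alpha, dfn=p, dfd=n - p)
    return (n - 1) * p / (n - p) * f_upper

# Sanity check: with p = 1 the critical value reduces to t_{n-1}(alpha/2)^2
c2 = t2_critical_value(n=10, p=1, alpha=0.05)
t_sq = stats.t.ppf(1.0 - 0.05 / 2, df=9) ** 2
```

With $p = 1$ the two quantities `c2` and `t_sq` should agree, which mirrors the univariate $t^2$ test at the start of these notes.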
Example 4.1 (Evaluating $T^2$) Let the data matrix for a random sample of size $n = 3$ from a bivariate normal population be
$$X = \begin{bmatrix} 6 & 9 \\ 10 & 6 \\ 8 & 3 \end{bmatrix}.$$
Evaluate the observed $T^2$ for $\mu_0' = [9, 5]$. What is the sampling distribution of $T^2$ in this case?
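One way to check the arithmetic of Example 4.1 is a short NumPy sketch. It uses only the data matrix and $\mu_0$ given in the example; the variable names are illustrative choices.

```python
import numpy as np

# Data matrix and hypothesized mean from Example 4.1
X = np.array([[6.0, 9.0],
              [10.0, 6.0],
              [8.0, 3.0]])
mu0 = np.array([9.0, 5.0])

n, p = X.shape
xbar = X.mean(axis=0)            # sample mean vector
S = np.cov(X, rowvar=False)      # sample covariance matrix, divisor n - 1
diff = xbar - mu0
# Solve S y = diff instead of forming S^{-1} explicitly
T2 = n * diff @ np.linalg.solve(S, diff)

scale = (n - 1) * p / (n - p)    # T^2 is distributed as scale * F_{p, n-p}
print(xbar, T2, scale)
```

The sketch gives $\bar{x}' = [8, 6]$ and $T^2 = 7/9$. Before the sample is selected, $T^2$ is distributed as $\frac{(3-1)2}{3-2}F_{2,1} = 4F_{2,1}$.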
Test the hypothesis $H_0: \mu' = [4, 50, 10]$ against $H_1: \mu' \neq [4, 50, 10]$ at level of significance $\alpha = 0.10$.
4.2 Confidence Regions and Simultaneous Comparisons of
Component Means
• The region $R(X)$ is said to be a $100(1-\alpha)\%$ confidence region if, before the sample is selected,
$$P[R(X) \text{ will cover the true parameter value}] = 1 - \alpha.$$
$$x_1 = (\text{measured radiation with door closed})^{1/4}$$
and
$$x_2 = (\text{measured radiation with door open})^{1/4}$$
Simultaneous Confidence Statements
• Let $X$ have an $N_p(\mu, \Sigma)$ distribution and form the linear combination
$$Z = a_1X_1 + a_2X_2 + \cdots + a_pX_p = a'X.$$
Hence
$$\mu_Z = E(Z) = a'\mu \quad\text{and}\quad \sigma_Z^2 = \operatorname{Var}(Z) = a'\Sigma a.$$
• For $a$ fixed and $\sigma_Z^2$ unknown, a $100(1-\alpha)\%$ confidence interval for $\mu_Z = a'\mu$ is based on Student's t-ratio
$$t = \frac{\bar{z} - \mu_Z}{s_Z/\sqrt{n}} = \frac{\sqrt{n}(a'\bar{x} - a'\mu)}{\sqrt{a'Sa}}$$
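• and leads to the statement
$$\bar{z} - t_{n-1}(\alpha/2)\frac{s_Z}{\sqrt{n}} \le \mu_Z \le \bar{z} + t_{n-1}(\alpha/2)\frac{s_Z}{\sqrt{n}}$$
or
$$a'\bar{x} - t_{n-1}(\alpha/2)\frac{\sqrt{a'Sa}}{\sqrt{n}} \le a'\mu \le a'\bar{x} + t_{n-1}(\alpha/2)\frac{\sqrt{a'Sa}}{\sqrt{n}},$$
where $t_{n-1}(\alpha/2)$ is the upper $100(\alpha/2)$th percentile of a t-distribution with $n-1$ d.f.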
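The interval for $a'\mu$ can be sketched as a small function. This is an illustration under stated assumptions: the name `linear_combination_interval` is hypothetical, the critical value $t_{n-1}(\alpha/2)$ is passed in by the caller, and the inputs in the usage lines are placeholders chosen only to exercise the function.

```python
import numpy as np

def linear_combination_interval(a, xbar, S, n, t_crit):
    """Interval a'xbar -/+ t_crit * sqrt(a'Sa / n) for the mean a'mu."""
    a = np.asarray(a, dtype=float)
    center = a @ np.asarray(xbar, dtype=float)
    half_width = t_crit * np.sqrt(a @ np.asarray(S, dtype=float) @ a / n)
    return center - half_width, center + half_width

# Hypothetical inputs; t_crit = 1.0 is a placeholder, not a real t percentile
xbar = [8.0, 6.0]
S = [[4.0, -3.0], [-3.0, 9.0]]
lo, hi = linear_combination_interval([1.0, 0.0], xbar, S, n=3, t_crit=1.0)
```

Choosing $a' = [1, 0]$ picks out the first component mean, so the half-width here is `t_crit` times $\sqrt{s_{11}/n}$.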
• A simultaneous confidence region is given by the set of $a'\mu$ values such that $t^2$ is relatively small for all choices of $a$. It seems reasonable to expect that the constant $t_{n-1}^2(\alpha/2)$ will be replaced by a larger value, $c^2$, when statements are developed for many choices of $a$.
• Considering the values of $a$ for which $t^2 \le c^2$, we are naturally led to the determination of
$$\max_a t^2 = \max_a \frac{n(a'(\bar{x} - \mu))^2}{a'Sa} = n(\bar{x} - \mu)'S^{-1}(\bar{x} - \mu) = T^2.$$
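The maximization can be checked numerically. The sketch below assumes the maximizing direction is proportional to $S^{-1}(\bar{x} - \mu)$ (the standard Cauchy-Schwarz argument, not spelled out in these notes) and reuses $\bar{x}$ and $S$ computed from the Example 4.1 data matrix, with $\mu$ set to that example's $\mu_0$.

```python
import numpy as np

def t_squared(a, xbar, mu, S, n):
    """t^2 for the linear combination a'X: n (a'(xbar - mu))^2 / (a'Sa)."""
    d = xbar - mu
    return n * (a @ d) ** 2 / (a @ S @ a)

# xbar and S follow from the Example 4.1 data matrix; mu is that example's mu_0
n = 3
xbar = np.array([8.0, 6.0])
mu = np.array([9.0, 5.0])
S = np.array([[4.0, -3.0], [-3.0, 9.0]])

d = xbar - mu
T2 = n * d @ np.linalg.solve(S, d)

a_star = np.linalg.solve(S, d)      # assumed maximizer: S^{-1}(xbar - mu)
rng = np.random.default_rng(0)      # the seed is arbitrary
random_t2 = [t_squared(rng.normal(size=2), xbar, mu, S, n) for _ in range(1000)]

print(t_squared(a_star, xbar, mu, S, n), T2, max(random_t2))
```

The value of $t^2$ at `a_star` matches $T^2$, and no randomly drawn $a$ exceeds it.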
4.5 Paired Comparisons
In the single response (univariate) case, let $X_{j1}$ denote the response to treatment 1, and let $X_{j2}$ denote the response to treatment 2 for the $j$th trial. That is, $(X_{j1}, X_{j2})$ are measurements recorded on the $j$th unit or $j$th pair of like units. By design, the $n$ differences
$$D_j = X_{j1} - X_{j2}, \quad j = 1, 2, \ldots, n$$
should reflect only the differential effects of the treatments. Given that the differences $D_j$ represent independent observations from an $N(\delta, \sigma_d^2)$ distribution, the variable
$$t = \frac{\bar{D} - \delta}{s_d/\sqrt{n}},$$
where $\bar{D} = \frac{1}{n}\sum_{j=1}^{n} D_j$ and $s_d^2 = \frac{1}{n-1}\sum_{j=1}^{n}(D_j - \bar{D})^2$, has a t-distribution with $n-1$ d.f.
• An $\alpha$-level test of
$$H_0: \delta = 0 \quad\text{vs}\quad H_1: \delta \neq 0$$
may be conducted by comparing $|t|$ with $t_{n-1}(\alpha/2)$, the upper $100(\alpha/2)$th percentile of a t-distribution with $n-1$ d.f.
• A $100(1-\alpha)\%$ confidence interval for the mean difference $\delta = E(X_{j1} - X_{j2})$ is provided by the statement
$$\bar{D} - t_{n-1}(\alpha/2)\frac{s_d}{\sqrt{n}} \le \delta \le \bar{D} + t_{n-1}(\alpha/2)\frac{s_d}{\sqrt{n}}$$
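The paired t-statistic and its interval can be sketched as follows. The measurements are hypothetical, made up purely for illustration, and `paired_interval` is an illustrative helper that takes the critical value $t_{n-1}(\alpha/2)$ from the caller.

```python
import numpy as np

# Hypothetical paired measurements (made up for illustration only)
x1 = np.array([10.0, 12.0, 9.0, 11.0])   # responses to treatment 1
x2 = np.array([8.0, 11.0, 7.0, 10.0])    # responses to treatment 2

d = x1 - x2                               # differences D_j
n = d.size
d_bar = d.mean()
s_d = d.std(ddof=1)                       # divisor n - 1, as in the definition of s_d^2
t_stat = (d_bar - 0.0) / (s_d / np.sqrt(n))   # t-ratio under H0: delta = 0

def paired_interval(d_bar, s_d, n, t_crit):
    """Interval d_bar -/+ t_crit * s_d / sqrt(n), with t_crit = t_{n-1}(alpha/2)."""
    half_width = t_crit * s_d / np.sqrt(n)
    return d_bar - half_width, d_bar + half_width
```

For these made-up numbers the differences are $2, 1, 2, 1$, so $\bar{D} = 1.5$, $s_d^2 = 1/3$, and $t = 3\sqrt{3} \approx 5.196$, to be compared with $t_{3}(\alpha/2)$.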
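For vector-valued differences $D_j$, the corresponding statistic is
$$T^2 = n(\bar{D} - \delta)'S_d^{-1}(\bar{D} - \delta),$$
where $\bar{D} = \frac{1}{n}\sum_{j=1}^{n} D_j$ and $S_d = \frac{1}{n-1}\sum_{j=1}^{n}(D_j - \bar{D})(D_j - \bar{D})'$.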
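A minimal sketch of the paired $T^2$ computation, assuming the two treatments are stored as $n \times p$ arrays with matching rows and $p \ge 2$; the function name `paired_t2` is hypothetical.

```python
import numpy as np

def paired_t2(X1, X2, delta):
    """T^2 = n (Dbar - delta)' S_d^{-1} (Dbar - delta) for row-wise paired data.

    Assumes at least two response variables (p >= 2), so S_d is a matrix.
    """
    D = np.asarray(X1, dtype=float) - np.asarray(X2, dtype=float)
    n = D.shape[0]
    d_bar = D.mean(axis=0)
    S_d = np.cov(D, rowvar=False)        # divisor n - 1
    diff = d_bar - np.asarray(delta, dtype=float)
    return n * diff @ np.linalg.solve(S_d, diff)
```

Because the statistic depends on the data only through the differences, calling it with the Example 4.1 data matrix as `X1`, zeros as `X2`, and `delta = [9, 5]` reproduces that example's one-sample $T^2$.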
4.6 Comparing Mean Vectors from Two Populations
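Sample and summary statistics

(Population 1) $x_{11}, x_{12}, \ldots, x_{1n_1}$:
$$\bar{x}_1 = \frac{1}{n_1}\sum_{j=1}^{n_1} x_{1j}, \qquad S_1 = \frac{1}{n_1 - 1}\sum_{j=1}^{n_1}(x_{1j} - \bar{x}_1)(x_{1j} - \bar{x}_1)'$$
(Population 2) $x_{21}, x_{22}, \ldots, x_{2n_2}$:
$$\bar{x}_2 = \frac{1}{n_2}\sum_{j=1}^{n_2} x_{2j}, \qquad S_2 = \frac{1}{n_2 - 1}\sum_{j=1}^{n_2}(x_{2j} - \bar{x}_2)(x_{2j} - \bar{x}_2)'$$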
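Then
$$\operatorname{Cov}(\bar{X}_1 - \bar{X}_2) = \operatorname{Cov}(\bar{X}_1) + \operatorname{Cov}(\bar{X}_2) = \frac{1}{n_1}\Sigma + \frac{1}{n_2}\Sigma = \left(\frac{1}{n_1} + \frac{1}{n_2}\right)\Sigma.$$
Hence, with $S_{\text{pooled}} = \frac{(n_1 - 1)S_1 + (n_2 - 1)S_2}{n_1 + n_2 - 2}$ estimating the common $\Sigma$,
$$\left(\frac{1}{n_1} + \frac{1}{n_2}\right)S_{\text{pooled}}$$
is an estimator of $\operatorname{Cov}(\bar{X}_1 - \bar{X}_2)$.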
Result 4.7 If $X_{11}, X_{12}, \ldots, X_{1n_1}$ is a random sample of size $n_1$ from $N_p(\mu_1, \Sigma)$ and $X_{21}, X_{22}, \ldots, X_{2n_2}$ is an independent random sample of size $n_2$ from $N_p(\mu_2, \Sigma)$, then
$$T^2 = [\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)]'\left[\left(\frac{1}{n_1} + \frac{1}{n_2}\right)S_{\text{pooled}}\right]^{-1}[\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)]$$
is distributed as
$$\frac{(n_1 + n_2 - 2)p}{n_1 + n_2 - p - 1}F_{p,\,n_1+n_2-p-1}.$$
Consequently,
$$P(T^2 \le c^2) = 1 - \alpha,$$
where
$$c^2 = \frac{(n_1 + n_2 - 2)p}{n_1 + n_2 - p - 1}F_{p,\,n_1+n_2-p-1}(\alpha).$$
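The two-sample statistic of Result 4.7 can be sketched in NumPy. The pooled covariance below is the two-group case of the pooled formula given later in these notes; the function name and the data are hypothetical, made up so the result is easy to verify by hand.

```python
import numpy as np

def two_sample_t2(X1, X2, delta0):
    """Pooled two-sample T^2 for H0: mu1 - mu2 = delta0 (rows are observations)."""
    X1 = np.asarray(X1, dtype=float)
    X2 = np.asarray(X2, dtype=float)
    n1, n2 = X1.shape[0], X2.shape[0]
    S1 = np.cov(X1, rowvar=False)
    S2 = np.cov(X2, rowvar=False)
    S_pooled = ((n1 - 1) * S1 + (n2 - 1) * S2) / (n1 + n2 - 2)
    diff = X1.mean(axis=0) - X2.mean(axis=0) - np.asarray(delta0, dtype=float)
    return diff @ np.linalg.solve((1.0 / n1 + 1.0 / n2) * S_pooled, diff)

# Hypothetical data: sample 2 is sample 1 shifted by one unit in the first coordinate
X1 = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]])
X2 = X1 + np.array([1.0, 0.0])
T2 = two_sample_t2(X1, X2, delta0=[0.0, 0.0])
```

Here $S_1 = S_2 = S_{\text{pooled}} = \frac{4}{3}I$ and $\bar{x}_1 - \bar{x}_2 = [-1, 0]'$, so $T^2 = 1/(2/3) = 1.5$, which would then be compared with $c^2$.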
Example 4.8 (Constructing a confidence region for the difference of two mean vectors) Fifty bars of soap are manufactured in each of two ways. Two characteristics, $X_1$ = lather and $X_2$ = mildness, are measured. The summary statistics for bars produced by methods 1 and 2 are
Simultaneous Confidence Intervals
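Result 4.8 Let $c^2 = [(n_1 + n_2 - 2)p/(n_1 + n_2 - p - 1)]F_{p,\,n_1+n_2-p-1}(\alpha)$. With probability $1 - \alpha$,
$$a'(\bar{x}_1 - \bar{x}_2) \pm c\sqrt{a'\left(\frac{1}{n_1} + \frac{1}{n_2}\right)S_{\text{pooled}}\,a}$$
will cover $a'(\mu_1 - \mu_2)$ for all $a$. In particular, $\mu_{1i} - \mu_{2i}$ will be covered by
$$(\bar{X}_{1i} - \bar{X}_{2i}) \pm c\sqrt{\left(\frac{1}{n_1} + \frac{1}{n_2}\right)s_{ii,\text{pooled}}} \quad\text{for } i = 1, 2, \ldots, p.$$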
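The component-wise intervals of Result 4.8 can be sketched as below. The function name, the summary statistics, and the value of `c` are all hypothetical placeholders; in practice `c` would be the square root of the $c^2$ defined in the result.

```python
import numpy as np

def simultaneous_component_intervals(xbar1, xbar2, S_pooled, n1, n2, c):
    """Intervals (xbar1_i - xbar2_i) -/+ c * sqrt((1/n1 + 1/n2) s_ii,pooled)."""
    diff = np.asarray(xbar1, dtype=float) - np.asarray(xbar2, dtype=float)
    s_ii = np.diag(np.asarray(S_pooled, dtype=float))
    half_width = c * np.sqrt((1.0 / n1 + 1.0 / n2) * s_ii)
    # One row per component: [lower, upper]
    return np.column_stack([diff - half_width, diff + half_width])

# Hypothetical summary statistics and a placeholder c, for illustration only
intervals = simultaneous_component_intervals(
    xbar1=[1.0, 1.0], xbar2=[2.0, 1.0],
    S_pooled=[[2.0, 0.0], [0.0, 8.0]], n1=4, n2=4, c=2.0,
)
```

For these placeholder inputs the half-widths are $2$ and $4$, giving the rows $[-3, 1]$ and $[-4, 4]$.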
The Two-Sample Situation When $\Sigma_1 \neq \Sigma_2$
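Result 4.9 Let the sample sizes be such that $n_1 - p$ and $n_2 - p$ are large. Then, an approximate $100(1 - \alpha)\%$ confidence ellipsoid for $\mu_1 - \mu_2$ is given by all $\mu_1 - \mu_2$ satisfying
$$T^2 = [\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)]'\left[\frac{1}{n_1}S_1 + \frac{1}{n_2}S_2\right]^{-1}[\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)] \le \chi_p^2(\alpha),$$
where $\chi_p^2(\alpha)$ is the upper $(100\alpha)$th percentile of a chi-square distribution with $p$ d.f. Because the covariance matrices differ, $S_1$ and $S_2$ are not pooled here. Also, $100(1 - \alpha)\%$ simultaneous confidence intervals for all linear combinations $a'(\mu_1 - \mu_2)$ are provided by
$$a'(\mu_1 - \mu_2) \text{ belongs to } a'(\bar{x}_1 - \bar{x}_2) \pm \sqrt{\chi_p^2(\alpha)}\sqrt{a'\left(\frac{1}{n_1}S_1 + \frac{1}{n_2}S_2\right)a}.$$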
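A sketch of the large-sample check in Result 4.9, using the unpooled estimate $\frac{1}{n_1}S_1 + \frac{1}{n_2}S_2$. The function names and summary statistics are hypothetical. To avoid extra dependencies, the chi-square percentile uses the closed form $-2\ln\alpha$, which holds only for $p = 2$; for other $p$ a statistics library would be needed.

```python
import math
import numpy as np

def large_sample_t2(xbar1, xbar2, S1, S2, n1, n2, delta0):
    """T^2 with the unpooled estimate (1/n1) S1 + (1/n2) S2 of Cov(Xbar1 - Xbar2)."""
    diff = (np.asarray(xbar1, dtype=float) - np.asarray(xbar2, dtype=float)
            - np.asarray(delta0, dtype=float))
    V = np.asarray(S1, dtype=float) / n1 + np.asarray(S2, dtype=float) / n2
    return diff @ np.linalg.solve(V, diff)

def chi2_upper_2df(alpha):
    """Upper-alpha point of chi-square with 2 d.f. (closed form, p = 2 only)."""
    return -2.0 * math.log(alpha)

# Hypothetical summary statistics, made up for illustration
T2 = large_sample_t2([1.0, 1.0], [2.0, 1.0],
                     S1=[[50.0, 0.0], [0.0, 50.0]],
                     S2=[[100.0, 0.0], [0.0, 100.0]],
                     n1=100, n2=200, delta0=[0.0, 0.0])
inside = T2 <= chi2_upper_2df(0.05)   # True means delta0 lies in the 95% ellipsoid
```

With these made-up inputs the combined covariance estimate is the identity, $T^2 = 1$, and $\chi_2^2(0.05) \approx 5.99$, so $\mu_1 - \mu_2 = 0$ falls inside the approximate 95% ellipsoid.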
4.7 Testing for Equality of Covariance Matrices
With g populations, the null hypothesis is
$$H_0: \Sigma_1 = \Sigma_2 = \cdots = \Sigma_g = \Sigma$$
• Assuming multivariate normal populations, a likelihood ratio statistic for testing the hypothesis above is given by
$$\Gamma = \prod_{l}\left(\frac{|S_l|}{|S_{\text{pooled}}|}\right)^{(n_l - 1)/2}.$$
Here $n_l$ is the sample size for the $l$th group, $S_l$ is the $l$th group sample covariance matrix, and $S_{\text{pooled}}$ is the pooled sample covariance matrix given by
$$S_{\text{pooled}} = \frac{1}{\sum_{l}(n_l - 1)}\{(n_1 - 1)S_1 + (n_2 - 1)S_2 + \cdots + (n_g - 1)S_g\}.$$
$$\nu = g\,\frac{1}{2}p(p + 1) - \frac{1}{2}p(p + 1) = \frac{1}{2}p(p + 1)(g - 1)$$
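The statistic $\Gamma$ and the quantity $\nu$ can be sketched as follows. Working on the log scale is a design choice made here to avoid underflow when the $n_l$ are large; the function names are hypothetical.

```python
import numpy as np

def log_gamma_statistic(S_list, n_list):
    """log of Gamma = prod_l (|S_l| / |S_pooled|)^((n_l - 1)/2), via log-determinants."""
    S_list = [np.asarray(S, dtype=float) for S in S_list]
    dfs = np.array(n_list, dtype=float) - 1.0
    # Pooled covariance: weighted average of the S_l with weights n_l - 1
    S_pooled = sum(df * S for df, S in zip(dfs, S_list)) / dfs.sum()
    logdet_pooled = np.linalg.slogdet(S_pooled)[1]
    return sum(0.5 * df * (np.linalg.slogdet(S)[1] - logdet_pooled)
               for df, S in zip(dfs, S_list))

def nu(g, p):
    """nu = (1/2) p (p + 1) (g - 1)."""
    return p * (p + 1) * (g - 1) // 2
```

When all $S_l$ are equal, $\log\Gamma = 0$ (so $\Gamma = 1$). With $g = 3$ groups and $p = 4$ variables, `nu(3, 4)` returns 20.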
Example 4.9 (Testing equality of covariance matrices—nursing homes)
The Wisconsin Department of Health and Social Services reimburses nursing
homes in the state for the services provided. The department develops a set of
formulas for the rates for each facility, based on factors such as level of care,
mean wage rate, and average wage rate in the state.
Group                 Number of observations   Sample mean vector
l = 1 (private)       n1 = 271                 $\bar{x}_1 = [2.066, .480, .082, .360]'$
l = 2 (nonprofit)     n2 = 138                 $\bar{x}_2 = [2.167, .596, .124, .418]'$
l = 3 (government)    n3 = 107                 $\bar{x}_3 = [2.273, .521, .125, .283]'$