Stat 700 HW3 Solutions, 10/9/09


(1). (a) First, by conditioning and unconditioning on ϑ ∈ {ϑ1, ϑ2} (with probabilities 0.5, 0.5), we find for j = 1, 2 (here, given ϑ, the Xj are i.i.d. exponential with rate ϑ, so E(Xj | ϑ) = 1/ϑ and E(Xj² | ϑ) = 2/ϑ²) that EXj = 0.5·(1) + 0.5·(1/2) = 3/4, EXj² = 2(.5)(1 + (1/2)²) = 5/4, Var(Xj) = 5/4 − (3/4)² = 11/16, and then

E(X1 X2) = .5·(1)² + .5·(1/2)² = 5/8,  Cov(X1, X2) = 5/8 − (3/4)² = 1/16,  Corr(X1, X2) = (1/16)/(11/16) = 1/11
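
As a quick numerical sanity check (added, not part of the original solution), the following Python sketch simulates the model read off from these formulas, i.e. X1, X2 i.i.d. exponential with rate ϑ given ϑ, where ϑ is 1 or 2 with probability 0.5 each, and reproduces the moments above:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 10**6
    theta = rng.choice([1.0, 2.0], size=n)   # common rate for the pair
    x1 = rng.exponential(1 / theta)
    x2 = rng.exponential(1 / theta)          # same theta induces dependence

    print(x1.mean())                   # ~ 3/4
    print((x1**2).mean())              # ~ 5/4
    print(x1.var())                    # ~ 11/16 = 0.6875
    print(np.cov(x1, x2)[0, 1])        # ~ 1/16 = 0.0625
    print(np.corrcoef(x1, x2)[0, 1])   # ~ 1/11 ≈ 0.0909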
(b) Let S = X1 + · · · + X10, s = x1 + · · · + x10. Then the joint mixed-type pdf for (X1, . . . , Xn, ϑ) is .5 e^{−s} I[ϑ=1] + .5 · 2^{10} e^{−2s} I[ϑ=2], so that at s = 13,

fϑ|X(j|x) = (.5 e^{−s} I[j=1] + .5 · 2^{10} e^{−2s} I[j=2]) / (.5 e^{−s} + .5 · 2^{10} e^{−2s}) = (I[j=1] + 2^{10} e^{−13} I[j=2]) / (1 + 2^{10} e^{−13})
(c) Then f_{X11|X}(x|X) at S = 13 is (e^{−x} + 2^{11} e^{−13−2x})/(1 + 2^{10} e^{−13}).
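
A short check of (b) and (c) (added): the posterior weights and the predictive density can be computed directly, and the mixture form below should agree with the closed-form expression at any x:

    import numpy as np

    s, n = 13.0, 10
    w1 = 0.5 * np.exp(-s)                # joint weight for theta = 1
    w2 = 0.5 * 2**n * np.exp(-2 * s)     # joint weight for theta = 2
    p1, p2 = w1 / (w1 + w2), w2 / (w1 + w2)

    def predictive(x):
        # mixture of Expon(1) and Expon(2) densities with posterior weights
        return p1 * np.exp(-x) + p2 * 2 * np.exp(-2 * x)

    x = 1.0
    print(predictive(x))
    print((np.exp(-x) + 2**11 * np.exp(-13 - 2 * x)) / (1 + 2**10 * np.exp(-13)))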
Bickel-Doksum, # 1.2.12. (a) Put T = Σ_{i=1}^n (Xi − µ0)², t = t(x) = Σ_{i=1}^n (xi − µ0)². Then the joint density for fixed ϑ is (as a function of ϑ)

∏_{i=1}^n (√ϑ/√(2π)) e^{−ϑ(xi−µ0)²/2} ∝ ϑ^{n/2} exp(−ϑt/2)

(b) With the Gamma(λ/2, ν/2) prior density as given, the posterior is proportional to the product of the prior and the density of X, i.e., to ϑ^{(n+λ−2)/2} exp(−ϑ(ν + t)/2). Viewing this (up to a constant depending on x) as a density in ϑ, it is proportional and therefore identical to a Gamma((n + λ)/2, (ν + t)/2) density. When n, λ are integers, we use the fact that cW ∼ Γ(α, β/c) whenever W ∼ Γ(α, β) to conclude that the conditional density of V ≡ (ν + T)ϑ given X is Γ((n + λ)/2, 1/2) = χ²_{n+λ}.
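
A simulation check (added): drawing ϑ from the posterior Gamma((n+λ)/2, (ν+t)/2) (second argument a rate, so the numpy scale parameter is 2/(ν+t)) and rescaling by ν + t should give χ²_{n+λ} draws. The values of n, λ, ν, t below are arbitrary illustrative choices:

    import numpy as np
    from scipy import stats

    n, lam, nu, t = 8, 4, 2.0, 11.5
    rng = np.random.default_rng(1)
    post = rng.gamma(shape=(n + lam) / 2, scale=2 / (nu + t), size=10**6)
    v = (nu + t) * post
    # Kolmogorov-Smirnov distance to chi-squared(n + lam): should be ~ 0
    print(stats.kstest(v, stats.chi2(df=n + lam).cdf).statistic)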

(c) Thus the posterior density of σ = ϑ^{−1/2} = V^{−1/2} √(T + ν) follows by univariate change of variable: since V ∼ χ²_{n+λ} and V = (T + ν)/σ², the posterior density of σ is

(2^{−(n+λ)/2}/Γ((n+λ)/2)) ((T + ν)/s²)^{(n+λ−2)/2} (2(T + ν)/s³) e^{−(T+ν)/(2s²)}

= (2^{1−(n+λ)/2} (T + ν)^{(n+λ)/2}/Γ((n+λ)/2)) s^{−(n+λ+1)} e^{−(T+ν)/(2s²)},   s > 0
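
As a check on the change of variable (added), the displayed density should integrate to 1 and match the distribution of σ = √((T+ν)/V) for V ∼ χ²_{n+λ}; the values of n, λ and T + ν below are illustrative:

    import numpy as np
    from scipy import stats
    from scipy.integrate import quad
    from scipy.special import gamma as gamma_fn

    n, lam, T_plus_nu = 8, 4, 13.5

    def f_sigma(s):
        # closed-form posterior density of sigma derived above
        k = (n + lam) / 2
        return (2**(1 - k) * T_plus_nu**k / gamma_fn(k)) \
            * s**(-(n + lam + 1)) * np.exp(-T_plus_nu / (2 * s**2))

    print(quad(f_sigma, 0, np.inf)[0])    # ~ 1.0

    v = stats.chi2(df=n + lam).rvs(size=10**5, random_state=2)
    sigma = np.sqrt(T_plus_nu / v)
    m = 1.2
    print((sigma <= m).mean(), quad(f_sigma, 0, m)[0])   # should agree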

Bickel-Doksum, # 1.2.14. (a) The joint (ϑ, X1, . . . , Xn) density is C exp(−(ϑ − ϑ0)²/(2τ0²) − (1/(2σ0²)) Σ_{i=1}^n (xi − ϑ)²). The conditional density of Xn+1 given (ϑ, X1, . . . , Xn) is N(ϑ, σ0²), and this ‘predictive’ density does not change with n. The posterior predictive density (of Xn+1 given X1, . . . , Xn) depends on the data only through X̄n, and can be found either by a direct use of the definitions involving integrals or through the following argument. First, we know (ϑ, X̄n) is bivariate normal with X̄n − ϑ ∼ N(0, σ0²/n) independent of ϑ. Next, with γ ≡ τ0²/(τ0² + σ0²/n), we have ϑ − γX̄n independent of X̄n, and therefore its variance added to the variance of γX̄n is equal to the variance τ0² of ϑ, so Var(ϑ − γX̄n) = (1 − γ)τ0² = γσ0²/n: thus conditionally given X̄n, ϑ ∼ N(γX̄n + (1 − γ)ϑ0, γσ0²/n). Since Xn+1 = (Xn+1 − ϑ) + ϑ is a sum of two conditionally normal and independent variables given X1, . . . , Xn, we conclude that it is conditionally N(γX̄n + (1 − γ)ϑ0, σ0² + γσ0²/n). Now when n gets large, X̄n → ϑ (either in probability or almost surely, by the Law of Large Numbers), and we find that Xn+1 conditionally given X1, . . . , Xn has distribution converging to N(ϑ, σ0²), the same as the frequentist predictive density.
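
A simulation check (added): sampling from the hierarchical model and conditioning crudely on X̄n near a fixed value should reproduce the predictive mean γx̄ + (1 − γ)ϑ0 and variance σ0² + γσ0²/n; all numeric values below are illustrative:

    import numpy as np

    rng = np.random.default_rng(3)
    theta0, tau0, sigma0, n = 1.0, 2.0, 1.5, 20
    g = tau0**2 / (tau0**2 + sigma0**2 / n)

    reps = 10**6
    theta = rng.normal(theta0, tau0, reps)
    xbar = rng.normal(theta, sigma0 / np.sqrt(n))
    xnew = rng.normal(theta, sigma0)

    # condition on X-bar in a narrow window around 0.5
    w = np.abs(xbar - 0.5) < 0.05
    print(xnew[w].mean(), g * 0.5 + (1 - g) * theta0)
    print(xnew[w].var(), sigma0**2 + g * sigma0**2 / n)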

Bickel-Doksum, # 1.3.3. For each ϑ, X̄ ∼ N(ϑ, 1/n). (a) For ϑ < 0, we calculate

R(ϑ, δ_{r,s}) = c Pϑ(r ≤ X̄ ≤ s) + (b + c) Pϑ(X̄ > s) = c(1 − Φ(√n(r − ϑ))) + b(1 − Φ(√n(s − ϑ)))

Similarly, R(0, δ_{r,s}) = b P0(X̄ ∉ [r, s]), and for ϑ > 0,

R(ϑ, δ_{r,s}) = c Pϑ(X̄ ≤ s) + b Pϑ(X̄ < r) = c Φ(√n(s − ϑ)) + b Φ(√n(r − ϑ))

(b) In both (i) and (ii) (here n = 1 and b = c = 1, with (r, s) = (−1, 1) in (i) and (−1, 2) in (ii), as the displayed risks show), the risk curve is increasing for negative ϑ and decreasing for positive ϑ (with equal one-sided limits at 0 for (i) but not for (ii)), and the curves have a discontinuity at 0 with isolated risk value at ϑ = 0 lower than either the left- or right-limits there. The risk for (i) is 2Φ(−1) I[ϑ=0] + (Φ(1 − |ϑ|) + Φ(−1 − |ϑ|)) I[ϑ≠0], and that for (ii) is (Φ(ϑ + 1) + Φ(ϑ − 2)) I[ϑ<0] + (Φ(−1) + Φ(−2)) I[ϑ=0] + (Φ(2 − ϑ) + Φ(−1 − ϑ)) I[ϑ>0]. The risk is smaller under (ii) than under (i) if and only if ϑ ≤ 0.
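
Evaluating both risk curves (added) confirms the comparison; the parameter values follow the reading of the displayed formulas (n = 1, b = c = 1, (r, s) = (−1, 1) and (−1, 2)):

    import numpy as np
    from scipy.stats import norm

    Phi = norm.cdf

    def risk(theta, r, s, b=1.0, c=1.0):
        # piecewise risk of delta_{r,s} from part (a), with n = 1
        if theta < 0:
            return c * (1 - Phi(r - theta)) + b * (1 - Phi(s - theta))
        if theta == 0:
            return b * (Phi(r) + 1 - Phi(s))
        return c * Phi(s - theta) + b * Phi(r - theta)

    for theta in [-2.0, -0.5, 0.0, 0.5, 2.0]:
        ri, rii = risk(theta, -1, 1), risk(theta, -1, 2)
        print(theta, round(ri, 4), round(rii, 4), rii <= ri)  # True iff theta <= 0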

Bickel-Doksum, # 1.3.11. (a) Fix any ϑ, and let g(ϑ′) ≡ Eϑ(ϑ′ − a(X))² = (Eϑ a(X) − ϑ′)² + Varϑ(a(X)). Since we want this function of ϑ′ to be everywhere ≥ g(ϑ), we can differentiate at ϑ′ = ϑ, obtaining g′(ϑ′) = 2(ϑ′ − Eϑ a(X)). We conclude that g is calculus-minimized at ϑ iff Eϑ a(X) = ϑ (and it is definitely not minimized there when Eϑ a(X) ≠ ϑ). For this to hold for all ϑ is to say that the estimator a(X) is unbiased in the usual sense.
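
A symbolic version of the calculus step (added): holding m = Eϑ a(X) and v = Varϑ(a(X)) fixed, g(ϑ′) = (m − ϑ′)² + v has derivative vanishing at ϑ′ = ϑ iff m = ϑ:

    import sympy as sp

    tp, m, v, theta = sp.symbols("tp m v theta", real=True)  # tp plays theta'
    g = (m - tp)**2 + v
    print(sp.diff(g, tp))                                    # 2*tp - 2*m
    print(sp.solve(sp.Eq(sp.diff(g, tp).subs(tp, theta), 0), m))  # [theta]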
(b) Note: this problem part is stated incorrectly: it should have said that the ‘unbiased’ definition implies that the power is always at least as large over the alternative parameter region as over the null-hypothesis parameter region, but the converse does not generally hold! Here

Eϑ l(ϑ′, a(·)) = I[ϑ′ ∈ Θ0] β(ϑ, a) + I[ϑ′ ∈ Θ0^c] (1 − β(ϑ, a))

Saying that this is always at least as large as the corresponding expectation with ϑ′ replaced by ϑ is easily checked to be the same as saying

∀ ϑ ∈ Θ0 (∀ ϑ′ ∈ Θ0^c):   1 − β(ϑ, a) ≥ β(ϑ, a)

∀ ϑ ∈ Θ0^c (∀ ϑ′ ∈ Θ0):   1 − β(ϑ, a) ≤ β(ϑ, a)

This pair of assertions says equivalently that

inf_{ϑ ∈ Θ0^c} β(ϑ, a) ≥ 1/2 ≥ sup_{ϑ ∈ Θ0} β(ϑ, a)
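
An illustration (added, not from the text): for a single X ∼ N(ϑ, 1) and the test of H0: ϑ ≤ 0 that rejects iff X > 0, the power is β(ϑ) = Φ(ϑ), so the sup over Θ0 and the inf over its complement both equal 1/2 and the test is unbiased in the above sense:

    import numpy as np
    from scipy.stats import norm

    beta = lambda theta: 1 - norm.cdf(0 - theta)   # power = P_theta(X > 0)
    null = np.linspace(-5, 0, 1001)                # Theta_0 = (-inf, 0]
    alt = np.linspace(1e-6, 5, 1001)               # complement (sampled)
    print(beta(null).max(), beta(alt).min())       # ~ 0.5, ~ 0.5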

Bickel-Doksum, # 1.3.19. (a) The four nonrandomized rules, regarded as binary functions of X ∈ {0, 1}, are δj, j = 1, 2, 3, 4, with δ1(1) = δ1(0) = a1, δ2(1) = δ2(0) = a2, δ3(X) = a1 iff X = 1, δ4(X) = a2 iff X = 1. For these rules, it is easy to calculate

R(ϑ1, δ1) = 0,  R(ϑ2, δ1) = 3,  R(ϑ1, δ2) = 2,  R(ϑ2, δ2) = 1,  R(ϑ1, δ3) = .2(2) = .4,

R(ϑ2, δ3) = .4(1) + .6(3) = 2.2,  R(ϑ1, δ4) = .8(2) = 1.6,  R(ϑ2, δ4) = .6(1) + .4(3) = 1.8

The maximum risks for the 4 rules are easily seen to be 3, 2, 2.2, 1.8, respectively. So the unique minimax rule among nonrandomized rules is δ4.
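
A check of the risk table (added), assuming, as the calculations above imply, P(X = 1 | ϑ1) = .8, P(X = 1 | ϑ2) = .6, and losses l(ϑ1, a1) = 0, l(ϑ1, a2) = 2, l(ϑ2, a1) = 3, l(ϑ2, a2) = 1:

    p1, p2 = 0.8, 0.6    # P(X = 1) under theta1, theta2
    loss = {(1, 'a1'): 0, (1, 'a2'): 2, (2, 'a1'): 3, (2, 'a2'): 1}
    rules = {            # delta(x) for x = 0, 1
        'd1': {0: 'a1', 1: 'a1'}, 'd2': {0: 'a2', 1: 'a2'},
        'd3': {0: 'a2', 1: 'a1'}, 'd4': {0: 'a1', 1: 'a2'},
    }

    def risk(j, rule, p):
        # expected loss of a rule under theta_j
        return (1 - p) * loss[(j, rule[0])] + p * loss[(j, rule[1])]

    for name, rule in rules.items():
        r1, r2 = risk(1, rule, p1), risk(2, rule, p2)
        print(name, r1, r2, max(r1, r2))   # max risks: 3, 2, 2.2, 1.8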
(b) The risk set is a convex polygon, the convex hull in the (r1, r2) orthant of the four points (0, 3), (2, 1), (.4, 2.2), (1.6, 1.8). The minimax among randomized rules occurs at the intersection of the lower envelope of the region, specifically the segment with .4 ≤ r1 ≤ 2 of the line r2 − 1 = (−3/4)(r1 − 2), with the ray r2 = r1 ≥ 0. The intersection occurs at risks r2 = r1 = 10/7, attained at the point p(R(ϑ1, δ3), R(ϑ2, δ3)) + (1 − p)(R(ϑ1, δ2), R(ϑ2, δ2)) with p = 5/14. So the minimax among all randomized procedures is the one which chooses δ3 with probability p = 5/14 and δ2 with the remaining probability 9/14.
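
The equalizing mixture can be solved for directly (added): set the two coordinates of p·(risks of δ3) + (1 − p)·(risks of δ2) equal:

    from fractions import Fraction as F

    r3 = (F(2, 5), F(11, 5))   # risks (0.4, 2.2) of delta3
    r2 = (F(2), F(1))          # risks (2, 1) of delta2
    # solve p*r3 + (1-p)*r2 equal in both coordinates
    p = (r2[1] - r2[0]) / ((r3[0] - r2[0]) - (r3[1] - r2[1]))
    print(p, p * r3[0] + (1 - p) * r2[0])   # 5/14, common risk 10/7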
(c) Under the prior π = (π1, π2) = (0.1, 0.9), the four values rπ(δj) = 0.1 R(ϑ1, δj) + 0.9 R(ϑ2, δj) for j = 1, . . . , 4 are 2.7, 1.1, 2.02, 1.78. So with this prior, δ2 is the Bayes-optimal rule, with Bayes risk 1.1.
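
The Bayes risks are plain weighted averages (added):

    risks = {'d1': (0, 3), 'd2': (2, 1), 'd3': (0.4, 2.2), 'd4': (1.6, 1.8)}
    pi = (0.1, 0.9)
    for name, (r1, r2) in risks.items():
        print(name, pi[0] * r1 + pi[1] * r2)   # 2.7, 1.1, 2.02, 1.78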

Bickel-Doksum, # 1.4.18. (a) Here Z = (Z0 − µ)/σ ∼ N(βY0/σ, 1) = N(Y, 1) given Y or Y0.
(b) With the Expon(λ) prior, the posterior is

fY|Z(y|z) = λ e^{−λy} (2π)^{−1/2} e^{−(z−y)²/2} / fZ(z) ∝ e^{(z−λ)y − y²/2} I[y>0]

which is the density of a N(z − λ, 1) random variable conditioned to be positive, i.e. the N(z − λ, 1) density truncated at 0.
(c) So Y given Z is a N(Z − λ, 1) r.v. conditioned to be positive, which says equivalently that Y given Z0 is a N((Z0 − µ)/σ − λ, 1) r.v. conditioned to be positive; since Y0 = (σ/β)Y is σ/β times as large, Y0 given Z0 is a N((Z0 − µ − σλ)/β, (σ/β)²) r.v. conditioned to be positive.
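
A simulation check of (b)-(c) (added): with the Expon(λ) prior on Y and Z | Y ∼ N(Y, 1), draws of Y conditioned on Z near a fixed z should match the N(z − λ, 1) density truncated at 0; λ and z below are illustrative:

    import numpy as np
    from scipy.stats import truncnorm

    rng = np.random.default_rng(4)
    lam, z = 1.5, 2.0
    y = rng.exponential(1 / lam, size=10**6)
    zs = rng.normal(y, 1.0)
    post = y[np.abs(zs - z) < 0.05]   # crude conditioning on Z ~= z

    # truncnorm takes standardized bounds: lower = (0 - loc)/scale
    tn = truncnorm(-(z - lam), np.inf, loc=z - lam, scale=1.0)
    print(post.mean(), tn.mean())
    print(np.median(post), tn.median())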
In parts (d) and (e), we respectively use the posterior median and the posterior expectation of a truncated normal density, which are guaranteed to be Bayes-optimal for the respective absolute-error and squared-error loss functions. Let W be a truncated normal N(a, b²) r.v. truncated at 0, i.e. a N(a, b²) r.v. conditioned to be positive. Its distribution function is

FW(w) = ∫_0^w (1/(b√(2π))) e^{−(z−a)²/(2b²)} dz / ∫_0^∞ (1/(b√(2π))) e^{−(z−a)²/(2b²)} dz = (Φ((w − a)/b) − Φ(−a/b)) / (1 − Φ(−a/b))

which is set equal to 1/2 at the median a + b Φ^{−1}((1 + Φ(−a/b))/2).


Similarly, the expectation is

EW = ∫_0^∞ z (1/(b√(2π))) e^{−(z−a)²/(2b²)} dz / ∫_0^∞ (1/(b√(2π))) e^{−(z−a)²/(2b²)} dz = a + b φ(−a/b)/(1 − Φ(−a/b))

In part (d), substitute a = (Z0 − µ − σλ)/β and b = σ/β into the posterior-median formula. In part (e), substitute a = z − λ and b = 1 into the posterior-expectation formula.
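
Both closed forms can be checked against scipy's truncated normal (added), for illustrative a and b:

    import numpy as np
    from scipy.stats import norm, truncnorm

    a, b = 0.7, 1.3
    tn = truncnorm(-a / b, np.inf, loc=a, scale=b)  # N(a, b^2) truncated at 0

    median = a + b * norm.ppf(0.5 * (1 + norm.cdf(-a / b)))
    mean = a + b * norm.pdf(-a / b) / (1 - norm.cdf(-a / b))
    print(median, tn.median())   # agree
    print(mean, tn.mean())       # agree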
