
6.434J/16.391J Statistics for Engineers and Scientists                  March 9
MIT, Spring 2006                                                     Handout #7

Solution 3
Problem 1: Let X1 , X2 , . . . , Xn be a random sample from the uniform p.d.f.
f (x|θ) = 1/θ, for 0 < x < θ and for some unknown parameter θ > 0.
(a) Find a maximum likelihood estimator of θ, say Tn .

(b) Find the bias of Tn .

(c) Based on (b), derive an unbiased estimator of θ, say Wn .

(d) [Extra Credit] Compare variances of Tn and Wn .

(e) [Extra Credit] Show that {Tn } is a consistent sequence of estimators.


Solution
(a) The likelihood function is

L(θ) = 1/θ^n   if 0 < X1 < θ, 0 < X2 < θ, . . . , and 0 < Xn < θ,   and 0 otherwise
     = 1/θ^n   if θ > max{X1 , X2 , . . . , Xn },   and 0 otherwise.

Since L(·) attains its maximum at θ = max{X1 , X2 , . . . , Xn }, an MLE for the
unknown θ > 0 is

Tn = max{X1 , X2 , . . . , Xn }.

See Fig. 1 for an illustration. (More precisely, Tn satisfies L(Tn ) =
sup{L(θ) | θ > 0}. Hence it is an MLE.)

(b) Note that Tn is a random variable, since it is a function of the random variables
Xi . To obtain the moments of Tn , we derive the cdf and pdf of Tn .
Let any real number x be given. Then the cdf of Tn is

P {Tn ≤ x} = P {X1 ≤ x and X2 ≤ x and · · · and Xn ≤ x}
           = P {X1 ≤ x} P {X2 ≤ x} · · · P {Xn ≤ x}
           = 0  if x ≤ 0,   (x/θ)^n  if 0 < x < θ,   and 1 otherwise.        (1)

Figure 1: The likelihood function for Problem 1(a) is non-zero when θ >
max{X1 , X2 , . . . , Xn }. (The figure plots L(θ) versus θ.)

Differentiating both sides of the equations (for each case of the range for
x) yields the pdf

fTn (x) = n x^(n−1) / θ^n   for 0 < x < θ,   and 0 otherwise.

Therefore, the mean of Tn is

E {Tn } = ∫_0^θ x · (n x^(n−1) / θ^n ) dx = nθ/(n + 1),

and the bias of Tn is

bn (θ) ≜ E {Tn } − θ = −θ/(n + 1).
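The short Monte Carlo sketch below (Python/NumPy; not part of the original handout) checks the cdf in equation (1) empirically. The values of θ, n, and the x grid are arbitrary choices for illustration.

```python
import numpy as np

# Monte Carlo check of the derived cdf P{Tn <= x} = (x/theta)^n, 0 < x < theta.
# theta, n, and the x grid below are arbitrary illustrative choices.
rng = np.random.default_rng(0)
theta, n, trials = 2.0, 5, 200_000

samples = rng.uniform(0.0, theta, size=(trials, n))
Tn = samples.max(axis=1)                      # Tn = max{X1, ..., Xn}

for x in (0.5, 1.0, 1.5, 1.9):
    empirical = np.mean(Tn <= x)              # relative frequency of {Tn <= x}
    analytic = (x / theta) ** n               # cdf from equation (1)
    print(f"x = {x:.1f}:  empirical {empirical:.4f}   analytic {analytic:.4f}")
```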

(c) The expression for E {Tn } in part (b) implies that

Wn ≜ ((n + 1)/n) Tn = ((n + 1)/n) max{X1 , X2 , . . . , Xn }

is an unbiased estimator of θ.
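As a quick numerical sanity check (not part of the original handout), the sketch below estimates E {Tn } and E {Wn } by simulation; θ and n are arbitrary illustrative choices.

```python
import numpy as np

# Monte Carlo check of the bias results: E{Tn} = n*theta/(n+1) while E{Wn} = theta.
# theta and n are arbitrary illustrative choices.
rng = np.random.default_rng(1)
theta, n, trials = 2.0, 5, 200_000

samples = rng.uniform(0.0, theta, size=(trials, n))
Tn = samples.max(axis=1)                      # MLE
Wn = (n + 1) / n * Tn                         # bias-corrected estimator

print("E{Tn} ~", Tn.mean(), "  vs  n*theta/(n+1) =", n * theta / (n + 1))
print("E{Wn} ~", Wn.mean(), "  vs  theta         =", theta)
```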

(d) [Extra Credit] By construction and by a property of the variance, we
derive the relation

Var {Wn } = Var {((n + 1)/n) Tn } = ((n + 1)/n)^2 Var {Tn } > Var {Tn } .

The inequality is strict because Var {Tn } ≠ 0: Tn is a non-constant
random variable, so its variance must be strictly positive.
Note This problem does not require us to compute the variances, Var {Tn }
and Var {Wn }. To derive those variances, we use the pdf of Tn , given in
part (b). Notice that the second moment of Tn is

E {Tn^2} = ∫_0^θ x^2 · (n x^(n−1) / θ^n ) dx = nθ^2/(n + 2).

Then, the variance of Tn is

Var {Tn } ≜ E {Tn^2} − E^2 {Tn } = nθ^2/(n + 2) − (nθ/(n + 1))^2
          = nθ^2 / ((n + 1)^2 (n + 2)).

A relationship between Wn and Tn implies that

Var {Wn } = ((n + 1)/n)^2 Var {Tn } = θ^2 / (n(n + 2)).
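A small simulation (not in the original handout) can be used to check both closed forms; θ and n below are arbitrary illustrative choices.

```python
import numpy as np

# Compare empirical variances of Tn and Wn with the closed forms
#   Var{Tn} = n*theta^2 / ((n+1)^2 * (n+2))   and   Var{Wn} = theta^2 / (n*(n+2)).
# theta and n are arbitrary illustrative choices.
rng = np.random.default_rng(2)
theta, n, trials = 2.0, 5, 500_000

Tn = rng.uniform(0.0, theta, size=(trials, n)).max(axis=1)
Wn = (n + 1) / n * Tn

print("Var{Tn} ~", Tn.var(), "  vs ", n * theta**2 / ((n + 1) ** 2 * (n + 2)))
print("Var{Wn} ~", Wn.var(), "  vs ", theta**2 / (n * (n + 2)))
```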

(e) We need to show that the sequence of estimators, T1 , T2 , T3 , . . . , converges
to θ in probability. That is, for any ε > 0, lim_{n→∞} P { |Tn − θ| > ε } = 0.
Since Xi < θ for any integer i = 1, 2, 3, . . . , the MLE, defined to be
Tn ≜ max{X1 , X2 , . . . , Xn }, satisfies Tn < θ. That is, |Tn − θ| = θ − Tn .

Let any ε > 0 be given. Then,

P { |Tn − θ| > ε } = P { θ − Tn > ε }
                   = P { Tn < θ − ε }
                   = ((θ − ε)/θ)^n   if 0 < ε < θ,   and 0 otherwise.

The last equality follows from the cdf of Tn in equation (1). Taking the
limit on both sides of the equation (for each case of the range of ε), we have

lim_{n→∞} P { |Tn − θ| > ε } = 0,

for any ε > 0. Therefore, the MLE Tn is a consistent estimator.
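The closed form also makes the rate of convergence explicit; the short sketch below (not in the original handout) evaluates it for arbitrary illustrative values of θ and ε.

```python
# Illustrate consistency: P{|Tn - theta| > eps} = ((theta - eps)/theta)^n -> 0.
# theta and eps are arbitrary illustrative choices.
theta, eps = 2.0, 0.1
for n in (1, 10, 50, 100, 500):
    tail = ((theta - eps) / theta) ** n if eps < theta else 0.0
    print(f"n = {n:4d}:  P{{|Tn - theta| > {eps}}} = {tail:.3e}")
```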

Problem 2: Suppose that X1 , X2 , . . . , Xn are independent random variables,
each N (µ, σ^2 ), with both µ and σ^2 unknown:

(µ, σ^2 ) ∈ Θ ≜ {(x, y) | −∞ < x < +∞, y > 0}.

(a) Find a maximum likelihood estimator of (µ, σ 2 ).

(b) Suppose that n = 131 and that X1 , X2 , . . . , X131 are the average annual
temperatures observed at Boston, MA, since 1872, as given in Table 1.
Use MATLAB to find the numerical values of the estimator of
(µ, σ^2 ) for this data set.

Solution

(a) The likelihood function is

L(µ, σ) = (1/(σ √(2π))^n ) exp( −Σ_{i=1}^n (Xi − µ)^2 / (2σ^2 ) ),

for a real number −∞ < µ < ∞ and a positive number σ > 0. We want to
find µ and σ that maximize the likelihood function, or equivalently, that
maximize the log-likelihood function,

ln L(µ, σ) = −(n/2) ln(2π) − n ln σ − Σ_{i=1}^n (Xi − µ)^2 / (2σ^2 ).

Taking the partial derivatives of the log-likelihood function with respect to
µ and to σ, and setting the partial derivatives to zero, yield two equations
with two unknowns, µ and σ:

0 = ∂/∂µ ln L(µ, σ) = Σ_{i=1}^n (Xi − µ) / σ^2 ,                      (2)
0 = ∂/∂σ ln L(µ, σ) = −n/σ + Σ_{i=1}^n (Xi − µ)^2 / σ^3 .             (3)

Solving for the µ∗ and σ∗ that satisfy those two equations yields the solution

µ∗ = (1/n) Σ_{i=1}^n Xi ≜ X̄n ,
σ∗ = √( Σ_{i=1}^n (Xi − X̄n )^2 / n ).

(Equation (2) gives the expression µ∗ = X̄n . Then, substitute µ in (3)
with X̄n , and solve for σ.) Therefore, the MLEs for the unknown parameters µ
and σ^2 are

µ̂ = X̄n ,
σ̂^2 = Σ_{i=1}^n (Xi − X̄n )^2 / n ,

where the second equality follows from the invariance property of the MLE.

Note It is not hard to verify that µ∗ and σ∗ maximize the likelihood
function. For a two-variable function, say f (x, y), we need to verify that

∂^2 f/∂x^2 < 0,   ∂^2 f/∂y^2 < 0,   and
(∂^2 f/∂x^2)(∂^2 f/∂y^2) − (∂^2 f/∂x∂y)^2 > 0.                        (4)

Note that

∂^2/∂µ^2 ln L(µ, σ) = −n/σ^2 ,
∂^2/∂σ^2 ln L(µ, σ) = n/σ^2 − 3 Σ_{i=1}^n (Xi − µ)^2 / σ^4 ,
∂^2/∂µ∂σ ln L(µ, σ) = −2 Σ_{i=1}^n (Xi − µ) / σ^3 .

Substituting µ∗ and σ∗ into the above partial derivatives yields the relations

∂^2/∂µ^2 ln L(µ, σ) |_{µ=µ∗, σ=σ∗} = −n/(σ∗)^2 < 0,
∂^2/∂σ^2 ln L(µ, σ) |_{µ=µ∗, σ=σ∗} = −2n/(σ∗)^2 < 0,
∂^2/∂µ∂σ ln L(µ, σ) |_{µ=µ∗, σ=σ∗} = 0   (since Σ_{i=1}^n (Xi − µ∗) = 0),

so that the product condition in (4) becomes 2n^2/(σ∗)^4 > 0.
Therefore, µ∗ and σ∗ maximize the likelihood function.
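For a concrete check of these second-order conditions, one can approximate the Hessian of the log-likelihood numerically at (µ∗, σ∗). The sketch below (not part of the original handout) does this with central finite differences on a synthetic sample; the sample parameters and the step size h are arbitrary choices.

```python
import numpy as np

# Finite-difference check of the second-order conditions at (mu*, sigma*):
# the diagonal entries of the Hessian of ln L should be negative and its
# determinant positive. The synthetic sample and the step h are arbitrary.
rng = np.random.default_rng(3)
X = rng.normal(5.0, 2.0, size=500)
n = len(X)

def loglik(mu, sigma):
    return (-n / 2 * np.log(2 * np.pi) - n * np.log(sigma)
            - np.sum((X - mu) ** 2) / (2 * sigma**2))

mu_s = X.mean()
sig_s = np.sqrt(np.mean((X - mu_s) ** 2))
h = 1e-3

d2_mu = (loglik(mu_s + h, sig_s) - 2 * loglik(mu_s, sig_s) + loglik(mu_s - h, sig_s)) / h**2
d2_sig = (loglik(mu_s, sig_s + h) - 2 * loglik(mu_s, sig_s) + loglik(mu_s, sig_s - h)) / h**2
d2_cross = (loglik(mu_s + h, sig_s + h) - loglik(mu_s + h, sig_s - h)
            - loglik(mu_s - h, sig_s + h) + loglik(mu_s - h, sig_s - h)) / (4 * h**2)

print("d2/dmu2     :", d2_mu, "   (expected -n/(sigma*)^2  =", -n / sig_s**2, ")")
print("d2/dsigma2  :", d2_sig, "   (expected -2n/(sigma*)^2 =", -2 * n / sig_s**2, ")")
print("cross term  :", d2_cross, "   (expected approximately 0)")
print("determinant :", d2_mu * d2_sig - d2_cross**2, "  (expected > 0)")
```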
(b) The MATLAB code to compute the estimators µ̂ and σ̂^2 for this data set
yields

µ̂ = 50.64   and   σ̂^2 = 2.34.
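The handout refers to MATLAB; an equivalent sketch in Python/NumPy is shown below. It assumes the 131 values from Table 1 have been saved, one per line, in a file named boston_temps.txt (a hypothetical filename).

```python
import numpy as np

# Compute the MLEs of mu and sigma^2 for the Boston temperature data.
# Assumes the 131 values from Table 1 are stored one per line in
# "boston_temps.txt" (hypothetical filename, not part of the handout).
temps = np.loadtxt("boston_temps.txt")

mu_hat = temps.mean()                          # MLE of mu: sample mean
sigma2_hat = np.mean((temps - mu_hat) ** 2)    # MLE of sigma^2: divide by n, not n - 1

print(f"mu_hat     = {mu_hat:.2f}")            # handout reports 50.64
print(f"sigma2_hat = {sigma2_hat:.2f}")        # handout reports 2.34
```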

[Figure 2: histogram (normalized frequency) of the average temperatures (F) in
Boston, MA, from 1872−2002, overlaid with the pdf of N(50.64, 2.34);
horizontal axis: Temperature (F).]

Figure 2: In problem 2(b), average temperatures at Boston are modelled to be
normal random variables.

Problem 3: Let X1 , X2 , . . . , Xn be independent random variables, each
N (µ, 1). Find an unbiased estimator of µ^2 that is a function of X̄n .
[Hint: Consider the bias of (X̄n )^2 .]
Solution In class, we showed that X̄n ∼ N (µ, 1/n). Hence, the second moment
of this normal random variable is given by

E {(X̄n )^2} = Var {X̄n } + E^2 {X̄n } = 1/n + µ^2 .

By inspection, (X̄n )^2 − 1/n is an unbiased estimator of µ^2 .
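A brief simulation (not in the original handout) confirms the unbiasedness; µ and n below are arbitrary illustrative choices.

```python
import numpy as np

# Monte Carlo check that (Xbar_n)^2 - 1/n is unbiased for mu^2 when Xi ~ N(mu, 1).
# mu and n are arbitrary illustrative choices.
rng = np.random.default_rng(4)
mu, n, trials = 1.7, 10, 500_000

X = rng.normal(mu, 1.0, size=(trials, n))
Xbar = X.mean(axis=1)
estimates = Xbar**2 - 1.0 / n

print("mean of estimator ~", estimates.mean(), "  vs  mu^2 =", mu**2)
```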

Problem 4: A one-bit analog-to-digital (A/D) converter is defined by a
single threshold α and has two outputs, 0 and 1. Suppose a random variable X
with probability density function fX (.) is input to this A/D and the output is
defined as Y .

(a) Find the MMSE estimator of x based on the observation y. (The question
does not require the estimator to be linear.)

(b) The output of the A/D is input to a binary symmetric channel character-
ized by a single parameter 0 ≤ p ≤ 1. Let Z be the output of the channel,
with the following conditional pmf:

pZ|Y (z|y) = p  if z ≠ y,   and 1 − p  if z = y.

Find the optimum (MMSE) estimator of x based on the observation z.

(c) For the special cases of p = 0, p = 1 and p = 0.5, interpret the results of
part (b).

Solution

(a) The MMSE estimator x̂(y) directly follows from the following:

x̂(y) = ∫_{−∞}^{∞} x fX|Y (x|y) dx
     = ∫_{−∞}^{∞} x ( pY|X (y|x) fX (x) / pY (y) ) dx
     = (1/pY (y)) ∫_{−∞}^{∞} x pY|X (y|x) fX (x) dx .                   (5)

From the description of A/D, we can write the conditional pmf pY |X as


well as pmf of Y , pY (y). Although X may be continuous or discrete, Y is
discrete, with 0 and 1 as its possible values.

pY (y = 0) = P {X ≤ α} = ∫_{−∞}^{α} fX (x) dx ,
pY (y = 1) = P {X > α} = ∫_{α}^{∞} fX (x) dx ,

and pY|X is simply:

pY|X (y = 0|x) = 1  for x ≤ α,   and 0  for x > α;
pY|X (y = 1|x) = 0  for x ≤ α,   and 1  for x > α.

Hence we get the MMSE estimate x̂(y) by substituting the corresponding
values for the cases y = 0 and y = 1 into (5):

x̂(y = 0) = (1/pY (0)) ∫_{−∞}^{∞} x pY|X (y = 0|x) fX (x) dx
          = ∫_{−∞}^{α} x fX (x) dx  /  ∫_{−∞}^{α} fX (x) dx ,

x̂(y = 1) = (1/pY (1)) ∫_{−∞}^{∞} x pY|X (y = 1|x) fX (x) dx
          = ∫_{α}^{∞} x fX (x) dx  /  ∫_{α}^{∞} fX (x) dx .
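These are just the conditional means of X on either side of the threshold. The sketch below (not part of the original handout) estimates them by simulation for an assumed standard normal X and an arbitrary threshold α.

```python
import numpy as np

# Monte Carlo illustration of part (a): x_hat(y) is the conditional mean of X on
# the corresponding side of the threshold. X ~ N(0, 1) and alpha are assumptions.
rng = np.random.default_rng(5)
alpha, trials = 0.3, 1_000_000

x = rng.normal(0.0, 1.0, size=trials)
y = (x > alpha).astype(int)               # one-bit A/D output

print("x_hat(y=0) ~", x[y == 0].mean())   # E{X | X <= alpha}
print("x_hat(y=1) ~", x[y == 1].mean())   # E{X | X >  alpha}
```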

(b) We are given pZ|Y , which can be stated as:

pZ|Y (z|y = 0) = p  for z = 1,   and 1 − p  for z = 0;
pZ|Y (z|y = 1) = p  for z = 0,   and 1 − p  for z = 1.

The pmf for Z is:

pZ (z = 0) = (1 − p) P {Y = 0} + p P {Y = 1}
           = (1 − p) ∫_{−∞}^{α} fX (x) dx + p ∫_{α}^{∞} fX (x) dx ,
pZ (z = 1) = p P {Y = 0} + (1 − p) P {Y = 1}
           = p ∫_{−∞}^{α} fX (x) dx + (1 − p) ∫_{α}^{∞} fX (x) dx .

Also we see that y = 0 =⇒ x ≤ α and y = 1 =⇒ x > α. Therefore,

pZ|X (z = 0|x) = 1 − p  for x ≤ α,   and p  for x > α,

and similarly,

pZ|X (z = 1|x) = p  for x ≤ α,   and 1 − p  for x > α.

As in the previous case, the MMSE estimate of x given z is:

x̂(z) = ∫_{−∞}^{∞} x fX|Z (x|z) dx
     = ∫_{−∞}^{∞} x ( pZ|X (z|x) fX (x) / pZ (z) ) dx
     = (1/pZ (z)) ∫_{−∞}^{∞} x pZ|X (z|x) fX (x) dx .

Using pZ|X in this equation, we get the result. For z = 0,

x̂(z = 0) = (1/pZ (0)) ∫_{−∞}^{∞} x pZ|X (z = 0|x) fX (x) dx
          = (1/pZ (0)) [ (1 − p) ∫_{−∞}^{α} x fX (x) dx + p ∫_{α}^{∞} x fX (x) dx ]
          = [ (1 − p) ∫_{−∞}^{α} x fX (x) dx + p ∫_{α}^{∞} x fX (x) dx ]
            /  [ (1 − p) ∫_{−∞}^{α} fX (x) dx + p ∫_{α}^{∞} fX (x) dx ] .

Similarly, for z = 1,

x̂(z = 1) = (1/pZ (1)) ∫_{−∞}^{∞} x pZ|X (z = 1|x) fX (x) dx
          = [ p ∫_{−∞}^{α} x fX (x) dx + (1 − p) ∫_{α}^{∞} x fX (x) dx ]
            /  [ p ∫_{−∞}^{α} fX (x) dx + (1 − p) ∫_{α}^{∞} fX (x) dx ] .
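The same Monte Carlo approach extends to part (b): flipping Y with probability p and averaging X over each value of Z reproduces these ratios. X ~ N(0, 1), α, and p below are arbitrary illustrative assumptions.

```python
import numpy as np

# Monte Carlo illustration of part (b): estimate X from the BSC output Z by the
# conditional mean. X ~ N(0, 1), alpha, and p are illustrative assumptions.
rng = np.random.default_rng(6)
alpha, p, trials = 0.3, 0.2, 1_000_000

x = rng.normal(0.0, 1.0, size=trials)
y = (x > alpha).astype(int)               # one-bit A/D output
flip = rng.random(trials) < p             # crossover events of the BSC
z = np.where(flip, 1 - y, y)              # channel output

print("x_hat(z=0) ~", x[z == 0].mean())   # empirical E{X | Z = 0}
print("x_hat(z=1) ~", x[z == 1].mean())   # empirical E{X | Z = 1}
```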

(c) p = 0 corresponds to x̂(z) = x̂(y), as expected, since in this case Z = Y
with probability 1. For p = 1, the binary value of Y is inverted with
probability 1. Therefore, x̂(z) and x̂(y) are the same but at the opposite
values of the observations: x̂(z = 0) = x̂(y = 1) and vice versa.
For p = 0.5, the observation z carries no information about y, and the
estimate reduces to the prior mean for either value of z:
x̂(z = 0) = x̂(z = 1) = E {X}, which can also be written as the weighted
average P {Y = 0} x̂(y = 0) + P {Y = 1} x̂(y = 1).

Problem 5: Suppose X and Y are jointly Gaussian, and Z = F X + g, where
F and g are known. Show that the MMSE estimator of z given y is
ẑ = F x̂ + g, where x̂ is the MMSE estimator of x given y. Find an expression
for the mean-square estimation error of z given y.
Solution X and Y are jointly Gaussian. From a result proved in class (lectures
6 and 7), the MMSE estimator of x given y is

x̂MMSE (y) = µX + ΣXY Σ_Y^{−1} (y − µY ),

where ΣXY ≜ E {(X − µX )(Y − µY )^T} and ΣY is the covariance of Y .

Z = F X + g is also Gaussian (because F and g are known constants which only
scale and shift the distribution of X). Hence,

ẑMMSE (y) = µZ + ΣZY Σ_Y^{−1} (y − µY ).

(We write them here as x̂ and ẑ for brevity.) Let's find µZ :

µZ = E {Z} = E {F X + g} = F E {X} + g = F µX + g.

The cross-covariance is:

ΣZY = E {(Z − µZ )(Y − µY )^T}
    = E {(F X + g − F µX − g)(Y − µY )^T}
    = F E {(X − µX )(Y − µY )^T}
    = F ΣXY .                                                           (6)

Substituting µZ and ΣZY into ẑMMSE (y), we get ẑ = F x̂ + g.

The variance of the estimation error is available from the derivation of the MMSE
estimator. Recall that the distribution of X|Y in the case of jointly Gaussian X and
Y is normal; its mean is the MMSE estimate x̂(y) and its variance is the variance of
the estimation error, i.e., Var {x − x̂}. Therefore, for ẑ, the variance of the
estimation error is ΣZ − ΣZY Σ_Y^{−1} Σ_ZY^T , where

ΣZ = F ΣX F^T .                                                         (7)

Therefore, using (6) and (7), the variance of the estimation error of ẑ is

F ΣX F^T − F ΣXY Σ_Y^{−1} Σ_XY^T F^T = F (ΣX − ΣXY Σ_Y^{−1} Σ_XY^T) F^T .

Since the mean of the estimation error is zero (E {Z − ẑ} = µZ − µZ = 0), the mean-
square estimation error is the same as the variance of the estimation error.
Note that this is exactly Σ_{z−ẑ} = F Σ_{x−x̂} F^T .
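A small simulation (not part of the original handout) can verify both claims in the scalar case; the means, covariances, F, and g below are arbitrary illustrative choices.

```python
import numpy as np

# Sanity check of z_hat = F*x_hat + g and of the error variance
#   Sigma_Z - Sigma_ZY * Sigma_Y^{-1} * Sigma_ZY^T = F * Sigma_{x - x_hat} * F^T.
# X and Y are scalars here, so all matrices reduce to numbers; the parameters
# below are arbitrary illustrative choices.
rng = np.random.default_rng(7)
trials = 500_000

mu_x, mu_y = 1.0, -0.5
sig_x, sig_xy, sig_y = 2.0, 0.8, 1.0      # Var(X), Cov(X, Y), Var(Y)
F, g = 3.0, 2.0

xy = rng.multivariate_normal([mu_x, mu_y],
                             [[sig_x, sig_xy], [sig_xy, sig_y]], size=trials)
x, y = xy[:, 0], xy[:, 1]
z = F * x + g

x_hat = mu_x + sig_xy / sig_y * (y - mu_y)     # MMSE estimate of X from Y
z_hat = F * x_hat + g                          # claimed MMSE estimate of Z

print("empirical Var(Z - Z_hat)             ~", np.var(z - z_hat))
print("F^2 * (Sigma_X - Sigma_XY^2/Sigma_Y) =", F**2 * (sig_x - sig_xy**2 / sig_y))
```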

Table 1: Average temperatures (in F ) at Boston, MA. (source: National
Weather Service Eastern Region)

Year Average Year Average Year Average Year Average


1872 53.0 1899 50.1 1926 49.0 1953 53.6
1873 48.2 1900 50.9 1927 51.8 1954 51.4
1874 48.6 1901 49.0 1928 51.3 1955 51.4
1875 46.5 1902 49.7 1929 51.4 1956 50.6
1876 47.9 1903 49.5 1930 52.3 1957 52.5
1877 50.1 1904 47.1 1931 53.0 1958 50.0
1878 50.2 1905 49.1 1932 52.4 1959 51.8
1879 48.4 1906 50.0 1933 50.9 1960 51.4
1880 50.2 1907 48.7 1934 48.8 1961 51.0
1881 49.6 1908 51.1 1935 48.9 1962 49.8
1882 48.9 1909 50.5 1936 49.8 1963 51.0
1883 47.9 1910 50.8 1937 51.2 1964 50.1
1884 49.0 1911 50.9 1938 51.1 1965 49.6
1885 47.6 1912 50.6 1939 49.8 1966 51.4
1886 48.3 1913 52.3 1940 48.5 1967 49.5
1887 48.5 1914 49.7 1941 51.2 1968 50.4
1888 47.3 1915 51.2 1942 50.7 1969 51.2
1889 50.6 1916 49.7 1943 49.9 1970 50.9
1890 49.1 1917 47.9 1944 50.9 1971 51.2
1891 50.4 1918 49.8 1945 51.0 1972 50.4
1892 49.4 1919 51.1 1946 51.7 1973 53.0
1893 47.9 1920 50.0 1947 51.2 1974 50.9
1894 50.3 1921 52.4 1948 50.7 1975 52.8
1895 49.8 1922 51.3 1949 53.6 1976 52.2
1896 49.2 1923 50.3 1950 51.2 1977 52.5
1897 49.9 1924 50.4 1951 52.2 1978 50.3
1898 50.8 1925 51.7 1952 52.5 1979 52.1

Table 1: (Continued)

Year Average Year Average Year Average Year Average


1980 50.6 1986 50.8 1992 50.2 1998 53.0
1981 51.5 1987 50.4 1993 51.6 1999 52.7
1982 51.0 1988 51.3 1994 52.2 2000 50.6
1983 53.2 1989 50.3 1995 51.4 2001 52.5
1984 51.6 1990 53.2 1996 50.9 2002 56.2
1985 51.0 1991 53.4 1997 50.9
