Notes Estimation Theory
Estimation theory
• Parametric estimation;
• Bayesian estimation.
- $F_y^\theta(y)$, $f_y^\theta(y)$: the cumulative distribution function and the probability density function, respectively, of the observation vector y, which depend on the unknown vector θ.
An estimator is a function
$$T : Y \to \Theta.$$
The value θ̂ = T (y), returned by the estimator when applied to the observa-
tion y of y, is called estimate of θ.
Unbiasedness
A first desirable property is that the expected value of the estimate θ̂ = T (y)
be equal to the actual value of the parameter θ.
In the above definition we used the notation Eθ [·], which stresses the
dependency on θ of the expected value of T (y), due to the fact that the pdf
of y is parameterized by θ itself.
The unbiasedness condition (1.1) guarantees that the estimator T(·) does not introduce systematic errors, i.e., errors that are not averaged out even when considering an infinite number of observations of y. In other words, T(·) neither overestimates nor underestimates θ, on average (see Fig. 1.1).
[Figure 1.1: unbiased vs. biased estimator.]
one has
$$E[\bar{y}] = E\left[\frac{1}{n}\sum_{i=1}^n y_i\right] = \frac{1}{n}\sum_{i=1}^n E[y_i] = \frac{1}{n}\sum_{i=1}^n m = m.$$
However,
$$\begin{aligned}
E\left[\left(n(y_i - m) - \sum_{j=1}^n (y_j - m)\right)^2\right]
&= n^2 E\left[(y_i - m)^2\right]
- 2n\,E\left[(y_i - m)\sum_{j=1}^n (y_j - m)\right]
+ E\left[\left(\sum_{j=1}^n (y_j - m)\right)^2\right] \\
&= n^2\sigma^2 - 2n\sigma^2 + n\sigma^2 \\
&= n(n-1)\sigma^2
\end{aligned}$$
because, by the independence assumption, $E\left[(y_i - m)(y_j - m)\right] = 0$ for $i \neq j$. Therefore,
$$E\left[\hat\sigma_y^2\right] = \frac{1}{n}\sum_{i=1}^n \frac{1}{n^2}\, n(n-1)\sigma^2 = \frac{n-1}{n}\,\sigma^2 \neq \sigma^2.$$
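A quick Monte Carlo check of this bias (a sketch only: NumPy is assumed available, and the values of n, m and σ² are invented for the experiment):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, sigma2 = 10, 2.0, 4.0                  # invented example values
trials = 200_000

y = rng.normal(m, np.sqrt(sigma2), size=(trials, n))
sigma2_hat = y.var(axis=1)                   # (1/n) sum_i (y_i - ybar)^2, the biased estimator

print("empirical mean of sigma2_hat:", sigma2_hat.mean())
print("predicted (n-1)/n * sigma2  :", (n - 1) / n * sigma2)
```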
Consistency
[Figure: estimator distributions for n = 20, 50, 100, 500.]
If
$$\lim_{n\to\infty} E\left[(\hat\theta_n - \theta)^2\right] = 0,$$
The result in Example 1.4 is a special case of the following more general
celebrated result.
where $m_{T(y)} = E[T(y)]$. The above expression shows that the MSE of a biased estimator is the sum of the variance of the estimator and of the square of the deterministic quantity $m_{T(y)} - \theta$, which is called bias error. As we will see, the trade-off between the variance of the estimator and the bias error is a fundamental limitation in many practical estimation problems.
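The decomposition MSE = variance + squared bias can be checked numerically. The sketch below (again with invented values and NumPy assumed) does so for the biased sample variance estimator discussed above.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, sigma2, trials = 10, 2.0, 4.0, 200_000   # invented values

y = rng.normal(m, np.sqrt(sigma2), size=(trials, n))
est = y.var(axis=1)                             # biased estimator of sigma2

mse = np.mean((est - sigma2) ** 2)              # Monte Carlo MSE
var = est.var()                                 # variance of the estimator
bias2 = (est.mean() - sigma2) ** 2              # squared bias

print(mse, var + bias2)                         # identical up to floating-point error
```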
The MSE can be used to decide which estimator is better within a family
of estimators.
Definition 1.5. Let T1 (·) and T2 (·) be two estimators of the parameter θ.
Then, T1 (·) is uniformly preferable to T2 (·) if
• be unbiased;
• the previous condition must hold for every admissible value of the pa-
rameter θ.
Unfortunately, there are many problems for which there does not exist any
UMV UE estimator. For this reason, we often restrict the class of estimators,
in order to find the best one within the considered class. A popular choice
is that of linear estimators, i.e., taking the form
$$T(y) = \sum_{i=1}^n a_i\, y_i, \qquad (1.4)$$
with $a_i \in \mathbb{R}$.
Now, among all the estimators of the form (1.4), with the coefficients $a_i$ satisfying (1.5), we need to find the minimum variance one. Since the observations $y_i$ are independent, the variance of T(y) is given by
$$E^\theta\left[(T(y) - m)^2\right] = E^\theta\left[\left(\sum_{i=1}^n a_i y_i - m\right)^2\right] = \sum_{i=1}^n a_i^2\,\sigma_i^2.$$
The coefficients are therefore obtained by solving the constrained problem
$$\min_{a_1,\dots,a_n}\ \sum_{i=1}^n a_i^2\,\sigma_i^2 \qquad \text{s.t.} \quad \sum_{i=1}^n a_i = 1.$$
By introducing the Lagrangian
$$L(a_1,\dots,a_n,\lambda) = \sum_{i=1}^n a_i^2\,\sigma_i^2 + \lambda\left(\sum_{i=1}^n a_i - 1\right),$$
the solution must satisfy
$$\frac{\partial L(a_1,\dots,a_n,\lambda)}{\partial a_i} = 0, \qquad i = 1,\dots,n, \qquad (1.6)$$
$$\frac{\partial L(a_1,\dots,a_n,\lambda)}{\partial \lambda} = 0. \qquad (1.7)$$
From (1.7) we obtain the constraint (1.5), while (1.6) implies that
$$2\,a_i\,\sigma_i^2 + \lambda = 0, \qquad i = 1,\dots,n,$$
from which
$$\lambda = -\,\frac{1}{\displaystyle\sum_{i=1}^n \frac{1}{2\sigma_i^2}} \qquad (1.8)$$
and
$$a_i = \frac{\dfrac{1}{\sigma_i^2}}{\displaystyle\sum_{j=1}^n \dfrac{1}{\sigma_j^2}}, \qquad i = 1,\dots,n. \qquad (1.9)$$
Notice that if all the measurements have the same variance $\sigma_i^2 = \sigma^2$, the estimator $\hat{m}_{BLUE}$ boils down to the sample mean $\bar{y}$. This means that the BLUE estimator can be seen as a generalization of the sample mean to the case when the measurements $y_i$ have different accuracy (i.e., different variance $\sigma_i^2$). In fact, the BLUE estimator is a weighted average of the observations, in which the weights are inversely proportional to the variance of the measurements or, seen another way, directly proportional to the precision of each observation. Let us assume that for a certain i, $\sigma_i^2 \to \infty$. This means that the measurement $y_i$ is completely unreliable. Then, the weight $1/\sigma_i^2$ of $y_i$ within $\hat{m}_{BLUE}$ will tend to zero. On the other hand, for an infinitely precise measurement $y_j$ ($\sigma_j^2 \to 0$), the corresponding weight $1/\sigma_j^2$ will be predominant over all the other weights and the BLUE estimate will approach that measurement, i.e., $\hat{m}_{BLUE} \simeq y_j$. △
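As a quick illustration of this weighted-average interpretation, the following sketch (NumPy assumed; measurement values and variances invented) implements the combination with the weights (1.9) and checks the two limit cases discussed above.

```python
import numpy as np

def blue_mean(y, var):
    """Weighted mean with weights proportional to 1/sigma_i^2, as in (1.9)."""
    w = 1.0 / np.asarray(var, dtype=float)
    return np.sum(w * np.asarray(y)) / np.sum(w)

y = np.array([1.2, 0.8, 1.1])                    # hypothetical measurements of the same quantity

print(blue_mean(y, [1.0, 1.0, 1.0]), y.mean())   # equal variances: BLUE = sample mean
print(blue_mean(y, [1.0, 1e6, 0.01]))            # y_2 nearly useless, y_3 very precise:
                                                 # the estimate approaches y_3 = 1.1
```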
where
$$I_n(\theta) = E^\theta\left[\left(\frac{\partial \ln f_y^\theta(y)}{\partial\theta}\right)^2\right] \qquad (1.12)$$
is the Fisher information of the n observations; in the case of i.i.d. observations,
$$I_n(\theta) = n\, I_1(\theta).$$
where the inequality must be understood in the matrix sense and the matrix $I_n(\theta) \in \mathbb{R}^{p\times p}$ is the so-called Fisher information matrix
$$I_n(\theta) = E^\theta\left[\left(\frac{\partial \ln f_y^\theta(y)}{\partial\theta}\right)\left(\frac{\partial \ln f_y^\theta(y)}{\partial\theta}\right)^T\right].$$
Notice that the matrix $E^\theta\left[(T(y) - \theta)(T(y) - \theta)^T\right]$ is the covariance matrix of the unbiased estimator $T(\cdot)$.
Theorem 1.3 states that there does not exist any unbiased estimator with covariance smaller than $[I_n(\theta)]^{-1}$. Notice that $I_n(\theta)$ depends, in general, on the actual value of the parameter θ (because the partial derivatives must be evaluated at θ), which is unknown. For this reason, an approximation of the lower bound is usually computed in practice, by replacing θ with an estimate θ̂. Nevertheless, the Cramér-Rao bound is also important because it allows one to define the key concept of efficiency of an estimator.
An efficient estimator has the least possible variance among all unbiased
estimators (therefore, it is also a UMVUE).
In the special case of i.i.d. observations $y_i$, Theorem 1.3 states that $I_n(\theta) = n\,I_1(\theta)$, where $I_1(\theta)$ is the Fisher information of a single observation. Therefore, for a fixed θ, the Cramér-Rao bound decreases as $1/n$ as the number of observations n grows.
$$E^\theta\left[(\bar{y} - m_y)^2\right] = \frac{\sigma_y^2}{n} \;\ge\; [I_n(\theta)]^{-1} = \frac{[I_1(\theta)]^{-1}}{n}.$$
Let us now assume that the $y_i$ are distributed according to the Gaussian pdf
$$f_{y_i}(y_i) = \frac{1}{\sqrt{2\pi}\,\sigma_y}\, e^{-\frac{(y_i - m_y)^2}{2\sigma_y^2}}.$$
Let us compute the Fisher information of a single measurement,
$$I_1(\theta) = E^\theta\left[\left(\frac{\partial \ln f_{y_1}^\theta(y_1)}{\partial\theta}\right)^2\right].$$
and hence,
$$I_1(\theta) = E^\theta\left[\frac{(y_1 - m_y)^2}{\sigma_y^4}\right] = \frac{1}{\sigma_y^2}.$$
The Cramér-Rao bound takes on the value $[I_n(\theta)]^{-1} = \frac{1}{n\,I_1(\theta)} = \frac{\sigma_y^2}{n}$, which coincides with the variance of the sample mean: in this Gaussian setting, the sample mean is therefore an efficient estimator of $m_y$.
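The sketch below compares, via Monte Carlo simulation, the variance of the sample mean with the bound σ_y²/n (NumPy assumed; the values of n, m_y and σ_y² are invented).

```python
import numpy as np

rng = np.random.default_rng(2)
n, m_y, sigma2_y, trials = 25, 1.5, 2.0, 100_000   # invented values

y = rng.normal(m_y, np.sqrt(sigma2_y), size=(trials, n))
ybar = y.mean(axis=1)                               # sample mean of each simulated data set

print("variance of the sample mean:", ybar.var())
print("Cramer-Rao bound sigma2_y/n:", sigma2_y / n)   # the two values should be close
```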
Definition 1.10. Let y be a vector of observations with pdf $f_y^\theta(y)$, depending on the unknown parameter θ ∈ Θ. The likelihood function is defined as
$$L(\theta\,|\,y) = f_y^\theta(y).$$
In practice, it is often more convenient to work with the log-likelihood $\ln L(\theta\,|\,y)$.
Remark 1.1. Assuming that the pdf $f_y^\theta(y)$ is a differentiable function of $\theta = (\theta_1, \dots, \theta_p) \in \Theta \subseteq \mathbb{R}^p$, with Θ an open set, if θ̂ is a maximum of L(θ|y), it has to be a solution of the equations
$$\left.\frac{\partial L(\theta\,|\,y)}{\partial \theta_i}\right|_{\theta=\hat\theta} = 0, \qquad i = 1,\dots,p, \qquad (1.13)$$
or equivalently of
$$\left.\frac{\partial \ln L(\theta\,|\,y)}{\partial \theta_i}\right|_{\theta=\hat\theta} = 0, \qquad i = 1,\dots,p. \qquad (1.14)$$
from which
$$\sum_{i=1}^n \frac{y_i - \hat{m}_{ML}}{\sigma_y^2} = 0,$$
and hence
$$\hat{m}_{ML} = \frac{1}{n}\sum_{i=1}^n y_i.$$
Therefore, in this case the ML estimator coincides with the sample mean. Since the observations are i.i.d. Gaussian variables, this estimator is also efficient (see Example 1.6). △
The result in Example 1.7 is not restricted to the specific setting or pdf
considered. The following general theorem illustrates the importance of max-
imum likelihood estimators, in the context of parametric estimation.
Theorem 1.4. Under the same assumptions for which the Cramér-Rao bound
holds, if there exists an efficient estimator T ∗ (·), then T ∗ (·) is a maximum
likelihood estimator.
from which
$$\hat{m}_{ML} = \frac{1}{n}\sum_{i=1}^n y_i, \qquad \sigma^2_{ML} = \frac{1}{n}\sum_{i=1}^n (y_i - \hat{m}_{ML})^2.$$
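A minimal sketch of these closed-form ML estimates on synthetic Gaussian data (all numerical values invented, NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(3)
y = rng.normal(loc=2.0, scale=1.5, size=1000)    # synthetic i.i.d. Gaussian observations

m_ml = y.mean()                                  # (1/n) sum_i y_i
sigma2_ml = np.mean((y - m_ml) ** 2)             # (1/n) sum_i (y_i - m_ml)^2  (not 1/(n-1))

print(m_ml, sigma2_ml)
```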
Under mild regularity assumptions, the maximum likelihood estimator is:
• asymptotically unbiased;
• consistent;
• asymptotically efficient;
• asymptotically normal.
h : Θ ⊆ Rp → Rn
y = Uθ + ε. (1.16)
In the following, we will assume that rank(U) = p, which means that the
number of linearly independent measurements is not smaller than the number
of parameters to be estimated (otherwise, the problem is ill posed).
We now introduce two popular estimators that can be used to estimate
θ in the setting (1.16). We will discuss their properties, depending on the
assumptions we make on the measurement noise ε. Let us start with the
Least Squares estimator.
The name of this estimator comes from the fact that it minimizes the
sum of the squared differences between the data realization y and the model
Uθ, i.e.
$$\hat\theta_{LS} = \arg\min_{\theta}\, \|y - U\theta\|^2.$$
Indeed,
$$\left.\frac{\partial\, \|y - U\theta\|^2}{\partial \theta}\right|_{\theta=\hat\theta_{LS}} = 2\,\hat\theta_{LS}^T U^T U - 2\,y^T U = 0,$$
where the properties $\frac{\partial\, x^T A x}{\partial x} = 2 x^T A$ (for a symmetric matrix A) and $\frac{\partial\, A x}{\partial x} = A$ have been exploited. By solving with respect to $\hat\theta_{LS}^T$, one gets
$$\hat\theta_{LS}^T = y^T U (U^T U)^{-1}.$$
Finally, by transposing the above expression and taking into account that
the matrix (U T U) is symmetric, one obtains the equation (1.17).
It is worth stressing that computing the LS estimator does not require any a priori information about the noise ε. As we will see in the sequel, however, the properties of ε will influence those of the LS estimator.
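A minimal numerical sketch of the LS estimate (1.17) is given below; the matrix U, the true parameter vector and the noise level are invented for illustration, and NumPy is assumed available.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 50, 3
U = rng.normal(size=(n, p))                      # full column rank with probability one
theta_true = np.array([1.0, -2.0, 0.5])
y = U @ theta_true + 0.1 * rng.normal(size=n)    # linear observations with additive noise

theta_ls = np.linalg.solve(U.T @ U, U.T @ y)     # (U^T U)^{-1} U^T y, as in (1.17)
print(theta_ls)
```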
Similarly to what has been shown for the LS estimator, it is easy to verify
that the GM estimator minimizes the weighted sum of squared errors between
y and Uθ, i.e.
$$\hat\theta_{GM} = \arg\min_{\theta}\, (y - U\theta)^T \Sigma_\varepsilon^{-1} (y - U\theta).$$
Notice that the Gauss-Markov estimator requires knowledge of the covariance matrix $\Sigma_\varepsilon$ of the measurement noise. By using this information, the measurements are weighted with a matrix weight that is inversely proportional to their uncertainty.
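Under the same kind of synthetic setting, a sketch of the GM estimate (1.18) with a known diagonal noise covariance (the diagonal structure is an assumption of this example, not a requirement of the estimator):

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 50, 3
U = rng.normal(size=(n, p))
theta_true = np.array([1.0, -2.0, 0.5])
sig2 = rng.uniform(0.01, 1.0, size=n)                        # a different variance per measurement
y = U @ theta_true + rng.normal(size=n) * np.sqrt(sig2)

Sinv = np.diag(1.0 / sig2)                                   # Sigma_eps^{-1}
theta_gm = np.linalg.solve(U.T @ Sinv @ U, U.T @ Sinv @ y)   # as in (1.18)
print(theta_gm)
```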
Under the assumption that the noise has zero mean, E[ε] = 0, it is easy to show that both the LS and the GM estimators are unbiased. For the LS estimator one has
$$E^\theta\big[\hat\theta_{LS}\big] = E^\theta\big[(U^T U)^{-1} U^T y\big] = E^\theta\big[(U^T U)^{-1} U^T (U\theta + \varepsilon)\big] = E^\theta\big[\theta + (U^T U)^{-1} U^T \varepsilon\big] = \theta.$$
Similarly, for the GM estimator,
$$E^\theta\big[\hat\theta_{GM}\big] = E^\theta\big[\theta + (U^T \Sigma_\varepsilon^{-1} U)^{-1} U^T \Sigma_\varepsilon^{-1} \varepsilon\big] = \theta.$$
If the noise vector ε has non-zero mean, mε = E [ε], but the mean mε
is known, the LS and GM estimators can be easily amended to remove the
bias. In fact, if we define the new vector of random variables ε̃ = ε − mε ,
the equation (1.16) can be rewritten as
y − mε = Uθ + ε̃, (1.19)
and since clearly $E[\tilde\varepsilon] = 0$ and $E[\tilde\varepsilon\tilde\varepsilon^T] = \Sigma_\varepsilon$, all the treatment can be repeated
by replacing y with y − mε . Therefore, the expressions of the LS and GM
estimators remain those in (1.17) and (1.18), with y replaced by y − mε .
The case in which the mean of ε is unknown is more intriguing. In some
cases, one may try to estimate it from the data, along with the parameter θ.
Assume for example that E [εi ] = m̄ε , ∀i. This means that E [ε] = m̄ε · 1,
where $\mathbf{1} = [1\ 1\ \cdots\ 1]^T$. Now, one can define the extended parameter vector $\bar\theta = [\theta^T\ \ \bar m_\varepsilon]^T \in \mathbb{R}^{p+1}$, and use the same decomposition as in (1.19) to obtain
$$y = [U\ \ \mathbf{1}]\,\bar\theta + \tilde\varepsilon.$$
Then, one can apply the LS or GM estimator, by replacing U with [U 1], to
obtain a simultaneous estimate of the p parameters θ and of the scalar mean
m̄ε .
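A sketch of this augmentation (all numbers invented): a column of ones is appended to U and the LS estimator returns both θ and the noise mean.

```python
import numpy as np

rng = np.random.default_rng(6)
n, p = 100, 2
U = rng.normal(size=(n, p))
theta_true, mean_eps = np.array([0.7, -1.3]), 0.4
y = U @ theta_true + mean_eps + 0.1 * rng.normal(size=n)    # noise with unknown mean

Ue = np.hstack([U, np.ones((n, 1))])                        # extended regressor [U 1]
theta_bar = np.linalg.solve(Ue.T @ Ue, Ue.T @ y)            # estimates [theta; mean_eps]
print(theta_bar)
```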
In the special case Σε = σε2 In (with In identity matrix of dimension n), i.e.,
when the variables ε are uncorrelated and have the same variance σε2 , the
BLUE estimator is the Least Squares estimator (1.17).
Proof
Since we consider the class of linear unbiased estimators, we have T (y) = Ay,
and E [Ay] = AE [y] = AUθ. Therefore, one must impose the constraint
AU = Ip to guarantee that the estimator is unbiased.
In order to find the minimum variance estimator, it is necessary to minimize (in the matrix sense) the covariance of the estimation error. Since $AU = I_p$, the estimation error is $Ay - \theta = A(U\theta + \varepsilon) - \theta = A\varepsilon$, whose covariance is
$$E\big[A\varepsilon\varepsilon^T A^T\big] = A\Sigma_\varepsilon A^T.$$
Writing a generic matrix A satisfying this constraint as
$$A = (U^T \Sigma_\varepsilon^{-1} U)^{-1} U^T \Sigma_\varepsilon^{-1} + M, \qquad (1.22)$$
where the unbiasedness constraint $AU = I_p$ implies $MU = 0$, one obtains
$$\begin{aligned}
A\Sigma_\varepsilon A^T &= (U^T \Sigma_\varepsilon^{-1} U)^{-1} U^T \Sigma_\varepsilon^{-1}\Sigma_\varepsilon\Sigma_\varepsilon^{-1} U (U^T \Sigma_\varepsilon^{-1} U)^{-1}
+ (U^T \Sigma_\varepsilon^{-1} U)^{-1} U^T \Sigma_\varepsilon^{-1}\Sigma_\varepsilon M^T \\
&\quad + M\Sigma_\varepsilon\Sigma_\varepsilon^{-1} U (U^T \Sigma_\varepsilon^{-1} U)^{-1} + M\Sigma_\varepsilon M^T \\
&= (U^T \Sigma_\varepsilon^{-1} U)^{-1} + M\Sigma_\varepsilon M^T \\
&\ge (U^T \Sigma_\varepsilon^{-1} U)^{-1},
\end{aligned}$$
where the two cross terms vanish because $MU = 0$.
As it has been noticed after Definition 1.13, the solution of (1.23) is actu-
ally the Gauss-Markov estimator. Therefore, we can state that: in the case
of linear observations corrupted by additive Gaussian noise, the Maximum
Likelihood estimator coincides with the Gauss-Markov estimator. Moreover,
it is possible to show that in this setting
! !T
θ θ
∂ ln fy (y) ∂ ln fy (y)
Eθ = U T Σ−1 U
∂θ ∂θ
When, in addition, the noise components are independent and identically distributed, i.e.,
$$\varepsilon \sim N(0,\, \sigma_\varepsilon^2 I_n),$$
the GM estimator boils down to the LS one. Therefore: in the case of linear observations corrupted by independent and identically distributed Gaussian noise, the Maximum Likelihood estimator coincides with the Least Squares estimator.
yi = θ + vi , i = 1, . . . , n
E [T (y)] = E [x] .
where d(x, T (y)) denotes the distance between x and its estimate T (y),
according to a suitable metric.
Since the distance d(x, T (y)) is a random variable, the aim is to minimize
its expected value, i.e. to find
$$\hat{x}_{MSE} = E\,[x\,|\,y].$$
The previous result states that the estimator minimizing the MSE is the
a posteriori expected value of x, given the observation of y, i.e.
$$\hat{x}_{MSE} = \int_{-\infty}^{+\infty} x\, f_{x|y}(x|y)\, dx. \qquad (1.25)$$
Since, by the law of iterated expectations,
$$E\big[E[x\,|\,y]\big] = E[x],$$
one can conclude that the minimum MSE estimator is always unbiased.
The minimum MSE estimator has other attractive properties. In particular, if we consider the matrix
• $\hat{x}_{MSE}$ is the estimator minimizing (in the matrix sense) Q(x, T(y)), i.e.
Example 1.10. Consider two random variables x and y, whose joint pdf is
given by
$$f_{x,y}(x, y) = \begin{cases} -\frac{3}{2}x^2 + 2xy & \text{if } 0 \le x \le 1,\ 1 \le y \le 2 \\ 0 & \text{elsewhere} \end{cases}$$
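For this joint pdf, the conditional mean (1.25) can also be evaluated numerically; the sketch below (arbitrary grid resolution, NumPy assumed) computes x̂_MSE(y) for a few values of y.

```python
import numpy as np

def f_xy(x, y):
    # joint pdf of the example (zero outside the rectangle [0,1] x [1,2])
    return np.where((0.0 <= x) & (x <= 1.0) & (1.0 <= y) & (y <= 2.0),
                    -1.5 * x**2 + 2.0 * x * y, 0.0)

dx = 1e-4
x = np.arange(0.0, 1.0, dx) + dx / 2          # midpoints of a fine grid on [0, 1]
for y in (1.0, 1.25, 1.5, 1.75, 2.0):
    num = np.sum(x * f_xy(x, y)) * dx         # integral of x * f_{x,y}(x, y) over x
    den = np.sum(f_xy(x, y)) * dx             # marginal pdf f_y(y)
    print(y, num / den)                       # x_MSE(y) = E[x | y]
```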
T (y) = Ay + b (1.26)
in which the matrix A ∈ Rm×n and the vector b ∈ Rm are the coefficients of
the estimator to be determined. Among all estimators of the form (1.26), we
aim at finding the one minimizing the MSE.
Definition 1.17. The Linear Mean Square Error (LMSE) estimator is defined as $\hat{x}_{LMSE} = A^* y + b^*$, where
$$A^* = R_{xy} R_y^{-1}, \qquad b^* = m_x - R_{xy} R_y^{-1} m_y. \qquad (1.28)$$
Observe that the last two terms of the previous expression are positive
semidefinite matrices. Hence, the solution of problem (1.28) is obtained by
choosing A∗ , b∗ such that the last two terms are equal to zero, i.e.
$$A^* = R_{xy} R_y^{-1}, \qquad b^* = m_x - A^* m_y = m_x - R_{xy} R_y^{-1} m_y.$$
The LMSE estimator is unbiased because the expected value of the esti-
mation error is equal to zero. In fact,
$$E\big[x - \hat{x}_{LMSE}\big] = m_x - m_x - R_{xy} R_y^{-1}\, E[y - m_y] = 0.$$
In the case in which the random variables x, y are jointly Gaussian, with mean and covariance matrix defined as in Theorem 1.8, we recall that the conditional expected value of x given the observation of y is given by
$$E[x\,|\,y] = m_x + R_{xy} R_y^{-1} (y - m_y),$$
which coincides with the LMSE estimator $\hat{x}_{LMSE}$.
y 1 = x + ε1 ,
y 2 = x + ε2 .
Let ε1 , ε2 be two independent random variables, with zero mean and variance
σ12 , σ22 , respectively. Under the assumption that x and εi , i = 1, 2, are
independent, we aim at computing the LMSE estimator of x.
y = 1 x + ε,
where 1 = (1 1)T .
First, let us compute the mean and the covariance matrix of y:
$$E[y] = E[\mathbf{1}\,x + \varepsilon] = \mathbf{1}\, m_x, \qquad R_y = \mathbf{1}\,\sigma_x^2\,\mathbf{1}^T + R_\varepsilon,$$
where
$$R_\varepsilon = \begin{pmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{pmatrix}.$$
Applying Definition 1.17, after some algebra one obtains
$$\hat{x}_{LMSE} = \frac{m_x\,\sigma_1^2\sigma_2^2 + \sigma_x^2\left(\sigma_2^2\, y_1 + \sigma_1^2\, y_2\right)}{\sigma_1^2\sigma_2^2 + \sigma_x^2\left(\sigma_1^2 + \sigma_2^2\right)}
= \frac{\dfrac{m_x}{\sigma_x^2} + \dfrac{y_1}{\sigma_1^2} + \dfrac{y_2}{\sigma_2^2}}{\dfrac{1}{\sigma_x^2} + \dfrac{1}{\sigma_1^2} + \dfrac{1}{\sigma_2^2}}.$$
Hence, the LMSE estimate is a weighted average of the prior mean $m_x$ and of the two measurements, with weights given by the corresponding precisions.
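As a cross-check, the sketch below evaluates both the matrix form A*y + b* of Definition 1.17 and the scalar expression above, for invented values of m_x, σ_x², σ_1², σ_2² and of the measurements; the two results coincide.

```python
import numpy as np

m_x, s2x, s21, s22 = 1.0, 4.0, 1.0, 0.25        # invented prior mean/variance and noise variances
y1, y2 = 2.0, 1.4                               # invented measurements

# Matrix form: x_hat = m_x + R_xy R_y^{-1} (y - m_y)
one = np.ones((2, 1))
R_eps = np.diag([s21, s22])
R_y = s2x * (one @ one.T) + R_eps               # covariance of y
R_xy = s2x * one.T                              # 1 x 2 cross-covariance of x and y
y = np.array([[y1], [y2]])
x_mat = m_x + (R_xy @ np.linalg.solve(R_y, y - one * m_x)).item()

# Scalar (precision-weighted) form derived in the example
x_scal = (m_x / s2x + y1 / s21 + y2 / s22) / (1 / s2x + 1 / s21 + 1 / s22)

print(x_mat, x_scal)                            # the two values coincide
```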
1.7 Exercises
1.1. Verify that in the problem of Example 1.9, the LS and GM estimators of θ coincide respectively with $\bar{y}$ in (1.2) and $\hat{m}_{BLUE}$ in (1.10).
e) Find the variance of the estimation error for the estimator T (·) defined
in item d), in the case n = 1. Compute the Fisher information I1 (θ)
and show that the inequality (1.11) holds.
1.7. Let a and b be two unknown quantities, for which we have three different
measurements:
y1 = a + v1
y2 = b + v2
y3 = a + b + v3
where $v_i$, $i = 1, 2, 3$, are independent random variables with zero mean. Let $E[v_1^2] = E[v_3^2] = 1$ and $E[v_2^2] = \frac{1}{2}$. Find:
Compare the obtained estimates with those one would have if the observation
y 3 were not available. How does the variance of the estimation error change?
$$f_{x,y}(x, y) = \begin{cases} -\frac{3}{2}x^2 + 2xy & 0 \le x \le 1,\ 1 \le y \le 2 \\ 0 & \text{elsewhere} \end{cases}$$
a) Find the estimators x̂M SE and x̂LM SE of x, and plot them as functions
of the observation y.