LN Estimation Theory
Part 3
This lecture note is based on a number of textbooks on optimal signal processing and is suitable for graduate/postgraduate students. It should be read in conjunction with the classroom discussion.
ESTIMATION THEORY : LN-3
A signal detection problem is centered on a receiver that observes a noisy version of a signal and decides which hypothesis is true among the many possible hypotheses. In the binary case, the receiver has to decide between the null hypothesis H0 and the alternative hypothesis H1. Suppose the receiver has decided on the true hypothesis, but some parameter of the signal is still unknown, e.g. amplitude, phase, or the type of disturbance. Estimation theory then comes in, helping to estimate such parameters optimally from a finite set of data samples. The parameter to be estimated may be random or nonrandom. The estimation of random parameters is known as Bayes' estimation, while the estimation of nonrandom parameters is referred to as maximum likelihood estimation (MLE).
Conversely, detection theory often requires estimation of unknown parameters: signal presence is assumed, parameter estimates are incorporated into the detection statistic, and the consistency of observations and assumptions is tested. Consequently, detection and estimation theory form a symbiotic relationship, each requiring the other to yield high-quality signal processing algorithms.
Detection is a science while estimation is an art. Understanding the problem, in terms of which error criterion to choose, and devising innovative algorithms are key to the estimation procedure.
1 Terminology in Estimation Theory
The parameter estimation problem is to determine, from a set of L observations represented by the L-dimensional vector r, the values of the parameters denoted by the vector θ. We write the estimate of this parameter vector as θ̂(r), where the "hat" denotes the estimate and the functional dependence on r explicitly denotes the dependence of the estimate on the observations. This dependence is always present, but we frequently denote the estimate compactly as θ̂. Because of the probabilistic nature of the problem, a parameter estimate is itself a random vector, having its own statistical characteristics. The estimation error ε(r) equals the estimate minus the actual parameter value: ε(r) = θ̂(r) − θ. It too is a random quantity and is often used in the criterion function. For example, the mean-squared error is given by E[ε^T ε]; the minimum mean-squared error estimate would minimize this quantity. The mean-squared error matrix is E[ε ε^T]; on its main diagonal the entries are the mean-squared estimation errors for each component of the parameter vector, whereas the off-diagonal terms express the correlation between the errors. The total mean-squared estimation error E[ε^T ε] equals the trace of the mean-squared error matrix: tr(E[ε ε^T]).
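As a concrete illustration, the following Python sketch (a hypothetical two-dimensional mean-estimation problem with arbitrarily chosen values) forms the empirical mean-squared error matrix E[ε ε^T] by Monte Carlo averaging and confirms that the total mean-squared error equals its trace.

# A minimal Monte Carlo sketch: form the empirical mean-squared error matrix
# E[eps eps^T] for a hypothetical two-dimensional mean-estimation problem.
import numpy as np

rng = np.random.default_rng(0)
theta = np.array([1.0, -0.5])      # assumed true parameter vector
L, trials = 100, 5000

errors = np.empty((trials, 2))
for k in range(trials):
    r = theta + rng.normal(size=(L, 2))    # L noisy observations of theta
    theta_hat = r.mean(axis=0)             # sample-mean estimate theta_hat(r)
    errors[k] = theta_hat - theta          # estimation error eps(r)

mse_matrix = errors.T @ errors / trials    # estimate of E[eps eps^T]
print("mean-squared error matrix:\n", mse_matrix)
print("total mean-squared error = trace:", np.trace(mse_matrix))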
Bias
An estimate is said to be unbiased if the expected value of the estimate equals the true value of the parameter: E[θ̂] = θ; otherwise the estimate is biased, with bias b(L) = E[θ̂] − θ. When we have a biased estimate, the bias usually depends on the number of observations L. An estimate is said to be asymptotically unbiased if the bias tends to zero for large L: lim_{L→∞} b(L) = 0. An estimate's variance equals the mean-squared estimation error only if the estimate is unbiased.
An unbiased estimate has a probability distribution where the mean equals the actual value of the
parameter. Should the lack of bias be considered a desirable property? If many unbiased estimates are
computed from statistically independent sets of observations having the same parameter value, the
average of these estimates will be close to this value. This property does not mean that the estimate has
less error than a biased one; there exist biased estimates whose mean-squared errors are smaller than those of unbiased ones. In such cases, the biased estimate is usually asymptotically unbiased. Lack of bias is good, but it is just one aspect of how we evaluate estimators.
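A small numerical sketch of this last point, assuming i.i.d. Gaussian data with an arbitrarily chosen true variance: the 1/L sample-variance estimate is biased, yet its mean-squared error turns out smaller than that of the unbiased 1/(L−1) estimate.

# Sketch: the 1/L variance estimate is biased but has a smaller MSE than the
# unbiased 1/(L-1) estimate for i.i.d. Gaussian data (values assumed below).
import numpy as np

rng = np.random.default_rng(1)
true_var, L, trials = 4.0, 10, 100_000

x = rng.normal(0.0, np.sqrt(true_var), size=(trials, L))
s2_unbiased = x.var(axis=1, ddof=1)    # divides by L-1: unbiased
s2_biased = x.var(axis=1, ddof=0)      # divides by L: biased, asymptotically unbiased

for name, est in [("unbiased", s2_unbiased), ("biased", s2_biased)]:
    bias = est.mean() - true_var
    mse = np.mean((est - true_var) ** 2)
    print(f"{name:9s} bias = {bias:+.3f}   MSE = {mse:.3f}")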
Consistency
We term an estimate consistent if the mean-squared estimation error tends to zero as the number of observations becomes large: lim_{L→∞} E[ε^T ε] = 0. Thus, a consistent estimate must be at least asymptotically unbiased. Unbiased estimates do exist whose errors never diminish as more data are collected: their variances remain nonzero no matter how much data are available. Inconsistent estimates may provide reasonable estimates when the amount of data is limited, but they have the counterintuitive property that the quality of the estimate does not improve as the number of observations increases. Although appropriate in the proper circumstances (a smaller mean-squared error than a consistent estimate over a pertinent range of values of L), consistent estimates are usually favored in practice.
Efficiency
Estimators can be derived in a variety of ways, and their error characteristics must always be analyzed and compared. In practice, many problems and the estimators derived for them are sufficiently complicated to render analytic studies of the errors difficult, if not impossible. Instead, numerical simulation and comparison with lower bounds on the estimation error are frequently used to assess estimator performance.
An efficient estimate has a mean-squared error that equals a particular lower bound: the Cramér-Rao bound. If an efficient estimate exists (the Cramér-Rao bound is then the greatest lower bound), it is optimum in the mean-squared sense: no other estimate has a smaller mean-squared error.
For many problems no efficient estimate exists. In such cases, the Cramér-Rao bound remains a lower bound, but its value is smaller than that achievable by any estimator; how much smaller is usually not known. Practitioners nevertheless frequently use the Cramér-Rao bound in comparisons with numerical error calculations. Another issue is the choice of mean-squared error as the estimation criterion; it may not suffice to fully assess estimator performance in a particular problem.
Since the estimator θ̂ is a random variable and may assume more than one value, some characteristics of a "good" estimate need to be determined.
Biased estimate: an estimate for which E[θ̂] = θ + B(θ), where B(·) is the bias function. If the bias does not depend on θ, we say it is a known bias; if it depends on θ, we call it an unknown bias.
In the case of an unbiased estimator, the true value is obtained on average. However, it may still not be the best estimate, since the variance could be large. Hence the second property to check for an unbiased estimate is whether its variance
var(θ̂) = E[ |θ̂ − E[θ̂]|² ]
is small, since the mean-squared error decomposes as
E[ |θ̂ − θ|² ] = var(θ̂) + B²(θ).
The maximum likelihood estimate θ̂_ML(r) of a nonrandom parameter is, simply, the value which maximizes the likelihood function (the conditional density p(r|θ) of the observations given the parameter). Assuming that the maximum can be found by evaluating a derivative, θ̂_ML(r) is defined by
∂ ln p(r|θ) / ∂θ = 0, evaluated at θ = θ̂_ML.
Example 1
Let r(l), l = 0, …, L−1, be a sequence of independent, identically distributed Gaussian random variables having an unknown mean θ but a known variance σn². Often, we cannot assign a probability density to a parameter of a random variable's density; we simply do not know what the parameter's value is. Maximum likelihood estimates are often used in such problems. In the specific case here, the derivative of the logarithm of the likelihood function equals
∂ ln p(r|θ) / ∂θ = (1/σn²) Σ_{l=0}^{L−1} ( r(l) − θ ),
and setting it to zero yields the sample mean, θ̂_ML = (1/L) Σ_{l=0}^{L−1} r(l).
The expected value of this estimate, E[θ̂_ML | θ], equals the actual value θ, showing that the maximum likelihood estimate is unbiased. The mean-squared error equals σn²/L, and we infer that this estimate is consistent.
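A brief simulation sketch of Example 1 (the true mean, known variance, and sample sizes below are arbitrary choices) illustrates that the sample-mean estimate is unbiased and that its mean-squared error tracks σn²/L:

# Sketch of Example 1: the ML estimate is the sample mean of L i.i.d. Gaussian
# observations with known variance sigma_n^2 and unknown mean theta (values assumed).
import numpy as np

rng = np.random.default_rng(2)
theta, sigma_n, trials = 2.0, 1.5, 20_000

for L in (10, 100, 1000):
    r = theta + sigma_n * rng.normal(size=(trials, L))
    theta_ml = r.mean(axis=1)              # ML estimate = sample mean
    print(f"L={L:4d}  E[theta_ML] ~ {theta_ml.mean():.4f}  "
          f"MSE ~ {np.mean((theta_ml - theta) ** 2):.5f}  "
          f"sigma_n^2/L = {sigma_n ** 2 / L:.5f}")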
Parameter Vectors
The maximum likelihood procedure (or any other procedure) can be easily generalized to situations where more than one parameter must be estimated. Letting θ denote the parameter vector, the likelihood function is now expressed as p(r|θ). The maximum likelihood estimate θ̂_ML of the parameter vector is given by the location of the maximum of the likelihood function (or, equivalently, of its logarithm):
∇_θ ln p(r|θ) = 0, evaluated at θ = θ̂_ML,
where ∇_θ denotes the gradient with respect to the parameter vector. This equation means that we must estimate all of the parameters simultaneously by setting the partial derivative of the likelihood function with respect to each parameter to zero. Given P parameters, we must in most cases solve a set of P nonlinear, simultaneous equations to find the maximum likelihood estimates.
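As a simple illustration of joint estimation, consider the (assumed) case of i.i.d. Gaussian data with both the mean and the variance unknown (P = 2). Setting both partial derivatives of the log-likelihood to zero happens to give a closed-form solution here, sketched below:

# Sketch: joint ML estimation of P = 2 parameters (mean and variance) of i.i.d.
# Gaussian data; here the two simultaneous gradient equations have a closed form.
import numpy as np

rng = np.random.default_rng(3)
mu_true, var_true, L = 1.0, 2.0, 500          # assumed true values
r = mu_true + np.sqrt(var_true) * rng.normal(size=L)

mu_ml = r.mean()                     # from d/d(mu)  ln p(r|mu, var) = 0
var_ml = np.mean((r - mu_ml) ** 2)   # from d/d(var) ln p(r|mu, var) = 0
print("theta_ML = (mu, var) =", (mu_ml, var_ml))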
In those cases in which the expected value of the a posteriori density (the minimum mean-squared error estimate) cannot be computed, a related but simpler estimate, the maximum a posteriori (MAP) estimate, can usually be evaluated. The estimate θ̂_MAP(r) equals the location of the maximum of the a posteriori density. Assuming that this maximum can be found by evaluating the derivative of the a posteriori density, the MAP estimate is the solution of the equation
∂ ln p(θ|r) / ∂θ = 0, evaluated at θ = θ̂_MAP,
or equivalently ∂ [ ln p(r|θ) + ln p(θ) ] / ∂θ = 0. The only quantities required to compute the MAP estimate are the likelihood function and the a priori density of the parameter.
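A minimal sketch of the MAP idea, assuming a Gaussian prior θ ~ N(μ0, σ0²) on the mean of i.i.d. Gaussian observations with known variance σn² (all numerical values below are arbitrary):

# Sketch: MAP estimate of a Gaussian mean theta with prior theta ~ N(mu0, sigma0^2)
# and i.i.d. observations r(l) ~ N(theta, sigma_n^2); all values assumed.
# Setting d/d(theta) [ln p(r|theta) + ln p(theta)] = 0 gives a closed form.
import numpy as np

rng = np.random.default_rng(4)
mu0, sigma0, sigma_n, L = 0.0, 1.0, 2.0, 25
theta_true = rng.normal(mu0, sigma0)             # parameter drawn from the prior
r = theta_true + sigma_n * rng.normal(size=L)

theta_map = (L * r.mean() / sigma_n**2 + mu0 / sigma0**2) / (L / sigma_n**2 + 1 / sigma0**2)
print("true theta:", theta_true, "  MAP estimate:", theta_map)
# As L grows, the prior term becomes negligible and theta_MAP approaches r.mean() (the ML estimate).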
Section II
Suppose we have an observation Y and an unknown X related through a known matrix F, and we assume a linear relation between them such as
Y = FX + e,
where e is the residual error, or some noise about which we are not sure.
We minimize the mean-squared error subject to the relation Y = FX + e, so that FX becomes approximately equal to Y:
Z(X) = min_X ||FX − Y||² = (FX − Y)^T (FX − Y) = X^T F^T F X − X^T F^T Y − Y^T F X + Y^T Y.
To minimize this we take the first derivative with respect to X and equate it to zero, which gives the normal equations F^T F X̂ = F^T Y, i.e.
X̂ = (F^T F)^{-1} F^T Y.
Provided F has full column rank, F^T F is invertible and the solution can be found.
Is it an unbiased estimator?
For an estimator to be unbiased, the mean of the estimated value should equal the true parameter value. Substituting Y = FX + e into the estimator gives
X̂ = (F^T F)^{-1} F^T Y = (F^T F)^{-1} F^T (FX + e) = X + (F^T F)^{-1} F^T e.
Now we take the expectation of this equation (the expectation of X̂ gives the mean of the random variable):
E(X̂) = X + E[(F^T F)^{-1} F^T e] = X,
assuming e is a random vector with zero mean and, say, variance σ², so that E(e) = 0.
What is the variance in case e ~ (0, I), i.e. zero mean and unit covariance? The error covariance is
E[(X̂ − X)(X̂ − X)^T] = (F^T F)^{-1} F^T E[e e^T] F (F^T F)^{-1} = (F^T F)^{-1}.
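A short Monte Carlo sketch (with an arbitrarily chosen full-column-rank F and true X) of the least-squares estimate X̂ = (F^T F)^{-1} F^T Y, checking that its error covariance is approximately (F^T F)^{-1} when e ~ (0, I):

# Sketch: least-squares estimate X_hat = (F^T F)^{-1} F^T Y and a Monte Carlo
# check that its error covariance is (F^T F)^{-1} when e ~ (0, I); F and X assumed.
import numpy as np

rng = np.random.default_rng(5)
F = rng.normal(size=(50, 3))             # assumed full-column-rank model matrix
X = np.array([1.0, -2.0, 0.5])           # assumed true parameter vector

trials = 20_000
err = np.empty((trials, 3))
for k in range(trials):
    Y = F @ X + rng.normal(size=50)                  # unit-variance noise e
    X_hat = np.linalg.solve(F.T @ F, F.T @ Y)        # (F^T F)^{-1} F^T Y
    err[k] = X_hat - X

print("empirical error covariance:\n", err.T @ err / trials)
print("(F^T F)^{-1}:\n", np.linalg.inv(F.T @ F))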
In the previous example we assumed that the parameter F is known and we estimated X. Sometimes we need to estimate the parameters of the mapping itself based on some observations. Formulated in a linear manner, this problem can be expressed as:
Ŷ = AX + B, where we have to find A and B such that the error in estimation is minimum, i.e.
∂e/∂B = −2 E[(Y − AX − B)] = 0.
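A scalar sketch of this fit, on synthetic data generated from a hypothetical linear relation: the minimizing coefficients are A = cov(x, y)/var(x) and B = E[y] − A E[x], replaced here by their sample estimates.

# Sketch (scalar case, synthetic data from an assumed linear relation): the affine
# estimator y_hat = A*x + B minimizing E[(y - A*x - B)^2] has
# A = cov(x, y)/var(x) and B = E[y] - A*E[x], estimated here from samples.
import numpy as np

rng = np.random.default_rng(6)
x = rng.normal(size=100_000)
y = 3.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)   # hypothetical data

A = np.cov(x, y)[0, 1] / np.var(x)
B = y.mean() - A * x.mean()
print("A ~", A, "  B ~", B)                      # recovers roughly 3.0 and 1.0
print("mean residual:", np.mean(y - A * x - B))  # sample version of E[y - Ax - B] = 0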
On average an estimator should yield the true value of the unknown parameter, i.e.
Unbiased estimator: E[Ŝ] = ∫ Ŝ(r) p(r; S) dr = S.
Bias of the estimator: E[Ŝ] − S = b(S).
The mean-squared error of the estimator is E[(Ŝ − S)²], which after expansion and taking the expectation gives
E[(Ŝ − S)²] = var(Ŝ) + b²(S).
This shows that the error is composed of an error due to variance and an error due to bias.
Any estimator whose construction depends on the (unknown) bias is unrealizable. Hence we set the bias to zero and seek the minimum variance unbiased estimator (MVUE).
Correlation cancelers
Assume vectors x and y of dimensions N and M, respectively, both having zero mean. We assume that the two signals are correlated with each other, i.e.
Rxy = E[x y^T] ≠ 0,
and we form the error vector e = x − Hy, where the N×M matrix H is to be chosen in such a manner that e and y are no longer correlated:
Rey = E[e y^T] = 0.
Setting E[(x − Hy) y^T] = Rxy − H Ryy = 0 gives H = Rxy Ryy^{-1}. The vector
x̂ = Hy = Rxy Ryy^{-1} y = E[x y^T] E[y y^T]^{-1} y
is obtained by linearly processing the vector y by the matrix H, which is called the linear regression, or orthogonal projection, of x on the vector y. In a sense, x̂ also represents the best "copy," or estimate, of x that can be made on the basis of the vector y. Thus, the vector e = x − Hy = x − x̂ may be thought of as the estimation error. Alternatively, we may view x̂ = Hy not as an estimate of x but rather as an estimate of that part of x which is correlated with y.
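The following sketch (with a hypothetical mixing matrix, and sample covariances standing in for the true Rxy and Ryy) builds H = Rxy Ryy^{-1} and verifies that the canceled output e = x − Hy is uncorrelated with y:

# Sketch: estimate H = R_xy R_yy^{-1} from sample covariances of zero-mean
# vectors x (N = 2) and y (M = 3), and check that e = x - H y is uncorrelated with y.
import numpy as np

rng = np.random.default_rng(7)
T = 200_000
y = rng.normal(size=(3, T))                        # zero-mean reference vector
A = np.array([[1.0, 0.5, -0.2], [0.0, 2.0, 1.0]])  # hypothetical mixing matrix
x = A @ y + 0.3 * rng.normal(size=(2, T))          # x is correlated with y

Rxy = x @ y.T / T
Ryy = y @ y.T / T
H = Rxy @ np.linalg.inv(Ryy)       # linear regression of x on y

e = x - H @ y                      # canceled output / estimation error
print("R_ey (should be ~0):\n", e @ y.T / T)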
Inference 1.
If x and y are jointly Gaussian, show that the linear estimate x̂ = Hy is also the conditional mean E[x|y] of the vector x given y.
Proof
We know that under a linear transformation a Gaussian random vector remains Gaussian; also, if two jointly Gaussian vectors are uncorrelated, then they are independent of each other. The transformation from the jointly Gaussian pair (x, y) to the uncorrelated pair (e, y) is linear:
[e; y] = [I, −H; 0, I] [x; y].
Hence e and y are jointly Gaussian and uncorrelated, and therefore independent, so E[e|y] = E[e] = 0 and E[x|y] = E[Hy + e|y] = Hy = x̂. The conditional mean E[x|y] is the best unrestricted (i.e., not necessarily linear) estimate of x in the mean-square sense.
Inference 2.
For a random vector x with mean m and covariance Σ, the best choice of a deterministic vector x̂ that minimizes Ree = E[e e^T], where e = x − x̂, is the mean itself, that is, x̂ = m.
Writing x̂ = m + Δ, we have
Ree = E[e e^T] = E[(x − m − Δ)(x − m − Δ)^T] = E[(x − m)(x − m)^T] − E[x − m] Δ^T − Δ E[(x − m)^T] + Δ Δ^T = Σ + Δ Δ^T,
since E[x − m] = E[x] − m = 0. Because the matrix Δ Δ^T is nonnegative definite, Ree is minimized when Δ = 0, and in this case the minimum value is min Ree = Σ.
Appendix
Cramér-Rao Lower Bound (refer to the classroom discussion for details)
For an unbiased estimate θ̂ of a nonrandom parameter θ, the error variance is bounded below by the inverse of the Fisher information:
var(θ̂) ≥ 1 / I(θ), where I(θ) = E[ (∂ ln p(r; θ)/∂θ)² ] = −E[ ∂² ln p(r; θ)/∂θ² ].
An estimate is efficient if it attains this bound with equality, which happens when the score factors as ∂ ln p(r; θ)/∂θ = I(θ) ( θ̂(r) − θ ).
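As a sketch, for the model of Example 1 (L i.i.d. N(θ, σn²) observations) the Fisher information is I(θ) = L/σn², so the bound is σn²/L; the sample mean attains it, i.e. it is efficient. All numerical values below are arbitrary.

# Sketch: for L i.i.d. N(theta, sigma_n^2) observations the Fisher information is
# I(theta) = L / sigma_n^2, so the bound is sigma_n^2 / L; the sample mean attains it.
import numpy as np

rng = np.random.default_rng(8)
theta, sigma_n, L, trials = 0.7, 1.2, 50, 50_000   # assumed values

r = theta + sigma_n * rng.normal(size=(trials, L))
theta_hat = r.mean(axis=1)
print("Cramer-Rao bound     :", sigma_n ** 2 / L)
print("variance of theta_hat:", theta_hat.var())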