Introduction

1.1 Parametric and Nonparametric Statistical Inference
Statisticians want to learn as much as possible from a limited amount of data. A first
step is to set up an appropriate mathematical model for the process which generated the
data. Such a model has a probabilistic nature.
Examples
1. An experiment has only two possible outcomes: S and F (Success and Failure).
Let X be the random variable
\[
X = \begin{cases} 0 & \text{if } F \text{ occurs} \\ 1 & \text{if } S \text{ occurs} \end{cases}
\]
2. A measured quantity is modeled as a normally distributed random variable:
\[
X \sim N(\mu; \sigma^2) \quad \text{(normal)}
\]
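To make the two models concrete, here is a minimal sketch (not part of the original notes) that simulates data from each; the parameter values $p = 0.3$, $\mu = 5$, $\sigma = 2$ and the sample sizes are arbitrary illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Example 1: success/failure experiment; X = 1 if S occurs, 0 if F occurs.
# The success probability p = 0.3 is an arbitrary illustrative choice.
p = 0.3
x_bernoulli = rng.binomial(n=1, p=p, size=10)

# Example 2: a normally distributed quantity X ~ N(mu; sigma^2).
# mu = 5.0 and sigma = 2.0 are arbitrary illustrative choices.
mu, sigma = 5.0, 2.0
x_normal = rng.normal(loc=mu, scale=sigma, size=10)

print(x_bernoulli)  # ten 0/1 outcomes
print(x_normal)     # ten real-valued observations
```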
Statistical inference deals with methods of using the outcomes to obtain information on
the true distribution function (or the true parameter) underlying the experiment.
Approaches to statistical inference. There are two broad approaches to formal statistical inference. The differences relate to the interpretation of probability and to the objectives of statistical inference.

(i) Frequentist (classical) approach. The unknown parameter $\theta$ is treated as a fixed but unknown quantity; probability statements refer to the behaviour of procedures under repeated sampling.

(ii) Bayesian approach. The unknown parameter $\theta$ is treated as a random variable. The key point in this method is that one has to specify a prior distribution for $\theta$ before the data analysis. The specification may be objective or subjective. Inference is a formalization of how the prior changes to the posterior in the light of the data, via Bayes' formula; a small sketch follows below.
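To make the prior-to-posterior idea concrete, here is a minimal sketch (not from the original notes) using the standard conjugate Beta prior for a Bernoulli success probability $\theta$; the uniform prior, the true $\theta$ used to simulate data, and the sample size are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Prior on theta: Beta(a, b); the uniform prior a = b = 1 is an assumption.
a, b = 1.0, 1.0

# Bernoulli data, simulated here with true theta = 0.3 (also an assumption).
data = rng.binomial(n=1, p=0.3, size=50)

# With a conjugate Beta prior, Bayes' formula reduces to a parameter update:
# the posterior is Beta(a + #successes, b + #failures).
a_post = a + data.sum()
b_post = b + len(data) - data.sum()

print(f"posterior: Beta({a_post:.0f}, {b_post:.0f}), "
      f"posterior mean = {a_post / (a_post + b_post):.3f}")
```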
In this material we will mainly focus on the frequentist approach. We will study methods
for obtaining estimation and testing procedures which satisfy certain optimality criteria.
The two most important topics in statistical inference are estimation and hypothesis testing.

Estimation. The observations are used to estimate the true distribution function (or, in the parametric case, the true parameter $\theta$).

Hypothesis testing. The observations are used to conclude whether or not the true distribution belongs to some smaller family of distribution functions $\mathcal{F}_0 \subset \mathcal{F}$. In the parametric case, the statistician infers whether or not the true parameter $\theta$ belongs to a subset $\Theta_0 \subset \Theta$. For example, for $X \sim N(\theta; 1)$ one may test whether $\theta \in \Theta_0 = \{0\}$; a minimal sketch of such a test is given below.
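As an illustration (not part of the original notes), the following sketch carries out that parametric test; the sample size, the true $\theta = 0.4$ used to generate data, and the use of SciPy for the normal tail probability are assumptions for the example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Model: X ~ N(theta; 1) with variance known; test theta in Theta_0 = {0}.
# Sample size and the true theta = 0.4 used to generate data are assumptions.
n = 30
x = rng.normal(loc=0.4, scale=1.0, size=n)

z = x.mean() / (1.0 / np.sqrt(n))    # standardized test statistic under H0
p_value = 2 * stats.norm.sf(abs(z))  # two-sided p-value

print(z, p_value)  # a small p-value is evidence against Theta_0
```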
1.3 Random Sample

A very common situation is the following: the statistician has available a number of outcomes
\[
x_1, x_2, \ldots, x_n
\]
which are the observed values of random variables $X_1, X_2, \ldots, X_n$ that are i.i.d. (independent and identically distributed) with the same distribution as the population random variable $X$:
\[
X_1, X_2, \ldots, X_n \overset{\text{i.i.d.}}{\sim} X
\]
Such a collection $X_1, X_2, \ldots, X_n$ is called a random sample.
1.4 Statistics
Mostly the statistician does not use the observations $x_1, x_2, \ldots, x_n$ as such, but tries to condense them into some known function (not depending on any unknown parameters) such as
\[
t(x_1, x_2, \ldots, x_n)
\]
If the function $t$ is such that $t(X_1, X_2, \ldots, X_n)$ is a random variable, then
\[
T_n = t(X_1, X_2, \ldots, X_n)
\]
is called a statistic.
More generally, $T_n = (T_{n1}, \ldots, T_{nk})$ is called a $k$-dimensional statistic where, for $j = 1, \ldots, k$:
\[
T_{nj} = t_j(X_1, \ldots, X_n)
\]
is a (1-dimensional) statistic.
Two important examples are the sample mean and the sample variance:
\[
\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i
\]
\[
S^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2
\]
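As a minimal sketch (not from the original notes), the two statistics can be computed as follows; the sample is illustrative, and note that these notes define $S^2$ with divisor $n$, which corresponds to `ddof=0` in NumPy.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(loc=0.0, scale=1.0, size=25)  # an illustrative sample

x_bar = x.mean()                # sample mean: (1/n) * sum of the x_i
s2 = np.mean((x - x_bar) ** 2)  # sample variance with divisor n, as in these notes
# Equivalently x.var(ddof=0); NumPy's default ddof=0 matches the 1/n convention.

print(x_bar, s2)
```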
It will be important to calculate, for a given statistic $T_n$, characteristics such as $E(T_n)$, $\operatorname{Var}(T_n)$, ..., or the distribution function $P(T_n \le x)$, ... In the next section, we consider the distribution theory for $\bar{X}$ and $S^2$ in the important case where the sample comes from a normally distributed population random variable $X$.
1.5 Distribution Theory for Samples from a Normal Population

In this section we give the distribution theory for the two important statistics $\bar{X}$ and $S^2$ in sampling from a normal population, i.e.
\[
X_1, \ldots, X_n \overset{\text{i.i.d.}}{\sim} X \sim N(\mu; \sigma^2)
\]
The reason is that these results play a crucial role in the whole theory of statistics: many populations are normal or can be well approximated by a normal distribution. We restrict attention to the two statistics $\bar{X}$ and $S^2$. These will turn out to be the only two of interest in sampling from a normal population ($\bar{X}$ and $S^2$ are sufficient statistics).
Theorem 1
If $X_1, \ldots, X_n \overset{\text{i.i.d.}}{\sim} X \sim N(\mu; \sigma^2)$, then $\bar{X}$ and $S^2$ are independent.
Proof
We prefer to give a proof only in the very special case $n = 2$. In this case
\[
\bar{X} = \frac{1}{2}(X_1 + X_2)
\]
and
\[
S^2 = \frac{1}{2}\left[(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2\right] = \frac{1}{4}(X_1 - X_2)^2 .
\]
Set $Y_1 = (X_1 - \mu) + (X_2 - \mu)$ and $Y_2 = X_1 - X_2$, so that $\bar{X} = \mu + Y_1/2$ is a function of $Y_1$ only and $S^2 = Y_2^2/4$ is a function of $Y_2$ only. A direct computation gives the joint characteristic function
\[
\varphi_{(Y_1, Y_2)}(t_1, t_2) = E\left[e^{i(t_1 Y_1 + t_2 Y_2)}\right] = e^{-\sigma^2 (t_1^2 + t_2^2)} .
\]
From this joint characteristic function we see that $(Y_1, Y_2)$ is 2-variate normal with mean vector $(0, 0)$ and variance-covariance matrix
\[
\begin{pmatrix} 2\sigma^2 & 0 \\ 0 & 2\sigma^2 \end{pmatrix} .
\]
Hence $\operatorname{Cov}(Y_1, Y_2) = 0$ and (because of normality) this is equivalent to independence of $Y_1$ and $Y_2$, and therefore of $\bar{X}$ and $S^2$.
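Theorem 1 can be illustrated numerically. The following sketch (not part of the original notes) estimates the correlation between $\bar{X}$ and $S^2$ over many simulated normal samples, and contrasts it with skewed data, for which the independence fails; correlation near zero is only a necessary consequence of independence, not a proof of it. All parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps = 10, 100_000
mu, sigma = 0.0, 1.0  # illustrative parameter values

# Many samples of size n from N(mu; sigma^2); record (X_bar, S^2) per sample.
normal = rng.normal(mu, sigma, size=(reps, n))
print(np.corrcoef(normal.mean(axis=1), normal.var(axis=1))[0, 1])  # ~ 0

# For contrast, with skewed (exponential) data X_bar and S^2 are dependent,
# and their estimated correlation is clearly positive.
expo = rng.exponential(scale=1.0, size=(reps, n))
print(np.corrcoef(expo.mean(axis=1), expo.var(axis=1))[0, 1])
```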
Theorem 2
If $X \sim N(\mu; \sigma^2)$, then
(a) $\bar{X} \sim N\left(\mu; \dfrac{\sigma^2}{n}\right)$
(b) $\dfrac{nS^2}{\sigma^2} \sim \chi^2(n-1)$
Proof
(a) Since $\bar{X} = \dfrac{1}{n}\sum_{i=1}^{n} X_i$ is a linear combination of $X_1, \ldots, X_n$, we have
\[
\bar{X} \sim N\left(\sum_{i=1}^{n} \frac{\mu}{n} \; ; \; \sum_{i=1}^{n} \frac{\sigma^2}{n^2}\right) = N\left(\mu; \frac{\sigma^2}{n}\right)
\]
(b)
\[
nS^2 = \sum_{i=1}^{n} (X_i - \bar{X})^2 = \sum_{i=1}^{n} (X_i - \mu + \mu - \bar{X})^2
\]
\[
= \sum_{i=1}^{n} (X_i - \mu)^2 - 2(\bar{X} - \mu) \sum_{i=1}^{n} (X_i - \mu) + n(\bar{X} - \mu)^2
\]
\[
= \sum_{i=1}^{n} (X_i - \mu)^2 - n(\bar{X} - \mu)^2 ,
\]
since $\sum_{i=1}^{n} (X_i - \mu) = n(\bar{X} - \mu)$.
Dividing by $\sigma^2$:
\[
\frac{nS^2}{\sigma^2} = \sum_{i=1}^{n} \left(\frac{X_i - \mu}{\sigma}\right)^2 - \left(\frac{\bar{X} - \mu}{\sqrt{\sigma^2/n}}\right)^2
\]
or
\[
\underbrace{\frac{nS^2}{\sigma^2}}_{U} + \underbrace{\left(\frac{\bar{X} - \mu}{\sqrt{\sigma^2/n}}\right)^2}_{V} = \underbrace{\sum_{i=1}^{n} \left(\frac{X_i - \mu}{\sigma}\right)^2}_{W}
\]
For the characteristic functions we have
\[
\varphi_W(t) = \varphi_{U+V}(t) = \varphi_U(t) \cdot \varphi_V(t), \quad \text{since } U \text{ and } V \text{ are independent (Th. 1).}
\]
Hence:
\[
\varphi_U(t) = \frac{\varphi_W(t)}{\varphi_V(t)} = \frac{(1 - 2it)^{-n/2}}{(1 - 2it)^{-1/2}}, \quad \text{since } W \sim \chi^2(n) \text{ and } V \sim \chi^2(1),
\]
\[
= (1 - 2it)^{-\frac{n-1}{2}} .
\]
By the uniqueness theorem, $U = \dfrac{nS^2}{\sigma^2} \sim \chi^2(n-1)$.
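Both parts of Theorem 2 are easy to check by simulation. The following sketch (not from the original notes) compares the empirical moments of $\bar{X}$ and of $nS^2/\sigma^2$ with their theoretical values; recall that $\chi^2(k)$ has mean $k$ and variance $2k$. The parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps = 8, 200_000
mu, sigma = 2.0, 1.5  # illustrative parameter values

samples = rng.normal(mu, sigma, size=(reps, n))
x_bar = samples.mean(axis=1)
u = n * samples.var(axis=1) / sigma**2  # nS^2/sigma^2, S^2 with divisor n

# (a) X_bar should have mean mu and variance sigma^2 / n.
print(x_bar.mean(), x_bar.var(), sigma**2 / n)

# (b) nS^2/sigma^2 should match chi^2(n-1): mean n-1, variance 2(n-1).
print(u.mean(), u.var(), n - 1, 2 * (n - 1))
```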
Theorem 3
If $X \sim N(\mu; \sigma^2)$, then
\[
\frac{\bar{X} - \mu}{\sqrt{\dfrac{S^2}{n-1}}} \sim t(n-1)
\]
Proof
\[
\frac{\bar{X} - \mu}{\sqrt{\dfrac{S^2}{n-1}}} = \frac{\dfrac{\bar{X} - \mu}{\sqrt{\sigma^2/n}}}{\sqrt{\dfrac{nS^2/\sigma^2}{n-1}}} \sim t(n-1),
\]
since $\dfrac{\bar{X} - \mu}{\sqrt{\sigma^2/n}} \sim N(0; 1)$, $\dfrac{nS^2}{\sigma^2} \sim \chi^2(n-1)$, and $\dfrac{\bar{X} - \mu}{\sqrt{\sigma^2/n}}$ and $\dfrac{nS^2}{\sigma^2}$ are independent (Th. 1).
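Theorem 3 can also be checked by simulation. The sketch below (not from the original notes) compares the empirical mean and variance of the studentized statistic with those of $t(n-1)$; recall that $t(k)$ has mean $0$ and variance $k/(k-2)$ for $k > 2$. The parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)
n, reps = 6, 200_000
mu, sigma = 0.0, 3.0  # illustrative parameter values

samples = rng.normal(mu, sigma, size=(reps, n))
x_bar = samples.mean(axis=1)
s2 = samples.var(axis=1)  # divisor n, matching the notes' definition of S^2

# Theorem 3: (X_bar - mu) / sqrt(S^2 / (n-1)) ~ t(n-1).
t_stat = (x_bar - mu) / np.sqrt(s2 / (n - 1))

# t(k) has mean 0 and variance k/(k-2) for k > 2; here k = n-1 = 5.
k = n - 1
print(t_stat.mean(), t_stat.var(), 0.0, k / (k - 2))
```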