
Chapter6 Sampling Regression Method Estimation PDF

The document discusses the regression method of estimation. It begins by explaining that regression estimation is more appropriate than ratio estimation when the regression of the study variable Y on the auxiliary variable X is linear but does not necessarily pass through the origin. It then defines the regression estimator as Yˆreg=y+β(X-x), where β is the regression coefficient of Y on X. It shows that this estimator is unbiased when β is known and its variance is always less than or equal to the variance of simple random sampling, making it more efficient. Finally, it considers the case when β is estimated from the sample. It approximates the expressions for the mean and variance of the resulting regression estimator Yˆ

Chapter 6

Regression Method of Estimation


The ratio method of estimation uses auxiliary information that is correlated with the study
variable to improve precision. The resulting estimators improve on the sample mean when the
regression of Y on X is linear and passes through the origin. A linear regression of Y on X need
not pass through the origin, however, and in that case it is more appropriate to use a
regression-type estimator to estimate the population mean.

In the ratio method, the conventional estimator, the sample mean $\bar{y}$, was improved by multiplying it by the factor $\bar{X}/\bar{x}$, where $\bar{x}$ is an unbiased estimator of the population mean $\bar{X}$ of the auxiliary variable. Now we consider another idea, based on a difference.

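Consider an estimator $\bar{x}$ of $\bar{X}$ for which $E(\bar{x} - \bar{X}) = 0$.

Consider an improved estimator of $\bar{Y}$ as

\[ \hat{\bar{Y}}^{*} = \bar{y} + \mu(\bar{x} - \bar{X}) \]

which is an unbiased estimator of $\bar{Y}$, where $\mu$ is any constant. Now find $\mu$ such that $\mathrm{Var}(\hat{\bar{Y}}^{*})$ is minimum:

\[ \mathrm{Var}(\hat{\bar{Y}}^{*}) = \mathrm{Var}(\bar{y}) + \mu^{2}\,\mathrm{Var}(\bar{x}) + 2\mu\,\mathrm{Cov}(\bar{x}, \bar{y}) \]

\[ \frac{\partial\, \mathrm{Var}(\hat{\bar{Y}}^{*})}{\partial \mu} = 0 \]

\[ \Rightarrow \mu = -\frac{\mathrm{Cov}(\bar{x}, \bar{y})}{\mathrm{Var}(\bar{x})} = -\frac{\frac{N-n}{Nn}\, S_{XY}}{\frac{N-n}{Nn}\, S_X^2} = -\frac{S_{XY}}{S_X^2} \]

where

\[ S_{XY} = \frac{1}{N-1}\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y}), \qquad S_X^2 = \frac{1}{N-1}\sum_{i=1}^{N} (X_i - \bar{X})^2. \]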

Consider a linear regression model $y = x\beta + e$, where $y$ is the dependent variable, $x$ is the independent
variable and $e$ is the random error component, which accounts for the lack of an exact
relationship between $x$ and $y$.

Sampling Theory| Chapter 6 | Regression Method of Estimation | Shalabh, IIT Kanpur Page 1
Note that the value of the regression coefficient $\beta$ in a linear regression model $y = x\beta + e$ of $y$ on $x$, obtained by minimizing $\sum_{i=1}^{n} e_i^2$ based on $n$ data sets $(x_i, y_i),\ i = 1, 2, \ldots, n$, is

\[ \beta = \frac{\mathrm{Cov}(x, y)}{\mathrm{Var}(x)} = \frac{S_{xy}}{S_x^2}. \]

Thus the optimum value of $\mu$ is the same as the regression coefficient of $y$ on $x$ with a negative sign, i.e., $\mu = -\beta$.

So the estimator $\hat{\bar{Y}}^{*}$ with the optimum value of $\mu$ is

\[ \hat{\bar{Y}}_{reg} = \bar{y} + \beta(\bar{X} - \bar{x}) \]

which is the regression estimator of $\bar{Y}$; the procedure of estimation is called the regression
method of estimation.

The variance of $\hat{\bar{Y}}_{reg}$ is

\[ \mathrm{Var}(\hat{\bar{Y}}_{reg}) = V(\bar{y})\left[1 - \rho^{2}(\bar{x}, \bar{y})\right] \]

where $\rho(\bar{x}, \bar{y})$ is the correlation coefficient between $\bar{x}$ and $\bar{y}$. So $\hat{\bar{Y}}_{reg}$ is efficient when $x$ and $y$ are highly correlated. The estimator $\hat{\bar{Y}}_{reg}$ is more efficient than the sample mean $\bar{y}$ if $\rho(\bar{x}, \bar{y}) \neq 0$, which generally holds.
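
As a small illustration of the regression estimator $\bar{y} + \beta(\bar{X} - \bar{x})$, the sketch below evaluates it on a made-up sample in which $y = 1 + 2x$ holds exactly, so the regression line does not pass through the origin. The function name, the data and the assumed known values of $\beta$ and $\bar{X}$ are hypothetical and chosen only for illustration; the ratio estimate $\bar{y}\,\bar{X}/\bar{x}$ is computed alongside for contrast.

```python
from statistics import mean

def regression_estimate(xs, ys, pop_mean_x, beta):
    """Regression estimate of the population mean of Y: y_bar + beta * (X_bar - x_bar)."""
    x_bar = mean(xs)
    y_bar = mean(ys)
    return y_bar + beta * (pop_mean_x - x_bar)

# Hypothetical sample of n = 4 units; here y = 1 + 2x exactly.
sample_x = [2.0, 4.0, 6.0, 8.0]
sample_y = [5.0, 9.0, 13.0, 17.0]

# Assumed known: population mean of x is 6, so the true mean of y is 1 + 2 * 6 = 13.
estimate = regression_estimate(sample_x, sample_y, pop_mean_x=6.0, beta=2.0)
ratio_estimate = mean(sample_y) * 6.0 / mean(sample_x)

print(estimate)        # 13.0
print(ratio_estimate)  # close to 13.2, which misses the true mean 13
```

In this constructed case the regression estimate recovers the population mean exactly, while the ratio estimate does not, because the line has a nonzero intercept.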

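Regression estimates with preassigned $\beta$:

If the value of $\beta$ is known as $\beta_0$ (say), then the regression estimator is

\[ \hat{\bar{Y}}_{reg} = \bar{y} + \beta_0(\bar{X} - \bar{x}). \]

Bias of $\hat{\bar{Y}}_{reg}$:
Now, assuming that the random sample $(x_i, y_i),\ i = 1, 2, \ldots, n$ is drawn by SRSWOR,

\[
\begin{aligned}
E(\hat{\bar{Y}}_{reg}) &= E(\bar{y}) + \beta_0\left[\bar{X} - E(\bar{x})\right] \\
&= \bar{Y} + \beta_0\left[\bar{X} - \bar{X}\right] \\
&= \bar{Y}.
\end{aligned}
\]

Thus $\hat{\bar{Y}}_{reg}$ is an unbiased estimator of $\bar{Y}$ when $\beta$ is known.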
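
The unbiasedness under SRSWOR can be checked by brute force on a tiny population: every one of the $\binom{N}{n}$ samples is equally likely, so the expectation is just the average of the estimator over all samples. The following is a minimal sketch; the population values and the preassigned $\beta_0 = 2$ are hypothetical.

```python
from itertools import combinations
from statistics import mean

# Hypothetical population of N = 5 units; values are for illustration only.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [3.0, 4.0, 8.0, 9.0, 11.0]
N, n, beta0 = len(X), 3, 2.0
X_bar, Y_bar = mean(X), mean(Y)

def reg_estimate(idx):
    """y_bar + beta0 * (X_bar - x_bar) for the sample with unit indices idx."""
    x_bar = mean(X[i] for i in idx)
    y_bar = mean(Y[i] for i in idx)
    return y_bar + beta0 * (X_bar - x_bar)

# Under SRSWOR all C(N, n) samples are equally likely,
# so the expectation is the plain average over all of them.
estimates = [reg_estimate(idx) for idx in combinations(range(N), n)]
expected_value = mean(estimates)

print(expected_value, Y_bar)
```

The two printed values agree (up to floating-point rounding) for any fixed `beta0`, not only for the optimum one.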

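Variance of $\hat{\bar{Y}}_{reg}$

\[
\begin{aligned}
\mathrm{Var}(\hat{\bar{Y}}_{reg}) &= E\left[\hat{\bar{Y}}_{reg} - E(\hat{\bar{Y}}_{reg})\right]^2 \\
&= E\left[\bar{y} + \beta_0(\bar{X} - \bar{x}) - \bar{Y}\right]^2 \\
&= E\left[(\bar{y} - \bar{Y}) - \beta_0(\bar{x} - \bar{X})\right]^2 \\
&= E(\bar{y} - \bar{Y})^2 + \beta_0^2 E(\bar{x} - \bar{X})^2 - 2\beta_0 E\left[(\bar{x} - \bar{X})(\bar{y} - \bar{Y})\right] \\
&= \mathrm{Var}(\bar{y}) + \beta_0^2\,\mathrm{Var}(\bar{x}) - 2\beta_0\,\mathrm{Cov}(\bar{x}, \bar{y}) \\
&= \frac{f}{n}\left[S_Y^2 + \beta_0^2 S_X^2 - 2\beta_0 S_{XY}\right] \\
&= \frac{f}{n}\left[S_Y^2 + \beta_0^2 S_X^2 - 2\beta_0 \rho S_X S_Y\right]
\end{aligned}
\]

where

\[ f = \frac{N-n}{N}, \qquad S_X^2 = \frac{1}{N-1}\sum_{i=1}^{N}(X_i - \bar{X})^2, \qquad S_Y^2 = \frac{1}{N-1}\sum_{i=1}^{N}(Y_i - \bar{Y})^2, \]

and $\rho$ is the correlation coefficient between $X$ and $Y$.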

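Comparing $\mathrm{Var}(\hat{\bar{Y}}_{reg})$ with $\mathrm{Var}(\bar{y})$, we note that

\[ \mathrm{Var}(\hat{\bar{Y}}_{reg}) < \mathrm{Var}(\bar{y}) \]

if

\[ \beta_0^2 S_X^2 - 2\beta_0 S_{XY} < 0 \]

or

\[ \beta_0 S_X^2\left(\beta_0 - \frac{2S_{XY}}{S_X^2}\right) < 0 \]

which is possible when

either $\beta_0 < 0$ and $\left(\beta_0 - \dfrac{2S_{XY}}{S_X^2}\right) > 0 \;\Rightarrow\; \dfrac{2S_{XY}}{S_X^2} < \beta_0 < 0$,

or $\beta_0 > 0$ and $\left(\beta_0 - \dfrac{2S_{XY}}{S_X^2}\right) < 0 \;\Rightarrow\; 0 < \beta_0 < \dfrac{2S_{XY}}{S_X^2}$.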

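Optimal value of $\beta$

Choose $\beta$ such that $\mathrm{Var}(\hat{\bar{Y}}_{reg})$ is minimum.

So

\[ \frac{\partial\, \mathrm{Var}(\hat{\bar{Y}}_{reg})}{\partial \beta} = \frac{f}{n}\,\frac{\partial}{\partial \beta}\left[S_Y^2 + \beta^2 S_X^2 - 2\beta\rho S_X S_Y\right] = 0 \]

\[ \Rightarrow \beta = \rho\frac{S_Y}{S_X} = \frac{S_{XY}}{S_X^2}. \]

The minimum value of the variance of $\hat{\bar{Y}}_{reg}$, attained at the optimum value $\beta_{opt} = \rho S_Y / S_X$, is

\[
\begin{aligned}
\mathrm{Var}_{min}(\hat{\bar{Y}}_{reg}) &= \frac{f}{n}\left[S_Y^2 + \rho^2\frac{S_Y^2}{S_X^2}S_X^2 - 2\rho\frac{S_Y}{S_X}\,\rho S_X S_Y\right] \\
&= \frac{f}{n} S_Y^2 (1 - \rho^2).
\end{aligned}
\]

Since $-1 \le \rho \le 1$,

\[ \mathrm{Var}_{min}(\hat{\bar{Y}}_{reg}) \le \mathrm{Var}_{SRS}(\bar{y}) \]

always holds. So with $\beta = \beta_{opt}$ the regression estimator is never less efficient than the sample mean under
SRSWOR, and it is strictly more efficient whenever $\rho \neq 0$.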
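
The variance expression and its minimum can be checked numerically. The sketch below, on a hypothetical population of five units, compares the closed form $\frac{f}{n}(S_Y^2 + \beta_0^2 S_X^2 - 2\beta_0 S_{XY})$ with the variance obtained by enumerating all SRSWOR samples, and then evaluates $\beta_{opt}$ and $\frac{f}{n}S_Y^2(1-\rho^2)$. All names and data are illustrative assumptions.

```python
from itertools import combinations
from statistics import mean

# Hypothetical population; values are for illustration only.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [3.0, 4.0, 8.0, 9.0, 11.0]
N, n = len(X), 3
f = (N - n) / N
X_bar, Y_bar = mean(X), mean(Y)
S_X2 = sum((x - X_bar) ** 2 for x in X) / (N - 1)
S_Y2 = sum((y - Y_bar) ** 2 for y in Y) / (N - 1)
S_XY = sum((x - X_bar) * (y - Y_bar) for x, y in zip(X, Y)) / (N - 1)

def var_formula(beta0):
    """Closed form (f/n) * (S_Y^2 + beta0^2 S_X^2 - 2 beta0 S_XY)."""
    return f / n * (S_Y2 + beta0 ** 2 * S_X2 - 2 * beta0 * S_XY)

def var_exact(beta0):
    """Variance of the estimator over all equally likely SRSWOR samples."""
    ests = [mean(Y[i] for i in s) + beta0 * (X_bar - mean(X[i] for i in s))
            for s in combinations(range(N), n)]
    return mean((e - Y_bar) ** 2 for e in ests)

beta_opt = S_XY / S_X2                      # equals rho * S_Y / S_X
rho2 = S_XY ** 2 / (S_X2 * S_Y2)            # squared correlation
var_min = f / n * S_Y2 * (1 - rho2)
var_srs = f / n * S_Y2                      # variance of the sample mean

print(var_exact(2.0), var_formula(2.0))
print(beta_opt, var_min, var_srs)
```

Moving `beta0` away from `beta_opt` in either direction increases `var_formula(beta0)`, which is the behaviour quantified in the next subsection.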

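Departure from $\beta$:
If $\beta_0$ is the preassigned value of the regression coefficient, then

\[
\begin{aligned}
\mathrm{Var}(\hat{\bar{Y}}_{reg}) &= \frac{f}{n}\left[S_Y^2 + \beta_0^2 S_X^2 - 2\beta_0\rho S_X S_Y\right] \\
&= \frac{f}{n}\left[S_Y^2 + \beta_0^2 S_X^2 - 2\beta_0\rho S_X S_Y - \rho^2 S_Y^2 + \rho^2 S_Y^2\right] \\
&= \frac{f}{n}\left[(1 - \rho^2) S_Y^2 + \beta_0^2 S_X^2 - 2\beta_0 S_X^2\beta_{opt} + \beta_{opt}^2 S_X^2\right] \\
&= \frac{f}{n}\left[(1 - \rho^2) S_Y^2 + (\beta_0 - \beta_{opt})^2 S_X^2\right]
\end{aligned}
\]

where $\beta_{opt} = \dfrac{\rho S_Y}{S_X}$, so that $\rho S_X S_Y = \beta_{opt} S_X^2$ and $\rho^2 S_Y^2 = \beta_{opt}^2 S_X^2$.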
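
The decomposition can be verified numerically. This sketch evaluates both forms of the variance for several preassigned values $\beta_0$; the population moments passed in are hypothetical numbers, and the function names are not from the source.

```python
def var_reg(beta0, S_X2, S_Y2, S_XY, N, n):
    """(f/n) * (S_Y^2 + beta0^2 S_X^2 - 2 beta0 S_XY)."""
    f = (N - n) / N
    return f / n * (S_Y2 + beta0 ** 2 * S_X2 - 2 * beta0 * S_XY)

def var_decomposed(beta0, S_X2, S_Y2, S_XY, N, n):
    """(f/n) * ((1 - rho^2) S_Y^2 + (beta0 - beta_opt)^2 S_X^2)."""
    f = (N - n) / N
    beta_opt = S_XY / S_X2
    rho2 = S_XY ** 2 / (S_X2 * S_Y2)
    return f / n * ((1 - rho2) * S_Y2 + (beta0 - beta_opt) ** 2 * S_X2)

# Hypothetical population moments (beta_opt = 5.25 / 2.5 = 2.1).
moments = dict(S_X2=2.5, S_Y2=11.5, S_XY=5.25, N=5, n=3)
for beta0 in (1.5, 2.0, 2.1, 3.0):
    print(beta0, var_reg(beta0, **moments), var_decomposed(beta0, **moments))
```

The second term of the decomposition is the penalty paid for using $\beta_0$ instead of $\beta_{opt}$; it vanishes at $\beta_0 = 2.1$ in this example.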

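Estimate of variance
An unbiased sample estimate of $\mathrm{Var}(\hat{\bar{Y}}_{reg})$ is

\[
\begin{aligned}
\widehat{\mathrm{Var}}(\hat{\bar{Y}}_{reg}) &= \frac{f}{n(n-1)}\sum_{i=1}^{n}\left[(y_i - \bar{y}) - \beta_0(x_i - \bar{x})\right]^2 \\
&= \frac{f}{n}\left(s_y^2 + \beta_0^2 s_x^2 - 2\beta_0 s_{xy}\right).
\end{aligned}
\]

Note that the variance of $\hat{\bar{Y}}_{reg}$ increases as the difference between $\beta_0$ and $\beta_{opt}$ increases.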
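
The two equivalent forms of the variance estimate can be computed from a single sample as sketched below. The sample, the population size $N = 20$ and the value $\beta_0 = 1.5$ are hypothetical; only the Python standard library is used.

```python
from statistics import mean, variance

def estimated_variance(xs, ys, beta0, N):
    """(f / (n (n - 1))) * sum of squared residuals (y_i - y_bar) - beta0 (x_i - x_bar)."""
    n = len(xs)
    f = (N - n) / N
    x_bar, y_bar = mean(xs), mean(ys)
    rss = sum(((y - y_bar) - beta0 * (x - x_bar)) ** 2 for x, y in zip(xs, ys))
    return f * rss / (n * (n - 1))

def estimated_variance_moments(xs, ys, beta0, N):
    """Equivalent form (f/n) * (s_y^2 + beta0^2 s_x^2 - 2 beta0 s_xy)."""
    n = len(xs)
    f = (N - n) / N
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    # statistics.variance uses the (n - 1) divisor, matching s_x^2 and s_y^2.
    return f / n * (variance(ys) + beta0 ** 2 * variance(xs) - 2 * beta0 * s_xy)

# Hypothetical sample of n = 4 units from a population of N = 20.
xs = [2.0, 3.0, 5.0, 6.0]
ys = [5.0, 8.0, 11.0, 12.0]
v1 = estimated_variance(xs, ys, beta0=1.5, N=20)
v2 = estimated_variance_moments(xs, ys, beta0=1.5, N=20)
print(v1, v2)
```

For these numbers the residual sum of squares is 1.5 and $f = 0.8$, so both forms give a value of about 0.1.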

Regression estimates when $\beta$ is computed from the sample


Suppose a random sample of size $n$ on paired observations $(x_i, y_i),\ i = 1, 2, \ldots, n$ is drawn by SRSWOR.

When $\beta$ is unknown, it is estimated as

\[ \hat{\beta} = \frac{s_{xy}}{s_x^2} \]

and then the regression estimator of $\bar{Y}$ is given by

\[ \hat{\bar{Y}}_{reg} = \bar{y} + \hat{\beta}(\bar{X} - \bar{x}). \]

It is difficult to find the exact expressions of $E(\hat{\bar{Y}}_{reg})$ and $\mathrm{Var}(\hat{\bar{Y}}_{reg})$. So we approximate them using the
same methodology as in the case of the ratio method of estimation.
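
A minimal sketch of the estimator with $\hat{\beta} = s_{xy}/s_x^2$ computed from the sample follows. The function name, the sample and the assumed known population mean $\bar{X} = 4.5$ are hypothetical.

```python
from statistics import mean

def regression_estimate(xs, ys, pop_mean_x):
    """Return (estimate, beta_hat), where beta_hat = s_xy / s_x^2 from the sample."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    s_x2 = sum((x - x_bar) ** 2 for x in xs) / (n - 1)
    beta_hat = s_xy / s_x2
    return y_bar + beta_hat * (pop_mean_x - x_bar), beta_hat

# Hypothetical sample; the population mean of x is assumed known.
xs = [2.0, 3.0, 5.0, 6.0]
ys = [5.0, 8.0, 11.0, 12.0]
estimate, beta_hat = regression_estimate(xs, ys, pop_mean_x=4.5)
print(beta_hat, estimate)
```

Here $\bar{x} = 4$, $\bar{y} = 9$, $s_{xy} = 17/3$ and $s_x^2 = 10/3$, so $\hat{\beta} = 1.7$ and the estimate is $9 + 1.7 \times 0.5 = 9.85$ (up to floating-point rounding).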


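Let

\[
\begin{aligned}
\epsilon_0 &= \frac{\bar{y} - \bar{Y}}{\bar{Y}} &&\Rightarrow\; \bar{y} = \bar{Y}(1 + \epsilon_0) \\
\epsilon_1 &= \frac{\bar{x} - \bar{X}}{\bar{X}} &&\Rightarrow\; \bar{x} = \bar{X}(1 + \epsilon_1) \\
\epsilon_2 &= \frac{s_{xy} - S_{XY}}{S_{XY}} &&\Rightarrow\; s_{xy} = S_{XY}(1 + \epsilon_2) \\
\epsilon_3 &= \frac{s_x^2 - S_X^2}{S_X^2} &&\Rightarrow\; s_x^2 = S_X^2(1 + \epsilon_3)
\end{aligned}
\]

Then

\[ E(\epsilon_0) = 0, \quad E(\epsilon_1) = 0, \quad E(\epsilon_2) = 0, \quad E(\epsilon_3) = 0, \]

\[ E(\epsilon_0^2) = \frac{f}{n} C_Y^2, \qquad E(\epsilon_1^2) = \frac{f}{n} C_X^2, \qquad E(\epsilon_0\epsilon_1) = \frac{f}{n}\rho\, C_X C_Y \]

where $C_X = S_X/\bar{X}$ and $C_Y = S_Y/\bar{Y}$ are the coefficients of variation of $X$ and $Y$,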

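and

\[
\begin{aligned}
\hat{\bar{Y}}_{reg} &= \bar{y} + \frac{s_{xy}}{s_x^2}(\bar{X} - \bar{x}) \\
&= \bar{Y}(1 + \epsilon_0) + \frac{S_{XY}(1 + \epsilon_2)}{S_X^2(1 + \epsilon_3)}(-\epsilon_1\bar{X}).
\end{aligned}
\]

The estimation error of $\hat{\bar{Y}}_{reg}$ is

\[ (\hat{\bar{Y}}_{reg} - \bar{Y}) = \bar{Y}\epsilon_0 - \beta\bar{X}\epsilon_1(1 + \epsilon_2)(1 + \epsilon_3)^{-1} \]

where $\beta = \dfrac{S_{XY}}{S_X^2}$ is the population regression coefficient.

Assuming $|\epsilon_3| < 1$,

\[ (\hat{\bar{Y}}_{reg} - \bar{Y}) = \bar{Y}\epsilon_0 - \beta\bar{X}(\epsilon_1 + \epsilon_1\epsilon_2)(1 - \epsilon_3 + \epsilon_3^2 - \ldots) \]

Retaining the terms up to the second power of the $\epsilon$'s and ignoring other terms, we have

\[
\begin{aligned}
(\hat{\bar{Y}}_{reg} - \bar{Y}) &\approx \bar{Y}\epsilon_0 - \beta\bar{X}(\epsilon_1 + \epsilon_1\epsilon_2)(1 - \epsilon_3 + \epsilon_3^2) \\
&\approx \bar{Y}\epsilon_0 - \beta\bar{X}(\epsilon_1 - \epsilon_1\epsilon_3 + \epsilon_1\epsilon_2).
\end{aligned}
\]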

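Bias of $\hat{\bar{Y}}_{reg}$

Now the bias of $\hat{\bar{Y}}_{reg}$ up to the second order of approximation is

\[
\begin{aligned}
E(\hat{\bar{Y}}_{reg} - \bar{Y}) &\approx E\left[\bar{Y}\epsilon_0 - \beta\bar{X}(\epsilon_1 + \epsilon_1\epsilon_2)(1 - \epsilon_3 + \epsilon_3^2)\right] \\
&= -\frac{\beta\bar{X} f}{n}\left[\frac{\mu_{21}}{\bar{X} S_{XY}} - \frac{\mu_{30}}{\bar{X} S_X^2}\right]
\end{aligned}
\]

where $f = \dfrac{N-n}{N}$ and the $(r, s)$th cross-product moment is given by

\[ \mu_{rs} = E\left[(x - \bar{X})^r (y - \bar{Y})^s\right] \]

so that

\[ \mu_{21} = E\left[(x - \bar{X})^2 (y - \bar{Y})\right], \qquad \mu_{30} = E\left[(x - \bar{X})^3\right]. \]

Thus

\[ \mathrm{Bias}(\hat{\bar{Y}}_{reg}) = E(\hat{\bar{Y}}_{reg}) - \bar{Y} \approx -\frac{\beta f}{n}\left[\frac{\mu_{21}}{S_{XY}} - \frac{\mu_{30}}{S_X^2}\right]. \]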

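Also,

\[
\begin{aligned}
E(\hat{\bar{Y}}_{reg}) &= E(\bar{y}) + E[\hat{\beta}(\bar{X} - \bar{x})] \\
&= \bar{Y} + \bar{X} E(\hat{\beta}) - E(\hat{\beta}\bar{x}) \\
&= \bar{Y} + E(\bar{x}) E(\hat{\beta}) - E(\hat{\beta}\bar{x}) \\
&= \bar{Y} - \mathrm{Cov}(\hat{\beta}, \bar{x})
\end{aligned}
\]

\[ \mathrm{Bias}(\hat{\bar{Y}}_{reg}) = E(\hat{\bar{Y}}_{reg}) - \bar{Y} = -\mathrm{Cov}(\hat{\beta}, \bar{x}). \]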
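
The identity $\mathrm{Bias}(\hat{\bar{Y}}_{reg}) = -\mathrm{Cov}(\hat{\beta}, \bar{x})$ is exact, so it can be confirmed by enumerating every SRSWOR sample of a small population. The sketch below does this for a hypothetical population of six units with $n = 3$; all names and values are illustrative assumptions.

```python
from itertools import combinations
from statistics import mean

# Hypothetical population with distinct x values, so s_x^2 > 0 in every sample.
X = [1.0, 2.0, 4.0, 7.0, 8.0, 11.0]
Y = [2.0, 3.0, 9.0, 10.0, 16.0, 17.0]
N, n = len(X), 3
X_bar, Y_bar = mean(X), mean(Y)

def sample_stats(idx):
    """Return (x_bar, beta_hat, regression estimate) for one sample."""
    xs = [X[i] for i in idx]
    ys = [Y[i] for i in idx]
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    s_x2 = sum((x - x_bar) ** 2 for x in xs) / (n - 1)
    beta_hat = s_xy / s_x2
    return x_bar, beta_hat, y_bar + beta_hat * (X_bar - x_bar)

stats = [sample_stats(idx) for idx in combinations(range(N), n)]
x_bars = [s[0] for s in stats]
betas = [s[1] for s in stats]
ests = [s[2] for s in stats]

# Expectations are plain averages because all samples are equally likely.
bias = mean(ests) - Y_bar
cov_beta_xbar = mean(b * x for b, x in zip(betas, x_bars)) - mean(betas) * mean(x_bars)

print(bias, -cov_beta_xbar)
```

The two printed numbers coincide up to rounding; the bias is generally nonzero, unlike in the case of a preassigned $\beta_0$.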

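MSE of $\hat{\bar{Y}}_{reg}$

To obtain the MSE of $\hat{\bar{Y}}_{reg}$, consider

\[ E(\hat{\bar{Y}}_{reg} - \bar{Y})^2 \approx E\left[\epsilon_0\bar{Y} - \beta\bar{X}(\epsilon_1 - \epsilon_1\epsilon_3 + \epsilon_1\epsilon_2)\right]^2 \]

Retaining the terms in the $\epsilon$'s up to the second power and ignoring the others, we have

\[
\begin{aligned}
E(\hat{\bar{Y}}_{reg} - \bar{Y})^2 &\approx E\left[\epsilon_0^2\bar{Y}^2 + \beta^2\bar{X}^2\epsilon_1^2 - 2\beta\bar{X}\bar{Y}\epsilon_0\epsilon_1\right] \\
&= \bar{Y}^2 E(\epsilon_0^2) + \beta^2\bar{X}^2 E(\epsilon_1^2) - 2\beta\bar{X}\bar{Y} E(\epsilon_0\epsilon_1) \\
&= \frac{f}{n}\left[\bar{Y}^2\frac{S_Y^2}{\bar{Y}^2} + \beta^2\bar{X}^2\frac{S_X^2}{\bar{X}^2} - 2\beta\bar{X}\bar{Y}\rho\frac{S_X S_Y}{\bar{X}\bar{Y}}\right]
\end{aligned}
\]

\[ \mathrm{MSE}(\hat{\bar{Y}}_{reg}) = E(\hat{\bar{Y}}_{reg} - \bar{Y})^2 = \frac{f}{n}\left(S_Y^2 + \beta^2 S_X^2 - 2\beta\rho S_X S_Y\right). \]

Since $\beta = \dfrac{S_{XY}}{S_X^2} = \rho\dfrac{S_Y}{S_X}$, substituting it in $\mathrm{MSE}(\hat{\bar{Y}}_{reg})$, we get

\[ \mathrm{MSE}(\hat{\bar{Y}}_{reg}) = \frac{f}{n} S_Y^2 (1 - \rho^2). \]

So up to the second order of approximation, the regression estimator is better than the conventional
sample mean estimator under SRSWOR. This is because the regression estimator uses some extra
information. Such extra information also comes at some extra cost, so the superiority is, in some
sense, a false one. So the regression estimators and SRS estimates can be combined if the cost
aspect is also taken into consideration.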

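Comparison of $\hat{\bar{Y}}_{reg}$ with ratio estimate and SRS sample mean estimate

\[
\begin{aligned}
\mathrm{MSE}(\hat{\bar{Y}}_{reg}) &= \frac{f}{n} S_Y^2 (1 - \rho^2) \\
\mathrm{MSE}(\hat{\bar{Y}}_{R}) &= \frac{f}{n}\left(S_Y^2 + R^2 S_X^2 - 2\rho R S_X S_Y\right) \\
\mathrm{Var}_{SRS}(\bar{y}) &= \frac{f}{n} S_Y^2.
\end{aligned}
\]

(i) As $\mathrm{MSE}(\hat{\bar{Y}}_{reg}) = \mathrm{Var}_{SRS}(\bar{y})(1 - \rho^2)$ and because $\rho^2 \le 1$, $\hat{\bar{Y}}_{reg}$ is never inferior to $\bar{y}$, and it is superior whenever $\rho \neq 0$.

(ii) $\hat{\bar{Y}}_{reg}$ is better than $\hat{\bar{Y}}_{R}$ if $\mathrm{MSE}(\hat{\bar{Y}}_{reg}) \le \mathrm{MSE}(\hat{\bar{Y}}_{R})$

or if $\dfrac{f}{n} S_Y^2 (1 - \rho^2) \le \dfrac{f}{n}\left(S_Y^2 + R^2 S_X^2 - 2\rho R S_X S_Y\right)$

or if $(R S_X - \rho S_Y)^2 \ge 0$

which always holds true.

So, up to the second order of approximation, the regression estimate is never inferior to the ratio estimate; the two have the same MSE only when $R = \rho S_Y / S_X$, i.e., when the regression line passes through the origin.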
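
The three approximate expressions can be compared numerically. The sketch below evaluates them for hypothetical population moments and also checks that the gap between the ratio and regression MSEs equals $\frac{f}{n}(R S_X - \rho S_Y)^2$; the function name and the input values are assumptions made for illustration.

```python
def mse_terms(S_X2, S_Y2, S_XY, X_bar, Y_bar, N, n):
    """Approximate MSEs of the regression and ratio estimators and Var of the SRS mean."""
    f = (N - n) / N
    S_X, S_Y = S_X2 ** 0.5, S_Y2 ** 0.5
    rho = S_XY / (S_X * S_Y)
    R = Y_bar / X_bar                      # population ratio of means
    mse_reg = f / n * S_Y2 * (1 - rho ** 2)
    mse_ratio = f / n * (S_Y2 + R ** 2 * S_X2 - 2 * rho * R * S_X * S_Y)
    var_srs = f / n * S_Y2
    gap = f / n * (R * S_X - rho * S_Y) ** 2
    return mse_reg, mse_ratio, var_srs, gap

# Hypothetical population moments.
mse_reg, mse_ratio, var_srs, gap = mse_terms(
    S_X2=2.5, S_Y2=11.5, S_XY=5.25, X_bar=3.0, Y_bar=7.0, N=5, n=3
)
print(mse_reg, mse_ratio, var_srs, gap)
```

With these inputs $R = 7/3$ differs from $\beta = 2.1$, so the ratio estimator pays a small penalty relative to the regression estimator, while both improve substantially on the sample mean.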

Regression estimates in stratified sampling


Under the set up of stratified sampling, let the population of $N$ sampling units be divided into $k$
strata. The strata sizes are $N_1, N_2, \ldots, N_k$ such that $\sum_{i=1}^{k} N_i = N$. A sample of size $n_i$ on
$(x_{ij}, y_{ij}),\ j = 1, 2, \ldots, n_i$, is drawn from the $i$th stratum $(i = 1, 2, \ldots, k)$ by SRSWOR, where $x_{ij}$ and $y_{ij}$ denote the
$j$th unit from the $i$th stratum on the auxiliary and study variables, respectively.

In order to estimate the population mean, there are two approaches.

1. Separate regression estimator

- Estimate the regression estimator

\[ \hat{\bar{Y}}_{reg} = \bar{y} + \beta_0(\bar{X} - \bar{x}) \]

from each stratum separately, i.e., the regression estimate in the $i$th stratum is

\[ \hat{\bar{Y}}_{reg(i)} = \bar{y}_i + \beta_i(\bar{X}_i - \bar{x}_i). \]

- Find the stratified mean as the weighted mean of $\hat{\bar{Y}}_{reg(i)},\ i = 1, 2, \ldots, k$, as

\[
\begin{aligned}
\hat{\bar{Y}}_{sreg} &= \sum_{i=1}^{k}\frac{N_i \hat{\bar{Y}}_{reg(i)}}{N} \\
&= \sum_{i=1}^{k}\left[w_i\{\bar{y}_i + \beta_i(\bar{X}_i - \bar{x}_i)\}\right]
\end{aligned}
\]

where $\beta_i = \dfrac{S_{ixy}}{S_{ix}^2}$ and $w_i = \dfrac{N_i}{N}$.

In this approach, the regression estimator is obtained separately in each of the strata and the estimates are then
combined using the philosophy of the stratified sample. So $\hat{\bar{Y}}_{sreg}$ is termed the separate regression
estimator.
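
The two steps of the separate regression estimator can be sketched as follows. The dictionary layout, the stratum sizes, the stratum means $\bar{X}_i$, the slopes $\beta_i$ and the sample values are all hypothetical and serve only to make the arithmetic concrete.

```python
from statistics import mean

def separate_regression_estimate(strata):
    """Sum over strata of w_i * (y_bar_i + beta_i * (X_bar_i - x_bar_i)), w_i = N_i / N."""
    N = sum(s["N"] for s in strata)
    total = 0.0
    for s in strata:
        w = s["N"] / N
        stratum_estimate = mean(s["y"]) + s["beta"] * (s["X_bar"] - mean(s["x"]))
        total += w * stratum_estimate
    return total

# Hypothetical two-stratum example; X_bar and beta are assumed known per stratum.
strata = [
    {"N": 40, "X_bar": 5.0, "beta": 2.0, "x": [4.0, 5.0, 3.0], "y": [9.0, 11.0, 7.0]},
    {"N": 60, "X_bar": 10.0, "beta": 0.5, "x": [9.0, 12.0, 12.0], "y": [6.0, 8.0, 7.0]},
]
estimate = separate_regression_estimate(strata)
print(estimate)
```

In this example the stratum estimates are $9 + 2(5 - 4) = 11$ and $7 + 0.5(10 - 11) = 6.5$, and the weights are 0.4 and 0.6, giving an overall estimate of about 8.3.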

2. Combined regression estimator

Another strategy is to estimate $\bar{x}$ and $\bar{y}$ in $\hat{\bar{Y}}_{reg}$ by the respective stratified means. Replacing $\bar{x}$
by $\bar{x}_{st} = \sum_{i=1}^{k} w_i\bar{x}_i$ and $\bar{y}$ by $\bar{y}_{st} = \sum_{i=1}^{k} w_i\bar{y}_i$, we have

\[ \hat{\bar{Y}}_{creg} = \bar{y}_{st} + \beta(\bar{X} - \bar{x}_{st}). \]

In this case, all the sample information is combined first and then used in the regression
estimator, so $\hat{\bar{Y}}_{creg}$ is termed the combined regression estimator.
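
A minimal sketch of the combined regression estimator is given below: the stratified means $\bar{x}_{st}$ and $\bar{y}_{st}$ are formed first and a single slope is applied. The dictionary layout, the data and the common slope $\beta = 1$ are hypothetical.

```python
from statistics import mean

def combined_regression_estimate(strata, beta):
    """y_st + beta * (X_bar - x_st), using stratified means with weights w_i = N_i / N."""
    N = sum(s["N"] for s in strata)
    x_st = sum(s["N"] / N * mean(s["x"]) for s in strata)
    y_st = sum(s["N"] / N * mean(s["y"]) for s in strata)
    X_bar = sum(s["N"] / N * s["X_bar"] for s in strata)  # overall population mean of x
    return y_st + beta * (X_bar - x_st)

# Hypothetical two-stratum example; stratum means X_bar are assumed known.
strata = [
    {"N": 40, "X_bar": 5.0, "x": [4.0, 5.0, 3.0], "y": [9.0, 11.0, 7.0]},
    {"N": 60, "X_bar": 10.0, "x": [9.0, 12.0, 12.0], "y": [6.0, 8.0, 7.0]},
]
estimate = combined_regression_estimate(strata, beta=1.0)
print(estimate)
```

Here $\bar{x}_{st} = 8.2$, $\bar{y}_{st} = 7.8$ and $\bar{X} = 8$, so the estimate is about $7.8 + 1 \times (8 - 8.2) = 7.6$.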

Properties of separate and combined regression

In order to derive the mean and variance of $\hat{\bar{Y}}_{sreg}$ and $\hat{\bar{Y}}_{creg}$, there are two cases

- when $\beta$ is pre-assigned as $\beta_0$

- when $\beta$ is estimated from the sample.

We consider here the case where $\beta$ is pre-assigned as $\beta_0$. The other case, when $\beta$ is estimated as $\hat{\beta} = \dfrac{s_{xy}}{s_x^2}$, can
be dealt with by the same approach, based on defining various $\epsilon$'s and using the approximation theory as
in the case of $\hat{\bar{Y}}_{reg}$.

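1. Separate regression estimator
Assume $\beta$ is known in the $i$th stratum, say $\beta_{0i}$. Then

\[
\begin{aligned}
\hat{\bar{Y}}_{sreg} &= \sum_{i=1}^{k} w_i\left[\bar{y}_i + \beta_{0i}(\bar{X}_i - \bar{x}_i)\right] \\
E(\hat{\bar{Y}}_{sreg}) &= \sum_{i=1}^{k} w_i\left[E(\bar{y}_i) + \beta_{0i}\left(\bar{X}_i - E(\bar{x}_i)\right)\right] \\
&= \sum_{i=1}^{k} w_i\left[\bar{Y}_i + \beta_{0i}(\bar{X}_i - \bar{X}_i)\right] \\
&= \bar{Y}.
\end{aligned}
\]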
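\[
\begin{aligned}
\mathrm{Var}(\hat{\bar{Y}}_{sreg}) &= E\left[\hat{\bar{Y}}_{sreg} - E(\hat{\bar{Y}}_{sreg})\right]^2 \\
&= E\left[\sum_{i=1}^{k} w_i\bar{y}_i + \sum_{i=1}^{k} w_i\beta_{0i}(\bar{X}_i - \bar{x}_i) - \bar{Y}\right]^2 \\
&= E\left[\sum_{i=1}^{k} w_i(\bar{y}_i - \bar{Y}_i) - \sum_{i=1}^{k} w_i\beta_{0i}(\bar{x}_i - \bar{X}_i)\right]^2 \\
&= \sum_{i=1}^{k} w_i^2 E(\bar{y}_i - \bar{Y}_i)^2 + \sum_{i=1}^{k} w_i^2\beta_{0i}^2 E(\bar{x}_i - \bar{X}_i)^2 - 2\sum_{i=1}^{k} w_i^2\beta_{0i} E\left[(\bar{x}_i - \bar{X}_i)(\bar{y}_i - \bar{Y}_i)\right] \\
&= \sum_{i=1}^{k} w_i^2\,\mathrm{Var}(\bar{y}_i) + \sum_{i=1}^{k} w_i^2\beta_{0i}^2\,\mathrm{Var}(\bar{x}_i) - 2\sum_{i=1}^{k} w_i^2\beta_{0i}\,\mathrm{Cov}(\bar{x}_i, \bar{y}_i) \\
&= \sum_{i=1}^{k}\frac{w_i^2 f_i}{n_i}\left(S_{iY}^2 + \beta_{0i}^2 S_{iX}^2 - 2\beta_{0i} S_{iXY}\right)
\end{aligned}
\]

using $\bar{Y} = \sum_{i=1}^{k} w_i\bar{Y}_i$ and the independence of the samples drawn from different strata, so that the cross-stratum terms vanish.

$\mathrm{Var}(\hat{\bar{Y}}_{sreg})$ is minimum when $\beta_{0i} = \dfrac{S_{iXY}}{S_{iX}^2}$, and so substituting this $\beta_{0i}$, we have

\[ V_{min}(\hat{\bar{Y}}_{sreg}) = \sum_{i=1}^{k}\left[\frac{w_i^2 f_i}{n_i}\left(S_{iY}^2 - \beta_{0i}^2 S_{iX}^2\right)\right] \]

where $f_i = \dfrac{N_i - n_i}{N_i}$.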
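Since SRSWOR is followed in drawing the samples from each stratum,

\[ E(s_{ix}^2) = S_{iX}^2, \qquad E(s_{iy}^2) = S_{iY}^2, \qquad E(s_{ixy}) = S_{iXY}. \]

Thus an unbiased estimator of the variance can be obtained by replacing $S_{iX}^2$, $S_{iY}^2$ and $S_{iXY}$ by their respective
unbiased estimators $s_{ix}^2$, $s_{iy}^2$ and $s_{ixy}$ as

\[ \widehat{\mathrm{Var}}(\hat{\bar{Y}}_{sreg}) = \sum_{i=1}^{k}\left[\frac{w_i^2 f_i}{n_i}\left(s_{iy}^2 + \beta_{0i}^2 s_{ix}^2 - 2\beta_{0i} s_{ixy}\right)\right] \]

and

\[ \widehat{\mathrm{Var}}_{min}(\hat{\bar{Y}}_{sreg}) = \sum_{i=1}^{k}\left[\frac{w_i^2 f_i}{n_i}\left(s_{iy}^2 - \beta_{0i}^2 s_{ix}^2\right)\right]. \]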
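2. Combined regression estimator:
Assume $\beta$ is known as $\beta_0$. Then

\[
\begin{aligned}
\hat{\bar{Y}}_{creg} &= \sum_{i=1}^{k} w_i\bar{y}_i + \beta_0\left(\bar{X} - \sum_{i=1}^{k} w_i\bar{x}_i\right) \\
E(\hat{\bar{Y}}_{creg}) &= \sum_{i=1}^{k} w_i E(\bar{y}_i) + \beta_0\left[\bar{X} - \sum_{i=1}^{k} w_i E(\bar{x}_i)\right] \\
&= \sum_{i=1}^{k} w_i\bar{Y}_i + \beta_0\left[\bar{X} - \sum_{i=1}^{k} w_i\bar{X}_i\right] \\
&= \bar{Y} + \beta_0(\bar{X} - \bar{X}) \\
&= \bar{Y}.
\end{aligned}
\]

Thus $\hat{\bar{Y}}_{creg}$ is an unbiased estimator of $\bar{Y}$.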

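\[
\begin{aligned}
\mathrm{Var}(\hat{\bar{Y}}_{creg}) &= E\left[\hat{\bar{Y}}_{creg} - E(\hat{\bar{Y}}_{creg})\right]^2 \\
&= E\left[\sum_{i=1}^{k} w_i\bar{y}_i + \beta_0\left(\bar{X} - \sum_{i=1}^{k} w_i\bar{x}_i\right) - \bar{Y}\right]^2 \\
&= E\left[\sum_{i=1}^{k} w_i(\bar{y}_i - \bar{Y}_i) - \beta_0\sum_{i=1}^{k} w_i(\bar{x}_i - \bar{X}_i)\right]^2 \\
&= \sum_{i=1}^{k} w_i^2\,\mathrm{Var}(\bar{y}_i) + \beta_0^2\sum_{i=1}^{k} w_i^2\,\mathrm{Var}(\bar{x}_i) - 2\beta_0\sum_{i=1}^{k} w_i^2\,\mathrm{Cov}(\bar{x}_i, \bar{y}_i) \\
&= \sum_{i=1}^{k}\frac{w_i^2 f_i}{n_i}\left[S_{iY}^2 + \beta_0^2 S_{iX}^2 - 2\beta_0 S_{iXY}\right].
\end{aligned}
\]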
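$\mathrm{Var}(\hat{\bar{Y}}_{creg})$ is minimum when

\[ \beta_0 = \frac{\mathrm{Cov}(\bar{x}_{st}, \bar{y}_{st})}{\mathrm{Var}(\bar{x}_{st})} = \frac{\sum_{i=1}^{k}\dfrac{w_i^2 f_i}{n_i} S_{iXY}}{\sum_{i=1}^{k}\dfrac{w_i^2 f_i}{n_i} S_{iX}^2} \]

and the minimum variance is given by

\[ \mathrm{Var}_{min}(\hat{\bar{Y}}_{creg}) = \sum_{i=1}^{k}\frac{w_i^2 f_i}{n_i}\left(S_{iY}^2 - \beta_0^2 S_{iX}^2\right). \]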

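Since SRSWOR is followed to draw the sample from the strata, using $E(s_{ix}^2) = S_{iX}^2$, $E(s_{iy}^2) = S_{iY}^2$ and
$E(s_{ixy}) = S_{iXY}$, we get the estimate of variance as

\[ \widehat{\mathrm{Var}}(\hat{\bar{Y}}_{creg}) = \sum_{i=1}^{k}\left[\frac{w_i^2 f_i}{n_i}\left(s_{iy}^2 + \beta_0^2 s_{ix}^2 - 2\beta_0 s_{ixy}\right)\right] \]

and

\[ \widehat{\mathrm{Var}}_{min}(\hat{\bar{Y}}_{creg}) = \sum_{i=1}^{k}\left[\frac{w_i^2 f_i}{n_i}\left(s_{iy}^2 - \beta_0^2 s_{ix}^2\right)\right]. \]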
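Comparison of $\hat{\bar{Y}}_{sreg}$ and $\hat{\bar{Y}}_{creg}$:

The variance of $\hat{\bar{Y}}_{sreg}$ is minimum when $\beta_{0i} = \dfrac{S_{iXY}}{S_{iX}^2}$ for all $i$.

The variance of $\hat{\bar{Y}}_{creg}$ is minimum when

\[ \beta_0 = \frac{\mathrm{Cov}(\bar{x}_{st}, \bar{y}_{st})}{\mathrm{Var}(\bar{x}_{st})} = \beta_0^{*}. \]

The minimum variance is $\mathrm{Var}(\hat{\bar{Y}}_{creg})_{min} = \mathrm{Var}(\bar{y}_{st})(1 - \rho^{*2})$, where

\[ \rho^{*} = \frac{\mathrm{Cov}(\bar{x}_{st}, \bar{y}_{st})}{\sqrt{\mathrm{Var}(\bar{x}_{st})\,\mathrm{Var}(\bar{y}_{st})}}. \]

With these optimum values of $\beta_{0i}$ and $\beta_0 = \beta_0^{*}$,

\[
\begin{aligned}
\mathrm{Var}(\hat{\bar{Y}}_{creg})_{min} - \mathrm{Var}(\hat{\bar{Y}}_{sreg})_{min} &= \sum_{i=1}^{k}(\beta_{0i}^2 - \beta_0^2)\frac{w_i^2 f_i}{n_i} S_{iX}^2 \\
&= \sum_{i=1}^{k}(\beta_{0i} - \beta_0)^2\frac{w_i^2 f_i}{n_i} S_{iX}^2 \\
&\ge 0
\end{aligned}
\]

which is always true. The second equality holds because $\sum_{i=1}^{k}\dfrac{w_i^2 f_i}{n_i}\beta_{0i} S_{iX}^2 = \beta_0\sum_{i=1}^{k}\dfrac{w_i^2 f_i}{n_i} S_{iX}^2$ at $\beta_0 = \beta_0^{*}$.

So, with the optimum pre-assigned coefficients, the separate regression estimate is at least as efficient as the combined regression
estimator. If the regression of y on x is approximately linear and the regression coefficients do not vary
much among the strata, the difference between the two variances is small.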
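
The comparison can be illustrated numerically. The sketch below takes hypothetical stratum-level population quantities, computes the optimum stratum slopes and the optimum common slope, and evaluates the two minimum variances together with the weighted sum of squared slope differences. All names and numbers are assumptions made for illustration.

```python
# Hypothetical stratum-level population quantities.
strata = [
    {"N": 40, "n": 8, "S_X2": 4.0, "S_Y2": 20.0, "S_XY": 8.0},
    {"N": 60, "n": 12, "S_X2": 9.0, "S_Y2": 12.0, "S_XY": 4.5},
]
N = sum(s["N"] for s in strata)

def c(s):
    """Weight w_i^2 * f_i / n_i for one stratum."""
    w = s["N"] / N
    f = (s["N"] - s["n"]) / s["N"]
    return w ** 2 * f / s["n"]

# Optimum slopes: one per stratum (separate) and one common slope (combined).
beta_i = [s["S_XY"] / s["S_X2"] for s in strata]
beta_star = (sum(c(s) * s["S_XY"] for s in strata)
             / sum(c(s) * s["S_X2"] for s in strata))

var_sep = sum(c(s) * (s["S_Y2"] - b ** 2 * s["S_X2"]) for s, b in zip(strata, beta_i))
var_comb = sum(c(s) * (s["S_Y2"] - beta_star ** 2 * s["S_X2"]) for s in strata)
gap = sum(c(s) * (b - beta_star) ** 2 * s["S_X2"] for s, b in zip(strata, beta_i))

print(beta_i, beta_star)
print(var_sep, var_comb, gap)
```

In this example the stratum slopes (2.0 and 0.5) differ markedly, so `gap` is clearly positive; making the slopes equal would drive it to zero and the two minimum variances would coincide.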
