
DOE RLS Theory


Linear Regression Analysis, 5th edition, Montgomery, Peck & Vining

1.1 Regression and Model Building

• Regression analysis is a statistical technique for investigating and modeling the relationship between variables.
• Equation of a straight line (classical):
y = mx + b
We usually write this as
y = β0 + β1x
1.1 Regression and Model Building

• Not all observations will fall exactly on a straight line:
y = β0 + β1x + ε
where ε represents error
– it is a random variable that accounts for the failure of the model to fit the data exactly
– ε ~ N(0, σ²)
1.1 Regression and Model Building

Delivery time example

1.1 Regression and Model Building
• Simple Linear Regression Model
y = β0 + β1x + ε
where y – dependent (response) variable
x – independent (regressor/predictor) variable
β0 – intercept
β1 – slope
ε – random error term
1.1 Regression and Model Building

• The mean response at any value, x, of the regressor variable is
E(y|x) = β0 + β1x
• The variance of y at any given x is
Var(y|x) = σ²
Figure 1.3 Linear regression approximation of a
complex relationship.
Figure 1.4 Piecewise linear approximation of a
complex relationship.
Figure 1.5 The danger of extrapolation in regression.
Chapter 2

Simple Linear Regression

2.1 Simple Linear Regression Model

• Single regressor, x; response, y. Population regression model:
y = β0 + β1x + ε
• β0 – intercept: if x = 0 is in the range of the data, then β0 is the mean of the distribution of the response y when x = 0; if x = 0 is not in the range, then β0 has no practical interpretation
• β1 – slope: change in the mean of the distribution of the response produced by a unit change in x
• ε – random error
2.1 Simple Linear Regression Model

• The response, y, is a random variable
• There is a probability distribution for y at each value of x
– Mean: E(y|x) = β0 + β1x
– Variance: Var(y|x) = Var(β0 + β1x + ε) = σ²
2.2 Least-Squares Estimation of
the Parameters
• 0 and 1 are unknown and must be estimated
Sample
yi   0  1 xi   i , i  1, 2,..., n regression
model

• Least squares estimation seeks to minimize the


sum of squares of the differences between the
observed response, yi, and the straight line.
S (  0 , 1 )    i2  ( yi   0  1 xi ) 2
i i
2.2 Least-Squares Estimation of
the Parameters
• Let ˆ 0 , ˆ 1 represent the least squares
estimators of 0 and 1, respectively.
• These estimators must satisfy:
S
 2 ( yi  ˆ0  ˆ1 xi )  0
 0 ˆ0 , ˆ1 i

S
 2 ( yi  ˆ0  ˆ1 xi )xi  0
1 ˆ0 , ˆ1 i
2.2 Least-Squares Estimation of
the Parameters
• Simplifying yields the least-squares normal equations (all sums run from i = 1 to n):
n β̂0 + β̂1 Σ xi = Σ yi
β̂0 Σ xi + β̂1 Σ xi² = Σ yi xi

2.2 Least-Squares Estimation of
the Parameters
• Solving the normal equations yields the ordinary least-squares estimators:
β̂0 = ȳ − β̂1 x̄
β̂1 = [Σ yi xi − (Σ yi)(Σ xi)/n] / [Σ xi² − (Σ xi)²/n]
2.2 Least-Squares Estimation of
the Parameters
• The fitted simple linear regression model:
ŷ = β̂0 + β̂1x
• Sum-of-squares notation:
Sxx = Σ xi² − (Σ xi)²/n = Σ (xi − x̄)²
Sxy = Σ yi xi − (Σ yi)(Σ xi)/n = Σ yi (xi − x̄)
2.2 Least-Squares Estimation of
the Parameters
• Then
β̂1 = [Σ yi xi − (Σ yi)(Σ xi)/n] / [Σ xi² − (Σ xi)²/n] = Sxy / Sxx

2.2 Least-Squares Estimation of
the Parameters

• Residuals: ei = yi − ŷi
• Residuals will be used to determine the adequacy of the model

2.2 Least-Squares Estimation of the Parameters

• Just because we can fit a linear model doesn’t mean that we should:
– How well does this equation fit the data?
– Is the model likely to be useful as a predictor?
– Are any of the basic assumptions (such as constant variance and uncorrelated errors) violated, and if so, how serious is this?

2.2 Least-Squares Estimation of the Parameters

Computer Output (Minitab)

2.2.2 Properties of the Least-Squares Estimators
and the Fitted Regression Model

• The ordinary least-squares (OLS) estimator of the slope is a linear combination of the observations yi:
β̂1 = Sxy / Sxx = Σ ci yi
where
ci = (xi − x̄)/Sxx,   Σ ci = 0,   Σ ci² = 1/Sxx
This form is useful in showing expected-value and variance properties.

2.2.2 Properties of the Least-Squares Estimators
and the Fitted Regression Model

• The least-squares estimators are unbiased estimators of their respective parameters:
E(β̂1) = β1,   E(β̂0) = β0
• The variances are
Var(β̂1) = σ² / Sxx
Var(β̂0) = σ² (1/n + x̄²/Sxx)
• The OLS estimators are Best Linear Unbiased Estimators (BLUE)

2.2.2 Properties of the Least-Squares Estimators
and the Fitted Regression Model

• Useful properties of the least-squares fit:
1. Σ (yi − ŷi) = Σ ei = 0
2. Σ yi = Σ ŷi
3. The least-squares regression line always passes through the centroid (x̄, ȳ) of the data.
4. Σ xi ei = 0
5. Σ ŷi ei = 0

Problems

2.2.3 Estimation of 2

• Residual (error) sum of squares:
SSRes = Σ (yi − ŷi)² = Σ ei²
= Σ yi² − n ȳ² − β̂1 Sxy
= SST − β̂1 Sxy
where SST = Σ (yi − ȳ)² = Σ yi² − n ȳ² is the total sum of squares.

2.2.3 Estimation of 2

• Unbiased estimator of 2
SS Re s
ˆ 
2
 MSRe s
n2

• The quantity n – 2 is the number of


degrees of freedom for the residual sum of
squares.
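A minimal sketch of SSRes and MSRes on a made-up dataset; b0 and b1 below are the least-squares estimates for that particular data, and the identity SSRes = SST − β̂1 Sxy is checked numerically:

```python
# Made-up data for illustration; b0, b1 are its least-squares fit.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
b0, b1 = 2.2, 0.6
xbar = sum(x) / n
ybar = sum(y) / n

e = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]   # residuals
SS_res = sum(ei ** 2 for ei in e)                   # 2.4 for this data
MS_res = SS_res / (n - 2)                           # 0.8; estimates sigma^2

# Identity check: SS_res = SST - b1 * Sxy
SST = sum((yi - ybar) ** 2 for yi in y)
Sxy = sum(yi * (xi - xbar) for xi, yi in zip(x, y))
```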

2.2.3 Estimation of 2

• ̂ depends on the residual sum of squares.


2

Then:
– Any violation of the assumptions on the
model errors could damage the usefulness of
this estimate
– A misspecification of the model can damage
the usefulness of this estimate
– This estimate is model dependent

Problems

2.3 Hypothesis Testing on the Slope
and Intercept
• Three assumptions are needed to apply procedures such as hypothesis testing and confidence intervals. The model errors, εi,
– are normally distributed
– are independently distributed
– have constant variance
i.e., εi ~ NID(0, σ²)

2.3.1 Use of t-tests
Slope
H0: 1 = 10 H1: 1  10
• Standard error of the slope: se  1  
ˆ MS Re s
S xx
• Test statistic: ˆ
1  10
t0 
se ˆ1  
• Reject H0 if |t0| > t  / 2,n  2
• Can also use the P-value approach
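A sketch of the slope t-test for H0: β1 = 0 on made-up summary numbers (Sxx = 10, β̂1 = 0.6, MSRes = 0.8 with n = 5); the critical value t(0.025, 3) = 3.182 is hard-coded from a t-table rather than computed:

```python
import math

# Made-up summary statistics for illustration.
Sxx = 10.0     # from a small illustrative dataset
b1 = 0.6       # least-squares slope for that data
MS_res = 0.8   # residual mean square, n - 2 = 3 degrees of freedom

se_b1 = math.sqrt(MS_res / Sxx)   # standard error of the slope
t0 = (b1 - 0.0) / se_b1           # test statistic for H0: beta1 = 0
reject = abs(t0) > 3.182          # t(0.025, 3), from a t-table
```

For this data t0 ≈ 2.12 < 3.182, so H0 is not rejected at α = 0.05.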

2.3.1 Use of t-tests
Intercept
H0: 0 = 00 H1: 0  00
 
 
2
ˆ 1 x
• Standard error of the intercept: se  0  MS Re s   
 n S xx 

ˆ0   00
t0 
• Test statistic: se ˆ0 
• Reject H0 if |t0| > t  / 2, n  2
• Can also use the P-value approach

2.3.2 Testing Significance of Regression

H0: 1 = 0 H1: 1  0
• This tests the significance of regression; that
is, is there a linear relationship between the
response and the regressor.
• Failing to reject 1 = 0, implies that there is
no linear relationship between y and x

2.3.3 The Analysis of Variance

Relationship between t0 and F0:
• For H0: β1 = 0, it can be shown that
t0² = F0
So for testing significance of regression, the t-test and the ANOVA procedure are equivalent (only true in simple linear regression).

Problems

2.4 Interval Estimation in Simple
Linear Regression
• 100(1 − α)% confidence interval for the slope:
β̂1 − t(α/2, n−2) se(β̂1) ≤ β1 ≤ β̂1 + t(α/2, n−2) se(β̂1)
• 100(1 − α)% confidence interval for the intercept:
β̂0 − t(α/2, n−2) se(β̂0) ≤ β0 ≤ β̂0 + t(α/2, n−2) se(β̂0)
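The slope interval can be sketched numerically. The summary numbers below are made up for illustration (β̂1 = 0.6, MSRes = 0.8, Sxx = 10, n = 5), and t(0.025, 3) = 3.182 is taken from a t-table:

```python
import math

b1 = 0.6                          # made-up least-squares slope
se_b1 = math.sqrt(0.8 / 10.0)     # sqrt(MS_res / Sxx)
t_crit = 3.182                    # t(0.025, 3), from a t-table

lo = b1 - t_crit * se_b1          # about -0.300
hi = b1 + t_crit * se_b1          # about  1.500
# The interval contains 0, consistent with failing to reject H0: beta1 = 0.
```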

2.4 Interval Estimation in Simple
Linear Regression
• 100(1-)% Confidence interval for 2

( n  2 ) MS 2 ( n  2 ) MS
RES
  RES
 2 / 2 , n  2  12  / 2 , n  2

2.4.2 Interval Estimation of the Mean
Response

• Let x0 be the level of the regressor variable at which we want to estimate the mean response, i.e., E(y|x0) = μ(y|x0)
– Point estimator of E(y|x0) once the model is fit:
Ê(y|x0) = μ̂(y|x0) = β̂0 + β̂1 x0
– In order to construct a confidence interval on the mean response, we need the variance of the point estimator.

2.4.2 Interval Estimation of the Mean
Response

• The variance of ˆ y| x0 is

   
Var ˆ y | x 0  Var ˆ 0  ˆ 1 x 0  Var [ y  ˆ 1 ( x 0  x )]
 Var ( y )  Var [ˆ 1 ( x 0  x )]
 2  2 ( x0  x ) 2  1 ( x  x ) 2 
   2   0 
n S xx  n S xx 

2.4.2 Interval Estimation of the Mean
Response

• 100(1 − α)% confidence interval for E(y|x0):
μ̂(y|x0) − t(α/2, n−2) √(MSRes [1/n + (x0 − x̄)²/Sxx]) ≤ E(y|x0) ≤ μ̂(y|x0) + t(α/2, n−2) √(MSRes [1/n + (x0 − x̄)²/Sxx])

Notice that the length of the CI depends on the location of the point of interest.
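The dependence on x0 can be seen numerically. A sketch using made-up fit summaries (β̂0 = 2.2, β̂1 = 0.6, n = 5, x̄ = 3, Sxx = 10, MSRes = 0.8; t(0.025, 3) = 3.182 from a t-table):

```python
import math

# Made-up fit summaries for illustration.
n, xbar, Sxx, MS_res = 5, 3.0, 10.0, 0.8
b0, b1 = 2.2, 0.6
t_crit = 3.182   # t(0.025, 3), from a t-table

def mean_response_ci(x0):
    """95% CI for the mean response E(y|x0)."""
    mu_hat = b0 + b1 * x0
    half = t_crit * math.sqrt(MS_res * (1 / n + (x0 - xbar) ** 2 / Sxx))
    return mu_hat - half, mu_hat + half

ci_center = mean_response_ci(3.0)   # narrowest interval, at x0 = xbar
ci_edge = mean_response_ci(5.0)     # wider, away from xbar
```

The interval is narrowest at x0 = x̄ and widens as x0 moves away from it.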
See pages 34-35, text

2.5 Prediction of New Observations

• Suppose we wish to construct a prediction interval on a future observation, y0, corresponding to a particular level of x, say x0.
• The point estimate would be:
ŷ0 = β̂0 + β̂1 x0
• The confidence interval on the mean response at this point is not appropriate for this situation. Why?
2.5 Prediction of New Observations

• Let the random variable ψ be ψ = y0 − ŷ0
• ψ is normally distributed with
– E(ψ) = 0
– Var(ψ) = Var(y0 − ŷ0) = σ² [1 + 1/n + (x0 − x̄)²/Sxx]
(y0 is independent of ŷ0)

2.5 Prediction of New Observations

• 100(1 − α)% prediction interval on a future observation, y0, at x0:
ŷ0 − t(α/2, n−2) √(MSRes [1 + 1/n + (x0 − x̄)²/Sxx]) ≤ y0 ≤ ŷ0 + t(α/2, n−2) √(MSRes [1 + 1/n + (x0 − x̄)²/Sxx])
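A sketch of the prediction interval, again on made-up fit summaries (β̂0 = 2.2, β̂1 = 0.6, n = 5, x̄ = 3, Sxx = 10, MSRes = 0.8; t(0.025, 3) = 3.182 from a t-table), showing that the extra "1 +" term makes it wider than the mean-response CI at the same x0:

```python
import math

# Made-up fit summaries for illustration.
n, xbar, Sxx, MS_res = 5, 3.0, 10.0, 0.8
b0, b1 = 2.2, 0.6
t_crit = 3.182   # t(0.025, 3), from a t-table

def prediction_interval(x0):
    """95% prediction interval for a future observation y0 at x0."""
    y0_hat = b0 + b1 * x0
    half = t_crit * math.sqrt(MS_res * (1 + 1 / n + (x0 - xbar) ** 2 / Sxx))
    return y0_hat - half, y0_hat + half

def mean_ci_half_width(x0):
    """Half-width of the mean-response CI at x0, for comparison."""
    return t_crit * math.sqrt(MS_res * (1 / n + (x0 - xbar) ** 2 / Sxx))

pi = prediction_interval(4.0)
pi_half = (pi[1] - pi[0]) / 2
```

The prediction interval accounts for the variance of the new observation itself, not just the uncertainty in the fitted mean, which answers the "Why?" on the earlier slide.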

Problems

2.6 Coefficient of Determination

• R² – coefficient of determination:
R² = SSR / SST = 1 − SSRes / SST
• Proportion of variation explained by the regressor, x
• For the rocket propellant data, see the computer output in the text
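An illustrative R² computation on a small made-up dataset (not the book's rocket propellant data); b0 and b1 are the least-squares estimates for this data:

```python
# Made-up data for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
ybar = sum(y) / len(y)
b0, b1 = 2.2, 0.6   # least-squares fit for this data

SS_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
SST = sum((yi - ybar) ** 2 for yi in y)
R2 = 1 - SS_res / SST   # 0.6: the line explains 60% of the variation
```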

2.6 Coefficient of Determination

• R2 can be misleading!
– Simply adding more terms to the model will
increase R2
– As the range of the regressor variable increases
(decreases), R2 generally increases (decreases).
– R2 does not indicate the appropriateness of a
linear model

Problems

2.7 Considerations in the Use of Regression

• Extrapolating
• Extreme points will often influence the slope.
• Outliers can disturb the least-squares fit
• A linear relationship does not imply a cause-and-effect relationship (interesting mental defective example in the book, pg. 40)
• Sometimes, the value of the regressor variable is unknown and must itself be estimated or predicted.

2.8 Regression Through the Origin
• The no-intercept model is
y = β1x + ε
• This model would be appropriate for situations where the origin (0, 0) has some meaning.
• A scatter diagram can aid in determining whether an intercept or no-intercept model should be used.
• In addition, the practitioner could test both models. Examine the t-tests and residual mean squares.
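For the no-intercept model, least squares gives the closed form β̂1 = Σ xi yi / Σ xi². A minimal sketch on made-up data that roughly passes through the origin:

```python
# Made-up data for illustration (roughly y = 2x through the origin).
x = [1, 2, 3]
y = [2.1, 3.9, 6.0]

# No-intercept least-squares slope: b1 = sum(x*y) / sum(x^2)
b1 = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)
residuals = [yi - b1 * xi for xi, yi in zip(x, y)]
# Note: unlike the intercept model, these residuals need not sum to zero.
```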

