
DOE RLS Theory


Linear Regression Analysis, 5th edition, Montgomery, Peck & Vining

1.1 Regression and Model Building

• Regression analysis is a statistical technique for investigating and modeling the relationship between variables.
• Equation of a straight line (classical):
y = mx + b
We usually write this as
y = β0 + β1x
1.1 Regression and Model Building

• Not all observations will fall exactly on a straight line:
y = β0 + β1x + ε
where ε represents error
– it is a random variable that accounts for the failure of the model to fit the data exactly
– ε ~ N(0, σ²)
1.1 Regression and Model Building

Delivery time example

1.1 Regression and Model Building
• Simple Linear Regression Model
y = β0 + β1x + ε
where y – dependent (response) variable
x – independent (regressor/predictor) variable
β0 – intercept
β1 – slope
ε – random error term
1.1 Regression and Model Building

• The mean response at any value, x, of the regressor variable is
E(y|x) = β0 + β1x
• The variance of y at any given x is
Var(y|x) = σ²
Figure 1.3 Linear regression approximation of a
complex relationship.
Figure 1.4 Piecewise linear approximation of a
complex relationship.
Figure 1.5 The danger of extrapolation in regression.
Chapter 2

Simple Linear Regression

2.1 Simple Linear Regression Model

• Single regressor, x; response, y. Population regression model:
y = β0 + β1x + ε
• β0 – intercept: if x = 0 is in the range of the data, then β0 is the mean of the distribution of the response y when x = 0; if x = 0 is not in the range, then β0 has no practical interpretation
• β1 – slope: change in the mean of the distribution of the response produced by a unit change in x
• ε – random error
2.1 Simple Linear Regression Model

• The response, y, is a random variable
• There is a probability distribution for y at each value of x
– Mean: E(y|x) = β0 + β1x
– Variance: Var(y|x) = Var(β0 + β1x + ε) = σ²
2.2 Least-Squares Estimation of
the Parameters
• 0 and 1 are unknown and must be estimated
Sample
yi   0  1 xi   i , i  1, 2,..., n regression
model

• Least squares estimation seeks to minimize the


sum of squares of the differences between the
observed response, yi, and the straight line.
S (  0 , 1 )    i2  ( yi   0  1 xi ) 2
i i
2.2 Least-Squares Estimation of
the Parameters
• Let ˆ 0 , ˆ 1 represent the least squares
estimators of 0 and 1, respectively.
• These estimators must satisfy:
S
 2 ( yi  ˆ0  ˆ1 xi )  0
 0 ˆ0 , ˆ1 i

S
 2 ( yi  ˆ0  ˆ1 xi )xi  0
1 ˆ0 , ˆ1 i
2.2 Least-Squares Estimation of
the Parameters
• Simplifying yields the least-squares normal equations (all sums run from i = 1 to n):
n β̂0 + β̂1 Σ xi = Σ yi
β̂0 Σ xi + β̂1 Σ xi² = Σ yi xi

2.2 Least-Squares Estimation of
the Parameters
• Solving the normal equations yields the ordinary least-squares estimators:
β̂0 = ȳ − β̂1 x̄
β̂1 = [Σ yi xi − (Σ yi)(Σ xi)/n] / [Σ xi² − (Σ xi)²/n]
2.2 Least-Squares Estimation of
the Parameters
• The fitted simple linear regression model:
ŷ = β̂0 + β̂1x
• Sum-of-squares notation:
Sxx = Σ xi² − (Σ xi)²/n = Σ (xi − x̄)²
Sxy = Σ yi xi − (Σ yi)(Σ xi)/n = Σ yi (xi − x̄)
2.2 Least-Squares Estimation of
the Parameters
• Then
β̂1 = [Σ yi xi − (Σ yi)(Σ xi)/n] / [Σ xi² − (Σ xi)²/n] = Sxy / Sxx

2.2 Least-Squares Estimation of
the Parameters

• Residuals: ei = yi − ŷi
• Residuals will be used to determine the adequacy of the model

2.2 Least-Squares Estimation of the Parameters

• Just because we can fit a linear model doesn’t mean that we should:
– How well does this equation fit the data?
– Is the model likely to be useful as a predictor?
– Are any of the basic assumptions (such as constant variance and uncorrelated errors) violated, and if so, how serious is this?

2.2 Least-Squares Estimation of the Parameters

Computer Output (Minitab)

2.2.2 Properties of the Least-Squares Estimators
and the Fitted Regression Model

• The ordinary least-squares (OLS) estimator of the slope is a linear combination of the observations yi:
β̂1 = Sxy / Sxx = Σ ci yi
where
ci = (xi − x̄)/Sxx,   Σ ci = 0,   Σ ci² = 1/Sxx
This form is useful in showing expected-value and variance properties.

2.2.2 Properties of the Least-Squares Estimators
and the Fitted Regression Model

• The least-squares estimators are unbiased estimators of their respective parameters:
E(β̂1) = β1,   E(β̂0) = β0
• The variances are
Var(β̂1) = σ² / Sxx
Var(β̂0) = σ² (1/n + x̄²/Sxx)
• The OLS estimators are Best Linear Unbiased Estimators (BLUE)

2.2.2 Properties of the Least-Squares Estimators
and the Fitted Regression Model

• Useful properties of the least-squares fit:
1. Σ (yi − ŷi) = Σ ei = 0
2. Σ yi = Σ ŷi
3. The least-squares regression line always passes through the centroid (x̄, ȳ) of the data.
4. Σ xi ei = 0
5. Σ ŷi ei = 0

Problems

2.2.3 Estimation of 2

• Residual (error) sum of squares:
SSRes = Σ (yi − ŷi)² = Σ ei²
= Σ yi² − n ȳ² − β̂1 Sxy
= SST − β̂1 Sxy
where SST = Σ (yi − ȳ)² = Σ yi² − n ȳ² is the total sum of squares.

2.2.3 Estimation of 2

• Unbiased estimator of 2
SS Re s
ˆ 
2
 MSRe s
n2

• The quantity n – 2 is the number of


degrees of freedom for the residual sum of
squares.
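A minimal sketch of SSRes and MSRes on a made-up dataset; b0 and b1 below are the least-squares estimates for that particular data, and the identity SSRes = SST − β̂1 Sxy is checked numerically:

```python
# Made-up data for illustration; b0, b1 are its least-squares fit.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
b0, b1 = 2.2, 0.6
xbar = sum(x) / n
ybar = sum(y) / n

e = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]   # residuals
SS_res = sum(ei ** 2 for ei in e)                   # 2.4 for this data
MS_res = SS_res / (n - 2)                           # 0.8; estimates sigma^2

# Identity check: SS_res = SST - b1 * Sxy
SST = sum((yi - ybar) ** 2 for yi in y)
Sxy = sum(yi * (xi - xbar) for xi, yi in zip(x, y))
```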

2.2.3 Estimation of 2

• ̂ depends on the residual sum of squares.


2

Then:
– Any violation of the assumptions on the
model errors could damage the usefulness of
this estimate
– A misspecification of the model can damage
the usefulness of this estimate
– This estimate is model dependent

Problems

2.3 Hypothesis Testing on the Slope
and Intercept
• Three assumptions are needed to apply procedures such as hypothesis testing and confidence intervals. The model errors, εi,
– are normally distributed
– are independently distributed
– have constant variance
i.e., εi ~ NID(0, σ²)

2.3.1 Use of t-tests
Slope
H0: 1 = 10 H1: 1  10
• Standard error of the slope: se  1  
ˆ MS Re s
S xx
• Test statistic: ˆ
1  10
t0 
se ˆ1  
• Reject H0 if |t0| > t  / 2,n  2
• Can also use the P-value approach
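A sketch of the slope t-test for H0: β1 = 0 on made-up summary numbers (Sxx = 10, β̂1 = 0.6, MSRes = 0.8 with n = 5); the critical value t(0.025, 3) = 3.182 is hard-coded from a t-table rather than computed:

```python
import math

# Made-up summary statistics for illustration.
Sxx = 10.0     # from a small illustrative dataset
b1 = 0.6       # least-squares slope for that data
MS_res = 0.8   # residual mean square, n - 2 = 3 degrees of freedom

se_b1 = math.sqrt(MS_res / Sxx)   # standard error of the slope
t0 = (b1 - 0.0) / se_b1           # test statistic for H0: beta1 = 0
reject = abs(t0) > 3.182          # t(0.025, 3), from a t-table
```

For this data t0 ≈ 2.12 < 3.182, so H0 is not rejected at α = 0.05.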

2.3.1 Use of t-tests
Intercept
H0: 0 = 00 H1: 0  00
 
 
2
ˆ 1 x
• Standard error of the intercept: se  0  MS Re s   
 n S xx 

ˆ0   00
t0 
• Test statistic: se ˆ0 
• Reject H0 if |t0| > t  / 2, n  2
• Can also use the P-value approach

2.3.2 Testing Significance of Regression

H0: 1 = 0 H1: 1  0
• This tests the significance of regression; that
is, is there a linear relationship between the
response and the regressor.
• Failing to reject 1 = 0, implies that there is
no linear relationship between y and x

2.3.3 The Analysis of Variance

Relationship between t0 and F0:
• For H0: β1 = 0, it can be shown that
t0² = F0
So for testing significance of regression, the t-test and the ANOVA procedure are equivalent (only true in simple linear regression).

Problems

2.4 Interval Estimation in Simple
Linear Regression
• 100(1 − α)% confidence interval for the slope:
β̂1 − t(α/2, n−2) se(β̂1) ≤ β1 ≤ β̂1 + t(α/2, n−2) se(β̂1)
• 100(1 − α)% confidence interval for the intercept:
β̂0 − t(α/2, n−2) se(β̂0) ≤ β0 ≤ β̂0 + t(α/2, n−2) se(β̂0)
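The slope interval can be sketched numerically. The summary numbers below are made up for illustration (β̂1 = 0.6, MSRes = 0.8, Sxx = 10, n = 5), and t(0.025, 3) = 3.182 is taken from a t-table:

```python
import math

b1 = 0.6                          # made-up least-squares slope
se_b1 = math.sqrt(0.8 / 10.0)     # sqrt(MS_res / Sxx)
t_crit = 3.182                    # t(0.025, 3), from a t-table

lo = b1 - t_crit * se_b1          # about -0.300
hi = b1 + t_crit * se_b1          # about  1.500
# The interval contains 0, consistent with failing to reject H0: beta1 = 0.
```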

2.4 Interval Estimation in Simple
Linear Regression
• 100(1-)% Confidence interval for 2

( n  2 ) MS 2 ( n  2 ) MS
RES
  RES
 2 / 2 , n  2  12  / 2 , n  2

2.4.2 Interval Estimation of the Mean
Response

• Let x0 be the level of the regressor variable at which we want to estimate the mean response, i.e., E(y|x0) = μ(y|x0)
– Point estimator of E(y|x0) once the model is fit:
Ê(y|x0) = μ̂(y|x0) = β̂0 + β̂1 x0
– In order to construct a confidence interval on the mean response, we need the variance of the point estimator.

2.4.2 Interval Estimation of the Mean
Response

• The variance of ˆ y| x0 is

   
Var ˆ y | x 0  Var ˆ 0  ˆ 1 x 0  Var [ y  ˆ 1 ( x 0  x )]
 Var ( y )  Var [ˆ 1 ( x 0  x )]
 2  2 ( x0  x ) 2  1 ( x  x ) 2 
   2   0 
n S xx  n S xx 

2.4.2 Interval Estimation of the Mean
Response

• 100(1 − α)% confidence interval for E(y|x0):
μ̂(y|x0) − t(α/2, n−2) √(MSRes [1/n + (x0 − x̄)²/Sxx]) ≤ E(y|x0) ≤ μ̂(y|x0) + t(α/2, n−2) √(MSRes [1/n + (x0 − x̄)²/Sxx])

Notice that the length of the CI depends on the location of the point of interest.
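The dependence on x0 can be seen numerically. A sketch using made-up fit summaries (β̂0 = 2.2, β̂1 = 0.6, n = 5, x̄ = 3, Sxx = 10, MSRes = 0.8; t(0.025, 3) = 3.182 from a t-table):

```python
import math

# Made-up fit summaries for illustration.
n, xbar, Sxx, MS_res = 5, 3.0, 10.0, 0.8
b0, b1 = 2.2, 0.6
t_crit = 3.182   # t(0.025, 3), from a t-table

def mean_response_ci(x0):
    """95% CI for the mean response E(y|x0)."""
    mu_hat = b0 + b1 * x0
    half = t_crit * math.sqrt(MS_res * (1 / n + (x0 - xbar) ** 2 / Sxx))
    return mu_hat - half, mu_hat + half

ci_center = mean_response_ci(3.0)   # narrowest interval, at x0 = xbar
ci_edge = mean_response_ci(5.0)     # wider, away from xbar
```

The interval is narrowest at x0 = x̄ and widens as x0 moves away from it.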
See pages 34-35, text

2.5 Prediction of New Observations

• Suppose we wish to construct a prediction interval on a future observation, y0, corresponding to a particular level of x, say x0.
• The point estimate would be:
ŷ0 = β̂0 + β̂1 x0
• The confidence interval on the mean response at this point is not appropriate for this situation. Why?
2.5 Prediction of New Observations

• Let the random variable ψ be ψ = y0 − ŷ0
• ψ is normally distributed with
– E(ψ) = 0
– Var(ψ) = Var(y0 − ŷ0) = σ² [1 + 1/n + (x0 − x̄)²/Sxx]
(y0 is independent of ŷ0)

2.5 Prediction of New Observations

• 100(1 − α)% prediction interval on a future observation, y0, at x0:
ŷ0 − t(α/2, n−2) √(MSRes [1 + 1/n + (x0 − x̄)²/Sxx]) ≤ y0 ≤ ŷ0 + t(α/2, n−2) √(MSRes [1 + 1/n + (x0 − x̄)²/Sxx])
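A sketch of the prediction interval, again on made-up fit summaries (β̂0 = 2.2, β̂1 = 0.6, n = 5, x̄ = 3, Sxx = 10, MSRes = 0.8; t(0.025, 3) = 3.182 from a t-table), showing that the extra "1 +" term makes it wider than the mean-response CI at the same x0:

```python
import math

# Made-up fit summaries for illustration.
n, xbar, Sxx, MS_res = 5, 3.0, 10.0, 0.8
b0, b1 = 2.2, 0.6
t_crit = 3.182   # t(0.025, 3), from a t-table

def prediction_interval(x0):
    """95% prediction interval for a future observation y0 at x0."""
    y0_hat = b0 + b1 * x0
    half = t_crit * math.sqrt(MS_res * (1 + 1 / n + (x0 - xbar) ** 2 / Sxx))
    return y0_hat - half, y0_hat + half

def mean_ci_half_width(x0):
    """Half-width of the mean-response CI at x0, for comparison."""
    return t_crit * math.sqrt(MS_res * (1 / n + (x0 - xbar) ** 2 / Sxx))

pi = prediction_interval(4.0)
pi_half = (pi[1] - pi[0]) / 2
```

The prediction interval accounts for the variance of the new observation itself, not just the uncertainty in the fitted mean, which answers the "Why?" on the earlier slide.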

Problems

2.6 Coefficient of Determination

• R² – coefficient of determination:
R² = SSR / SST = 1 − SSRes / SST
• Proportion of variation explained by the regressor, x
• For the rocket propellant data, see the computer output in the text
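An illustrative R² computation on a small made-up dataset (not the book's rocket propellant data); b0 and b1 are the least-squares estimates for this data:

```python
# Made-up data for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
ybar = sum(y) / len(y)
b0, b1 = 2.2, 0.6   # least-squares fit for this data

SS_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
SST = sum((yi - ybar) ** 2 for yi in y)
R2 = 1 - SS_res / SST   # 0.6: the line explains 60% of the variation
```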

2.6 Coefficient of Determination

• R2 can be misleading!
– Simply adding more terms to the model will
increase R2
– As the range of the regressor variable increases
(decreases), R2 generally increases (decreases).
– R2 does not indicate the appropriateness of a
linear model

Problems

2.7 Considerations in the Use of Regression

• Extrapolating
• Extreme points will often influence the slope.
• Outliers can disturb the least-squares fit
• A linear relationship does not imply a cause-and-effect relationship (interesting mental defective example in the book, pg. 40)
• Sometimes, the value of the regressor variable is unknown and must itself be estimated or predicted.

2.8 Regression Through the Origin
• The no-intercept model is
y = β1x + ε
• This model would be appropriate for situations where the origin (0, 0) has some meaning.
• A scatter diagram can aid in determining whether an intercept or no-intercept model should be used.
• In addition, the practitioner could test both models. Examine the t-tests and residual mean squares.
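For the no-intercept model, least squares gives the closed form β̂1 = Σ xi yi / Σ xi². A minimal sketch on made-up data that roughly passes through the origin:

```python
# Made-up data for illustration (roughly y = 2x through the origin).
x = [1, 2, 3]
y = [2.1, 3.9, 6.0]

# No-intercept least-squares slope: b1 = sum(x*y) / sum(x^2)
b1 = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)
residuals = [yi - b1 * xi for xi, yi in zip(x, y)]
# Note: unlike the intercept model, these residuals need not sum to zero.
```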

