DOE RLS Theory

Linear Regression Analysis, 5th edition
Montgomery, Peck & Vining

1.1 Regression and Model Building

• Regression analysis is a statistical technique for investigating and modeling the relationship between variables.
• Equation of a straight line (classical): y = mx + b. In regression we usually write this as
$$y = \beta_0 + \beta_1 x$$

• Not all observations will fall exactly on a straight line:
$$y = \beta_0 + \beta_1 x + \varepsilon$$
where $\varepsilon$ represents error:
  – it is a random variable that accounts for the failure of the model to fit the data exactly
  – $\varepsilon \sim N(0, \sigma^2)$

Delivery time example

• Simple Linear Regression Model:
$$y = \beta_0 + \beta_1 x + \varepsilon$$
where
  y – dependent (response) variable
  x – independent (regressor/predictor) variable
  $\beta_0$ – intercept
  $\beta_1$ – slope
  $\varepsilon$ – random error term
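As a minimal sketch of this model (Python with NumPy; the parameter values, seed, and grid below are illustrative assumptions, not from the text), data can be simulated as:

    import numpy as np

    rng = np.random.default_rng(0)               # fixed seed, illustrative
    beta0, beta1, sigma = 10.0, 2.5, 1.0         # hypothetical true parameters
    x = np.linspace(1.0, 10.0, 25)               # regressor values
    eps = rng.normal(0.0, sigma, size=x.size)    # eps ~ N(0, sigma^2)
    y = beta0 + beta1 * x + eps                  # response: mean plus random error

Each simulated y scatters around the line beta0 + beta1*x with standard deviation sigma.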

• The mean response at any value x of the regressor variable is
$$E(y \mid x) = \beta_0 + \beta_1 x$$
• The variance of y at any given x is
$$\text{Var}(y \mid x) = \sigma^2$$
Figure 1.3 Linear regression approximation of a complex relationship.
Figure 1.4 Piecewise linear approximation of a complex relationship.
Figure 1.5 The danger of extrapolation in regression.
Chapter 2

Simple Linear Regression

2.1 Simple Linear Regression Model

• Single regressor x; response y. Population regression model:
$$y = \beta_0 + \beta_1 x + \varepsilon$$
• $\beta_0$ – intercept: if x = 0 is in the range of the data, then $\beta_0$ is the mean of the distribution of the response y when x = 0; if x = 0 is not in the range, then $\beta_0$ has no practical interpretation
• $\beta_1$ – slope: change in the mean of the distribution of the response produced by a unit change in x
• $\varepsilon$ – random error

• The response y is a random variable
• There is a probability distribution for y at each value of x
  – Mean: $E(y \mid x) = \beta_0 + \beta_1 x$
  – Variance: $\text{Var}(y \mid x) = \text{Var}(\beta_0 + \beta_1 x + \varepsilon) = \sigma^2$

2.2 Least-Squares Estimation of the Parameters

• $\beta_0$ and $\beta_1$ are unknown and must be estimated. Sample regression model:
$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad i = 1, 2, \ldots, n$$
• Least-squares estimation seeks to minimize the sum of squares of the differences between the observed responses $y_i$ and the straight line:
$$S(\beta_0, \beta_1) = \sum_i \varepsilon_i^2 = \sum_i (y_i - \beta_0 - \beta_1 x_i)^2$$
• Let $\hat\beta_0, \hat\beta_1$ represent the least-squares estimators of $\beta_0$ and $\beta_1$, respectively.
• These estimators must satisfy:
$$\frac{\partial S}{\partial \beta_0}\bigg|_{\hat\beta_0, \hat\beta_1} = -2 \sum_i (y_i - \hat\beta_0 - \hat\beta_1 x_i) = 0$$
$$\frac{\partial S}{\partial \beta_1}\bigg|_{\hat\beta_0, \hat\beta_1} = -2 \sum_i (y_i - \hat\beta_0 - \hat\beta_1 x_i)\, x_i = 0$$
• Simplifying yields the least-squares normal equations:
$$n\hat\beta_0 + \hat\beta_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i$$
$$\hat\beta_0 \sum_{i=1}^{n} x_i + \hat\beta_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} y_i x_i$$

• Solving the normal equations yields the ordinary least-squares estimators:
$$\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}$$
$$\hat\beta_1 = \frac{\sum_{i=1}^{n} y_i x_i - \frac{\left(\sum_{i=1}^{n} y_i\right)\left(\sum_{i=1}^{n} x_i\right)}{n}}{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}$$
• The fitted simple linear regression model:
$$\hat{y} = \hat\beta_0 + \hat\beta_1 x$$
• Sum-of-squares notation:
$$S_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n} = \sum_{i=1}^{n} (x_i - \bar{x})^2$$
$$S_{xy} = \sum_{i=1}^{n} y_i x_i - \frac{\left(\sum_{i=1}^{n} y_i\right)\left(\sum_{i=1}^{n} x_i\right)}{n} = \sum_{i=1}^{n} y_i (x_i - \bar{x})$$
• Then
$$\hat\beta_1 = \frac{\sum_{i=1}^{n} y_i x_i - \frac{\left(\sum_{i=1}^{n} y_i\right)\left(\sum_{i=1}^{n} x_i\right)}{n}}{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}} = \frac{S_{xy}}{S_{xx}}$$

• Residuals: $e_i = y_i - \hat{y}_i$
• Residuals will be used to determine the adequacy of the model
• Just because we can fit a linear model doesn't mean that we should:
  – How well does this equation fit the data?
  – Is the model likely to be useful as a predictor?
  – Are any of the basic assumptions (such as constant variance and uncorrelated errors) violated? If so, how serious is this?

Computer Output (Minitab)

2.2.2 Properties of the Least-Squares Estimators and the Fitted Regression Model

• The ordinary least-squares (OLS) estimator of the slope is a linear combination of the observations $y_i$:
$$\hat\beta_1 = \frac{S_{xy}}{S_{xx}} = \sum_{i=1}^{n} c_i y_i$$
where
$$c_i = \frac{x_i - \bar{x}}{S_{xx}}, \quad \sum_{i=1}^{n} c_i = 0, \quad \sum_{i=1}^{n} c_i^2 = \frac{1}{S_{xx}}$$
This representation is useful in deriving the expected value and variance of $\hat\beta_1$.

• The least-squares estimators are unbiased estimators of their respective parameters:
$$E(\hat\beta_1) = \beta_1, \qquad E(\hat\beta_0) = \beta_0$$
• The variances are
$$\text{Var}(\hat\beta_1) = \frac{\sigma^2}{S_{xx}}, \qquad \text{Var}(\hat\beta_0) = \sigma^2\left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right)$$
• The OLS estimators are best linear unbiased estimators (BLUE)

• Useful properties of the least-squares fit (verified numerically in the sketch below):
  1. $\sum_{i=1}^{n} (y_i - \hat{y}_i) = \sum_{i=1}^{n} e_i = 0$
  2. $\sum_{i=1}^{n} y_i = \sum_{i=1}^{n} \hat{y}_i$
  3. The least-squares regression line always passes through the centroid $(\bar{x}, \bar{y})$ of the data.
  4. $\sum_{i=1}^{n} x_i e_i = 0$
  5. $\sum_{i=1}^{n} \hat{y}_i e_i = 0$

Problems

2.2.3 Estimation of σ²

• Residual (error) sum of squares:
$$SS_{Res} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} y_i^2 - n\bar{y}^2 - \hat\beta_1 S_{xy} = SS_T - \hat\beta_1 S_{xy}$$
where $SS_T = \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} y_i^2 - n\bar{y}^2$ is the total sum of squares.

• Unbiased estimator of σ²:
$$\hat\sigma^2 = \frac{SS_{Res}}{n-2} = MS_{Res}$$
• The quantity n − 2 is the number of degrees of freedom for the residual sum of squares.
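A minimal sketch of this estimate (Python with NumPy; same illustrative data):

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])    # illustrative data
    y = np.array([2.8, 5.4, 8.1, 10.2, 13.0, 15.1])
    n = x.size
    b1 = (y * (x - x.mean())).sum() / ((x - x.mean())**2).sum()
    b0 = y.mean() - b1 * x.mean()

    SS_res = ((y - (b0 + b1 * x))**2).sum()   # residual sum of squares
    MS_res = SS_res / (n - 2)                 # unbiased estimate of sigma^2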

• $\hat\sigma^2$ depends on the residual sum of squares. Consequently:
  – Any violation of the assumptions on the model errors could damage the usefulness of this estimate
  – A misspecification of the model can damage the usefulness of this estimate
  – This estimate is model-dependent

Problems

2.3 Hypothesis Testing on the Slope and Intercept

• Three assumptions are needed to apply procedures such as hypothesis tests and confidence intervals. The model errors $\varepsilon_i$:
  – are normally distributed
  – are independently distributed
  – have constant variance
i.e. $\varepsilon_i \sim NID(0, \sigma^2)$

2.3.1 Use of t-Tests

Slope: $H_0\colon \beta_1 = \beta_{10}$ versus $H_1\colon \beta_1 \neq \beta_{10}$
• Standard error of the slope: $se(\hat\beta_1) = \sqrt{MS_{Res}/S_{xx}}$
• Test statistic:
$$t_0 = \frac{\hat\beta_1 - \beta_{10}}{se(\hat\beta_1)}$$
• Reject $H_0$ if $|t_0| > t_{\alpha/2,\,n-2}$
• Can also use the P-value approach
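A minimal sketch of the slope test for H0: beta1 = 0 (Python with NumPy/SciPy; same illustrative data):

    import numpy as np
    from scipy import stats

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])    # illustrative data
    y = np.array([2.8, 5.4, 8.1, 10.2, 13.0, 15.1])
    n = x.size
    Sxx = ((x - x.mean())**2).sum()
    b1 = (y * (x - x.mean())).sum() / Sxx
    b0 = y.mean() - b1 * x.mean()
    MS_res = ((y - (b0 + b1 * x))**2).sum() / (n - 2)

    se_b1 = np.sqrt(MS_res / Sxx)                 # standard error of the slope
    t0 = (b1 - 0.0) / se_b1                       # test statistic for H0: beta1 = 0
    p_value = 2 * stats.t.sf(abs(t0), df=n - 2)   # two-sided P-value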

Intercept: $H_0\colon \beta_0 = \beta_{00}$ versus $H_1\colon \beta_0 \neq \beta_{00}$
• Standard error of the intercept: $se(\hat\beta_0) = \sqrt{MS_{Res}\left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right)}$
• Test statistic:
$$t_0 = \frac{\hat\beta_0 - \beta_{00}}{se(\hat\beta_0)}$$
• Reject $H_0$ if $|t_0| > t_{\alpha/2,\,n-2}$
• Can also use the P-value approach

2.3.2 Testing Significance of Regression

$H_0\colon \beta_1 = 0$ versus $H_1\colon \beta_1 \neq 0$
• This tests the significance of regression; that is, whether there is a linear relationship between the response and the regressor.
• Failing to reject $H_0\colon \beta_1 = 0$ implies that there is no linear relationship between y and x.

2.3.3 The Analysis of Variance

• Relationship between $t_0$ and $F_0$: for $H_0\colon \beta_1 = 0$, it can be shown that
$$t_0^2 = F_0$$
So for testing significance of regression, the t-test and the ANOVA procedure are equivalent (true only in simple linear regression).
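A minimal numerical check of this equivalence (Python with NumPy; same illustrative data; the regression sum of squares is $SS_R = \hat\beta_1 S_{xy}$ with one degree of freedom):

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])    # illustrative data
    y = np.array([2.8, 5.4, 8.1, 10.2, 13.0, 15.1])
    n = x.size
    Sxx = ((x - x.mean())**2).sum()
    Sxy = (y * (x - x.mean())).sum()
    b1 = Sxy / Sxx
    b0 = y.mean() - b1 * x.mean()
    MS_res = ((y - (b0 + b1 * x))**2).sum() / (n - 2)

    t0 = b1 / np.sqrt(MS_res / Sxx)   # t statistic for H0: beta1 = 0
    F0 = (b1 * Sxy) / MS_res          # F statistic: MS_R / MS_Res, 1 df for regression
    print(np.isclose(t0**2, F0))      # True: t0^2 equals F0 in simple linear regression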

Problems

2.4 Interval Estimation in Simple Linear Regression

• 100(1 − α)% confidence interval for the slope:
$$\hat\beta_1 - t_{\alpha/2,\,n-2}\, se(\hat\beta_1) \le \beta_1 \le \hat\beta_1 + t_{\alpha/2,\,n-2}\, se(\hat\beta_1)$$
• 100(1 − α)% confidence interval for the intercept:
$$\hat\beta_0 - t_{\alpha/2,\,n-2}\, se(\hat\beta_0) \le \beta_0 \le \hat\beta_0 + t_{\alpha/2,\,n-2}\, se(\hat\beta_0)$$
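A minimal sketch of both intervals at alpha = 0.05 (Python with NumPy/SciPy; same illustrative data):

    import numpy as np
    from scipy import stats

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])    # illustrative data
    y = np.array([2.8, 5.4, 8.1, 10.2, 13.0, 15.1])
    n = x.size
    Sxx = ((x - x.mean())**2).sum()
    b1 = (y * (x - x.mean())).sum() / Sxx
    b0 = y.mean() - b1 * x.mean()
    MS_res = ((y - (b0 + b1 * x))**2).sum() / (n - 2)

    t_crit = stats.t.ppf(1 - 0.05 / 2, df=n - 2)             # t_{alpha/2, n-2}
    se_b1 = np.sqrt(MS_res / Sxx)
    se_b0 = np.sqrt(MS_res * (1 / n + x.mean()**2 / Sxx))
    ci_slope = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)     # 95% CI for beta1
    ci_intercept = (b0 - t_crit * se_b0, b0 + t_crit * se_b0) # 95% CI for beta0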

• 100(1 − α)% confidence interval for σ²:
$$\frac{(n-2)\, MS_{Res}}{\chi^2_{\alpha/2,\,n-2}} \le \sigma^2 \le \frac{(n-2)\, MS_{Res}}{\chi^2_{1-\alpha/2,\,n-2}}$$

2.4.2 Interval Estimation of the Mean Response

• Let $x_0$ be the level of the regressor variable at which we want to estimate the mean response, i.e. $E(y \mid x_0) = \mu_{y|x_0}$
• Point estimator of $E(y \mid x_0)$ once the model is fit:
$$\hat\mu_{y|x_0} = \hat\beta_0 + \hat\beta_1 x_0$$
• In order to construct a confidence interval on the mean response, we need the variance of the point estimator.

• The variance of $\hat\mu_{y|x_0}$ is
$$\text{Var}(\hat\mu_{y|x_0}) = \text{Var}(\hat\beta_0 + \hat\beta_1 x_0) = \text{Var}[\bar{y} + \hat\beta_1(x_0 - \bar{x})] = \text{Var}(\bar{y}) + \text{Var}[\hat\beta_1(x_0 - \bar{x})]$$
$$= \frac{\sigma^2}{n} + \frac{\sigma^2 (x_0 - \bar{x})^2}{S_{xx}} = \sigma^2\left[\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}\right]$$

• 100(1 − α)% confidence interval for $E(y \mid x_0)$:
$$\hat\mu_{y|x_0} - t_{\alpha/2,\,n-2}\sqrt{MS_{Res}\left(\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}\right)} \le E(y \mid x_0) \le \hat\mu_{y|x_0} + t_{\alpha/2,\,n-2}\sqrt{MS_{Res}\left(\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}\right)}$$
• Notice that the length of the CI depends on the location of the point of interest: the interval is narrowest at $x_0 = \bar{x}$ and widens as $x_0$ moves away from $\bar{x}$.
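A minimal sketch of this interval at an illustrative point (Python with NumPy/SciPy; same illustrative data; x0 = 3.5 is an arbitrary choice):

    import numpy as np
    from scipy import stats

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])    # illustrative data
    y = np.array([2.8, 5.4, 8.1, 10.2, 13.0, 15.1])
    n = x.size
    Sxx = ((x - x.mean())**2).sum()
    b1 = (y * (x - x.mean())).sum() / Sxx
    b0 = y.mean() - b1 * x.mean()
    MS_res = ((y - (b0 + b1 * x))**2).sum() / (n - 2)

    x0 = 3.5                                         # point of interest (arbitrary)
    mu_hat = b0 + b1 * x0                            # estimated mean response at x0
    half = stats.t.ppf(0.975, df=n - 2) * np.sqrt(
        MS_res * (1 / n + (x0 - x.mean())**2 / Sxx))
    ci_mean = (mu_hat - half, mu_hat + half)         # 95% CI on E(y | x0)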
See pages 34-35 of the text.

2.5 Prediction of New Observations

• Suppose we wish to construct a prediction interval on a future observation $y_0$ corresponding to a particular level of the regressor, say $x_0$.
• The point estimate is:
$$\hat{y}_0 = \hat\beta_0 + \hat\beta_1 x_0$$
• The confidence interval on the mean response at this point is not appropriate for this situation. Why? Because it accounts only for the uncertainty in the estimated mean, not for the variability of an individual future observation around that mean.
• Let the random variable $\psi = y_0 - \hat{y}_0$
• $\psi$ is normally distributed with
  – $E(\psi) = 0$
  – $\text{Var}(\psi) = \text{Var}(y_0 - \hat{y}_0) = \sigma^2\left[1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}\right]$
  (since $y_0$ is independent of $\hat{y}_0$)

• 100(1 − α)% prediction interval on a future observation $y_0$ at $x_0$:
$$\hat{y}_0 - t_{\alpha/2,\,n-2}\sqrt{MS_{Res}\left(1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}\right)} \le y_0 \le \hat{y}_0 + t_{\alpha/2,\,n-2}\sqrt{MS_{Res}\left(1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}\right)}$$
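A minimal sketch of the prediction interval (Python with NumPy/SciPy; same illustrative data and the same arbitrary x0; the extra 1 under the square root is what widens it relative to the CI on the mean):

    import numpy as np
    from scipy import stats

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])    # illustrative data
    y = np.array([2.8, 5.4, 8.1, 10.2, 13.0, 15.1])
    n = x.size
    Sxx = ((x - x.mean())**2).sum()
    b1 = (y * (x - x.mean())).sum() / Sxx
    b0 = y.mean() - b1 * x.mean()
    MS_res = ((y - (b0 + b1 * x))**2).sum() / (n - 2)

    x0 = 3.5                                         # arbitrary prediction point
    y0_hat = b0 + b1 * x0                            # point prediction
    half = stats.t.ppf(0.975, df=n - 2) * np.sqrt(
        MS_res * (1 + 1 / n + (x0 - x.mean())**2 / Sxx))
    pi = (y0_hat - half, y0_hat + half)              # 95% prediction interval for y0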

Problems

2.6 Coefficient of Determination

• $R^2$, the coefficient of determination:
$$R^2 = \frac{SS_R}{SS_T} = 1 - \frac{SS_{Res}}{SS_T}$$
• Proportion of variation explained by the regressor x
• For the rocket propellant data, see the computed value in the text's output.
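A minimal sketch of the computation (Python with NumPy; same illustrative data, not the rocket propellant data):

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])    # illustrative data
    y = np.array([2.8, 5.4, 8.1, 10.2, 13.0, 15.1])
    b1 = (y * (x - x.mean())).sum() / ((x - x.mean())**2).sum()
    b0 = y.mean() - b1 * x.mean()

    SS_res = ((y - (b0 + b1 * x))**2).sum()   # residual sum of squares
    SS_T = ((y - y.mean())**2).sum()          # total sum of squares
    R2 = 1 - SS_res / SS_T                    # proportion of variation explained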

• $R^2$ can be misleading!
  – Simply adding more terms to the model will increase $R^2$
  – As the range of the regressor variable increases (decreases), $R^2$ generally increases (decreases)
  – $R^2$ does not indicate the appropriateness of a linear model

Problems

2.7 Considerations in the Use of Regression

• Extrapolation is dangerous.
• Extreme points will often influence the slope.
• Outliers can disturb the least-squares fit.
• A linear relationship does not imply a cause-and-effect relationship (see the interesting mental-defective example in the book, p. 40).
• Sometimes the value of the regressor variable is unknown and must itself be estimated or predicted.

2.8 Regression Through the Origin

• The no-intercept model is
$$y = \beta_1 x + \varepsilon$$
• This model is appropriate for situations where the origin (0, 0) has some physical meaning.
• A scatter diagram can aid in determining whether an intercept or a no-intercept model should be used.
• In addition, the practitioner could fit both models and compare them: examine the t-tests and the residual mean squares. A sketch of the no-intercept fit follows.
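For the no-intercept model, minimizing $\sum (y_i - \beta_1 x_i)^2$ gives the closed form $\hat\beta_1 = \sum x_i y_i / \sum x_i^2$. A minimal sketch (Python with NumPy; same illustrative data):

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])    # illustrative data
    y = np.array([2.8, 5.4, 8.1, 10.2, 13.0, 15.1])

    b1_origin = (x * y).sum() / (x**2).sum()   # least-squares slope through the origin
    resid = y - b1_origin * x                  # residuals of the no-intercept fit

Comparing the residual mean square of this fit with that of the intercept model is one of the checks suggested above.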

