
Chapter 13: Simple Linear Regression

Course Name: MAS2020

Lecturer: Duong Thi Hong

Hanoi, 2023
13.1 Simple Linear Regression Models
13.2 Determining the Simple Linear Regression Equation
13.3 Measures of Variation
13.4 Assumptions of Regression
13.5 Residual Analysis
13.7 Inferences About the Slope and Correlation Coefficient
13.9 Potential Pitfalls in Regression


13.1 Simple Linear Regression Models

Simple linear regression models examine the straight-line (linear) relationship between a dependent variable Y and a single independent variable X.
13.2 Determining the Simple Linear Regression Equation

The most common approach to finding b0 and b1 is the least-squares method, which minimizes the sum of squared differences between the observed values Yi and the predicted values Ŷi.

Computing the Y Intercept, b0, and the Slope, b1

b1 = SSXY / SSX, where SSXY = Σ(Xi − X̄)(Yi − Ȳ) and SSX = Σ(Xi − X̄)²
b0 = Ȳ − b1X̄
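The least-squares computation can be sketched in a few lines of Python; the data below are made up for illustration and lie exactly on the line y = 2x + 1, so the fit recovers the coefficients exactly.

```python
# Minimal sketch of the least-squares method: slope b1 and intercept b0.
def least_squares(x, y):
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # b1 = SSXY / SSX, b0 = Ybar - b1 * Xbar
    ssxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ssx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = ssxy / ssx
    return y_bar - b1 * x_bar, b1

# Illustrative data lying exactly on y = 2x + 1
b0, b1 = least_squares([1, 2, 3, 4, 5], [3, 5, 7, 9, 11])
print(b0, b1)  # -> 1.0 2.0
```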
13.3 Measures of Variation

Yi = the observed value of Y
Ȳ = the mean value of Y
Ŷi = the predicted value of Y for the given Xi value
Measures of Variation in Regression
SST = SSR + SSE

SST = Total sum of squares


SST measures the variation of the Yi values around their mean, Ȳ.
SSR = Regression sum of squares = Explained variation
SSR represents variation that is explained by the relationship
between X and Y
SSE = Error sum of squares = Unexplained variation
SSE represents variation due to factors other than X
Computing the Sum of Squares

SST = Σ(Yi − Ȳ)² = ΣYi² − (ΣYi)²/n
SSR = Σ(Ŷi − Ȳ)² = b0ΣYi + b1ΣXiYi − (ΣYi)²/n
SSE = Σ(Yi − Ŷi)² = ΣYi² − b0ΣYi − b1ΣXiYi

(All sums run over i = 1, …, n.)
SSR = 18934.93478, SSE = 13665.56552, SST = 32600.5
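The identity SST = SSR + SSE can be verified numerically. The data in this sketch are illustrative only, not the dataset behind the figures above.

```python
import math

# Illustrative data (not the dataset behind the figures above)
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) \
     / sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained variation
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained variation

assert abs(sst - (ssr + sse)) < 1e-9                   # SST = SSR + SSE
r2 = ssr / sst                     # coefficient of determination
s_yx = math.sqrt(sse / (n - 2))    # standard error of the estimate
```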
Coefficient of Determination
The coefficient of determination is defined by

r² = SSR / SST

The coefficient of determination measures the proportion of variation in Y that is explained by the variation in the independent variable X. It satisfies 0 ≤ r² ≤ 1 and measures the strength of the linear relationship between X and Y.
Standard Error of the Estimate
The standard deviation of the variation of observations around the
regression line is estimated by the standard error of the estimate:
SYX = √(SSE / (n − 2))
SY X is a measure of the variation of observed Y values from the
predicted Y values.
13.4 Assumptions of Regression

Linearity: The relationship between the variables is linear.
Independence of errors: The errors are independent of one another.
Normality of error: The error values are normally distributed for any given value of X.
Equal variance: The variance of the errors is constant for all values of X.
13.5 Residual Analysis

Residual
The residual ei is defined by

ei = Yi − Ŷi

Residual analysis visually evaluates the assumptions of regression and helps you determine whether the regression model that has been selected is appropriate.
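As a small illustration with made-up data: when the fitted model includes an intercept, the least-squares residuals always sum to (numerically) zero, so a residual plot should scatter randomly around the horizontal zero line.

```python
# Compute residuals e_i = Y_i - Yhat_i for a least-squares line (toy data)
x = [1, 2, 3, 4, 5]
y = [2.0, 4.1, 5.9, 8.2, 9.8]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) \
     / sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# With an intercept in the model the residuals sum to (numerically) zero;
# plot residuals against x (or against time order) to check the assumptions.
assert abs(sum(residuals)) < 1e-9
```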
Evaluate Linearity
Plot the residuals on the vertical axis against the corresponding
Xi values on the horizontal axis.
If the linear model is appropriate for the data, the plot of the
residuals will look like a random scattering of points.
Evaluate Independence of Errors
Plot the residuals in the order or sequence in which the data were collected.
If the plot of the residuals versus the time variable shows a cyclical pattern, the assumption of independence is violated.
Evaluate Normality of Error
Construct a histogram
Use a stem-and-leaf display, a boxplot, or a normal probability plot

[Figure: residual display for which the normality assumption is appropriate]


Evaluate Equal Variance
Use a plot of the residuals with Xi .
If there is approximately the same amount of variation in the
residuals at each value of X , the equal-variance assumption is
appropriate.

[Figure: residual plot showing a violation of equal variance]


13.7 Inferences About the Slope and Correlation Coefficient

To determine the existence of a linear relationship between the X and Y variables, we test the hypotheses:

H0 : β1 = 0
H1 : β1 ≠ 0

If H0 is rejected, there is evidence of a linear relationship.


The standard error of b1 is Sb1 = SYX / √SSX, where SSX = Σ(Xi − X̄)².

t Test for the Slope

tSTAT = (b1 − 0) / Sb1

If H0 is true, tSTAT follows a t distribution with n − 2 degrees of freedom.
Reject H0 if |tSTAT| > tα/2,n−2; otherwise, do not reject H0.
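The t test can be sketched end to end with made-up data (n = 8, so 6 degrees of freedom); the critical value 2.447 for α = 0.05 quoted in the comment is taken from a t table.

```python
import math

# Made-up sample; tests H0: beta1 = 0 vs H1: beta1 != 0
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.2, 2.8, 4.5, 3.7, 5.5, 5.1, 6.8, 7.2]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
ssx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ssx
b0 = y_bar - b1 * x_bar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s_yx = math.sqrt(sse / (n - 2))   # standard error of the estimate
s_b1 = s_yx / math.sqrt(ssx)      # standard error of b1
t_stat = (b1 - 0) / s_b1

# t_{0.025, 6} is about 2.447; |t_stat| above it -> reject H0
print(t_stat > 2.447)  # -> True
```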
Example 1. Attempting to analyze the relationship between
advertising and sales, the owner of a furniture store recorded the
monthly advertising budget ($ thousands) and the sales ($ millions) for
a sample of 12 months. The data are listed here.

Is there evidence of a linear relationship between advertising and sales? (α = 5%)
F Test for the Slope

FSTAT = MSR / MSE

where MSR = SSR / 1 = SSR and MSE = SSE / (n − 2).

The FSTAT test statistic follows an F distribution with 1 and n − 2 degrees of freedom.
Reject H0 if FSTAT > Fα; otherwise, do not reject H0.
ANOVA Table for Testing the Significance of a Regression Coefficient
Confidence Interval Estimate for the Slope
A 1 − α confidence interval for β1 is

b1 − tα/2,n−2 Sb1 ≤ β1 ≤ b1 + tα/2,n−2 Sb1
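The interval can be computed directly; the data below are made up (n = 6, so 4 degrees of freedom), and the critical value t_{0.025,4} ≈ 2.776 is taken from a t table.

```python
import math

# Made-up sample for a 95% CI on beta1
x = [1, 2, 3, 4, 5, 6]
y = [2.0, 4.2, 5.8, 8.1, 9.9, 12.0]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
ssx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ssx
b0 = y_bar - b1 * x_bar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s_b1 = math.sqrt(sse / (n - 2)) / math.sqrt(ssx)

t_crit = 2.776                  # t_{0.025, n-2} for n - 2 = 4 df (t table)
lower = b1 - t_crit * s_b1
upper = b1 + t_crit * s_b1
print(lower, upper)             # interval that covers b1
```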

Example 2. In Example 1, construct a 95% confidence interval estimate of the population slope, β1.
t Test for the Correlation Coefficient
Test the hypotheses:

H0 : ρ = 0 (no linear relationship)


H1 : ρ ̸= 0 (linear relationship does exist)

Test statistic: tSTAT = r√(n − 2) / √(1 − r²)
If H0 is true, tST AT follows a t distribution with n − 2 degrees of
freedom.
Reject H0 if |tST AT | > tα/2,n−2 , otherwise, do not reject H0 .

Note. r = √r² if b1 > 0; r = −√r² if b1 < 0.
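A sketch of this test with assumed values: r² = 0.64, a positive sample slope (so r = +0.8), and n = 20; the critical value 2.101 for 18 degrees of freedom quoted in the comment is from a t table.

```python
import math

r2, n = 0.64, 20          # assumed values for illustration
b1_positive = True        # sign of the sample slope gives the sign of r
r = math.sqrt(r2) if b1_positive else -math.sqrt(r2)

t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r2)
# Compare |t_stat| with t_{0.025, 18}, about 2.101
print(t_stat)  # -> about 5.657
```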
Example 3. You are testing the null hypothesis that there is no linear
relationship between two variables, X and Y . From your sample of
n = 20, you determine that SSR = 60 and SSE = 40.
a. What is the value of FST AT ?
b. At the α = 0.05 level of significance, what is the critical value?
c. Based on your answers to (a) and (b), what statistical decision
should you make?
d. Compute the correlation coefficient by first computing r2 and
assuming that b1 is negative.
e. At the 0.05 level of significance, is there a significant correlation
between X and Y ?
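Example 3's arithmetic can be checked numerically under its stated values (SSR = 60, SSE = 40, n = 20); the critical value F_{0.05,1,18} ≈ 4.41 quoted in the comment is from an F table.

```python
ssr, sse, n = 60.0, 40.0, 20   # values stated in Example 3

msr = ssr / 1                  # MSR = SSR in simple regression
mse = sse / (n - 2)
f_stat = msr / mse             # part (a): 60 / (40/18) = 27.0

r2 = ssr / (ssr + sse)         # SST = SSR + SSE = 100, so r^2 = 0.6
r = -(r2 ** 0.5)               # part (d): b1 assumed negative, so r < 0

# Part (b): F_{0.05, 1, 18} is about 4.41 (F table), so f_stat = 27 > 4.41
# and H0 is rejected in part (c): evidence of a linear relationship.
print(f_stat, r)
```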
13.9 Potential Pitfalls in Regression

Lacking an awareness of the assumptions of least-squares regression.
Not knowing how to evaluate the assumptions of least-squares
regression.
Not knowing the alternatives to least-squares regression if a
particular assumption is violated.
Using a regression model without knowledge of the subject matter.
Extrapolating outside the relevant range.
Concluding that an identified significant relationship always reflects a cause-and-effect relationship.
Seven Steps for Avoiding the Potential Pitfalls
1. Be clear about the problem or goal being investigated and the
variables that need to be examined.
2. Construct a scatter plot to observe the possible relationship
between X and Y .
3. Perform a residual analysis to check the assumptions of regression.
4. If there are violations of the assumptions, use alternative methods
to least-squares regression or alternative least-squares models.
5. If there are no violations of the assumptions, carry out tests for
the significance of the regression coefficients and develop
confidence and prediction intervals.
6. Refrain from making predictions and forecasts outside the relevant
range of the independent variable.
7. Remember that the relationships identified in observational
studies may or may not be due to cause-and-effect relationships.
(While causation implies correlation, correlation does not imply
causation.)
Exercises

13.1, 13.2, 13.3, 13.5 (p. 525, 526)


13.11-13.15, 13.17, 13.18 (p. 531)
13.39, 13.40, 13.42, 13.45 (p. 545)
