IE354 Slides 10 Chp11
and statistics
Spring 2023-2024
Chapter 11
Simple Linear Regression and Correlation
Dr. Abedallah Al Kader
CHAPTER OUTLINE
11-1 Empirical Models
11-2 Simple Linear Regression
11-3 Properties of the Least Squares Estimators
11-4 Hypothesis Tests in Simple Linear Regression
    11-4.1 Use of t-tests
    11-4.2 Analysis of variance approach to test significance of regression
11-5 Confidence Intervals
    11-5.1 Confidence intervals on the slope and intercept
    11-5.2 Confidence interval on the mean response
11-6 Prediction of New Observations
11-7 Adequacy of the Regression Model
    11-7.1 Residual analysis
    11-7.2 Coefficient of determination (R^2)
11-8 Correlation
11-1: Empirical Models
• Many problems in engineering and science involve exploring the relationships between two or more variables.
• Regression analysis is a statistical technique that is very useful for these types of problems.
• For example, in a chemical process, suppose that the yield of the product is related to the process operating temperature.
• Regression analysis can be used to build a model to predict yield at a given temperature level.
11-2: Simple Linear Regression
$$E(Y \mid x) = \beta_0 + \beta_1 x$$
• Suppose that we have n pairs of observations (x1, y1),
(x2, y2), …, (xn, yn).
• Figure 11-3 shows a typical scatter plot of observed
data and a candidate for the estimated regression line.
• The estimates of β0 and β1 should result in a line that
is (in some sense) a “best fit” to the data.
Least Squares Estimates
The least-squares estimates of the intercept and slope in the simple
linear regression model are
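$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$ (11-1)
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} y_i x_i - \dfrac{\left(\sum_{i=1}^{n} y_i\right)\left(\sum_{i=1}^{n} x_i\right)}{n}}{\sum_{i=1}^{n} x_i^2 - \dfrac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}$$ (11-2)
where $\bar{y} = (1/n)\sum_{i=1}^{n} y_i$ and $\bar{x} = (1/n)\sum_{i=1}^{n} x_i$.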
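The least-squares formulas above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `fit_simple_linear_regression` and the toy data are hypothetical, and the intercept is computed as the mean of y minus the slope times the mean of x.

```python
def fit_simple_linear_regression(x, y):
    """Return (intercept, slope) least-squares estimates for paired data."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    sum_x, sum_y = sum(x), sum(y)
    # Corrected sum of squares of x and corrected sum of cross-products
    s_xx = sum(xi * xi for xi in x) - sum_x ** 2 / n
    s_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n
    if s_xx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    slope = s_xy / s_xx                          # Equation 11-2
    intercept = sum_y / n - slope * sum_x / n    # y-bar minus slope times x-bar
    return intercept, slope


# Hypothetical toy data that lie exactly on the line y = 1 + 2x
b0, b1 = fit_simple_linear_regression([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(b0, b1)
```

Because the toy points fall exactly on a line, the fit should recover that line's intercept and slope, which is a quick sanity check on the formulas.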
The fitted or estimated regression line is therefore
$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$$ (11-3)
EXAMPLE 11-1 Oxygen Purity
We will fit a simple linear regression model to the oxygen purity data in
Table 11-1. The following quantities may be computed:
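$n = 20$, $\sum_{i=1}^{20} x_i = 23.92$, $\sum_{i=1}^{20} y_i = 1{,}843.21$
$\bar{x} = 1.1960$, $\bar{y} = 92.1605$
$\sum_{i=1}^{20} y_i^2 = 170{,}044.5321$, $\sum_{i=1}^{20} x_i^2 = 29.2892$, $\sum_{i=1}^{20} x_i y_i = 2{,}214.6566$

$$S_{xx} = \sum_{i=1}^{20} x_i^2 - \frac{\left(\sum_{i=1}^{20} x_i\right)^2}{20} = 29.2892 - \frac{(23.92)^2}{20} = 0.68088$$
and
$$S_{xy} = \sum_{i=1}^{20} x_i y_i - \frac{\left(\sum_{i=1}^{20} x_i\right)\left(\sum_{i=1}^{20} y_i\right)}{20} = 2{,}214.6566 - \frac{(23.92)(1{,}843.21)}{20} = 10.17744$$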
Therefore, the least squares estimates of the slope and intercept are
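$$\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} = \frac{10.17744}{0.68088} = 14.94748$$
and
$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 92.1605 - (14.94748)(1.196) = 74.28331$$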
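The arithmetic of Example 11-1 can be reproduced from the summary sums alone. The sketch below uses only the totals quoted in the example; the variable names are illustrative choices, not notation from the slides.

```python
# Summary quantities quoted in Example 11-1 (oxygen purity data)
n = 20
sum_x, sum_y = 23.92, 1843.21
sum_x2, sum_xy = 29.2892, 2214.6566

# Corrected sum of squares and cross-products
s_xx = sum_x2 - sum_x ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n

# Least-squares slope and intercept
slope = s_xy / s_xx
intercept = sum_y / n - slope * sum_x / n

print(round(s_xx, 5), round(s_xy, 5))
print(round(slope, 3), round(intercept, 3))
```

The printed values should agree with the example's S_xx, S_xy, slope, and intercept up to rounding.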
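The fitted simple linear regression model is
$$\hat{y} = 74.283 + 14.947x$$
The error sum of squares is
$$SS_E = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
and the estimator of $\sigma^2$ is
$$\hat{\sigma}^2 = \frac{SS_E}{n - 2}$$ (11-4)
where $SS_E$ can be computed from
$$SS_E = SS_T - \hat{\beta}_1 S_{xy}$$ (11-5)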
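As a rough check of Equations 11-4 and 11-5, the error sum of squares and the variance estimate for the oxygen purity data can be computed from the sums in Example 11-1. This sketch assumes SS_T is the corrected sum of squares of y (sum of y squared minus the squared total over n); the variable names are illustrative.

```python
# Quantities from Example 11-1
n = 20
sum_y, sum_y2 = 1843.21, 170044.5321
s_xx, s_xy = 0.68088, 10.17744

slope = s_xy / s_xx

# Total corrected sum of squares of y (assumed definition of SS_T)
ss_t = sum_y2 - sum_y ** 2 / n

# Equation 11-5: error sum of squares
ss_e = ss_t - slope * s_xy

# Equation 11-4: estimate of the error variance
sigma2_hat = ss_e / (n - 2)

print(round(ss_t, 2), round(ss_e, 2), round(sigma2_hat, 2))
```

The variance estimate should land near the value 1.18 that the slides use in the later confidence interval examples.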
11-3: Properties of the Least Squares Estimators
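• Slope Properties
$$E(\hat{\beta}_1) = \beta_1, \qquad V(\hat{\beta}_1) = \frac{\sigma^2}{S_{xx}}$$
• Intercept Properties
$$E(\hat{\beta}_0) = \beta_0 \quad \text{and} \quad V(\hat{\beta}_0) = \sigma^2\left[\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right]$$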
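Replacing the unknown variance by its estimate turns these variance formulas into estimated standard errors. The sketch below plugs in the oxygen purity values that appear elsewhere in the slides (n = 20, x-bar = 1.1960, S_xx = 0.68088, and the variance estimate 1.18); the variable names are illustrative.

```python
import math

# Oxygen purity quantities quoted in the slides
n = 20
x_bar = 1.1960
s_xx = 0.68088
sigma2_hat = 1.18  # estimate of the error variance

# Estimated standard error of the slope: sqrt(sigma^2 / S_xx)
se_slope = math.sqrt(sigma2_hat / s_xx)

# Estimated standard error of the intercept: sqrt(sigma^2 * (1/n + x_bar^2 / S_xx))
se_intercept = math.sqrt(sigma2_hat * (1 / n + x_bar ** 2 / s_xx))

print(round(se_slope, 4), round(se_intercept, 4))
```

Both estimators are unbiased, so these standard errors describe how much the slope and intercept would vary from sample to sample around the true coefficients.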
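11-4: Hypothesis Tests in Simple Linear Regression
Suppose we wish to test the hypothesis that the slope equals a constant:
H0: β1 = β1,0
H1: β1 ≠ β1,0
where β1,0 is a constant. An appropriate test statistic would be
$$T_0 = \frac{\hat{\beta}_1 - \beta_{1,0}}{\sqrt{\hat{\sigma}^2 / S_{xx}}}$$ (11-6)
or, equivalently,
$$T_0 = \frac{\hat{\beta}_1 - \beta_{1,0}}{se(\hat{\beta}_1)}$$
We would reject the null hypothesis if
$$|t_0| > t_{\alpha/2,\, n-2}$$
A similar procedure tests hypotheses about the intercept:
H0: β0 = β0,0
H1: β0 ≠ β0,0
using the test statistic
$$T_0 = \frac{\hat{\beta}_0 - \beta_{0,0}}{\sqrt{\hat{\sigma}^2\left[\dfrac{1}{n} + \dfrac{\bar{x}^2}{S_{xx}}\right]}} = \frac{\hat{\beta}_0 - \beta_{0,0}}{se(\hat{\beta}_0)}$$ (11-7)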
H0: β1 = 0
H1: β1 ≠ 0
Table 11-2 presents the Minitab output for this problem. Notice that the t-statistic
value for the slope is computed as 11.35 and that the reported P-value is P = 0.000.
Minitab also reports the t-statistic for testing the hypothesis H0: β0 = 0. This statistic
is computed from Equation 11-7, with β0,0 = 0, as t0 = 46.62. Clearly, then, the
hypothesis that the intercept is zero is rejected.
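The two t-statistics reported by Minitab can be approximated by hand from the rounded values in the slides. This sketch assumes a two-sided test at the 0.05 level, with the critical value 2.101 for 18 degrees of freedom taken from the slides' confidence interval example; the variable names are illustrative.

```python
import math

# Rounded oxygen purity quantities quoted in the slides
n, x_bar, s_xx = 20, 1.1960, 0.68088
sigma2_hat = 1.18
beta0_hat, beta1_hat = 74.283, 14.947
t_crit = 2.101  # t_{0.025, 18}; assumes alpha = 0.05

# Test of H0: beta1 = 0 (statistic of Equation 11-6 with beta_{1,0} = 0)
t0_slope = (beta1_hat - 0.0) / math.sqrt(sigma2_hat / s_xx)

# Test of H0: beta0 = 0 (statistic of Equation 11-7 with beta_{0,0} = 0)
t0_intercept = (beta0_hat - 0.0) / math.sqrt(
    sigma2_hat * (1 / n + x_bar ** 2 / s_xx)
)

reject_slope = abs(t0_slope) > t_crit
reject_intercept = abs(t0_intercept) > t_crit
print(round(t0_slope, 2), reject_slope)
print(round(t0_intercept, 2), reject_intercept)
```

Because the inputs are rounded, the statistics may differ from the Minitab output in the last digit, but both hypotheses are rejected by a wide margin.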
Symbolically, the analysis of variance identity is
$$SS_T = SS_R + SS_E$$
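11-5: Confidence Intervals
A 100(1 - α)% confidence interval on the slope β1 is
$$\hat{\beta}_1 - t_{\alpha/2,\, n-2}\sqrt{\frac{\hat{\sigma}^2}{S_{xx}}} \le \beta_1 \le \hat{\beta}_1 + t_{\alpha/2,\, n-2}\sqrt{\frac{\hat{\sigma}^2}{S_{xx}}}$$ (11-11)
Similarly, a 100(1 - α)% confidence interval on the intercept β0 is
$$\hat{\beta}_0 - t_{\alpha/2,\, n-2}\sqrt{\hat{\sigma}^2\left[\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right]} \le \beta_0 \le \hat{\beta}_0 + t_{\alpha/2,\, n-2}\sqrt{\hat{\sigma}^2\left[\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right]}$$ (11-12)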
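EXAMPLE 11-4 Oxygen Purity Confidence Interval on the Slope
We will find a 95% confidence interval on the slope of the regression line using the data
in Example 11-1. Recall that $\hat{\beta}_1 = 14.947$, $S_{xx} = 0.68088$, and $\hat{\sigma}^2 = 1.18$ (see Table
11-2). Then, from Equation 11-11 we find
$$\hat{\beta}_1 - t_{0.025,18}\sqrt{\frac{\hat{\sigma}^2}{S_{xx}}} \le \beta_1 \le \hat{\beta}_1 + t_{0.025,18}\sqrt{\frac{\hat{\sigma}^2}{S_{xx}}}$$
or
$$14.947 - 2.101\sqrt{\frac{1.18}{0.68088}} \le \beta_1 \le 14.947 + 2.101\sqrt{\frac{1.18}{0.68088}}$$
This simplifies to
$$12.181 \le \beta_1 \le 17.713$$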
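The interval in Example 11-4 can be checked numerically. The helper below is a minimal sketch of Equation 11-11; the function name `slope_confidence_interval` is hypothetical, and the critical value must be supplied by the caller (here 2.101, as in the example).

```python
import math


def slope_confidence_interval(beta1_hat, sigma2_hat, s_xx, t_crit):
    """Two-sided confidence interval on the slope (Equation 11-11)."""
    half_width = t_crit * math.sqrt(sigma2_hat / s_xx)
    return beta1_hat - half_width, beta1_hat + half_width


# Values from Example 11-4; 2.101 is t_{0.025, 18}
lower, upper = slope_confidence_interval(14.947, 1.18, 0.68088, 2.101)
print(round(lower, 3), round(upper, 3))
```

Passing the critical value as an argument keeps the sketch free of any statistics library; a table lookup or a library quantile function could supply it instead.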
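The mean response at $x = x_0$ is estimated by
$$\hat{\mu}_{Y|x_0} = \hat{\beta}_0 + \hat{\beta}_1 x_0$$
Definition
A 100(1 - α)% confidence interval about the mean response at the value of
$x = x_0$, say $\mu_{Y|x_0}$, is given by
$$\hat{\mu}_{Y|x_0} - t_{\alpha/2,\, n-2}\sqrt{\hat{\sigma}^2\left[\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}\right]} \le \mu_{Y|x_0} \le \hat{\mu}_{Y|x_0} + t_{\alpha/2,\, n-2}\sqrt{\hat{\sigma}^2\left[\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}\right]}$$ (11-13)
where $\hat{\mu}_{Y|x_0} = \hat{\beta}_0 + \hat{\beta}_1 x_0$ is computed from the fitted regression model.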
We will construct a 95% confidence interval about the mean response for the
data in Example 11-1. The fitted model is $\hat{\mu}_{Y|x_0} = 74.283 + 14.947x_0$, and the
95% confidence interval on $\mu_{Y|x_0}$ is found from Equation 11-13 as
$$\hat{\mu}_{Y|x_0} \pm 2.101\sqrt{1.18\left[\frac{1}{20} + \frac{(x_0 - 1.1960)^2}{0.68088}\right]}$$
Suppose that we are interested in predicting mean oxygen purity when
x0 = 1.00%. Then
$$\hat{\mu}_{Y|1.00} = 74.283 + 14.947(1.00) = 89.23$$
and the 95% confidence interval is
$$89.23 \pm 2.101\sqrt{1.18\left[\frac{1}{20} + \frac{(1.00 - 1.1960)^2}{0.68088}\right]}$$
or
$$89.23 \pm 0.75$$
Therefore, the 95% CI on $\mu_{Y|1.00}$ is
$$88.48 \le \mu_{Y|1.00} \le 89.98$$
This is a reasonably narrow CI.
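The mean-response interval can be evaluated at any x0 with a short function. This is a sketch of Equation 11-13 under the slide values for the oxygen purity data; the function name `mean_response_ci` and its argument names are hypothetical.

```python
import math


def mean_response_ci(x0, beta0_hat, beta1_hat, sigma2_hat, n, x_bar, s_xx, t_crit):
    """Confidence interval on the mean response at x0 (Equation 11-13)."""
    mu_hat = beta0_hat + beta1_hat * x0
    half_width = t_crit * math.sqrt(
        sigma2_hat * (1 / n + (x0 - x_bar) ** 2 / s_xx)
    )
    return mu_hat - half_width, mu_hat + half_width


# Oxygen purity example at x0 = 1.00, with t_{0.025, 18} = 2.101
lower, upper = mean_response_ci(1.00, 74.283, 14.947, 1.18, 20, 1.1960, 0.68088, 2.101)
print(round(lower, 2), round(upper, 2))
```

The computed bounds may differ from the slide's 88.48 and 89.98 by about 0.01 because the slides round the half-width to 0.75 before adding and subtracting. The half-width grows as x0 moves away from the mean of x, so the interval is narrowest at x0 = 1.1960.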
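11-6: Prediction of New Observations
A 100(1 - α)% prediction interval on a future observation $Y_0$ at the value $x_0$ is given by
$$\hat{y}_0 - t_{\alpha/2,\, n-2}\sqrt{\hat{\sigma}^2\left[1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}\right]} \le Y_0 \le \hat{y}_0 + t_{\alpha/2,\, n-2}\sqrt{\hat{\sigma}^2\left[1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}\right]}$$ (11-14)
The value $\hat{y}_0$ is computed from the regression model $\hat{y}_0 = \hat{\beta}_0 + \hat{\beta}_1 x_0$.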
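For the oxygen purity data at x0 = 1.00%, the 95% prediction interval is
$$89.23 - 2.101\sqrt{1.18\left[1 + \frac{1}{20} + \frac{(1.00 - 1.1960)^2}{0.68088}\right]} \le Y_0 \le 89.23 + 2.101\sqrt{1.18\left[1 + \frac{1}{20} + \frac{(1.00 - 1.1960)^2}{0.68088}\right]}$$
which simplifies to
$$86.83 \le y_0 \le 91.63$$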
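The prediction interval differs from the mean-response interval only by the extra "1 +" term under the square root, which accounts for the variability of a single new observation. The sketch below computes both half-widths side by side for the oxygen purity values; the function name `interval_half_width` and the `new_observation` flag are hypothetical.

```python
import math


def interval_half_width(x0, sigma2_hat, n, x_bar, s_xx, t_crit, new_observation):
    """Half-width of the interval at x0.

    new_observation=True gives the prediction interval (Equation 11-14);
    False gives the confidence interval on the mean response.
    """
    extra = 1.0 if new_observation else 0.0
    return t_crit * math.sqrt(
        sigma2_hat * (extra + 1 / n + (x0 - x_bar) ** 2 / s_xx)
    )


y0_hat = 74.283 + 14.947 * 1.00  # point prediction at x0 = 1.00
pi_half = interval_half_width(1.00, 1.18, 20, 1.1960, 0.68088, 2.101, True)
ci_half = interval_half_width(1.00, 1.18, 20, 1.1960, 0.68088, 2.101, False)

pi_lower, pi_upper = y0_hat - pi_half, y0_hat + pi_half
print(round(pi_lower, 2), round(pi_upper, 2))
print(pi_half > ci_half)
```

The prediction interval is always wider than the confidence interval on the mean at the same x0, since it must cover a single future value rather than an average.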
11-7: Adequacy of the Regression Model
Example 11-7
Table 11-4 presents the observed and predicted values of y at each value
of x from this data set, along with the corresponding residual. These values
were computed using Minitab and show the number of decimal places
typical of computer output.
A normal probability plot of the residuals is shown in Fig. 11-10. Since the
residuals fall approximately along a straight line in the figure, we conclude
that there is no severe departure from normality.
The residuals are also plotted against the predicted value $\hat{y}_i$ in Fig. 11-11
and against the hydrocarbon levels $x_i$ in Fig. 11-12. These plots do
not indicate any serious model inadequacies.
The coefficient of determination is
$$R^2 = \frac{SS_R}{SS_T} = 1 - \frac{SS_E}{SS_T}$$
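11-8: Correlation
When X and Y are jointly normally distributed, the regression coefficients can be written in terms of the correlation coefficient ρ between Y and X:
$$\beta_0 = \mu_Y - \mu_X \rho \frac{\sigma_Y}{\sigma_X}$$ (11-17)
$$\beta_1 = \frac{\sigma_Y}{\sigma_X}\rho$$ (11-18)
The slope estimate is related to the sample correlation coefficient R by
$$\hat{\beta}_1 = \left(\frac{SS_T}{S_{XX}}\right)^{1/2} R$$ (11-20)
We may also write:
$$R^2 = \hat{\beta}_1^2 \frac{S_{XX}}{S_{YY}} = \frac{\hat{\beta}_1 S_{XY}}{SS_T} = \frac{SS_R}{SS_T}$$
which is just the coefficient of determination. That is, the coefficient of determination
$R^2$ is just the square of the correlation coefficient between Y and X.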
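It is often useful to test the hypotheses
H0: ρ = 0
H1: ρ ≠ 0
The appropriate test statistic is
$$T_0 = \frac{R\sqrt{n-2}}{\sqrt{1 - R^2}}$$
which has the t distribution with n - 2 degrees of freedom if H0 is true. To test
H0: ρ = ρ0
H1: ρ ≠ ρ0
where ρ0 ≠ 0, the test statistic is
$$Z_0 = (\operatorname{arctanh} R - \operatorname{arctanh} \rho_0)(n-3)^{1/2}$$ (11-22)
An approximate 100(1 - α)% confidence interval on ρ is
$$\tanh\left(\operatorname{arctanh} r - \frac{z_{\alpha/2}}{\sqrt{n-3}}\right) \le \rho \le \tanh\left(\operatorname{arctanh} r + \frac{z_{\alpha/2}}{\sqrt{n-3}}\right)$$ (11-23)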
Figure 11-13 shows a scatter diagram of wire bond strength versus wire
length. We have used the Minitab option of displaying box plots of each
individual variable on the scatter diagram. There is evidence of a linear
relationship between the two variables.
The Minitab output for fitting a simple linear regression model to the data
is shown below.
Figure 11-13 Scatter plot of wire bond strength versus wire length, Example 11-8.
Analysis of Variance
Source DF SS MS F P
Regression 1 5885.9 5885.9 615.08 0.000
Residual Error 23 220.1 9.6
Total 24 6105.9
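The mean squares, the F statistic, and the coefficient of determination follow directly from the sums of squares and degrees of freedom in this table. The sketch below recomputes them from the rounded table entries, assuming n = 25 observations (total DF of 24 plus one); the variable names are illustrative.

```python
# Rounded entries from the Minitab analysis of variance table
ss_r, df_r = 5885.9, 1     # regression
ss_e, df_e = 220.1, 23     # residual error
n = df_r + df_e + 1        # 25 observations (assumed from total DF = 24)

ms_r = ss_r / df_r         # regression mean square
ms_e = ss_e / df_e         # error mean square, the estimate of sigma^2
f0 = ms_r / ms_e           # F statistic for significance of regression

ss_t = ss_r + ss_e         # total sum of squares, SS_T = SS_R + SS_E
r_squared = ss_r / ss_t    # coefficient of determination

print(round(ms_e, 1), round(f0, 2), round(r_squared, 4))
```

Because the table entries are rounded to one decimal place, the recomputed F and total sum of squares can differ slightly from the printed 615.08 and 6105.9.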
Example 11-8 (continued)
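Now Sxx = 698.56 and Sxy = 2027.7132, and the sample correlation coefficient
is
$$r = \frac{S_{xy}}{[S_{xx}\, SS_T]^{1/2}} = \frac{2027.7132}{[(698.560)(6105.9)]^{1/2}} = 0.9818$$
To test
H0: ρ = 0
H1: ρ ≠ 0
the test statistic is
$$t_0 = \frac{r\sqrt{n-2}}{\sqrt{1 - r^2}} = \frac{0.9818\sqrt{23}}{\sqrt{1 - 0.9640}} = 24.8$$
An approximate 95% confidence interval on ρ is
$$\tanh\left(2.3452 - \frac{1.96}{\sqrt{22}}\right) \le \rho \le \tanh\left(2.3452 + \frac{1.96}{\sqrt{22}}\right)$$
which reduces to
$$0.9585 \le \rho \le 0.9921$$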
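The correlation calculations in Example 11-8 map directly onto the standard library's `math.atanh` and `math.tanh`. This sketch uses the quantities quoted in the example and assumes n = 25 (consistent with the square roots of 23 and 22 above); the variable names are illustrative.

```python
import math

# Quantities quoted in Example 11-8 (wire bond strength data)
n = 25
s_xx, s_xy, ss_t = 698.56, 2027.7132, 6105.9
z_crit = 1.96  # z_{0.025} for an approximate 95% interval

# Sample correlation coefficient
r = s_xy / math.sqrt(s_xx * ss_t)

# t statistic for H0: rho = 0
t0 = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Approximate confidence interval on rho via the arctanh transformation
half = z_crit / math.sqrt(n - 3)
rho_lower = math.tanh(math.atanh(r) - half)
rho_upper = math.tanh(math.atanh(r) + half)

print(round(r, 4), round(t0, 1))
print(round(rho_lower, 4), round(rho_upper, 4))
```

The interval is not symmetric about r because the transformation stretches values near 1, which is why the upper bound sits closer to r than the lower bound.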
A research engineer is investigating the use of a windmill to generate electricity
and has collected data on the DC output from this windmill and the corresponding wind velocity.

Observation   Wind velocity x   DC output y
1             5.00              1.582
2             6.00              1.822
3             3.40              1.057
4             2.70              0.500
5             10.00             2.236
6             9.70              2.386
7             9.55              2.294
8             3.05              0.558
9             8.15              2.166
Figure 11-14 Plot of DC output y versus wind velocity x for the windmill data.
Figure 11-15 Plot of residuals $e_i$ versus fitted values $\hat{y}_i$ for the windmill data.
Figure 11-17 Plot of residuals versus fitted values $\hat{y}_i$ for the transformed model for the windmill data.
Figure 11-18 Normal probability plot of the residuals for the transformed model for the windmill data.
A plot of the residuals from the transformed model versus $\hat{y}_i$ is shown in Figure 11-17. This plot does not
reveal any serious problem with inequality of variance. The normal probability plot, shown in Figure 11-18,
gives a mild indication that the errors come from a distribution with heavier tails than the normal (notice
the slight upward and downward curve at the extremes). This normal probability plot has the z-score value
plotted on the horizontal axis. Since there is no strong signal of model inadequacy, we conclude that the
transformed model is satisfactory.
Sec 11-9 Transformation and Logistic Regression
Important Terms & Concepts of Chapter 11
Analysis of variance test in regression
Confidence interval on mean response
Correlation coefficient
Empirical model
Confidence intervals on model parameters
Intrinsically linear model
Least squares estimation of regression model parameters
Logistic regression
Model adequacy checking
Odds ratio
Prediction interval on a future observation
Regression analysis
Residual plots
Residuals
Scatter diagram
Simple linear regression model standard error
Statistical test on model parameters
Transformations