IE354 Slides 10 Chp11


IEN 251: Principles of Engineering Probability and Statistics
Spring 2023-2024

Chapter 11
Simple Linear Regression and Correlation
Dr. Abedallah Al Kader
CHAPTER OUTLINE
11-1 Empirical Models
11-2 Simple Linear Regression
11-3 Properties of the Least Squares Estimators
11-4 Hypothesis Tests in Simple Linear Regression
  11-4.1 Use of t-tests
  11-4.2 Analysis of variance approach to test significance of regression
11-5 Confidence Intervals
  11-5.1 Confidence intervals on the slope and intercept
  11-5.2 Confidence interval on the mean response
11-6 Prediction of New Observations
11-7 Adequacy of the Regression Model
  11-7.1 Residual analysis
  11-7.2 Coefficient of determination (R²)
11-8 Correlation
11-9 Transformation and Logistic Regression
11-1: Empirical Models
• Many problems in engineering and science involve exploring the relationships between two or more variables.
• Regression analysis is a statistical technique that is very useful for these types of problems.
• For example, in a chemical process, suppose that the yield of the product is related to the process operating temperature.
• Regression analysis can be used to build a model to predict yield at a given temperature level.

11-2: Simple Linear Regression

• The simple linear regression model considers a single regressor or predictor x and a dependent or response variable Y.
• At each level of x, Y is a random variable whose expected value is

$$E(Y \mid x) = \beta_0 + \beta_1 x$$
11-2: Simple Linear Regression

• We assume that each observation, Y, can be described by the model

$$Y = \beta_0 + \beta_1 x + \epsilon$$

• where ε is the random error term with mean 0 and standard deviation σ.
• The random errors corresponding to different observations are also assumed to be uncorrelated random variables.
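To make the model concrete, the following minimal Python sketch (not part of the original slides) simulates observations from Y = β₀ + β₁x + ε; the parameter values and the x-range are illustrative assumptions only:

```python
import numpy as np

rng = np.random.default_rng(1)

beta0, beta1, sigma = 74.0, 15.0, 1.0  # illustrative (hypothetical) parameter values
x = rng.uniform(0.9, 1.5, size=20)     # regressor levels
eps = rng.normal(0.0, sigma, size=20)  # random error: mean 0, standard deviation sigma
y = beta0 + beta1 * x + eps            # observed responses Y = beta0 + beta1*x + eps
```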
11-2: Simple Linear Regression
• Suppose that we have n pairs of observations (x1, y1),
(x2, y2), …, (xn, yn).
• Figure 11-3 shows a typical scatter plot of observed
data and a candidate for the estimated regression line.
• The estimates of β0 and β1 should result in a line that
is (in some sense) a “best fit” to the data.

11-2: Simple Linear Regression

• The German scientist Karl Gauss (1777–1855) proposed estimating the parameters β₀ and β₁ to minimize the sum of the squares of the vertical deviations in Fig. 11-3.
• We call this criterion for estimating the regression coefficients the method of least squares.
• Using the model equation, we may express the n observations in the sample as

$$y_i = \beta_0 + \beta_1 x_i + \epsilon_i, \quad i = 1, 2, \ldots, n$$
11-2: Simple Linear Regression
Least Squares Estimates
The least-squares estimates of the intercept and slope in the simple
linear regression model are

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \qquad (11\text{-}1)$$

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}}{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}} \qquad (11\text{-}2)$$

where $\bar{y} = (1/n)\sum_{i=1}^{n} y_i$ and $\bar{x} = (1/n)\sum_{i=1}^{n} x_i$.
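As an illustration (not from the slides), Equations 11-1 and 11-2 translate directly into a few lines of Python/NumPy; paired data arrays x and y are assumed:

```python
import numpy as np

def least_squares_fit(x, y):
    """Least squares estimates (Equations 11-1 and 11-2) for y = b0 + b1*x."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n = len(x)
    Sxy = np.sum(x * y) - np.sum(x) * np.sum(y) / n  # corrected sum of cross products
    Sxx = np.sum(x ** 2) - np.sum(x) ** 2 / n        # corrected sum of squares of x
    b1 = Sxy / Sxx                                   # slope (Equation 11-2)
    b0 = y.mean() - b1 * x.mean()                    # intercept (Equation 11-1)
    return b0, b1
```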
11-2: Simple Linear Regression
The fitted or estimated regression line is therefore

$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x \qquad (11\text{-}3)$$

Note that each pair of observations satisfies the relationship

$$y_i = \hat{\beta}_0 + \hat{\beta}_1 x_i + e_i, \quad i = 1, 2, \ldots, n$$

where $e_i = y_i - \hat{y}_i$ is called the residual. The residual describes the error in the fit of the model to the ith observation yᵢ.
11-2: Simple Linear Regression
Notation

$$S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}$$

$$S_{xy} = \sum_{i=1}^{n} y_i (x_i - \bar{x}) = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}$$
EXAMPLE 11-1 Oxygen Purity
We will fit a simple linear regression model to the oxygen purity data in Table 11-1. The following quantities may be computed:

n = 20,  Σxᵢ = 23.92,  Σyᵢ = 1,843.21,  x̄ = 1.1960,  ȳ = 92.1605,
Σyᵢ² = 170,044.5321,  Σxᵢ² = 29.2892,  Σxᵢyᵢ = 2,214.6566

$$S_{xx} = \sum_{i=1}^{20} x_i^2 - \frac{\left(\sum_{i=1}^{20} x_i\right)^2}{20} = 29.2892 - \frac{(23.92)^2}{20} = 0.68088$$

and

$$S_{xy} = \sum_{i=1}^{20} x_i y_i - \frac{\left(\sum_{i=1}^{20} x_i\right)\left(\sum_{i=1}^{20} y_i\right)}{20} = 2{,}214.6566 - \frac{(23.92)(1{,}843.21)}{20} = 10.17744$$
EXAMPLE 11-1 Oxygen Purity (continued)
Therefore, the least squares estimates of the slope and intercept are

$$\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} = \frac{10.17744}{0.68088} = 14.94748$$

and

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 92.1605 - (14.94748)(1.196) = 74.28331$$

The fitted simple linear regression model (with the coefficients reported to three decimal places) is

$$\hat{y} = 74.283 + 14.947x$$
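The arithmetic of Example 11-1 can be checked with a short Python sketch built only from the summary sums on the slide:

```python
n = 20
sum_x, sum_y = 23.92, 1843.21
sum_x2, sum_xy = 29.2892, 2214.6566

Sxx = sum_x2 - sum_x ** 2 / n                          # 0.68088
Sxy = sum_xy - sum_x * sum_y / n                       # 10.17744
beta1_hat = Sxy / Sxx                                  # 14.94748
beta0_hat = sum_y / n - beta1_hat * (sum_x / n)        # 74.28331
print(f"y_hat = {beta0_hat:.3f} + {beta1_hat:.3f} x")  # y_hat = 74.283 + 14.947 x
```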
11-2: Simple Linear Regression
Estimating σ²
The error sum of squares is

$$SS_E = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

It can be shown that the expected value of the error sum of squares is E(SS_E) = (n − 2)σ².
11-2: Simple Linear Regression
Estimating σ²
An unbiased estimator of σ² is

$$\hat{\sigma}^2 = \frac{SS_E}{n-2} \qquad (11\text{-}4)$$

where SS_E can be easily computed using

$$SS_E = SS_T - \hat{\beta}_1 S_{xy} \qquad (11\text{-}5)$$
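Continuing in Python, Equations 11-4 and 11-5 give σ̂² for the oxygen purity data; SS_T = 173.38 is taken from Example 11-3:

```python
n, Sxy, beta1_hat = 20, 10.17744, 14.94748  # from Example 11-1
SST = 173.38                                # total sum of squares (Example 11-3)
SSE = SST - beta1_hat * Sxy                 # Equation 11-5: about 21.25
sigma2_hat = SSE / (n - 2)                  # Equation 11-4: about 1.18
```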
11-3: Properties of the Least Squares Estimators

• Slope properties

$$E(\hat{\beta}_1) = \beta_1, \qquad V(\hat{\beta}_1) = \frac{\sigma^2}{S_{xx}}$$

• Intercept properties

$$E(\hat{\beta}_0) = \beta_0, \qquad V(\hat{\beta}_0) = \sigma^2 \left[ \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} \right]$$
11-4: Hypothesis Tests in Simple Linear Regression

• An important part of assessing the adequacy of a


linear regression model is testing statistical
hypotheses about the model parameters and
constructing certain confidence intervals.

• Hypothesis testing in simple linear regression


presents methods for constructing confidence
intervals.

Sec 11-4 Hypothesis Tests in Simple Linear


Regression 19
11-4: Hypothesis Tests in Simple Linear Regression

• To test hypotheses about the slope and intercept of the regression model, we must make the additional assumption that the error component in the model, ε, is normally distributed.
• Thus, the complete assumptions are that the errors are normally and independently distributed with mean zero and variance σ², abbreviated NID(0, σ²).
11-4: Hypothesis Tests in Simple Linear Regression

11-4.1 Use of t-Tests

Suppose we wish to test

H₀: β₁ = β₁,₀
H₁: β₁ ≠ β₁,₀

where β₁,₀ is a constant. An appropriate test statistic would be

$$T_0 = \frac{\hat{\beta}_1 - \beta_{1,0}}{\sqrt{\hat{\sigma}^2 / S_{xx}}} \qquad (11\text{-}6)$$
11-4: Hypothesis Tests in Simple Linear Regression

11-4.1 Use of t-Tests

The test statistic could also be written as

$$T_0 = \frac{\hat{\beta}_1 - \beta_{1,0}}{se(\hat{\beta}_1)}$$

We would reject the null hypothesis if

|t₀| > t_{α/2, n−2}
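A sketch of the slope t-test in Python with SciPy, using the oxygen purity quantities (values assumed from Examples 11-1 and 11-2):

```python
import math
from scipy import stats

n, Sxx, sigma2_hat = 20, 0.68088, 1.18                  # oxygen purity quantities
beta1_hat, beta1_0 = 14.947, 0.0                        # estimate and hypothesized slope

se_beta1 = math.sqrt(sigma2_hat / Sxx)                  # standard error of the slope
t0 = (beta1_hat - beta1_0) / se_beta1                   # Equation 11-6: about 11.35
p_value = 2 * stats.t.sf(abs(t0), df=n - 2)             # two-sided P-value: about 1.2e-9
reject = abs(t0) > stats.t.ppf(1 - 0.01 / 2, df=n - 2)  # compare with t_{0.005,18} = 2.88
```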
11-4: Hypothesis Tests in Simple Linear Regression

11-4.1 Use of t-Tests

Suppose we wish to test

H₀: β₀ = β₀,₀
H₁: β₀ ≠ β₀,₀

An appropriate test statistic would be

$$T_0 = \frac{\hat{\beta}_0 - \beta_{0,0}}{\sqrt{\hat{\sigma}^2 \left[ \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} \right]}} = \frac{\hat{\beta}_0 - \beta_{0,0}}{se(\hat{\beta}_0)} \qquad (11\text{-}7)$$
11-4: Hypothesis Tests in Simple Linear Regression

11-4.1 Use of t-Tests

We would reject the null hypothesis if

|t₀| > t_{α/2, n−2}
11-4: Hypothesis Tests in Simple Linear Regression

11-4.1 Use of t-Tests

An important special case of these hypotheses is

H₀: β₁ = 0
H₁: β₁ ≠ 0

These hypotheses relate to the significance of regression.
Failure to reject H₀ is equivalent to concluding that there is no linear relationship between x and Y.
11-4: Hypothesis Tests in Simple Linear Regression
EXAMPLE 11-2 Oxygen Purity Tests of Coefficients  We will test for significance of regression using the model for the oxygen purity data from Example 11-1. The hypotheses are

H₀: β₁ = 0
H₁: β₁ ≠ 0

and we will use α = 0.01. From Example 11-1 and Table 11-2 we have

β̂₁ = 14.947,  n = 20,  S_xx = 0.68088,  σ̂² = 1.18

so the t-statistic in Equation 11-6 becomes

$$t_0 = \frac{\hat{\beta}_1}{\sqrt{\hat{\sigma}^2 / S_{xx}}} = \frac{\hat{\beta}_1}{se(\hat{\beta}_1)} = \frac{14.947}{\sqrt{1.18/0.68088}} = 11.35$$

Practical Interpretation: Since the reference value of t is t₀.₀₀₅,₁₈ = 2.88, the value of the test statistic is very far into the critical region, implying that H₀: β₁ = 0 should be rejected. There is strong evidence to support this claim. The P-value for this test is P ≅ 1.23 × 10⁻⁹. This was obtained manually with a calculator.

Table 11-2 presents the Minitab output for this problem. Notice that the t-statistic value for the slope is computed as 11.35 and that the reported P-value is P = 0.000. Minitab also reports the t-statistic for testing the hypothesis H₀: β₀ = 0. This statistic is computed from Equation 11-7, with β₀,₀ = 0, as t₀ = 46.62. Clearly, then, the hypothesis that the intercept is zero is rejected.
11-4: Hypothesis Tests in Simple Linear Regression

11-4.2 Analysis of Variance Approach to Test Significance of Regression

The analysis of variance identity is

$$\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \qquad (11\text{-}8)$$

Symbolically,

SS_T = SS_R + SS_E  (11-9)
11-4: Hypothesis Tests in Simple Linear Regression

11-4.2 Analysis of Variance Approach to Test Significance of Regression

If the null hypothesis H₀: β₁ = 0 is true, the statistic

$$F_0 = \frac{SS_R / 1}{SS_E / (n-2)} = \frac{MS_R}{MS_E} \qquad (11\text{-}10)$$

follows the F₁,ₙ₋₂ distribution, and we would reject H₀ if f₀ > f_{α,1,n−2}.
11-4: Hypothesis Tests in Simple Linear Regression

11-4.2 Analysis of Variance Approach to Test Significance of Regression

The quantities MS_R and MS_E are called mean squares.

Analysis of variance table:

Source of Variation   Sum of Squares             Degrees of Freedom   Mean Square   F₀
Regression            SS_R = β̂₁S_xy              1                    MS_R          MS_R/MS_E
Error                 SS_E = SS_T − β̂₁S_xy       n − 2                MS_E
Total                 SS_T                       n − 1

Note that MS_E = σ̂².
11-4: Hypothesis Tests in Simple Linear Regression

EXAMPLE 11-3 Oxygen Purity ANOVA  We will use the analysis of variance approach to test for significance of regression using the oxygen purity data model from Example 11-1. Recall that SS_T = 173.38, β̂₁ = 14.947, S_xy = 10.17744, and n = 20. The regression sum of squares is

SS_R = β̂₁S_xy = (14.947)(10.17744) = 152.13

and the error sum of squares is

SS_E = SS_T − SS_R = 173.38 − 152.13 = 21.25

The analysis of variance for testing H₀: β₁ = 0 is summarized in the Minitab output in Table 11-2. The test statistic is f₀ = MS_R/MS_E = 152.13/1.18 = 128.86, for which we find that the P-value is P ≅ 1.23 × 10⁻⁹, so we conclude that β₁ is not zero.

There are frequently minor differences in terminology among computer packages. For example, sometimes the regression sum of squares is called the "model" sum of squares, and the error sum of squares is called the "residual" sum of squares.
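The ANOVA computations of Example 11-3 in a short Python sketch (SciPy supplies the F distribution):

```python
from scipy import stats

n, SST, beta1_hat, Sxy = 20, 173.38, 14.947, 10.17744  # Example 11-3 quantities

SSR = beta1_hat * Sxy               # regression sum of squares: about 152.13
SSE = SST - SSR                     # error sum of squares: about 21.25
MSR, MSE = SSR / 1, SSE / (n - 2)   # mean squares
f0 = MSR / MSE                      # about 128.9
p_value = stats.f.sf(f0, 1, n - 2)  # about 1.2e-9; F0 = T0^2, so it matches the t-test
```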
11-5: Confidence Intervals
11-5.1 Confidence Intervals on the Slope and Intercept
Definition
Under the assumption that the observations are normally and independently distributed, a 100(1 − α)% confidence interval on the slope β₁ in simple linear regression is

$$\hat{\beta}_1 - t_{\alpha/2,n-2} \sqrt{\frac{\hat{\sigma}^2}{S_{xx}}} \le \beta_1 \le \hat{\beta}_1 + t_{\alpha/2,n-2} \sqrt{\frac{\hat{\sigma}^2}{S_{xx}}} \qquad (11\text{-}11)$$

Similarly, a 100(1 − α)% confidence interval on the intercept β₀ is

$$\hat{\beta}_0 - t_{\alpha/2,n-2} \sqrt{\hat{\sigma}^2 \left[ \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} \right]} \le \beta_0 \le \hat{\beta}_0 + t_{\alpha/2,n-2} \sqrt{\hat{\sigma}^2 \left[ \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} \right]} \qquad (11\text{-}12)$$
11-5: Confidence Intervals
EXAMPLE 11-4 Oxygen Purity Confidence Interval on the Slope  We will find a 95% confidence interval on the slope of the regression line using the data in Example 11-1. Recall that β̂₁ = 14.947, S_xx = 0.68088, and σ̂² = 1.18 (see Table 11-2). Then, from Equation 11-11 we find

$$\hat{\beta}_1 - t_{0.025,18} \sqrt{\frac{\hat{\sigma}^2}{S_{xx}}} \le \beta_1 \le \hat{\beta}_1 + t_{0.025,18} \sqrt{\frac{\hat{\sigma}^2}{S_{xx}}}$$

or

$$14.947 - 2.101\sqrt{\frac{1.18}{0.68088}} \le \beta_1 \le 14.947 + 2.101\sqrt{\frac{1.18}{0.68088}}$$

This simplifies to

12.181 ≤ β₁ ≤ 17.713

Practical Interpretation: This CI does not include zero, so there is strong evidence (at α = 0.05) that the slope is not zero. The CI is reasonably narrow (±2.766) because the error variance is fairly small.
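Equation 11-11 in code form, reproducing Example 11-4 (a sketch; SciPy supplies the t critical value):

```python
import math
from scipy import stats

n, Sxx, sigma2_hat, beta1_hat = 20, 0.68088, 1.18, 14.947    # Example 11-4 quantities

t_crit = stats.t.ppf(0.975, df=n - 2)                        # t_{0.025,18} = 2.101
half_width = t_crit * math.sqrt(sigma2_hat / Sxx)            # about 2.766
ci_slope = (beta1_hat - half_width, beta1_hat + half_width)  # about (12.18, 17.71)
```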
11-5: Confidence Intervals
11-5.2 Confidence Interval on the Mean Response

The point estimator of the mean response at x = x₀ is

$$\hat{\mu}_{Y|x_0} = \hat{\beta}_0 + \hat{\beta}_1 x_0$$
11-5: Confidence Intervals
11-5.2 Confidence Interval on the Mean Response

Also, μ̂_{Y|x₀} is normally distributed, and

$$T = \frac{\hat{\mu}_{Y|x_0} - \mu_{Y|x_0}}{\sqrt{\hat{\sigma}^2 \left[ \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}} \right]}}$$

has a t distribution with n − 2 degrees of freedom. This leads to the following confidence interval definition.
11-5: Confidence Intervals
11-5.2 Confidence Interval on the Mean Response

Definition
A 100(1 − α)% confidence interval about the mean response at the value of x = x₀, say μ_{Y|x₀}, is given by

$$\hat{\mu}_{Y|x_0} - t_{\alpha/2,n-2} \sqrt{\hat{\sigma}^2 \left[ \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}} \right]} \le \mu_{Y|x_0} \le \hat{\mu}_{Y|x_0} + t_{\alpha/2,n-2} \sqrt{\hat{\sigma}^2 \left[ \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}} \right]} \qquad (11\text{-}13)$$

where μ̂_{Y|x₀} = β̂₀ + β̂₁x₀ is computed from the fitted regression model.
11-5: Confidence Intervals
EXAMPLE 11-5 Oxygen Purity Confidence Interval on the Mean Response  We will construct a 95% confidence interval about the mean response for the data in Example 11-1. The fitted model is μ̂_{Y|x₀} = 74.283 + 14.947x₀, and the 95% confidence interval on μ_{Y|x₀} is found from Equation 11-13 as

$$\hat{\mu}_{Y|x_0} \pm 2.101 \sqrt{1.18 \left[ \frac{1}{20} + \frac{(x_0 - 1.1960)^2}{0.68088} \right]}$$

Suppose that we are interested in predicting mean oxygen purity when x₀ = 1.00%. Then

μ̂_{Y|1.00} = 74.283 + 14.947(1.00) = 89.23

and the 95% confidence interval is

$$89.23 \pm 2.101 \sqrt{1.18 \left[ \frac{1}{20} + \frac{(1.00 - 1.1960)^2}{0.68088} \right]}$$

or

89.23 ± 0.75

Therefore, the 95% CI on μ_{Y|1.00} is

88.48 ≤ μ_{Y|1.00} ≤ 89.98

This is a reasonably narrow CI.
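Equation 11-13 as a small Python function, reproducing the Example 11-5 interval (quantities taken from the slides):

```python
import math
from scipy import stats

n, Sxx, sigma2_hat, x_bar = 20, 0.68088, 1.18, 1.1960  # Example 11-5 quantities
beta0_hat, beta1_hat = 74.283, 14.947

def mean_response_ci(x0, alpha=0.05):
    """100(1 - alpha)% CI on the mean response at x0 (Equation 11-13)."""
    mu_hat = beta0_hat + beta1_hat * x0
    se = math.sqrt(sigma2_hat * (1 / n + (x0 - x_bar) ** 2 / Sxx))
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)
    return mu_hat - t_crit * se, mu_hat + t_crit * se

print(mean_response_ci(1.00))  # about (88.48, 89.98)
```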
11-6: Prediction of New Observations
Prediction Interval
A 100(1 − α)% prediction interval on a future observation Y₀ at the value x₀ is given by

$$\hat{y}_0 - t_{\alpha/2,n-2} \sqrt{\hat{\sigma}^2 \left[ 1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}} \right]} \le Y_0 \le \hat{y}_0 + t_{\alpha/2,n-2} \sqrt{\hat{\sigma}^2 \left[ 1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}} \right]} \qquad (11\text{-}14)$$

The value ŷ₀ is computed from the regression model ŷ₀ = β̂₀ + β̂₁x₀.
11-6: Prediction of New Observations
EXAMPLE 11-6 Oxygen Purity Prediction Interval  To illustrate the construction of a prediction interval, suppose we use the data in Example 11-1 and find a 95% prediction interval on the next observation of oxygen purity at x₀ = 1.00%. Using Equation 11-14 and recalling from Example 11-5 that ŷ₀ = 89.23, we find that the prediction interval is

$$89.23 - 2.101 \sqrt{1.18 \left[ 1 + \frac{1}{20} + \frac{(1.00 - 1.1960)^2}{0.68088} \right]} \le Y_0 \le 89.23 + 2.101 \sqrt{1.18 \left[ 1 + \frac{1}{20} + \frac{(1.00 - 1.1960)^2}{0.68088} \right]}$$

which simplifies to

86.83 ≤ y₀ ≤ 91.63

This is a reasonably narrow prediction interval.
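The prediction interval differs from the mean-response CI only by the extra "1 +" under the square root; a Python sketch of Equation 11-14 reproducing Example 11-6:

```python
import math
from scipy import stats

n, Sxx, sigma2_hat, x_bar = 20, 0.68088, 1.18, 1.1960  # Example 11-6 quantities
beta0_hat, beta1_hat = 74.283, 14.947

def prediction_interval(x0, alpha=0.05):
    """100(1 - alpha)% prediction interval on a future observation at x0 (Eq. 11-14)."""
    y0_hat = beta0_hat + beta1_hat * x0
    se = math.sqrt(sigma2_hat * (1 + 1 / n + (x0 - x_bar) ** 2 / Sxx))
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)
    return y0_hat - t_crit * se, y0_hat + t_crit * se

print(prediction_interval(1.00))  # about (86.83, 91.63)
```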
11-7: Adequacy of the Regression Model
• Fitting a regression model requires several assumptions:
1. Errors are uncorrelated random variables with mean zero;
2. Errors have constant variance; and
3. Errors are normally distributed.
• The analyst should always consider the validity of these assumptions to be doubtful and conduct analyses to examine the adequacy of the model.
11-7: Adequacy of the Regression Model
11-7.1 Residual Analysis
• The residuals from a regression model are eᵢ = yᵢ − ŷᵢ, where yᵢ is an actual observation and ŷᵢ is the corresponding fitted value from the regression model.
• Analysis of the residuals is frequently helpful in checking the assumption that the errors are approximately normally distributed with constant variance, and in determining whether additional terms in the model would be useful.
11-7: Adequacy of the Regression Model
EXAMPLE 11-7 Oxygen Purity Residuals
The regression model for the oxygen purity data in Example 11-1 is ŷ = 74.283 + 14.947x.

Table 11-4 presents the observed and predicted values of y at each value of x from this data set, along with the corresponding residual. These values were computed using Minitab and show the number of decimal places typical of computer output.

A normal probability plot of the residuals is shown in Fig. 11-10. Since the residuals fall approximately along a straight line in the figure, we conclude that there is no severe departure from normality.

The residuals are also plotted against the predicted value ŷᵢ in Fig. 11-11 and against the hydrocarbon levels xᵢ in Fig. 11-12. These plots do not indicate any serious model inadequacies.
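Because the raw Table 11-1 data are not reproduced on these slides, the following residual-diagnostics sketch uses simulated stand-in data (hypothetical values); with the actual arrays x and y the same code applies:

```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

# Simulated stand-in for the Table 11-1 data (hypothetical values)
rng = np.random.default_rng(7)
x = rng.uniform(0.9, 1.5, 20)
y = 74.3 + 14.9 * x + rng.normal(0.0, 1.0, 20)

n = len(x)
b1 = (np.sum(x * y) - x.sum() * y.sum() / n) / (np.sum(x ** 2) - x.sum() ** 2 / n)
b0 = y.mean() - b1 * x.mean()
fitted = b0 + b1 * x
resid = y - fitted                   # e_i = y_i - y_hat_i

fig, axes = plt.subplots(1, 3, figsize=(12, 3.5))
stats.probplot(resid, plot=axes[0])  # normal probability plot (cf. Fig. 11-10)
axes[1].scatter(fitted, resid)       # residuals vs fitted values (cf. Fig. 11-11)
axes[2].scatter(x, resid)            # residuals vs regressor (cf. Fig. 11-12)
plt.show()
```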
11-7: Adequacy of the Regression Model
Example 11-7

Figure 11-10 Normal probability plot of residuals, Example 11-7.
11-7: Adequacy of the Regression Model
Example 11-7

Figure 11-11 Plot of residuals versus predicted oxygen purity ŷ, Example 11-7.
11-7: Adequacy of the Regression Model
Example 11-7

Figure 11-12 Plot of residuals versus hydrocarbon level x, Example 11-7.
11-7: Adequacy of the Regression Model
11-7.2 Coefficient of Determination (R²)
• The quantity

$$R^2 = \frac{SS_R}{SS_T} = 1 - \frac{SS_E}{SS_T}$$

is called the coefficient of determination and is often used to judge the adequacy of a regression model.
• 0 ≤ R² ≤ 1.
• We often refer (loosely) to R² as the amount of variability in the data explained or accounted for by the regression model.
11-7: Adequacy of the Regression Model
11-7.2 Coefficient of Determination (R²)
• For the oxygen purity regression model,

R² = SS_R/SS_T = 152.13/173.38 = 0.877

• Thus, the model accounts for 87.7% of the variability in the data.
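In code, R² is one line once the sums of squares are known:

```python
SSR, SST = 152.13, 173.38
R2 = SSR / SST                   # 0.877: the model explains 87.7% of the variability
R2_alt = 1 - (SST - SSR) / SST   # same value via 1 - SSE/SST
```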
11-8: Correlation

Our development of regression analysis has assumed that x is a mathematical variable, measured with negligible error, and that Y is a random variable. Many applications of regression analysis involve situations in which both X and Y are random variables. In these situations, it is usually assumed that the observations (Xᵢ, Yᵢ), i = 1, 2, …, n are jointly distributed random variables obtained from the distribution f(x, y).
11-8: Correlation
We assume that the joint distribution of Xᵢ and Yᵢ is the bivariate normal distribution presented in Chapter 5, where μ_Y and σ_Y² are the mean and variance of Y, μ_X and σ_X² are the mean and variance of X, and ρ is the correlation coefficient between Y and X. Recall that the correlation coefficient is defined as

$$\rho = \frac{\sigma_{XY}}{\sigma_X \sigma_Y} \qquad (11\text{-}15)$$

where σ_XY is the covariance between Y and X.

The conditional distribution of Y for a given value of X = x is

$$f_{Y|x}(y) = \frac{1}{\sqrt{2\pi}\,\sigma_{Y|x}} \exp\left[ -\frac{1}{2} \left( \frac{y - \beta_0 - \beta_1 x}{\sigma_{Y|x}} \right)^2 \right] \qquad (11\text{-}16)$$

where

$$\beta_0 = \mu_Y - \mu_X \rho \frac{\sigma_Y}{\sigma_X} \qquad (11\text{-}17)$$

and

$$\beta_1 = \rho \frac{\sigma_Y}{\sigma_X} \qquad (11\text{-}18)$$
11-8: Correlation
It is possible to draw inferences about the correlation coefficient ρ in this model. The estimator of ρ is the sample correlation coefficient

$$R = \frac{\sum_{i=1}^{n} Y_i (X_i - \bar{X})}{\left[ \sum_{i=1}^{n} (X_i - \bar{X})^2 \sum_{i=1}^{n} (Y_i - \bar{Y})^2 \right]^{1/2}} = \frac{S_{XY}}{(S_{XX}\,SS_T)^{1/2}} \qquad (11\text{-}19)$$

Note that

$$\hat{\beta}_1 = \left( \frac{SS_T}{S_{XX}} \right)^{1/2} R \qquad (11\text{-}20)$$

We may also write

$$R^2 = \hat{\beta}_1^2 \frac{S_{XX}}{SS_T} = \frac{\hat{\beta}_1 S_{XY}}{SS_T} = \frac{SS_R}{SS_T}$$

which is just the coefficient of determination. That is, the coefficient of determination R² is just the square of the correlation coefficient between Y and X.
11-8: Correlation
It is often useful to test the hypotheses

H₀: ρ = 0
H₁: ρ ≠ 0

The appropriate test statistic for these hypotheses is

$$T_0 = \frac{R\sqrt{n-2}}{\sqrt{1-R^2}} \qquad (11\text{-}21)$$

Reject H₀ if |t₀| > t_{α/2, n−2}.
11-8: Correlation
The test procedure for the hypotheses

H₀: ρ = ρ₀
H₁: ρ ≠ ρ₀

where ρ₀ ≠ 0 is somewhat more complicated. In this case, the appropriate test statistic is

Z₀ = (arctanh R − arctanh ρ₀)(n − 3)^{1/2}  (11-22)

Reject H₀ if |z₀| > z_{α/2}.
11-8: Correlation

The approximate 100(1 − α)% confidence interval is

$$\tanh\left( \operatorname{arctanh} r - \frac{z_{\alpha/2}}{\sqrt{n-3}} \right) \le \rho \le \tanh\left( \operatorname{arctanh} r + \frac{z_{\alpha/2}}{\sqrt{n-3}} \right) \qquad (11\text{-}23)$$
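A Python sketch collecting the correlation inferences of Equations 11-19, 11-21, and 11-23 into one function (paired data arrays are assumed):

```python
import numpy as np
from scipy import stats

def correlation_inference(x, y, alpha=0.05):
    """Sample correlation, t-test of rho = 0, and Fisher z CI (Eqs. 11-19, 11-21, 11-23)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n = len(x)
    r = np.corrcoef(x, y)[0, 1]                       # sample correlation (Equation 11-19)
    t0 = r * np.sqrt(n - 2) / np.sqrt(1 - r ** 2)     # test statistic (Equation 11-21)
    p = 2 * stats.t.sf(abs(t0), df=n - 2)             # two-sided P-value
    z = stats.norm.ppf(1 - alpha / 2)
    lo = np.tanh(np.arctanh(r) - z / np.sqrt(n - 3))  # Fisher z interval (Equation 11-23)
    hi = np.tanh(np.arctanh(r) + z / np.sqrt(n - 3))
    return r, t0, p, (lo, hi)
```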
11-8: Correlation
EXAMPLE 11-8 Wire Bond Pull Strength  In Chapter 1 (Section 1-3) an application of regression analysis is described in which an engineer at a semiconductor assembly plant is investigating the relationship between pull strength of a wire bond and two factors: wire length and die height. In this example, we will consider only one of the factors, the wire length. A random sample of 25 units is selected and tested, and the wire bond pull strength and wire length are observed for each unit. The data are shown in Table 1-2. We assume that pull strength and wire length are jointly normally distributed.

Figure 11-13 shows a scatter diagram of wire bond strength versus wire length. We have used the Minitab option of displaying box plots of each individual variable on the scatter diagram. There is evidence of a linear relationship between the two variables.

The Minitab output for fitting a simple linear regression model to the data is shown below.
11-8: Correlation

Figure 11-13 Scatter plot of wire bond strength versus wire length, Example 11-8.

11-8: Correlation
Minitab Output for Example 11-8
Regression Analysis: Strength versus Length

The regression equation is


Strength = 5.11 + 2.90 Length

Predictor Coef SE Coef T P


Constant 5.115 1.146 4.46 0.000
Length 2.9027 0.1170 24.80 0.000

S = 3.093 R-Sq = 96.4% R-Sq(adj) = 96.2%


PRESS = 272.144 R-Sq(pred) = 95.54%

Analysis of Variance

Source DF SS MS F P
Regression 1 5885.9 5885.9 615.08 0.000
Residual Error 23 220.1 9.6
Total 24 6105.9
11-8: Correlation
Example 11-8 (continued)

Now S_xx = 698.56 and S_xy = 2027.7132, and the sample correlation coefficient is

$$r = \frac{S_{xy}}{(S_{xx}\,SS_T)^{1/2}} = \frac{2027.7132}{[(698.560)(6105.9)]^{1/2}} = 0.9818$$

Note that r² = (0.9818)² = 0.9640 (which is reported in the Minitab output), or that approximately 96.40% of the variability in pull strength is explained by the linear relationship to wire length.
11-8: Correlation
Example 11-8 (continued)

Now suppose that we wish to test the hypotheses

H₀: ρ = 0
H₁: ρ ≠ 0

with α = 0.05. We can compute the t-statistic of Equation 11-21 as

$$t_0 = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \frac{0.9818\sqrt{23}}{\sqrt{1-0.9640}} = 24.8$$

This statistic is also reported in the Minitab output as a test of H₀: β₁ = 0. Because t₀.₀₂₅,₂₃ = 2.069, we reject H₀ and conclude that the correlation coefficient ρ ≠ 0.
11-8: Correlation
Example 11-8 (continued)

Finally, we may construct an approximate 95% confidence interval on ρ from Equation 11-23. Since arctanh r = arctanh 0.9818 = 2.3452, Equation 11-23 becomes

$$\tanh\left( 2.3452 - \frac{1.96}{\sqrt{22}} \right) \le \rho \le \tanh\left( 2.3452 + \frac{1.96}{\sqrt{22}} \right)$$

which reduces to

0.9585 ≤ ρ ≤ 0.9921
11-9: Transformation and Logistic Regression

We occasionally find that the straight-line regression model Y = β₀ + β₁x + ε is inappropriate because the true regression function is nonlinear. Sometimes nonlinearity is visually determined from the scatter diagram, and sometimes, because of prior experience or underlying theory, we know in advance that the model is nonlinear.

Occasionally, a scatter diagram will exhibit an apparent nonlinear relationship between Y and x. In some of these situations, a nonlinear function can be expressed as a straight line by using a suitable transformation. Such nonlinear models are called intrinsically linear.
11-9: Transformation and Logistic Regression
EXAMPLE 11-9 Windmill Power
A research engineer is investigating the use of a windmill to generate electricity and has collected data on the DC output from this windmill and the corresponding wind velocity. The data are plotted in Figure 11-14 and listed in Table 11-5.

Table 11-5 Observed Values and Regressor Variable for Example 11-9

Observation   Wind Velocity   DC Output
Number, i     (mph), xi       yi
 1             5.00           1.582
 2             6.00           1.822
 3             3.40           1.057
 4             2.70           0.500
 5            10.00           2.236
 6             9.70           2.386
 7             9.55           2.294
 8             3.05           0.558
 9             8.15           2.166
10             6.20           1.866
11             2.90           0.653
12             6.35           1.930
13             4.60           1.562
14             5.80           1.737
15             7.40           2.088
16             3.60           1.137
17             7.85           2.179
18             8.80           2.112
19             7.00           1.800
20             5.45           1.501
21             9.10           2.303
22            10.20           2.310
23             4.10           1.194
24             3.95           1.144
25             2.45           0.123
11-9: Transformation and Logistic Regression
Example 11-9 (Continued)

Figure 11-14 Plot of DC output y versus wind velocity x for the windmill data.
Figure 11-15 Plot of residuals eᵢ versus fitted values ŷᵢ for the windmill data.
11-9: Transformation and Logistic Regression
Example 11-9 (Continued)

Figure 11-16 Plot of DC output versus x′ = 1/x for the windmill data.
11-9: Transformation and Logistic Regression
Example 11-9 (Continued)

Figure 11-17 Plot of residuals versus fitted values ŷᵢ for the transformed model for the windmill data.
Figure 11-18 Normal probability plot of the residuals for the transformed model for the windmill data.

A plot of the residuals from the transformed model versus ŷᵢ is shown in Figure 11-17. This plot does not reveal any serious problem with inequality of variance. The normal probability plot, shown in Figure 11-18, gives a mild indication that the errors come from a distribution with heavier tails than the normal (notice the slight upward and downward curve at the extremes). This normal probability plot has the z-score values plotted on the horizontal axis. Since there is no strong signal of model inadequacy, we conclude that the transformed model is satisfactory.
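Because Table 11-5 is given in full, the windmill analysis can be reproduced; this sketch fits both the straight-line model and the transformed model y = β₀ + β₁(1/x) and compares their R² values:

```python
import numpy as np

# Table 11-5: wind velocity (mph) and DC output
x = np.array([5.00, 6.00, 3.40, 2.70, 10.00, 9.70, 9.55, 3.05, 8.15, 6.20,
              2.90, 6.35, 4.60, 5.80, 7.40, 3.60, 7.85, 8.80, 7.00, 5.45,
              9.10, 10.20, 4.10, 3.95, 2.45])
y = np.array([1.582, 1.822, 1.057, 0.500, 2.236, 2.386, 2.294, 0.558, 2.166, 1.866,
              0.653, 1.930, 1.562, 1.737, 2.088, 1.137, 2.179, 2.112, 1.800, 1.501,
              2.303, 2.310, 1.194, 1.144, 0.123])

def fit_r2(reg, y):
    """Least squares fit of y on a single regressor; returns (b0, b1, R^2)."""
    n = len(y)
    b1 = (((reg * y).sum() - reg.sum() * y.sum() / n)
          / ((reg ** 2).sum() - reg.sum() ** 2 / n))
    b0 = y.mean() - b1 * reg.mean()
    resid = y - (b0 + b1 * reg)
    r2 = 1 - (resid ** 2).sum() / ((y - y.mean()) ** 2).sum()
    return b0, b1, r2

print(fit_r2(x, y))        # straight-line model y = b0 + b1*x
print(fit_r2(1.0 / x, y))  # intrinsically linear model y = b0 + b1*(1/x)
```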
Important Terms & Concepts of Chapter 11
Analysis of variance test in regression
Confidence interval on mean response
Confidence intervals on model parameters
Correlation coefficient
Empirical model
Intrinsically linear model
Least squares estimation of regression model parameters
Logistic regression
Model adequacy checking
Odds ratio
Prediction interval on a future observation
Regression analysis
Residual plots
Residuals
Scatter diagram
Simple linear regression model
Standard error
Statistical test on model parameters
Transformations
