COLLEGE OF ENGINEERING
DEPARTMENT OF INDUSTRIAL ENGINEERING
MODULE 5
SIMPLE LINEAR REGRESSION AND CORRELATION
I. Empirical Models

E(Y|x) = μ_{Y|x} = β₀ + β₁x
Table 1-1: Oxygen and Hydrocarbon Levels
Observation Number    Hydrocarbon Level x (%)    Purity y (%)
1 0.99 90.01
2 1.02 89.05
3 1.15 91.43
4 1.29 93.74
5 1.46 96.73
6 1.36 94.45
7 0.87 87.59
8 1.23 91.77
9 1.55 99.42
10 1.40 93.65
11 1.19 93.54
12 1.15 92.52
13 0.98 90.56
14 1.01 89.54
15 1.11 89.85
16 1.20 90.39
17 1.26 93.25
18 1.32 93.41
19 1.43 94.98
20 0.95 87.33
Figure 1. Scatter Diagram of Oxygen Purity vs Hydrocarbon
Level
where β₀ and β₁ are referred to as the intercept and slope of the
line; collectively, they are called the regression coefficients.

Y = β₀ + β₁x + ε        (Equation 1-1)

E(Y|x) = E(β₀ + β₁x + ε) = β₀ + β₁x + E(ε) = β₀ + β₁x

V(Y|x) = V(β₀ + β₁x + ε) = V(β₀ + β₁x) + V(ε) = 0 + σ² = σ²
The slope β₁ can be interpreted as the change in the mean of Y for
a unit change in x. Also, the variability of Y at a particular
value of x is determined by the error variance σ².

Y = β₀ + β₁x + ε
Figure 2. Deviations of the data from the estimated regression
model
yᵢ = β₀ + β₁xᵢ + εᵢ,   i = 1, 2, …, n

L = Σ εᵢ² = Σ (yᵢ − β₀ − β₁xᵢ)²     (all sums run over i = 1, …, n)

∂L/∂β₀ = −2 Σ (yᵢ − β̂₀ − β̂₁xᵢ) = 0

∂L/∂β₁ = −2 Σ (yᵢ − β̂₀ − β̂₁xᵢ)xᵢ = 0
Simplifying the two preceding partial derivatives, we get the
least-squares normal equations:
n β̂₀ + β̂₁ Σ xᵢ = Σ yᵢ

β̂₀ Σ xᵢ + β̂₁ Σ xᵢ² = Σ yᵢxᵢ        (Equations 1-2 and 1-3)
The solutions to the normal equations are:

β̂₀ = ȳ − β̂₁x̄

β̂₁ = [Σ yᵢxᵢ − (Σ yᵢ)(Σ xᵢ)/n] / [Σ xᵢ² − (Σ xᵢ)²/n]

The fitted regression line is:

ŷ = β̂₀ + β̂₁x

and each observation satisfies:

yᵢ = β̂₀ + β̂₁xᵢ + eᵢ,   i = 1, 2, …, n
where eᵢ = yᵢ − ŷᵢ is called the residual. The residual describes
the error in the fit of the model for each actual observation.
S_xx = Σ (xᵢ − x̄)² = Σ xᵢ² − (Σ xᵢ)²/n

S_xy = Σ yᵢ(xᵢ − x̄) = Σ xᵢyᵢ − (Σ xᵢ)(Σ yᵢ)/n

β̂₁ = S_xy / S_xx
EXAMPLE: Fit a simple linear regression model to the oxygen
purity data in Table 1-1.
n = 20,  Σ xᵢ = 23.92,  Σ yᵢ = 1,843.21,  x̄ = 1.196,  ȳ = 92.1605

Σ yᵢ² = 170,044.5321,  Σ xᵢ² = 29.2892,  Σ xᵢyᵢ = 2,214.6566
S_xx = Σ xᵢ² − (Σ xᵢ)²/n = 29.2892 − (23.92)²/20 = 0.68088

S_xy = Σ xᵢyᵢ − (Σ xᵢ)(Σ yᵢ)/n = 2,214.6566 − (23.92)(1,843.21)/20 = 10.17744

β̂₁ = S_xy / S_xx = 10.17744 / 0.68088 = 14.94748

β̂₀ = ȳ − β̂₁x̄ = 92.1605 − (14.94748)(1.196) = 74.28331

The fitted regression model is therefore:

ŷ = 74.283 + 14.947x
Using the regression model above, we would predict an oxygen
purity of ŷ = 89.23% when the hydrocarbon level is x = 1.00%.
The purity 89.23% may be interpreted either as an estimate of the
true population mean purity when x = 1.00%, or as an estimate of
a new observation when x = 1.00%. Such an estimate is, of course,
subject to error; we cannot expect a future observation of purity
at x = 1.00% to be exactly equal to 89.23%.
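The least-squares computation above can be sketched in plain Python. This block is an illustration, not part of the module; the variable names (b0, b1, y_at_1) are my own:

```python
# Least-squares fit of the oxygen purity data in Table 1-1,
# following the formulas for S_xx, S_xy, b1-hat and b0-hat above.
x = [0.99, 1.02, 1.15, 1.29, 1.46, 1.36, 0.87, 1.23, 1.55, 1.40,
     1.19, 1.15, 0.98, 1.01, 1.11, 1.20, 1.26, 1.32, 1.43, 0.95]
y = [90.01, 89.05, 91.43, 93.74, 96.73, 94.45, 87.59, 91.77, 99.42, 93.65,
     93.54, 92.52, 90.56, 89.54, 89.85, 90.39, 93.25, 93.41, 94.98, 87.33]
n = len(x)
Sxx = sum(v * v for v in x) - sum(x) ** 2 / n
Sxy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
b1 = Sxy / Sxx                      # estimated slope, about 14.947
b0 = sum(y) / n - b1 * sum(x) / n   # estimated intercept, about 74.283
y_at_1 = b0 + b1 * 1.00             # predicted purity at x = 1.00
```

Running this reproduces the fitted model ŷ = 74.283 + 14.947x and the predicted purity of about 89.23% at x = 1.00%.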
Estimating σ²

The sum of the squares of the residuals, which is often called the
error sum of squares, is equal to

SS_E = Σ eᵢ² = Σ (yᵢ − ŷᵢ)²

An unbiased estimator of σ² is:

σ̂² = SS_E / (n − 2)
However, computing SS_E using the equation that was just presented
can be very mind-numbing. Another way to compute SS_E is through:

SS_E = SS_T − β̂₁S_xy,   where SS_T = Σ (yᵢ − ȳ)² = Σ yᵢ² − nȳ²
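As a sketch (not part of the module), the shortcut can be checked numerically with the summary statistics of the oxygen purity example:

```python
# SS_E via the shortcut SS_E = SS_T - b1*S_xy, then sigma2-hat = SS_E/(n-2),
# using the summary statistics from the oxygen purity example.
n = 20
sum_y2 = 170_044.5321   # sum of y_i squared
ybar = 92.1605
Sxy = 10.17744
b1 = 14.94748
SST = sum_y2 - n * ybar ** 2   # total sum of squares, about 173.38
SSE = SST - b1 * Sxy           # error sum of squares, about 21.25
sigma2_hat = SSE / (n - 2)     # estimate of error variance, about 1.18
```

The value σ̂² ≈ 1.18 is used in the hypothesis tests and confidence intervals that follow.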
EXERCISES
selling price and annual taxes for 24 houses. The data are given in
the table immediately following.

Sale Price (in US$k)   Taxes (in US$k)   Sale Price (in US$k)   Taxes (in US$k)
25.9 4.9176 30.0 5.0500
29.5 5.0208 36.9 8.2464
27.9 4.5429 41.9 6.6969
25.9 4.5573 40.5 7.7841
29.9 5.0597 43.9 9.0384
29.9 3.8910 37.5 5.9894
30.9 5.8980 37.9 5.7422
28.9 5.6039 44.5 8.7951
35.9 5.8282 37.9 6.0831
31.5 5.3003 38.9 8.3607
31.0 6.2712 36.9 8.1400
30.9 5.9592 45.8 9.1416
b. Find the mean selling price given that the taxes paid are
x = 7.50.
c. Calculate the fitted value of y corresponding to x =
5.8980. Find the corresponding residual.
Recall that we have assumed that the error term ε in the model
Y = β₀ + β₁x + ε is a random variable with mean zero and
variance σ². Because the values of x are fixed and Y is a random
variable, the slope and intercept estimators have sampling
distributions, with estimated standard errors:

se(β̂₁) = √(σ̂² / S_xx)   and   se(β̂₀) = √(σ̂² [1/n + x̄²/S_xx])
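As a quick numerical sketch (not part of the module), the standard errors for the oxygen purity fit can be evaluated directly:

```python
import math

# Estimated standard errors for the oxygen purity fit, using
# sigma2-hat = 1.18 as the estimate of the error variance.
n, Sxx, xbar, sigma2_hat = 20, 0.68088, 1.196, 1.18
se_b1 = math.sqrt(sigma2_hat / Sxx)                   # about 1.316
se_b0 = math.sqrt(sigma2_hat * (1 / n + xbar ** 2 / Sxx))  # about 1.593
```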
Use of t-Tests

Suppose we wish to test the hypotheses:

H₀: β₁ = β₁,₀
H₁: β₁ ≠ β₁,₀

The test statistic is:

T₀ = (β̂₁ − β₁,₀) / √(σ̂²/S_xx) = (β̂₁ − β₁,₀) / se(β̂₁)

We reject H₀ if |t₀| > t_{α/2, n−2}.
Similarly, for the intercept:

H₀: β₀ = β₀,₀
H₁: β₀ ≠ β₀,₀

T₀ = (β̂₀ − β₀,₀) / √(σ̂² [1/n + x̄²/S_xx]) = (β̂₀ − β₀,₀) / se(β̂₀)

We reject H₀ if |t₀| > t_{α/2, n−2}.

An important special case is the test for significance of regression:

H₀: β₁ = 0
H₁: β₁ ≠ 0
EXAMPLE: Let us test the significance of regression using the
model for the oxygen purity data. Use α = 0.01.

Step 1: State the hypotheses:

H₀: β₁ = 0
H₁: β₁ ≠ 0

Step 2: Set the level of significance: α = 0.01

Step 3: Declare the test statistic:

T₀ = (β̂₁ − β₁,₀) / √(σ̂²/S_xx) = (β̂₁ − β₁,₀) / se(β̂₁)

Step 4: Compute the test statistic:

t₀ = (14.947 − 0) / √(1.18/0.68088) = 11.35
Step 5: Declare the critical region: reject H₀ if |t₀| > t_{0.005,18} = 2.878.

Since the computed test statistic is within the critical region, i.e.
11.35 > 2.878, we reject the null hypothesis.
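The test in the example above can be sketched in Python (an illustration, not part of the module; the critical value is taken from a t-table):

```python
import math

# Significance-of-regression t-test for the oxygen purity model,
# mirroring the worked example (alpha = 0.01, n = 20).
b1, Sxx, sigma2_hat = 14.947, 0.68088, 1.18
t0 = (b1 - 0) / math.sqrt(sigma2_hat / Sxx)   # about 11.35
t_crit = 2.878                 # t_{0.005,18} from a t-table
reject_h0 = abs(t0) > t_crit   # True: the regression is significant
```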
EXERCISES
6. A rocket motor is manufactured by bonding together two
types of propellants, an igniter and a sustainer. The shear
strength of the bond y is thought to be a linear function of the
age of the propellant x when the motor is cast. Data from twenty
observations are shown below:
V. Confidence Intervals
these parameters. The width of the confidence interval is a
measure of the overall quality of the regression line.

A 100(1 − α)% confidence interval on the slope β₁ is:

β̂₁ − t_{α/2,n−2} √(σ̂²/S_xx) ≤ β₁ ≤ β̂₁ + t_{α/2,n−2} √(σ̂²/S_xx)

Similarly, a 100(1 − α)% confidence interval on the intercept β₀ is:

β̂₀ − t_{α/2,n−2} √(σ̂² [1/n + x̄²/S_xx]) ≤ β₀ ≤ β̂₀ + t_{α/2,n−2} √(σ̂² [1/n + x̄²/S_xx])

For the oxygen purity model, a 95% confidence interval on the slope is:

β̂₁ − t_{0.025,18} √(σ̂²/S_xx) ≤ β₁ ≤ β̂₁ + t_{0.025,18} √(σ̂²/S_xx)

14.947 − 2.101 √(1.18/0.68088) ≤ β₁ ≤ 14.947 + 2.101 √(1.18/0.68088)

12.197 ≤ β₁ ≤ 17.697
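The slope interval can be sketched numerically (an illustration, not part of the module; small rounding differences from the printed bounds are expected):

```python
import math

# 95% confidence interval on the slope of the oxygen purity model.
b1, Sxx, sigma2_hat = 14.947, 0.68088, 1.18
t = 2.101                                 # t_{0.025,18}
half = t * math.sqrt(sigma2_hat / Sxx)    # half-width of the interval
lo, hi = b1 - half, b1 + half             # roughly 12.2 and 17.7
```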
Confidence Interval on the Mean Response

A 100(1 − α)% confidence interval on the mean response at x = x₀ is:

μ̂_{Y|x₀} − t_{α/2,n−2} √(σ̂² [1/n + (x₀ − x̄)²/S_xx]) ≤ μ_{Y|x₀} ≤ μ̂_{Y|x₀} + t_{α/2,n−2} √(σ̂² [1/n + (x₀ − x̄)²/S_xx])

EXAMPLE: Construct a 95% confidence interval on the mean oxygen
purity when x₀ = 1.00% using the fitted model.

μ̂_{Y|x₀} = 74.283 + 14.947(1.00) = 89.23

Then we construct the 95% confidence interval:

89.23 ± 2.101 √(1.18 [1/20 + (1.00 − 1.196)²/0.68088])

89.23 ± 0.75

88.48 ≤ μ_{Y|1.00} ≤ 89.98
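The interval on the mean response can be sketched as follows (an illustration, not part of the module):

```python
import math

# 95% CI on the mean oxygen purity at x0 = 1.00, as in the example above.
n, Sxx, xbar, sigma2_hat = 20, 0.68088, 1.196, 1.18
x0, t = 1.00, 2.101
mu_hat = 74.283 + 14.947 * x0                  # point estimate, 89.23
half = t * math.sqrt(sigma2_hat * (1 / n + (x0 - xbar) ** 2 / Sxx))
lo, hi = mu_hat - half, mu_hat + half          # about 88.48 and 89.98
```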
VI. Prediction of New Observations

The point estimate of a new observation Y₀ at x = x₀ is:

ŷ₀ = β̂₀ + β̂₁x₀

A new observation is independent of the observations used to fit
the model, while ŷ₀ depends only on the data used to fit the
regression model. Thus, we use a different equation to construct
the interval for predicting new values:

ŷ₀ − t_{α/2,n−2} √(σ̂² [1 + 1/n + (x₀ − x̄)²/S_xx]) ≤ y₀ ≤ ŷ₀ + t_{α/2,n−2} √(σ̂² [1 + 1/n + (x₀ − x̄)²/S_xx])
EXAMPLE: For the oxygen purity data, recall from the previous
example that ŷ₀ = 89.23 when x₀ = 1.00. Constructing the 95%
prediction interval gives us:

89.23 ± 2.101 √(1.18 [1 + 1/20 + (1.00 − 1.196)²/0.68088])

86.83 ≤ y₀ ≤ 91.63
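The prediction interval can be sketched the same way (an illustration, not part of the module); note the extra "1 +" term compared with the interval on the mean response:

```python
import math

# 95% prediction interval for a new purity observation at x0 = 1.00.
n, Sxx, xbar, sigma2_hat = 20, 0.68088, 1.196, 1.18
x0, t = 1.00, 2.101
y0_hat = 74.283 + 14.947 * x0
half = t * math.sqrt(sigma2_hat * (1 + 1 / n + (x0 - xbar) ** 2 / Sxx))
lo, hi = y0_hat - half, y0_hat + half          # about 86.83 and 91.63
```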
EXERCISES
VII. Adequacy of the Regression Model
1. Residual Analysis
2. Coefficient of Determination (R2)
Residual Analysis
As can be seen from the two immediately preceding figures, the
assumptions on the residuals appear to be satisfied. The first
graph plots the residuals against the predicted values; the random
pattern, with no obvious structure, suggests that the model is
adequate and that the error variance is constant. The second
figure, on the other hand, is a normal probability plot of the
residuals. We can see that the residuals fall approximately along
a straight line, which implies that there is no severe departure
from normality.
To summarize:
Coefficient of Determination (R2)
The quantity

R² = SS_R / SS_T = 1 − SS_E / SS_T

is called the coefficient of determination, with 0 ≤ R² ≤ 1.
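For the oxygen purity model, R² can be sketched from the sums of squares computed earlier out of the Table 1-1 summary statistics (these values do not appear explicitly in the module's text):

```python
# Coefficient of determination for the oxygen purity model.
SST = 173.38          # total sum of squares
SSE = 21.25           # error sum of squares, SS_T - b1*S_xy
R2 = 1 - SSE / SST    # about 0.877: the model accounts for roughly
                      # 87.7% of the variability in purity
```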
VIII. Transformations to a Straight Line

Consider the exponential function:

Y = β₀ e^{β₁x} ε

Taking logarithms of both sides linearizes the model:

ln Y = ln β₀ + β₁x + ln ε

Another example is the reciprocal model:

Y = β₀ + β₁(1/x) + ε

If we let z = 1/x, the model is linearized to:

Y = β₀ + β₁z + ε

As a final example, consider:

Y = 1 / exp(β₀ + β₁x + ε)

If we let Y* = 1/Y, this becomes the linear model:

ln Y* = β₀ + β₁x + ε
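The log transformation can be sketched as follows. This is an illustration, not part of the module, and the data below are made up (roughly y = eˣ) purely to show the mechanics:

```python
import math

# Linearizing Y = beta0 * exp(beta1 * x) * eps by regressing ln(y) on x
# with the same least-squares formulas as before.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.7, 7.4, 20.1, 54.6, 148.4]     # illustrative data, roughly e^x
ly = [math.log(v) for v in y]         # transformed response ln(y)
n = len(x)
Sxx = sum(v * v for v in x) - sum(x) ** 2 / n
Sxy = sum(a * b for a, b in zip(x, ly)) - sum(x) * sum(ly) / n
b1 = Sxy / Sxx                                 # estimate of beta1
b0 = math.exp(sum(ly) / n - b1 * sum(x) / n)   # estimate of beta0
```

Since the data were generated from roughly y = eˣ, the fit recovers β₁ ≈ 1 and β₀ ≈ 1.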
IX. Correlation
The estimators of the intercept and slope when both X and Y are
random variables are identical to what was already discussed.
An additional estimator, though, can be computed in the case
where both X and Y are random variables: the estimator of the
correlation coefficient, ρ. Recall that the correlation
coefficient is defined as:

ρ = σ_XY / (σ_X σ_Y)
The sample correlation coefficient is:

R = Σ Yᵢ(Xᵢ − X̄) / √[ Σ (Xᵢ − X̄)² · Σ (Yᵢ − Ȳ)² ] = S_XY / √(S_XX · SS_T)

Note that R² = SS_R / SS_T = 1 − SS_E / SS_T, the coefficient of
determination.

To test the hypotheses:

H₀: ρ = 0
H₁: ρ ≠ 0

the appropriate test statistic is:

T₀ = R √(n − 2) / √(1 − R²)

We reject H₀ if |t₀| > t_{α/2, n−2}.
For testing:

H₀: ρ = ρ₀
H₁: ρ ≠ ρ₀

the approximate test statistic is:

Z₀ = (arctanh R − arctanh ρ₀) √(n − 3)

We reject the null hypothesis if the value of the test statistic
falls within the critical region |z₀| > z_{α/2}.
It is also possible to construct an approximate 100(1 − α)%
confidence interval for ρ:

tanh(arctanh r − z_{α/2}/√(n − 3)) ≤ ρ ≤ tanh(arctanh r + z_{α/2}/√(n − 3))
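The arctanh-based interval can be sketched as follows. This is an illustration, not part of the module; the sample values r = 0.8 and n = 30 are made up:

```python
import math

# Approximate 95% confidence interval for rho via the arctanh
# (Fisher z) transform.
r, n = 0.8, 30      # illustrative sample correlation and sample size
z = 1.96            # z_{0.025}
half = z / math.sqrt(n - 3)
lo = math.tanh(math.atanh(r) - half)   # about 0.618
hi = math.tanh(math.atanh(r) + half)   # about 0.901
```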
EXERCISES
10. The final test and exam averages for 20 randomly selected
students taking a course in engineering statistics and a course in
operations research follow. Assume that the final averages are
jointly normally distributed.
Statistics 86 75 69 75 90
OR 80 81 75 81 82
Statistics 94 83 86 71 65
OR 95 80 81 76 72
Statistics 84 71 62 90 83
OR 85 72 65 93 81
Statistics 75 71 76 84 97
OR 70 73 72 80 98
11. A random sample of 50 observations was made on the
diameter of spot welds and the corresponding weld shear
strength.