
Chapter 6-Simple Linear Regression and Correlation


MATH 2330

COMPUTATIONAL
METHOD AND
STATISTICS
CHAPTER 6- SIMPLE
LINEAR REGRESSION
AND CORRELATION

PREPARED BY :
DR MAZIATI AKMAL BT
MOHD HATTA
Simple Linear Regression and Correlation
• Scientists are frequently interested in studying the functional relationship between two variables
(y and x1) or among more than two variables (y and x1, x2, …, xn). For example, in a chemical process, the
yield of the product is related to the process operating temperature, pressure,
concentration of reactants, and other factors.
• The most commonly used techniques for investigating the relationship between two
quantitative variables are correlation and linear regression. Correlation quantifies the strength
of the linear relationship between a pair of variables, whereas regression expresses the
relationship in the form of an equation. Suppose we want to investigate the
relationship between the yield of the product (y) and the process operating
temperature (x1) alone; the resulting model is called a simple regression model. Extending the
simple regression model with one or more additional explanatory variables gives a multiple
regression model.
① Correlation
• The correlation coefficient computed from the sample data measures the strength and direction of a linear
relationship between two quantitative variables. The symbol for the sample correlation coefficient is r. The
symbol for the population correlation coefficient is ρ. The range of the correlation coefficient is from -1 to
+1.
$$
r = \begin{cases}
1, & \text{perfect positive correlation} \\
0, & \text{no correlation (the values do not seem to be linked at all)} \\
-1, & \text{perfect negative correlation}
\end{cases}
$$
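The formula for Pearson's correlation coefficient is

$$
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}
$$

where x̄ and ȳ are the sample means of x and y.
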
Guideline for interpreting Pearson's correlation coefficient:

0 < |r| < 0.3     weak correlation
0.3 < |r| < 0.7   moderate correlation
|r| > 0.7         strong correlation
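
The coefficient and the guideline can be sketched in Python as follows. This is a minimal illustration, not part of the original slides: the data are hypothetical (they are not the ice cream figures of Eg 1), the function names `pearson_r` and `interpret` are made up for this sketch, and assigning the boundary values |r| = 0.3 and |r| = 0.7 to the higher category is an assumption, since the guideline leaves them open.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)

def interpret(r):
    """Label |r| with the guideline; boundary handling is an assumption."""
    a = abs(r)
    if a == 0:
        return "no correlation"
    if a < 0.3:
        return "weak"
    if a < 0.7:
        return "moderate"
    return "strong"

# Hypothetical data, chosen only to illustrate the calculation.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(x, y)
print(f"r = {r:.4f} ({interpret(r)})")
```

Because r depends only on deviations from the means, rescaling or shifting either variable (for example changing temperature units) leaves its magnitude unchanged.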
Eg 1: The local ice cream shop keeps track of how much ice cream it sells versus the temperature on that day.
Here are its figures for the last 4 days. Compute the value of the correlation coefficient for the data given
below.
Sol (Eg 1):
② Simple Linear Regression
• A simple regression includes only two variables:

i) one independent variable, x, used to explain the variation in the dependent variable (also called the
regressor variable or predictor variable)

ii) one dependent variable, y, the one being explained (also called the response variable).
• As an illustration, consider the data in Table 11-1 below.

y: purity of oxygen produced in chemical distillation process.

x: percentage of hydrocarbons that are present in the main condenser of the distillation unit.
• By looking at the scatter diagram, we can observe a strong linear relationship
between purity and hydrocarbon level. If a straight line is drawn through the points, the points will be
scattered closely around the line. Our simple linear regression model for the mean response is then written as

$$E(Y \mid x) = \beta_0 + \beta_1 x$$

• where the y-intercept β₀ and the slope β₁ are unknown regression coefficients. The data points
do not fall exactly on a straight line, so the above equation needs to be modified to account for this. Let
the difference between the observed value of y and the straight line (β₀ + β₁x) be an error ϵ. It is
convenient to think of ϵ as a statistical error; that is, it is a random variable that accounts for the failure
of the model to fit the data exactly. Thus, the complete regression model is written as

$$Y = \beta_0 + \beta_1 x + \epsilon$$

The difference between the actual value of y and the predicted value ŷ for a given x value
is known as the residual.
• Assumptions on the errors ϵ:
i) they have mean zero and unknown variance σ²
ii) they are uncorrelated
• However, a large number of straight lines can be drawn through the scatter diagram. The question
now is which line gives the best fit to the data. In other words, which estimates of β₀ and β₁ result
in a line that is a “best fit” to the data?

• In regression analysis we try to find a line that best fits the points in the scatter diagram. Such a line
provides the best possible description of the relationship between the dependent and independent
variables. This can be done via the least squares regression.
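Least squares estimates
• The least squares estimates of the intercept and slope in the simple linear regression model are

$$
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}, \qquad \hat{\beta}_1 = \frac{S_{xy}}{S_{xx}}
$$

where

$$
\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i \quad \text{and} \quad \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i ,
$$

with $S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2$ and $S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$.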
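
The least squares estimates can be computed directly from these sums. The sketch below is a minimal illustration under stated assumptions: the data are hypothetical (they are not the values of Table 11-1), and the function name `least_squares` is made up for this example.

```python
def least_squares(x, y):
    """Return (b0, b1), the least squares intercept and slope."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Corrected sums of squares and cross-products.
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx           # slope estimate
    b0 = y_bar - b1 * x_bar    # intercept estimate
    return b0, b1

# Hypothetical data, chosen only to illustrate the calculation.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = least_squares(x, y)
print(f"y_hat = {b0:.4f} + {b1:.4f} x")
```

Note that the fitted line always passes through the point (x̄, ȳ), which follows from the intercept formula.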
Estimating σ²
• Notice that there is one more unknown parameter in our regression model: σ², the variance of the error term ϵ.
The unbiased estimator of σ² is given by

$$
\hat{\sigma}^2 = \frac{SS_E}{n - 2}, \qquad SS_E = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
$$
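
A minimal sketch of the usual unbiased estimator, the residual sum of squares divided by n − 2, is given below. The data are hypothetical and the helper names `fit_line` and `sigma2_hat` are assumptions made for this illustration.

```python
def fit_line(x, y):
    """Least squares intercept and slope for a simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    return y_bar - b1 * x_bar, b1

def sigma2_hat(x, y):
    """Estimate the error variance as SSE / (n - 2)."""
    b0, b1 = fit_line(x, y)
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sse = sum(e ** 2 for e in residuals)
    # Two degrees of freedom are used up by estimating b0 and b1.
    return sse / (len(x) - 2)

# Hypothetical data, chosen only to illustrate the calculation.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

s2 = sigma2_hat(x, y)
print(f"sigma^2 estimate = {s2:.6f}")
```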
Eg 2: Consider

(a) Calculate the least squares estimates of the slope and intercept, and estimate σ².
(b) Predict the oxygen purity ŷ when the hydrocarbon level is x = 1.00%.
(c) Suppose the hydrocarbon level is 0.99%. Calculate the fitted value ŷ and the corresponding residual.
(d) What change in mean purity is expected when the hydrocarbon level changes by 2%?
Sol (Eg 2):
By using Minitab
Eg 3: As machines are used over long periods of time, the output product can drift off target. Below is the average
amount by which the manufactured product is off target as a function of machine use.

a) Assuming that a simple linear regression model is appropriate,

i) fit the regression model relating the amount off target (y) to the hours of machine use (x). [Ans: ŷ = 0.56050 + 0.019179x]
ii) what is the estimate of σ²? [Ans: 0.000614]
b) What is the estimated number of hours of use when the average is 2 mm off target? [Ans: 75.056 hours]
c) Suppose the hours used is 39. Calculate the fitted value ŷ and the corresponding residual. [Ans: residual = −0.008481]
d) What change in mm off target is expected when the duration of machine use changes by 4 hours?
[Ans: 0.0767 mm]
Sol (Eg 3):
By using Minitab
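
The stated answers to parts b) to d) can be cross-checked from the fitted line given in part a). The sketch below uses only the coefficients quoted in the answers; the function names are made up for this illustration, and the observed value at 39 hours (1.30 mm) is an assumption inferred from the stated residual, since the data table itself is not reproduced here.

```python
# Coefficients quoted in the answer to part a) i).
B0 = 0.56050    # intercept (mm)
B1 = 0.019179   # slope (mm per hour of machine use)

def predict(hours):
    """Fitted amount off target (mm) after the given hours of use."""
    return B0 + B1 * hours

def hours_for(target_mm):
    """Invert the fitted line: hours at which the fit equals target_mm."""
    return (target_mm - B0) / B1

# b) hours at which the fitted value reaches 2 mm off target
hours_2mm = hours_for(2.0)

# c) fitted value at 39 hours; the observed value is inferred, not given
fitted_39 = predict(39)
y_obs_39 = 1.30  # assumption: implied by the stated residual
residual_39 = y_obs_39 - fitted_39

# d) expected change over 4 additional hours is slope times 4
change_4h = B1 * 4

print(f"b) {hours_2mm:.3f} hours")
print(f"c) fitted = {fitted_39:.6f}, residual = {residual_39:.6f}")
print(f"d) {change_4h:.4f} mm")
```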
Coefficient of Determination
The quantity

$$
r^2 = 1 - \frac{SS_E}{SS_T} = \frac{\hat{\beta}_1 S_{xy}}{SS_T}
$$

is called the coefficient of determination and is often used to judge the adequacy of a regression model.
Here $SS_T = \sum_{i=1}^{n} (y_i - \bar{y})^2$ is the total sum of squares. r² is the square of the correlation
coefficient between X and Y, and we often refer loosely to r² as the proportion of variability in the data
explained or accounted for by the regression model.
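
The two expressions for r², and its link to the correlation coefficient, can be checked numerically. This is a minimal sketch on hypothetical data (not the oxygen purity data of Eg 2); the function name `r_squared_both_ways` is an assumption made for this illustration.

```python
import math

def r_squared_both_ways(x, y):
    """Return (1 - SSE/SST, b1*Sxy/SST, r) for a simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)   # total sum of squares
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    r = s_xy / math.sqrt(s_xx * sst)           # Pearson correlation
    return 1 - sse / sst, b1 * s_xy / sst, r

# Hypothetical data, chosen only to illustrate the calculation.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r2_sse, r2_slope, r = r_squared_both_ways(x, y)
print(f"1 - SSE/SST  = {r2_sse:.6f}")
print(f"b1*Sxy/SST   = {r2_slope:.6f}")
print(f"r squared    = {r * r:.6f}")
```

All three printed values agree up to floating-point rounding, which is the identity the formula above expresses.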

Eg 4: For the oxygen purity regression model in Eg 2, calculate r².


Thank you and happy reading☺
