8.-Linear-Regression
REGRESSION ANALYSIS
The concept of regression analysis
deals with finding the best
relationship between Y and x,
quantifying the strength of that
relationship, and using methods that
allow for prediction of the response
given values of the regressor x.
Thus, we wish to estimate the value of a
variable Y (the dependent variable) corresponding
to a given value of a variable X (the independent
variable, or regressor). This can be
accomplished by estimating the value of Y from
a least-squares curve that fits the sample data.
The resulting curve is called a regression curve
of Y on X, since Y is estimated from X.
For a simple linear regression, where there is
only one dependent variable and one independent
variable, we have
𝑌 = 𝑎 + 𝑏𝑥
where
a is the y-intercept of the line, and
b is the slope of the regression line.
For a quick review,
• The slope represents the amount by which the
dependent variable increases or decreases
for each one-unit increase in the
independent variable.
• The intercept indicates the value of the
dependent variable when the independent
variable takes the value zero.
The preceding equation is also called the
least-squares regression equation. The line it
defines is the best-fitting line, in the
least-squares sense, for summarizing the
relationship between two variables measured
on an interval or ratio scale.
To find the y-intercept and the slope, we
have
$$a = \bar{Y} - b\bar{x} \quad \text{and} \quad b = \frac{N \sum xY - (\sum x)(\sum Y)}{N \sum x^2 - (\sum x)^2}$$
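Here b is the slope, not the correlation itself.
It is related to Pearson's product-moment
correlation coefficient r between Y and x by
b = r (s_Y / s_x), where s_Y and s_x are the
standard deviations of Y and x.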
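The two formulas above translate directly into a few lines of code. The following is a minimal sketch in Python; the function name `least_squares_line` and the five-point data set are made up for illustration and are not part of the lesson's data.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the line Y = a + b*x using the sum formulas."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    # Slope: b = (N*Sxy - Sx*Sy) / (N*Sx2 - Sx^2)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept: a = mean(Y) - b * mean(x)
    a = sum_y / n - b * sum_x / n
    return a, b


# Hypothetical toy data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = least_squares_line(xs, ys)
print(f"Y = {a:.1f} + {b:.1f} x")  # Y = 2.2 + 0.6 x
```

The denominator is zero when every x value is identical, so a real implementation would need to guard against that case.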
We can then write an equation
that predicts the value of Y:
𝑌 = 𝑎 + 𝑏𝑥
In linear regression, the smaller the spread
of the observations around the best-fitting
regression line, the more accurate our
predictions of Y from values of x will be.
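One common way to put a number on this spread is the standard error of estimate, the square root of the sum of squared residuals divided by N - 2. The lesson does not name this measure, so treat the sketch below as one possible choice. The data and the coefficients a = 2.2, b = 0.6 are hypothetical toy values (the coefficients are the least-squares fit to that toy data).

```python
import math


def standard_error_of_estimate(xs, ys, a, b):
    """Spread of the observations around the line Y = a + b*x."""
    # Residual = observed Y minus the Y predicted by the line
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    sse = sum(e ** 2 for e in residuals)
    # N - 2 because two quantities (a and b) were estimated from the data
    return math.sqrt(sse / (len(xs) - 2))


# Hypothetical toy data and its least-squares coefficients
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
s = standard_error_of_estimate(xs, ys, a=2.2, b=0.6)
print(round(s, 3))  # 0.894
```

A smaller value means the points sit closer to the line, so predictions made from the line are typically closer to the observed values.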
Example:
The amount spent on medical expenses per
year (Medcost) is correlated with alcohol
consumption for 15 adult males. Alcohol
consumption was recorded as the money
spent on alcohol per week (Table 1).
(a) Find the linear regression equation for
Medcost. (b) Estimate the value of Medcost
if the money spent on alcohol per week is 35.
(Standard deviations: Medcost = 544.40;
Alcohol = 9.54)
Solution:
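$$b = \frac{N \sum xY - (\sum x)(\sum Y)}{N \sum x^2 - (\sum x)^2} = \frac{(15)(778510) - (295)(36927)}{(15)(7075) - 295^2} = 41.057$$
$$a = \bar{Y} - b\bar{x} = \frac{\sum Y}{N} - b\,\frac{\sum x}{N} = \frac{36927}{15} - 41.057\left(\frac{295}{15}\right) = 1654.346$$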
a. So, the simple linear regression equation for the medcost is
𝑌 = 𝑎 + 𝑏𝑥 = 1654.346 + 41.057 𝑥
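The arithmetic can be checked from the summary sums used in the solution. The sketch below, in Python, rounds b to three decimals before computing a, as the worked solution does. The last step is an assumption: the problem lists the two standard deviations but the solution shown does not use them, and one plausible use is recovering Pearson's r from the relation r = b (s_x / s_Y).

```python
# Summary sums taken from the worked solution
N = 15
sum_x = 295        # total weekly alcohol spending
sum_y = 36927      # total yearly Medcost
sum_xy = 778510
sum_x2 = 7075

# Slope from the sum formula, rounded as in the solution
b = round((N * sum_xy - sum_x * sum_y) / (N * sum_x2 - sum_x ** 2), 3)
# Intercept computed with the rounded slope
a = round(sum_y / N - b * sum_x / N, 3)
print(b, a)  # 41.057 1654.346

# Assumption: use the stated standard deviations to recover Pearson's r
s_x, s_y = 9.54, 544.40
r = b * s_x / s_y
print(round(r, 2))  # 0.72
```

Carrying the unrounded slope through instead changes the intercept only in the third decimal place.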
b. If the money spent on alcohol per week is 35, we can predict that the
medical cost would be