Lecture 4 Linear Regression
Empirical Models Example
Example 1
$y = \beta_0 + \beta_1 x + \varepsilon$
where:
$\beta_0$ and $\beta_1$ are called parameters of the actual model,
$\varepsilon$ is a random variable called the error term.
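A minimal sketch of what the error term does, assuming (for illustration only) that the error is normally distributed with mean 0. The parameter values, the standard deviation and the helper name simulate_y are hypothetical, not taken from the lecture. Averaging many simulated observations at a fixed x should land close to the deterministic part of the model.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical parameter values, chosen only for illustration.
beta0, beta1 = 10.0, 5.0
sigma = 2.0  # assumed standard deviation of the error term


def simulate_y(x):
    """Draw one observation y = beta0 + beta1 * x + epsilon."""
    epsilon = random.gauss(0.0, sigma)  # assumed: normal error with mean 0
    return beta0 + beta1 * x + epsilon


x = 3.0
draws = [simulate_y(x) for _ in range(20000)]
mean_y = sum(draws) / len(draws)

# Deterministic part of the model at this x (the error averages out to 0).
expected_y = beta0 + beta1 * x

print("average simulated y:", mean_y)
print("beta0 + beta1 * x:  ", expected_y)
```

Individual draws scatter around the line, while their average stays near it.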
Simple Linear Regression Equation
Positive Linear Relationship
[Figure: regression line of E(y) against x with intercept b0; slope b1 is positive]
Simple Linear Regression Equation
Negative Linear Relationship
[Figure: regression line of E(y) against x with intercept b0; slope b1 is negative]
Simple Linear Regression Equation
No Relationship
[Figure: horizontal regression line of E(y) against x with intercept b0; slope b1 is 0]
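Estimated Simple Linear Regression Equation
$\hat{y} = b_0 + b_1 x$
Least Squares Method
Slope for the Estimated Regression Equation
$b_1 = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$
y-Intercept for the Estimated Regression Equation
$b_0 = \bar{y} - b_1 \bar{x}$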
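The two least-squares formulas can be sketched in plain Python as follows. The function name least_squares_fit and the four sample points are assumptions made for illustration; the points lie exactly on y = 1 + 2x, so the fit should recover that line.

```python
def least_squares_fit(x, y):
    """Return (b0, b1) for the line y_hat = b0 + b1 * x fitted by least squares."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")

    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Numerator and denominator of the slope formula.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("all x values are equal; the slope is undefined")

    b1 = s_xy / s_xx           # slope
    b0 = y_bar - b1 * x_bar    # y-intercept
    return b0, b1


# Hypothetical points lying exactly on y = 1 + 2x.
b0, b1 = least_squares_fit([0, 1, 2, 3], [1, 3, 5, 7])
print(b0, b1)  # → 1.0 2.0
```

The guard against equal x values is there because the denominator of the slope formula would otherwise be zero.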
Simple Linear Regression
$b_1 = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \dfrac{20}{4} = 5$
y-Intercept for the Estimated Regression Equation
$b_0 = \bar{y} - b_1 \bar{x} = 20 - 5(2) = 10$
[Figure: scatter plot of Cars Sold (vertical axis, 0 to 25) against TV Ads (horizontal axis, 0 to 4) with the fitted line y = 5x + 10]
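The arithmetic of this example can be checked with a short script. The five (TV Ads, Cars Sold) pairs below are an assumption: the slides do not list the raw data, so these illustrative values were chosen to have mean 2 and mean 20 and to reproduce the sums 20 and 4 used above.

```python
# Illustrative data (an assumption, not listed in the slides).
tv_ads = [1, 3, 2, 1, 3]
cars_sold = [14, 24, 18, 17, 27]

n = len(tv_ads)
x_bar = sum(tv_ads) / n       # 2.0
y_bar = sum(cars_sold) / n    # 20.0

# Sums that appear in the slope formula.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(tv_ads, cars_sold))
s_xx = sum((x - x_bar) ** 2 for x in tv_ads)

b1 = s_xy / s_xx              # 20 / 4 = 5
b0 = y_bar - b1 * x_bar       # 20 - 5 * 2 = 10

print("s_xy =", s_xy, " s_xx =", s_xx)   # → s_xy = 20.0  s_xx = 4.0
print("b1 =", b1, " b0 =", b0)           # → b1 = 5.0  b0 = 10.0

# Predicted cars sold for 3 TV ads, using the fitted line.
print("prediction for 3 ads:", b0 + b1 * 3)  # → 25.0
```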
Second Example
The model is a statsmodels model fitted on the complete data set,
which is treated as training data.
No split has been made to hold out test data.
• As you can see, a column of ones is added to t. This column of ones corresponds to x_0 in the
simple linear regression equation: y_hat = b_0 * x_0 + b_1 * x_1. If you don't call
sm.add_constant, statsmodels fits the model with b_0 = 0 instead of estimating b_0 from your
data. scikit-learn's LinearRegression, by contrast, fits an intercept by default
(fit_intercept=True), so it does not need the added column of ones.
• OLS() from the statsmodels.api module is used to perform ordinary least squares (OLS) regression. It returns an
OLS model object. The fit() method is then called on this object to fit the regression line to the data.
• The summary() method returns a table that gives an extensive description of the
regression results.
Third Example: for prediction purposes
Scikit-learn follows the machine
learning tradition, where the main supported task is
choosing the "best" model for prediction.
• The data in the file hardness.xls provide measurements on the hardness and
tensile strength for 35 specimens of die-cast aluminum.
• It is believed that hardness (measured in Rockwell E units) can be used to
predict tensile strength (measured in thousands of pounds per square inch).
a. Construct a scatter plot.
b. Assuming a linear relationship, use the least-squares method to find the
regression coefficients b0 and b1.
c. Interpret the meaning of the slope, b1, in this problem.
d. Predict the mean tensile strength for die-cast aluminum that has a
hardness of 30 Rockwell E units.
Tensile strength Hardness
53 29.31
70.2 34.86
84.3 36.82
55.3 30.12
78.5 34.02
63.5 30.82
71.4 35.4
53.4 31.26
82.5 32.18
67.3 33.42
69.5 37.69
73 34.88
55.7 24.66
85.8 34.76
95.4 38.02
51.1 25.68
74.4 25.81
54.1 26.46
77.8 28.67
52.4 24.64
69.1 25.77
53.5 23.69
64.3 28.65
82.7 32.38
55.7 23.21
70.5 34
87.5 34.47
50.7 29.25
72.3 28.71
59.5 29.83
71.3 29.25
52.7 27.99
76.5 31.85
63.7 27.65
69.2 31.7
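Parts b and d can be sketched directly from the table. This is a plain-Python version of the least-squares formulas, not the scikit-learn code used in the lecture; scikit-learn's LinearRegression solves the same least-squares problem, so it should give matching coefficients. Hardness is taken as x and tensile strength as y, as the problem statement describes. The variable names are illustrative.

```python
# (tensile strength, hardness) pairs copied from the table above.
data = [
    (53.0, 29.31), (70.2, 34.86), (84.3, 36.82), (55.3, 30.12), (78.5, 34.02),
    (63.5, 30.82), (71.4, 35.40), (53.4, 31.26), (82.5, 32.18), (67.3, 33.42),
    (69.5, 37.69), (73.0, 34.88), (55.7, 24.66), (85.8, 34.76), (95.4, 38.02),
    (51.1, 25.68), (74.4, 25.81), (54.1, 26.46), (77.8, 28.67), (52.4, 24.64),
    (69.1, 25.77), (53.5, 23.69), (64.3, 28.65), (82.7, 32.38), (55.7, 23.21),
    (70.5, 34.00), (87.5, 34.47), (50.7, 29.25), (72.3, 28.71), (59.5, 29.83),
    (71.3, 29.25), (52.7, 27.99), (76.5, 31.85), (63.7, 27.65), (69.2, 31.70),
]

y = [tensile for tensile, hardness in data]   # response
x = [hardness for tensile, hardness in data]  # predictor
n = len(data)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares slope and intercept.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Part d: predicted mean tensile strength at a hardness of 30.
pred_30 = b0 + b1 * 30

print("n =", n)
print("b0 =", round(b0, 4), " b1 =", round(b1, 4))
print("predicted tensile strength at hardness 30:", round(pred_30, 2))
```

For part c, b1 is the estimated change in mean tensile strength for each one-unit increase in hardness.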
(The coefficient of determination is to be explained on the whiteboard.)
Coefficient of Determination
Sample Correlation Coefficient
$r_{xy} = (\text{sign of } b_1)\sqrt{\text{Coefficient of Determination}}$
$r_{xy} = (\text{sign of } b_1)\sqrt{r^2}$
where:
$b_1$ = the slope of the estimated regression
equation $\hat{y} = b_0 + b_1 x$
Sample Correlation Coefficient for second example
$r_{xy} = (\text{sign of } b_1)\sqrt{r^2}$
$r_{xy} = +.9366$
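A minimal sketch of this calculation, under two assumptions. First, the coefficient of determination is computed with the standard definition r2 = 1 - SSE/SST, which the slides leave to the whiteboard. Second, the five data pairs are illustrative values not listed in the slides, chosen to be consistent with a fitted line of slope 5 and intercept 10.

```python
import math

# Illustrative data (an assumption, not listed in the slides).
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares fit.
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
b0 = y_bar - b1 * x_bar

# Sum of squares due to error and total sum of squares.
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
sst = sum((yi - y_bar) ** 2 for yi in y)

r2 = 1 - sse / sst                        # coefficient of determination
r_xy = math.copysign(math.sqrt(r2), b1)   # (sign of b1) * sqrt(r2)

print("r2   =", round(r2, 4))    # → 0.8772
print("r_xy =", round(r_xy, 4))  # → 0.9366
```

Taking the sign from b1 matters because the square root alone is never negative, while a downward-sloping line has a negative correlation.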
Source:
1. NPTEL/SWAYAM
2. https://www.geeksforgeeks.org/ml-linear-regression/
3. https://www.javatpoint.com/linear-regression-in-machine-learning
Thanks