Lecture 4 Linear Regression

This document discusses simple linear regression, including the linear regression equation, the least squares method, and worked examples. Simple linear regression finds the linear relationship between a dependent variable (y) and an independent variable (x); the regression line minimizes the squared errors between observed and predicted y-values. Examples demonstrate calculating the regression equation from sample data and using it to predict outcomes.


Linear Regression

Dr. Kavita Pabreja


Associate Professor
Dept. of Computer Applications-MSI
Supervised vs. Unsupervised machine learning
Supervised and unsupervised learning are the two main techniques of machine learning. They are used in different scenarios and with different kinds of datasets.
Supervised vs. Unsupervised machine learning
• Supervised learning is a machine learning method in which models are trained on labeled data. In supervised learning, the model must find the mapping function from the input variable (X) to the output variable (Y).
• Supervised learning needs supervision to train the model, much as a student learns in the presence of a teacher. It can be used for two types of problems: classification and regression.
• Unsupervised learning is another machine learning method, in which patterns are inferred from unlabeled input data. The goal of unsupervised learning is to find the structure and patterns in the input data.
• Unsupervised learning does not need any supervision; instead, the model finds patterns in the data on its own. It can be used for two types of problems: clustering and association.
Types of Supervised machine learning

• Classification: predicts the class of an observation from the independent input variables. A class is a categorical or discrete value, e.g. whether the image of an animal shows a cat or a dog.

• Regression: predicts a continuous output variable from the independent input variables, e.g. predicting house prices from parameters such as house age, distance from the main road, location, and area.
Simple Linear Regression

• Simple Linear Regression Model


• Least Squares Method
• Coefficient of Determination
Empirical Models

• Many problems in engineering and science involve exploring the relationship between two or more variables.
• Regression analysis is a statistical technique that is very useful for these types of problems, where a cause-and-effect relationship is to be quantified.

• Example: predicting an individual's weight from their height.
Empirical Models Example

• As an illustration, consider the data in the table. Here y is the purity of oxygen produced in a chemical distillation process, and x is the percentage of hydrocarbons present in the main condenser of the distillation unit.

Hydrocarbon level (X)   Purity (Y)
0.99                    90.01
1.02                    89.05
1.15                    91.43
1.29                    93.74
1.46                    96.73
1.36                    94.45
0.87                    87.59
1.23                    91.77
1.55                    99.42
1.40                    93.65
Example 1

• The regplot() method of seaborn is used to plot the data and draw a linear regression model fit.
• See the file “linear regression.ipynb”.
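The notebook itself is not reproduced here, but the line that regplot() visualizes can be sketched with NumPy on the oxygen-purity data from the table above (a minimal sketch; the seaborn call is shown as a comment, and variable names are my own):

```python
import numpy as np

# Oxygen purity example: x = hydrocarbon level (%), y = purity (%)
x = np.array([0.99, 1.02, 1.15, 1.29, 1.46, 1.36, 0.87, 1.23, 1.55, 1.40])
y = np.array([90.01, 89.05, 91.43, 93.74, 96.73, 94.45, 87.59, 91.77, 99.42, 93.65])

# Least-squares line y_hat = b0 + b1*x (polyfit returns the highest degree first)
b1, b0 = np.polyfit(x, y, 1)
print(f"y_hat = {b0:.2f} + {b1:.2f} x")

# seaborn would draw the same fitted line on top of a scatter plot:
# import seaborn as sns
# sns.regplot(x=x, y=y)
```

The fitted slope comes out near 15.6 and the intercept near 73.6 for these ten points, which is what the regplot() line in the notebook depicts.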
Simple Linear Regression Model
• The equation that describes how y is related to x and an error term is called the regression model.
• The simple linear regression model is:

y = β0 + β1x + ε

where:
β0 and β1 are called parameters of the actual model,
ε is a random variable called the error term.
Simple Linear Regression Equation

The simple linear regression equation is:

E(y) = β0 + β1x

• The graph of the regression equation is a straight line.
• β0 is the y-intercept of the regression line.
• β1 is the slope of the regression line.
• E(y) is the expected value of y for a given x value.
Simple Linear Regression Equation
Positive Linear Relationship
[Plot of E(y) vs. x: a line with intercept b0 and positive slope b1]

Simple Linear Regression Equation
Negative Linear Relationship
[Plot of E(y) vs. x: a line with intercept b0 and negative slope b1]

Simple Linear Regression Equation
No Relationship
[Plot of E(y) vs. x: a horizontal regression line with intercept b0 and slope b1 = 0]
Estimated Simple Linear Regression Equation

• The estimated simple linear regression equation:

ŷ = b0 + b1x

• The graph is called the estimated regression line.
• b0 is the y-intercept of the line.
• b1 is the slope of the line.
• ŷ is the estimated value of y for a given x value.
Least Squares Method
• Least Squares Criterion

min Σ(yi − ŷi)²

where:
yi = observed value of the dependent variable for the ith observation
ŷi = estimated value of the dependent variable for the ith observation

• We choose b0 and b1 to minimize this sum of squared errors.

• Differentiating with respect to b0 and b1 and setting the derivatives to zero gives two normal equations, whose solution determines the best-fit line.
Least Squares Method

• Slope for the Estimated Regression Equation

b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²

Least Squares Method
• y-Intercept for the Estimated Regression Equation

b0 = ȳ − b1x̄
Simple Linear Regression

Deviation from the estimated regression model

First Example: Simple Linear Regression
Example: Auto Sales
• An auto company periodically holds a special week-long sale.
• As part of the advertising campaign, it runs one or more television commercials during the weekend preceding the sale.
• Data from a sample of 5 previous sales are shown on the next slide.
Simple Linear Regression
Example: Auto Sales

Number of TV Ads   Number of Cars Sold
1                  14
3                  24
2                  18
1                  17
3                  27
Estimated Regression Equation
1. Find the sample means: x̄ = 2 and ȳ = 20.
2. Slope for the Estimated Regression Equation

b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)² = 20/4 = 5

3. y-Intercept for the Estimated Regression Equation

b0 = ȳ − b1x̄ = 20 − 5(2) = 10

4. Estimated Regression Equation

ŷ = 10 + 5x
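The worked example above can be checked with a short from-scratch implementation of the least squares formulas (plain Python, no libraries; variable names are my own):

```python
def least_squares(xs, ys):
    """Return (b0, b1) for the line y_hat = b0 + b1*x fitted by least squares."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # b1 = sum((xi - x_bar)(yi - y_bar)) / sum((xi - x_bar)^2)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar  # intercept from b0 = y_bar - b1 * x_bar
    return b0, b1

# Auto Sales data: TV ads vs. cars sold
ads = [1, 3, 2, 1, 3]
cars = [14, 24, 18, 17, 27]
b0, b1 = least_squares(ads, cars)
print(f"y_hat = {b0:g} + {b1:g} x")  # y_hat = 10 + 5 x
```

With x̄ = 2 and ȳ = 20 this reproduces the slide's values Σ(xi − x̄)(yi − ȳ) = 20, Σ(xi − x̄)² = 4, hence b1 = 5 and b0 = 10.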
Scatter Diagram and Trend Line
[Scatter plot of Cars Sold (0-30) vs. TV Ads (0-4) with the fitted trend line y = 5x + 10]
Second Example
The model is built with statsmodels and is fitted on the complete data, i.e. all of it is treated as training data. We have not done any split to get test data.
• A column of ones is added to the input. This column of ones corresponds to x_0 in the simple linear regression equation y_hat = b_0 * x_0 + b_1 * x_1. If you don't call sm.add_constant, then both statsmodels and sklearn algorithms assume that b_0 = 0 in y_hat = b_0 * x_0 + b_1 * x_1, and they fit the model using b_0 = 0 instead of estimating b_0 from your data.
• The OLS() function of the statsmodels.api module is used to perform OLS regression. It returns an OLS object. The fit() method is then called on this object to fit the regression line to the data.
• The summary() method is used to obtain a table that gives an extensive description of the regression results.
Third Example
For prediction purposes, scikit-learn is used. Scikit-learn follows the machine learning tradition where the main supported task is choosing the "best" model for prediction.
Third Example
• The data in the file hardness.xls provide measurements on the hardness and
tensile strength for 35 specimens of die-cast aluminum.
• It is believed that hardness (measured in Rockwell E units) can be used to
predict tensile strength (measured in thousands of pounds per square inch).
a. Construct a scatter plot.
b. Assuming a linear relationship, use the least-squares method to find the regression coefficients b0 and b1.
c. Interpret the meaning of the slope, b1, in this problem.
d. Predict the mean tensile strength for die-cast aluminum that has a
hardness of 30 Rockwell E units.
Tensile strength (×1000 psi)   Hardness (Rockwell E)
53 29.31
70.2 34.86
84.3 36.82
55.3 30.12
78.5 34.02
63.5 30.82
71.4 35.4
53.4 31.26
82.5 32.18
67.3 33.42
69.5 37.69
73 34.88
55.7 24.66
85.8 34.76
95.4 38.02
51.1 25.68
74.4 25.81
54.1 26.46
77.8 28.67
52.4 24.64
69.1 25.77
53.5 23.69
64.3 28.65
82.7 32.38
55.7 23.21
70.5 34
87.5 34.47
50.7 29.25
72.3 28.71
59.5 29.83
71.3 29.25
52.7 27.99
76.5 31.85
63.7 27.65
69.2 31.7
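Parts (b) and (d) can be sketched with scikit-learn on the table above (a sketch only; the actual notebook reads the data from hardness.xls rather than inlining it):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hardness (Rockwell E) -> tensile strength (thousands of psi), from the table
hardness = np.array([29.31, 34.86, 36.82, 30.12, 34.02, 30.82, 35.40,
                     31.26, 32.18, 33.42, 37.69, 34.88, 24.66, 34.76,
                     38.02, 25.68, 25.81, 26.46, 28.67, 24.64, 25.77,
                     23.69, 28.65, 32.38, 23.21, 34.00, 34.47, 29.25,
                     28.71, 29.83, 29.25, 27.99, 31.85, 27.65, 31.70])
strength = np.array([53.0, 70.2, 84.3, 55.3, 78.5, 63.5, 71.4, 53.4,
                     82.5, 67.3, 69.5, 73.0, 55.7, 85.8, 95.4, 51.1,
                     74.4, 54.1, 77.8, 52.4, 69.1, 53.5, 64.3, 82.7,
                     55.7, 70.5, 87.5, 50.7, 72.3, 59.5, 71.3, 52.7,
                     76.5, 63.7, 69.2])

# (b) fit the least-squares line: strength_hat = b0 + b1 * hardness
model = LinearRegression().fit(hardness.reshape(-1, 1), strength)
b0, b1 = model.intercept_, model.coef_[0]
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")

# (d) predict the mean tensile strength at 30 Rockwell E units
pred = model.predict([[30.0]])[0]
print(f"predicted strength at hardness 30: {pred:.2f}")
```

For part (c): the fitted slope b1 is positive, meaning each additional Rockwell E unit of hardness is associated with an increase in mean tensile strength of b1 thousand psi.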
Coefficient of determination (to be explained on the whiteboard)
Coefficient of Determination

• SST is the total sum of squares
• SSR is the sum of squares due to regression
• SSE is the sum of squares due to error
• These satisfy SST = SSR + SSE.
Coefficient of Determination

• The coefficient of determination is:

r² = SSR/SST

where:
SSR = sum of squares due to regression
SST = total sum of squares
Coefficient of Determination in the second example

r² = SSR/SST = 100/114 = 0.8772

The regression relationship is very strong; about 88% of the variability in the number of cars sold can be explained by the linear relationship between the number of TV ads and the number of cars sold.
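The 100/114 figures can be reproduced from the Auto Sales data and the fitted line ŷ = 10 + 5x in a few lines of plain Python (variable names are my own):

```python
ads = [1, 3, 2, 1, 3]
cars = [14, 24, 18, 17, 27]

y_bar = sum(cars) / len(cars)          # 20
y_hat = [10 + 5 * x for x in ads]      # fitted values from y_hat = 10 + 5x

sst = sum((y - y_bar) ** 2 for y in cars)                # total sum of squares
sse = sum((y - yh) ** 2 for y, yh in zip(cars, y_hat))   # error sum of squares
ssr = sst - sse                        # regression sum of squares (SST = SSR + SSE)

r2 = ssr / sst
print(sst, ssr, round(r2, 4))  # 114.0 100.0 0.8772
```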
Sample Correlation Coefficient

r_xy = (sign of b1) √(coefficient of determination)
r_xy = (sign of b1) √r²

ŷ = b0 + b1x

where:
b1 = the slope of the estimated regression equation
Sample Correlation Coefficient for the second example

r_xy = (sign of b1) √r²

The sign of b1 in the equation ŷ = 10 + 5x is "+".

r_xy = +√0.8772
r_xy = +0.9366
Sources:
1. NPTEL/SWAYAM
2. https://www.geeksforgeeks.org/ml-linear-regression/
3. https://www.javatpoint.com/linear-regression-in-machine-learning
Thanks
