Lecture 4 Linear Regression

This document discusses simple linear regression, including the linear regression equation, the least squares method, and worked examples. Simple linear regression finds the linear relationship between a dependent variable (y) and an independent variable (x); the regression line minimizes the squared errors between observed and predicted y-values. Examples demonstrate calculating the regression equation from sample data and using it to predict outcomes.


Linear Regression

Dr. Kavita Pabreja


Associate Professor
Dept. of Computer Applications-MSI
Supervised vs. Unsupervised machine learning
Supervised and unsupervised learning are the two main techniques of machine learning. They are used in different scenarios and with different kinds of datasets.
Supervised vs. Unsupervised machine learning
• Supervised learning is a machine learning method in which models are trained on labeled data. In supervised learning, the model must find the mapping function from the input variable (X) to the output variable (Y).
• Supervised learning needs supervision to train the model, much as a student learns in the presence of a teacher. It can be used for two types of problems: classification and regression.
• Unsupervised learning is another machine learning method, in which patterns are inferred from unlabeled input data. The goal of unsupervised learning is to find the structure and patterns in the input data.
• Unsupervised learning does not need any supervision; instead, the model finds patterns in the data on its own. It can be used for two types of problems: clustering and association.
Types of Supervised machine learning

• Classification: predicts the class of an observation from the independent input variables. A class is a categorical or discrete value, e.g. whether the image of an animal shows a cat or a dog.

• Regression: predicts a continuous output variable from the independent input variables, e.g. predicting house prices from parameters such as house age, distance from the main road, location, and area.
Simple Linear Regression

• Simple Linear Regression Model


• Least Squares Method
• Coefficient of Determination
Empirical Models

• Many problems in engineering and science involve exploring the relationship between two or more variables.
• Regression analysis is a statistical technique that is very useful for these types of problems, where a cause-and-effect relationship is to be quantified.

• Example: predicting an individual's weight from their height.
Empirical Models Example

• As an illustration, consider the data in the table. Here y is the purity of oxygen produced in a chemical distillation process, and x is the percentage of hydrocarbons present in the main condenser of the distillation unit.

Hydrocarbon level (X)   Purity (Y)
0.99                    90.01
1.02                    89.05
1.15                    91.43
1.29                    93.74
1.46                    96.73
1.36                    94.45
0.87                    87.59
1.23                    91.77
1.55                    99.42
1.40                    93.65
Example 1

• The regplot() method of seaborn is used to plot the data and draw a linear regression model fit.
• See the file “linear regression.ipynb”.
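The notebook itself is not reproduced here, but the line that regplot() visualizes can be sketched with NumPy on the oxygen-purity data from the table above (a minimal sketch; the seaborn call is shown as a comment, and variable names are my own):

```python
import numpy as np

# Oxygen purity example: x = hydrocarbon level (%), y = purity (%)
x = np.array([0.99, 1.02, 1.15, 1.29, 1.46, 1.36, 0.87, 1.23, 1.55, 1.40])
y = np.array([90.01, 89.05, 91.43, 93.74, 96.73, 94.45, 87.59, 91.77, 99.42, 93.65])

# Least-squares line y_hat = b0 + b1*x (polyfit returns the highest degree first)
b1, b0 = np.polyfit(x, y, 1)
print(f"y_hat = {b0:.2f} + {b1:.2f} x")

# seaborn would draw the same fitted line on top of a scatter plot:
# import seaborn as sns
# sns.regplot(x=x, y=y)
```

The fitted slope comes out near 15.6 and the intercept near 73.6 for these ten points, which is what the regplot() line in the notebook depicts.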
Simple Linear Regression Model
• The equation that describes how y is related to x and an error term is called the regression model.
• The simple linear regression model is:

y = β0 + β1x + ε

where:
β0 and β1 are called parameters of the actual model,
ε is a random variable called the error term.
Simple Linear Regression Equation

The simple linear regression equation is:

E(y) = β0 + β1x

• The graph of the regression equation is a straight line.
• β0 is the y-intercept of the regression line.
• β1 is the slope of the regression line.
• E(y) is the expected value of y for a given x value.
Simple Linear Regression Equation
Positive Linear Relationship
[Plot of E(y) vs. x: a line with intercept b0 and positive slope b1]

Simple Linear Regression Equation
Negative Linear Relationship
[Plot of E(y) vs. x: a line with intercept b0 and negative slope b1]

Simple Linear Regression Equation
No Relationship
[Plot of E(y) vs. x: a horizontal regression line with intercept b0 and slope b1 = 0]
Estimated Simple Linear Regression Equation

• The estimated simple linear regression equation:

ŷ = b0 + b1x

• The graph is called the estimated regression line.
• b0 is the y-intercept of the line.
• b1 is the slope of the line.
• ŷ is the estimated value of y for a given x value.
Least Squares Method
• Least Squares Criterion

min Σ(yi − ŷi)²

where:
yi = observed value of the dependent variable for the ith observation
ŷi = estimated value of the dependent variable for the ith observation

• We choose b0 and b1 to minimize this sum of squared errors.

• Differentiating with respect to b0 and b1 and setting the derivatives to zero gives two normal equations, whose solution determines the best-fit line.
Least Squares Method

• Slope for the Estimated Regression Equation

b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²

Least Squares Method
• y-Intercept for the Estimated Regression Equation

b0 = ȳ − b1x̄
Simple Linear Regression

Deviation from the estimated regression model

First Example: Simple Linear Regression
Example: Auto Sales
• An auto company periodically holds a special week-long sale.
• As part of the advertising campaign, it runs one or more television commercials during the weekend preceding the sale.
• Data from a sample of 5 previous sales are shown on the next slide.
Simple Linear Regression
Example: Auto Sales

Number of TV Ads   Number of Cars Sold
1                  14
3                  24
2                  18
1                  17
3                  27
Estimated Regression Equation
1. Find the sample means: x̄ = 2 and ȳ = 20.
2. Slope for the Estimated Regression Equation

b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)² = 20/4 = 5

3. y-Intercept for the Estimated Regression Equation

b0 = ȳ − b1x̄ = 20 − 5(2) = 10

4. Estimated Regression Equation

ŷ = 10 + 5x
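The worked example above can be checked with a short from-scratch implementation of the least squares formulas (plain Python, no libraries; variable names are my own):

```python
def least_squares(xs, ys):
    """Return (b0, b1) for the line y_hat = b0 + b1*x fitted by least squares."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # b1 = sum((xi - x_bar)(yi - y_bar)) / sum((xi - x_bar)^2)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar  # intercept from b0 = y_bar - b1 * x_bar
    return b0, b1

# Auto Sales data: TV ads vs. cars sold
ads = [1, 3, 2, 1, 3]
cars = [14, 24, 18, 17, 27]
b0, b1 = least_squares(ads, cars)
print(f"y_hat = {b0:g} + {b1:g} x")  # y_hat = 10 + 5 x
```

With x̄ = 2 and ȳ = 20 this reproduces the slide's values Σ(xi − x̄)(yi − ȳ) = 20, Σ(xi − x̄)² = 4, hence b1 = 5 and b0 = 10.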
Scatter Diagram and Trend Line
[Scatter plot of Cars Sold (0-30) vs. TV Ads (0-4) with the fitted trend line y = 5x + 10]
Second Example
The model is built with statsmodels and is fitted on the complete data, i.e. all of it is treated as training data. We have not done any split to get test data.
• A column of ones is added to the input. This column of ones corresponds to x_0 in the simple linear regression equation y_hat = b_0 * x_0 + b_1 * x_1. If you don't call sm.add_constant, then both statsmodels and sklearn algorithms assume that b_0 = 0 in y_hat = b_0 * x_0 + b_1 * x_1, and they fit the model using b_0 = 0 instead of estimating b_0 from your data.
• The OLS() function of the statsmodels.api module is used to perform OLS regression. It returns an OLS object. The fit() method is then called on this object to fit the regression line to the data.
• The summary() method is used to obtain a table that gives an extensive description of the regression results.
Third Example
For prediction purposes, scikit-learn is used. Scikit-learn follows the machine learning tradition where the main supported task is choosing the "best" model for prediction.
Third Example
• The data in the file hardness.xls provide measurements on the hardness and
tensile strength for 35 specimens of die-cast aluminum.
• It is believed that hardness (measured in Rockwell E units) can be used to
predict tensile strength (measured in thousands of pounds per square inch).
a. Construct a scatter plot.
b. Assuming a linear relationship, use the least-squares method to find the regression coefficients b0 and b1.
c. Interpret the meaning of the slope, b1, in this problem.
d. Predict the mean tensile strength for die-cast aluminum that has a
hardness of 30 Rockwell E units.
Tensile strength (×1000 psi)   Hardness (Rockwell E)
53 29.31
70.2 34.86
84.3 36.82
55.3 30.12
78.5 34.02
63.5 30.82
71.4 35.4
53.4 31.26
82.5 32.18
67.3 33.42
69.5 37.69
73 34.88
55.7 24.66
85.8 34.76
95.4 38.02
51.1 25.68
74.4 25.81
54.1 26.46
77.8 28.67
52.4 24.64
69.1 25.77
53.5 23.69
64.3 28.65
82.7 32.38
55.7 23.21
70.5 34
87.5 34.47
50.7 29.25
72.3 28.71
59.5 29.83
71.3 29.25
52.7 27.99
76.5 31.85
63.7 27.65
69.2 31.7
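Parts (b) and (d) can be sketched with scikit-learn on the table above (a sketch only; the actual notebook reads the data from hardness.xls rather than inlining it):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hardness (Rockwell E) -> tensile strength (thousands of psi), from the table
hardness = np.array([29.31, 34.86, 36.82, 30.12, 34.02, 30.82, 35.40,
                     31.26, 32.18, 33.42, 37.69, 34.88, 24.66, 34.76,
                     38.02, 25.68, 25.81, 26.46, 28.67, 24.64, 25.77,
                     23.69, 28.65, 32.38, 23.21, 34.00, 34.47, 29.25,
                     28.71, 29.83, 29.25, 27.99, 31.85, 27.65, 31.70])
strength = np.array([53.0, 70.2, 84.3, 55.3, 78.5, 63.5, 71.4, 53.4,
                     82.5, 67.3, 69.5, 73.0, 55.7, 85.8, 95.4, 51.1,
                     74.4, 54.1, 77.8, 52.4, 69.1, 53.5, 64.3, 82.7,
                     55.7, 70.5, 87.5, 50.7, 72.3, 59.5, 71.3, 52.7,
                     76.5, 63.7, 69.2])

# (b) fit the least-squares line: strength_hat = b0 + b1 * hardness
model = LinearRegression().fit(hardness.reshape(-1, 1), strength)
b0, b1 = model.intercept_, model.coef_[0]
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")

# (d) predict the mean tensile strength at 30 Rockwell E units
pred = model.predict([[30.0]])[0]
print(f"predicted strength at hardness 30: {pred:.2f}")
```

For part (c): the fitted slope b1 is positive, meaning each additional Rockwell E unit of hardness is associated with an increase in mean tensile strength of b1 thousand psi.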
Coefficient of determination (to be explained on the whiteboard)
Coefficient of Determination

• SST is the total sum of squares
• SSR is the sum of squares due to regression
• SSE is the sum of squares due to error
• These satisfy SST = SSR + SSE.
Coefficient of Determination

• The coefficient of determination is:

r² = SSR/SST

where:
SSR = sum of squares due to regression
SST = total sum of squares
Coefficient of Determination in the second example

r² = SSR/SST = 100/114 = 0.8772

The regression relationship is very strong; about 88% of the variability in the number of cars sold can be explained by the linear relationship between the number of TV ads and the number of cars sold.
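The 100/114 figures can be reproduced from the Auto Sales data and the fitted line ŷ = 10 + 5x in a few lines of plain Python (variable names are my own):

```python
ads = [1, 3, 2, 1, 3]
cars = [14, 24, 18, 17, 27]

y_bar = sum(cars) / len(cars)          # 20
y_hat = [10 + 5 * x for x in ads]      # fitted values from y_hat = 10 + 5x

sst = sum((y - y_bar) ** 2 for y in cars)                # total sum of squares
sse = sum((y - yh) ** 2 for y, yh in zip(cars, y_hat))   # error sum of squares
ssr = sst - sse                        # regression sum of squares (SST = SSR + SSE)

r2 = ssr / sst
print(sst, ssr, round(r2, 4))  # 114.0 100.0 0.8772
```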
Sample Correlation Coefficient

r_xy = (sign of b1) √(coefficient of determination)
r_xy = (sign of b1) √r²

ŷ = b0 + b1x

where:
b1 = the slope of the estimated regression equation
Sample Correlation Coefficient for the second example

r_xy = (sign of b1) √r²

The sign of b1 in the equation ŷ = 10 + 5x is "+".

r_xy = +√0.8772
r_xy = +0.9366
Sources:
1. NPTEL/SWAYAM
2. https://www.geeksforgeeks.org/ml-linear-regression/
3. https://www.javatpoint.com/linear-regression-in-machine-learning
Thanks
