
Introduction to Machine learning

(3-0-0-3)

Dr. D. Harimurugan
Department of Electrical Engineering
Dr B R Ambedkar National Institute of Technology
Jalandhar

February 3, 2025
Traditional CS vs Machine Learning

Machine Learning

Machine learning is the science of getting computers to learn
without being explicitly programmed.
Algorithms for inferring unknowns from knowns
Algorithms that improve on some tasks with experience
Some examples are:
Email spam filtering
Instagram photo suggestions
Netflix recommendations
Climate modelling
Handwriting recognition


AI vs Machine Learning vs Deep learning

History of AI/ML


History of AI

1946: ENIAC (Electronic Numerical Integrator and Computer)
Used for artillery firing tables

1952: Samuel's checker player
Shannon's minimax algorithm

1957: Perceptron - Frank Rosenblatt (Cornell Aeronautical Laboratory)
Multilayer perceptron - Artificial neural network - Deep learning

1969: Minsky and Papert "killed" AI
Funding for AI research collapsed for decades

AI Winter 1: 1974-1980
Failure of machine translation
Negative results in neural nets
Poor speech understanding
AI Winter 2: 1987-1993
Expert systems hardware cost
Failed to live up to their promises

Rebirth as Machine learning

Mostly a name game to get funding
More practical, smaller goals
Based on statistics and optimisation (not logic)
ML: Bottom up, AI: Top down

TD-Gammon - 1994

Gerry Tesauro (IBM) teaches a neural network to play backgammon
The net plays 100K+ games against itself and beats the world champion
The algorithm teaches itself how to play so well!

Deep Blue - 1997

IBM Deep Blue ends human supremacy in chess
Won against grandmaster Garry Kasparov
In the earlier 1996 match, Deep Blue had lost 4-2:
its 100,000,000 moves a second still weren't enough to
beat the human ability to strategise.

Rover Movement - 2000

NASA used ML extensively in space missions

Self driving cars - DARPA Grand Challenge - 2005

The Stanford team won the challenge

AI in recent times


Evolution of the Data Science industry

Twenty-five years ago, statisticians were
responsible for gathering and cleaning data sets and
applying statistical methods.
A few years later (20 years ago), with the growth of data and
the radical improvement of technology, this statistician became a
data mining specialist,
responsible for extracting patterns from data.
A few years after that (10 years ago), with new mathematical
models, the position involved predictive analysis,
responsible for more accurate forecasts.
The statistician's job remains the same: making
some sense out of data. The person who 25 years ago was called a
statistician is, after learning the new techniques, called a
Data Scientist now!!!
Analysis Vs Analytics

There is a distinction between the two!!

Analysis: In analysis, we study the data in parts and
examine it to understand why it has happened or how it has
happened!!
It is mostly concerned with past values!!
Analytics: It talks about future values! Instead of talking
about the past, it explores future values!
We are looking for patterns and exploring what we can do
with them in the future.
Types:
Qualitative analytics: Intuition + analysis
Quantitative analytics: Algorithms + formulas
Analytics and analysis are different, and so are data
analysis and data analytics!!
ML for Electrical Engineer??

Very few people can design new ML algorithms.
It is hard to design!
But many people use them!


STEPS INVOLVED IN MACHINE LEARNING

Collection of data
Preprocessing of data and label the samples
Selection of model/function/procedure
Determination of parameters
Evaluation of the model


TYPES OF MACHINE LEARNING

Supervised algorithm: Given the labeled training
examples, find the correct prediction for unlabeled
examples. Eg: Spam or ham
Unsupervised algorithm: Given data, try to discover
similar patterns, structure and sub-spaces. Eg:
automatically cluster news articles by topic
Reinforcement learning: Try to learn through feedback.
Eg: Drones learn to fly, play chess


Supervised learning: Linear regression

Labeled output in dataset
Uses a linear relationship between the independent variables
(features) and the dependent variable
Prediction of continuous values

Supervised learning: Logistic regression

Labeled output in dataset
Predicting a class or category
Discrete value classification

Unsupervised learning

No labeled output in dataset
Clustering: K-means, Hierarchical clustering, DBSCAN
Dimensionality reduction: PCA and SSA
Density estimation: Kernel density estimation
Major Classifications

Major classifications of ML
Supervised learning (needs labeled output)
Linear regression (uses a linear relationship for continuous
variable prediction)
Predicting the CGPA of the last sem with known previous sem CG
(0-10, continuous value)
Predicting the score of a T20 cricket match (1-250)
Logistic regression (predict a class or category, discrete
variable)
Email spam or ham
Cat or dog
Unsupervised learning (no labeled output)
Clustering
Market segmentation (offer discount to a particular segment)
Social network analysis (to target voters)

Other variations

Semi-supervised learning
Active learning
Decision trees
Density estimation: Kernel density estimation
Reinforcement learning


Nomenclature

Consider the dataset shown below:

CGPA upto 4th sem    Prediction of 5th sem CG
7.23                 7.15
8.10                 8.3
7.8                  7.7
.                    .
.                    .

m ⇒ Number of training examples
x ⇒ Input features
y ⇒ Output variable
(x, y) ⇒ one training example
(x^(i), y^(i)) ⇒ i-th training example
x^(1) = 7.23, y^(1) = 7.15, x^(2) = 8.10, y^(2) = 8.3

Examples of output

Binary classification ⇒ y = {0, 1}
Multiclass classification ⇒ y = {0, 1, ..., K} (K classes)
Regression ⇒ y ∈ R, R ⇒ real numbers
Dataset D = (x^(1), y^(1)), (x^(2), y^(2)), ....., (x^(m), y^(m))
It is important to train and test the model with the same
distribution.
Every 'x' has the same dimensionality


Cost function

Hypothesis 'h' is given by

    y = θ0 + θ1·x

The value of the parameters (θ0, θ1) is obtained using the cost function.
The cost function / loss function is given by

    J(θ0, θ1) = (1/2m) Σ_{i=1}^{m} (h(x^(i)) − y^(i))²

This function is called the "squared error loss function".
The squared loss function will amplify one large wrong prediction,
so it gets affected by outliers.
To reduce such errors we use the absolute loss function,
which won't amplify such errors to a large extent.
It is a trade-off one has to decide based on the problem and the data.
The absolute cost function / loss function is given by

    J(θ0, θ1) = (1/2m) Σ_{i=1}^{m} |h(x^(i)) − y^(i)|

The zero-one loss function measures the total error in the model:

    J(θ0, θ1) = (1/2m) Σ_{i=1}^{m} δ[h(x^(i)) − y^(i)]

    if h(x^(i)) ≠ y^(i) ⇒ delta function (δ) = 1
    if h(x^(i)) = y^(i) ⇒ delta function (δ) = 0
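As a quick illustration of how these loss functions behave, here is a minimal Python/NumPy sketch (the target and prediction values are made up, not from the slides); note how the squared loss is dominated by the single large error:

    # Compare squared, absolute and zero-one losses on illustrative data.
    import numpy as np

    y_true = np.array([1.0, 2.0, 3.0, 4.0])
    y_pred = np.array([1.1, 1.9, 3.0, 8.0])   # last prediction is a large "outlier" error
    m = len(y_true)

    squared  = np.sum((y_pred - y_true) ** 2) / (2 * m)
    absolute = np.sum(np.abs(y_pred - y_true)) / (2 * m)
    zero_one = np.sum(y_pred != y_true) / (2 * m)   # delta = 1 whenever prediction != target

    print(squared, absolute, zero_one)
    # the squared loss is dominated by the one large error; the absolute loss much less so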


Cost function and Hypothesis

With θ0 = 0, let's calculate the loss for different h(θ):

    θ1 = 0    ⇒ h(x) = 0·x    → J = (1/(2·3))·[1² + 2² + 3²] = 2.33
    θ1 = 0.5  ⇒ h(x) = 0.5·x  → J = (1/(2·3))·[0.5² + 1² + 1.5²] = 0.58
    θ1 = 1    ⇒ h(x) = x      → J = 0
    θ1 = -0.5 ⇒ h(x) = -0.5·x → J = 5.25
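The loss values above can be reproduced with a few lines of Python. The sketch below assumes the three data points implied by the plot are (1, 1), (2, 2) and (3, 3):

    # Sweep theta1 with theta0 = 0 and evaluate the squared error cost.
    import numpy as np

    x = np.array([1.0, 2.0, 3.0])
    y = np.array([1.0, 2.0, 3.0])
    m = len(x)

    for theta1 in [0.0, 0.5, 1.0, -0.5]:
        J = np.sum((theta1 * x - y) ** 2) / (2 * m)
        print(f"theta1 = {theta1:5.2f}  ->  J = {J:.2f}")
    # prints J = 2.33, 0.58, 0.00, 5.25 as in the table above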
Cost function with two variables

    θ0 = 800, θ1 = −1.5
    θ0 = 360, θ1 = 0

For obtaining the optimal value of the parameters, we use the
"Gradient Descent algorithm".


Gradient Descent Algorithm

Step 1: Initialise the parameters (θ0 = 0, θ1 = 0)
Step 2: Update the parameters till the minimum value of J is obtained:

    θj := θj − α · ∂J(θ0, θ1)/∂θj

α is the learning rate.
It is always positive and it depicts the step size.
It is in the range of 0 to 1.


Gradient Descent Update

For updating θ0, θ1:

Correct (simultaneous) update:
    temp0 = θ0 − α · ∂J(θ0, θ1)/∂θ0
    temp1 = θ1 − α · ∂J(θ0, θ1)/∂θ1
    θ0 = temp0
    θ1 = temp1

Incorrect update:
    temp0 = θ0 − α · ∂J(θ0, θ1)/∂θ0
    θ0 = temp0
    temp1 = θ1 − α · ∂J(θ0, θ1)/∂θ1
    θ1 = temp1


Gradient descent in Linear Regression

Gradient Descent Algorithm:
Repeat until convergence,

    θj := θj − α · ∂J(θ0, θ1)/∂θj

Linear Regression model:

    h_θ(x) = θ0 + θ1·x

    J(θ0, θ1) = (1/2m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))²


Differentiating J:

    ∂J(θ0, θ1)/∂θj = ∂/∂θj [ (1/2m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))² ]
                   = ∂/∂θj [ (1/2m) Σ_{i=1}^{m} (θ0 + θ1·x^(i) − y^(i))² ]

For θ0 (j = 0):

    ∂J(θ0, θ1)/∂θ0 = (2/2m) Σ_{i=1}^{m} (θ0 + θ1·x^(i) − y^(i))·(1)
                   = (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))·(1)


For θ1 (j = 1):

    ∂J(θ0, θ1)/∂θ1 = (2/2m) Σ_{i=1}^{m} (θ0 + θ1·x^(i) − y^(i))·x^(i)
                   = (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))·x^(i)

Gradient descent for the squared error cost function:
Repeat until convergence
{
    θ0 := θ0 − α·(1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))
    θ1 := θ1 − α·(1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))·x^(i)
}
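A minimal Python/NumPy sketch of this update rule is shown below; the data, the learning rate α = 0.05 and the fixed iteration count are illustrative choices (in practice one would monitor J for convergence):

    # Gradient descent for h(x) = theta0 + theta1*x with the squared error cost.
    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0])
    y = np.array([2.1, 3.9, 6.2, 8.1])          # roughly y = 2x
    m = len(x)

    theta0, theta1 = 0.0, 0.0
    alpha = 0.05                                 # learning rate

    for _ in range(2000):                        # "repeat until convergence"
        h = theta0 + theta1 * x                  # current predictions
        grad0 = np.sum(h - y) / m
        grad1 = np.sum((h - y) * x) / m
        # simultaneous update of both parameters
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1

    print(theta0, theta1)                        # close to the least-squares line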
Gradient descent working

Effect of α:
As we move towards the minimum, the step size automatically reduces.
If α is small, it will take time to converge, with a large
computational time.
If α is large, it may diverge.
Evaluation Metrics

Mean absolute error:

    MAE = (1/m) Σ_{i=1}^{m} |h_θ(x^(i)) − y^(i)|

Mean squared error:

    MSE = (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))²

Taking the square results in a change of unit, which does not
precisely represent the error value.
To overcome this, we take the square root of this value, which
is called the "Root Mean Square Error (RMSE)".
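A small Python/NumPy sketch of these metrics, on made-up actual/predicted values:

    # MAE, MSE and RMSE computed directly from their definitions.
    import numpy as np

    y_true = np.array([3.0, 5.0, 7.5, 10.0])
    y_pred = np.array([2.5, 5.5, 7.0, 11.0])
    m = len(y_true)

    mae  = np.sum(np.abs(y_pred - y_true)) / m
    mse  = np.sum((y_pred - y_true) ** 2) / m
    rmse = np.sqrt(mse)

    print(mae, mse, rmse)   # RMSE is back in the units of y, unlike MSE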
Evaluation Metrics

Root Mean Square error:

    RMSE = sqrt( (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))² )

             Model-1                 Model-2
    Actual    Predicted      Actual     Predicted
    1         401            10001      10401

Calculate the RMSE for both models:
RMSE(M1) = 400
RMSE(M2) = 400
To overcome such problems we go for the Root Mean Squared
Log Error (RMSLE):

    RMSLE = sqrt( (1/m) Σ_{i=1}^{m} ( log(y^(i) + 1) − log(h_θ(x^(i)) + 1) )² )

             Model-1                 Model-2
    Actual    Predicted      Actual     Predicted
    1         401            10001      10401

Calculate the RMSLE for both models:
RMSE(M1) = 400    RMSLE(M1) = 2.65
RMSE(M2) = 400    RMSLE(M2) = 0.01
Evaluation Metrics: R² Value

The problem with the previous metrics is that the error value varies
between −∞ and +∞, which does not give an exact
indication of how good a particular model is.
To overcome this problem, we compare the model
performance with some baseline model.

    MSE_model = (1/m) Σ_{i=1}^{m} (y^(i) − h_θ(x^(i)))²

    MSE_baseline = (1/m) Σ_{i=1}^{m} (y^(i) − y_avg)²

    α = MSE_model / MSE_baseline

α is the relative squared error.
α = 1 ⇒ the developed model is only as good as the baseline model
α > 1 ⇒ the error of the developed model is higher
compared to the baseline model
The lower the value of α, the better the model.
In order to make the metric directly proportional to model quality, we use R²:

    R² = 1 − MSE_model / MSE_baseline

The problem: the value will increase with the addition of features.
It never decreases, irrespective of how the new feature is
going to impact the model.
The model is not penalised for any random additional features.
Significance of R²

The R² value varies between 0 and 1.
The higher the R², the better the model.

Evaluation Metrics: Adjusted R² Value

    R'² = 1 − (1 − R²)·(m − 1) / (m − (n + 1))

m ⇒ number of samples
n ⇒ number of features
The value will decrease when a random (unwanted) feature
is added to the model.
For large n, the whole fraction increases, so the
increase in R² because of feature addition is compensated.
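The following Python/NumPy sketch computes R² and adjusted R² exactly as defined above; the arrays and the assumed feature count n are illustrative:

    # R^2 relative to the mean-prediction baseline, plus adjusted R^2.
    import numpy as np

    y_true = np.array([3.0, 5.0, 7.5, 10.0, 12.0])
    y_pred = np.array([2.8, 5.4, 7.1, 10.6, 11.9])
    m, n = len(y_true), 2                        # 5 samples, 2 features (assumed)

    mse_model    = np.mean((y_true - y_pred) ** 2)
    mse_baseline = np.mean((y_true - np.mean(y_true)) ** 2)

    r2     = 1 - mse_model / mse_baseline
    r2_adj = 1 - (1 - r2) * (m - 1) / (m - (n + 1))

    print(r2, r2_adj)                            # adjusted R^2 penalises extra features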


Multiple linear regression

CGPA upto 4th sem   Minor degree   No of classes attended   No of new faculty   CGPA in 5th sem
       x1                x2                  x3                     x4                  y

n ⇒ Number of features (n = 4)
x^(i) ⇒ i-th training example
x_j^(i) ⇒ Value of the j-th feature in the i-th training example

CGPA upto 4th sem   Minor degree   No of classes attended   No of new faculty   CGPA in 5th sem
       7.5               1                  700                     2                  7.8
       8                 1                  800                     2                  7.9
       7.2               0                  600                     2                  7

Hypothesis (L.R.) ⇒ h_θ(x) = θ0 + θ1·x
Multiple L.R. ⇒ h_θ(x) = θ0 + θ1·x1 + θ2·x2 + θ3·x3 + θ4·x4
Sample solution: h_θ(x) = 2.3 + 0.3·x1 + 0.5·x2 + 2·x3 + 3·x4


Multiple linear regression

    h_θ(x) = θ0 + θ1·x1 + θ2·x2 + θ3·x3 + θ4·x4
    h_θ(x) = θ0·x0 + θ1·x1 + θ2·x2 + θ3·x3 + θ4·x4,   with x0 = 1

In general,

    h_θ(x) = θ0·x0 + θ1·x1 + θ2·x2 + θ3·x3 + θ4·x4 + ..... + θn·xn

In terms of matrices, h = X·θ

        [ 7.5  1  700  2 ]
    X = [ 8    1  800  2 ]        θ = [θ0 θ1 θ2 .... θn]^T
        [ 7.2  0  600  2 ]

Adding x0 (find the order of the h matrix):

        [ 1  7.5  1  700  2 ]
    X = [ 1  8    1  800  2 ]
        [ 1  7.2  0  600  2 ]

Order of X ⇒ m x (n+1)
Order of θ ⇒ (n+1) x 1
Order of y ⇒ m x 1
Order of h ⇒ [m x (n+1)] * [(n+1) x 1] ⇒ m x 1
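A minimal NumPy sketch of h = X·θ for the table above (the θ values are made up for illustration):

    # Vectorised hypothesis: one matrix-vector product gives all m predictions.
    import numpy as np

    X = np.array([[1, 7.5, 1, 700, 2],
                  [1, 8.0, 1, 800, 2],
                  [1, 7.2, 0, 600, 2]])            # shape m x (n+1) = 3 x 5, x0 = 1 added
    theta = np.array([2.3, 0.3, 0.5, 0.002, 0.1])  # illustrative (n+1)-vector

    h = X @ theta                                   # length-m vector of predictions
    print(h)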


Gradient descent for Multiple linear regression

Repeat until convergence
{
    θj := θj − α·(1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))·x_j^(i)

    (simultaneously update θj for j = 0, 1, 2, ..., n)
}

    θ0 := θ0 − α·(1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))·x_0^(i)
    θ1 := θ1 − α·(1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))·x_1^(i)
    θ2 := θ2 − α·(1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))·x_2^(i)
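In vectorised form the whole update can be written in two lines. The sketch below is illustrative (toy data, arbitrary α and iteration count); X is assumed to already contain the x0 = 1 column:

    # Vectorised gradient descent for multiple linear regression.
    import numpy as np

    X = np.array([[1, 0.5], [1, 1.0], [1, 1.5], [1, 2.0]])   # m x (n+1)
    y = np.array([1.1, 2.0, 3.1, 3.9])
    m = len(y)

    theta = np.zeros(X.shape[1])
    alpha = 0.1

    for _ in range(5000):
        h = X @ theta                                  # predictions for all m examples
        theta = theta - alpha * (X.T @ (h - y)) / m    # simultaneous update of every theta_j

    print(theta)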
Feature Scaling

One of the most critical steps during the pre-processing of data for
creating a better ML model.

CGPA upto 4th sem   Minor degree   No of classes attended   No of new faculty   CGPA in 5th sem
       7.5               1                  700                     2                  7.8
       8                 1                  800                     2                  7.9
       7.2               0                  600                     2                  7
     [0-10]            [0-1]             [0-1000]                 [0-6]          ⇐ Range

If there is a vast difference in the range, then that
particular feature will play a significant role in the trained
model.


Feature scaling is needed to bring every feature into the
same range.
Convergence will be much faster with feature scaling.

Without scaling: very steep cost contours, more oscillation to find the
global minimum, a much longer time to reach the minimum.
With scaling: rapid movement towards the global minimum,
convergence is much faster.


Feature Scaling: Min-Max scaler

    x_new = (x − x_min) / (x_max − x_min)

Shrinks all data to [0,1] or [-1,1]
Responds well if the standard deviation is small and the
distribution is not Gaussian
The scaler is sensitive to outliers

Before scaling:          After scaling:

    Weight   Price           Weight   Price
    15       1               0.625    0
    12       2               0.25     0.25
    18       3               1        0.5
    10       5               0        1
Feature Scaling: Standard scaler

    x_new = (x − µ) / σ

It assumes the data is normally distributed
It scales each feature such that the distribution is centered around
zero with SD = 1
Not the best scaler if the data is not normally distributed

Before scaling:

    Weight   Price
    15       1
    12       2
    18       3
    10       5
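The sketch below applies both the Min-Max and the Standard scaler to the Weight/Price table above. It assumes the population standard deviation (ddof = 0), which is one common convention; a sample standard deviation would give slightly different numbers:

    # Min-max and standard scaling of the two illustrative columns.
    import numpy as np

    data = np.array([[15.0, 1.0],
                     [12.0, 2.0],
                     [18.0, 3.0],
                     [10.0, 5.0]])      # columns: Weight, Price

    min_max  = (data - data.min(axis=0)) / (data.max(axis=0) - data.min(axis=0))
    standard = (data - data.mean(axis=0)) / data.std(axis=0)

    print(min_max)      # Weight -> 0.625, 0.25, 1.0, 0.0 ; Price -> 0.0, 0.25, 0.5, 1.0
    print(standard)     # both columns now have mean 0 and standard deviation 1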
Feature scaling: Min-Max Scaler vs Standard Scaler


Feature scaling: Robust scaler

Outliers, which have a low probability of occurrence, are
over-represented in standard scaling.
The calculated mean and standard deviation are skewed by the
presence of outliers.
The robust scaler uses the median and the Interquartile Range (IQR)
to scale the data.
Outliers do not affect the median.
The median does not depend on every value in the list:
the last value could have been 1000 or 10000, and it would not
change the median at all.

    x_new = (x − x_median) / (x_75 − x_25)


Feature scaling: Max-Abs scaler

    x_new = x / max(|x|)

Each feature is scaled by its maximum absolute value
The maximum value becomes 1
Affected by outliers


Other scaling techniques

Unit vector scaling:

    x_new = x / ||x||

    L1 norm ⇒ ||x|| = Σ_{i=1}^{m} |x_i|        L2 norm ⇒ ||x|| = ( Σ_{i=1}^{m} |x_i|² )^(1/2)

Quantile Transformer Scaler
Power Transformer Scaler


Feature selection

A strong correlation between two features leads to
multi-collinearity.
Negatively correlated features tend to cancel each other out.
It is worthwhile to include one variable in the model;
it is redundant to include both variables.
To avoid multi-collinearity, the relationship between the features
is checked using a scatterplot, pair plot or correlation score.
Correlation indicates the degree to which, on average, two
variables change correspondingly.


Feature Selection

It is the process of selecting the set of features which results in
enhanced model performance.
It is the process of removing irrelevant and redundant
features from the data set.
Filter Methods: Do not take into account the model being
employed
Wrapper Methods: Involve a specific model in arriving at
the best features
Embedded Methods: Perform feature selection as part of the
model training process


Filter Methods

Filter methods measure the relevance of the features to
the target variable via statistical tests.
Based on certain criteria between each feature and the
response variable, they will filter out features that fall below a
threshold.
They do not take into account the model being employed.
Cheaper, but may not be able to select the right features for
the model.


Filter Methods: Techniques

Correlation coefficient (Pearson & Spearman)
Variance threshold
Dispersion ratio
Information Gain
Fisher's score
Chi-square test


Filter Methods: Correlation coefficient

A strong correlation between two features leads to
multi-collinearity.
Negatively correlated features tend to cancel each other out.
It is worthwhile to include one variable in the model;
it is redundant to include both variables.
To avoid multi-collinearity, the relationship between the
features is checked using a scatterplot, pair plot, or
correlation score.
Correlation indicates the degree to which, on average, two
variables change correspondingly.


Pearson correlation coefficient

    r = [ n·Σ(xy) − Σx·Σy ] / sqrt( [n·Σx² − (Σx)²]·[n·Σy² − (Σy)²] )

r = 1 ⇒ Strong positive relationship
r = -1 ⇒ Strong negative relationship
r = 0 ⇒ No relationship
MATLAB Command: corr(x,y)

A heatmap is used to represent the correlation among the
features pictorially.
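A Python/NumPy sketch of the same computation (np.corrcoef gives the equivalent of the MATLAB corr command); x and y are illustrative:

    # Pearson r from the formula above, checked against np.corrcoef.
    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
    n = len(x)

    r = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / np.sqrt(
        (n * np.sum(x**2) - np.sum(x)**2) * (n * np.sum(y**2) - np.sum(y)**2))

    print(r, np.corrcoef(x, y)[0, 1])   # both close to +1 (strong positive relationship)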


Correlation: Heat Map

The correlation value varies between -1 and 1.


Correlation: Variance Inflation Factor (VIF)

    VIF = 1 / (1 − R²)

A higher VIF indicates that the variable is probably a linear
combination of the other variables.
Drop the variable with the higher VIF value.
The VIFs are the diagonal elements of the inverse of the
correlation matrix.
MATLAB Command: R0=corrcoef(x); V=diag(inv(R0))';
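A Python/NumPy analogue of the MATLAB commands above, on an illustrative feature matrix in which x3 is built as a near-linear combination of x1 and x2:

    # VIF as the diagonal of the inverse of the feature correlation matrix.
    import numpy as np

    rng = np.random.default_rng(0)
    x1 = rng.normal(size=100)
    x2 = rng.normal(size=100)
    x3 = 2 * x1 + x2 + 0.01 * rng.normal(size=100)   # almost collinear feature

    X = np.column_stack([x1, x2, x3])
    R0 = np.corrcoef(X, rowvar=False)    # correlation matrix of the features
    vif = np.diag(np.linalg.inv(R0))

    print(vif)     # the collinear feature shows a much larger VIF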


Correlation: VIF

Remove x1
Build the model with x2, x3


Wrapper methods

Unlike filter methods, which evaluate individual features
based on their relationship with the target variable or other
statistical criteria, wrapper methods use a predictive model
to evaluate subsets of features.
The model's performance is used as the criterion to select the
most relevant features.
Wrapper methods are computationally more expensive but
potentially more effective than filter methods.


Variable selection methods

Backward selection method:
Build the model with all variables
Remove variables with a high VIF value
Remove variables with a high p-value (checking adjusted R²)
Continue the process till all the variables are significant
(p < 0.05)
Forward selection method:
Start with a single variable
Add variables one by one
Check the p-value and adjusted R² (add a variable only when it
increases the adjusted R², or else drop it)


Stepwise selection method:
Build the model with all variables
Drop the least significant variable
Reconsider previously dropped variables for
reinsertion (variables inserted should have a lower p-value
than the recently dropped one)


Why remove features?

To make the model simpler
To interpret the model easily
People will be interested in knowing the significant variables
which contribute to the output


Dummy variables

To transform categorical data into numerical data:
Assign a unique integer to each category of data.
One-hot encoding is used, which creates new attributes
according to the number of classes present in the
categorical attribute.
If there are 'n' categories, 'n-1' new attributes will be created.
These attributes are called dummy variables.
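A minimal sketch of dummy-variable (one-hot) encoding using pandas; the 'City' column and its values are purely illustrative, and drop_first=True keeps n-1 columns for n categories:

    # One-hot encode a categorical column into dummy variables.
    import pandas as pd

    df = pd.DataFrame({"City": ["Delhi", "Mumbai", "Chennai", "Delhi"],
                       "Score": [7.8, 7.9, 7.0, 8.2]})

    encoded = pd.get_dummies(df, columns=["City"], drop_first=True)
    print(encoded)     # 3 categories -> 2 dummy columns (City_Delhi, City_Mumbai)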


Polynomial Regression

    h_θ(x) = θ0 + θ1·x + θ2·x² + θ3·x³ + ... + θn·x^n

    h_θ(x) = θ0 + θ1·x + θ2·x²        h_θ(x) = θ0 + θ1·x + θ2·x² + θ3·x³

"Underfit"        "Overfit"


Regularization

Preventing Overfit

An overfit model won't be able to generalize to new data:
less error on the training dataset and more error on the test dataset.


Reduce the number of features
Regularization:
Keeps all features, but reduces the magnitude of the
parameters θj
Works well when we have a lot of features and the
contribution of each is reduced a bit in predicting 'y'


Preventing Overfit: Regularization

Regularization is the process of deliberately simplifying models
to achieve a compromise between keeping the model simple and
yet not too naive.
Penalize the parameters of the higher order terms.
This makes the hypothesis simpler and less prone to overfitting.

Modified cost function:

    J(θ) = (1/2m) [ Σ_{i=1}^{m} (h(x^(i)) − y^(i))² + λ Σ_{j=1}^{n} θj² ]

λ is the regularization parameter.
The above regularization is called Ridge Regression or
L2 Regularization.
Polynomial regression

A good choice of 'λ' is very important.


Preventing Overfit: Regularization

Lasso Regularization or L1 Regularization (L1 norm):

    J(θ) = (1/2m) [ Σ_{i=1}^{m} (h(x^(i)) − y^(i))² + λ Σ_{j=1}^{n} |θj| ]

L1 regularization penalizes the sum of absolute values of
the weights, whereas L2 regularization penalizes the sum
of squares of the weights.
L1 shrinks some coefficients to zero, whereas L2 shrinks all
coefficients equally.
L1 has inbuilt feature selection (some coefficients tend to
become zero).
L1 regularization is robust to outliers; L2 regularization is
not.
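A small NumPy sketch of the two penalised cost functions; the predictions, targets, θ values and λ are illustrative, and θ0 is excluded from the penalty since the sums above run from j = 1:

    # Ridge (L2) and Lasso (L1) regularized costs for the same predictions.
    import numpy as np

    y      = np.array([2.0, 4.1, 6.0, 7.9])
    h_pred = np.array([2.2, 3.8, 6.3, 8.1])
    theta  = np.array([0.5, 1.9, 0.0, -0.3])   # theta0, theta1, theta2, theta3
    m, lam = len(y), 0.1

    base = np.sum((h_pred - y) ** 2)
    ridge_cost = (base + lam * np.sum(theta[1:] ** 2)) / (2 * m)
    lasso_cost = (base + lam * np.sum(np.abs(theta[1:]))) / (2 * m)

    print(ridge_cost, lasso_cost)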
Preventing Underfit

Decrease regularization
Increase the duration of training
Feature selection


Regularized linear regression

    J(θ) = (1/2m) [ Σ_{i=1}^{m} (h(x^(i)) − y^(i))² + λ Σ_{j=1}^{n} θj² ]

Normal gradient descent:

    θj := θj − α·(1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))·x_j^(i)

Regularized gradient descent:

    θ0 := θ0 − α·(1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))·x_0^(i)

    θj := θj − α·[ (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))·x_j^(i) + (λ/m)·θj ]     (j = 1, ..., n)


    θj := θj − (α/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))·x_j^(i) − (αλ/m)·θj

    θj := (1 − αλ/m)·θj − (α/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))·x_j^(i)

(1 − αλ/m) is slightly less than 1, e.g. 0.99:

    θj := 0.99·θj − (α/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))·x_j^(i)

Apart from this shrinkage of θj, the gradient descent update remains
the same for regularized linear regression.
Tuning of Hyperparameters

Hyperparameters are passed on to the learning algorithm
to control the complexity of the model.
Hyperparameters are the choices that the algorithm
designer makes to tune the behaviour of the learning algorithm.
The choice of hyperparameters has a lot of bearing on the
final model produced by the learning algorithm.


Selection of d: Degree of polynomial

Divide the dataset into train and test datasets.
For each value of d, develop the best-fit model and test it
on the test dataset:

    d = 1  ⇒ h_θ(x) = θ0 + θ1·x
    d = 2  ⇒ h_θ(x) = θ0 + θ1·x + θ2·x²
    .....
    d = 10 ⇒ h_θ(x) = θ0 + θ1·x + θ2·x² + ..... + θ10·x^10

    d = 1  ⇒ θ^(1)  ⇒ J_test(θ^(1))
    d = 2  ⇒ θ^(2)  ⇒ J_test(θ^(2))
    .....
    d = 10 ⇒ θ^(10) ⇒ J_test(θ^(10))

One can choose 'd' such that it gives the least J_test value.
This technique fails to generalize the model:
we are using the test dataset to select the model parameters,
which is not allowed.


Selection of d: Degree of polynomial

To overcome this problem, we divide the given
data set into three categories:

    Training set (60%)         ⇒ (x^(1), y^(1)), ...., (x^(m), y^(m))
    Cross validation set (20%) ⇒ (x_cv^(1), y_cv^(1)), ...., (x_cv^(m_cv), y_cv^(m_cv))
    Test set (20%)             ⇒ (x_test^(1), y_test^(1)), ...., (x_test^(m_test), y_test^(m_test))

Before dividing the dataset into categories, randomize the dataset.
Instead of using the test set to select the model, one can use the
cross validation set to select the model.
Use the test set to find the generalization error.
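A minimal NumPy sketch of such a randomized 60/20/20 split; X, y and the seed are illustrative:

    # Shuffle the indices, then slice them into train / cross-validation / test sets.
    import numpy as np

    rng = np.random.default_rng(42)
    X = np.arange(100).reshape(50, 2).astype(float)   # 50 examples, 2 features
    y = X[:, 0] + 0.5 * X[:, 1]

    idx = rng.permutation(len(y))                     # randomize before splitting
    n_train = int(0.6 * len(y))
    n_cv    = int(0.2 * len(y))

    train_idx = idx[:n_train]
    cv_idx    = idx[n_train:n_train + n_cv]
    test_idx  = idx[n_train + n_cv:]

    X_train, y_train = X[train_idx], y[train_idx]
    X_cv,    y_cv    = X[cv_idx],    y[cv_idx]
    X_test,  y_test  = X[test_idx],  y[test_idx]

    print(len(y_train), len(y_cv), len(y_test))       # 30, 10, 10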


Selection of d: Degree of polynomial

    d = 1  ⇒ θ^(1)  ⇒ J_cv(θ^(1))
    d = 2  ⇒ θ^(2)  ⇒ J_cv(θ^(2))
    .....
    d = 10 ⇒ θ^(10) ⇒ J_cv(θ^(10))

One can choose 'd' such that it gives the least J_cv value.
If d = 4 gives the least value, then the hypothesis is

    h_θ(x) = θ0 + θ1·x + θ2·x² + θ3·x³ + θ4·x⁴


Selection of d: Degree of polynomial

Training error:

    J_train(θ) = (1/2m) Σ_{i=1}^{m} (h(x^(i)) − y^(i))²

Cross validation error:

    J_cv(θ) = (1/(2·m_cv)) Σ_{i=1}^{m_cv} (h(x_cv^(i)) − y_cv^(i))²

Test error:

    J_test(θ) = (1/(2·m_test)) Σ_{i=1}^{m_test} (h(x_test^(i)) − y_test^(i))²


Selection of λ: Regularization parameter

    J(θ) = (1/2m) [ Σ_{i=1}^{m} (h(x^(i)) − y^(i))² + λ Σ_{j=1}^{n} θj² ]

    λ = 0.00 ⇒ θ^(1)  ⇒ J_cv(θ^(1))
    λ = 0.01 ⇒ θ^(2)  ⇒ J_cv(θ^(2))
    .....
    λ = 10   ⇒ θ^(10) ⇒ J_cv(θ^(10))

From the minimum value of the cross validation error, select
the corresponding λ.
After selecting, calculate the generalization error with the test
set.


High Bias

Bias quantifies how accurately the model is likely to behave
on the test data.
Extremely simple models are likely to fail in predicting
complex real world phenomena.
If the learning algorithm is suffering from high bias, getting
more training data will not help much.
High Variance

Variance refers to the degree of change in the model itself
with respect to changes in the training data.
If the learning algorithm is suffering from high variance,
getting more training data is likely to help.


Model complexity Vs Error

In the ideal case, we want to reduce both bias and variance.
As the model complexity increases, bias reduces while
variance increases. Hence the trade-off!


Debugging the model

Get more training examples (fix high variance)
Try a smaller set of features (fix high variance)
Try getting additional features (fix high bias)
Try adding polynomial features (fix high bias)
Try decreasing λ (fix high bias)
Try increasing λ (fix high variance)


END OF LINEAR REGRESSION