Linear Regression
(3-0-0-3)
Dr. D. Harimurugan
Department of Electrical Engineering
Dr B R Ambedkar National Institute of Technology
Jalandhar
February 3, 2025
Outline
Machine Learning: Introduction, History of AI/ML, Major classifications in ML, Common Terminologies
Linear regression
Multiple linear regression
Regularization
Machine Learning
History of AI/ML
History of AI (timeline illustrated in figures)
AI Winter 1 (1974-1980):
Failure of machine translation
Negative results in neural nets
Poor speech understanding
AI Winter 2 (1987-1993):
High hardware cost of expert systems
Expert systems failed to live up to their promises
TD-Gammon (1994)
Deep Blue (1997)
AI in recent times
Analysis vs Analytics
Collection of data
Preprocessing of data and labelling the samples
Selection of model/function/procedure
Determination of parameters
Evaluation of the model
Major classifications of ML
Supervised learning (needs labeled output)
Linear regression (uses a linear relationship to predict a continuous variable)
Predicting the CGPA of the last semester from the known CGPAs of previous semesters (0-10, continuous value)
Predicting the score of a T20 cricket match (1-250)
Logistic regression (predicts a class or category, i.e. a discrete variable)
Email: spam or ham
Cat or dog
Unsupervised learning (no labeled output)
Clustering
Market segmentation (offer a discount to a particular segment)
Social network analysis (to target voters)
Other variations
Nomenclature
Consider the dataset shown below:

CGPA up to 4th sem    Prediction of 5th sem CG
7.23                  7.15
8.10                  8.3
7.8                   7.7
...                   ...
m ⇒ number of training examples
x ⇒ input features
y ⇒ output variable
(x, y) ⇒ one training example
(x^(i), y^(i)) ⇒ i-th training example
x^(1) = 7.23, y^(1) = 7.15; x^(2) = 8.10, y^(2) = 8.3
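As a concrete illustration of this notation, a minimal NumPy sketch (the values are the rows of the table above; note that code indexing is 0-based while the notes use 1-based superscripts):

```python
import numpy as np

# x = CGPA up to 4th sem (input feature), y = 5th sem CG (output variable)
x = np.array([7.23, 8.10, 7.8])
y = np.array([7.15, 8.3, 7.7])

m = len(x)                  # m = number of training examples (here 3)

# (x^(i), y^(i)) is the i-th training example
i = 1
print(x[i - 1], y[i - 1])   # x^(1) = 7.23, y^(1) = 7.15
```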
Examples of output

Cost function
Hypothesis: y = θ0 + θ1 x
Example parameter choices illustrated in the plots: θ0 = 800, θ1 = −1.5 and θ0 = 360, θ1 = 0
Gradient Descent
Simultaneous update (both derivatives are evaluated at the current θ0, θ1 before either parameter is overwritten):
$\text{temp}_0 = \theta_0 - \alpha \frac{\partial}{\partial \theta_0} J(\theta_0, \theta_1)$
$\text{temp}_1 = \theta_1 - \alpha \frac{\partial}{\partial \theta_1} J(\theta_0, \theta_1)$
$\theta_0 = \text{temp}_0$
$\theta_1 = \text{temp}_1$

Non-simultaneous update (incorrect, because θ0 is overwritten before the second derivative is computed):
$\text{temp}_0 = \theta_0 - \alpha \frac{\partial}{\partial \theta_0} J(\theta_0, \theta_1)$
$\theta_0 = \text{temp}_0$
$\text{temp}_1 = \theta_1 - \alpha \frac{\partial}{\partial \theta_1} J(\theta_0, \theta_1)$
$\theta_1 = \text{temp}_1$

In general, for j = 0, 1:
$\theta_j = \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$
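A small sketch of the difference between the two update orders; `dJ_dtheta0` and `dJ_dtheta1` are placeholder callables for the two partial derivatives (they are derived explicitly below):

```python
# Correct: simultaneous update -- both derivatives are evaluated at the
# current (theta0, theta1) before either parameter is overwritten.
def gd_step(theta0, theta1, alpha, dJ_dtheta0, dJ_dtheta1):
    temp0 = theta0 - alpha * dJ_dtheta0(theta0, theta1)
    temp1 = theta1 - alpha * dJ_dtheta1(theta0, theta1)
    return temp0, temp1          # theta0 := temp0, theta1 := temp1

# Incorrect: non-simultaneous update -- theta0 is overwritten first, so the
# second derivative is evaluated at a mixed (new theta0, old theta1) point.
def gd_step_wrong(theta0, theta1, alpha, dJ_dtheta0, dJ_dtheta1):
    theta0 = theta0 - alpha * dJ_dtheta0(theta0, theta1)
    theta1 = theta1 - alpha * dJ_dtheta1(theta0, theta1)
    return theta0, theta1
```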
$h_\theta(x) = \theta_0 + \theta_1 x$

$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$

Differentiating J:

$\frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1) = \frac{\partial}{\partial \theta_j} \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$

$\frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1) = \frac{\partial}{\partial \theta_j} \frac{1}{2m} \sum_{i=1}^{m} \left( \theta_0 + \theta_1 x^{(i)} - y^{(i)} \right)^2$

For j = 0:

$\frac{\partial}{\partial \theta_0} J(\theta_0, \theta_1) = \frac{2}{2m} \sum_{i=1}^{m} \left( \theta_0 + \theta_1 x^{(i)} - y^{(i)} \right) \cdot (1) = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)$
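Putting the cost function and its derivative together, a minimal batch-gradient-descent sketch in NumPy. The data reuse the CGPA example; the learning rate, iteration count, and the j = 1 gradient (which follows from the same differentiation with an extra x^(i) factor) are the only additions beyond the slides:

```python
import numpy as np

x = np.array([7.23, 8.10, 7.8])
y = np.array([7.15, 8.3, 7.7])
m = len(x)

def cost(theta0, theta1):
    # J(theta0, theta1) = (1 / 2m) * sum_i (h(x^(i)) - y^(i))^2
    h = theta0 + theta1 * x
    return np.sum((h - y) ** 2) / (2 * m)

theta0, theta1 = 0.0, 0.0
alpha = 0.01                               # illustrative learning rate

for _ in range(5000):
    h = theta0 + theta1 * x
    grad0 = np.sum(h - y) / m              # dJ/dtheta0, the j = 0 case above
    grad1 = np.sum((h - y) * x) / m        # dJ/dtheta1, same derivation with x^(i)
    # simultaneous update
    theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1

print(theta0, theta1, cost(theta0, theta1))
```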
Effect of α:
Evaluation Metrics

Model-1: Actual = 1, Predicted = 401
Model-2: Actual = 10001, Predicted = 10401

Calculate RMSE and RMSLE for both models:
RMSE(M1) = 400, RMSLE(M1) = 2.65
RMSE(M2) = 400, RMSLE(M2) = 0.01
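A quick check of both metrics in NumPy. RMSLE conventions differ (log base, whether 1 is added before taking the log); the sketch below assumes a base-10 log of the raw values, which is closest to the slide's rounded figures, so small numeric differences are expected:

```python
import numpy as np

def rmse(actual, predicted):
    a, p = np.asarray(actual, float), np.asarray(predicted, float)
    return np.sqrt(np.mean((p - a) ** 2))

def rmsle(actual, predicted):
    # assumption: base-10 log of the raw values; libraries often use ln(1 + v)
    a, p = np.log10(actual), np.log10(predicted)
    return np.sqrt(np.mean((p - a) ** 2))

print(rmse([1], [401]), rmse([10001], [10401]))     # 400.0  400.0
print(rmsle([1], [401]), rmsle([10001], [10401]))   # ~2.6   ~0.02
```

Both models have the same RMSE, but RMSLE measures relative error, so Model-1 (off by a factor of about 400) scores far worse than Model-2 (off by about 4%).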
Significance of R²

Adjusted R²:
$R'^2 = 1 - (1 - R^2)\,\frac{m - 1}{m - (n + 1)}$

m ⇒ number of samples
n ⇒ number of features
The adjusted value decreases when a random (unwanted) feature is added to the model.
For large n, the fraction (m − 1)/(m − (n + 1)) grows, so the small increase in R² caused by merely adding features is compensated.
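A minimal helper for the adjusted R² formula above; the example numbers are illustrative, chosen only to show the penalty growing with n:

```python
def adjusted_r2(r2, m, n):
    # R'^2 = 1 - (1 - R^2) * (m - 1) / (m - (n + 1))
    return 1 - (1 - r2) * (m - 1) / (m - (n + 1))

print(adjusted_r2(r2=0.90, m=50, n=3))    # ~0.893
print(adjusted_r2(r2=0.90, m=50, n=10))   # ~0.874 -- more features, larger penalty
```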
$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_3 + \theta_4 x_4$

With $x_0 = 1$:
$h_\theta(x) = \theta_0 x_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_3 + \theta_4 x_4$

In general,
$h_\theta(x) = \theta_0 x_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n$

In terms of matrices, $h = X\theta$

$X = \begin{bmatrix} 7.5 & 1 & 700 & 2 \\ 8 & 1 & 800 & 2 \\ 7.2 & 0 & 600 & 2 \end{bmatrix} \qquad \theta = [\theta_0\ \theta_1\ \theta_2\ \dots\ \theta_n]^T$

Adding the $x_0$ column:

$X = \begin{bmatrix} 1 & 7.5 & 1 & 700 & 2 \\ 1 & 8 & 1 & 800 & 2 \\ 1 & 7.2 & 0 & 600 & 2 \end{bmatrix}$

Find the order of the h matrix?

Order of X ⇒ m x (n+1)
Order of θ ⇒ (n+1) x 1
Order of y ⇒ m x 1
Order of h ⇒ [m x (n+1)] * [(n+1) x 1] ⇒ m x 1
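The same matrix form in NumPy, using the X shown above once the x0 = 1 column has been added; the θ values are placeholders purely to make the shapes visible:

```python
import numpy as np

X = np.array([
    [1, 7.5, 1, 700, 2],
    [1, 8.0, 1, 800, 2],
    [1, 7.2, 0, 600, 2],
])                                    # shape (m, n+1) = (3, 5)

theta = np.array([0.5, 0.4, 0.1, 0.001, 0.2])   # shape (n+1,) -- placeholder values

h = X @ theta                         # [m x (n+1)] * [(n+1) x 1] = m x 1
print(X.shape, theta.shape, h.shape)  # (3, 5) (5,) (3,)
```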
Feature Scaling
$x_{new} = \frac{x - x_{min}}{x_{max} - x_{min}}$

Weight    Price
15        1
12        2
18        3
10        5
Min-max scaling: $x_{new} = \dfrac{x - x_{min}}{x_{max} - x_{min}}$

Standardization: $x_{new} = \dfrac{x - \mu}{\sigma}$

Max scaling: $x_{new} = \dfrac{x}{\max(x)}$
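The three scalings applied to the Weight column of the table above, as a NumPy sketch:

```python
import numpy as np

weight = np.array([15.0, 12.0, 18.0, 10.0])

min_max  = (weight - weight.min()) / (weight.max() - weight.min())  # values in [0, 1]
standard = (weight - weight.mean()) / weight.std()                  # mean 0, std 1
max_abs  = weight / weight.max()                                    # values in (0, 1]

print(min_max)    # [0.625 0.25  1.    0.   ]
print(standard)
print(max_abs)    # [0.833 0.667 1.    0.556] (rounded)
```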
Feature Selection

Filter methods
Correlation, VIF (Variance Inflation Factor)
Example: if x1 shows a high VIF, remove x1 and build the model with x2, x3
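One way to compute VIF with plain NumPy, as a hedged sketch: regress each feature on the remaining ones and take 1/(1 − R²). The data below are synthetic, constructed so that x1 is nearly a copy of x2; a VIF far above the usual 5-10 rule of thumb is the signal for dropping a feature, as in the x1 example above:

```python
import numpy as np

def vif(X, j):
    """VIF of column j: regress X[:, j] on the other columns (with intercept)."""
    y = X[:, j]
    A = np.column_stack([np.ones(len(y)), np.delete(X, j, axis=1)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1 - resid.var() / y.var()
    return 1.0 / (1.0 - r2)

rng = np.random.default_rng(0)
x2 = rng.normal(size=100)
x3 = rng.normal(size=100)
x1 = 0.95 * x2 + 0.05 * rng.normal(size=100)       # x1 almost duplicates x2
X = np.column_stack([x1, x2, x3])

print([round(vif(X, j), 1) for j in range(3)])     # x1 and x2 show very large VIFs
```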
Wrapper methods
Dummy variables
Polynomial Regression

$h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3 + \dots + \theta_n x^n$

Examples: $h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2$ and $h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3$
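Polynomial regression stays linear in the parameters, so ordinary least squares still applies once the powers of x are added as extra columns. A NumPy sketch with illustrative data:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 3, 30)
y = 1.0 + 2.0 * x - 0.5 * x**2 + 0.1 * rng.normal(size=x.size)   # noisy quadratic

d = 2                                                # polynomial degree
X = np.column_stack([x**p for p in range(d + 1)])    # columns: 1, x, x^2
theta, *_ = np.linalg.lstsq(X, y, rcond=None)        # fits theta0, theta1, theta2

print(theta)            # close to [1.0, 2.0, -0.5]
y_hat = X @ theta       # h_theta(x) = theta0 + theta1*x + theta2*x^2
```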
Polynomial regression: "Underfit" vs "Overfit" (illustrated in figures)
Regularization
Preventing Overfit (polynomial regression examples shown in figures)
Preventing Underfit
Decrease regularization
Increase the duration of training
Feature selection
$\theta_j = \theta_j - \frac{\alpha}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)} - \frac{\alpha\lambda}{m}\,\theta_j$

$\theta_j = \left(1 - \frac{\alpha\lambda}{m}\right)\theta_j - \frac{\alpha}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$

$\left(1 - \frac{\alpha\lambda}{m}\right) < 1$; for example, with $\left(1 - \frac{\alpha\lambda}{m}\right) = 0.99$,

$\theta_j = 0.99\,\theta_j - \frac{\alpha}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$
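The regularized update in code: each θj is first shrunk by the factor (1 − αλ/m) and then moved by the usual gradient term. As a common convention (an assumption, not stated on the slides), θ0 is excluded from the shrinkage:

```python
import numpy as np

def regularized_gd_step(theta, X, y, alpha, lam):
    """One gradient-descent step on the L2-regularized cost.

    X already contains the x0 = 1 column; theta[0] is not shrunk because the
    bias term is conventionally left out of the penalty."""
    m = len(y)
    grad = X.T @ (X @ theta - y) / m           # (1/m) * sum_i (h - y) * x_j
    shrink = np.ones_like(theta)
    shrink[1:] = 1 - alpha * lam / m           # the (1 - alpha*lambda/m) factor
    return shrink * theta - alpha * grad
```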
Hyperparameter Tuning

d = 1 ⇒ $h_\theta(x) = \theta_0 + \theta_1 x$
d = 2 ⇒ $h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2$
...
One could choose d such that it gives the least $J_{test}$ value.
However, this fails to generalize the model: we would be using the test dataset to select the hyperparameter, which is not allowed.
Instead, choose d such that it gives the least $J_{cv}$ value.
If d = 4 gives the least value, then the hypothesis is
$h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3 + \theta_4 x^4$
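A sketch of the selection procedure: fit each candidate degree d on the training split, pick the d with the lowest J_cv on the cross-validation split, and only then report the error on the untouched test split. All data and split sizes here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 3, 90)
y = np.sin(x) + 0.1 * rng.normal(size=x.size)

idx = rng.permutation(x.size)                    # 60/20/20 train/cv/test split
tr, cv, te = idx[:54], idx[54:72], idx[72:]

def design(x, d):
    return np.column_stack([x**p for p in range(d + 1)])

def J(theta, x, y, d):
    r = design(x, d) @ theta - y
    return r @ r / (2 * len(y))

best_d, best_j, best_theta = None, np.inf, None
for d in range(1, 8):
    theta, *_ = np.linalg.lstsq(design(x[tr], d), y[tr], rcond=None)
    j_cv = J(theta, x[cv], y[cv], d)
    if j_cv < best_j:
        best_d, best_j, best_theta = d, j_cv, theta

print(best_d, J(best_theta, x[te], y[te], best_d))   # J_test reported once, at the end
```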
Training error:
$J_{train}(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h(x^{(i)}) - y^{(i)} \right)^2$

Test error:
$J_{test}(\theta) = \frac{1}{2m_{test}} \sum_{i=1}^{m_{test}} \left( h(x_{test}^{(i)}) - y_{test}^{(i)} \right)^2$
High Bias
Bias quantifies how accurately the model is likely to behave on the test data.
Extremely simple models are likely to fail in predicting complex real-world phenomena.
High Variance