Linear Regression
(3-0-0-3)
Dr. D. Harimurugan
Department of Electrical Engineering
Dr B R Ambedkar National Institute of Technology
Jalandhar
February 3, 2025
Outline
Machine Learning: Introduction, History of AI/ML, Major classifications in ML, Common Terminologies
Linear regression
Multiple linear regression
Regularization
Machine Learning
History of AI/ML
History of AI (timeline illustrated in figures)
AI Winter 1 (1974-1980):
Failure of machine translation
Negative results in neural nets
Poor speech understanding
AI Winter 2 (1987-1993):
High hardware cost of expert systems
Expert systems failed to live up to their promises
TD-Gammon (1994)
Deep Blue (1997)
AI in recent times
Analysis vs Analytics
Collection of data
Preprocessing of data and labelling the samples
Selection of model/function/procedure
Determination of parameters
Evaluation of the model
Major classifications of ML
Supervised learning (needs labeled output)
Linear regression (uses a linear relationship to predict a continuous variable)
Predicting the CGPA of the last semester from the known CGPAs of previous semesters (0-10, continuous value)
Predicting the score of a T20 cricket match (1-250)
Logistic regression (predicts a class or category, i.e. a discrete variable)
Email: spam or ham
Cat or dog
Unsupervised learning (no labeled output)
Clustering
Market segmentation (offer a discount to a particular segment)
Social network analysis (to target voters)
Other variations
Nomenclature
Consider the dataset shown below:

CGPA up to 4th sem    Prediction of 5th sem CG
7.23                  7.15
8.10                  8.3
7.8                   7.7
...                   ...
m ⇒ number of training examples
x ⇒ input features
y ⇒ output variable
(x, y) ⇒ one training example
(x^(i), y^(i)) ⇒ i-th training example
x^(1) = 7.23, y^(1) = 7.15; x^(2) = 8.10, y^(2) = 8.3
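As a concrete illustration of this notation, a minimal NumPy sketch (the values are the rows of the table above; note that code indexing is 0-based while the notes use 1-based superscripts):

```python
import numpy as np

# x = CGPA up to 4th sem (input feature), y = 5th sem CG (output variable)
x = np.array([7.23, 8.10, 7.8])
y = np.array([7.15, 8.3, 7.7])

m = len(x)                  # m = number of training examples (here 3)

# (x^(i), y^(i)) is the i-th training example
i = 1
print(x[i - 1], y[i - 1])   # x^(1) = 7.23, y^(1) = 7.15
```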
Examples of output

Cost function
Hypothesis: y = θ0 + θ1 x
Example parameter choices illustrated in the plots: θ0 = 800, θ1 = −1.5 and θ0 = 360, θ1 = 0
Gradient Descent
Simultaneous update (both derivatives are evaluated at the current θ0, θ1 before either parameter is overwritten):
$\text{temp}_0 = \theta_0 - \alpha \frac{\partial}{\partial \theta_0} J(\theta_0, \theta_1)$
$\text{temp}_1 = \theta_1 - \alpha \frac{\partial}{\partial \theta_1} J(\theta_0, \theta_1)$
$\theta_0 = \text{temp}_0$
$\theta_1 = \text{temp}_1$

Non-simultaneous update (incorrect, because θ0 is overwritten before the second derivative is computed):
$\text{temp}_0 = \theta_0 - \alpha \frac{\partial}{\partial \theta_0} J(\theta_0, \theta_1)$
$\theta_0 = \text{temp}_0$
$\text{temp}_1 = \theta_1 - \alpha \frac{\partial}{\partial \theta_1} J(\theta_0, \theta_1)$
$\theta_1 = \text{temp}_1$

In general, for j = 0, 1:
$\theta_j = \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$
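A small sketch of the difference between the two update orders; `dJ_dtheta0` and `dJ_dtheta1` are placeholder callables for the two partial derivatives (they are derived explicitly below):

```python
# Correct: simultaneous update -- both derivatives are evaluated at the
# current (theta0, theta1) before either parameter is overwritten.
def gd_step(theta0, theta1, alpha, dJ_dtheta0, dJ_dtheta1):
    temp0 = theta0 - alpha * dJ_dtheta0(theta0, theta1)
    temp1 = theta1 - alpha * dJ_dtheta1(theta0, theta1)
    return temp0, temp1          # theta0 := temp0, theta1 := temp1

# Incorrect: non-simultaneous update -- theta0 is overwritten first, so the
# second derivative is evaluated at a mixed (new theta0, old theta1) point.
def gd_step_wrong(theta0, theta1, alpha, dJ_dtheta0, dJ_dtheta1):
    theta0 = theta0 - alpha * dJ_dtheta0(theta0, theta1)
    theta1 = theta1 - alpha * dJ_dtheta1(theta0, theta1)
    return theta0, theta1
```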
$h_\theta(x) = \theta_0 + \theta_1 x$

$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$

Differentiating J:

$\frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1) = \frac{\partial}{\partial \theta_j} \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$

$\frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1) = \frac{\partial}{\partial \theta_j} \frac{1}{2m} \sum_{i=1}^{m} \left( \theta_0 + \theta_1 x^{(i)} - y^{(i)} \right)^2$

For j = 0:

$\frac{\partial}{\partial \theta_0} J(\theta_0, \theta_1) = \frac{2}{2m} \sum_{i=1}^{m} \left( \theta_0 + \theta_1 x^{(i)} - y^{(i)} \right) \cdot (1) = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)$
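Putting the cost function and its derivative together, a minimal batch-gradient-descent sketch in NumPy. The data reuse the CGPA example; the learning rate, iteration count, and the j = 1 gradient (which follows from the same differentiation with an extra x^(i) factor) are the only additions beyond the slides:

```python
import numpy as np

x = np.array([7.23, 8.10, 7.8])
y = np.array([7.15, 8.3, 7.7])
m = len(x)

def cost(theta0, theta1):
    # J(theta0, theta1) = (1 / 2m) * sum_i (h(x^(i)) - y^(i))^2
    h = theta0 + theta1 * x
    return np.sum((h - y) ** 2) / (2 * m)

theta0, theta1 = 0.0, 0.0
alpha = 0.01                               # illustrative learning rate

for _ in range(5000):
    h = theta0 + theta1 * x
    grad0 = np.sum(h - y) / m              # dJ/dtheta0, the j = 0 case above
    grad1 = np.sum((h - y) * x) / m        # dJ/dtheta1, same derivation with x^(i)
    # simultaneous update
    theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1

print(theta0, theta1, cost(theta0, theta1))
```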
Effect of α:
Evaluation Metrics

Model-1: Actual = 1, Predicted = 401
Model-2: Actual = 10001, Predicted = 10401

Calculate RMSE and RMSLE for both models:
RMSE(M1) = 400, RMSLE(M1) = 2.65
RMSE(M2) = 400, RMSLE(M2) = 0.01
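A quick check of both metrics in NumPy. RMSLE conventions differ (log base, whether 1 is added before taking the log); the sketch below assumes a base-10 log of the raw values, which is closest to the slide's rounded figures, so small numeric differences are expected:

```python
import numpy as np

def rmse(actual, predicted):
    a, p = np.asarray(actual, float), np.asarray(predicted, float)
    return np.sqrt(np.mean((p - a) ** 2))

def rmsle(actual, predicted):
    # assumption: base-10 log of the raw values; libraries often use ln(1 + v)
    a, p = np.log10(actual), np.log10(predicted)
    return np.sqrt(np.mean((p - a) ** 2))

print(rmse([1], [401]), rmse([10001], [10401]))     # 400.0  400.0
print(rmsle([1], [401]), rmsle([10001], [10401]))   # ~2.6   ~0.02
```

Both models have the same RMSE, but RMSLE measures relative error, so Model-1 (off by a factor of about 400) scores far worse than Model-2 (off by about 4%).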
Significance of R²

Adjusted R²:
$R'^2 = 1 - (1 - R^2)\,\frac{m - 1}{m - (n + 1)}$

m ⇒ number of samples
n ⇒ number of features
The adjusted value decreases when a random (unwanted) feature is added to the model.
For large n, the fraction (m − 1)/(m − (n + 1)) grows, so the small increase in R² caused by merely adding features is compensated.
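A minimal helper for the adjusted R² formula above; the example numbers are illustrative, chosen only to show the penalty growing with n:

```python
def adjusted_r2(r2, m, n):
    # R'^2 = 1 - (1 - R^2) * (m - 1) / (m - (n + 1))
    return 1 - (1 - r2) * (m - 1) / (m - (n + 1))

print(adjusted_r2(r2=0.90, m=50, n=3))    # ~0.893
print(adjusted_r2(r2=0.90, m=50, n=10))   # ~0.874 -- more features, larger penalty
```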
$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_3 + \theta_4 x_4$

With $x_0 = 1$:
$h_\theta(x) = \theta_0 x_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_3 + \theta_4 x_4$

In general,
$h_\theta(x) = \theta_0 x_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n$

In terms of matrices, $h = X\theta$

$X = \begin{bmatrix} 7.5 & 1 & 700 & 2 \\ 8 & 1 & 800 & 2 \\ 7.2 & 0 & 600 & 2 \end{bmatrix} \qquad \theta = [\theta_0\ \theta_1\ \theta_2\ \dots\ \theta_n]^T$

Adding the $x_0$ column:

$X = \begin{bmatrix} 1 & 7.5 & 1 & 700 & 2 \\ 1 & 8 & 1 & 800 & 2 \\ 1 & 7.2 & 0 & 600 & 2 \end{bmatrix}$

Find the order of the h matrix?

Order of X ⇒ m x (n+1)
Order of θ ⇒ (n+1) x 1
Order of y ⇒ m x 1
Order of h ⇒ [m x (n+1)] * [(n+1) x 1] ⇒ m x 1
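The same matrix form in NumPy, using the X shown above once the x0 = 1 column has been added; the θ values are placeholders purely to make the shapes visible:

```python
import numpy as np

X = np.array([
    [1, 7.5, 1, 700, 2],
    [1, 8.0, 1, 800, 2],
    [1, 7.2, 0, 600, 2],
])                                    # shape (m, n+1) = (3, 5)

theta = np.array([0.5, 0.4, 0.1, 0.001, 0.2])   # shape (n+1,) -- placeholder values

h = X @ theta                         # [m x (n+1)] * [(n+1) x 1] = m x 1
print(X.shape, theta.shape, h.shape)  # (3, 5) (5,) (3,)
```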
Feature Scaling
$x_{new} = \frac{x - x_{min}}{x_{max} - x_{min}}$

Weight    Price
15        1
12        2
18        3
10        5
Min-max scaling: $x_{new} = \dfrac{x - x_{min}}{x_{max} - x_{min}}$

Standardization: $x_{new} = \dfrac{x - \mu}{\sigma}$

Max scaling: $x_{new} = \dfrac{x}{\max(x)}$
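The three scalings applied to the Weight column of the table above, as a NumPy sketch:

```python
import numpy as np

weight = np.array([15.0, 12.0, 18.0, 10.0])

min_max  = (weight - weight.min()) / (weight.max() - weight.min())  # values in [0, 1]
standard = (weight - weight.mean()) / weight.std()                  # mean 0, std 1
max_abs  = weight / weight.max()                                    # values in (0, 1]

print(min_max)    # [0.625 0.25  1.    0.   ]
print(standard)
print(max_abs)    # [0.833 0.667 1.    0.556] (rounded)
```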
Feature Selection

Filter methods
Correlation, VIF (Variance Inflation Factor)
Example: if x1 shows a high VIF, remove x1 and build the model with x2, x3
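One way to compute VIF with plain NumPy, as a hedged sketch: regress each feature on the remaining ones and take 1/(1 − R²). The data below are synthetic, constructed so that x1 is nearly a copy of x2; a VIF far above the usual 5-10 rule of thumb is the signal for dropping a feature, as in the x1 example above:

```python
import numpy as np

def vif(X, j):
    """VIF of column j: regress X[:, j] on the other columns (with intercept)."""
    y = X[:, j]
    A = np.column_stack([np.ones(len(y)), np.delete(X, j, axis=1)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1 - resid.var() / y.var()
    return 1.0 / (1.0 - r2)

rng = np.random.default_rng(0)
x2 = rng.normal(size=100)
x3 = rng.normal(size=100)
x1 = 0.95 * x2 + 0.05 * rng.normal(size=100)       # x1 almost duplicates x2
X = np.column_stack([x1, x2, x3])

print([round(vif(X, j), 1) for j in range(3)])     # x1 and x2 show very large VIFs
```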
Wrapper methods
Dummy variables
Polynomial Regression

$h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3 + \dots + \theta_n x^n$

Examples: $h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2$ and $h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3$
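Polynomial regression stays linear in the parameters, so ordinary least squares still applies once the powers of x are added as extra columns. A NumPy sketch with illustrative data:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 3, 30)
y = 1.0 + 2.0 * x - 0.5 * x**2 + 0.1 * rng.normal(size=x.size)   # noisy quadratic

d = 2                                                # polynomial degree
X = np.column_stack([x**p for p in range(d + 1)])    # columns: 1, x, x^2
theta, *_ = np.linalg.lstsq(X, y, rcond=None)        # fits theta0, theta1, theta2

print(theta)            # close to [1.0, 2.0, -0.5]
y_hat = X @ theta       # h_theta(x) = theta0 + theta1*x + theta2*x^2
```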
Polynomial regression: "Underfit" vs "Overfit" (illustrated in figures)
Regularization
Preventing Overfit (polynomial regression examples shown in figures)
Preventing Underfit
Decrease regularization
Increase the duration of training
Feature selection
$\theta_j = \theta_j - \frac{\alpha}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)} - \frac{\alpha\lambda}{m}\,\theta_j$

$\theta_j = \left(1 - \frac{\alpha\lambda}{m}\right)\theta_j - \frac{\alpha}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$

$\left(1 - \frac{\alpha\lambda}{m}\right) < 1$; for example, with $\left(1 - \frac{\alpha\lambda}{m}\right) = 0.99$,

$\theta_j = 0.99\,\theta_j - \frac{\alpha}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$
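The regularized update in code: each θj is first shrunk by the factor (1 − αλ/m) and then moved by the usual gradient term. As a common convention (an assumption, not stated on the slides), θ0 is excluded from the shrinkage:

```python
import numpy as np

def regularized_gd_step(theta, X, y, alpha, lam):
    """One gradient-descent step on the L2-regularized cost.

    X already contains the x0 = 1 column; theta[0] is not shrunk because the
    bias term is conventionally left out of the penalty."""
    m = len(y)
    grad = X.T @ (X @ theta - y) / m           # (1/m) * sum_i (h - y) * x_j
    shrink = np.ones_like(theta)
    shrink[1:] = 1 - alpha * lam / m           # the (1 - alpha*lambda/m) factor
    return shrink * theta - alpha * grad
```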
Hyperparameter Tuning

d = 1 ⇒ $h_\theta(x) = \theta_0 + \theta_1 x$
d = 2 ⇒ $h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2$
...
One could choose d such that it gives the least $J_{test}$ value.
However, this fails to generalize the model: we would be using the test dataset to select the hyperparameter, which is not allowed.
Instead, choose d such that it gives the least $J_{cv}$ value.
If d = 4 gives the least value, then the hypothesis is
$h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3 + \theta_4 x^4$
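A sketch of the selection procedure: fit each candidate degree d on the training split, pick the d with the lowest J_cv on the cross-validation split, and only then report the error on the untouched test split. All data and split sizes here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 3, 90)
y = np.sin(x) + 0.1 * rng.normal(size=x.size)

idx = rng.permutation(x.size)                    # 60/20/20 train/cv/test split
tr, cv, te = idx[:54], idx[54:72], idx[72:]

def design(x, d):
    return np.column_stack([x**p for p in range(d + 1)])

def J(theta, x, y, d):
    r = design(x, d) @ theta - y
    return r @ r / (2 * len(y))

best_d, best_j, best_theta = None, np.inf, None
for d in range(1, 8):
    theta, *_ = np.linalg.lstsq(design(x[tr], d), y[tr], rcond=None)
    j_cv = J(theta, x[cv], y[cv], d)
    if j_cv < best_j:
        best_d, best_j, best_theta = d, j_cv, theta

print(best_d, J(best_theta, x[te], y[te], best_d))   # J_test reported once, at the end
```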
Training error:
$J_{train}(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h(x^{(i)}) - y^{(i)} \right)^2$

Test error:
$J_{test}(\theta) = \frac{1}{2m_{test}} \sum_{i=1}^{m_{test}} \left( h(x_{test}^{(i)}) - y_{test}^{(i)} \right)^2$
High Bias
Bias quantifies how accurately the model is likely to behave on the test data.
Extremely simple models are likely to fail in predicting complex real-world phenomena.
High Variance