
Introduction to Machine Learning and Supervised Learning

Kritanta Saha

Assistant Professor
Dept. of Computer Science & Engineering
Sister Nivedita University

Jan 2025



Outline

1 Introduction to Machine Learning

2 Types of Learning Techniques: Supervised, Unsupervised, Reinforcement Learning

3 Regression Problem: Linear & Polynomial Regression

4 Classification: Logistic Regression

5 Algorithm for Regression & Classification: Gradient Descent



Introduction to Machine Learning
What is Machine Learning (ML)?

It is a field of study that gives computers the ability to learn without being explicitly
programmed. [Arthur Samuel, 1959]

What is Learning?

A computer program is said to learn from experience E with respect to some task T
and some performance measure P, if its performance on T, as measured by P,
improves with experience E. [Tom Mitchell, 1998]

ML Applications:
Prediction: e.g. Used Car Price Prediction
Classification: e.g. Detect whether a tumor is malignant or benign
Clustering: e.g. User engagement in social media
Association Rule Mining: e.g. Market Basket Analysis
etc.
Different Types of Learning

Supervised Learning: Given a data set with the known correct outputs, i.e., having the idea that there is a relationship between the input and the output.
[Figures: Used Car Price Prediction — Price (Rs. in Lakhs) vs. Miles Driven; Tumor Malignant or Benign? — Age vs. Tumor Size]

Unsupervised Learning: It allows us to approach problems without any idea about what the results should look like.
[Figure: clusters of points in the (x1, x2) plane]

Reinforcement Learning: It is a trial-and-error learning process where an agent interacts with an environment, takes actions, receives rewards, and improves its policy to maximize long-term rewards.
Overview of Supervised Learning

[Figure: a Training Set is fed to a Learning Algorithm, which produces a Hypothesis (h); the hypothesis maps input Features X to a Predicted output Y.]


Supervised Learning

Regression Problem: Maps input variables to some continuous function.
e.g. Used Car Price Prediction
Feature: Miles Driven
Output: Price
[Figure: Used Car Price Prediction — Price (Rs. in Lakhs) vs. Miles Driven]

Classification Problem: Maps input variables into discrete categories.
e.g. Detect whether a tumor is malignant or benign
Features: Age and Tumor Size
Output: Class label (Malignant or Benign)
[Figure: Tumor Malignant or Benign? — Age vs. Tumor Size]
Solving Regression Problem using Supervised Learning

Consider the regression problem: Used Car Price Prediction.

Training Dataset:

Miles Driven (x1)   Engine Capacity (x2) in hp   Price (y)
1230                1000                         220000
3230                890                          140500
...                 ...                          ...
4230                980                          80000

Notation:
m: number of training examples
x's: input variables or features
y's: output variables or targets
(x, y): one training example
(x^(i), y^(i)): the i-th training example
Linear Regression

Linear Regression with one variable / Univariate Linear Regression

How do we represent the hypothesis h?

    $h_\theta(x) = \theta_0 + \theta_1 x$

Choose θ0, θ1 so that hθ(x) is close to y for each training example (x, y).

Error at the i-th point: $h_\theta(x^{(i)}) - y^{(i)}$, for all i.

Mean Squared Error: $\frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$

Find θ0, θ1 such that

    $\min_{\theta_0,\theta_1}\; \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$

[Figure: univariate linear regression — data points (x^(i), y^(i)) and the fitted line hθ(x) = θ0 + θ1x, with the vertical gaps hθ(x^(i)) − y^(i) shown as errors]


Linear Regression with One Variable

Mean Squared Error Loss/Cost Function:

    $J(\theta_0,\theta_1) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$

In summary, for linear regression with one variable:

Hypothesis: hθ(x) = θ0 + θ1x
Parameters: θ0, θ1
Loss/Cost Function: $J(\theta_0,\theta_1) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$
Goal: $\min_{\theta_0,\theta_1} J(\theta_0,\theta_1)$
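As a quick illustration, here is a minimal NumPy sketch of this cost function; the function name and toy data are our own, not from the slides.

```python
import numpy as np

def cost(theta0, theta1, x, y):
    """Mean squared error cost J(theta0, theta1) with the 1/(2m) convention."""
    m = len(y)
    predictions = theta0 + theta1 * x   # h_theta(x^(i)) for every example
    errors = predictions - y            # h_theta(x^(i)) - y^(i)
    return np.sum(errors ** 2) / (2 * m)

# Toy data: four points lying exactly on the line y = 2x + 1
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])
print(cost(1.0, 2.0, x, y))   # 0.0  -- parameters that fit the data exactly
print(cost(0.0, 0.0, x, y))   # 20.5 -- a poor fit costs more
```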
Understanding The Loss/Cost Function

Consider the simplified hypothesis hθ(x) = θ1x (i.e., θ0 = 0):

    $J(\theta_1) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$,   goal: $\min_{\theta_1} J(\theta_1)$

Example: for the three training points (1, 1), (2, 2), (3, 3):

θ1 = 1 gives hθ(x) = x, which passes through every point, so J(1) = 0.
θ1 = 0.5 gives hθ(x) = 0.5x, and

    $J(0.5) = \frac{1}{2\cdot 3}\left[(0.5-1)^2 + (1-2)^2 + (1.5-3)^2\right] = 0.58$

Plotting J(θ1) as θ1 varies gives a bowl-shaped curve; θ1 = 1 minimizes the cost function.

[Figures: the data with the fitted lines hθ(x) = 0.5x and hθ(x) = x, and the curve J(θ1) as θ1 varies]
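The slide's arithmetic can be reproduced with a few lines of plain Python (our own sketch):

```python
# The three points (1,1), (2,2), (3,3) with the simplified hypothesis h(x) = theta1 * x.
def J(theta1):
    points = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
    m = len(points)
    return sum((theta1 * x - y) ** 2 for x, y in points) / (2 * m)

print(J(1.0))   # 0.0      -> theta1 = 1 fits the data exactly
print(J(0.5))   # 0.5833.. -> matches the slide's J(0.5) = 0.58
```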
Understanding The Loss/Cost Function (cont.)

Now consider the full hypothesis hθ(x) = θ0 + θ1x. The loss/cost function J(θ0, θ1) is a surface over the (θ0, θ1) plane. Note that this loss function is convex, so it has a single global minimum.

[Figures: the surface plot and contour plot of J(θ0, θ1)]


Linear Regression with Multiple Variables

n: number of features
x^(i): the features of the i-th training example
x_j^(i): the value of feature j in the i-th training example

Hypothesis:
    hθ(x) = θ0 + θ1x1 + θ2x2 + ... + θnxn

Parameters: θ0, θ1, ..., θn

With the convention x0 = 1, collect the features and parameters into vectors

    $X = (x_0, x_1, \ldots, x_n)^T \in \mathbb{R}^{n+1}$,   $\Theta = (\theta_0, \theta_1, \ldots, \theta_n)^T \in \mathbb{R}^{n+1}$,

so the hypothesis can be written compactly as $h_\theta(x) = \Theta^T X$.

Loss/Cost Function and goal:

    $J(\Theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$,   $\min_{\Theta} J(\Theta)$
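A minimal vectorized sketch of this hypothesis and cost in NumPy; the names are ours, and X below already carries the x0 = 1 column (the numbers come from the used-car table above).

```python
import numpy as np

def predict(Theta, X):
    """h_theta(x) = Theta^T x for every row of X (X includes the x0 = 1 column)."""
    return X @ Theta

def cost(Theta, X, y):
    """J(Theta) = 1/(2m) * sum of squared errors."""
    m = len(y)
    errors = predict(Theta, X) - y
    return (errors @ errors) / (2 * m)

# Rows from the training table: x0 = 1, miles driven, engine capacity
X = np.array([[1.0, 1230.0, 1000.0],
              [1.0, 3230.0,  890.0],
              [1.0, 4230.0,  980.0]])
y = np.array([220000.0, 140500.0, 80000.0])
Theta = np.zeros(X.shape[1])
print(cost(Theta, X, y))   # about 1.24e10 -- the cost at the all-zero starting point
```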
Polynomial Regression

Fit a quadratic model, i.e., hypothesis:
    hθ(x) = θ0 + θ1x + θ2x²

Fit a cubic model, i.e., hypothesis:
    hθ(x) = θ0 + θ1x + θ2x² + θ3x³

Fit other models, such as
    hθ(x) = θ0 + θ1x + θ2√x
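One way to read this is that polynomial regression is just linear regression on transformed features. A small sketch (our own illustration, not from the slides):

```python
import numpy as np

def poly_features(x, degree):
    """Map a 1-D feature x to the columns [1, x, x^2, ..., x^degree]."""
    return np.column_stack([x ** d for d in range(degree + 1)])

x = np.array([1.0, 2.0, 3.0, 4.0])
X_quad = poly_features(x, degree=2)   # columns: 1, x, x^2
# X_quad can now be fed to the same linear-regression cost and gradient descent
# machinery; the model is still linear in the parameters theta.
```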


Algorithms for Solving Linear Regression Problems

How do we find the parameter vector Θ that minimizes the loss/cost function J(Θ)?

To find the parameters that minimize the cost function J(Θ), use Gradient Descent:

Start with some initial value of θ0, θ1, ..., θn (e.g., θi = 0 for all i).
Keep changing θ0, θ1, ..., θn to reduce J(θ0, θ1, ..., θn) until we hopefully end up at a minimum.


Gradient Descent Algorithm

Start with some initial value of θ0, θ1, ..., θn (e.g., θi = 0 for all i).
Keep changing θ0, θ1, ..., θn to reduce J(θ0, θ1, ..., θn) until we hopefully end up at a minimum.

Intuition of Gradient Descent
[Figure illustrating the intuition of gradient descent: repeated downhill steps on the cost surface]
Gradient Descent Algorithm

Gradient Descent for Solving Linear Regression with One Variable

Repeat until convergence {
    $\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$   (for j = 0 and j = 1)
}
where α is the Learning Rate.

Correct Simultaneous Update:
    temp0 := θ0 − α ∂/∂θ0 J(θ0, θ1)
    temp1 := θ1 − α ∂/∂θ1 J(θ0, θ1)
    θ0 := temp0
    θ1 := temp1


Gradient Descent for Univariate Linear Regression

    $\frac{\partial}{\partial \theta_j} J(\theta_0,\theta_1) = \frac{\partial}{\partial \theta_j}\left(\frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2\right)$   (for j = 0 and j = 1)

    $\frac{\partial}{\partial \theta_0} J(\theta_0,\theta_1) = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)$

    $\frac{\partial}{\partial \theta_1} J(\theta_0,\theta_1) = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right) x^{(i)}$

Repeat until convergence {
    $\theta_0 := \theta_0 - \alpha \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)$
    $\theta_1 := \theta_1 - \alpha \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right) x^{(i)}$
}   (update θ0 and θ1 simultaneously)
where α is the Learning Rate.
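A minimal sketch of these two update rules in Python, assuming our own toy data and a fixed number of iterations in place of a formal convergence test:

```python
import numpy as np

def gradient_descent_1d(x, y, alpha, iterations):
    """Univariate linear regression by gradient descent with simultaneous updates."""
    m = len(y)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        errors = (theta0 + theta1 * x) - y      # h_theta(x^(i)) - y^(i)
        # Compute both gradients before touching the parameters (simultaneous update).
        grad0 = errors.sum() / m
        grad1 = (errors * x).sum() / m
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])
print(gradient_descent_1d(x, y, alpha=0.1, iterations=2000))   # approximately (0.0, 1.0)
```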
Gradient Descent for Multivariate Linear Regression

Repeat until convergence {
    $\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\Theta)$   (simultaneously update for every j = 0, 1, ..., n)
}
where α is the Learning Rate.

    $J(\Theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$

    $\frac{\partial}{\partial \theta_j} J(\Theta) = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)}$
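A vectorized sketch of this update in NumPy (our own naming; X is the design matrix with a leading column of ones):

```python
import numpy as np

def gradient_descent(X, y, alpha, iterations):
    """Multivariate linear regression: theta_j := theta_j - alpha * dJ/dtheta_j for all j at once."""
    m, n_plus_1 = X.shape
    Theta = np.zeros(n_plus_1)
    for _ in range(iterations):
        errors = X @ Theta - y            # h_theta(x^(i)) - y^(i) for every example
        gradient = (X.T @ errors) / m     # vector of all partial derivatives
        Theta -= alpha * gradient         # simultaneous update of every theta_j
    return Theta

# Tiny example with one feature (plus the x0 = 1 column); the data follow y = 2x
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([2.0, 4.0, 6.0])
print(gradient_descent(X, y, alpha=0.1, iterations=2000))   # roughly [0., 2.]
```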


Effect of Learning Rate α

If α is too small, gradient descent can be slow.
If α is too large, gradient descent can overshoot the minimum: it may fail to converge, or even diverge.
For a fixed α, gradient descent can converge to a local minimum.

How to make sure gradient descent is working correctly?
If gradient descent is working properly, then J(Θ) should decrease after every iteration.

How to choose α?
If J(Θ) increases during gradient descent, use a smaller α.


Preprocessing Features

Feature Scaling
Idea: make sure features are on a similar scale.
Feature scaling eliminates the need for extra gradient-descent steps to reach the optimum.

For example, let
    x1 = size of the house (0 - 2000 feet²) and x2 = number of bedrooms (1 - 5).
Then scale
    x1 = (size of house) / 2000 and x2 = (number of bedrooms) / 5.

Mean Normalization (a feature scaling technique)
Replace xi with xi − µi, where µi is the mean of feature i (do not apply this to x0).
Ex. x1 = (size of house − 1000) / 2000 and x2 = (number of bedrooms − 2) / 5.
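A small sketch of mean normalization on a feature matrix (our own helper and toy data, not from the slides; here each column is also divided by its range, as in the house example):

```python
import numpy as np

def mean_normalize(X):
    """Replace each feature column x_i with (x_i - mean_i) / range_i."""
    mu = X.mean(axis=0)
    feature_range = X.max(axis=0) - X.min(axis=0)
    return (X - mu) / feature_range

# Columns: size of the house in feet^2, number of bedrooms
X = np.array([[1800.0, 4.0],
              [1200.0, 3.0],
              [ 600.0, 2.0]])
print(mean_normalize(X))   # both columns now lie on a similar, roughly unit scale
```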


Revisit Supervised Learning

Regression Problem
Classification Problem



Classification Problem

Binary classification problem: y ∈ {0, 1}, where 0 is the negative class and 1 is the positive class.
Multiclass classification problem: y ∈ {0, 1, 2, 3}, etc.
Examples: Email spam or not spam? Tumor malignant or benign?

Linear regression may not work for classification and can label examples wrongly.

[Figures: two panels of malignant (y = 1) vs. benign (y = 0) plotted against tumor size, each with a 0.5 threshold on a linear-regression fit]


Logistic Regression

Hypothesis Representation

Required: 0 ≤ hθ(x) ≤ 1.

g(z): the sigmoid or logistic function

    $g(z) = \frac{1}{1 + e^{-z}}$

    $h_\theta(x) = g(\Theta^T x) = \frac{1}{1 + e^{-\Theta^T x}}$

[Figure: the sigmoid/logistic function g(z), rising from 0 to 1 and crossing 0.5 at z = 0]

Interpretation of the Hypothesis

hθ(x) = estimated probability that y = 1 on input x, i.e., hθ(x) = P(y = 1 | x; θ).

Example: x = (x0, x1)ᵀ = (1, Tumor Size)ᵀ and hθ(x) = 0.7
⟹ a 70% chance of the tumor being malignant.
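A minimal sketch of the sigmoid hypothesis (the names and the parameter values are our own, made up for illustration):

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def h(Theta, x):
    """Logistic-regression hypothesis: estimated P(y = 1 | x; theta)."""
    return sigmoid(Theta @ x)

# Hypothetical parameters and a single example x = (1, tumor_size)
Theta = np.array([-3.0, 0.5])
x = np.array([1.0, 7.0])
print(h(Theta, x))   # ~0.62 here; a value of 0.7 would mean a 70% chance that y = 1
```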
Logistic Regression: Decision Boundary

hθ(x) = g(ΘᵀX). Predict
    y = 1 if hθ(x) ≥ 0.5
    y = 0 if hθ(x) < 0.5

Since g(ΘᵀX) ≥ 0.5 exactly when ΘᵀX ≥ 0:
    predict y = 1 if ΘᵀX ≥ 0, and y = 0 if ΘᵀX < 0.

Example: hθ(x) = g(θ0 + θ1x1 + θ2x2) with θ0 = −3, θ1 = 1, θ2 = 1. Predict
    y = 1 if −3 + x1 + x2 ≥ 0, i.e., x1 + x2 ≥ 3
    y = 0 if x1 + x2 < 3
The line x1 + x2 = 3 is the decision boundary.

[Figure: tumor malignant or benign? Age vs. Tumor Size, with the decision boundary x1 + x2 = 3 separating the region predicted y = 1 from the region predicted y = 0]
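A small sketch of this decision rule using the slide's example parameters θ0 = −3, θ1 = 1, θ2 = 1 (the helper names and test points are our own):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(Theta, x):
    """Return 1 if h_theta(x) >= 0.5, equivalently if Theta^T x >= 0, else 0."""
    return int(sigmoid(Theta @ x) >= 0.5)

Theta = np.array([-3.0, 1.0, 1.0])                  # theta0, theta1, theta2 from the slide
print(predict(Theta, np.array([1.0, 1.0, 1.0])))    # x1 + x2 = 2 < 3  -> predicts 0
print(predict(Theta, np.array([1.0, 2.0, 2.0])))    # x1 + x2 = 4 >= 3 -> predicts 1
```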
Logistic Regression: Cost Function

Given m training examples (x^(1), y^(1)), (x^(2), y^(2)), ..., (x^(m), y^(m)) with n features,

    $x = (x_0, x_1, \ldots, x_n)^T \in \mathbb{R}^{n+1}$, where x0 = 1 and y ∈ {0, 1}.

Hypothesis:
    $h_\theta(x) = \frac{1}{1 + e^{-\Theta^T x}}$

Recall the cost function used for linear regression:

    $J(\Theta) = \frac{1}{m}\sum_{i=1}^{m} \frac{1}{2}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$

    i.e., $\mathrm{Cost}\left(h_\theta(x^{(i)}), y^{(i)}\right) = \frac{1}{2}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$
Logistic Regression: Cost Function
Binary Classification Problem

Cost function for logistic regression:

    $\mathrm{Cost}(h_\theta(x), y) = \begin{cases} -\log(h_\theta(x)) & \text{if } y = 1 \\ -\log(1 - h_\theta(x)) & \text{if } y = 0 \end{cases}$

    $J(\Theta) = \frac{1}{m}\sum_{i=1}^{m} \mathrm{Cost}\left(h_\theta(x^{(i)}), y^{(i)}\right)$

    $J(\Theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right)\right]$

To fit the parameters Θ: $\min_{\Theta} J(\Theta)$
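A minimal sketch of this cost in NumPy (our own names and toy data; no numerical-stability tricks):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost(Theta, X, y):
    """J(Theta) = -1/m * sum[ y*log(h) + (1-y)*log(1-h) ]."""
    m = len(y)
    h = sigmoid(X @ Theta)
    return -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m

# Toy data: x0 = 1 column plus one feature, labels in {0, 1}
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 4.0], [1.0, 5.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
print(logistic_cost(np.zeros(2), X, y))   # log(2) ~= 0.693 at Theta = 0
```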


Gradient Descent for Logistic Regression

    $\frac{\partial}{\partial \theta_j} J(\Theta) = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)}$

Repeat until convergence {
    $\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\Theta)$   (simultaneously update for every j = 0, 1, ..., n)
}
where α is the Learning Rate.

Faster ways to optimize: conjugate gradient, BFGS, and L-BFGS.
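As one illustration of the "faster ways to optimize" remark, SciPy's general-purpose minimizer can fit the same cost with BFGS; the cost and gradient functions and the toy data below are our own sketches, not from the slides.

```python
import numpy as np
from scipy.optimize import minimize

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(Theta, X, y):
    h = sigmoid(X @ Theta)
    return -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / len(y)

def grad(Theta, X, y):
    return X.T @ (sigmoid(X @ Theta) - y) / len(y)

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0],
              [1.0, 4.0], [1.0, 5.0], [1.0, 6.0]])
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0])
result = minimize(cost, x0=np.zeros(2), args=(X, y), jac=grad, method="BFGS")
print(result.x)   # fitted Theta; BFGS picks its own step sizes, so there is no alpha to tune
```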


Logistic Regression: Multiclass Classification

One-vs-all: for each class i = 1, 2, 3, train a separate logistic-regression classifier
    hθ^(i)(x) = P(y = i | x; θ)
to predict the probability that y = i (treating class i as the positive class).

On a new input x, to make a prediction, pick the class i that maximizes hθ^(i)(x).

[Figure: a three-class data set in the (x1, x2) plane, with the three classifiers hθ^(1)(x), hθ^(2)(x), hθ^(3)(x), each separating one class from the rest]
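A small sketch of the prediction step, assuming we already have one fitted parameter vector per class; the parameter values below are made up purely for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_class(Thetas, x):
    """One-vs-all prediction: pick the class whose classifier gives the largest h_theta(x)."""
    scores = [sigmoid(Theta @ x) for Theta in Thetas]
    return int(np.argmax(scores)) + 1      # classes numbered 1, 2, 3 as on the slide

# Hypothetical parameters for three classifiers over x = (1, x1, x2)
Thetas = [np.array([-1.0,  2.0, -1.0]),
          np.array([ 0.5, -1.0,  1.0]),
          np.array([-2.0,  0.5,  0.5])]
x = np.array([1.0, 0.2, 1.5])
print(predict_class(Thetas, x))   # the class with the highest estimated probability
```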


Regularization: Overfitting Problem

[Figure illustrating the overfitting problem]


Regularization: Solving the Overfitting Problem

Ways to address overfitting:
Reduce the number of features.
Regularization: keep all the features but reduce the magnitude of the parameters θj.

For example, penalize θ3 and θ4 heavily:

    $\min_{\theta}\; \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + 1000\,\theta_3^2 + 1000\,\theta_4^2$


Regularization for Linear Regression

    $J(\Theta) = \frac{1}{2m}\left[\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \lambda \sum_{j=1}^{n}\theta_j^2\right]$

where λ is the regularization parameter.

Gradient Descent With Regularization

Repeat until convergence {
    $\theta_0 := \theta_0 - \alpha \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right) x_0^{(i)}$
    $\theta_j := \theta_j - \alpha \left(\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)} + \frac{\lambda}{m}\theta_j\right)$   for j ∈ {1, 2, ..., n}
}
where α is the Learning Rate and λ is the regularization parameter.
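A vectorized sketch of the regularized update (our own code and toy data; note that θ0 is deliberately left out of the penalty term):

```python
import numpy as np

def regularized_gradient_descent(X, y, lam, alpha, iterations):
    """Linear regression with L2 regularization; theta_0 (the bias) is not penalized."""
    m, n_plus_1 = X.shape
    Theta = np.zeros(n_plus_1)
    for _ in range(iterations):
        errors = X @ Theta - y
        gradient = (X.T @ errors) / m
        penalty = (lam / m) * Theta
        penalty[0] = 0.0                 # do not regularize theta_0
        Theta -= alpha * (gradient + penalty)
    return Theta

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([2.0, 4.0, 6.0])
print(regularized_gradient_descent(X, y, lam=1.0, alpha=0.1, iterations=2000))
# roughly [1.33, 1.33] -- theta_1 is shrunk from the unregularized value of 2
```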
Regularization for Logistic Regression

    $J(\Theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right)\right] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$

where λ is the regularization parameter.

Gradient Descent With Regularization

Repeat until convergence {
    $\theta_0 := \theta_0 - \alpha \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right) x_0^{(i)}$
    $\theta_j := \theta_j - \alpha \left(\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)} + \frac{\lambda}{m}\theta_j\right)$   for j ∈ {1, 2, ..., n}
}
where α is the Learning Rate and λ is the regularization parameter.


References

Book 1: Ethem Alpaydin, "Introduction to Machine Learning", 3rd Edition.
Book 2: Tom M. Mitchell, "Machine Learning", McGraw-Hill.
Online: Machine Learning course by Stanford University.
