
Module 3

Introduction to Machine Learning - 4CS1201
Syllabus - MODULE 3
Supervised Learning
• A type of machine learning in which machines are trained using well-"labelled" training data and, on the basis of that data, predict the output.
• Labelled data means some input data is already tagged with the correct output.
• Supervised learning is the process of providing input data as well as the correct output data to the machine learning model.
• The aim of a supervised learning algorithm is to find a mapping function that maps the input variable (x) to the output variable (y). (A minimal sketch follows this list.)
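As a sketch of the fit-then-predict workflow, the example below trains a model on labelled pairs and predicts the label of an unseen input. It assumes scikit-learn is installed; the toy data (hours studied vs. pass/fail) is hypothetical.

# Minimal supervised learning sketch: learn a mapping f(x) ≈ y from labelled data.
from sklearn.linear_model import LogisticRegression

# Toy labelled data (hypothetical): x = hours studied, y = pass (1) / fail (0).
X = [[1], [2], [3], [4], [5], [6]]
y = [0, 0, 0, 1, 1, 1]

model = LogisticRegression()
model.fit(X, y)                  # learn the mapping from x to y
print(model.predict([[3.5]]))    # predict the label for an unseen input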
How Supervised Learning Works

(Figure: Supervised Machine Learning - Javatpoint)


Types of Supervised Learning
1. Classification
• Used when the output variable is categorical, i.e. the data falls into discrete classes such as Yes/No, Male/Female, True/False, etc.
• Algorithms:
• Random Forest
• Decision Trees
• Logistic Regression
• Support Vector Machines
Types of Supervised Learning
2. Regression
• Used when there is a relationship between the input variable and the output variable and the output is a continuous value.
• It is used for the prediction of continuous variables, such as weather forecasting, market trends, etc.
• Algorithms:
• Linear Regression
• Regression Trees
• Non-Linear Regression
• Bayesian Linear Regression
• Polynomial Regression
Types of Supervised Learning - Difference

Regression:
• Output - continuous or real value
• A best-fit line is found
• Examples - weather prediction, house price prediction
• Problems - linear and non-linear

Classification:
• Output - discrete value
• A classification boundary is found
• Examples - spam emails, identification of cancerous cells
• Problems - binary or multi-class
Advantages and Disadvantages of Supervised Learning
1. K-Nearest Neighbor
• One of the simplest supervised machine learning algorithms.
• Assumes similarity between the new case/data and the available cases, and puts the new case into the category most similar to the available categories.
• Used for regression as well as classification, but mostly used for classification problems.
• A non-parametric algorithm, which means it makes no assumptions about the underlying data.
• A lazy learner algorithm: it does not learn from the training set immediately; instead it stores the dataset and, at the time of classification, performs an action on the dataset.
1. K-Nearest Neighbor
• Distance Metric Used in K-NN
Euclidean Distance
This is the Cartesian distance between two points in the plane/hyperplane. Euclidean distance can also be visualized as the length of the straight line joining the two points under consideration. This metric measures the net displacement between two states of an object.

Manhattan Distance
The Manhattan distance metric is generally used when we are interested in the total distance travelled by the object rather than its displacement. It is calculated by summing the absolute differences between the coordinates of the points in n dimensions. (Both metrics are sketched in code below.)
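A small sketch of the two metrics just described, assuming NumPy is available:

import numpy as np

def euclidean_distance(p, q):
    # Straight-line (Cartesian) distance: sqrt of the summed squared differences.
    return np.sqrt(np.sum((np.asarray(p) - np.asarray(q)) ** 2))

def manhattan_distance(p, q):
    # Total distance travelled along the axes: summed absolute differences.
    return np.sum(np.abs(np.asarray(p) - np.asarray(q)))

print(euclidean_distance([0, 0], [3, 4]))   # 5.0
print(manhattan_distance([0, 0], [3, 4]))   # 7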
1. K-Nearest Neighbor
• Distance Metric Used in K-NN
Understanding data points and Euclidean distance
Let X be the training dataset with n data points, where each data point is represented by a d-dimensional feature vector, and let Y be the corresponding labels or values for each data point in X. Given a new data point x, the algorithm calculates the distance between x and each data point xi in X using the Euclidean distance:

d(x, xi) = sqrt( (x1 - xi1)^2 + (x2 - xi2)^2 + ... + (xd - xid)^2 )
1. K-Nearest Neighbor
• How to select the value of K in the K-NN Algorithm?

• If the input data has more outliers or noise, a higher value of k tends to work better.
• It is recommended to choose an odd value of k to avoid ties in binary classification. (A small selection sketch follows.)
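One common way to pick k in practice is to compare cross-validated accuracy over a range of odd values. A hedged sketch, assuming scikit-learn and its built-in iris dataset:

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Evaluate odd k values and keep the one with the best mean CV accuracy.
scores = {}
for k in range(1, 20, 2):
    knn = KNeighborsClassifier(n_neighbors=k)
    scores[k] = cross_val_score(knn, X, y, cv=5).mean()

best_k = max(scores, key=scores.get)
print(best_k, scores[best_k])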
1. K-Nearest Neighbor - Solved Examples
1. Solved Numerical Example of KNN Classifier to classify New Instance IRIS Example by Mahesh Huddar - YouTube
2. Solved Example KNN Classifier to classify New Instance Height and Weight Example by Mahesh Huddar (youtube.com)
3. K nearest Neighbor Learning Algorithm Lazy Learner Solved Example Dr. Mahesh Huddar (youtube.com)
4. Solved Example K Nearest Neighbors Algorithm Weighted KNN to classify New Instance by Mahesh Huddar (youtube.com)
5. KNN Algorithm In Machine Learning | KNN Algorithm Using Python | K Nearest Neighbor | Simplilearn (youtube.com)
Support Vector Machines (SVM)
• Used for classification as well as regression problems; however, it is primarily used for classification problems in machine learning.
• The goal of the SVM algorithm is to create the best line or decision boundary that can segregate n-dimensional space into classes, so that new data points can easily be put in the correct category in the future. This best decision boundary is called a hyperplane.
• SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme cases are called support vectors, and hence the algorithm is termed a Support Vector Machine.
Support Vector Machines (SVM)
• Hyperplane: There can be multiple lines/decision boundaries to segregate the classes in n-dimensional space, but we need to find the best decision boundary that helps to classify the data points. This best boundary is known as the hyperplane of the SVM.
• The dimension of the hyperplane depends on the number of features in the dataset: with 2 features the hyperplane is a straight line, and with 3 features it is a 2-dimensional plane.
• We always create the hyperplane with maximum margin, i.e. the maximum distance between the hyperplane and the nearest data points.
Support Vector Machines (SVM)
• Support Vectors: The data points or vectors that are closest to the hyperplane and that affect its position are termed support vectors. Since these vectors support the hyperplane, they are called support vectors.
Types of SVM
• 2 types:
• Linear SVM: used for linearly separable data, i.e. if a dataset can be classified into two classes using a single straight line, the data is termed linearly separable and the classifier used is called a Linear SVM classifier.
• Non-linear SVM: used for non-linearly separable data, i.e. if a dataset cannot be classified using a straight line, the data is termed non-linear and the classifier used is called a Non-linear SVM classifier.
Support Vector Machines (SVM)
1. Linear SVM
The equation of a hyperplane is w . x + b = 0, where w is a vector normal to the hyperplane and b is an offset. To classify a point as negative or positive we need to define a decision rule. Since the projection of any vector onto another vector is a dot product, we can define the decision rule as:

if w . x + b >= 0, classify as positive; otherwise classify as negative.

(See the sketch below.)
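A minimal sketch of this decision rule, assuming scikit-learn; the two toy clusters are hypothetical. After fitting a linear SVM, the sign of w . x + b classifies a new point:

import numpy as np
from sklearn.svm import SVC

# Two linearly separable clusters (hypothetical data).
X = np.array([[1, 1], [2, 1], [1, 2], [5, 5], [6, 5], [5, 6]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear")
clf.fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]
x_new = np.array([4, 4])
print("w.x + b =", w @ x_new + b)        # the sign gives the predicted class
print("support vectors:\n", clf.support_vectors_)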
Support Vector Machines (SVM)
2. Non-Linear SVM
The classes cannot be separated by a straight line, so to separate these data points we need to add one more dimension. For linear data we have used two dimensions, x and y, so for non-linear data we add a third dimension z, which can be calculated as:

z = x^2 + y^2

(Sketched below.)
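A sketch of this idea, assuming scikit-learn and NumPy; the ring-shaped toy data is hypothetical. Appending z = x^2 + y^2 makes circularly arranged classes linearly separable; in practice a non-linear kernel (e.g. RBF) achieves the same effect implicitly:

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Inner circle (class 0) and outer ring (class 1): not linearly separable in 2-D.
angles = rng.uniform(0, 2 * np.pi, 100)
r = np.concatenate([rng.uniform(0, 1, 50), rng.uniform(2, 3, 50)])
X = np.column_stack([r * np.cos(angles), r * np.sin(angles)])
y = np.array([0] * 50 + [1] * 50)

# Explicit feature map: append z = x^2 + y^2, then a linear SVM suffices.
Z = np.column_stack([X, X[:, 0] ** 2 + X[:, 1] ** 2])
print(SVC(kernel="linear").fit(Z, y).score(Z, y))   # ~1.0

# The same idea without the explicit map: a non-linear (RBF) kernel.
print(SVC(kernel="rbf").fit(X, y).score(X, y))      # ~1.0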
Support Vector Machines (SVM) - Solved Examples
1. Solved Support Vector Machine | Linear SVM Example by Mahesh Huddar (youtube.com)
2. How to draw a hyper plane in Support Vector Machine | Linear SVM - Solved Example by Mahesh Huddar (youtube.com)
3. Support Vector Machine (SVM) Algorithm - Javatpoint
Decision Tree Algorithm
• A supervised learning technique that can be used for both classification and regression problems, but is mostly preferred for solving classification problems.
• It is a tree-structured classifier, where internal nodes represent the features of a dataset, branches represent the decision rules, and each leaf node represents the outcome.
• In a decision tree there are two types of nodes: decision nodes and leaf nodes. Decision nodes are used to make decisions and have multiple branches, whereas leaf nodes are the outputs of those decisions and do not contain any further branches.
Decision Tree Algorithm
• The decisions or tests are performed on the basis of features of the given dataset.
• It is a graphical representation for getting all the possible solutions to a problem/decision based on given conditions.
• It is called a decision tree because, similar to a tree, it starts with a root node, which expands into further branches and constructs a tree-like structure.
• In order to build a tree, we use the CART algorithm, which stands for Classification and Regression Tree algorithm.
• A decision tree simply asks a question and, based on the answer (Yes/No), further splits the tree into subtrees.
Decision Tree Algorithm
• Need
• Terminologies
• Working
Decision Tree Algorithm
• Attribute Selection Measures (ASM)
• The main issue while implementing a decision tree is how to select the best attribute for the root node and for sub-nodes. To solve such problems, there is a technique called the Attribute Selection Measure (ASM).
• 2 techniques: Information Gain and Gini Index (the Gini index is used in the example below)
Example of Decision Tree
CART Decision Tree Example on the same dataset
Gini index:
Gini = 1 - Σ (Pi)^2, for i = 1 to the number of classes

Outlook    Yes  No  Number of instances
Sunny       2    3   5
Overcast    4    0   4
Rain        3    2   5

Gini(Outlook=Sunny) = 1 - (2/5)^2 - (3/5)^2 = 1 - 0.16 - 0.36 = 0.48
Gini(Outlook=Overcast) = 1 - (4/4)^2 - (0/4)^2 = 0
Gini(Outlook=Rain) = 1 - (3/5)^2 - (2/5)^2 = 1 - 0.36 - 0.16 = 0.48

Then the weighted sum of Gini indexes for the Outlook feature:
Gini(Outlook) = (5/14) x 0.48 + (4/14) x 0 + (5/14) x 0.48 = 0.171 + 0 + 0.171 = 0.342
Similarly:
Gini(Temp) = (4/14) x 0.5 + (4/14) x 0.375 + (6/14) x 0.445 = 0.142 + 0.107 + 0.190 = 0.439
Gini(Humidity) = (7/14) x 0.489 + (7/14) x 0.244 = 0.367
Gini(Wind) = (8/14) x 0.375 + (6/14) x 0.5 = 0.428

Gini(Outlook) is therefore the lowest and thus becomes the root node. The process is then repeated for the subsets: we apply the same principles to the sub-datasets.
Final form of the decision tree built by the CART algorithm (figure). (A sketch of the weighted Gini computation follows.)
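A short sketch reproducing the weighted Gini computation above for the Outlook feature; the per-value (yes, no) counts are taken from the table:

def gini(yes, no):
    # Gini = 1 - Σ (Pi)^2 over the two classes.
    total = yes + no
    return 1 - (yes / total) ** 2 - (no / total) ** 2

# (yes, no) counts per Outlook value, from the table above.
counts = {"Sunny": (2, 3), "Overcast": (4, 0), "Rain": (3, 2)}
n = sum(yes + no for yes, no in counts.values())   # 14 instances in total

# Weighted sum of the per-value Gini indexes.
weighted = sum((yes + no) / n * gini(yes, no) for yes, no in counts.values())
print(round(weighted, 3))   # 0.343, i.e. the slide's Gini(Outlook) ≈ 0.342 (rounding)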
Decision Tree Algorithm - Solved Examples
1. Decision Tree Solved Play Tennis Example Big Data Analytics CART Algorithm by Mahesh Huddar (youtube.com)
2. Decision Tree Solved Numerical Example Big Data Analytics CART Algorithm by Mahesh Huddar (youtube.com)
3. Decision Tree Solved Numerical Example Big Data Analytics ML CART Algorithm by Mahesh Huddar (youtube.com)
Random Forest Classifier
• Used for both classification and regression problems in ML.
• Based on the concept of ensemble learning, which is the process of combining multiple classifiers to solve a complex problem and to improve the performance of the model.
• "Random Forest is a classifier that contains a number of decision trees on various subsets of the given dataset and takes the average to improve the predictive accuracy of that dataset."
• Instead of relying on one decision tree, the random forest takes the prediction from each tree and, based on the majority vote of the predictions, predicts the final output.
• A greater number of trees in the forest leads to higher accuracy and helps prevent the problem of overfitting. (A minimal sketch follows this list.)
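A hedged sketch of the majority-vote idea, assuming scikit-learn and its built-in iris dataset:

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# n_estimators is the number of trees in the forest; each tree is trained on
# a random bootstrap subset, and their votes are combined for the prediction.
forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(X_train, y_train)
print(forest.score(X_test, y_test))   # test-set accuracy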
Random Forest Classifier
• Need
• Working
Evaluating classification model performance
Confusion Matrix
• A confusion matrix is a table that is often used to describe the performance of a classification model on a set of test data for which the true values are known.
Evaluating classification model performance
Confusion Matrix - Need
Calculations using Confusion Matrix (a short sketch follows)
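A sketch of the calculations typically derived from a confusion matrix (accuracy, precision, recall, F1 score), assuming scikit-learn; the y_true/y_pred labels are hypothetical:

from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score, f1_score)

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]   # known true labels (hypothetical)
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]   # model predictions (hypothetical)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")

print("accuracy :", accuracy_score(y_true, y_pred))    # (TP+TN)/total
print("precision:", precision_score(y_true, y_pred))   # TP/(TP+FP)
print("recall   :", recall_score(y_true, y_pred))      # TP/(TP+FN)
print("f1 score :", f1_score(y_true, y_pred))          # harmonic mean of P and R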
Evaluating classification model performance
ROC and AUC
• The Receiver Operating Characteristic (ROC) curve is an evaluation metric for binary classification problems. It is a probability curve that plots the TPR against the FPR at various threshold values; in other words, it shows the performance of a classification model at all classification thresholds.
• The Area Under the Curve (AUC) measures the ability of a binary classifier to distinguish between classes and is used as a summary of the ROC curve.
• The higher the AUC, the better the model's performance at distinguishing between the positive and negative classes.
Evaluating classification model performance
ROC and AUC
• Sensitivity / True Positive Rate / Recall: TPR = TP / (TP + FN)
Sensitivity tells us what proportion of the positive class got correctly classified.

• False Negative Rate: FNR = FN / (TP + FN)
The FNR tells us what proportion of the positive class got incorrectly classified by the classifier.

A higher TPR and a lower FNR are desirable, since we want to classify the positive class correctly.

• Specificity / True Negative Rate: TNR = TN / (TN + FP)
Specificity tells us what proportion of the negative class got correctly classified.

• False Positive Rate: FPR = FP / (TN + FP)
The FPR tells us what proportion of the negative class got incorrectly classified by the classifier.

A higher TNR and a lower FPR are desirable, since we want to classify the negative class correctly.
Let's understand the ROC curve with an example. Suppose we need to distinguish between patients with a particular disease (say, phobic) and those who do not have the disease (non-phobic). The patient populations for these two states form two overlapping normal distributions. The curves overlap, as they almost always will in real life: some individuals with the disease have test scores or other characteristics similar to those without it.
Streiner (2007) examines the ROC with the problem of classifying patients into Phobic (disease-state positive category) and Non-Phobic (disease-state negative category) using a 10-point test score. Table 1 shows the test scores from 1-10 and a frequency table of the test results categorized by label.
To predict Phobic and Non-Phobic cases, we need to define a cutoff score. Let's do it for different cutoff scores.
Plotting the ROC Curve
• The lower left-hand corner of the curve shows the beginning of the classification process: no classifications are identified initially.
• Initially the cutoff is 9/10 (a cutoff of 10+ represents Phobic, and less than or equal to 9 is Non-Phobic; see Table 1). In this case the classification criterion is very strict and only strong TP cases are classified, but only a few examples in the sample meet this criterion (21 instances with score 10).
• Once we make the cutoff a little more relaxed, for example 8/9, we begin to see some FPs as well.
• The cutoff 7/8 is the one closest to the upper left corner, and hence it minimizes the overall classification error. At this point, TPR ≈ 0.8 and FPR ≈ 0.1. (A plotting sketch follows.)
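A sketch of plotting a ROC curve and computing AUC from predicted probabilities, assuming scikit-learn and matplotlib; the synthetic dataset is hypothetical:

import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = LogisticRegression().fit(X_tr, y_tr)
scores = model.predict_proba(X_te)[:, 1]          # P(class = 1)

fpr, tpr, thresholds = roc_curve(y_te, scores)    # TPR vs FPR per threshold
print("AUC =", roc_auc_score(y_te, scores))

plt.plot(fpr, tpr)
plt.plot([0, 1], [0, 1], linestyle="--")          # chance line
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.show()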
Evaluating classification model performance - Solved Example
• Confusion Matrix Solved Example Accuracy Precision Recall F1 Score Prevalence by Mahesh Huddar - YouTube
Thank you
