
Seminar Presentation
On
"Similarity-based Learning"

PRESENTED BY: SUDHARSHAN K V (1VI21EC161)
FACULTY: Prof. Lavanya
Instance-based Learning VS Model-based Learning

Instance-based Learning | Model-based Learning
No model is built with the training instances before it receives a test instance | Generalizes a model with the training instances before it receives a test instance
Predicts the class of the test instance directly from the training data | Predicts the class of the test instance from the model built
Slow in testing phase | Fast in testing phase
Learns by many local approximations | Learns by creating a global approximation

NEAREST-NEIGHBOR LEARNING
• A natural approach to similarity-based classification is k-Nearest-Neighbors (k-NN), a non-parametric method used for both classification and regression problems.
• It is a simple and powerful non-parametric algorithm that predicts the category of the test instance from the k training samples closest to it, assigning the test instance to the category that has the largest probability (the majority class among those neighbors).
• A visual representation of this learning is shown in the figure.
• The algorithm relies on the assumption that similar objects are close to each other in the feature space.
• k-NN performs instance-based learning: it simply stores the training instances and learns from them case by case.
• The model is also 'memory-based', as it uses the training data at the time when predictions need to be made.
• It is a lazy learning algorithm, since no prediction model is built beforehand from the training instances and classification happens only after the test instance is received.
NEAREST-NEIGHBOR LEARNING ALGORITHM

▶ Inputs: Training dataset T, distance metric d, test instance t, the number of nearest neighbors k
▶ Output: Predicted class or category
▶ Prediction: For test instance t,
1. For each instance i in T, compute the distance between the test instance t and instance i using a distance metric (e.g. Euclidean distance). [Continuous attributes - Euclidean distance between two points in the plane with coordinates (x1, y1) and (x2, y2) is given as distance((x1,y1),(x2,y2)) = sqrt((x2 - x1)^2 + (y2 - y1)^2). Categorical attributes (binary) - Hamming distance: if the values of the two instances are the same, the distance d is 0, otherwise d = 1.]
2. Sort the distances in ascending order and select the first k nearest training data instances to the test instance.
3. Predict the class of the test instance by majority voting (if the target attribute is discrete valued) or by the mean (if the target attribute is continuous valued) of the k selected nearest instances.
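As an illustration, here is a minimal Python sketch of these three steps; the function and variable names (euclidean, knn_predict, train) are chosen for illustration and are not part of the original slides. It assumes numeric attributes, Euclidean distance and majority voting.

```python
import math
from collections import Counter

def euclidean(a, b):
    # Step 1: Euclidean distance between two numeric instances
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train, test_instance, k=3):
    """train is a list of (attribute_tuple, label) pairs."""
    # Step 1: distance from the test instance to every training instance
    distances = [(euclidean(attrs, test_instance), label) for attrs, label in train]
    # Step 2: sort in ascending order and keep the k nearest
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Step 3: majority vote among the k nearest neighbors
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

For a regression problem (continuous-valued target), the majority vote in step 3 would simply be replaced by the mean of the k neighbors' target values.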
Consider the student performance training dataset of 8 data instances shown in the table below, which describes the performance of individual students in a course along with the CGPA they obtained in previous semesters. The independent attributes are CGPA, Assessment and Project Submitted. The target variable is 'Result', a discrete-valued variable that takes the two values 'Pass' and 'Fail'. Based on the performance of a student, classify whether that student will pass or fail the course.

Given test instance: (CGPA = 6.1, Assessment = 40, Project Submitted = 5)

S.No.  CGPA  Assessment  Project Submitted  Result
1      9.2   85          8                  Pass
2      8.0   80          7                  Pass
3      8.5   81          8                  Pass
4      6.0   45          5                  Fail
5      6.5   50          4                  Fail
6      8.2   72          7                  Pass
7      5.8   38          5                  Fail
8      8.9   91          9                  Pass
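A short, self-contained Python sketch of how the prediction for this test instance could be computed is given below (variable names are illustrative); with k = 3, the three nearest training instances to (6.1, 40, 5) are rows 7, 4 and 5, all labelled 'Fail', so the majority vote predicts 'Fail'.

```python
import math

# Student performance dataset from the table: (CGPA, Assessment, Project Submitted) -> Result
train = [
    ((9.2, 85, 8), "Pass"),
    ((8.0, 80, 7), "Pass"),
    ((8.5, 81, 8), "Pass"),
    ((6.0, 45, 5), "Fail"),
    ((6.5, 50, 4), "Fail"),
    ((8.2, 72, 7), "Pass"),
    ((5.8, 38, 5), "Fail"),
    ((8.9, 91, 9), "Pass"),
]
test = (6.1, 40, 5)

# Steps 1-2: Euclidean distance to every training instance, sorted ascending, keep k nearest
k = 3
nearest = sorted((math.dist(attrs, test), label) for attrs, label in train)[:k]
print(nearest)  # the three smallest distances, all labelled 'Fail'

# Step 3: majority vote among the k nearest neighbors
labels = [label for _, label in nearest]
print(max(set(labels), key=labels.count))  # -> 'Fail'
```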
WEIGHTED K-NEAREST-NEIGHBOR
• The Weighted k-NN is an extension of k-NN.
• It chooses the neighbors by using a weighted distance.
• The k-Nearest Neighbor (k-NN) algorithm has some serious limitations, as its performance depends entirely on the choice of the k nearest neighbors, the distance metric used and the decision rule.
• The principal idea of Weighted k-NN is that, among the k closest neighbors of the test instance, those nearer to it are assigned a higher weight in the decision than neighbors that are farther away from the test instance.
• The idea is that weights are inversely proportional to distances.
• For example, a plain k-NN classifier may declare a query point to belong to class 0 even though, in a plot of the data, the point is clearly closer to the class 1 points than to the class 0 points.
• To overcome this disadvantage, weighted k-NN is used. In weighted k-NN, the nearest k points are given a weight using a function called the kernel function.
• The intuition behind weighted k-NN is to give more weight to the points which are nearby and less weight to the points which are farther away.
WEIGHTED K-NEAREST-NEIGHBOR ALGORITHM

▶ Inputs: Training dataset T, distance metric d, test instance t, the number of nearest neighbors k
▶ Output: Predicted class or category
▶ Prediction: For test instance t,
1. For each instance 'i' in the training dataset T, compute the distance between the test instance t and instance 'i' using a distance metric (e.g. Euclidean distance). [Continuous attributes - Euclidean distance between two points in the plane with coordinates (x1, y1) and (x2, y2) is given as distance((x1,y1),(x2,y2)) = sqrt((x2 - x1)^2 + (y2 - y1)^2). Categorical attributes (binary) - Hamming distance: if the values of the two instances are the same, the distance d is 0, otherwise d = 1.]
2. Sort the distances in ascending order and select the first 'k' nearest training data instances to the test instance.
3. Predict the class of the test instance by a weighted voting technique over the k selected nearest instances: compute the inverse of each of the 'k' selected distances, find the sum of these inverses, and compute each weight by dividing the corresponding inverse distance by that sum (each weight is a vote for its associated class), as shown in the sketch below.
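A minimal Python sketch of this inverse-distance weighting is given below; the function name weighted_knn_predict and the eps parameter (which guards against division by zero when a distance is exactly 0) are illustrative assumptions, not part of the original slides.

```python
import math
from collections import defaultdict

def weighted_knn_predict(train, test_instance, k=3, eps=1e-9):
    """train is a list of (attribute_tuple, label) pairs."""
    # Steps 1-2: compute all distances and keep the k nearest neighbors
    nearest = sorted(
        (math.dist(attrs, test_instance), label) for attrs, label in train
    )[:k]
    # Step 3: weight each neighbor by its inverse distance, normalized so the weights sum to 1
    inverses = [1.0 / (d + eps) for d, _ in nearest]
    total = sum(inverses)
    votes = defaultdict(float)
    for inv, (_, label) in zip(inverses, nearest):
        votes[label] += inv / total  # each weight is a vote for its associated class
    # The class with the largest accumulated weight wins
    return max(votes, key=votes.get)
```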
THANK YOU
