
ShortCourse-QTT-Lecture1


Unsupervised Learning

Lecture 1: Introduction and Evaluation

Tho Quan
qttho@hcmut.edu.vn
Agenda

• Introduction to supervised learning
• A case study with a simple classification approach
• Evaluating the classification
• Making it practical: an upgraded version
Supervised vs. Unsupervised Learning
Classification: Overview

• Classification: training data → classification model → make predictions on unseen data

• Examples:
• Play golf
• Neural Network
Process 1: Model Construction

Training Data → Classification Algorithms → Classifier (Model)

NAME  RANK            YEARS  TENURED
Mike  Assistant Prof  3      no
Mary  Assistant Prof  7      yes
Bill  Professor       2      yes
Jim   Associate Prof  7      yes
Dave  Assistant Prof  6      no
Anne  Associate Prof  3      no

Learned classifier (model):
IF rank = ‘professor’ OR years > 6
THEN tenured = ‘yes’
Process 2: Using the Model in Prediction

Classifier (model) applied to the testing data, then to unseen data.

Testing data:
NAME     RANK            YEARS  TENURED
Tom      Assistant Prof  2      no
Merlisa  Associate Prof  7      no
George   Professor       5      yes
Joseph   Assistant Prof  7      yes

Unseen data: (Jeff, Professor, 4) → Tenured?
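
As a small illustrative sketch (not from the slides), the rule induced in Process 1 can be written as a Python function and applied to the unseen record (Jeff, Professor, 4); the function name is my own:

```python
# Sketch: the rule learned in Process 1,
# IF rank = 'professor' OR years > 6 THEN tenured = 'yes',
# applied to the unseen record (Jeff, Professor, 4).

def predict_tenured(rank: str, years: int) -> str:
    """Apply the induced classification rule to one record."""
    return "yes" if rank.lower() == "professor" or years > 6 else "no"

print(predict_tenured("Professor", 4))   # Jeff -> 'yes'
```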
Classification Algorithms

• Classification Algorithms:
• Support Vector Machines
• Neural Network (multi-layer perceptron)
• Decision Tree
• K-Nearest Neighbor
• Naive Bayes Classifier…
K-Nearest Neighbor

• Classifies objects based on the closest training examples in the feature space
• The target function is only approximated locally
• All computation is deferred until classification
• An object is classified by a majority vote of its neighbors
Data

• The training examples are vectors in a multidimensional feature space, each with a class label
• The training phase of the algorithm consists only of storing the feature vectors and class labels of the training samples
Algorithm
- k is a user-defined constant
- Find the k nearest neighbors of the query point
- The predicted label is the label that is most frequent among these k nearest training samples
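
A minimal sketch of the algorithm above in Python (function and variable names are illustrative, not from the slides), using Euclidean distance and a majority vote:

```python
# Minimal k-NN sketch: store the training vectors and labels,
# then classify a query by a majority vote among its k nearest neighbors.
import math
from collections import Counter

def knn_classify(train_X, train_y, query, k=3):
    # Euclidean distance from the query to every training vector.
    dists = [math.dist(query, x) for x in train_X]
    # Indices of the k smallest distances.
    nearest = sorted(range(len(train_X)), key=lambda i: dists[i])[:k]
    # Majority vote among the k nearest labels.
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

# Tiny usage example with made-up 2-D points.
X = [(1, 1), (1, 2), (5, 5), (6, 5)]
y = ["A", "A", "B", "B"]
print(knn_classify(X, y, (2, 1), k=3))   # -> 'A'
```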
Similarity and Representation

• How do we define similarity?
• How do we represent the objects whose similarities we wish to measure?
Motivation for the VSM

• We return to the case study of document classification
• The Vector Space Model (VSM) is an algebraic model for representing text documents as vectors of index terms
• A document is represented as a vector; each dimension corresponds to a separate term, and its value is a tf-idf weight
Document Collection
• A collection of n documents can be represented in the
vector space model by a term-document matrix.
• An entry in the matrix corresponds to the “weight” of a term in the document; zero means the term has no significance in the document or simply does not occur in it.

      T1   T2   …   Tt
D1    w11  w21  …   wt1
D2    w12  w22  …   wt2
:     :    :        :
Dn    w1n  w2n  …   wtn
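
A short sketch (my own toy documents, not from the slides) of building such a term-document matrix of tf-idf weights with scikit-learn:

```python
# Sketch: build a term-document matrix of tf-idf weights with scikit-learn.
# The documents below are made up for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs",
]
vectorizer = TfidfVectorizer()
A = vectorizer.fit_transform(docs)          # rows = documents, columns = terms
print(vectorizer.get_feature_names_out())   # the index terms
print(A.toarray())                          # tf-idf weights; 0 = term absent
```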
Measuring Distance

• Euclidean Distance
• Manhattan Distance
• Cosine Similarity
• Vector Length & Normalization
Euclidean Distance

Example: the distance from u = (0, 0) to v = (2, 1) follows the Pythagorean theorem:
a^2 + b^2 = c^2, so 2^2 + 1^2 = c^2 and c = \sqrt{5}.
Euclidean Distance

In p dimensions, let the Euclidean distance between two points u and v be:

dist(u, v) = \sqrt{\sum_{i=1}^{p} (u_i - v_i)^2}
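
The formula in code, as a small sketch (not from the slides):

```python
# Euclidean distance in p dimensions: square root of the sum of squared differences.
import math

def euclidean(u, v):
    return math.sqrt(sum((ui - vi) ** 2 for ui, vi in zip(u, v)))

print(euclidean((0, 0), (2, 1)))   # sqrt(5) ≈ 2.236
```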
Manhattan Distance

• The Manhattan distance (a.k.a. city-block distance) is the number of units on a rectangular grid it takes to travel from point u to point v.

D_M(u, v) = \sum_{i=1}^{p} |u_i - v_i|

Example: from u = (0, 0) to v = (2, 1), 2 units horizontally + 1 unit vertically = 3 units.
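
And the same for the Manhattan distance, as a small sketch:

```python
# Manhattan (city-block) distance: sum of absolute coordinate differences.
def manhattan(u, v):
    return sum(abs(ui - vi) for ui, vi in zip(u, v))

print(manhattan((0, 0), (2, 1)))   # 2 + 1 = 3
```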
Cosine Similarity

• The traditional vector space model is based on a different notion of similarity: the cosine of the angle between two vectors.

• To start our consideration of cosine similarity, consider our document vectors not as points in term space, but as arrows that travel from the origin of the space to a particular address.
Calculating Cosine(x, y)

x = (x_1, …, x_p) and y = (y_1, …, y_p) are two vectors in p dimensions:

cos(x, y) = \frac{x \cdot y}{\sqrt{\sum_{i=1}^{p} x_i^2}\,\sqrt{\sum_{i=1}^{p} y_i^2}} = \frac{\sum_{i=1}^{p} x_i y_i}{\sqrt{\sum_{i=1}^{p} x_i^2}\,\sqrt{\sum_{i=1}^{p} y_i^2}}   (1)
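
Equation (1) as a small Python sketch (not from the slides):

```python
# Cosine similarity: dot product divided by the product of the vector norms.
import math

def cosine(x, y):
    dot = sum(xi * yi for xi, yi in zip(x, y))
    norm_x = math.sqrt(sum(xi * xi for xi in x))
    norm_y = math.sqrt(sum(yi * yi for yi in y))
    return dot / (norm_x * norm_y)

print(cosine((2, 1), (1, 0)))   # ≈ 0.894
```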
Vector Length and Normalization

Example: the vector v = (2, 1) has length c with c^2 = 2^2 + 1^2, so c = \sqrt{5}.
Vector Length and Normalization

• Vector length: \|v\| = \sqrt{\sum_{i=1}^{p} v_i^2} = \sqrt{v \cdot v}

• Normalization: u = \frac{v}{\|v\|}

Thus: cos(x, y) = \frac{x \cdot y}{\|x\|\,\|y\|}
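
A brief sketch (my own example) showing that after normalizing to unit length, the cosine reduces to a plain dot product:

```python
# Normalize a vector to unit length; the cosine of two unit vectors
# is then just their dot product.
import math

def normalize(v):
    length = math.sqrt(sum(vi * vi for vi in v))
    return tuple(vi / length for vi in v)

x, y = (2, 1), (1, 0)
ux, uy = normalize(x), normalize(y)
print(sum(a * b for a, b in zip(ux, uy)))   # same value as cos(x, y) ≈ 0.894
```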
Using the Cosine for IR

• From now on, unless otherwise specified, we can assume that all document vectors in our term-document matrix A have been normalized to unit length.
• Likewise, we always normalize our query to unit length.
• Given these assumptions, we have the classic vector space model for IR.
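
Under these assumptions, ranking documents against a query is a single matrix-vector product; a minimal numpy sketch (the matrix and query values are made up):

```python
# Sketch: with unit-length document rows in A and a unit-length query q,
# the cosine scores are just A @ q; rank documents by score.
import numpy as np

A = np.array([[0.2, 0.8, 0.0],      # made-up term weights, one row per document
              [0.5, 0.5, 0.5],
              [0.0, 0.1, 0.9]])
A = A / np.linalg.norm(A, axis=1, keepdims=True)   # normalize each document

q = np.array([0.0, 1.0, 0.2])
q = q / np.linalg.norm(q)                          # normalize the query

scores = A @ q                                     # cosine similarity per document
print(np.argsort(-scores))                         # documents ranked best-first
```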
Accuracy Measures

• A natural criterion for judging the performance of a classifier is the probability of making a misclassification error.

• Misclassification
• The observation belongs to one class, but the model classifies it as a member of a different class.

• A classifier that makes no errors would be perfect
• Do not expect to be able to construct such classifiers in the real world
• Due to “noise”
• Not having all the information needed to precisely classify cases.
Accuracy Measures

• To obtain an honest estimate of classification error, we use the classification matrix computed from the validation data.
• We first partition the data into training and validation sets by random selection of cases.
• We then construct a classifier using the training data,
• apply it to the validation data,
• which yields predicted classifications for the observations in the validation set.
• We then summarize these classifications in a classification matrix.
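
A short scikit-learn sketch of this procedure (the dataset and classifier choice are illustrative, not from the slides):

```python
# Sketch: split into training/validation, fit a classifier on the training part,
# predict on the validation part, and summarize in a classification matrix.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix

X, y = load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

clf = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
y_pred = clf.predict(X_val)

print(confusion_matrix(y_val, y_pred))   # rows: true class, columns: predicted class
```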
Problem with accuracy

• If the training and test data are skewed towards one classification,
then the model will predict everything as being that class.
• In Titanic training data, 68% of people died. If one trained a model
to predict everybody died, it would be 68% accurate.
Confusion Matrix
Practical Metrics for Data Science Projects

• Case study: in the Flight Delay project with NTU, we applied the
following metrics
• Accuracy
• Precision
• Recall
• F-measure
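
A small sketch of how these four metrics follow from the counts in a binary confusion matrix (TP, FP, FN, TN); the counts below are made up:

```python
# Sketch: accuracy, precision, recall and F-measure from binary confusion counts.
def metrics(tp, fp, fn, tn):
    accuracy  = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall    = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f_measure

# Made-up counts for illustration.
print(metrics(tp=40, fp=10, fn=5, tn=45))
```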
Practical k-NN

• Need a data structure for computing near neighbors fast.

• Data structure:
• KD-tree
Example: 2-D Tree

2-d tree for these points: (2,3), (5,4), (9,6), (4,7), (8,1), (7,2)
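
A small sketch using scipy's KDTree on exactly these six points (the query point is my own example):

```python
# Sketch: build a k-d tree over the six example points and query a nearest neighbour.
from scipy.spatial import KDTree

points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = KDTree(points)

dist, idx = tree.query((9, 5), k=1)     # nearest neighbour of (9, 5)
print(points[idx], dist)                # -> (9, 6), distance 1.0
```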
Tree Operation - Insert
Inserting (9, 5) into the tree above:
(9, 5) => 9 > 7, so go right at the root (7, 2)
(9, 5) => 5 < 6, so go left at (9, 6)
(9, 5) => 9 > 8, so go right at (8, 1)
(9, 5) becomes the right child of (8, 1).
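
A minimal recursive insert sketch (node layout, names, and insertion order are my own, chosen so the resulting tree matches the walk-through above), alternating the splitting coordinate by depth:

```python
# Sketch: insert a point into a 2-d tree, comparing x at even depths and y at odd depths.
class Node:
    def __init__(self, point):
        self.point = point
        self.left = None
        self.right = None

def insert(node, point, depth=0):
    if node is None:
        return Node(point)
    axis = depth % 2                       # 0: compare x, 1: compare y
    if point[axis] < node.point[axis]:
        node.left = insert(node.left, point, depth + 1)
    else:
        node.right = insert(node.right, point, depth + 1)
    return node

# Build the example tree, then insert (9, 5) as in the walk-through.
root = None
for p in [(7, 2), (9, 6), (8, 1), (2, 3), (5, 4), (4, 7)]:
    root = insert(root, p)
root = insert(root, (9, 5))                # ends up as the right child of (8, 1)
```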
Complexity

• Building a static k-d tree from n points takes O(n log n) time.
• Inserting a new point into a balanced k-d tree takes O(log n) time.
• Removing a point from a balanced k-d tree takes O(log n) time.
• Finding the nearest neighbour in a balanced k-d tree with randomly distributed points takes O(log n) time on average.
Search Nearest Neighbor
Search K-Nearest Neighbor

• Apply the nearest-neighbour search while maintaining a list of the k closest points found so far.
• k-d trees are not suitable for efficiently finding the nearest neighbour in high-dimensional spaces; approximate nearest-neighbour methods should be used instead.
