ShortCourse-QTT-Lecture1
Tho Quan
qttho@hcmut.edu.vn
Agenda
• Classification
• Example:
• Play golf
• Neural Network
Process 1: Model Construction
Diagram: Training Data → Classification Algorithms → Classifier; the Classifier is then applied to Testing Data and to Unseen Data.
Testing Data:
NAME     RANK            YEARS  TENURED
Tom      Assistant Prof  2      no
Merlisa  Associate Prof  7      no
George   Professor       5      yes
Joseph   Assistant Prof  7      yes
Unseen Data: (Jeff, Professor, 4) → Tenured?
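To make the testing step concrete, the sketch below scores a classifier on the four testing rows and then applies it to the unseen record (Jeff, Professor, 4). The rule inside `predict_tenured` ("Professor, or more than 6 years") is a hypothetical stand-in for a learned model, assumed only for illustration; it is not the model built in this lecture.

```python
# HYPOTHETICAL classifier: an assumed rule standing in for a trained model.
def predict_tenured(rank: str, years: int) -> str:
    """Predict 'yes' if rank is Professor or years > 6, otherwise 'no'."""
    return "yes" if rank == "Professor" or years > 6 else "no"

# Testing Data rows: (name, rank, years, tenured)
TEST_DATA = [
    ("Tom", "Assistant Prof", 2, "no"),
    ("Merlisa", "Associate Prof", 7, "no"),
    ("George", "Professor", 5, "yes"),
    ("Joseph", "Assistant Prof", 7, "yes"),
]

def accuracy(rows) -> float:
    """Fraction of rows whose predicted label matches the true label."""
    correct = sum(predict_tenured(rank, years) == label
                  for _, rank, years, label in rows)
    return correct / len(rows)

print(accuracy(TEST_DATA))              # 0.75 (Merlisa is misclassified)
print(predict_tenured("Professor", 4))  # unseen (Jeff, Professor, 4) -> yes
```

Only after the classifier's accuracy on the testing data is acceptable would it be used on unseen data.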
Classification Algorithms
• Classification Algorithms:
• Support Vector Machines
• Neural Network (multi-layer perceptron)
• Decision Tree
• K-Nearest Neighbor
• Naive Bayes Classifier…
K-Nearest Neighbor
• Euclidean Distance
• Manhattan Distance
• Cosine Similarity
• Vector Length & Normalization
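A minimal k-NN sketch: the query is labeled by majority vote among its k nearest training points, and any of the measures above can be plugged in as `distance`. Assumptions: Euclidean distance via Python's `math.dist` is the default, `knn_predict` is a hypothetical name, and the toy points are made up.

```python
import math
from collections import Counter
from typing import Callable, Sequence

Point = Sequence[float]

def knn_predict(train: list[tuple[Point, str]], query: Point, k: int = 3,
                distance: Callable[[Point, Point], float] = math.dist) -> str:
    """Classify `query` by majority vote among its k nearest training points."""
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the number of training points")
    # Rank training points by distance to the query and keep the k closest.
    neighbors = sorted(train, key=lambda item: distance(item[0], query))[:k]
    # Majority vote; Counter keeps first-seen order, so ties favour the nearer label.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical 2-D training points with labels "A" and "B".
train = [((0, 0), "A"), ((0, 1), "A"), ((1, 0), "A"),
         ((5, 5), "B"), ((5, 6), "B"), ((6, 5), "B")]
print(knn_predict(train, (1, 1)))  # A
print(knn_predict(train, (5, 4)))  # B
```

This brute-force version computes the distance to every training point for each query. That cost is what makes indexing structures worth considering for large training sets.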
Euclidean Distance
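Points u = (0, 0) and v = (2, 1) form a right triangle with legs a = 2 (horizontal) and b = 1 (vertical). The distance between u and v is the hypotenuse c:
$$a^2 + b^2 = c^2 \;\Rightarrow\; 2^2 + 1^2 = c^2 \;\Rightarrow\; c = \sqrt{5}$$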
Euclidean Distance
$$\mathrm{dist}(u, v) = \sqrt{\sum_{i=1}^{p} (u_i - v_i)^2}$$
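The general formula translates directly into code. This sketch (the function name is illustrative) reproduces the 2-D example, where the distance between (0, 0) and (2, 1) is √5. Python 3.8+ also provides `math.dist` for the same computation.

```python
import math

def euclidean_distance(u, v):
    """dist(u, v) = sqrt(sum_i (u_i - v_i)^2) for vectors of equal dimension p."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension p")
    return math.sqrt(sum((ui - vi) ** 2 for ui, vi in zip(u, v)))

print(euclidean_distance((0, 0), (2, 1)))  # sqrt(5), about 2.236
```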
Manhattan Distance
From u = (0, 0) to v = (2, 1): 2 units horizontally + 1 unit vertically = 3 units.
$$D_M(u, v) = \sum_{i=1}^{p} \lvert u_i - v_i \rvert = \lVert u - v \rVert_1$$
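A matching sketch for the Manhattan (L1) distance (function name illustrative). It reproduces the 3-unit path from (0, 0) to (2, 1).

```python
def manhattan_distance(u, v):
    """D_M(u, v) = sum_i |u_i - v_i|, the L1 norm of u - v."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension p")
    return sum(abs(ui - vi) for ui, vi in zip(u, v))

print(manhattan_distance((0, 0), (2, 1)))  # 3 = 2 horizontal + 1 vertical
```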
Cosine Similarity
$$\cos(x, y) = \frac{x \cdot y}{\sqrt{\sum_{i=1}^{p} x_i x_i}\,\sqrt{\sum_{i=1}^{p} y_i y_i}} = \frac{\sum_{i=1}^{p} x_i y_i}{\sqrt{\sum_{i=1}^{p} x_i x_i}\,\sqrt{\sum_{i=1}^{p} y_i y_i}} \qquad (1)$$
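The cosine formula can be computed term by term: a dot product divided by the two vector lengths. In this sketch (function name illustrative), rejecting zero vectors is an added assumption, because the formula divides by their length.

```python
import math

def cosine_similarity(x, y):
    """cos(x, y) = sum_i x_i*y_i / (sqrt(sum_i x_i^2) * sqrt(sum_i y_i^2))."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension p")
    dot = sum(xi * yi for xi, yi in zip(x, y))
    norm_x = math.sqrt(sum(xi * xi for xi in x))
    norm_y = math.sqrt(sum(yi * yi for yi in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)

print(cosine_similarity((1, 0), (0, 1)))  # 0.0: orthogonal vectors
print(cosine_similarity((2, 1), (4, 2)))  # about 1.0: same direction
```

Unlike the two distances above, cosine ignores vector length. (2, 1) and (4, 2) are far apart in Euclidean terms, yet they point in the same direction.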
Vector Length and Normalization
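The vector v = (2, 1) has components a = 2 and b = 1, so its length c follows from Pythagoras:
$$a^2 + b^2 = c^2 \;\Rightarrow\; 2^2 + 1^2 = c^2 \;\Rightarrow\; c = \sqrt{5}$$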
Vector Length and Normalization
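• Vector length:
$$\lVert v \rVert = \sqrt{\sum_{i=1}^{p} v_i^2} = \sqrt{v \cdot v}$$
• Normalization:
$$u = \frac{v}{\lVert v \rVert}$$
Thus
$$\cos(x, y) = \frac{x \cdot y}{\lVert x \rVert \, \lVert y \rVert}$$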
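A small sketch of length and normalization (helper names are illustrative). It also shows the consequence stated above: once x and y are normalized to unit length, their cosine is just their dot product.

```python
import math

def length(v):
    """||v|| = sqrt(sum_i v_i^2) = sqrt(v . v)."""
    return math.sqrt(sum(vi * vi for vi in v))

def normalize(v):
    """Return the unit vector u = v / ||v||."""
    n = length(v)
    if n == 0:
        raise ValueError("cannot normalize the zero vector")
    return [vi / n for vi in v]

def cosine_via_unit_vectors(x, y):
    """cos(x, y) = (x . y) / (||x|| ||y||) = dot product of the normalized vectors."""
    return sum(a * b for a, b in zip(normalize(x), normalize(y)))

print(length((2, 1)))             # sqrt(5), about 2.236
print(length(normalize((2, 1))))  # about 1.0
```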
Using the Cosine for IR
• Misclassification
• The observation belongs to one class, but the model classifies it as a
member of a different class.
• If the training and test data are skewed towards one class, a model can learn to predict (nearly) everything as that class and still score a high accuracy.
• In the Titanic training data, 68% of people died. A model that simply predicted that everybody died would therefore be 68% accurate.
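The Titanic point can be checked with a majority-class baseline. The labels below are hypothetical: 100 records split 68/32 to mirror the 68% figure, not the actual dataset.

```python
from collections import Counter

def majority_baseline_accuracy(labels):
    """Accuracy of a 'model' that always predicts the most frequent class."""
    most_common_label, count = Counter(labels).most_common(1)[0]
    return most_common_label, count / len(labels)

# HYPOTHETICAL labels mirroring the 68% figure: 68 'died', 32 'survived'.
labels = ["died"] * 68 + ["survived"] * 32
print(majority_baseline_accuracy(labels))  # ('died', 0.68)
```

Any real classifier should beat this baseline before its accuracy is taken as evidence that it learned something.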
Confusion Matrix
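A minimal sketch of building a confusion matrix from true and predicted labels. It assumes the common layout with rows = actual class and columns = predicted class (conventions vary between sources). The labels are hypothetical.

```python
from collections import Counter

def confusion_matrix(y_true, y_pred, labels):
    """Rows = actual class, columns = predicted class; each entry is a count."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(actual, predicted)] for predicted in labels]
            for actual in labels]

# HYPOTHETICAL binary example (1 = positive, 0 = negative).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
print(confusion_matrix(y_true, y_pred, labels=[1, 0]))
# [[3, 1], [1, 3]] -> TP=3, FN=1 (first row); FP=1, TN=3 (second row)
```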
Practical Metrics for Data Science Projects
• Case study: in the Flight Delay project with NTU, we applied the following metrics:
• Accuracy
• Precision
• Recall
• F-measure
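The four metrics can be computed from confusion-matrix counts (TP, FP, FN, TN). Assumptions in this sketch: "F-measure" is taken as F1 (β = 1), and a metric whose denominator is zero is reported as 0.0 rather than raising an error. The counts are hypothetical.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Assumed convention: 0.0 when a denominator is zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

print(classification_metrics(tp=3, fp=1, fn=1, tn=3))     # all four are 0.75
# A model that never predicts the positive class on a 68/32 split:
print(classification_metrics(tp=0, fp=0, fn=32, tn=68))  # accuracy 0.68, recall 0.0
```

The second call shows why precision, recall and F-measure are reported alongside accuracy. The always-negative model looks 68% accurate but finds none of the positives.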
Practical k-NN
9,5
Complexity