ML3: Some Supervised Learning Algorithms
K Nearest Neighbours
Decision Trees
Random Forest
Linear Regression
Logistic Regression
Support Vector Machine
Example: classifying an unknown point (?) with K nearest neighbours:
– K = 1: belongs to the square class
– K = 3: belongs to the triangle class
– K = 7: belongs to the square class
Choice of K:
– If K is too small, sensitive to noise points
– If K is too large, neighborhood may include points from other classes
Disadvantages
• Classifying unknown data is expensive
– Requires distance computation with all training data
– Computationally intensive when the training set is large
• Accuracy can be severely degraded by the presence of noisy or irrelevant features
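To make both points concrete, below is a minimal NumPy sketch of the K-nearest-neighbours rule. The coordinates and labels are made up to reproduce the same qualitative behaviour as the example above (K = 1 gives square, K = 3 gives triangle, K = 7 gives square); they are not the actual figure points. Note that every prediction requires a distance computation against the entire training set, which is the cost noted in the disadvantages.

import numpy as np

# Hypothetical 2-D training data: class 0 = "square", class 1 = "triangle".
X_train = np.array([[3.0, 3.1], [1.0, 1.0], [1.5, 0.5], [0.5, 1.5],   # squares
                    [3.5, 3.5], [3.6, 2.6], [4.5, 4.0]])              # triangles
y_train = np.array([0, 0, 0, 0, 1, 1, 1])

def knn_predict(x_query, X_train, y_train, k):
    # Distance from the query to every training point -- this full scan is
    # what makes classification expensive for a large training set.
    dists = np.linalg.norm(X_train - x_query, axis=1)
    nearest = np.argsort(dists)[:k]                  # indices of the k closest points
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]                 # majority vote

x_query = np.array([3.0, 3.0])
for k in (1, 3, 7):
    print(f"K={k}: predicted class {knn_predict(x_query, X_train, y_train, k)}")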
[Figure: scatter plot of sample data points $(X_1,Y_1), (X_2,Y_2), \ldots, (X_6,Y_6)$]
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, where $Y$ is the dependent (response) variable and $X$ is the independent (explanatory) variable.
Linear regression model
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
$\varepsilon_i$ = random error
[Figure: regression line $Y = \beta_0 + \beta_1 X$ plotted against $X$]
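As a quick sketch of what the model means operationally, the snippet below simulates data from $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$. The parameter values and the Gaussian noise are hypothetical choices for illustration only; the slide does not specify a noise distribution.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "true" parameters and noise level, chosen only for illustration.
beta0, beta1, sigma = 2.0, 0.5, 1.0

n = 50
x = rng.uniform(0.0, 10.0, size=n)      # independent (explanatory) variable X
eps = rng.normal(0.0, sigma, size=n)    # random error term epsilon_i
y = beta0 + beta1 * x + eps             # dependent (response) variable Y

print(x[:3], y[:3])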
Estimating parameters
– Least Squares method
Minimize the sum of squared errors
$\sum_{i=1}^{n} \hat{\varepsilon}_i^{\,2} = \sum_{i=1}^{n} \left( Y_i - \hat{Y}_i \right)^2$
where $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$
[Figure: fitted line with residuals $\hat{\varepsilon}_1, \hat{\varepsilon}_2, \hat{\varepsilon}_3, \hat{\varepsilon}_4$ shown as vertical distances of the data points from the line, plotted against $X$]
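The sketch below, using a small made-up dataset, writes this objective as a function of a candidate intercept and slope, and checks that a least-squares fit obtained with NumPy attains a smaller sum of squared errors than a perturbed line.

import numpy as np

# Small hypothetical dataset, for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

def sse(b0, b1, x, y):
    # Sum of squared errors for the candidate line y_hat = b0 + b1 * x.
    residuals = y - (b0 + b1 * x)
    return np.sum(residuals ** 2)

# np.polyfit with degree 1 returns [slope, intercept] of the least-squares line.
b1_hat, b0_hat = np.polyfit(x, y, 1)

print("SSE at the least-squares fit:", sse(b0_hat, b1_hat, x, y))
print("SSE at a perturbed line     :", sse(b0_hat + 0.5, b1_hat, x, y))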
Derivation of parameters
Minimize the squared error
$\sum_{i=1}^{n} \varepsilon_i^{\,2} = \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_i \right)^2$
Setting the partial derivatives with respect to $\beta_0$ and $\beta_1$ to zero,
$\frac{\partial \sum_i \varepsilon_i^{\,2}}{\partial \beta_0} = 0, \qquad \frac{\partial \sum_i \varepsilon_i^{\,2}}{\partial \beta_1} = 0,$
gives
$\hat{\beta}_1 = \frac{\mathrm{Cov}(x, y)}{\mathrm{Var}(x)} = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}$
$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$
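A minimal NumPy check of these closed-form expressions on hypothetical data: $\hat{\beta}_1$ is computed as the ratio of the $(x, y)$ covariance sum to the variance sum of $x$, and $\hat{\beta}_0$ from the sample means; the result is cross-checked against NumPy's own least-squares fit.

import numpy as np

# Hypothetical data, for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

x_bar, y_bar = x.mean(), y.mean()

# beta1_hat = Cov(x, y) / Var(x) = sum((x_i - x_bar)(y_i - y_bar)) / sum((x_i - x_bar)^2)
beta1_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)

# beta0_hat = y_bar - beta1_hat * x_bar
beta0_hat = y_bar - beta1_hat * x_bar

print("beta1_hat =", beta1_hat)
print("beta0_hat =", beta0_hat)

# Cross-check against np.polyfit (degree 1 returns [slope, intercept]).
assert np.allclose([beta1_hat, beta0_hat], np.polyfit(x, y, 1))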