3b Features PDF
Sudeshna Sarkar
IIT Kharagpur
Feature Reduction in ML
- The information about the target class is inherent
in the variables.
- Naïve view:
More features
=> More information
=> More discrimination power.
- In practice:
many reasons why this is not the case!
Curse of Dimensionality
• When the number of training examples is fixed,
the classifier's performance usually degrades
as the number of features grows large.
Feature Reduction in ML
- Irrelevant and redundant features can confuse learners.
Feature Extraction
Feature Selection Steps
Feature selection is an optimization problem.
o Step 1: Search the space of possible feature subsets.
o Step 2: Pick the subset that is optimal or near-optimal
with respect to some objective function.
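The two steps above can be sketched as an exhaustive search. This is only an illustration: `best_subset`, `relevance`, and `toy_objective` are hypothetical names, and the objective (summed relevance minus a per-feature cost) is made up for the example.

```python
from itertools import combinations


def best_subset(n_features, objective):
    """Score every non-empty feature subset and return the best one."""
    best, best_score = None, float("-inf")
    for k in range(1, n_features + 1):
        # Step 1: enumerate the space of possible subsets of size k.
        for subset in combinations(range(n_features), k):
            # Step 2: keep the subset with the highest objective value.
            score = objective(subset)
            if score > best_score:
                best, best_score = subset, score
    return best, best_score


# Hypothetical per-feature relevance scores (made up for illustration).
relevance = [0.9, 0.1, 0.6, 0.05]


def toy_objective(subset):
    # Reward relevance, charge a fixed cost of 0.3 per selected feature.
    return sum(relevance[i] for i in subset) - 0.3 * len(subset)


subset, score = best_subset(len(relevance), toy_objective)
print(subset, round(score, 2))
```

With n features there are 2^n - 1 non-empty subsets, so this exhaustive form is only practical for small n.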
Feature Selection Steps (cont’d)
Search strategies
– Optimum
– Heuristic
– Randomized
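As one example of a heuristic search strategy, greedy forward selection adds one feature at a time. The sketch below is an assumption-laden illustration, not a method named in the slides: `forward_selection`, `covers`, and `toy_objective` are hypothetical, and the objective simply counts the distinct pieces of information covered, minus a small cost per feature.

```python
def forward_selection(n_features, objective):
    """Greedily add the feature that most improves the objective.

    Stops as soon as no remaining feature improves the score.
    """
    selected, best_score = [], float("-inf")
    remaining = set(range(n_features))
    while remaining:
        score, feature = max((objective(selected + [f]), f) for f in remaining)
        if score <= best_score:
            break
        selected.append(feature)
        remaining.remove(feature)
        best_score = score
    return selected, best_score


# Made-up data: features 0 and 1 carry the same information (redundant),
# feature 2 carries different information, feature 3 carries none (irrelevant).
covers = [{"a"}, {"a"}, {"b"}, set()]


def toy_objective(subset):
    info = set().union(*(covers[i] for i in subset))
    return len(info) - 0.1 * len(subset)


selected, score = forward_selection(len(covers), toy_objective)
print(selected, round(score, 2))
```

The greedy search evaluates far fewer subsets than an exhaustive one, at the price of possibly missing the optimum.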
Evaluation strategies
- Filter methods
- Wrapper methods
Evaluating feature subset
• Supervised (wrapper method)
– Train using selected subset
– Estimate error on validation dataset
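A minimal sketch of the wrapper evaluation described above, assuming a nearest-centroid classifier as the learner (the slides do not name one). All function names and the data are hypothetical.

```python
def fit_centroids(X, y):
    """Return {label: centroid} computed from the training rows."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict(centroids, row):
    """Assign the label of the closest centroid (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return min(centroids, key=lambda c: sq_dist(centroids[c], row))


def wrapper_error(subset, X_train, y_train, X_val, y_val):
    """Train on the selected columns only; return the validation error rate."""
    def pick(X):
        return [[row[j] for j in subset] for row in X]
    centroids = fit_centroids(pick(X_train), y_train)
    wrong = sum(predict(centroids, r) != t for r, t in zip(pick(X_val), y_val))
    return wrong / len(y_val)


# Made-up data: column 0 separates the classes, column 1 is noise.
X_train = [[0.0, 5.0], [0.2, -4.0], [1.0, -5.0], [1.2, 4.0]]
y_train = [0, 0, 1, 1]
X_val = [[0.1, -3.0], [1.1, 3.0]]
y_val = [0, 1]

err_informative = wrapper_error((0,), X_train, y_train, X_val, y_val)
err_noise = wrapper_error((1,), X_train, y_train, X_val, y_val)
print(err_informative, err_noise)
```

The validation error returned here would play the role of the objective function when searching over subsets.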
Signal to noise ratio
• Difference in the class means divided by the sum of
the standard deviations of the two classes
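A short sketch of a signal-to-noise score for a single feature, assuming the commonly used form (difference of the class means over the sum of the class standard deviations) and the sample standard deviation. The function name and the values are hypothetical.

```python
from statistics import mean, stdev


def signal_to_noise(values_a, values_b):
    """(mean_a - mean_b) / (std_a + std_b) for one feature across two classes."""
    return (mean(values_a) - mean(values_b)) / (stdev(values_a) + stdev(values_b))


# Made-up values of one feature, grouped by class.
class_a = [5.0, 6.0, 7.0]
class_b = [1.0, 2.0, 3.0]

s2n = signal_to_noise(class_a, class_b)
print(s2n)
```

A filter method could compute this score for every feature and keep those with the largest absolute value.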
Feature extraction - definition
Feature Extraction
PCA
Geometric picture of principal components (PCs)
Algebraic definition of PCs
Given a sample of p observations on a vector of N variables,
$x_1, x_2, \ldots, x_p \in \mathbb{R}^N$
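For p observations in N dimensions, the first principal component is the unit direction along which the projected data has maximum variance, which is the leading eigenvector of the sample covariance matrix. The sketch below finds it with power iteration, chosen here only to stay dependency-free; it is not necessarily how the lecture computes it. The function name and the data are hypothetical.

```python
def first_principal_component(samples, iterations=200):
    """Leading eigenvector of the sample covariance matrix via power iteration.

    Assumes the start vector is not orthogonal to the leading direction
    and that the covariance matrix is not all zeros.
    """
    p, n = len(samples), len(samples[0])
    means = [sum(x[j] for x in samples) / p for j in range(n)]
    centered = [[x[j] - means[j] for j in range(n)] for x in samples]
    # Sample covariance matrix (N x N).
    cov = [[sum(r[i] * r[j] for r in centered) / (p - 1) for j in range(n)]
           for i in range(n)]
    w = [1.0] + [0.0] * (n - 1)
    for _ in range(iterations):
        w = [sum(cov[i][j] * w[j] for j in range(n)) for i in range(n)]
        norm = sum(v * v for v in w) ** 0.5
        w = [v / norm for v in w]
    # Variance of the data projected onto w (the leading eigenvalue).
    variance = sum(w[i] * cov[i][j] * w[j] for i in range(n) for j in range(n))
    return w, variance


# Made-up 2-D data, spread mostly along the diagonal.
samples = [[-2.0, -2.0], [2.0, 2.0], [-0.5, 0.5], [0.5, -0.5]]
w, variance = first_principal_component(samples)
print(w, variance)
```

In practice a library eigendecomposition or SVD routine would normally be used instead of hand-written power iteration.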
[Figure: image panels labelled p=16, p=32, p=64, p=100, and Original Image]
Is PCA a good criterion for classification?
• Data variation
determines the
projection direction
• What’s missing?
– Class information
What is a good projection?
• Similarly, what is a good criterion?
– Separating different classes
[Figure: two classes overlap]
Between-class distance
What class information may be useful?
• Between-class distance
– Distance between the centroids of
different classes
• Within-class distance
• Accumulated distance of an instance
to the centroid of its class
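The two quantities above can be computed directly. This sketch assumes Euclidean distance and sums the instance-to-centroid distances within a class; the function names and the data are hypothetical.

```python
def centroid(rows):
    """Mean vector of a list of equal-length rows."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def euclidean(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5


def between_class_distance(class_a, class_b):
    """Distance between the centroids of two classes."""
    return euclidean(centroid(class_a), centroid(class_b))


def within_class_distance(rows):
    """Accumulated distance of each instance to the centroid of its class."""
    c = centroid(rows)
    return sum(euclidean(r, c) for r in rows)


# Made-up 2-D data for two classes.
class_a = [[0.0, 0.0], [2.0, 0.0]]
class_b = [[4.0, 3.0], [6.0, 3.0]]

between = between_class_distance(class_a, class_b)
within_a = within_class_distance(class_a)
within_b = within_class_distance(class_b)
print(between, within_a, within_b)
```

A good projection would make the first quantity large relative to the other two.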
$J(w) = \frac{|m_1 - m_2|^2}{s_1^2 + s_2^2}$
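A small sketch of evaluating the criterion J(w) = |m1 - m2|^2 / (s1^2 + s2^2) for a candidate direction w, assuming m1 and m2 are the class means after projection onto w and s1^2, s2^2 are the projected within-class scatters (sums of squared deviations). The function name and the data are hypothetical.

```python
def fisher_criterion(w, class_1, class_2):
    """J(w) = (m1 - m2)^2 / (s1^2 + s2^2) for data projected onto w."""
    def project(rows):
        return [sum(wi * xi for wi, xi in zip(w, row)) for row in rows]

    def mean_and_scatter(values):
        m = sum(values) / len(values)
        return m, sum((v - m) ** 2 for v in values)

    m1, s1_sq = mean_and_scatter(project(class_1))
    m2, s2_sq = mean_and_scatter(project(class_2))
    # Undefined (division by zero) if both projected scatters are zero.
    return (m1 - m2) ** 2 / (s1_sq + s2_sq)


# Made-up 2-D data: the classes differ along x but not along y.
class_1 = [[0.0, 0.0], [2.0, 4.0]]
class_2 = [[4.0, 0.0], [6.0, 4.0]]

j_x = fisher_criterion([1.0, 0.0], class_1, class_2)
j_y = fisher_criterion([0.0, 1.0], class_1, class_2)
print(j_x, j_y)
```

Projecting onto the x axis separates the class means and scores higher than projecting onto the y axis, where the projected means coincide.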
Multiple Classes