Part 11 MD
Ji Hui
1
Basics of Classification
Data: a set of data records described by attributes and a class label: each example is labelled with a pre-defined class.
Classifier: a procedure that accepts a set of features and produces a class label for them.
Classification: to train a classifier from the data that can be used to predict the classes of new data records.
2
Classification
Three classes (figure).
3
What "learning"/"training" means
Input:
a data set D
a task T, e.g. classification
a performance measure P
A machine is said to learn/train from D to perform the task T if, after learning, it performs the task better than it did without learning, as measured by P.
Performance measure P: consider a classifier f.
Loss function l(f(x), y): the cost of a sample x with true label y being labelled by f as f(x).
The minimum loss is attained when the predicted label is correct, e.g. the 0-1 loss l(f(x), y) = 0 if f(x) = y and 1 otherwise.
4
Training error, testing error and overfitting
Define:
Training error: the empirical risk over the training samples.
Testing error: the expected risk over unseen samples.
With only training samples available, we can train a classifier to minimize the training error. However, what we really want is to minimize the testing error.
A smaller training error does not imply a smaller testing error.
Generalization error: the difference between the training error and the testing error.
Overfitting: in classification, the phenomenon that causes a large generalization error is called overfitting.
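Written out (a sketch; the notation for the loss l, the classifier f and the number of training samples N is assumed rather than taken from the slides):
\hat{R}_{train}(f) = \frac{1}{N} \sum_{i=1}^{N} l(f(x_i), y_i)   (training error: average loss on the N training samples)
R_{test}(f) = E_{(x,y)}[ l(f(x), y) ]   (testing error: expected loss on new samples)
generalization error = R_{test}(f) - \hat{R}_{train}(f)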
5
Cross-validation for evaluating generalization error
Split the original dataset into K different subsets (folds).
Retain 1 fold as the test set, use the other K-1 folds to train the classifier, and calculate the error rate: the percentage of wrong labels on the test set.
After K runs in total, calculate the average error rate, which gives us an idea of how well our model generalizes.
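A minimal scikit-learn sketch of this procedure (the 5 folds, the linear SVM and the iris data are illustrative choices, not prescribed by the slides):

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)                            # any labelled dataset works here
scores = cross_val_score(SVC(kernel="linear"), X, y, cv=5)   # accuracy on each of the 5 held-out folds
error_rates = 1.0 - scores                                   # error rate per fold
print("average cross-validation error:", error_rates.mean())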
6
Learning classifier
7
Support Vector Machine: a linear classifier that maximizes the margin
A linear classifier defines a hyperplane as the decision boundary.
For example, in the 2-D case, the boundary is a line.
The classifier: f(x) = \operatorname{sign}(w^\top x + b)
w: the weight vector
b: the bias
For any x: class "+1" if w^\top x + b > 0, class "-1" if w^\top x + b < 0, and uncertain if w^\top x + b = 0.
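A NumPy sketch of the decision rule (the weight vector, bias and input below are made-up example values):

import numpy as np

w = np.array([2.0, -1.0])          # weight vector (example values)
b = 0.5                            # bias (example value)
x = np.array([1.0, 3.0])           # input feature vector

score = np.dot(w, x) + b           # w^T x + b
label = "+1" if score > 0 else ("-1" if score < 0 else "uncertain")
print(score, label)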
8
Binary linear classifier with maximum margin
Which classifier to use?
9
Margin of a linear classifier
10
Support vector machine (SVM) and its dual form
SVM learns the linear classifier with maximum margin:
\min_{w,b} \tfrac{1}{2}\|w\|^2 \quad \text{subject to} \quad y_i (w^\top x_i + b) \ge 1, \; i = 1, \dots, N
Equivalently, in its dual form:
\max_{\alpha} \sum_i \alpha_i - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, x_i^\top x_j \quad \text{subject to} \quad \alpha_i \ge 0, \; \sum_i \alpha_i y_i = 0
Its classifier is f(x) = \operatorname{sign}\big( \sum_i \alpha_i y_i \, x_i^\top x + b \big).
11
Multi-class classifier
One-versus-all
Train K binary classifiers, one for each class against all other classes.
The predicted class is the class of the most confident classifier.
One-versus-one
Train K(K-1)/2 classifiers, each discriminating between a pair of classes.
Select the final classification based on the outputs of the binary classifiers, e.g. choose the class with a majority vote among all the classifiers.
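A minimal scikit-learn sketch of both strategies (the iris data and the linear SVMs are illustrative assumptions):

from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# one-versus-all: one binary classifier per class, most confident one wins
ova = OneVsRestClassifier(SVC(kernel="linear")).fit(X, y)

# one-versus-one: one classifier per pair of classes, majority vote at prediction time
ovo = OneVsOneClassifier(SVC(kernel="linear")).fit(X, y)

print(ova.predict(X[:5]), ovo.predict(X[:5]))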
12
Non-linear classifier
13
Feature map
Find a mapping \phi that maps the feature vector x to a higher-dimensional space: x \mapsto \phi(x).
Learning: train the linear SVM on the mapped features \phi(x_i) in place of x_i.
14
Kernel trick for the dual classifier
Recall that the dual form involves the data only through inner products x_i^\top x_j, so after a feature map \phi only \phi(x_i)^\top \phi(x_j) is needed.
The function k(x, x') = \phi(x)^\top \phi(x') is called a kernel.
Dual classifier in kernel space
Classifier: f(x) = \operatorname{sign}\big( \sum_i \alpha_i y_i \, k(x_i, x) + b \big)
Learning: \max_{\alpha} \sum_i \alpha_i - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, k(x_i, x_j)
subject to \alpha_i \ge 0, \; \sum_i \alpha_i y_i = 0
Examples
Linear kernel: k(x, x') = x^\top x'
Gaussian kernel: k(x, x') = \exp\big( -\|x - x'\|^2 / (2\sigma^2) \big)
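A NumPy sketch of evaluating the Gaussian kernel on a few points (the bandwidth sigma = 1.0 and the points are arbitrary examples):

import numpy as np

def gaussian_kernel(X1, X2, sigma=1.0):
    # k(x, x') = exp(-||x - x'||^2 / (2 sigma^2)), evaluated for all pairs of rows
    sq_dists = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
K = gaussian_kernel(X, X)          # 3x3 kernel (Gram) matrix; diagonal entries are 1
print(K)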
15
Practical Gaussian SVM with regularization parameter C
Classifier: f(x) = \operatorname{sign}\big( \sum_i \alpha_i y_i \, k(x_i, x) + b \big)
Learning: \max_{\alpha} \sum_i \alpha_i - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, k(x_i, x_j)
subject to 0 \le \alpha_i \le C, \; \sum_i \alpha_i y_i = 0
Two parameters: the regularization parameter C and the kernel width \sigma
Smaller value of C: larger robustness to outliers (more margin violations tolerated)
Larger value of \sigma: smoother decision boundary
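In scikit-learn these two knobs appear as C and gamma, with gamma = 1 / (2 sigma^2) under the kernel convention above (the data and parameter values below are arbitrary examples):

from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)   # toy non-linearly separable data

sigma = 0.5
clf = SVC(kernel="rbf", C=10.0, gamma=1.0 / (2.0 * sigma ** 2))
clf.fit(X, y)
print("training accuracy:", clf.score(X, y))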
16
Examples (figures)
17-19
Summary of kernel method
Classifiers can be learned for high-dimensional feature spaces without actually having to map the points into the high-dimensional space
Data may be linearly separable in the high dimensional space, but not linearly separable in
the original feature space
Kernels can be used for an SVM because of the scalar product in the dual form, but can also
be used elsewhere – they are not tied to the SVM formalism
20
SVM from scikit-learn (a machine learning package)
import numpy as np                 # numerical arrays
import matplotlib.pyplot as plt    # plotting
from sklearn import svm            # SVM classifiers (svm.SVC, svm.LinearSVC, ...)
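A minimal end-to-end sketch continuing from these imports (the toy data, parameters and plot are illustrative, not taken from the slides):

from sklearn.datasets import make_blobs

# toy 2-class, 2-D data so the decision boundary and support vectors can be visualized
X, y = make_blobs(n_samples=100, centers=2, random_state=0)

clf = svm.SVC(kernel="linear", C=1.0)
clf.fit(X, y)

# plot the points, and circle the support vectors
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.coolwarm)
plt.scatter(clf.support_vectors_[:, 0], clf.support_vectors_[:, 1],
            s=100, facecolors="none", edgecolors="k")
plt.show()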
21
Visual classification
Visual categorization: assign an input image a label from a pre-defined set
23
Challenges: viewpoint variations
24
Challenges: illumination variations
25
Challenges: deformations
26
Challenges: occlusion
27
Challenges: background clutter
28
Challenges: intra-class variations
29
Basics of visual classification
30
Representation: Building feature vector
Basic principle
Expose inter-class (between-class) variation
Suppress intra-class (within-class) variation
Invariance to environmental changes, e.g. lighting, viewpoint
31
Hand-crafting good image features
Detect Interest Points
DoG detector in SIFT; Harris corner detector; dense/random sampling: taking every k-th pixel (or random pixels)
32
General points for descriptors
35
Origin: Bag of Words for document representation
Orderless document representation: frequencies of words from a dictionary (Salton & McGill, 1983)
36
Pipeline of Bag of Words (BoW)
37
Feature extraction
Sampling strategy
38
Codebook learning: building visual vocabulary
Generate the dictionary (codebook) with clustering.
Use a clustering technique, such as K-means, to generate the codebook, i.e. the visual vocabulary.
39
Recap of K-means
Clustering: dividing data points into a number of groups such that data points in the same group are more similar to each other than to data points in other groups.
40
Recap of K-means
An algorithm for partitioning N data points into K disjoint subsets so as to minimize the within-cluster sum-of-squares criterion \sum_{k=1}^{K} \sum_{x_i \in S_k} \|x_i - \mu_k\|^2, where \mu_k is the centroid of cluster S_k.
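A scikit-learn sketch of codebook learning (random vectors stand in for real local descriptors, and the codebook size of 100 is an arbitrary example):

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
descriptors = rng.normal(size=(5000, 128))     # stand-in for 128-D SIFT descriptors pooled from training images

kmeans = KMeans(n_clusters=100, n_init=10, random_state=0).fit(descriptors)
codebook = kmeans.cluster_centers_             # 100 visual words, each a 128-D codevector
print(codebook.shape)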
42
cont'd
After using clustering to generate multiple centers:
Each cluster center becomes a codevector == a visual word.
The whole set of cluster centers is a codebook == the visual vocabulary.
The codebook can be learned on a separate training set.
The codebook can be used for quantizing features.
For example, the feature vector is the occurrence count of each visual word, i.e. the histogram of visual words normalized to sum to 1.
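Continuing the sketch above (reusing kmeans and rng from the previous snippet), an image's descriptors are quantized against the codebook and turned into a normalized histogram:

descriptors_img = rng.normal(size=(300, 128))        # local descriptors of one image (stand-in data)
words = kmeans.predict(descriptors_img)              # nearest visual word for each descriptor
hist, _ = np.histogram(words, bins=np.arange(101))   # occurrences of each of the 100 visual words
bow = hist / hist.sum()                              # BoW feature vector, normalized to sum to 1
print(bow.shape, bow.sum())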
44
Image representation
45
Image classification using the BoW histogram
Histogram of visual words: summarize the entire image based on the occurrences of visual words.
Learn a classifier, e.g. SVM or kNN, using BoW histogram based features.
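A sketch of this final step, assuming one BoW histogram per training image has already been computed as above (the histograms and labels below are random placeholders):

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
bow_features = rng.dirichlet(np.ones(100), size=50)   # 50 images x 100-bin normalized BoW histograms (placeholder)
labels = rng.integers(0, 3, size=50)                  # 3 image classes (placeholder)

clf = SVC(kernel="rbf", C=10.0, gamma="scale").fit(bow_features, labels)
print(clf.predict(bow_features[:5]))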
46
Object Detection
Assigning a label and a bounding box to one or many objects in the image
Two-stage approach
Locating the object in the image
Classifying the object in the image
47
Detection: A case study on face detection
Face detection is the first step in automated face recognition
There are many cues for face detection
skin color
facial/head shape
facial appearance
Basic procedure
An image is scanned at all possible locations and at many possible scales of the bounding box.
Face detection is posed as classifying the pattern in each bounding box as either face or non-face.
The face/non-face classifier is learned from face and non-face training examples using machine learning methods.
The first challenge: how to efficiently calculate the features for each bounding box?
The total number of possible bounding boxes for an image of size W x H with 5 possible box lengths is on the order of 5 * W * H, i.e. millions for a typical image.
48
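A quick back-of-the-envelope check (the 640 x 480 image size is an assumed example, not a figure from the slides):

W, H = 640, 480                   # assumed image size
num_scales = 5                    # number of box lengths, as stated above
num_boxes = num_scales * W * H    # roughly one box per pixel position, per scale
print(num_boxes)                  # 1,536,000 -> on the order of millions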
Haar-type image features with fast computation
49
Fast computing via Integral image
For each bounding box, one needs to generate a feature vector with sufficient discrimination.
Suppose the feature vector is the collection of the responses of the bounding box to different Haar-type filters.
For a bounding box, there are about 160,000 Haar-type box filters with varying size, aspect ratio and orientation.
It is critical to have an efficient computational scheme to compute the responses of a large number of different Haar-type filters.
The solution: computation via the integral image.
50
Integral image based computation
Pre-compute a look-up table (the integral image) II(x, y) = \sum_{x' \le x, \, y' \le y} I(x', y') for all possible (x, y).
Computing the sum of the pixel values in a rectangular region: let A, B, C, D be the integral-image values at the top-left, top-right, bottom-left and bottom-right corners of the rectangle.
Then the sum of pixel values within the rectangle can be computed as D - B - C + A.
Only 4 lookups and 3 additions/subtractions are required to calculate the sum over a rectangle of any size, given the integral-image table.
Then the output of any Haar-type box filter can be computed efficiently as a combination of a few such rectangle sums.
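A NumPy sketch of this computation (the 5 x 5 image and rectangle coordinates are made-up examples):

import numpy as np

img = np.arange(25, dtype=np.int64).reshape(5, 5)    # toy 5x5 image
ii = img.cumsum(axis=0).cumsum(axis=1)               # integral image: ii[y, x] = sum of img[:y+1, :x+1]

def rect_sum(ii, y0, x0, y1, x1):
    # sum of img[y0:y1+1, x0:x1+1] using at most 4 lookups (D - B - C + A)
    total = ii[y1, x1]
    if y0 > 0:
        total -= ii[y0 - 1, x1]
    if x0 > 0:
        total -= ii[y1, x0 - 1]
    if y0 > 0 and x0 > 0:
        total += ii[y0 - 1, x0 - 1]
    return total

print(rect_sum(ii, 1, 1, 3, 3), img[1:4, 1:4].sum())  # both print 108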
51
Feature selection
For a detection region, the number of possible Haar-type (rectangle) features is about 160,000 (as noted above).
52
Object detection using object classification
53