Part 11 MD
Ji Hui
1
Basics of Classification
Data: a set of data records described by attributes and a class label: each example is labelled with a pre-defined class.
Classifier: a procedure that accepts a set of features and produces a class label for them.
Classification: to train a classifier from the data that can be used to predict the classes of new data records.
2
Classification
Three classes (figure).
3
What "learning"/"training" means
Input:
a data set D
a task T, e.g. classification
a performance measure P
A machine is said to learn/train from D to perform the task T if, after learning, it performs the task better than it did without learning, as measured by P.
Performance measure P: consider a classifier f.
Loss function l(f(x), y): the cost of a sample x with true label y being labelled by f as f(x).
The minimum loss is attained when the predicted label is correct, e.g. the 0-1 loss l(f(x), y) = 0 if f(x) = y and 1 otherwise.
4
Training error, testing error and overfitting
Define:
Training error: the empirical risk over the training samples.
Testing error: the expected risk over unseen samples.
With only training samples available, we can train a classifier to minimize the training error. However, what we really want is to minimize the testing error.
A smaller training error does not imply a smaller testing error.
Generalization error: the difference between the training error and the testing error.
Overfitting: in classification, the phenomenon that causes a large generalization error is called overfitting.
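Written out (a sketch; the notation for the loss l, the classifier f and the number of training samples N is assumed rather than taken from the slides):
\hat{R}_{train}(f) = \frac{1}{N} \sum_{i=1}^{N} l(f(x_i), y_i)   (training error: average loss on the N training samples)
R_{test}(f) = E_{(x,y)}[ l(f(x), y) ]   (testing error: expected loss on new samples)
generalization error = R_{test}(f) - \hat{R}_{train}(f)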
5
Cross-validation for evaluating generalization error
Split the original dataset into K different subsets (folds).
Retain 1 fold as the test set, use the other K-1 folds to train the classifier, and calculate the error rate: the percentage of wrong labels on the test set.
After K runs in total, calculate the average error rate, which gives us an idea of how well our model generalizes.
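A minimal scikit-learn sketch of this procedure (the 5 folds, the linear SVM and the iris data are illustrative choices, not prescribed by the slides):

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)                            # any labelled dataset works here
scores = cross_val_score(SVC(kernel="linear"), X, y, cv=5)   # accuracy on each of the 5 held-out folds
error_rates = 1.0 - scores                                   # error rate per fold
print("average cross-validation error:", error_rates.mean())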
6
Learning classifier
7
Support Vector Machine: a linear classifier that maximizes the margin
A linear classifier defines a hyperplane as the decision boundary.
For example, in the 2-D case, the boundary is a line.
The classifier: f(x) = \operatorname{sign}(w^\top x + b)
w: the weight vector
b: the bias
For any x: class "+1" if w^\top x + b > 0, class "-1" if w^\top x + b < 0, and uncertain if w^\top x + b = 0.
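A NumPy sketch of the decision rule (the weight vector, bias and input below are made-up example values):

import numpy as np

w = np.array([2.0, -1.0])          # weight vector (example values)
b = 0.5                            # bias (example value)
x = np.array([1.0, 3.0])           # input feature vector

score = np.dot(w, x) + b           # w^T x + b
label = "+1" if score > 0 else ("-1" if score < 0 else "uncertain")
print(score, label)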
8
Binary linear classifier with maximum margin
Which classifier to use?
9
Margin of a linear classifier
10
Support vector machine (SVM) and its dual form
SVM learns the linear classifier with maximum margin:
\min_{w,b} \tfrac{1}{2}\|w\|^2 \quad \text{subject to} \quad y_i (w^\top x_i + b) \ge 1, \; i = 1, \dots, N
Equivalently, in its dual form:
\max_{\alpha} \sum_i \alpha_i - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, x_i^\top x_j \quad \text{subject to} \quad \alpha_i \ge 0, \; \sum_i \alpha_i y_i = 0
Its classifier is f(x) = \operatorname{sign}\big( \sum_i \alpha_i y_i \, x_i^\top x + b \big).
11
Multi-class classifier
One-versus-all
Train K binary classifiers, one for each class against all other classes.
The predicted class is the class of the most confident classifier.
One-versus-one
Train K(K-1)/2 classifiers, each discriminating between a pair of classes.
Select the final classification based on the outputs of the binary classifiers, e.g. choose the class with a majority vote among all the classifiers.
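A minimal scikit-learn sketch of both strategies (the iris data and the linear SVMs are illustrative assumptions):

from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# one-versus-all: one binary classifier per class, most confident one wins
ova = OneVsRestClassifier(SVC(kernel="linear")).fit(X, y)

# one-versus-one: one classifier per pair of classes, majority vote at prediction time
ovo = OneVsOneClassifier(SVC(kernel="linear")).fit(X, y)

print(ova.predict(X[:5]), ovo.predict(X[:5]))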
12
Non-linear classifier
13
Feature map
Find a mapping \phi that maps the feature vector x to a higher-dimensional space: x \mapsto \phi(x).
Learning: train the linear SVM on the mapped features \phi(x_i) in place of x_i.
14
Kernel trick for the dual classifier
Recall that the dual form involves the data only through inner products x_i^\top x_j, so after a feature map \phi only \phi(x_i)^\top \phi(x_j) is needed.
The function k(x, x') = \phi(x)^\top \phi(x') is called a kernel.
Dual classifier in kernel space
Classifier: f(x) = \operatorname{sign}\big( \sum_i \alpha_i y_i \, k(x_i, x) + b \big)
Learning: \max_{\alpha} \sum_i \alpha_i - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, k(x_i, x_j)
subject to \alpha_i \ge 0, \; \sum_i \alpha_i y_i = 0
Examples
Linear kernel: k(x, x') = x^\top x'
Gaussian kernel: k(x, x') = \exp\big( -\|x - x'\|^2 / (2\sigma^2) \big)
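A NumPy sketch of evaluating the Gaussian kernel on a few points (the bandwidth sigma = 1.0 and the points are arbitrary examples):

import numpy as np

def gaussian_kernel(X1, X2, sigma=1.0):
    # k(x, x') = exp(-||x - x'||^2 / (2 sigma^2)), evaluated for all pairs of rows
    sq_dists = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
K = gaussian_kernel(X, X)          # 3x3 kernel (Gram) matrix; diagonal entries are 1
print(K)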
15
Practical Gaussian SVM with regularization parameter C
Classifier: f(x) = \operatorname{sign}\big( \sum_i \alpha_i y_i \, k(x_i, x) + b \big)
Learning: \max_{\alpha} \sum_i \alpha_i - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, k(x_i, x_j)
subject to 0 \le \alpha_i \le C, \; \sum_i \alpha_i y_i = 0
Two parameters: the regularization parameter C and the kernel width \sigma
Smaller value of C: larger robustness to outliers (more margin violations tolerated)
Larger value of \sigma: smoother decision boundary
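In scikit-learn these two knobs appear as C and gamma, with gamma = 1 / (2 sigma^2) under the kernel convention above (the data and parameter values below are arbitrary examples):

from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)   # toy non-linearly separable data

sigma = 0.5
clf = SVC(kernel="rbf", C=10.0, gamma=1.0 / (2.0 * sigma ** 2))
clf.fit(X, y)
print("training accuracy:", clf.score(X, y))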
16
Examples (figures)
17-19
Summary of kernel method
Classifiers can be learned for high-dimensional feature spaces without actually having to map the points into the high-dimensional space
Data may be linearly separable in the high dimensional space, but not linearly separable in
the original feature space
Kernels can be used for an SVM because of the scalar product in the dual form, but can also
be used elsewhere – they are not tied to the SVM formalism
20
SVM from scikit-learn (a machine learning package)
import numpy as np                 # numerical arrays
import matplotlib.pyplot as plt    # plotting
from sklearn import svm            # SVM classifiers (svm.SVC, svm.LinearSVC, ...)
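A minimal end-to-end sketch continuing from these imports (the toy data, parameters and plot are illustrative, not taken from the slides):

from sklearn.datasets import make_blobs

# toy 2-class, 2-D data so the decision boundary and support vectors can be visualized
X, y = make_blobs(n_samples=100, centers=2, random_state=0)

clf = svm.SVC(kernel="linear", C=1.0)
clf.fit(X, y)

# plot the points, and circle the support vectors
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.coolwarm)
plt.scatter(clf.support_vectors_[:, 0], clf.support_vectors_[:, 1],
            s=100, facecolors="none", edgecolors="k")
plt.show()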
21
Visual classification
Visual categorization: assign an input image a label from a pre-defined set
23
Challenges: viewpoint variations
24
Challenges: illumination variations
25
Challenges: deformations
26
Challenges: occlusion
27
Challenges: background clutter
28
Challenges: intra-class variations
29
Basics of visual classification
30
Representation: Building feature vector
Basic principle
Expose inter-class (between-class) variation
Suppress intra-class (within-class) variation
Invariance to environmental changes, e.g. lighting, viewpoint
31
Hand-crafting good image features
Detect Interest Points
DoG detector in SIFT; Harris corner detector; dense/random sampling: taking every k-th pixel (or random pixels)
32
General points for descriptors
35
Origin: Bag of Words for document representation
Orderless document representation: frequencies of words from a dictionary (Salton & McGill, 1983)
36
Pipeline of Bag of Words (BoW)
37
Feature extraction
Sampling strategy
38
Codebook learning: building visual vocabulary
Generate the dictionary (codebook) with clustering.
Use a clustering technique, such as K-means, to generate the codebook, i.e. the visual vocabulary.
39
Recap of K-means
Clustering: dividing data points into a number of groups such that data points in the same group are more similar to each other than to data points in other groups.
40
Recap of K-means
An algorithm for partitioning N data points into K disjoint subsets so as to minimize the within-cluster sum-of-squares criterion \sum_{k=1}^{K} \sum_{x_i \in S_k} \|x_i - \mu_k\|^2, where \mu_k is the centroid of cluster S_k.
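A scikit-learn sketch of codebook learning (random vectors stand in for real local descriptors, and the codebook size of 100 is an arbitrary example):

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
descriptors = rng.normal(size=(5000, 128))     # stand-in for 128-D SIFT descriptors pooled from training images

kmeans = KMeans(n_clusters=100, n_init=10, random_state=0).fit(descriptors)
codebook = kmeans.cluster_centers_             # 100 visual words, each a 128-D codevector
print(codebook.shape)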
42
cont'd
After using clustering to generate multiple centers:
Each cluster center becomes a codevector == a visual word.
The whole set of cluster centers is a codebook == the visual vocabulary.
The codebook can be learned on a separate training set.
The codebook can be used for quantizing features.
For example, the feature vector is the occurrence count of each visual word, i.e. the histogram of visual words normalized to sum to 1.
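Continuing the sketch above (reusing kmeans and rng from the previous snippet), an image's descriptors are quantized against the codebook and turned into a normalized histogram:

descriptors_img = rng.normal(size=(300, 128))        # local descriptors of one image (stand-in data)
words = kmeans.predict(descriptors_img)              # nearest visual word for each descriptor
hist, _ = np.histogram(words, bins=np.arange(101))   # occurrences of each of the 100 visual words
bow = hist / hist.sum()                              # BoW feature vector, normalized to sum to 1
print(bow.shape, bow.sum())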
44
Image representation
45
Image classification using the BoW histogram
Histogram of visual words: summarize the entire image based on the occurrences of visual words.
Learn a classifier, e.g. SVM or kNN, using BoW histogram based features.
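A sketch of this final step, assuming one BoW histogram per training image has already been computed as above (the histograms and labels below are random placeholders):

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
bow_features = rng.dirichlet(np.ones(100), size=50)   # 50 images x 100-bin normalized BoW histograms (placeholder)
labels = rng.integers(0, 3, size=50)                  # 3 image classes (placeholder)

clf = SVC(kernel="rbf", C=10.0, gamma="scale").fit(bow_features, labels)
print(clf.predict(bow_features[:5]))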
46
Object Detection
Assigning a label and a bounding box to one or many objects in the image
Two-stage approach
Locating the object in the image
Classifying the object in the image
47
Detection: A case study on face detection
Face detection is the first step in automated face recognition
There are many cues for face detection
skin color
facial/head shape
facial appearance
Basic procedure
An image is scanned at all possible locations and at many possible scales of the bounding box.
Face detection is posed as classifying the pattern in each bounding box as either face or non-face.
The face/non-face classifier is learned from face and non-face training examples using machine learning methods.
The first challenge: how to efficiently calculate the features for each bounding box?
The total number of possible bounding boxes for an image of size W x H with 5 possible box lengths is on the order of 5 * W * H, i.e. millions for a typical image.
48
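A quick back-of-the-envelope check (the 640 x 480 image size is an assumed example, not a figure from the slides):

W, H = 640, 480                   # assumed image size
num_scales = 5                    # number of box lengths, as stated above
num_boxes = num_scales * W * H    # roughly one box per pixel position, per scale
print(num_boxes)                  # 1,536,000 -> on the order of millions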
Haar-type image features with fast computation
49
Fast computing via Integral image
For each bounding box, one needs to generate a feature vector with sufficient discrimination.
Suppose the feature vector is the collection of the responses of the bounding box to different Haar-type filters.
For a bounding box, there are about 160,000 Haar-type box filters with varying size, aspect ratio and orientation.
It is critical to have an efficient computational scheme to compute the responses of a large number of different Haar-type filters.
The solution: computation via the integral image.
50
Integral image based computation
Pre-compute a look-up table (the integral image) II(x, y) = \sum_{x' \le x, \, y' \le y} I(x', y') for all possible (x, y).
Computing the sum of the pixel values in a rectangular region: let A, B, C, D be the integral-image values at the top-left, top-right, bottom-left and bottom-right corners of the rectangle.
Then the sum of pixel values within the rectangle can be computed as D - B - C + A.
Only 4 lookups and 3 additions/subtractions are required to calculate the sum over a rectangle of any size, given the integral-image table.
Then the output of any Haar-type box filter can be computed efficiently as a combination of a few such rectangle sums.
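A NumPy sketch of this computation (the 5 x 5 image and rectangle coordinates are made-up examples):

import numpy as np

img = np.arange(25, dtype=np.int64).reshape(5, 5)    # toy 5x5 image
ii = img.cumsum(axis=0).cumsum(axis=1)               # integral image: ii[y, x] = sum of img[:y+1, :x+1]

def rect_sum(ii, y0, x0, y1, x1):
    # sum of img[y0:y1+1, x0:x1+1] using at most 4 lookups (D - B - C + A)
    total = ii[y1, x1]
    if y0 > 0:
        total -= ii[y0 - 1, x1]
    if x0 > 0:
        total -= ii[y1, x0 - 1]
    if y0 > 0 and x0 > 0:
        total += ii[y0 - 1, x0 - 1]
    return total

print(rect_sum(ii, 1, 1, 3, 3), img[1:4, 1:4].sum())  # both print 108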
51
Feature selection
For a detection region, the number of possible Haar-type (rectangle) features is about 160,000 (as noted above).
52
Object detection using object classification
53