Visual Information Interpretation

Visual classification and Object detection

Ji Hui

Basics on Classification
Data: a set of data records, each described by
a set of features (attributes), and
a class: each example is labelled with a pre-defined class.
Classifier: a procedure that accepts a set of features and
produces a class label for them.
Classification: to train a classifier from the data that can be
used to predict the classes of new data records.


Classification: assign input to one or more classes via a decision rule

A decision rule divides the input space into decision regions, separated by decision
boundaries.

Example: three classes.

What "learning/training" means
A data set $D$.
A task $T$, e.g. classification.
A performance measure $P$.
A machinery is said to learn from $D$ to perform the task $T$ if, after learning, the
machinery performs better than without learning, as measured by $P$.
Performance measure $P$: Consider a classifier $f: x \mapsto f(x)$.
Loss function: $L(y, f(x))$
$L(y, f(x))$: the cost when a sample $x$ with label $y$ is labelled by $f$ as $f(x)$.
$L(y, y)$: the minimum loss is attained with the correct label, e.g. $L(y, y) = 0$.

Training error, testing error and overfitting
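Training error: empirical risk of a classifier $f$ under a loss function $L$ for the $N$ training samples $\{(x_i, y_i)\}_{i=1}^{N}$:
$$R_{\text{train}}(f) = \frac{1}{N} \sum_{i=1}^{N} L(y_i, f(x_i))$$
Testing error: empirical risk for the $M$ unseen testing samples $\{(x'_j, y'_j)\}_{j=1}^{M}$:
$$R_{\text{test}}(f) = \frac{1}{M} \sum_{j=1}^{M} L(y'_j, f(x'_j))$$
With only training samples, we can train a classifier to minimize the training error. However,
what we really want is to minimize the testing error.
A smaller training error does not imply a smaller testing error.
Generalization error: the difference between the training error and the testing error.
Overfitting: the phenomenon that causes a large generalization error is called overfitting.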
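
As a minimal sketch of these definitions, assuming the 0-1 loss (0 for a correct label, 1 otherwise) and a hypothetical threshold classifier on made-up samples, the training and testing errors could be computed as follows:

```python
import numpy as np

def zero_one_loss(y_true, y_pred):
    """0-1 loss per sample: 0 for a correct label, 1 for a wrong one."""
    return (np.asarray(y_true) != np.asarray(y_pred)).astype(float)

def empirical_risk(classifier, X, y):
    """Average loss of `classifier` over the samples (X, y)."""
    return float(np.mean(zero_one_loss(y, classifier(X))))

def threshold_classifier(X):
    """Hypothetical toy classifier: the sign of the first feature."""
    return np.where(np.asarray(X)[:, 0] > 0.0, 1, -1)

# Made-up training and testing samples (illustrative only).
X_train = np.array([[1.0], [2.0], [-1.0], [-2.0]])
y_train = np.array([1, 1, -1, -1])
X_test = np.array([[0.5], [-0.5], [1.5], [-1.5]])
y_test = np.array([1, 1, 1, -1])

train_error = empirical_risk(threshold_classifier, X_train, y_train)
test_error = empirical_risk(threshold_classifier, X_test, y_test)
generalization_gap = test_error - train_error
```

Here the classifier fits the training samples perfectly but still mislabels one of the four testing samples, so a zero training error does not give a zero testing error.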
Cross-validation for evaluating generalization error
Split the original dataset into $k$ different subsets (folds).
Retain 1 fold as the test set, use the other $k-1$ folds to train the classifier, and calculate
the error rate: the percentage of wrong labels on the test set.
Repeat so that each fold serves as the test set once.
After $k$ runs in total, calculate the average error rate, which gives us an idea of how well our
model generalizes.
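
The cross-validation procedure can be sketched as below. The nearest-centroid classifier and the synthetic two-cluster data are stand-ins chosen only to keep the example self-contained; any training routine that returns a prediction function would fit the same loop.

```python
import numpy as np

def k_fold_error(X, y, train_fn, k=5, seed=0):
    """Average test error rate over k folds.

    train_fn(X_train, y_train) must return a function predict(X) -> labels.
    """
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(X))       # shuffle before splitting
    folds = np.array_split(indices, k)      # k roughly equal folds
    error_rates = []
    for i in range(k):
        test_idx = folds[i]                 # 1 fold held out for testing
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        predict = train_fn(X[train_idx], y[train_idx])
        error_rates.append(np.mean(predict(X[test_idx]) != y[test_idx]))
    return float(np.mean(error_rates))

def train_nearest_centroid(X_train, y_train):
    """Hypothetical stand-in classifier: predict the class of the closest class mean."""
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])

    def predict(X):
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        return classes[np.argmin(d, axis=1)]

    return predict

# Synthetic data: two well-separated Gaussian clusters (an assumption).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2.0, 1.0, (50, 2)), rng.normal(2.0, 1.0, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

cv_error = k_fold_error(X, y, train_nearest_centroid, k=5)
```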

Learning classifier

Support vector machine: a linear classifier that maximizes the margin
A linear classifier defines a hyperplane as the decision boundary.
For example, in the 2D case, the boundary is a line.

The classifier: $f(x) = w^\top x + b$
$w$: the weight vector
$b$: the bias
For any $x$: class "+1" if $f(x) > 0$, class "-1" if $f(x) < 0$, uncertain if $f(x) = 0$.
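
A short sketch of this decision rule, with a hypothetical weight vector and bias chosen by hand rather than learned:

```python
import numpy as np

def linear_classify(X, w, b):
    """Return +1, -1, or 0 (uncertain, on the boundary) for each row of X."""
    scores = X @ w + b                 # f(x) = w^T x + b for every sample
    return np.sign(scores).astype(int)

w = np.array([1.0, -1.0])   # hypothetical weight vector
b = 0.5                     # hypothetical bias
X = np.array([[2.0, 0.0],   # f = 2.5  -> class +1
              [0.0, 2.0],   # f = -1.5 -> class -1
              [0.0, 0.5]])  # f = 0    -> on the decision boundary
labels = linear_classify(X, w, b)
```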
Binary linear classifier with maximum margin
Which classifier to use?

Classifier with large margin is likely to be more robust to perturbations

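Margin of a linear classifier

Choose $w$ and $b$ of a linear classifier $f(x) = w^\top x + b$ s.t.

$$w^\top x + b \geq 1 \ \text{for } x \in C_{+}, \qquad w^\top x + b \leq -1 \ \text{for } x \in C_{-},$$

where $C_{+}$ ($C_{-}$) is the subset of training samples in category "+1" ("-1").

The margin of the linear classifier is defined by

$$\text{margin} = \frac{2}{\|w\|}$$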
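
To make the normalization concrete, the sketch below rescales a separating pair $(w, b)$ so that the closest samples satisfy $|w^\top x + b| = 1$ and then evaluates $2/\|w\|$. The data and the initial $(w, b)$ are made up for illustration.

```python
import numpy as np

def rescale_to_canonical(w, b, X, y):
    """Rescale (w, b) so that min_i y_i (w^T x_i + b) = 1.

    Requires that (w, b) already classifies every sample correctly.
    """
    m = np.min(y * (X @ w + b))
    if m <= 0:
        raise ValueError("(w, b) does not separate the data")
    return w / m, b / m

def margin(w):
    """Margin 2 / ||w|| of a classifier in the canonical scaling."""
    return 2.0 / np.linalg.norm(w)

# Made-up separable data: class +1 on the right, class -1 on the left.
X = np.array([[2.0, 0.0], [3.0, 1.0], [-2.0, 0.0], [-3.0, -1.0]])
y = np.array([1, 1, -1, -1])

w, b = rescale_to_canonical(np.array([4.0, 0.0]), 0.0, X, y)
m = margin(w)
```

The closest samples lie at $x_1 = \pm 2$, so the gap between the two classes along the direction of $w$ is 4, which matches the computed margin.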

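Support vector machine (SVM) and its dual form
SVM learns a linear classifier with maximum margin from training samples $(x_i, y_i)$, $y_i \in \{+1, -1\}$, $i = 1, \dots, N$:

$$\min_{w, b} \ \frac{1}{2}\|w\|^2 \quad \text{subject to} \quad y_i (w^\top x_i + b) \geq 1, \ i = 1, \dots, N$$

Dual form of SVM

$$\max_{\alpha_i \geq 0} \ \sum_{i} \alpha_i - \frac{1}{2} \sum_{i, j} \alpha_i \alpha_j y_i y_j \, x_i^\top x_j \quad \text{subject to} \quad \sum_{i} \alpha_i y_i = 0$$

Its classifier is

$$f(x) = \sum_{i} \alpha_i y_i \, x_i^\top x + b$$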
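
A tiny worked example may help connect the two forms. It assumes a toy training set with one sample per class, for which the dual solution can be derived by hand, and uses the standard relation $w = \sum_i \alpha_i y_i x_i$ to recover the primal classifier.

```python
import numpy as np

# Toy training set (an illustrative assumption): one sample per class.
X = np.array([[1.0, 0.0], [-1.0, 0.0]])
y = np.array([1.0, -1.0])

def dual_objective(alpha, X, y):
    """sum_i alpha_i - 1/2 sum_{i,j} alpha_i alpha_j y_i y_j x_i^T x_j"""
    Z = y[:, None] * X            # rows are y_i * x_i
    G = Z @ Z.T                   # G[i, j] = y_i y_j x_i^T x_j
    return float(alpha.sum() - 0.5 * alpha @ G @ alpha)

# Hand-derived dual solution for this toy set: alpha_1 = alpha_2 = 0.5,
# which satisfies the constraint sum_i alpha_i y_i = 0.
alpha = np.array([0.5, 0.5])

# Recover the primal variables: w = sum_i alpha_i y_i x_i, and b from a
# support vector, for which y_i (w^T x_i + b) = 1.
w = (alpha * y) @ X
b = y[0] - w @ X[0]

scores = X @ w + b                          # f(x_i) on the training samples
primal_objective = 0.5 * float(w @ w)       # (1/2) ||w||^2
dual_value = dual_objective(alpha, X, y)
```

For this toy set the primal and dual objective values coincide, and both samples sit exactly on the margin.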

Multi-class classifier

One-vs-rest: train $K$ binary classifiers, one for each class against all other classes.
The predicted class is the class of the most confident classifier.
One-vs-one: train $K(K-1)/2$ classifiers, each discriminating between a pair of classes.
Select the final classification based on the outputs of the binary classifiers, e.g., choose
the class with a majority vote among all the classifiers.
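
Both combination rules can be sketched as follows. The binary classifier outputs are hypothetical numbers standing in for trained classifiers, with $K = 3$ classes and two samples.

```python
import numpy as np
from itertools import combinations

def one_vs_rest_predict(scores):
    """scores[i, k]: confidence of the 'class k vs. rest' classifier on sample i."""
    return np.argmax(scores, axis=1)

def one_vs_one_predict(pairwise_winner, n_classes):
    """pairwise_winner[(a, b)]: winning class (a or b) for each sample."""
    n_samples = len(next(iter(pairwise_winner.values())))
    votes = np.zeros((n_samples, n_classes), dtype=int)
    for pair in combinations(range(n_classes), 2):   # K(K-1)/2 pairs
        winners = pairwise_winner[pair]
        votes[np.arange(n_samples), winners] += 1    # one vote per classifier
    return np.argmax(votes, axis=1)                  # majority vote

# Hypothetical classifier outputs for 2 samples and K = 3 classes.
ovr_scores = np.array([[0.2, 1.5, -0.3],
                       [0.9, -1.0, 0.1]])
ovr_labels = one_vs_rest_predict(ovr_scores)

ovo_winners = {(0, 1): np.array([1, 0]),
               (0, 2): np.array([0, 2]),
               (1, 2): np.array([1, 2])}
ovo_labels = one_vs_one_predict(ovo_winners, n_classes=3)
```

Note that `np.argmax` breaks ties in favour of the lowest class index; a real system would need an explicit tie-breaking rule.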

Non-linear classifier

Solution: mapping data to higher dimension

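Feature map
Find a mapping $\Phi$ to map a feature vector $x$ to a higher-dimensional space:
$$\Phi: x \in \mathbb{R}^{d} \mapsto \Phi(x) \in \mathbb{R}^{D}, \quad D > d$$

Linear SVM in higher-dimensional space

Classifier: with $w \in \mathbb{R}^{D}$:
$$f(x) = w^\top \Phi(x) + b$$

Dual form of SVM in higher-dimensional space
With training samples $(x_i, y_i)$, $y_i \in \{+1, -1\}$, and dual variables $\alpha_i$:
$$f(x) = \sum_{i} \alpha_i y_i \, \Phi(x_i)^\top \Phi(x) + b$$
$$\max_{\alpha_i \geq 0} \ \sum_{i} \alpha_i - \frac{1}{2} \sum_{i, j} \alpha_i \alpha_j y_i y_j \, \Phi(x_i)^\top \Phi(x_j) \quad \text{subject to} \quad \sum_{i} \alpha_i y_i = 0$$

Kernel trick for dual classifier
Recall that $\Phi$ appears in the dual form only through inner products $\Phi(x_i)^\top \Phi(x_j)$.
The function $k(x, x') = \Phi(x)^\top \Phi(x')$ is called a kernel.
Dual classifier in kernel space
$$f(x) = \sum_{i} \alpha_i y_i \, k(x_i, x) + b$$
$$\max_{\alpha_i \geq 0} \ \sum_{i} \alpha_i - \frac{1}{2} \sum_{i, j} \alpha_i \alpha_j y_i y_j \, k(x_i, x_j)$$
subject to $\sum_{i} \alpha_i y_i = 0$
Linear kernel: $k(x, x') = x^\top x'$
Gaussian kernel: $k(x, x') = \exp\left(-\|x - x'\|^2 / (2\sigma^2)\right)$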
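
The sketch below evaluates both kernels and the kernelized dual classifier $f(x) = \sum_i \alpha_i y_i k(x_i, x) + b$. It assumes the Gaussian kernel is parameterized by a width $\sigma$, and the dual variables are hypothetical values chosen by hand rather than the result of solving the optimization problem.

```python
import numpy as np

def linear_kernel(A, B):
    """k(x, x') = x^T x' for every pair of rows of A and B."""
    return A @ B.T

def gaussian_kernel(A, B, sigma=1.0):
    """k(x, x') = exp(-||x - x'||^2 / (2 sigma^2)) for every pair of rows."""
    sq = (np.sum(A ** 2, axis=1)[:, None]
          + np.sum(B ** 2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    sq = np.maximum(sq, 0.0)       # guard against tiny negative round-off
    return np.exp(-sq / (2.0 * sigma ** 2))

def dual_decision(X_new, X_train, y_train, alpha, b, kernel):
    """f(x) = sum_i alpha_i y_i k(x_i, x) + b for each row x of X_new."""
    return kernel(X_new, X_train) @ (alpha * y_train) + b

X_train = np.array([[0.0, 0.0], [2.0, 0.0]])
y_train = np.array([1.0, -1.0])
alpha = np.array([1.0, 1.0])       # hypothetical dual variables
b = 0.0                            # hypothetical bias

K = gaussian_kernel(X_train, X_train, sigma=1.0)
X_new = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
f = dual_decision(X_new, X_train, y_train, alpha, b, gaussian_kernel)
```

The feature map $\Phi$ never appears in the code: only kernel values between pairs of samples are needed.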

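Practical Gaussian SVM with regularization parameter $C$

$$f(x) = \sum_{i} \alpha_i y_i \exp\left(-\|x_i - x\|^2 / (2\sigma^2)\right) + b$$

$$\max_{\alpha} \ \sum_{i} \alpha_i - \frac{1}{2} \sum_{i, j} \alpha_i \alpha_j y_i y_j \exp\left(-\|x_i - x_j\|^2 / (2\sigma^2)\right)$$
subject to $0 \leq \alpha_i \leq C$ and $\sum_{i} \alpha_i y_i = 0$
Two parameters
$C$: a smaller value gives larger robustness to outliers
$\sigma$: a larger value gives a smoother decision boundary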




Summary of kernel method
Classifiers can be learned for high-dimensional feature spaces, without actually having to
map the points into the high-dimensional space
Data may be linearly separable in the high dimensional space, but not linearly separable in
the original feature space
Kernels can be used for an SVM because of the scalar product in the dual form, but can also
be used elsewhere – they are not tied to the SVM formalism

SVM from scikit-learn (A machine learning package)
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm

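# Simulate a binary classification problem with 2D feature vectors
# (the labels are random, so the data only demonstrates the API)
xx, yy = np.meshgrid(np.linspace(-3, 3, 500), np.linspace(-3, 3, 500))
X = np.random.randn(300, 2)
Y = np.random.randint(2, size=300)

# Fit a non-linear SVM with a Gaussian (RBF) kernel;
# scikit-learn's kernel is exp(-gamma * ||x - x'||^2), and gamma='auto' uses 1 / n_features
clf = svm.SVC(C=1.0, kernel='rbf', gamma='auto')
clf.fit(X, Y)

# Predict the labels of two vectors
print(clf.predict([[0.0, 0.0], [1.0, -1.0]]))

# Evaluate the decision function on the grid and plot the decision boundary
Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
plt.contour(xx, yy, Z, levels=[0])
plt.scatter(X[:, 0], X[:, 1], c=Y)
plt.show()
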

Visual classification
Visual categorization: assign an input image a label from a pre-defined set

Categorization usually does not assume object detection

Labels are granularity-based
The more specific the labels are, the easier the task is for the computer:
more specific labels mean less intra-class variation and more visual cues.
Number of categories
How many object categories in general visual categorization?
About 10,000 to 30,000.

Challenges: viewpoint variations

Challenges: illumination variations

Challenges: deformations

Challenges: occlusion

Challenges: background clutter

Challenges: intra-class variations

Basics of visual classification

Representation: building features

How to represent an object
Learning to classify
Which classifier?
How to learn the classifier, given training data
How the classifier is to be used on real data

Representation: Building feature vector

Basic principle
Expose inter-class (between-class) variation
Suppress intra-class (within-class) variation
Invariance to environmental changes, e.g. light, viewpoint.
Hand-crafting good image features
Detect Interest Points
DoG detector in SIFT; Harris; Dense/random: (randomly) taking every $n$-th pixel

Compute Descriptor around each interest point

SIFT, Histogram of Oriented Gradients (HoG)

General points for descriptors

Exact intensity values are NOT important

Image edges or contours are important
Image texture is important
Exact feature location is NOT important
Discriminative models
Modeling the essential content of an image by a feature vector with sufficient discriminative power

Building discriminative features of images

Training the classifier using the input of discriminative features
SVM, $k$-nearest neighbors, and others.
Discriminative model: Bag of words

Origin: Bag of Words for document representation
Orderless document representation: frequencies of words from a dictionary (Salton & McGill)

Pipeline of Bag of Words (BoW)

Feature extraction
Sampling strategy

Computing image features, e.g., SIFT

Codebook learning: building visual vocabulary
Generating dictionary (codebook) with clustering

Using the clustering technique, like K-means, to generate the codebook: visual vocabulary.

Recap of K-means
Clustering: Dividing data points into a number of groups such that data points in the same
groups are more similar to other data points in the same group than those in other groups.

K-means: a clustering technique

Partitioning $N$ data points into $K$ disjoint subsets

Recap of K-means
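An algorithm for partitioning $N$ data points $x_1, \dots, x_N$ into $K$ disjoint subsets $S_1, \dots, S_K$ so as to minimize
the sum-of-squares criterion
$$J = \sum_{j=1}^{K} \sum_{n \in S_j} \|x_n - \mu_j\|^2$$
$K$: the number of clusters
$S_j$: the index set of each cluster
$\mu_j$: the geometric centroid of each cluster
Basic procedure
i. Initialize the cluster centroids $\mu_1, \dots, \mu_K$ randomly
ii. Repeat until convergence:
a. For every $n$, set
$$c_n = \arg\min_{j} \|x_n - \mu_j\|^2$$
b. For every $j$, set
$$\mu_j = \frac{1}{|S_j|} \sum_{n \in S_j} x_n, \quad S_j = \{n : c_n = j\}$$
The metric for defining the distance is key, and it is domain-dependent.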
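
The basic procedure can be sketched in a few lines. This version assumes the Euclidean distance, initializes the centroids by picking $K$ distinct data points at random, and stops when the assignments no longer change; the six 2D points are made up for illustration.

```python
import numpy as np

def kmeans(X, K, n_iter=100, seed=0):
    """Plain K-means: returns (centroids, assignments)."""
    rng = np.random.default_rng(seed)
    # i. Initialize centroids with K distinct data points chosen at random.
    centroids = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    assignments = np.full(len(X), -1)
    for _ in range(n_iter):
        # ii.a. Assign every point to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_assignments = np.argmin(dists, axis=1)
        if np.array_equal(new_assignments, assignments):
            break                                   # converged
        assignments = new_assignments
        # ii.b. Move every centroid to the mean of its assigned points.
        for j in range(K):
            members = X[assignments == j]
            if len(members) > 0:                    # keep old centroid if empty
                centroids[j] = members.mean(axis=0)
    return centroids, assignments

def sum_of_squares(X, centroids, assignments):
    """The criterion J: squared distance of each point to its centroid."""
    return float(np.sum((X - centroids[assignments]) ** 2))

# Made-up data: two tight groups of three points each.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
centroids, assignments = kmeans(X, K=2)
J = sum_of_squares(X, centroids, assignments)
```

K-means only finds a local minimum of the criterion, so on less clearly separated data the result can depend on the initialization.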

Demonstration of K-means

After using clustering to generate multiple centers
Each cluster center becomes a codevector == visual word
the whole set of cluster centers is a codebook == visual vocabulary
Codebook can be learned on separate training set
The codebook can be used for quantizing features: each feature is assigned to its nearest visual word.
For example, the feature vector of an image is the occurrence count of each visual word, i.e. the histogram of
visual words, normalized to sum to 1.
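
A minimal sketch of this quantization step follows. The 2D codebook and descriptors are hypothetical placeholders (real SIFT descriptors are 128-dimensional), and nearest-word assignment uses the Euclidean distance.

```python
import numpy as np

def bow_histogram(descriptors, codebook):
    """Assign each descriptor to its nearest visual word and return the
    histogram of visual words, normalized to sum to 1."""
    dists = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
    words = np.argmin(dists, axis=1)                 # index of nearest word
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

# Hypothetical 3-word codebook and 4 local descriptors from one image.
codebook = np.array([[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]])
descriptors = np.array([[0.5, 0.2], [9.0, 9.5], [0.1, -0.3], [10.2, 10.1]])
hist = bow_histogram(descriptors, codebook)
```

The resulting vector has one entry per visual word regardless of how many descriptors the image produced, which is what lets images of different sizes share one classifier.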

How to choose vocabulary size?

Too small: visual words not representative of all patches
Too large: quantization artifacts, overfitting
Demonstration of codebook

Image representation

Image classification using BoW histogram
Histogram of visual words: summarize the entire image based on the occurrences of visual words

Learning a classifier, e.g. SVM or KNN, using BoW histogram based features

Object Detection

Assigning a label and a bounding box to one or many objects in the image

Two-stage approach
Locating the object in the image
Classifying the object in the image

Detection: A case study on face detection
Face detection is the first step in automated face recognition
There are many cues for face detection
skin color
facial/head shape
facial appearance
Basic procedure
An image is scanned at all possible locations and many possible scales of bounding box.
Face detection is posed as classifying the pattern in each bounding box as either face or
non-face.
The face/non-face classifier is learned from face and non-face training examples using
machine learning methods.
The first challenge: how to efficiently calculate the features for each bounding box?
With roughly one candidate box per pixel location and 5 possible box lengths, the total
number of possible bounding boxes of an image is in the millions.
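
To get a feel for the scale of this scan, the sketch below counts square sliding windows. The image size, the five box lengths, and the stride of one pixel are illustrative assumptions, not values from the lecture.

```python
def count_windows(height, width, box_sizes, stride=1):
    """Number of square boxes that fit in the image over all given side lengths."""
    total = 0
    for s in box_sizes:
        if s > height or s > width:
            continue                              # box does not fit at all
        n_rows = (height - s) // stride + 1       # vertical positions
        n_cols = (width - s) // stride + 1        # horizontal positions
        total += n_rows * n_cols
    return total

# Hypothetical setting: a 1000 x 1000 image and 5 box side lengths.
n_boxes = count_windows(1000, 1000, box_sizes=[24, 48, 96, 192, 384])
```

Under these assumptions the count is several million, so any per-box feature computation has to be very cheap.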
Haar-type image features with fast computation

Haar-type filters: value = (sum of pixel values in blue) - (sum of pixel values in pink).

Fast computing via Integral image
For each bounding box, one needs to generate a feature vector with sufficient discrimination.
Suppose the feature vector is the collection of the responses of the bounding box to
different Haar-type filters.
For a bounding box, there are about 160,000 Haar-type box filters with varying size, aspect
ratio, and orientation.
It is critical to have an efficient computational scheme to compute the responses for a large
number of different Haar-type filters.
The solution: computation via integral image

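[Definition] For an image $I$, its integral image $II$ is defined by
$$II(x, y) = \sum_{x' \leq x, \ y' \leq y} I(x', y')$$
Integral image based computation
Pre-compute a look-up table of $II(x, y)$ for all possible $(x, y)$.
Computing the sum of the pixel values in a rectangular region with top-left pixel $(x_1, y_1)$ and bottom-right pixel $(x_2, y_2)$:

The sum of pixel values within the rectangle can be computed as:
$$S = II(x_2, y_2) - II(x_1 - 1, y_2) - II(x_2, y_1 - 1) + II(x_1 - 1, y_1 - 1),$$
where $II$ is taken to be 0 whenever an index equals $-1$.
Only 4 lookups and three additions are required for calculating the sum over a rectangle of any size,
given the table of the integral image.
The output of a Haar-type box filter can then be computed efficiently.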
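
The look-up scheme can be sketched as below. For convenience the table is padded with a leading row and column of zeros, so `ii[r, c]` holds the sum of `img[:r, :c]`; this is an implementation choice that avoids special cases at the image border. The two-rectangle filter (left half minus right half) is one illustrative Haar-type filter, and which half plays the role of the "blue" region is an assumption.

```python
import numpy as np

def integral_image(img):
    """Padded integral image: ii[r, c] = sum of img[:r, :c]."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def rect_sum(ii, top, left, height, width):
    """Sum of img[top:top+height, left:left+width] using 4 lookups."""
    bottom, right = top + height, left + width
    return ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left]

def haar_two_rect_horizontal(ii, top, left, height, width):
    """Illustrative Haar-type filter: left half minus right half (even width)."""
    half = width // 2
    return (rect_sum(ii, top, left, height, half)
            - rect_sum(ii, top, left + half, height, half))

img = np.arange(16).reshape(4, 4)        # made-up 4 x 4 image
ii = integral_image(img)
s = rect_sum(ii, 1, 1, 2, 2)             # sum of the central 2 x 2 block
h = haar_two_rect_horizontal(ii, 0, 0, 4, 4)
```

Once `ii` is built, every rectangle sum costs the same four lookups regardless of the rectangle size, so a two-rectangle filter needs only a handful of operations.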

Feature selection
For a detection region, the number of possible Haar-type (rectangle) features is about 160,000.

It is impractical to compute the entire feature set.

The solution is to compose a feature vector with a small number of Haar-type features.
How to select which features are included in the feature vector of a bounding box?
Train a good classifier such that it can identify the "good"
rectangular features and the "bad" ones among all possible features.

Object detection using object classification

Calculating the feature of each bounding box

Training a classifier to categorize different objects


