

Visual Information Interpretation

Visual classification and Object detection

Ji Hui

1
Basics on Classification
Data: a set of data records, each described by a set of attributes and
a class: each example is labelled with a pre-defined class.
Classifier: a procedure that accepts a set of features and
produces a class label for them.
Classification: to train a classifier from the data that can be
used to predict the classes of new data records.

2
Classification

Classification: assign an input to one or more classes via a decision rule.

A decision rule divides the input space into decision regions, separated by decision
boundaries.

[Figure: an example with three classes]

3
What it means by "learning"/"training"
Input
a data set D
a task T, e.g. classification
a performance measure P
A machine is said to learn/train from D to perform the task T if, after learning, the
machine performs the task better than without learning, as measured by P.
Performance measure P: consider a classifier f:
Loss function L(y', y): the cost of a sample with true label y being labelled by f as y'.
The minimum loss is attained with the correct label, e.g. L(y, y) = 0.

4
Training error, testing error and overfitting
Define
Training error: empirical risk over the training samples.

Testing error: empirical risk over unseen testing samples.

With only training samples available, we can train a classifier to minimize the training error.
However, what we really want is to minimize the testing error.
Smaller training error does not imply smaller testing error.
Generalization error: the difference between training error and testing error.
Overfitting: the phenomenon that causes a large generalization error is called overfitting in
classification.
5
Cross-validation for evaluating generalization error
Split the original dataset into k different subsets (folds)
Retain 1 fold as the test set and use the other k-1 folds to train the classifier, then calculate
the error rate: the percentage of wrong labels on the test set.
After k runs in total, calculate the average error rate, which gives us an idea of how well our
model generalizes.
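A minimal sketch of this procedure with scikit-learn (the simulated data and the choice of k = 5 folds are assumptions for illustration):

import numpy as np
from sklearn import svm
from sklearn.model_selection import cross_val_score

# Simulated data: 300 samples with 2D features and binary labels (illustrative only)
rng = np.random.RandomState(0)
X = rng.randn(300, 2)
y = rng.randint(2, size=300)

# 5-fold cross-validation: each fold is held out once as the test set
clf = svm.SVC(kernel='rbf', gamma='auto')
scores = cross_val_score(clf, X, y, cv=5)

# The average error rate over the 5 runs estimates how well the model generalizes
print("error rate per fold:", 1.0 - scores)
print("average error rate:", np.mean(1.0 - scores))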

6
Learning classifier

7
Support vector machine: a linear classifier that maximizes the margin
A linear classifier defines a hyper-plane as the decision boundary.
For example, in the 2D case, the boundary is a line.

The classifier: f(x) = w^T x + b
w: the weight vector
b: the bias
For any x: class "+1" if w^T x + b > 0, class "-1" if w^T x + b < 0, and uncertain if w^T x + b = 0.
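A tiny worked example of this decision rule (the numbers are illustrative assumptions, not from the slides): with w = (1, -2) and b = 0.5, the point x = (1, 0) gives w^T x + b = 1 + 0 + 0.5 = 1.5 > 0, so it is assigned class "+1"; the point x = (0, 1) gives 0 - 2 + 0.5 = -1.5 < 0, so it is assigned class "-1".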
8
Binary linear classifier with maximum margin
Which classifier to use?

A classifier with a large margin is likely to be more robust to perturbations

9
Margin of linear classifier

Choose w and b of a linear classifier s.t.

w^T x_i + b >= 1 for x_i in C_+ and w^T x_i + b <= -1 for x_i in C_-,

where C_+ (C_-) is the subset of samples in category "+1" ("-1").


The margin of the linear classifier is defined by 2 / ||w||.
10
Support vector machine (SVM) and its dual form.
SVM learns a linear classifier with maximum margin:

    max_{w, b} 2 / ||w||   s.t.  y_i (w^T x_i + b) >= 1 for all i

Equivalently

    min_{w, b} (1/2) ||w||^2   s.t.  y_i (w^T x_i + b) >= 1 for all i

Dual form of SVM

    max_{alpha} sum_i alpha_i - (1/2) sum_{i,j} alpha_i alpha_j y_i y_j x_i^T x_j
    s.t.  alpha_i >= 0,  sum_i alpha_i y_i = 0

Its classifier is

    f(x) = sign( sum_i alpha_i y_i x_i^T x + b )
11
Multi-class classifier

One-versus-all
Train K binary classifiers, one for each class against all other classes.
The predicted class is the class of the most confident classifier.
One-versus-one
Train K(K-1)/2 classifiers, each discriminating between a pair of classes.
Select the final classification based on the outputs of the binary classifiers, e.g., choose
the class with a majority vote among all the classifiers.
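A minimal sketch of both strategies with scikit-learn (the simulated 3-class data is an assumption for illustration):

import numpy as np
from sklearn.svm import LinearSVC
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier

# Simulated 3-class problem with 2D feature vectors (illustrative only)
rng = np.random.RandomState(0)
X = rng.randn(300, 2)
y = rng.randint(3, size=300)

# One-versus-all: K = 3 binary classifiers, one per class against the rest
ova = OneVsRestClassifier(LinearSVC()).fit(X, y)

# One-versus-one: K(K-1)/2 = 3 pairwise classifiers, combined by majority vote
ovo = OneVsOneClassifier(LinearSVC()).fit(X, y)

print(ova.predict([[0.5, -0.2]]), ovo.predict([[0.5, -0.2]]))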

12
Non-linear classifier

Solution: map the data to a higher-dimensional space

13
Feature map
Find a mapping phi to map a feature vector x to a higher-dimensional space: x -> phi(x)

Linear SVM in the higher-dimensional space


Classifier: f(x) = sign( w^T phi(x) + b ), with w and b learned in the higher-dimensional space

Dual form of SVM in the higher-dimensional space


Classifier: f(x) = sign( sum_i alpha_i y_i phi(x_i)^T phi(x) + b )

Learning: max_{alpha} sum_i alpha_i - (1/2) sum_{i,j} alpha_i alpha_j y_i y_j phi(x_i)^T phi(x_j),
subject to alpha_i >= 0 and sum_i alpha_i y_i = 0

14
Kernel trick for dual classifier
Recall that the dual form only uses inner products phi(x_i)^T phi(x_j).
The function k(x_i, x_j) = phi(x_i)^T phi(x_j) is called a kernel.
Dual classifier in kernel space
Classifier: f(x) = sign( sum_i alpha_i y_i k(x_i, x) + b )
Learning: max_{alpha} sum_i alpha_i - (1/2) sum_{i,j} alpha_i alpha_j y_i y_j k(x_i, x_j)

subject to alpha_i >= 0 and sum_i alpha_i y_i = 0
Example
Linear kernel: k(x, z) = x^T z
Gaussian kernel: k(x, z) = exp( -||x - z||^2 / (2 sigma^2) )
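A small sketch that evaluates these two kernels on a pair of vectors (sigma = 1 is an assumed parameter value):

import numpy as np

def linear_kernel(x, z):
    # k(x, z) = x^T z
    return np.dot(x, z)

def gaussian_kernel(x, z, sigma=1.0):
    # k(x, z) = exp(-||x - z||^2 / (2 * sigma^2))
    return np.exp(-np.sum((x - z) ** 2) / (2.0 * sigma ** 2))

x = np.array([1.0, 0.0])
z = np.array([0.0, 1.0])
print(linear_kernel(x, z))    # 0.0
print(gaussian_kernel(x, z))  # exp(-1) ~ 0.37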

15
Practical Gaussian SVM with regularization parameter C

Classifier: f(x) = sign( sum_i alpha_i y_i k(x_i, x) + b ) with the Gaussian kernel k

Learning: max_{alpha} sum_i alpha_i - (1/2) sum_{i,j} alpha_i alpha_j y_i y_j k(x_i, x_j)

subject to 0 <= alpha_i <= C and sum_i alpha_i y_i = 0
Two parameters: C and sigma
Smaller C value: larger robustness to outliers
Larger sigma value: smoother decision boundary
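A hedged sketch of how these two parameters map onto scikit-learn's SVC; note that SVC parametrizes the Gaussian kernel by gamma, which plays the role of 1/(2 sigma^2), so a larger sigma corresponds to a smaller gamma. The simulated data is an assumption for illustration:

import numpy as np
from sklearn import svm

# Simulated non-linearly separable data: label by whether the point lies outside a ring
rng = np.random.RandomState(0)
X = rng.randn(200, 2)
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0).astype(int)

# Small C, small gamma (large sigma): strong regularization, smoother boundary
clf_smooth = svm.SVC(C=0.1, kernel='rbf', gamma=0.1).fit(X, y)

# Large C, large gamma (small sigma): fits the training data tightly, wiggly boundary
clf_tight = svm.SVC(C=100.0, kernel='rbf', gamma=10.0).fit(X, y)

print(clf_smooth.score(X, y), clf_tight.score(X, y))  # training accuracy of each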

16
Example

17
Example

18
Example

19
Summary of kernel method
Classifiers can be learned for high-dimensional feature spaces, without actually having to
map the points into the high-dimensional space
Data may be linearly separable in the high dimensional space, but not linearly separable in
the original feature space
Kernels can be used for an SVM because of the scalar product in the dual form, but can also
be used elsewhere – they are not tied to the SVM formalism

20
SVM from scikit-learn (A machine learning package)
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm

# Simulate a binary classification problem with 2D feature vectors
np.random.seed(0)
X = np.random.randn(300, 2)
Y = np.random.randint(2, size=300)

# Grid of points used to plot the decision regions (plotting code not shown)
xx, yy = np.meshgrid(np.linspace(-3, 3, 500), np.linspace(-3, 3, 500))

# Fit a non-linear SVM with a Gaussian (RBF) kernel
clf = svm.SVC(C=1.0, kernel='rbf', gamma='auto')
clf.fit(X, Y)

# Predict the labels of two feature vectors
print(clf.predict([[0.7, 0.3], [0.1, 0.9]]))

21
Visual classification
Visual categorization: assign an input image a label from a pre-defined set

Categorization usually does not assume object detection


Labels are granularity-based
General:
Basic:
Specific:
The more specific the labels are, the easier the task is for the computer:
more specific means less intra-class variation and more visual cues.
22
Number of categories
How many object categories in general visual categorization?
About 10,000 to 30,000.

23
Challenges: viewpoint variations

24
Challenges: illumination variations

25
Challenges: deformations

26
Challenges: occlusion

27
Challenges: background clutter

28
Challenges: intra-class variations

29
Basics of visual classification

Representation: building feature vectors
How to represent an object category
Learning to classify
Which classifier?
How to learn the classifier, given training data
Classification
How the classifier is to be used on real data

30
Representation: Building feature vector

Basic principle
Expose inter-class (between-class) variation
Suppress intra-class (within-class) variation
Invariance to environmental changes, e.g. light, viewpoint.
31
Hand-crafting good image features
Detect Interest Points
DoG detector in SIFT; Harris; Dense/random: (randomly) taking every k-th pixel

Compute Descriptor around each interest point


SIFT, Histogram of Oriented Gradients (HoG)
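A minimal sketch of this detect-then-describe step using OpenCV's SIFT (the image path is a placeholder, and an OpenCV build that includes SIFT is assumed):

import cv2

# Load an image in grayscale (the path is a placeholder)
img = cv2.imread('example.jpg', cv2.IMREAD_GRAYSCALE)

# Detect interest points with the DoG detector and compute a 128-D SIFT
# descriptor around each of them
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)

print(len(keypoints), descriptors.shape)  # N keypoints, (N, 128) descriptor matrix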

32
General point for descriptor

Exact intensity values are NOT important


Image edges or contours are important
Image texture is important
Exact feature location is NOT important
33
Discriminative models
Modeling essential content of an image by a feature vector with sufficient discriminative
information.

Building discriminative features of images


Training the classifier using the discriminative features as input:
SVM, K-nearest neighbors, and others.
34
Discriminative model: Bag of words

35
Origin: Bag of Words for document representation
Orderless document representation: frequencies of words from a dictionary (Salton & McGill,
1983)

36
Pipeline of Bag of Words (BoW)

37
Feature extraction
Sampling strategy

Computing image features, e.g., SIFT

38
Codebook learning: building visual vocabulary
Generating dictionary (codebook) with clustering

Using a clustering technique, such as K-means, to generate the codebook: the visual vocabulary.
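A minimal sketch of codebook learning with scikit-learn's KMeans (the simulated 128-D descriptors and the vocabulary size of 100 are assumptions for illustration):

import numpy as np
from sklearn.cluster import KMeans

# Simulated local descriptors pooled from many training images (e.g. SIFT, 128-D)
rng = np.random.RandomState(0)
descriptors = rng.randn(5000, 128)

# Cluster the descriptors; the 100 cluster centers form the codebook (visual vocabulary)
kmeans = KMeans(n_clusters=100, n_init=10, random_state=0).fit(descriptors)
codebook = kmeans.cluster_centers_   # shape (100, 128), one visual word per row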

39
Recap of K-means
Clustering: dividing data points into a number of groups such that data points in the same
group are more similar to each other than to data points in other groups.

K-means: a clustering technique


Partitioning data points into K disjoint subsets

40
Recap of K-means
An algorithm for partitioning N data points into K disjoint subsets S_1, ..., S_K so as to
minimize the sum-of-squares criterion

    J = sum_{k=1}^{K} sum_{n in S_k} ||x_n - mu_k||^2

K: the number of clusters
S_k: the index set of each cluster
mu_k: the geometric centroid of each cluster
Basic procedure
i. Initialize the cluster centroids mu_1, ..., mu_K randomly
ii. Repeat until convergence:
a. For every n, assign x_n to the cluster with the nearest centroid, i.e. the k minimizing ||x_n - mu_k||^2

b. For every k, set mu_k to the mean of the points currently assigned to cluster S_k

The metric for defining the distance is key, and it is domain-dependent.
41
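A compact numpy sketch of the procedure above (the value of K, the Euclidean distance and the simple convergence test are illustrative choices):

import numpy as np

def kmeans(X, K, n_iter=100, seed=0):
    rng = np.random.RandomState(seed)
    # i. Initialize the centroids by picking K distinct data points at random
    mu = X[rng.choice(len(X), K, replace=False)]
    for _ in range(n_iter):
        # ii.a. Assign every point to its nearest centroid (squared Euclidean distance)
        d = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # ii.b. Move each centroid to the mean of the points assigned to it
        new_mu = np.array([X[labels == k].mean(axis=0) if np.any(labels == k) else mu[k]
                           for k in range(K)])
        if np.allclose(new_mu, mu):  # converged: centroids no longer move
            break
        mu = new_mu
    return mu, labels

X = np.random.RandomState(0).randn(200, 2)
centers, assignments = kmeans(X, K=3)
print(centers)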


Demonstration of K-means

42
cont'
After using clustering to generate multiple cluster centers:
Each cluster center becomes a codevector, i.e. a visual word
The whole set of cluster centers is a codebook, i.e. the visual vocabulary
The codebook can be learned on a separate training set
The codebook can be used for quantizing features
For example, the feature vector is the occurrence count of each visual word, i.e. the histogram of
visual words, normalized so that it sums to 1.

How to choose vocabulary size?


Too small: visual words are not representative of all features
Too large: quantization artifacts, overfitting
43
Demonstration of codebook

44
Image representation

45
Image classification using BoW histogram
Histogram of visual words: summarize the entire image based on the occurrences of visual
words

Learning a classifier, e.g. SVM or KNN, using BoW-histogram-based features
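A minimal sketch of the whole BoW pipeline with scikit-learn (all descriptors and labels are simulated, and the vocabulary size of 100 is an assumption for illustration):

import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

rng = np.random.RandomState(0)

# Learn a codebook of 100 visual words (here from simulated 128-D descriptors)
codebook = KMeans(n_clusters=100, n_init=10, random_state=0).fit(rng.randn(5000, 128))

def bow_histogram(image_descriptors, codebook):
    # Quantize each local descriptor to its nearest visual word, then build a
    # histogram of visual-word occurrences normalized to sum to 1
    words = codebook.predict(image_descriptors)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / hist.sum()

# Simulated training set: 50 images, each with 200 local descriptors, and binary labels
X_train = np.array([bow_histogram(rng.randn(200, 128), codebook) for _ in range(50)])
y_train = rng.randint(2, size=50)

# Learn an SVM on the BoW histograms and classify a new image
clf = SVC(kernel='rbf', gamma='auto').fit(X_train, y_train)
print(clf.predict([bow_histogram(rng.randn(200, 128), codebook)]))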

46
Object Detection

Assigning a label and a bounding box to one or many objects in the image

Two-stage approach
Locating the object in the image
Classifying the object in the image

47
Detection: A case study on face detection
Face detection is the first step in automated face recognition
There are many cues for face detection
skin color
facial/head shape
facial appearance
Basic procedure
An image is scanned at all possible locations and many possible scales of bounding box.
Face detection is posed as classifying the pattern in each bounding box as either face or
non-face.
The face/nonface classifier is learned from face and non-face training examples using
machine learning methods
The first challenge: how to efficiently calculate the features for each bounding box?
The total number of possible bounding boxes in an image, with 5
possible box lengths, runs into the millions.
48
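As a rough illustration of the count on the previous slide (the image size here is an assumption, not taken from the slides): scanning a 640 x 480 image at every pixel location with 5 box sizes already gives about 640 x 480 x 5 = 1,536,000, i.e. roughly 1.5 million candidate boxes.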
Haar-type image features with fast computation

Haar-type filters: Value = sum of pixel values in the blue regions - sum of pixel values in the pink regions.

49
Fast computation via the integral image
For each bounding box, one needs to generate a feature vector with sufficient discrimination.
Suppose the feature vector is the collection of the responses of the bounding box to
different Haar-type filters.
For a bounding box, there are about 160,000 Haar-type box filters with varying size, aspect
ratio, and orientation.
It is critical to have an efficient computational scheme to compute the responses for a large
number of different Haar-type filters.
The solution: computation via the integral image

[Definition] For an image I, its integral image II is defined by II(x, y) = sum over all x' <= x and y' <= y of I(x', y').

50
Integral image based computation
Pre-computing a look-up table II(x, y) for all possible (x, y)
Computing the sum of the pixel values in a rectangular region with corners
A (top-left), B (top-right), C (bottom-left) and D (bottom-right):

Then the sum of pixel values within the rectangle can be computed as:
II(D) - II(B) - II(C) + II(A)

Only 4 lookups and three additions are required for calculating the sum over a rectangle of any size,
given the table of the integral image.
Then the output of any Haar-type box filter can be computed efficiently.
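A small numpy sketch of the integral image and the 4-lookup box sum (the corner indexing convention is an illustrative choice):

import numpy as np

def integral_image(img):
    # II(x, y) = sum of all pixels above and to the left of (x, y), inclusive
    return img.cumsum(axis=0).cumsum(axis=1)

def box_sum(ii, r0, c0, r1, c1):
    # Sum of img[r0:r1+1, c0:c1+1] using at most 4 lookups into the integral image
    total = ii[r1, c1]
    if r0 > 0:
        total -= ii[r0 - 1, c1]
    if c0 > 0:
        total -= ii[r1, c0 - 1]
    if r0 > 0 and c0 > 0:
        total += ii[r0 - 1, c0 - 1]
    return total

img = np.arange(16, dtype=float).reshape(4, 4)
ii = integral_image(img)
print(box_sum(ii, 1, 1, 2, 2), img[1:3, 1:3].sum())  # both give 30.0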

51
Feature selection
For a detection region, the number of possible Haar-type (rectangle) features is
about 160,000

It is impractical to compute the entire feature set


The solution is to compose a feature vector with a small number of Haar-type features.
How to select which features are included in the feature vector of a bounding box?
Training a good classifier such that it can identify "good"
rectangular features and "bad" ones among all possible features

52
Object detection using object classification

Calculating the features of each bounding box


Training a classifier to categorize different objects

53
