Convolutional Neural Networks: CMSC 733 Fall 2015 Angjoo Kanazawa
Overview
Goal: Understand what Convolutional Neural
Networks (ConvNets) are and the intuition behind them.
1. Brief Motivation for Deep Learning
2. What are ConvNets?
3. ConvNets for Object Detection
First of all, what is Deep Learning?
● Composition of non-linear transformations of
the data.
● Goal: Learn useful representations, aka
features, directly from data.
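A minimal sketch of "composition of non-linear transformations", assuming each transformation is an affine map followed by a ReLU. The helper `layer`, the layer sizes, and the random weights are hypothetical and chosen only for illustration; in a real network the weights would be learned from data.

```python
import numpy as np

def layer(x, W, b):
    """One non-linear transformation: affine map followed by ReLU."""
    return np.maximum(0.0, W @ x + b)

rng = np.random.default_rng(0)
x = rng.normal(size=4)                          # hypothetical 4-dimensional input

# Hypothetical (untrained) parameters for two stacked layers.
W1, b1 = rng.normal(size=(5, 4)), np.zeros(5)
W2, b2 = rng.normal(size=(3, 5)), np.zeros(3)

h1 = layer(x, W1, b1)     # first-level features computed from the raw data
h2 = layer(h1, W2, b2)    # higher-level features computed from h1
```

Here `h2` is a function of `h1`, which is a function of `x`: the representation is built up layer by layer.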
Slide: M. Ranzato
Supervised Learning: Examples
Slide: M. Ranzato
Supervised Deep Learning
So deep learning is about learning
feature representations in a
compositional manner.
But wait,
why learn features?
The Black Box in a
Traditional Recognition Approach
Hand-engineered pipeline:
Preprocessing → Feature Extraction (HOG, SIFT, etc.) → Post-processing (feature selection, MKL, etc.) → Classifier (SVM, boosting, etc.)
Slide: M. Ranzato
Building a complicated function
Slide: M. Ranzato
Intuition behind Deep Neural Nets
Slide: M. Ranzato
Joint training architecture overview
Neural Net Training
Slide: M. Ranzato
When the input data is an image..
Slide: M. Ranzato
Reduce connections to local regions
Reuse the same kernel everywhere
Because interesting
features (edges) can
occur anywhere in
the image.
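The two ideas above (local connections, one shared kernel) can be sketched in plain NumPy. This is an illustrative sketch, not an efficient implementation: the function name `conv2d_valid`, the toy image, and the 2x2 kernel are assumptions, and the kernel is not flipped (cross-correlation, as is common in deep learning libraries).

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide one square k-by-k kernel over a 2-D image (no padding, stride 1)."""
    k = kernel.shape[0]
    H, W = image.shape
    out = np.zeros((H - k + 1, W - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Each output looks only at a local k-by-k patch (local connections),
            # and every location uses the same weights (shared kernel).
            out[i, j] = np.sum(image[i:i + k, j:j + k] * kernel)
    return out

# Toy image with a vertical edge between columns 2 and 3.
image = np.zeros((5, 6))
image[:, 3:] = 1.0

# Hypothetical 2x2 kernel that responds to a left-to-right intensity increase.
kernel = np.array([[-1.0, 1.0],
                   [-1.0, 1.0]])

response = conv2d_valid(image, kernel)
```

The response is non-zero only in the output column that straddles the edge, and the same four weights detect it in every row.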
Convolutional Neural Nets
Detail
If the input has 3 channels (R,G,B),
3 separate k-by-k filters are applied,
one to each channel.
Slide: R. Fergus
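A small sketch of the multi-channel case, assuming the three per-channel responses are summed into a single feature map. The summation is the usual convention but is an assumption here, since the slide does not say how the channels are combined. The function name `conv2d_multichannel` and the constant toy filters are hypothetical.

```python
import numpy as np

def conv2d_multichannel(image, filters):
    """image: (C, H, W); filters: (C, k, k), one k-by-k filter per channel.

    Returns a single (H-k+1, W-k+1) feature map (no padding, stride 1).
    """
    C, H, W = image.shape
    k = filters.shape[1]
    out = np.zeros((H - k + 1, W - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Apply each channel's filter to its own patch, then sum over channels.
            out[i, j] = np.sum(image[:, i:i + k, j:j + k] * filters)
    return out

image = np.ones((3, 4, 4))                      # toy R,G,B image, all ones
filters = np.stack([np.full((2, 2), 1.0),       # filter for R
                    np.full((2, 2), 2.0),       # filter for G
                    np.full((2, 2), 3.0)])      # filter for B

feature_map = conv2d_multichannel(image, filters)
```

With these toy values every output equals 4·1 + 4·2 + 4·3 = 24, which makes the per-channel contributions easy to check by hand.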
Building Translation Invariance
Building Translation Invariance via
Spatial Pooling
Slide: R. Fergus
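One way to see how spatial pooling gives some translation invariance is a non-overlapping max-pooling sketch. The function name `max_pool2d`, the window size of 2, and the toy inputs are assumptions for illustration; the invariance shown holds only for shifts that stay inside one pooling window.

```python
import numpy as np

def max_pool2d(x, size=2):
    """Non-overlapping max pooling over size-by-size windows of a 2-D array."""
    H, W = x.shape
    H2, W2 = H // size, W // size
    x = x[:H2 * size, :W2 * size]               # drop rows/cols that do not fit
    return x.reshape(H2, size, W2, size).max(axis=(1, 3))

# The same "feature" at two slightly different positions.
a = np.zeros((4, 4)); a[0, 0] = 1.0
b = np.zeros((4, 4)); b[1, 1] = 1.0             # shifted by one pixel diagonally

pooled_a = max_pool2d(a)
pooled_b = max_pool2d(b)
```

Both inputs pool to the same 2x2 map, so later layers see an identical response despite the small shift.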
Architecture of Alex Krizhevsky et al.
First layer filters
Showing 81 filters of
11x11x3.
Capture low-level
features like oriented
edges and blobs.