Introduction To Deep Learning
MIT 6.S191
Alexander Amini
January 28, 2019
ARTIFICIAL INTELLIGENCE: Any technique that enables computers to mimic human behavior
MACHINE LEARNING: Ability to learn without explicitly being programmed
DEEP LEARNING: Extract patterns from data using neural networks
Learned feature hierarchy: Lines & Edges → Eyes & Nose & Ears → Facial Structure
1958: Perceptron
• Learnable Weights
1986: Backpropagation
• Multi-Layer Perceptron
1995: Deep Convolutional NN
• Digit Recognition

1. Big Data
• Larger Datasets
• Easier Collection & Storage
2. Hardware
• Graphics Processing Units (GPUs)
• Massively Parallelizable
3. Software
• Improved Techniques
• New Models
• Toolboxes
The output is a non-linear activation function applied to a linear combination of the inputs:

$\hat{y} = g\left( \sum_{i=1}^{m} x_i\, w_i \right)$
Adding a bias term $w_0$, the output is still a non-linear activation function applied to a linear combination of the inputs:

$\hat{y} = g\left( w_0 + \sum_{i=1}^{m} x_i\, w_i \right) = g\left( w_0 + X^T W \right)$

where $X = \begin{bmatrix} x_1 \\ \vdots \\ x_m \end{bmatrix}$ and $W = \begin{bmatrix} w_1 \\ \vdots \\ w_m \end{bmatrix}$
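To make the forward pass concrete, here is a minimal NumPy sketch of the equation above; the sigmoid used for g is just one common choice (introduced in the next section), and the input and weight values are illustrative, not from the lecture.

import numpy as np

def g(z):
    # non-linear activation function; sigmoid used here as an example
    return 1.0 / (1.0 + np.exp(-z))

def perceptron(X, W, w0):
    # y_hat = g(w0 + X^T W): bias plus dot product, passed through g
    return g(w0 + np.dot(X, W))

X = np.array([0.5, -1.0, 2.0])   # example inputs x_1 ... x_m
W = np.array([0.1, 0.4, -0.2])   # example weights w_1 ... w_m
print(perceptron(X, W, w0=1.0))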
Activation Functions

$\hat{y} = g\left( w_0 + X^T W \right)$

Example: the sigmoid function

$g(z) = \sigma(z) = \dfrac{1}{1 + e^{-z}}$

We have $w_0 = 1$ and $W = \begin{bmatrix} 3 \\ -2 \end{bmatrix}$, so:

$\hat{y} = g\left( w_0 + X^T W \right) = g\left( 1 + 3x_1 - 2x_2 \right)$

This is just a line in 2D!
[Plot: the line $1 + 3x_1 - 2x_2 = 0$ in the $(x_1, x_2)$ plane, dividing it into two half-planes]

Assume we have the input $X = \begin{bmatrix} -1 \\ 2 \end{bmatrix}$:

$\hat{y} = g\left( 1 + (3)(-1) - (2)(2) \right) = g(-6) \approx 0.002$
The Perceptron: Example
$\hat{y} = g\left( 1 + 3x_1 - 2x_2 \right)$

[Plot: the line $1 + 3x_1 - 2x_2 = 0$ divides the $(x_1, x_2)$ plane in two: where $1 + 3x_1 - 2x_2 < 0$ we get $\hat{y} < 0.5$, and where $1 + 3x_1 - 2x_2 > 0$ we get $\hat{y} > 0.5$]
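A quick numeric check of the worked example above, assuming the sigmoid activation; the values $w_0 = 1$, $W = [3, -2]$, and $X = [-1, 2]$ are taken directly from the example.

import numpy as np

z = 1 + 3 * (-1) - 2 * 2            # w0 + 3*x1 - 2*x2 = -6
y_hat = 1.0 / (1.0 + np.exp(-z))    # sigmoid(-6)
print(y_hat)                        # ~0.0025, i.e. approximately 0.002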
Writing the linear combination as $z = w_0 + \sum_{j=1}^{m} x_j\, w_j$, the perceptron output is simply $\hat{y} = g(z)$.
Multi Output Perceptron
With multiple outputs, each output gets its own linear combination of the inputs:

$y_i = g(z_i)$,   where   $z_i = w_{0,i} + \sum_{j=1}^{m} x_j\, w_{j,i}$
Single Layer Neural Network
[Diagram: inputs $x_1, \ldots, x_m$ feed hidden units $z_1, \ldots, z_{d_1}$ through weights $W^{(1)}$; the hidden units feed outputs $\hat{y}_1, \hat{y}_2$ through weights $W^{(2)}$]

For example, the second hidden unit is:

$z_2^{(1)} = w_{0,2}^{(1)} + \sum_{j=1}^{m} x_j\, w_{j,2}^{(1)} = w_{0,2}^{(1)} + x_1 w_{1,2}^{(1)} + x_2 w_{2,2}^{(1)} + x_3 w_{3,2}^{(1)}$
Single Layer Neural Network
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

m, d1 = 3, 16                        # example sizes: m input features, d1 hidden units
inputs = Input(shape=(m,))           # input layer
hidden = Dense(d1)(inputs)           # hidden layer with d1 units
outputs = Dense(2)(hidden)           # output layer with 2 units
model = Model(inputs, outputs)
Deep Neural Network

[Diagram: inputs $x_1, \ldots, x_m$ feed a stack of hidden layers; layer $k$ has units $z_{k,1}, \ldots, z_{k,n_k}$, and the final layer produces outputs $\hat{y}_1, \hat{y}_2$]
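A minimal sketch of stacking hidden layers into a deep model, extending the tf.keras style of the code above; the layer widths and the relu activations are illustrative choices, not from the lecture.

import tensorflow as tf
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

m, n1, n2 = 3, 16, 16                       # example sizes: m inputs, two hidden layers
inputs = Input(shape=(m,))
z1 = Dense(n1, activation='relu')(inputs)   # hidden layer 1
z2 = Dense(n2, activation='relu')(z1)       # hidden layer 2
outputs = Dense(2)(z2)                      # two outputs: y_hat_1, y_hat_2
model = Model(inputs, outputs)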
[Scatter plot: data points plotted by $x_2$ = hours spent on the final project; legend: Pass / Fail. A new point at $(4, 5)$ is marked “?”]
Feeding the input $x = \begin{bmatrix} 4, 5 \end{bmatrix}$ through the network: Predicted: 0.1, Actual: 1.
The loss of our network measures the cost incurred from incorrect predictions
The empirical loss measures the total loss over our entire dataset
  $x^{(i)}$     $f(x^{(i)})$    $y^{(i)}$
  (4, 5)        0.1             1
  (2, 1)        0.8             0
  (5, 8)        0.6             1
   ⋮             ⋮              ⋮

$J(W) = \frac{1}{n} \sum_{i=1}^{n} \mathcal{L}\left( f\left(x^{(i)}; W\right),\; y^{(i)} \right)$   (Predicted, Actual)

Also known as:
• Objective function
• Cost function
• Empirical Risk
Binary Cross Entropy Loss
Cross entropy loss can be used with models that output a probability between 0 and 1
$J(W) = -\frac{1}{n} \sum_{i=1}^{n} \left[ y^{(i)} \log f\left(x^{(i)}; W\right) + \left(1 - y^{(i)}\right) \log\left(1 - f\left(x^{(i)}; W\right)\right) \right]$   (Actual, Predicted)
Mean Squared Error Loss
Mean squared error loss can be used with regression models that output continuous real numbers

  $x^{(i)}$     $f(x^{(i)})$    $y^{(i)}$ (Final Grades, percentage)
  (4, 5)        30              90
  (2, 1)        80              20
  (5, 8)        85              95
   ⋮             ⋮              ⋮

$J(W) = \frac{1}{n} \sum_{i=1}^{n} \left( y^{(i)} - f\left(x^{(i)}; W\right) \right)^2$   (Actual, Predicted)
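A minimal NumPy sketch of both losses computed on the toy tables above, so the formulas are explicit; the small clipping epsilon is an added numerical-safety detail, not part of the lecture.

import numpy as np

# binary cross entropy: probabilistic predictions, binary labels
f = np.array([0.1, 0.8, 0.6])                  # predicted f(x; W)
y = np.array([1.0, 0.0, 1.0])                  # actual labels
f = np.clip(f, 1e-7, 1 - 1e-7)                 # avoid log(0)
bce = -np.mean(y * np.log(f) + (1 - y) * np.log(1 - f))

# mean squared error: continuous predictions (final grades, percentage)
f_reg = np.array([30.0, 80.0, 85.0])           # predicted
y_reg = np.array([90.0, 20.0, 95.0])           # actual
mse = np.mean((y_reg - f_reg) ** 2)

print(bce, mse)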
We want to find the network weights that achieve the lowest loss
$W^* = \arg\min_W \frac{1}{n} \sum_{i=1}^{n} \mathcal{L}\left( f\left(x^{(i)}; W\right), y^{(i)} \right) = \arg\min_W J(W)$
Remember: $W = \left\{ W^{(0)}, W^{(1)}, \cdots \right\}$

[Plot: the loss surface $J(w_0, w_1)$ over two weights $w_0$ and $w_1$]
Loss Optimization
1. Randomly pick an initial $(w_0, w_1)$
2. Compute gradient, $\frac{\partial J(W)}{\partial W}$
3. Take small step in opposite direction of gradient

Gradient Descent: repeat until convergence

[Plot: the loss surface $J(w_0, w_1)$, with successive steps descending toward a minimum]
Algorithm
1. Initialize weights randomly $\sim \mathcal{N}(0, \sigma^2)$        weights = tf.random_normal(shape, stddev=sigma)
2. Loop until convergence:
3.     Compute gradient, $\frac{\partial J(W)}{\partial W}$
4.     Update weights, $W \leftarrow W - \eta \frac{\partial J(W)}{\partial W}$        weights_new = weights.assign(weights - lr * grads)
5. Return weights
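A minimal sketch of the loop above on a toy one-dimensional loss $J(w) = (w - 3)^2$, whose gradient is known in closed form; the learning rate and step count are illustrative, not from the lecture.

import numpy as np

np.random.seed(0)
w = np.random.normal(0.0, 1.0)       # 1. initialize weight randomly ~ N(0, sigma^2)
lr = 0.1                             # learning rate (eta)

for _ in range(100):                 # 2. loop (fixed number of steps here)
    grad = 2.0 * (w - 3.0)           # 3. compute gradient dJ(w)/dw for J(w) = (w - 3)^2
    w = w - lr * grad                # 4. update weight: w <- w - eta * dJ/dw

print(w)                             # 5. return weight; converges to ~3.0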
Computing Gradients: Backpropagation

How does a small change in one weight (e.g. $w_2$) affect the final loss $J(W)$?

[Diagram: $x \xrightarrow{w_1} z_1 \xrightarrow{w_2} \hat{y} \rightarrow J(W)$]

Apply the chain rule, working backwards from the loss:

$\frac{\partial J(W)}{\partial w_2} = \frac{\partial J(W)}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial w_2}$

$\frac{\partial J(W)}{\partial w_1} = \frac{\partial J(W)}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial z_1} \cdot \frac{\partial z_1}{\partial w_1}$

Repeat this for every weight in the network using gradients from later layers
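A minimal NumPy sketch of the chain rule above for the two-weight network $x \to z_1 = w_1 x \to \hat{y} = g(w_2 z_1)$ with a squared-error loss; all numeric values are illustrative, not from the lecture.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x, y = 2.0, 1.0                               # input and target
w1, w2 = 0.5, -0.3                            # weights

# forward pass
z1 = w1 * x
y_hat = sigmoid(w2 * z1)
J = (y - y_hat) ** 2                          # loss

# backward pass: gradients from later layers are reused by earlier layers
dJ_dyhat = -2.0 * (y - y_hat)                 # dJ/dy_hat
dyhat_dw2 = y_hat * (1 - y_hat) * z1          # dy_hat/dw2
dyhat_dz1 = y_hat * (1 - y_hat) * w2          # dy_hat/dz1
dz1_dw1 = x                                   # dz1/dw1

dJ_dw2 = dJ_dyhat * dyhat_dw2                 # chain rule for w2
dJ_dw1 = dJ_dyhat * dyhat_dz1 * dz1_dw1       # chain rule for w1
print(dJ_dw1, dJ_dw2)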
Neural Networks in Practice: Optimization
Training Neural Networks is Difficult

Remember: optimization through gradient descent

$W \leftarrow W - \eta\, \frac{\partial J(W)}{\partial W}$
Setting the Learning Rate

Small learning rate converges slowly and gets stuck in false local minima

[Plots: loss $J(W)$ versus the weights, starting from an initial guess, for different learning rates]
How to deal with this?

Idea 1: Try lots of different learning rates and see what works “just right”

Idea 2: Do something smarter! Design an adaptive learning rate that “adapts” to the landscape

• Adam        tf.train.AdamOptimizer        (Kingma et al. “Adam: A Method for Stochastic Optimization.” 2014)
• RMSProp     tf.train.RMSPropOptimizer
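A minimal sketch of plugging one of these optimizers into a training step, written in TensorFlow 1.x style to match the tf.train names above; the toy loss (a single variable pushed toward 3.0) and the learning rate are illustrative.

import tensorflow as tf                         # assumes TensorFlow 1.x

w = tf.Variable(0.0)
loss = tf.square(w - 3.0)                       # toy loss J(w)

optimizer = tf.train.AdamOptimizer(learning_rate=0.01)   # or tf.train.RMSPropOptimizer(0.01)
train_op = optimizer.minimize(loss)             # computes gradients and applies updates

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(1000):
        sess.run(train_op)
    print(sess.run(w))                          # approaches 3.0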
Algorithm
1. Initialize weights randomly $\sim \mathcal{N}(0, \sigma^2)$
2. Loop until convergence:
3.     Compute gradient, $\frac{\partial J(W)}{\partial W}$
4.     Update weights, $W \leftarrow W - \eta \frac{\partial J(W)}{\partial W}$
5. Return weights

Can be very computationally intensive to compute!
Stochastic Gradient Descent

Algorithm
1. Initialize weights randomly $\sim \mathcal{N}(0, \sigma^2)$
2. Loop until convergence:
3.     Pick single data point $i$
4.     Compute gradient, $\frac{\partial J_i(W)}{\partial W}$
5.     Update weights, $W \leftarrow W - \eta \frac{\partial J_i(W)}{\partial W}$
6. Return weights

Easy to compute but very noisy (stochastic)!
Stochastic Gradient Descent
Algorithm
1. Initialize weights randomly $\sim \mathcal{N}(0, \sigma^2)$
2. Loop until convergence:
3.     Pick batch of $B$ data points
4.     Compute gradient, $\frac{\partial J(W)}{\partial W} = \frac{1}{B} \sum_{k=1}^{B} \frac{\partial J_k(W)}{\partial W}$
5.     Update weights, $W \leftarrow W - \eta \frac{\partial J(W)}{\partial W}$
6. Return weights

Fast to compute and a much better estimate of the true gradient!
Mini-batches while training
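A minimal NumPy sketch of the mini-batch loop above for a linear model trained with mean squared error; the dataset, batch size B, and learning rate are illustrative, not from the lecture.

import numpy as np

np.random.seed(0)
X = np.random.randn(1000, 3)                        # toy dataset: 1000 points, 3 features
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * np.random.randn(1000)

W = np.random.normal(0.0, 0.1, size=3)              # 1. initialize weights randomly
lr, B = 0.05, 32                                    # learning rate and batch size

for _ in range(500):                                # 2. loop (fixed number of steps here)
    idx = np.random.choice(len(X), size=B, replace=False)   # 3. pick batch of B data points
    Xb, yb = X[idx], y[idx]
    grad = 2.0 / B * Xb.T @ (Xb @ W - yb)           # 4. gradient of MSE averaged over the batch
    W = W - lr * grad                               # 5. update weights

print(W)                                            # close to [2.0, -1.0, 0.5]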
Regularization

What is it? Technique that constrains our optimization problem to discourage complex models
Regularization 1: Dropout
• During training, randomly set some activations to 0

[Diagram: a network with two hidden layers, shown repeatedly with different subsets of hidden units dropped on each pass]
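A minimal sketch of adding dropout between dense layers with tf.keras, assuming the model-building style used earlier in the lecture; the layer sizes and the 0.5 dropout rate are illustrative.

import tensorflow as tf
from tensorflow.keras.layers import Input, Dense, Dropout
from tensorflow.keras.models import Model

inputs = Input(shape=(3,))
h1 = Dense(64, activation='relu')(inputs)
h1 = Dropout(0.5)(h1)                 # randomly set 50% of activations to 0 during training
h2 = Dense(64, activation='relu')(h1)
h2 = Dropout(0.5)(h2)
outputs = Dense(2)(h2)
model = Model(inputs, outputs)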
Regularization 2: Early Stopping
• Stop training before we have a chance to overfit

[Plot: loss versus training iterations for the training and testing sets (legend: Testing, Training). Training loss keeps decreasing, while testing loss eventually begins to rise; that point marks the boundary between the under-fitting and over-fitting regimes, and is where training should stop]
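A minimal sketch of early stopping via a tf.keras callback, monitoring a held-out validation loss as a stand-in for the testing curve above; the model, data names, and patience value are illustrative, not from the lecture.

import tensorflow as tf

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss',              # watch the held-out (testing) loss curve
    patience=5,                      # stop after 5 epochs without improvement
    restore_best_weights=True)       # roll back to the weights from the best epoch

# assuming a compiled model and training data (x_train, y_train):
# model.fit(x_train, y_train, validation_split=0.2, epochs=100, callbacks=[early_stop])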
Core Foundation Review

[Diagrams: the perceptron (weighted sum of inputs passed through a non-linear activation) and a deep model built by stacking these units into hidden layers with outputs $\hat{y}_1, \hat{y}_2$]

Classification example: the network outputs a probability for each class, e.g. Lincoln 0.8, Washington 0.1, Jefferson 0.05, Obama 0.05.
Problems?
How can we use spatial structure in the input to inform the architecture of the network?
[Figure: a 3x3 filter applied to a patch of the input image; element-wise multiplication and summation gives a single output value (9 in the example shown)]

We slide the 3x3 filter over the input image, element-wise multiply, and add the outputs…
For a neuron $(p, q)$ in the hidden layer, connected to the input through a 4x4 filter (a matrix of weights $w_{ij}$), the computation is:
1) applying a window of weights
2) computing linear combinations
3) activating with a non-linear function

$\sum_{i=1}^{4} \sum_{j=1}^{4} w_{ij}\, x_{i+p,\, j+q} + b$
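A minimal NumPy sketch of this sliding-window computation using a 3x3 filter (matching the earlier example); the image, filter values, and ReLU activation are illustrative, not from the lecture.

import numpy as np

np.random.seed(0)
image = np.random.rand(8, 8)            # toy 8x8 grayscale image
filt = np.random.rand(3, 3)             # 3x3 filter: matrix of weights w_ij
bias = 0.1

out_h, out_w = image.shape[0] - 3 + 1, image.shape[1] - 3 + 1
feature_map = np.zeros((out_h, out_w))
for p in range(out_h):
    for q in range(out_w):
        patch = image[p:p + 3, q:q + 3]                  # window of the image at (p, q)
        z = np.sum(filt * patch) + bias                  # element-wise multiply, sum, add bias
        feature_map[p, q] = max(z, 0.0)                  # activate with a non-linearity (ReLU)

print(feature_map.shape)                # (6, 6)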
CNNs: Spatial Arrangement of Output Volume
Layer Dimensions: $h \times w \times d$, where $h$ and $w$ are spatial dimensions (height and width) and $d$ (depth) = number of filters

Stride: filter step size

Receptive Field: locations in the input image that a node is path-connected to
1) Reduced dimensionality
2) Spatial invariance
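A minimal tf.keras sketch of a convolutional layer followed by max pooling, assuming the two points above refer to pooling; the input size, filter count, kernel size, and stride are illustrative, not from the lecture.

import tensorflow as tf
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D
from tensorflow.keras.models import Model

inputs = Input(shape=(32, 32, 3))                  # input volume: h x w x depth
x = Conv2D(filters=16, kernel_size=3, strides=1,   # d = 16 filters of size 3x3, stride 1
           activation='relu')(inputs)              # output volume: 30 x 30 x 16
x = MaxPooling2D(pool_size=2)(x)                   # pooling downsamples spatial dims: 15 x 15 x 16
model = Model(inputs, x)
model.summary()                                    # prints the layer dimensions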
Classification task: produce a list of object categories present in the image (1000 categories).
“Top 5 error”: rate at which the model does not output the correct label in its top 5 predictions
Other tasks include:
single-object localization, object detection from video/image, scene classification, scene parsing
[Chart: ImageNet classification top-5 error (%) by year, 2010–2015, compared against human performance]

Year / Model        Top-5 error (%)   Notes
2012                16.4
2013: ZFNet         11.7              8 layers, more filters
2014: VGG           7.3               19 layers
2014: GoogLeNet     6.7               “Inception” modules; 22 layers, 5 million parameters
2015: ResNet        3.57              152 layers
Human               5.1

[Chart: the same top-5 errors alongside the number of layers per model; the deepest networks achieve the lowest error]
MNIST: handwritten digits
ImageNet: 22K categories, 14M images
Places: natural scenes
CIFAR-10: classes include automobile, bird, cat, deer, dog, frog, horse, ship, truck
Deep Learning for Computer Vision: Impact