
Lecture 12

Introduction to
Convolutional Neural Networks
Part 1
STAT 453: Deep Learning, Spring 2020
Sebastian Raschka
http://stat.wisc.edu/~sraschka/teaching/stat453-ss2020/

https://github.com/rasbt/stat453-deep-learning-ss20/tree/master/L12-cnns

CNNs for Image Classification

[Figure: a CNN maps an input cat image to an output probability p(y = cat). Image sources: twitter.com and https://www.pinterest.com/pin/244742560974520446]

Object Detection

Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016). You only look once: Unified, real-time object detection. In Proceedings of
the IEEE Conference on Computer Vision and Pattern Recognition (pp. 779-788).

Object Segmentation

Figure 2. Mask R-CNN results on the COCO test set. These results are based on ResNet-101 [15], achieving a mask AP of 35.7 and
running at 5 fps. Masks are shown in color, and bounding box, category, and confidences are also shown.

He, Kaiming, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. "Mask R-CNN." In Proceedings of the IEEE International Conference on Computer Vision, pp. 2961-2969. 2017.
Face Recognition

[Figure: two face images, x^[1] and x^[2], are each passed through the network and compared via a similarity/distance score.]
Lecture Overview

Today:
• Image Classification
• Convolutional Neural Network Basics
• CNN Architectures
• What a CNN Can See
• CNNs in PyTorch

Next Lecture:
• Padding
• Dropout2d and BatchNorm2d
• CNNs on the GPU
• Common CNN Architectures in Detail
• Transfer Learning

Lecture Overview

1. Image Classification
2. Convolutional Neural Network Basics
3. CNN Architectures
4. What a CNN Can See
5. CNNs in PyTorch

Why Image Classification is Hard

Different lighting, contrast, viewpoints, etc.

Image sources: twitter.com and https://www.123rf.com/photo_76714328_side-view-of-tabby-cat-face-over-white.html

Or even simple translation. This is hard for traditional methods like multi-layer perceptrons, because the prediction is basically based on a sum of pixel intensities.

Traditional Approaches

a) Use hand-engineered features

Traditional Approaches

a) Use hand-engineered features

Sasaki, K., Hashimoto, M., & Nagata, N. (2016). Person Invariant Classification of Subtle Facial Expressions Using Coded Movement Direction of
Keypoints. In Video Analytics. Face and Facial Expression Recognition and Audience Measurement (pp. 61-72). Springer, Cham.

Traditional Approaches

b) Preprocess images (centering, cropping, etc.)

Image Source: https://www.tokkoro.com/2827328-cat-animals-nature-feline-park-green-trees-grass.html

Lecture Overview

1. Image Classification
2. Convolutional Neural Network Basics
3. CNN Architectures
4. What a CNN Can See
5. CNNs in PyTorch

Main Concepts Behind
Convolutional Neural Networks

• Sparse connectivity: A single element in the feature map is connected to only a small patch of pixels. (This is very different from connecting to the whole input image, as in multi-layer perceptrons.)

• Parameter sharing: The same weights are used for different patches of the input image.

• Many layers: Combining extracted local patterns into global patterns

Convolutional Neural Networks

Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard and L. D. Jackel: Backpropagation Applied to
Handwritten Zip Code Recognition, Neural Computation, 1(4):541-551, Winter 1989.

Convolutional Neural Networks
[Figure: LeNet-5 architecture. INPUT 32x32 → (convolutions) C1: feature maps 6@28x28 → (subsampling) S2: f. maps 6@14x14 → (convolutions) C3: f. maps 16@10x10 → (subsampling) S4: f. maps 16@5x5 → (full connection) C5: layer 120 → (full connection) F6: layer 84 → (Gaussian connections) OUTPUT 10.]

Yann LeCun, Léon Bottou, Yoshua Bengio and Patrick Haffner: Gradient-Based Learning Applied to Document Recognition, Proceedings of the IEEE, 86(11):2278-2324, 1998.

Hidden Layers
[Figure: LeNet-5 architecture, as on the previous slide. The convolution/subsampling layers act as an "automatic feature extractor"; the fully connected layers form a "regular classifier".]

Hidden Layers
[Figure: LeNet-5 architecture, as on the previous slide.]

Each "bunch" of feature maps represents one hidden layer in the neural network. Counting the FC layers, this network has 5 layers.

Convolutional Neural Networks
[Figure: LeNet-5 architecture with annotations.]

• Notation such as "6@28x28" gives the number of feature detectors @ the size of the resulting layers.
• "Feature detectors" (weight matrices) are being reused across the image ("weight sharing"); they are also called "kernels" or "filters".
• Subsampling is nowadays called "pooling".
• C5/F6/OUTPUT form a multi-layer perceptron; the Gaussian-connection output is basically a fully-connected layer + MSE loss (nowadays it is better to use an fc layer + softmax + cross entropy).

Yann LeCun, Léon Bottou, Yoshua Bengio and Patrick Haffner: Gradient-Based Learning Applied to Document Recognition, Proceedings of the IEEE, 86(11):2278-2324, 1998.

Weight Sharing
A "feature detector" (filter, kernel) slides over the inputs to generate
a feature map

\[ \sum_{j=1}^{9} w_j x_j \]

The pixels the kernel currently covers are referred to as the "receptive field"; the resulting output is called the "feature map".

Rationale: A feature detector that works well in one region may also work well in another region. Plus, it is a nice reduction in the number of parameters to fit.

Multiple "feature detectors" (kernels) are used to create multiple feature maps:

\[ \sum_{j=1}^{9} w_j^{(@1)} x_j, \qquad \sum_{j=1}^{9} w_j^{(@2)} x_j, \qquad \sum_{j=1}^{9} w_j^{(@3)} x_j \]
Size Before and After Convolutions

Feature map size:

\[ O = \frac{W - K + 2P}{S} + 1 \]

where \(O\) = output width, \(W\) = input width, \(K\) = kernel width, \(P\) = padding, and \(S\) = stride.
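As a quick sanity check, the formula is easy to evaluate in code. A minimal sketch (the function name is ours):

```python
def conv_output_size(W, K, P=0, S=1):
    """Feature map width: O = (W - K + 2P) / S + 1."""
    return (W - K + 2 * P) // S + 1

# LeNet-5's first layer: 32x32 input, 5x5 kernel, no padding, stride 1
print(conv_output_size(32, 5))  # -> 28
```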

Kernel Dimensions and Trainable Parameters

For a grayscale image with a 5x5 feature detector (kernel), we have the following dimensions (number of parameters to learn).

What do you think is the output size for this 28x28 image?
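Applying the formula from the previous slide, and assuming stride 1 and no padding: O = (28 - 5 + 0)/1 + 1 = 24, i.e., a 24x24 feature map. The 5x5 kernel itself has 5 · 5 = 25 weights, plus one bias term if a bias is used, i.e., 26 trainable parameters per feature map.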

Cross-Correlation vs Convolution

Deep Learning Jargon: convolution in DL is actually cross-correlation.

Cross-correlation is our sliding dot product over the image; the result \(Z[i, j]\) is the "feature map".

Cross-Correlation vs Convolution

Cross-correlation:

\[ Z[i, j] = \sum_{u=-k}^{k} \sum_{v=-k}^{k} K[u, v] \, A[i+u, j+v], \qquad Z = K \otimes A \]

where \(Z\) is the "feature map".

Cross-Correlation vs Convolution
Cross-correlation:

\[ Z[i, j] = \sum_{u=-k}^{k} \sum_{v=-k}^{k} K[u, v] \, A[i+u, j+v], \qquad Z = K \otimes A \]

The looping direction over the kernel indices (indicated in red in the slide figure):

1) (-1,-1)   2) (-1,0)   3) (-1,1)
4) (0,-1)    5) (0,0)    6) (0,1)
7) (1,-1)    8) (1,0)    9) (1,1)

Cross-Correlation vs Convolution
Cross-correlation:

\[ Z[i, j] = \sum_{u=-k}^{k} \sum_{v=-k}^{k} K[u, v] \, A[i+u, j+v], \qquad Z = K \otimes A \]

Convolution:

\[ Z[i, j] = \sum_{u=-k}^{k} \sum_{v=-k}^{k} K[u, v] \, A[i-u, j-v], \qquad Z = K * A \]

Basically, we are flipping the kernel (or the receptive field) horizontally and vertically; the looping direction over the kernel indices is reversed:

9) (-1,-1)   8) (-1,0)   7) (-1,1)
6) (0,-1)    5) (0,0)    4) (0,1)
3) (1,-1)    2) (1,0)    1) (1,1)
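To make the distinction concrete, here is a minimal NumPy sketch (function names are ours); note that for a symmetric kernel the two operations coincide:

```python
import numpy as np

def cross_correlate2d(A, K):
    """Valid-mode sliding dot product (what DL frameworks call "convolution")."""
    kh, kw = K.shape
    Z = np.zeros((A.shape[0] - kh + 1, A.shape[1] - kw + 1))
    for i in range(Z.shape[0]):
        for j in range(Z.shape[1]):
            Z[i, j] = np.sum(K * A[i:i + kh, j:j + kw])
    return Z

def convolve2d(A, K):
    """True convolution: cross-correlation with a horizontally and vertically flipped kernel."""
    return cross_correlate2d(A, np.flip(K))

A = np.arange(16.0).reshape(4, 4)
K = np.array([[1.0, 0.0], [0.0, -1.0]])
print(cross_correlate2d(A, K))  # differs from convolve2d(A, K) for this asymmetric kernel
print(convolve2d(A, K))
```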
Cross-Correlation vs Convolution

Deep Learning Jargon: convolution in DL is actually cross-correlation.

"Real" convolution has the nice associative property:

\[ (A * B) * C = A * (B * C) \]

In DL, we usually don't care about that (as opposed to many traditional computer vision and signal processing applications). Also, cross-correlation is easier to implement.

Maybe the term "convolution" for cross-correlation became popular because "Cross-Correlational Neural Network" sounds weird ;)

Backpropagation in CNNs

Same overall concept as before: multivariable chain rule, but now with an additional weight-sharing constraint.

Remember Lecture 6? Graph with Weight Sharing

[Figure: computation graph. The input \(x_1\) is multiplied by \(w_1\) to give \(z_1 = w_1 \cdot x_1\); two activations \(a_1 = \sigma_1(z_1)\) and \(a_2 = \sigma_2(z_1)\) both feed into \(o = \sigma_3(a_1, a_2)\), and the loss is \(l = L(y, o)\).]

\[
\frac{\partial l}{\partial w_1}
= \underbrace{\frac{\partial l}{\partial o} \cdot \frac{\partial o}{\partial a_1} \cdot \frac{\partial a_1}{\partial w_1}}_{\text{upper path}}
+ \underbrace{\frac{\partial l}{\partial o} \cdot \frac{\partial o}{\partial a_2} \cdot \frac{\partial a_2}{\partial w_1}}_{\text{lower path}}
\qquad \text{(multivariable chain rule)}
\]

Backpropagation in CNNs
Same overall concept as before: Multivariable chain rule,
but now with an additional weight sharing constraint

Due to weight sharing: \( w_1 = w_2 \)

Optional averaging weight update:

\[ w_1 := w_2 := w_1 - \eta \cdot \frac{1}{2} \left( \frac{\partial L}{\partial w_1} + \frac{\partial L}{\partial w_2} \right) \]
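In modern frameworks this bookkeeping happens automatically: when the same weight tensor is used in several places, autograd sums the per-use gradients. A tiny PyTorch sketch (the values are ours):

```python
import torch

x1, x2 = torch.tensor(2.0), torch.tensor(3.0)
w = torch.tensor(0.5, requires_grad=True)  # one weight, used twice (shared)

loss = (w * x1 + w * x2) ** 2
loss.backward()

# Autograd accumulates the gradient contributions from both uses of w:
# d(loss)/dw = 2 * (w*x1 + w*x2) * (x1 + x2) = 2 * 2.5 * 5 = 25
print(w.grad)  # tensor(25.)
```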
CNNs and Translation/Rotation/Scale Invariance
Note that CNNs are not really invariant to scale, rotation, translation, etc.: the activations are still dependent on the location.

Pooling Layers Can Help With Local Invariance

Sebastian Raschka, Vahid Mirjalili. Python Machine Learning. 3rd Edition. Birmingham, UK: Packt Publishing, 2019. ISBN: 978-1789955750

Downside: Information is lost. This may not matter for classification, but it can for applications where relative position is important (like face recognition).

In practice for CNNs: some image preprocessing is still recommended.

Pooling Layers Can Help With Local Invariance

Note that typical pooling layers do not have any learnable parameters
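A small PyTorch sketch of this local invariance (the example values are ours): a one-pixel shift that stays within each 2x2 pooling window leaves the max-pooled output unchanged, while larger shifts generally do not.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([[1., 0., 2., 0.],
                  [0., 0., 0., 0.],
                  [3., 0., 4., 0.],
                  [0., 0., 0., 0.]]).reshape(1, 1, 4, 4)

# Shift the image one pixel to the right (zero-filled, no wrap-around)
x_shifted = F.pad(x, (1, 0))[:, :, :, :-1]

print(F.max_pool2d(x, kernel_size=2))          # [[1., 2.], [3., 4.]]
print(F.max_pool2d(x_shifted, kernel_size=2))  # identical output here
```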


Lecture Overview

1. Image Classification
2. Convolutional Neural Network Basics
3. CNN Architectures
4. What a CNN Can See
5. CNNs in PyTorch

Main Breakthrough for CNNs:
AlexNet & ImageNet

Figure 2: An illustration of the architecture of our CNN, explicitly showing the delineation of responsibilities
between the two GPUs. One GPU runs the layer-parts at the top of the figure while the other runs the layer-parts
at the bottom. The GPUs communicate only at certain layers. The network’s input is 150,528-dimensional, and
the number of neurons in the network’s remaining layers is given by 253,440–186,624–64,896–64,896–43,264–
4096–4096–1000.

"The second convolutional layer takes as input the (response-normalized and pooled) output of the first convolutional layer and filters it with 256 kernels of size 5 × 5 × 48. The third, fourth, and fifth convolutional layers are connected to one another without any intervening pooling or normalization layers. The third convolutional layer has 384 kernels of size 3 × 3 × 256 connected to the (normalized, pooled) outputs of the second convolutional layer. The fourth convolutional layer has 384 kernels of size 3 × 3 × 192, and the fifth convolutional layer has 256 kernels of size 3 × 3 × 192. The fully-connected layers have 4096 neurons each."

Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (pp. 1097-1105).
Main Breakthrough for CNNs:
AlexNet & ImageNet

[Paper figures: "Figure 3: 96 convolutional kernels of size 11×11×3 learned by the first convolutional layer on the 224×224×3 input images. The top 48 kernels were learned on GPU 1 while the bottom 48 kernels were learned on GPU 2. See Section 6.1 for details." and "Figure 4: (Left) Eight ILSVRC-2010 test images and the five labels considered most probable by our model. The correct label is written under each image, and the probability assigned to the correct label is also shown with a red bar (if it happens to be in the top 5). (Right) Five ILSVRC-2010 test images in the first column. The remaining columns show the six training images that produce feature vectors in the last hidden layer with the smallest Euclidean distance from the feature vector for the test image."]

Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (pp. 1097-1105).
Main Breakthrough for CNNs:
AlexNet & ImageNet
The ImageNet set that was used has ~1.2 million images and 1000 classes.

Accuracy is measured as top-5 performance: a prediction counts as correct if the true label matches one of the model's top 5 predictions.
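The top-5 criterion is straightforward to express in code. A minimal sketch (the function name is ours):

```python
import torch

def top5_correct(logits, y):
    """True where the correct label is among the five highest-scoring classes."""
    top5 = torch.topk(logits, k=5, dim=1).indices
    return (top5 == y.unsqueeze(1)).any(dim=1)

logits = torch.randn(8, 1000)                  # batch of 8, 1000 ImageNet classes
y = torch.randint(0, 1000, (8,))
print(top5_correct(logits, y).float().mean())  # top-5 accuracy for the batch
```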

Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (pp. 1097-1105).
hid
Main Breakthrough for CNNs:
AlexNet & ImageNet

Note that the actual network inputs were still 224x224 images (random crops from downsampled 256x256 images).

224x224 is still a good/reasonable size today (224 · 224 · 3 = 150,528 features).

Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (pp. 1097-1105).
hid
Common CNN Architectures

Figure 1: Top1 vs. network. Single-crop top-1 validation accuracies for top scoring single-model architectures. We introduce with this chart our choice of colour scheme, which will be used throughout this publication to distinguish effectively different architectures and their correspondent authors. Notice that networks of the same group share the same hue, for example ResNet are all variations of pink.

Figure 2: Top1 vs. operations, size / parameters. Top-1 one-crop accuracy versus amount of operations required for a single forward pass. The size of the blobs is proportional to the number of network parameters; a legend is reported in the bottom right corner, spanning from 5×10^6 to 155×10^6 params. Both these figures share the same y-axis, and the grey dots highlight the centre of the blobs.

Canziani, A., Paszke, A., & Culurciello, E. (2016). An analysis of deep neural network models for practical applications. arXiv preprint arXiv:1605.07678.
Convolutions with Color Channels

Sebastian Raschka, Vahid Mirjalili. Python Machine Learning. 3rd Edition. Birmingham, UK: Packt Publishing, 2019. ISBN: 978-1789955750

Image dimension: \( X \in \mathbb{R}^{n_1 \times n_2 \times c_{\text{in}}} \) in NHWC format; CUDA & PyTorch use NCHW.
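A minimal PyTorch sketch of the NCHW convention (the numbers are arbitrary):

```python
import torch
import torch.nn as nn

# PyTorch expects (batch, channels, height, width), i.e., NCHW
x = torch.randn(8, 3, 32, 32)  # 8 RGB images of size 32x32

conv = nn.Conv2d(in_channels=3, out_channels=6, kernel_size=5)
print(conv(x).shape)           # torch.Size([8, 6, 28, 28])
```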

Lecture Overview

1. Image Classification
2. Convolutional Neural Network Basics
3. CNN Architectures
4. What a CNN Can See
5. CNNs in PyTorch

What a CNN Can See
Simple example: vertical edge detector

(From classical computer vision research)
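A minimal sketch of such a hand-crafted detector (the toy image is ours), using SciPy's cross-correlation:

```python
import numpy as np
from scipy.signal import correlate2d

# A classic vertical edge detector from computer vision (Sobel kernel)
K = np.array([[-1., 0., 1.],
              [-2., 0., 2.],
              [-1., 0., 1.]])

# Toy image: dark left half, bright right half (a vertical edge in the middle)
A = np.hstack([np.zeros((5, 3)), np.ones((5, 3))])

# Responses are large where the kernel straddles the edge, zero elsewhere
print(correlate2d(A, K, mode="valid"))
```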

What a CNN Can See
Simple example: horizontal edge detector

A CNN can learn whatever it finds best based on optimizing the objective (e.g., minimizing a particular loss to achieve good classification accuracy).

What a CNN Can See
Which patterns from the training set activate the feature map?

[Figure: feature evolution panels for Layer 1 through Layer 5.]

Fig. 4. Evolution of a randomly chosen subset of model features through training. Each layer's features are displayed in a different block. Within each block, we show a randomly chosen subset of features at epochs [1,2,5,10,20,30,40,64]. The visualization shows the strongest activation (across all training examples) for a given feature map, projected down to pixel space using our deconvnet approach. Color contrast is artificially enhanced and the figure is best viewed in electronic form.
Zeiler, M. D., & Fergus, R. (2014, September). Visualizing and understanding convolutional networks. In European conference on computer vision (pp. 818-833). Springer, Cham.

Method: backpropagate strong activation signals in hidden layers to the input images, then apply "unpooling" to map the values to the original pixel space for visualization.
What a CNN Can See
Which patterns from the training set activate the feature map?
Zeiler, M. D., & Fergus, R. (2014, September). Visualizing and understanding convolutional networks. In European conference on computer vision (pp. 818-833). Springer, Cham.
R. Ferguson computer vision (pp. 818-833). Springer, Cham.

[Figure: visualizations of Layer 1 and Layer 2 features with corresponding training image patches.]

What a CNN Can See
Which patterns from the training set activate the feature map?
Zeiler, M. D., & Fergus, R. (2014, September). Visualizing and understanding convolutional
networks. In European conference on computer vision (pp. 818-833). Springer, Cham.
[Figure: visualizations of Layer 2 and Layer 3 features with corresponding training image patches.]

What a CNN Can See
Which patterns from the training set activate the feature map?
Zeiler, M. D., & Fergus, R. (2014, September). Visualizing and understanding convolutional networks. In European conference on computer vision (pp. 818-833). Springer, Cham.

[Figure: visualizations of Layer 3, Layer 4, and Layer 5 features. "Fig. 2. Visualization of features in a fully trained model. For layers 2-5 we show the top [...]"]
Lecture Overview

1. Image Classification
2. Convolutional Neural Network Basics
3. CNN Architectures
4. What a CNN Can See
5. CNNs in PyTorch

LeNet-5 in PyTorch
[Figure: LeNet-5 architecture, as shown earlier.]

https://github.com/rasbt/stat453-deep-learning-ss20/tree/master/L12-cnns/code
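The linked repository contains the full implementation; the following is a minimal sketch of a LeNet-5-style model in PyTorch (using an fc layer + softmax/cross-entropy output instead of the original Gaussian connections, as discussed earlier):

```python
import torch
import torch.nn as nn

class LeNet5(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        # "Automatic feature extractor": convolutions + pooling (subsampling)
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5),   # 32x32 -> 6@28x28
            nn.Tanh(),
            nn.AvgPool2d(kernel_size=2),      # -> 6@14x14
            nn.Conv2d(6, 16, kernel_size=5),  # -> 16@10x10
            nn.Tanh(),
            nn.AvgPool2d(kernel_size=2),      # -> 16@5x5
        )
        # "Regular classifier": fully connected layers
        self.classifier = nn.Sequential(
            nn.Linear(16 * 5 * 5, 120),
            nn.Tanh(),
            nn.Linear(120, 84),
            nn.Tanh(),
            nn.Linear(84, num_classes),  # logits; softmax lives in the loss
        )

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, start_dim=1)
        return self.classifier(x)

model = LeNet5()
print(model(torch.randn(4, 1, 32, 32)).shape)  # torch.Size([4, 10])
```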

Cats and Dogs Classifier (VGG16)

https://github.com/rasbt/stat453-deep-learning-ss20/tree/master/L12-cnns/code

Test accuracy: 88.28%

Cats and Dogs Classifier (VGG16)
and Guided Backpropagation
Visualization of the loss gradients with respect
to the inputs (images) as a naive way to
visualize CNN predictions.
\[ \nabla_{\mathbf{x}} L = \begin{bmatrix} \frac{\partial L}{\partial x_1} \\ \frac{\partial L}{\partial x_2} \\ \vdots \end{bmatrix} \]
• In a normal forward pass, negative activation values are clamped by ReLU functions (the gradient is 0 for these).

• In guided backpropagation, we also clamp the negative gradients during backpropagation to 0.

• The focus is on those activations that have a positive influence on the class of interest (see the sketch below).

https://github.com/rasbt/stat453-deep-learning-ss20/tree/master/L12-cnns/code
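A minimal sketch of both ideas in PyTorch (the tiny model here is a hypothetical placeholder, not the VGG16 classifier from the linked notebook):

```python
import torch
import torch.nn as nn

# Hypothetical 2-class model standing in for a real cats-vs-dogs classifier
model = nn.Sequential(nn.Conv2d(3, 8, kernel_size=3), nn.ReLU(),
                      nn.Flatten(), nn.LazyLinear(2))

# Naive visualization: gradient of the class score w.r.t. the input image
x = torch.randn(1, 3, 64, 64, requires_grad=True)
model(x)[0, 1].backward()      # class of interest: index 1
saliency = x.grad.abs()        # can be rendered as an image

# Guided backpropagation: additionally clamp negative gradients at every ReLU
def guided_relu_hook(module, grad_input, grad_output):
    return (torch.clamp(grad_input[0], min=0.0),)

for m in model.modules():
    if isinstance(m, nn.ReLU):
        m.register_full_backward_hook(guided_relu_hook)
# Re-running the backward pass now yields the guided gradients.
```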

Optional Reading Material

http://www.deeplearningbook.org/contents/convnets.html

https://twitter.com/boredyannlecun/status/1237460174811602946?s=20

