9.5 Variants of the Basic Convolution Function of CNNs
Kernel: a 4-D tensor K with element K_{i,j,k,l} giving the connection strength between a unit in channel i of the output and a unit in channel j of the input, with an offset of k rows and l columns between the output unit and the input unit.
Input: a 3-D tensor V with element V_{i,j,k} giving the value of the input unit in channel i, row j and column k.
Output: a 3-D tensor Z with the same format as V.
We use 1 as the first index (1-based indexing).
Full Convolution
Zero padding, stride 1:
Z_{i,j,k} = \sum_{l,m,n} V_{l, j+m-1, k+n-1} K_{i,l,m,n}
Zero padding, stride s:
Z_{i,j,k} = c(K, V, s)_{i,j,k} = \sum_{l,m,n} \left[ V_{l, (j-1) \times s + m, (k-1) \times s + n} K_{i,l,m,n} \right]
Convolution with a stride greater than 1 pixel is equivalent to convolution with stride 1 followed by downsampling, but computationally cheaper, because the values that downsampling would discard are never computed.
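As an illustration of this equivalence, here is a minimal NumPy sketch (the function name conv_stride and the no-padding "valid" convention are assumptions made for the example) that implements the strided formula above and checks it against stride-1 convolution followed by downsampling:

```python
import numpy as np

def conv_stride(K, V, s):
    """Strided 'convolution' (cross-correlation, matching the formula above):
    Z[i, j, k] = sum_{l,m,n} V[l, (j-1)*s + m, (k-1)*s + n] * K[i, l, m, n].
    Written with 0-based NumPy indexing, so the +/-1 offsets disappear."""
    out_ch, in_ch, kh, kw = K.shape
    _, H, W = V.shape
    out_h = (H - kh) // s + 1
    out_w = (W - kw) // s + 1
    Z = np.zeros((out_ch, out_h, out_w))
    for i in range(out_ch):
        for j in range(out_h):
            for k in range(out_w):
                patch = V[:, j * s : j * s + kh, k * s : k * s + kw]
                Z[i, j, k] = np.sum(patch * K[i])
    return Z

# Stride-s convolution equals stride-1 convolution followed by downsampling.
rng = np.random.default_rng(0)
V = rng.standard_normal((3, 8, 8))      # 3 input channels, 8x8 image
K = rng.standard_normal((4, 3, 3, 3))   # 4 output channels, 3x3 kernels
s = 2
assert np.allclose(conv_stride(K, V, s), conv_stride(K, V, 1)[:, ::s, ::s])
```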
Zero padding with stride 1
Without zero padding, the width of the representation shrinks by one pixel less than the kernel width at each layer, so we are forced to choose between shrinking the spatial extent of the network rapidly and using small kernels. Zero padding allows us to control the kernel width and the size of the output independently.
The two extreme cases are 'valid' (no padding, the output shrinks) and 'same' (just enough padding to keep the output the same size as the input). Usually the optimal amount of zero padding lies somewhere between 'valid' and 'same'.
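A small worked example of the output-size arithmetic, assuming the usual formula floor((W + 2p - k)/s) + 1 for input width W, kernel width k, padding p per side and stride s:

```python
def out_width(W, k, p, s=1):
    # Output width of a layer with input width W, kernel width k,
    # padding p on each side, and stride s.
    return (W + 2 * p - k) // s + 1

W, k = 32, 5
print(out_width(W, k, p=0))             # 'valid': 28 (shrinks by k - 1)
print(out_width(W, k, p=(k - 1) // 2))  # 'same':  32 (size preserved, odd k)
print(out_width(W, k, p=k - 1))         # 'full':  36 (grows by k - 1)
```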
Unshared Convolution
In some cases we do not want to use convolution but still want a locally connected layer; this is sometimes called unshared convolution. The weights form a 6-D tensor W, indexed by output channel i, output row j, output column k, input channel l, row offset m and column offset n:
Z_{i,j,k} = \sum_{l,m,n} \left[ V_{l, j+m-1, k+n-1} w_{i,j,k,l,m,n} \right]
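A minimal NumPy sketch of this forward pass (the function name local_forward and the tensor layout are assumptions; 0-based indexing removes the +/-1 offsets):

```python
import numpy as np

def local_forward(W6, V):
    """Locally connected ('unshared convolution') forward pass:
    Z[i, j, k] = sum_{l,m,n} V[l, j+m, k+n] * W6[i, j, k, l, m, n].
    Unlike convolution, every output location (j, k) has its own weights."""
    out_ch, out_h, out_w, in_ch, kh, kw = W6.shape
    Z = np.zeros((out_ch, out_h, out_w))
    for i in range(out_ch):
        for j in range(out_h):
            for k in range(out_w):
                patch = V[:, j : j + kh, k : k + kw]
                Z[i, j, k] = np.sum(patch * W6[i, j, k])
    return Z
```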
(Figure: comparison of local connections, convolution and full connections.)
Locally connected layers are useful when we know that each feature should be a function of a small part of space, but there is no reason to think that the same feature should occur across all of space, e.g. looking for a mouth only in the bottom half of an image.
It can also be useful to make versions of convolutional or locally connected layers in which the connectivity is further restricted, e.g. constraining each output channel i to be a function of only a subset of the input channels (a sketch of this follows the list of advantages below).
Advantages:
* reduce memory consumption
* increase statistical efficiency
* reduce the amount of computation for both forward and backward propagation
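One common way to realize this channel restriction is to split the channels into groups, as in grouped convolution. The sketch below is a hypothetical illustration only; the function name, shapes and grouping scheme are assumptions, not from the text:

```python
import numpy as np

def grouped_conv(K, V, groups):
    """Convolution where each output channel sees only one group of input
    channels. K: (out_ch, in_ch // groups, kh, kw), V: (in_ch, H, W)."""
    out_ch, in_per_group, kh, kw = K.shape
    in_ch, H, W = V.shape
    assert in_ch == in_per_group * groups and out_ch % groups == 0
    Z = np.zeros((out_ch, H - kh + 1, W - kw + 1))
    for i in range(out_ch):
        g = i // (out_ch // groups)                          # group of this output channel
        Vg = V[g * in_per_group : (g + 1) * in_per_group]    # its subset of input channels
        for j in range(Z.shape[1]):
            for k in range(Z.shape[2]):
                Z[i, j, k] = np.sum(Vg[:, j : j + kh, k : k + kw] * K[i])
    return Z
```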
Tiled Convolution
Learn a set of kernels that we rotate through as we move through space. Immediately neighboring locations will have different filters, but the memory requirement for storing the parameters increases only by a factor of the size of this set of kernels, rather than with the size of the entire output map.
(Figure: comparison of locally connected layers, tiled convolution and standard convolution.)
K is now a 6-D tensor, with t different choices of kernel stack in each direction (% is the modulo operator; t = 1 recovers standard convolution):
Z_{i,j,k} = \sum_{l,m,n} V_{l, j+m-1, k+n-1} K_{i, l, m, n, j\%t+1, k\%t+1}
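A minimal NumPy sketch of tiled convolution under these definitions (the function name tiled_conv, the tensor layout and 0-based indexing are assumptions):

```python
import numpy as np

def tiled_conv(K6, V, t):
    """Tiled convolution: cycle through t x t kernel stacks as we move across
    the output, so immediately neighboring locations use different filters.
    K6 shape: (out_ch, in_ch, kh, kw, t, t); V shape: (in_ch, H, W)."""
    out_ch, in_ch, kh, kw, _, _ = K6.shape
    _, H, W = V.shape
    Z = np.zeros((out_ch, H - kh + 1, W - kw + 1))
    for i in range(out_ch):
        for j in range(Z.shape[1]):
            for k in range(Z.shape[2]):
                Kijk = K6[i, :, :, :, j % t, k % t]   # kernel chosen by position modulo t
                Z[i, j, k] = np.sum(V[:, j : j + kh, k : k + kw] * Kijk)
    return Z

# With t = 1 this reduces to ordinary convolution (every location shares one kernel).
```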
Locally connected layers and tiled convolutional layers interact interestingly with max pooling: the detector units of these layers are driven by different filters. If the filters learn to detect different transformed versions of the same underlying feature, then the max-pooled units become invariant to the learned transformation.
Review: back propagation in a convolutional layer
K: kernel stack
V: input image
Z: output of the convolutional layer, Z = c(K, V, s)
G: gradient of the loss with respect to Z
Given G, back propagation needs two further convolution-like operations: the gradient with respect to the kernel K (used to update the weights) and the gradient with respect to the input V (needed to propagate the error to earlier layers).
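A sketch of both gradients for the simplest case (stride 1, no padding); the function names and layouts are assumptions, and the loops mirror the forward-pass definition rather than an optimized implementation:

```python
import numpy as np

def conv_forward(K, V):
    """Stride-1, no-padding forward pass: Z[i,j,k] = sum V[l, j+m, k+n] * K[i,l,m,n]."""
    out_ch, in_ch, kh, kw = K.shape
    _, H, W = V.shape
    Z = np.zeros((out_ch, H - kh + 1, W - kw + 1))
    for i in range(out_ch):
        for j in range(Z.shape[1]):
            for k in range(Z.shape[2]):
                Z[i, j, k] = np.sum(V[:, j : j + kh, k : k + kw] * K[i])
    return Z

def conv_backward(K, V, G):
    """Given G = dLoss/dZ, return (dLoss/dK, dLoss/dV) for the stride-1 case."""
    out_ch, in_ch, kh, kw = K.shape
    dK = np.zeros_like(K)
    dV = np.zeros_like(V)
    out_h, out_w = G.shape[1], G.shape[2]
    for i in range(out_ch):
        for j in range(out_h):
            for k in range(out_w):
                patch = V[:, j : j + kh, k : k + kw]
                dK[i] += G[i, j, k] * patch                          # accumulate kernel gradient
                dV[:, j : j + kh, k : k + kw] += G[i, j, k] * K[i]   # scatter input gradient
    return dK, dV
```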
Bias after convolution
We generally add a bias term to each output before applying the nonlinearity.
For locally connected layers: give each unit its own bias.
For tiled convolutional layers: share the biases with the same tiling pattern as the kernels.
For convolutional layers: have one bias per channel of the output and share it across all locations within each convolution map. If the input is of fixed size, it is also possible to learn a separate bias at each location of the output map.
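A short sketch of what these three bias-sharing patterns look like as tensor shapes, assuming an output Z of shape (out_ch, out_h, out_w) and a tiling range t:

```python
import numpy as np

out_ch, out_h, out_w, t = 4, 6, 6, 2
Z = np.zeros((out_ch, out_h, out_w))

b_conv  = np.zeros((out_ch, 1, 1))          # convolution: one bias per output channel
b_local = np.zeros((out_ch, out_h, out_w))  # locally connected: one bias per unit
b_tiled = np.zeros((out_ch, t, t))          # tiled: biases share the kernel tiling pattern

# b_conv and b_local are added by plain broadcasting; the tiled bias is
# indexed by output position modulo t before being added.
rows, cols = np.arange(out_h) % t, np.arange(out_w) % t
Z_conv  = Z + b_conv
Z_local = Z + b_local
Z_tiled = Z + b_tiled[:, rows[:, None], cols[None, :]]
```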
Resources
Converting FC to CONV Layer
Technical Report Multidimensional Downsampled Convolution for Autoencoders