Technical Report
Multidimensional, Downsampled Convolution For Autoencoders
Ian Goodfellow
August 9, 2010
Abstract
This technical report describes discrete convolution with a multidimensional
kernel. Convolution implements matrix multiplication by a sparse matrix with
several elements constrained to be equal to each other. To implement a
convolutional autoencoder, the gradients of this operation, the transpose of
this operation, and the gradients of the transpose are all needed. When using
standard convolution, each of these supplementary operations can be described
as a convolution on slightly modified arguments. When the output is implicitly
downsampled by moving the kernel more than one pixel at each step, we must
define two new operations in order to compute all of the necessary values.
1 Definitions
Let $L$ be our loss function, $W$ our weights defining the kernel, $d$ a vector of
strides, $H$ our hidden units, and $V$ our visible units. $H_{cij}$ indexes position $c$
(an $N$-dimensional index) within feature map $i$ for example $j$. $V$ is of the same
format as $H$. $W_{cij}$ indexes the weight at position $c$ within the kernel, connecting
visible channel $i$ to hidden channel $j$. Throughout, $\circ$ denotes elementwise
multiplication, so $d \circ c$ is the position $c$ scaled by the strides.

Convolution with downsampling is performed (assuming $W$ is pre-flipped) by

\[
H_{cij} = \sum_{k,m} W_{kmi} V_{d \circ c + k, m, j}
\]
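
As a concrete reference, here is a minimal numpy sketch of this operation for a two-dimensional kernel. It is illustrative only: the function name conv_strided and the shape conventions (spatial position first, then channel, then example, matching the index order of $H_{cij}$) are assumptions made for this sketch, not definitions from the report.

\begin{verbatim}
import numpy as np

def conv_strided(W, V, d):
    # H[c,i,j] = sum over k,m of W[k,m,i] * V[d*c + k, m, j]
    # Assumed shapes (2-D spatial case):
    #   W: (kh, kw, n_vis, n_hid)  kernel position k, visible ch. m, hidden ch. i
    #   V: (vh, vw, n_vis, n_ex)   position, channel, example
    #   d: (dh, dw)                stride along each spatial dimension
    kh, kw, n_vis, n_hid = W.shape
    vh, vw, _, n_ex = V.shape
    dh, dw = d
    hh = (vh - kh) // dh + 1       # number of valid kernel placements per axis
    hw = (vw - kw) // dw + 1
    H = np.zeros((hh, hw, n_hid, n_ex))
    for c0 in range(hh):
        for c1 in range(hw):
            patch = V[dh*c0:dh*c0 + kh, dw*c1:dw*c1 + kw]  # (kh, kw, n_vis, n_ex)
            # contract over kernel position k and visible channel m
            H[c0, c1] = np.tensordot(W, patch, axes=([0, 1, 2], [0, 1, 2]))
    return H
\end{verbatim}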
2 Basic gradient
The gradient of the loss function with respect to the weights is given by
\begin{align*}
\frac{\partial L}{\partial W_{cij}} &= \sum_{k,m,n} \frac{\partial L}{\partial H_{kmn}} \frac{\partial H_{kmn}}{\partial W_{cij}} \\
&= \sum_{k,m,n} \frac{\partial L}{\partial H_{kmn}} \frac{\partial \sum_{p,q} W_{pqm} V_{d \circ k + p, q, n}}{\partial W_{cij}} \\
&= \sum_{k,n} \frac{\partial L}{\partial H_{kjn}} \frac{\partial \sum_{p,q} W_{pqj} V_{d \circ k + p, q, n}}{\partial W_{cij}} \\
&= \sum_{k,n} \frac{\partial L}{\partial H_{kjn}} V_{d \circ k + c, i, n}
\end{align*}

so

\[
\frac{\partial L}{\partial W_{cij}} = \sum_{k,m} \frac{\partial L}{\partial H_{kjm}} V_{d \circ k + c, i, m}
\]
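
This formula is easy to check numerically. The sketch below (again illustrative, reusing the hypothetical conv_strided above) compares one entry of the analytic weight gradient against a finite-difference estimate, taking $L = \frac{1}{2}\|H\|^2$ so that $\partial L / \partial H = H$.

\begin{verbatim}
rng = np.random.default_rng(0)
W = rng.standard_normal((3, 3, 2, 4))   # kernel: position, visible ch., hidden ch.
V = rng.standard_normal((8, 8, 2, 5))   # visible units: position, channel, example
d = (2, 2)

H = conv_strided(W, V, d)
G = H                                   # dL/dH when L = 0.5 * ||H||^2

# dL/dW[c,i,j] = sum over k,m of dL/dH[k,j,m] * V[d*k + c, i, m]
dW = np.zeros_like(W)
for c0 in range(3):                     # kernel positions c
    for c1 in range(3):
        for k0 in range(H.shape[0]):    # hidden positions k
            for k1 in range(H.shape[1]):
                v = V[d[0]*k0 + c0, d[1]*k1 + c1]  # (n_vis, n_ex)
                dW[c0, c1] += v @ G[k0, k1].T      # sum over examples m

eps = 1e-6
W2 = W.copy()
W2[0, 0, 0, 0] += eps
num = (0.5 * np.sum(conv_strided(W2, V, d)**2) - 0.5 * np.sum(H**2)) / eps
print(np.allclose(dW[0, 0, 0, 0], num, atol=1e-4))  # True
\end{verbatim}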
3 Transpose
We can think of strided convolution as multiplication by a matrix $M$. Let $h$ be
$H$ reshaped into a vector and $v$ be $V$ reshaped into a vector. Then

\[
h = M v
\]

Let $hr(c, i, j)$ be a reshaping function that maps indices in $H$ to indices in
$h$. Let $vr(c, i, j)$ be the same for $V$ and $v$. Then

\[
h_{hr(c,i,j)} = \sum_{k,m} W_{kmi} \, v_{vr(d \circ c + k, m, j)}
\]
Multiplying by the transpose, $r = M^T h$, gives

\[
r_a = \sum_{c,i,j,k,m \,|\, vr(d \circ c + k, m, j) = a} W_{kmi} H_{cij}
\]

or, reshaping $r$ back into the same format as $V$ and calling the result $R$,

\[
R_{qmj} = \sum_{c,k \,|\, d \circ c + k = q} \; \sum_i W_{kmi} H_{cij}
\]

To sum over the correct set of values for $c$ and $k$, we will need a modulus
operator or saved information from a previous iteration of a for loop, unless
$d = \vec{1}$. So this is not a convolution in the large stride case.
In the case where $d = \vec{1}$, we have

\[
R_{qmj} = \sum_p \sum_i W_{w - p, m, i} \, H_{q - w + p, i, j}
\]
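
In code, the transpose can be handled for any stride as a scatter-add over kernel placements instead of as a convolution. A minimal sketch, under the same assumed conventions as the conv_strided sketch above (the name conv_transpose_strided is likewise made up):

\begin{verbatim}
def conv_transpose_strided(W, H, d, v_spatial):
    # R[q,m,j] = sum over c,k with d*c + k = q, and over i, of W[k,m,i] * H[c,i,j]
    # Each hidden position c scatters the kernel back onto the window it read
    # from, which sidesteps the modulus bookkeeping described above.
    kh, kw, n_vis, n_hid = W.shape
    hh, hw, _, n_ex = H.shape
    R = np.zeros((v_spatial[0], v_spatial[1], n_vis, n_ex))
    for c0 in range(hh):
        for c1 in range(hw):
            # contract over the hidden channel i: result is (kh, kw, n_vis, n_ex)
            contrib = np.tensordot(W, H[c0, c1], axes=([3], [0]))
            R[d[0]*c0:d[0]*c0 + kh, d[1]*c1:d[1]*c1 + kw] += contrib
    return R
\end{verbatim}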
4 New notation
I'm going to make up some new notation now, since our operation isn't really
convolution (downsampling is built into the operation, we don't flip the kernel,
etc.). From here on out, I will write

\[
H_{cij} = \sum_{k,m} W_{kmi} V_{d \circ c + k, m, j}
\]

as

\[
H = W \,@_d\, V
\]

and
\[
R_{qmj} = \sum_{c,k \,|\, d \circ c + k = q} \; \sum_i W_{kmi} H_{cij}
\]

as

\[
R = W \,@^T_d\, H
\]

and

\[
\frac{\partial L(H = W @_d V)}{\partial W_{cij}} = \sum_{k,m} \frac{\partial L}{\partial H_{kjm}} V_{d \circ k + c, i, m}
\]

as

\[
\nabla_W L(H = W @_d V) = (\nabla_H L) \,\#_d\, V
\]
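
In code, the # op is a third distinct operation. A sketch under the same assumed conventions (B hidden-shaped like $\nabla_H L$, C visible-shaped like $V$, result kernel-shaped like $W$); the quadruple loop in the gradient check of section 2 computes exactly pound_op(G, V, d, (3, 3)):

\begin{verbatim}
def pound_op(B, C, d, k_spatial):
    # A[c,i,j] = sum over k,m of B[k,j,m] * C[d*k + c, i, m]
    hh, hw, n_hid, n_ex = B.shape
    kh, kw = k_spatial
    A = np.zeros((kh, kw, C.shape[2], n_hid))
    for k0 in range(hh):
        for k1 in range(hw):
            patch = C[d[0]*k0:d[0]*k0 + kh, d[1]*k1:d[1]*k1 + kw]
            # patch: (kh, kw, n_vis, n_ex); contract over the example index m
            A += np.tensordot(patch, B[k0, k1], axes=([3], [1]))
    return A
\end{verbatim}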
5 Autoencoder gradients
To make an autoencoder, we'll need to be able to compute the reconstruction
$R = W \,@^T_d\, H$. This means we're also going to need to be able to take the
gradient of $L(W @^T_d H)$ with respect to both $W$ (so we know how to update
the encoding weights) and $H$, so we'll be able to propagate gradients back to
the encoding layer. Finally, when we stack the autoencoders into a convolutional
MLP, we'll need to be able to propagate gradients back from one layer to
another, so we must also find the gradient of $L(W @_d V)$ with respect to $V$.
\[
R_{qmj} = \sum_{c,k \,|\, d \circ c + k = q} \; \sum_i W_{kmi} H_{cij}
\]

so

\begin{align*}
\frac{\partial L}{\partial W_{xyz}} &= \sum_{q,m,j} \frac{\partial L}{\partial R_{qmj}} \frac{\partial R_{qmj}}{\partial W_{xyz}} \\
&= \sum_{q,m,j} \frac{\partial L}{\partial R_{qmj}} \frac{\partial \sum_{c,k|d \circ c + k = q} \sum_i W_{kmi} H_{cij}}{\partial W_{xyz}} \\
&= \sum_{q,j} \frac{\partial L}{\partial R_{qyj}} \frac{\partial \sum_{c|d \circ c + x = q} W_{xyz} H_{czj}}{\partial W_{xyz}} \\
&= \sum_{q,j} \frac{\partial L}{\partial R_{qyj}} \sum_{c|d \circ c + x = q} H_{czj} \\
&= \sum_{c,j} \frac{\partial L}{\partial R_{d \circ c + x, y, j}} H_{czj}
\end{align*}
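
As a sanity check, this gradient can be compared against finite differences using the running sketches, taking $L = \frac{1}{2}\|R\|^2$ so $\nabla_R L = R$; the call to the hypothetical pound_op anticipates the identification with the # op made just below.

\begin{verbatim}
R = conv_transpose_strided(W, H, d, (8, 8))
# dL/dW[x,y,z] = sum over c,j of dL/dR[d*c + x, y, j] * H[c,z,j]
dW_t = pound_op(H, R, d, (3, 3))

eps = 1e-6
W2 = W.copy()
W2[0, 0, 0, 0] += eps
R2 = conv_transpose_strided(W2, H, d, (8, 8))
num = (0.5 * np.sum(R2**2) - 0.5 * np.sum(R**2)) / eps
print(np.allclose(dW_t[0, 0, 0, 0], num, atol=1e-4))  # True
\end{verbatim}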
Recall that the gradient of $L(W @_d V)$ with respect to the kernel is:

\[
\frac{\partial L}{\partial W_{cij}} = \sum_{k,m} \frac{\partial L}{\partial H_{kjm}} V_{d \circ k + c, i, m}
\]
This has the same form as the gradient we just derived, i.e. both use the new
# operation. Thus we can write the gradient of $L(W @^T_d H)$ with respect to
the kernel as

\[
\nabla_W L(R = W @^T_d H) = H \,\#_d\, \nabla_R L
\]

For the gradient with respect to $H$, recall again that

\[
R_{qmj} = \sum_{c,k \,|\, d \circ c + k = q} \; \sum_i W_{kmi} H_{cij}
\]
so
\begin{align*}
\frac{\partial L}{\partial H_{xyz}} &= \sum_{q,m,j} \frac{\partial L}{\partial R_{qmj}} \frac{\partial R_{qmj}}{\partial H_{xyz}} \\
&= \sum_{q,m,j} \frac{\partial L}{\partial R_{qmj}} \frac{\partial \sum_{c,k|d \circ c + k = q} \sum_i W_{kmi} H_{cij}}{\partial H_{xyz}} \\
&= \sum_{q,m} \frac{\partial L}{\partial R_{qmz}} \frac{\partial \sum_{k|d \circ x + k = q} W_{kmy} H_{xyz}}{\partial H_{xyz}} \\
&= \sum_{q,m} \frac{\partial L}{\partial R_{qmz}} \sum_{k|d \circ x + k = q} W_{kmy} \\
&= \sum_{k,m} \frac{\partial L}{\partial R_{d \circ x + k, m, z}} W_{kmy}
\end{align*}
Remember that

\[
H_{cij} = \sum_{k,m} W_{kmi} V_{d \circ c + k, m, j}
\]

so we can write

\[
\nabla_H L(R = W @^T_d H) = W \,@_d\, \nabla_R L
\]
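
This identity too can be checked numerically, continuing the running example (still $L = \frac{1}{2}\|R\|^2$):

\begin{verbatim}
dH = conv_strided(W, R, d)   # claimed dL/dH = W @d (dL/dR), with dL/dR = R

eps = 1e-6
H2 = H.copy()
H2[0, 0, 0, 0] += eps
R2 = conv_transpose_strided(W, H2, d, (8, 8))
num = (0.5 * np.sum(R2**2) - 0.5 * np.sum(R**2)) / eps
print(np.allclose(dH[0, 0, 0, 0], num, atol=1e-4))  # True
\end{verbatim}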
Similarly, the gradient of $L(W @_d V)$ with respect to $V$ is

\begin{align*}
\frac{\partial L}{\partial V_{xyz}} &= \sum_{c,i} \frac{\partial L}{\partial H_{ciz}} \sum_{k \,|\, d \circ c + k = x} W_{kyi} \\
&= \sum_i \; \sum_{c,k \,|\, d \circ c + k = x} W_{kyi} \frac{\partial L}{\partial H_{ciz}}
\end{align*}

which has the form of the transpose operation, so

\[
\nabla_V L(H = W @_d V) = W \,@^T_d\, \nabla_H L
\]
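
Putting the pieces together, here is how the three ops and the gradient identities assemble into one forward/backward pass of a linear convolutional autoencoder with tied weights, in terms of the running sketches (illustrative; a practical autoencoder would add biases and nonlinearities). The loss is $L = \frac{1}{2}\|R - V\|^2$.

\begin{verbatim}
H  = conv_strided(W, V, d)                         # encode:  H = W @d V
R  = conv_transpose_strided(W, H, d, (8, 8))       # decode:  R = W @Td H
dR = R - V                                         # dL/dR

dW_dec = pound_op(H, dR, d, (3, 3))                # grad wrt W through the decoder
dH     = conv_strided(W, dR, d)                    # grad wrt H:  W @d dR
dW_enc = pound_op(dH, V, d, (3, 3))                # grad wrt W through the encoder
dV     = conv_transpose_strided(W, dH, d, (8, 8))  # grad to the layer below

dW = dW_dec + dW_enc                               # total gradient, tied weights
\end{verbatim}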
6 The rest of the gradients
We now know enough to make a stacked autoencoder. However, there are still
some gradients that may be taken, and it would be nice if our ops supported
all of them. The @ op's gradient can be expressed in terms of @T and #, and
the @T op's gradient can be expressed in terms of @ and #. Thus if we add
a gradient method to the # op, our ops will be infinitely differentiable for all
combinations of variables.
Let $A = B \,\#_d\, C$, so that $A_{cij} = \sum_{k,m} B_{kjm} C_{d \circ k + c, i, m}$. Then

\begin{align*}
\frac{\partial L}{\partial B_{xyz}} &= \sum_{c,i,j} \frac{\partial L(A)}{\partial A_{cij}} \frac{\partial \sum_{k,m} B_{kjm} C_{d \circ k + c, i, m}}{\partial B_{xyz}} \\
&= \sum_{c,i} \frac{\partial L(A)}{\partial A_{ciy}} C_{d \circ x + c, i, z}
\end{align*}

So

\[
\nabla_B L(A = B \,\#_d\, C) = (\nabla_A L) \,@_d\, C
\]
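
This can be exercised with the running sketches; note that pound_op returns a kernel-shaped array, so $\nabla_A L$ slots into conv_strided exactly where a kernel would. With $L = \frac{1}{2}\|A\|^2$, so $\nabla_A L = A$:

\begin{verbatim}
A  = pound_op(H, V, d, (3, 3))   # A = H #d V, kernel-shaped
dB = conv_strided(A, V, d)       # claimed dL/dB = (grad_A L) @d C, hidden-shaped

eps = 1e-6
H2 = H.copy()
H2[0, 0, 0, 0] += eps
A2 = pound_op(H2, V, d, (3, 3))
num = (0.5 * np.sum(A2**2) - 0.5 * np.sum(A**2)) / eps
print(np.allclose(dB[0, 0, 0, 0], num, atol=1e-4))  # True
\end{verbatim}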
Similarly, for the gradient with respect to $C$:

\begin{align*}
\frac{\partial L}{\partial C_{xyz}} &= \sum_{c,i,j} \frac{\partial L(A)}{\partial A_{cij}} \frac{\partial \sum_{k,m} B_{kjm} C_{d \circ k + c, i, m}}{\partial C_{xyz}} \\
&= \sum_j \; \sum_{c,k \,|\, d \circ k + c = x} \frac{\partial L(A)}{\partial A_{cyj}} \frac{\partial B_{kjz} C_{xyz}}{\partial C_{xyz}} \\
&= \sum_j \; \sum_{c,k \,|\, d \circ k + c = x} \frac{\partial L(A)}{\partial A_{cyj}} B_{kjz}
\end{align*}
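
This last gradient has the same scatter-add structure as the transpose sketch, with the channel axes playing different roles. A sketch (the name pound_grad_C is made up, and the characterization as a separate scatter-add op is my reading of the derivation):

\begin{verbatim}
def pound_grad_C(B, dA, d, v_spatial):
    # dL/dC[x,y,z] = sum over j, and over c,k with d*k + c = x,
    #                of B[k,j,z] * dL/dA[c,y,j]
    hh, hw, n_hid, n_ex = B.shape
    kh, kw, n_vis, _ = dA.shape
    dC = np.zeros((v_spatial[0], v_spatial[1], n_vis, n_ex))
    for k0 in range(hh):
        for k1 in range(hw):
            # contract over the hidden channel j: result is (kh, kw, n_vis, n_ex)
            contrib = np.tensordot(dA, B[k0, k1], axes=([3], [0]))
            dC[d[0]*k0:d[0]*k0 + kh, d[1]*k1:d[1]*k1 + kw] += contrib
    return dC

# e.g. pound_grad_C(H, A, d, (8, 8)) gives dL/dC for the
# L = 0.5 * ||A||^2 example above
\end{verbatim}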
7 Summary
We have defined these operations:

Strided convolution:

\[
H = W \,@_d\, V \;\Rightarrow\; H_{cij} = \sum_{k,m} W_{kmi} V_{d \circ c + k, m, j}
\]

Its transpose:

\[
R = W \,@^T_d\, H \;\Rightarrow\; R_{qmj} = \sum_{c,k \,|\, d \circ c + k = q} \; \sum_i W_{kmi} H_{cij}
\]

The # op:

\[
A = B \,\#_d\, C \;\Rightarrow\; A_{cij} = \sum_{k,m} B_{kjm} C_{d \circ k + c, i, m}
\]

And we have derived these gradients:

\begin{align*}
\nabla_W L(H = W @_d V) &= (\nabla_H L) \,\#_d\, V \\
\nabla_V L(H = W @_d V) &= W \,@^T_d\, \nabla_H L \\
\nabla_W L(R = W @^T_d H) &= H \,\#_d\, \nabla_R L \\
\nabla_H L(R = W @^T_d H) &= W \,@_d\, \nabla_R L
\end{align*}