Gradient of A Matrix Matrix Multiplication
It’s good to understand how to derive gradients for your neural network. It gets a little hairy when you have matrix-matrix multiplication, such as $WX + b$. When I was reviewing Backpropagation in CS231n, they handwaved over this derivation of the loss function $L$ with respect to the weights matrix $W$:
$$\frac{\partial L}{\partial W} = \frac{\partial L}{\partial D} \frac{\partial D}{\partial W}$$

$$\frac{\partial L}{\partial W} = \frac{\partial L}{\partial D} X^T$$
Let:

$$D = WX \text{, an } (H, n) \text{ matrix, where } W \text{ is an } (H, q) \text{ matrix and } X \text{ is a } (q, n) \text{ matrix}$$

$$L = f(D)$$
Note that others may use $D = XW$, where $X$’s rows are samples and columns are feature dimensions. That’s ok; you can follow this math, switch the indices, and find the result to be identical.
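For concreteness, here’s a minimal NumPy sketch of the forward pass with made-up dimensions (the values of $H$, $q$, and $n$ are arbitrary here):

```python
import numpy as np

H, q, n = 4, 3, 5          # hypothetical sizes: output dim, input dim, number of samples
W = np.random.randn(H, q)  # weights
X = np.random.randn(q, n)  # inputs, one sample per column
D = W @ X                  # (H, q) @ (q, n) -> (H, n)
print(D.shape)             # (4, 5)
```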
The canonical neuron is $\mathrm{ReLU}(D + b)$, but to make things simpler we’ll ignore the nonlinearity and bias and say that $L$ takes in $D$ instead of $\mathrm{ReLU}(D + b)$. We want to find the gradient of $L$ with respect to $W$ to do gradient descent.
We want to find $\frac{\partial L}{\partial W}$, so let’s start by looking at a specific weight $W_{dc}$. This way we can think more easily about the gradient of $L$ for a single weight and extrapolate for all weights $W$.
$$\frac{\partial L}{\partial W_{dc}} = \sum_{i,j} \frac{\partial L}{\partial D_{ij}} \frac{\partial D_{ij}}{\partial W_{dc}}$$
Let’s look more closely at the partial of $D_{ij}$ with respect to $W_{dc}$. We know that $\frac{\partial D_{ij}}{\partial W_{dc}} = 0$ if $i \neq d$, because $D_{ij}$ is the dot product of row $i$ of $W$ and column $j$ of $X$, and $W_{dc}$ only appears in row $d$ of $W$. This means the summation can be simplified by only looking at cases where $\frac{\partial D_{ij}}{\partial W_{dc}} \neq 0$, which is when $i = d$.
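If you want to see this numerically, here’s a small sketch (reusing the hypothetical shapes from above) that perturbs a single $W_{dc}$ and confirms that only row $d$ of $D$ changes:

```python
import numpy as np

H, q, n = 4, 3, 5
W = np.random.randn(H, q)
X = np.random.randn(q, n)

d, c, eps = 2, 1, 1e-6           # arbitrary weight index and step size
W_perturbed = W.copy()
W_perturbed[d, c] += eps

delta = W_perturbed @ X - W @ X  # change in D caused by nudging W_dc
changed_rows = np.unique(np.nonzero(delta)[0])
print(changed_rows)              # [2] -- only row d of D is affected
```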
$$\sum_{i,j} \frac{\partial L}{\partial D_{ij}} \frac{\partial D_{ij}}{\partial W_{dc}} = \sum_{j} \frac{\partial L}{\partial D_{dj}} \frac{\partial D_{dj}}{\partial W_{dc}}$$
Finally, what is $\frac{\partial D_{dj}}{\partial W_{dc}}$?
$$D_{dj} = \sum_{k=1}^{q} W_{dk} X_{kj}$$
$$\frac{\partial D_{dj}}{\partial W_{dc}} = \frac{\partial}{\partial W_{dc}} \sum_{k=1}^{q} W_{dk} X_{kj} = \sum_{k=1}^{q} \frac{\partial}{\partial W_{dc}} W_{dk} X_{kj}$$
Only the $k = c$ term depends on $W_{dc}$, and its derivative is $X_{cj}$, so:

$$\therefore \frac{\partial D_{dj}}{\partial W_{dc}} = X_{cj}$$
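A quick finite-difference check of this result (again with the hypothetical arrays from above); since $D$ is linear in $W$, the numerical estimate should match $X_{cj}$ almost exactly:

```python
import numpy as np

H, q, n = 4, 3, 5
W = np.random.randn(H, q)
X = np.random.randn(q, n)

d, c, j, eps = 2, 1, 3, 1e-6     # arbitrary indices for D_dj and W_dc
W_plus = W.copy()
W_plus[d, c] += eps

numeric = ((W_plus @ X)[d, j] - (W @ X)[d, j]) / eps
print(np.isclose(numeric, X[c, j], rtol=1e-4))  # True: dD_dj / dW_dc = X_cj
```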
$$\frac{\partial L}{\partial W_{dc}} = \sum_{j} \frac{\partial L}{\partial D_{dj}} X_{cj}$$
Now how can we simplify this? Well, one quick way is to see that the sum over $j$ is a dot product between row $d$ of $\frac{\partial L}{\partial D}$ and column $c$ of $X^T$, if we transpose $X_{cj}$ to $X^T_{jc}$.
$$\frac{\partial L}{\partial W_{dc}} = \sum_{j} \frac{\partial L}{\partial D_{dj}} X^T_{jc}$$
Now we want this for all weights in W , which means we can generalize this to:
$$\frac{\partial L}{\partial W} = \frac{\partial L}{\partial D} X^T$$
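To close the loop, here’s a sketch of a gradient check for this final formula. It assumes a concrete choice of $f$, namely $L = \frac{1}{2}\sum_{i,j} D_{ij}^2$, so that $\frac{\partial L}{\partial D} = D$; any differentiable $f$ would work the same way:

```python
import numpy as np

H, q, n = 4, 3, 5
W = np.random.randn(H, q)
X = np.random.randn(q, n)

def loss(W):
    # a concrete f for the check: L = 0.5 * sum(D^2), so dL/dD = D
    return 0.5 * np.sum((W @ X) ** 2)

dL_dD = W @ X              # gradient of this particular f with respect to D
dL_dW = dL_dD @ X.T        # the result derived above: dL/dW = (dL/dD) X^T

# numerical gradient: one finite difference per weight
eps = 1e-6
numeric = np.zeros_like(W)
for a in range(H):
    for b in range(q):
        W_plus = W.copy()
        W_plus[a, b] += eps
        numeric[a, b] = (loss(W_plus) - loss(W)) / eps

print(np.allclose(dL_dW, numeric, rtol=1e-4))  # True
```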