
Calculus for Machine Learning


Caroline Sun
December 2020

1 Introduction
Calculus is going to be an integral part of our next few lectures regarding neural
networks. As a result, this is going to be a crash course in derivatives and
partials; if you'd like to get into more depth, check out the resources at the
end. While it's going to be difficult to work with this material if you haven't
been in a calculus class before, try to stick with it: it'll be worth it.

2 Derivatives
Simply put, a derivative is just the rate of change of a function at a given
point. Even without calculus, we’ve dealt with rate of change and average rate
of change before. Let’s take a look at a basic example: lines.

2.1 Derivative and Slope


We already recognize slope as the rate of change in a line (by definition).

\text{slope} = \frac{\Delta y}{\Delta x} = \frac{f(x_1) - f(x_0)}{x_1 - x_0}

In the example y = 3x + 8, let's find the rate of change between x = 0 and
x = 1 using our equation above.

\text{slope} = \frac{\Delta y}{\Delta x} = \frac{f(1) - f(0)}{1 - 0} = \frac{11 - 8}{1 - 0} = 3
In a line, the rate of change at any point is constant: it's already defined by
the slope! Let's take a look at more complex curves.

2.2 Curves, Secant Slopes, and Derivative


Given the parabola y = x2 , let’s try to find the rate of change at x = 1.
* Based on Kevin Fu's lecture of the same name.

Figure 1: y = x^2 with secant y = 2x

We could approximate this value by finding the average rate of change, A,
of the function between x = 1 and a close number.

A = \frac{f(b) - f(a)}{b - a} \quad (1)
Let's start by finding the average rate of change between x = 1 and x = 2,
then. We would get

\frac{f(2) - f(1)}{2 - 1} = \frac{4 - 1}{2 - 1} = 3

Now, let's take a closer look and zoom into the average rate of change from
x = 1 to x = 1.25:

\frac{f(1.25) - f(1)}{1.25 - 1} = \frac{1.5625 - 1}{1.25 - 1} = 2.25

Even closer, let's try x = 1 to x = 1.001, which would give us a slope of
2.001. Does it look like it's approaching a number?
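To see this convergence numerically, here's a minimal sketch in plain Python
(the function names are our own, not from the lecture) that recomputes the
secant slopes above and watches them close in on 2:

    # Secant slope of f(x) = x^2 at x = 1 over smaller and smaller intervals.
    def f(x):
        return x ** 2

    def average_rate_of_change(f, a, b):
        # Equation (1): (f(b) - f(a)) / (b - a)
        return (f(b) - f(a)) / (b - a)

    for b in [2, 1.25, 1.001, 1.000001]:
        print(b, average_rate_of_change(f, 1, b))
    # Prints slopes 3, 2.25, 2.001, ... closing in on the derivative, 2.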
Essentially, we are trying to use our equation for the average rate of change
and find a value of b as close as we possibly can to our value of a. Mathematically,
that's a limit, which we can express with the following expression.

\frac{dy}{dx} = \lim_{\Delta x \to 0} \frac{f(x_0 + \Delta x) - f(x_0)}{\Delta x} \quad (2)
This equation is how we formally describe a derivative mathematically, rather
than in words. The most common ways to express the derivative of a function
are f'(x), \frac{dy}{dx}, \frac{d}{dx} f(x), and \frac{df}{dx}.
As a side note, a common real-world application of derivatives is finding the
velocity of an object at a given time (by using the graph or equation of position
versus time), which is often used in physics.

2.3 Run-through of Taking the Derivative of a Function
Fair warning: this section is going to be the densest of this lecture. Granted,
it's not a full replacement for the first month of calculus, but we're going to
try to distill the essential information needed to take the derivative of the
sigmoid function later.
Luckily for us, we don't have to use the limit definition of a derivative to
find the derivative of a function. We have several essential derivative rules that,
once established, we can use to find the derivatives of complex functions.
First, let's just remember that for any equation f(x) = c, f'(x) = 0. Why?
There is no change. We can circle back to our explanation of the derivative
of a line: in this function, the slope is also 0.
The first derivative rule is known as the Power Rule, because it applies to
any function of the form f(x) = x^n. For any function of that form, the
derivative will be f'(x) = n x^{n-1}. We can use the power rule on our parabola
y = x^2 example in 2.2 and confirm that the derivative at x = 1 is 2.
A special function is f(x) = e^x, whose derivative also equals f'(x) = e^x.
The derivative of its inverse, \ln(x), is \frac{1}{x}.

However, most functions we approach are not that simple. As a result, we
have the chain rule, the sum rule, and the product/quotient rules.
The chain rule helps us find the derivative of a function containing layers
of basic functions. For example, the function f(x) = e^{3x^2} is a function within
a function, the outermost function being e^y (where y = 3x^2) and the inner
function being 3x^2. We can't just use the e^x rule on this function: a change in
x also affects the inner function. In the chain rule, we take the derivative
of the outer function and multiply it by the derivative of the inner function.
Giving the outermost function the name f(x) and the inner function the name
g(x) in this example, the chain rule is
\frac{d}{dx} f(g(x)) = f'(g(x)) \, g'(x) \quad (3)
Here, we would take the derivative of the outer function, f'(g(x)), which
equals e^{3x^2}, and multiply it by g'(x), which is 6x per the power rule,
resulting in

\frac{d}{dx} \left( e^{3x^2} \right) = 6x e^{3x^2}
The chain rule also works for deeper functions. If we had three layers of
functions, the chain rule would extend to
\frac{d}{dx} f(g(h(x))) = f'(g(h(x))) \, g'(h(x)) \, h'(x) \quad (4)
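A symbolic math library can double-check these chain rule computations.
Here's a quick sketch using sympy (our choice of tool; the lecture doesn't
depend on it):

    import sympy as sp

    x = sp.symbols("x")

    # d/dx e^(3x^2): sympy applies the chain rule for us.
    print(sp.diff(sp.exp(3 * x**2), x))  # 6*x*exp(3*x**2)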
Thankfully, the next rule, the sum rule, is much less complex than the chain rule.
For a function f(x) = h(x) + g(x), where h(x) and g(x) are smaller functions,
f'(x) = h'(x) + g'(x). Through the sum rule, we know that the derivative of
f(x) = e^x + x^2 is f'(x) = e^x + 2x.
Next, we can use the product rule. For a function f(x) = h(x)g(x), its
derivative is

f'(x) = h(x) g'(x) + h'(x) g(x) \quad (5)

An example is f(x) = 3x^2 e^x, where f'(x) = 6x e^x + 3x^2 e^x = 3x e^x (x + 2).
The last relevant derivative rule is the quotient rule. For a function
f(x) = \frac{g(x)}{h(x)}, its derivative is

f'(x) = \frac{h(x) g'(x) - h'(x) g(x)}{h(x)^2} \quad (6)

which can be remembered as “down d-up minus up d-down over down down.”
This is not an exhaustive list of derivative rules! It's just the basics of what we
need to differentiate our sigmoid function. For starters, it doesn't include
trigonometric derivatives; if you're curious about those, check out the references section.
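These rules can also be sanity-checked symbolically. A short sketch, again
assuming sympy, run on the worked examples above (the quotient-rule input
is an illustrative ratio of our own):

    import sympy as sp

    x = sp.symbols("x")

    # Sum rule: d/dx (e^x + x^2) = e^x + 2x
    print(sp.diff(sp.exp(x) + x**2, x))

    # Product rule: d/dx (3x^2 e^x) = 3x e^x (x + 2)
    print(sp.factor(sp.diff(3 * x**2 * sp.exp(x), x)))

    # Quotient rule on a made-up ratio, x^2 / (1 + x)
    print(sp.simplify(sp.diff(x**2 / (1 + x), x)))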

3 Sigmoid Derivative
The sigmoid function is

S(x) = \frac{1}{1 + e^{-x}}

Figure 2: Sigmoid function

and will be relevant in future neural network lectures as a common activation
function. We can use

f'(x) = \frac{h(x) g'(x) - h'(x) g(x)}{h(x)^2}

as a first step to tackling this, with g(x) = 1 and h(x) = 1 + e^{-x}. In this case,
g'(x) = 0, as 1 is a constant, and h'(x) = -e^{-x} per the sum rule and chain rule
on 1 + e^{-x}. Therefore, using the quotient rule we'd get

S'(x) = \frac{e^{-x}}{(1 + e^{-x})^2}
Can we simplify this further by writing S'(x) in terms of S(x)?
Let's rewrite our answer as

S'(x) = \left( \frac{1}{1 + e^{-x}} \right) \left( \frac{e^{-x}}{1 + e^{-x}} \right) = S(x) \, \frac{e^{-x}}{1 + e^{-x}}

and if we further split the right-hand factor, we get

\frac{e^{-x}}{1 + e^{-x}} = \frac{(1 + e^{-x}) - 1}{1 + e^{-x}}, \quad \text{so} \quad S'(x) = S(x) \left( \frac{1 + e^{-x}}{1 + e^{-x}} - \frac{1}{1 + e^{-x}} \right)
giving us

S'(x) = S(x)(1 - S(x)) \quad (7)
This equation fully gives us S'(x) in terms of S(x), allowing us to compute
S'(x) using only the value of S(x).
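This identity is exactly why the sigmoid is pleasant to work with in code:
once S(x) is computed, S'(x) is one multiply away. A small numerical sketch
(using numpy, our choice) that checks equation (7) against a finite-difference
approximation:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def sigmoid_derivative(x):
        s = sigmoid(x)
        return s * (1.0 - s)  # equation (7): S'(x) = S(x)(1 - S(x))

    # Central finite difference as an independent check.
    x = np.linspace(-4.0, 4.0, 9)
    h = 1e-6
    numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2.0 * h)
    print(np.allclose(sigmoid_derivative(x), numeric))  # True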

4 More Dimensions
Another use of derivatives in machine learning is in training neural networks,
through a process known as gradient descent. At its simplest, a neural network is
a chain of weighted sums, and we use the error of our network, or the cost, to
change our weighted sums to maximize accuracy. Every layer of sums affects the
next, so many variables end up changing the cost. We can't just use a derivative
now, as our graph will be n-dimensional, and ordinary derivatives apply in two
dimensions. Instead, we use a gradient, \nabla f. The gradient is analogous to a
derivative in a multi-variable context. While we won't have to manually find
gradients when creating neural networks, we'll try to grasp what a gradient means
within a simple example.
Consider the graph z = x^2 + y^2: it's a paraboloid.

Figure 3: Paraboloid z = x^2 + y^2

Imagine laying a small ball onto the paraboloid: it'll roll downwards, stopping
at (0, 0, 0). The negative of the gradient gives us the specific direction in
which it will roll down.
Finding the gradient uses the same derivatives we found earlier, except the
gradient is a vector, with each component being the derivative of the function
with respect to one variable (treating the other variables, temporarily, as
constants). These component derivatives are called partials. The partials of z
with respect to x and y are the following:

\frac{\partial z}{\partial x}, \quad \frac{\partial z}{\partial y}

and our gradient is just those put into vector form, so

\nabla z = \left\langle \frac{\partial z}{\partial x}, \frac{\partial z}{\partial y} \right\rangle
In our paraboloid example, the gradient would be \langle 2x, 2y \rangle.
If we plug a point into our gradient, we'll get the direction of the steepest
incline. As a result, if we take the negation of that vector, it points in the
direction of steepest descent. In this example, the direction of steepest descent
at the point (3, 4, 25) would be \langle -6, -8 \rangle. This can be generalized
to other functions and points.
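This “rolling downhill” is literally what gradient descent does: repeatedly take
a small step against the gradient. A minimal sketch on the paraboloid (the step
size and iteration count are illustrative choices of ours, not from the lecture):

    import numpy as np

    def gradient(p):
        x, y = p
        return np.array([2.0 * x, 2.0 * y])  # our gradient <2x, 2y>

    p = np.array([3.0, 4.0])  # start at (3, 4), as in the example above
    step_size = 0.1

    for _ in range(50):
        p = p - step_size * gradient(p)  # step in the direction of steepest descent

    print(p)  # very close to (0, 0), the bottom of the bowl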

5 Closing
Hopefully, derivatives and gradients now make sense conceptually. Further
resources for both are listed in the next section.

6 Resources
• https://www.khanacademy.org/math/differential-calculus/dc-diff-intro
• https://www.mathsisfun.com/calculus/derivatives-rules.html

• https://www.youtube.com/watch?v=WUvTyaaNkzM&list=PL0-GT3co4r2wlh6UHTUeQsrf3mlS2lk6x
• https://www.khanacademy.org/math/multivariable-calculus/multivariable-derivatives/partial-derivative-and-gradient-articles/a/the-gradient
• https://betterexplained.com/articles/vector-calculus-understanding-the-gradient/

7 References
Desmos and GeoGebra were both used to generate the graphs.
