Gradient Descent Algorithms and Variations - PyImageSearch
In this tutorial, you will learn the fundamentals of gradient descent, how Stochastic Gradient Descent (SGD) and mini-batch SGD improve on the vanilla algorithm, and how momentum and Nesterov acceleration can further improve SGD.
To learn about gradient descent and its variations, just keep reading.
There have been a tremendous number of variations of gradient descent and optimizers, including vanilla gradient descent, mini-batch gradient descent, Stochastic Gradient Descent (SGD), and mini-batch SGD, just to name a few.
Furthermore, entirely new model optimizers have been designed with improvements to SGD in
mind, including Adam, Adadelta, RMSprop, and others.
Today we are going to review the fundamentals of gradient descent and focus primarily on SGD,
including two improvements to SGD, momentum and Nesterov acceleration.
1 We start by taking our cost/loss function (i.e., the function responsible for computing the value we want to minimize)
2 We then compute the gradient of that loss with respect to the parameters we are optimizing
3 And finally, we take a step in the direction opposite of the gradient (since this will take us down the path to our local minimum), as the sketch below illustrates
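To make these three steps concrete, here is a minimal, hypothetical sketch in plain NumPy that minimizes a simple one-dimensional parabola with gradient descent (the loss function, learning rate, and iteration count are all arbitrary choices for illustration, not values from this tutorial):

```python
import numpy as np

# Step 1: the cost/loss function we want to minimize (a simple parabola)
def loss(w):
    return (w - 3.0) ** 2

# Step 2: the gradient of that loss with respect to the parameter w
def gradient(w):
    return 2.0 * (w - 3.0)

w = np.float64(-4.0)  # arbitrary starting value for the parameter
lr = 0.1              # learning rate (how large a step we take)

for i in range(50):
    # Step 3: take a step in the direction *opposite* the gradient
    w = w - lr * gradient(w)

print(w, loss(w))     # w converges toward 3.0, the minimum of the parabola
```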
Figure 1: The goal of gradient descent is to iteratively take steps towards lower areas of the loss landscape, similar
to descending to the bottom of a parabola, but in multiple dimensions (image source
(https://medium.com/@divakar_239/stochastic-vs-batch-gradient-descent-8820568eada1)).
But how does this apply to neural networks and deep learning?
Let’s address that in the next section.
A neural network consists of one or more hidden layers. Each layer consists of a set of
parameters. Our goal is to optimize these parameters such that our loss is minimized.
Typical loss functions include binary cross-entropy (two-class classification), categorical cross-
entropy (multi-class classification), mean squared error (regression), etc.
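As a quick, hypothetical example of how this looks in practice, Keras/TensorFlow lets you pick one of these losses by name when compiling a model (the tiny architecture below is made up purely for illustration):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.optimizers import SGD

# A made-up two-class classifier: 4 input features, one hidden layer,
# and a single sigmoid output
model = Sequential([
    Input(shape=(4,)),
    Dense(8, activation="relu"),
    Dense(1, activation="sigmoid"),
])

# Two-class classification => binary cross-entropy; swap in
# "categorical_crossentropy" or "mean_squared_error" for other tasks
model.compile(loss="binary_crossentropy",
              optimizer=SGD(learning_rate=0.01),
              metrics=["accuracy"])
```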
There are many types of loss functions, each of which is used in certain roles. Instead of getting too caught up in which loss function is being used, think of it this way:
1 We initialize the weights of our neural network (typically with small random values)
2 We ask the neural network to make a prediction on a data point from our training set
3 We take that prediction and compute the loss/cost function, which tells us how good/bad of a job we did at making the correct prediction
4 We compute the gradient of the loss
5 And then we ever-so-slightly tweak the parameters of the neural network such that our predictions are better
We do this over and over again until our model is said to “converge” and is able to make reliable,
accurate predictions.
There are many types of gradient descent algorithms, but the types we'll be focusing on here today are:
1 Vanilla gradient descent
2 Stochastic Gradient Descent (SGD)
3 Mini-batch SGD
The most basic form of gradient descent, which I like to call vanilla gradient descent, updates the weights of the network only once per pass over the training data, meaning that the network sees the entire dataset before a single weight update is performed.
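A rough sketch of this behavior, using a made-up linear-regression problem and a hand-computed mean-squared-error gradient (none of this data or code comes from the tutorial itself), might look like the following; note that the weights change only once per full pass:

```python
import numpy as np

# Made-up toy data: N data points, 3 features, linear target plus noise
rng = np.random.default_rng(42)
X = rng.normal(size=(10_000, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 0.1 * rng.normal(size=10_000)

W = np.zeros(3)   # parameters to learn
lr = 0.05         # learning rate

for epoch in range(100):
    preds = X @ W                              # predictions on the *entire* dataset
    grad = 2.0 * X.T @ (preds - y) / len(y)    # gradient of the mean squared error
    W -= lr * grad                             # exactly one weight update per epoch
```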
If the number of training examples is large, then vanilla gradient descent is going to take a long time to converge, because a weight update happens only once per pass over the entire dataset.
Furthermore, the larger your dataset gets, the more nuanced your gradients can become, and if
you’re only updating the weights once per epoch then you’re going to be spending the
majority of your time computing predictions and not much time actually learning (which is the
goal of an optimization problem, right?)
Luckily, there are other variations of gradient descent that address this problem.
The original formulation of SGD would do N weight updates per epoch where N is equal to the
total number of data points in your dataset. So, using our example above, if we have N=10,000
images, then we would have 10,000 weight updates per epoch.
Until convergence:
1 Randomly sample a data point from our training set
2 Make a prediction on it
3 Compute the loss and the gradient of the loss
4 Update the network's weights in the direction opposite of the gradient
SGD tends to converge much faster because it’s able to start improving itself after each and
every weight update.
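Continuing the same hypothetical linear-regression setup from the earlier sketch, a per-sample SGD loop makes one weight update for every single data point it sees:

```python
import numpy as np

# Same made-up toy data as in the vanilla gradient descent sketch above
rng = np.random.default_rng(42)
X = rng.normal(size=(10_000, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 0.1 * rng.normal(size=10_000)

W = np.zeros(3)
lr = 0.01

for epoch in range(10):
    # Visit the training points in a random order each epoch
    for i in rng.permutation(len(y)):
        pred = X[i] @ W                     # prediction on a single data point
        grad = 2.0 * (pred - y[i]) * X[i]   # gradient of that point's squared error
        W -= lr * grad                      # one weight update per data point (N per epoch)
```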
That said, performing N weight updates per epoch (where N is equal to the total number of data
points in our dataset) is also a bit computationally wasteful — we’ve now swung to the other side
of the pendulum.
Mini-batch SGD
Figure 3: Top: Vanilla gradient descent. Bottom: An illustration of mini-batch SGD with a batch size of S=3. At each iteration, a batch of S data points is sampled, predictions are made, loss is computed, and parameters to the network are updated (image source (https://kenndanielso.github.io/mlrefined/blog_posts/13_Multilayer_perceptrons/13_6_Stochastic_and_minibatch_gra
While SGD can converge faster for large datasets, we actually run into another problem — we
cannot leverage our vectorized libraries that make training super fast (again, because we are only
passing one data point at a time through the network).
There is a variation of SGD called mini-batch SGD that solves this problem. When you hear
people talking about SGD what they are almost always referring to is mini-batch SGD.
Mini-batch SGD introduces the concept of a batch size, S. Now, given a dataset of size N, there will be a total of N / S weight updates to the network per epoch.
Until convergence:
1 Randomly sample a batch of S data points from our training set
2 Make predictions on the batch
3 Compute the loss over the batch and the gradient of that loss
4 Update the network's weights in the direction opposite of the gradient
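Here is the same hypothetical example reworked as mini-batch SGD; the only change from the per-sample version is that we slice the shuffled indices into batches of size S and compute a vectorized gradient over each batch:

```python
import numpy as np

# Same made-up toy data as in the earlier sketches
rng = np.random.default_rng(42)
X = rng.normal(size=(10_000, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 0.1 * rng.normal(size=10_000)

W = np.zeros(3)
lr = 0.05
S = 32   # batch size

for epoch in range(10):
    order = rng.permutation(len(y))
    for start in range(0, len(y), S):
        idx = order[start:start + S]                     # next batch of (up to) S points
        preds = X[idx] @ W                               # vectorized predictions on the batch
        grad = 2.0 * X[idx].T @ (preds - y[idx]) / len(idx)
        W -= lr * grad                                   # roughly N / S weight updates per epoch
```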
If you plot the loss of each mini-batch directly, then you'll see a very noisy plot, such as the following
one:
Figure 4: Plotting the loss of every mini-batch can lead to a very noisy plot (image source
(https://towardsdatascience.com/gradient-descent-algorithm-and-its-variants-10f652806a3)).
But when you average out the loss across all mini-batches the plot is actually quite stable:
Figure 5: Averaging the mini-batch loss over the course of an entire epoch leads to more stable-
looking plots (image source (https://towardsdatascience.com/gradient-descent-algorithm-and-
its-variants-10f652806a3)).
Note: Depending on what deep learning library you are using you may see both types of plots.
When you hear deep learning practitioners talking about SGD they are more than likely talking
about mini-batch SGD.
SGD has a problem when navigating areas of the loss landscape that are significantly steeper in
one dimension than in others (which you’ll see around local optima).
When this happens, it appears that SGD simply oscillates across the slopes of the ravine instead of descending into areas of lower loss and, ideally, higher accuracy (see Sebastian Ruder's excellent article
(https://ruder.io/optimizing-gradient-descent/) for more details on this phenomenon).
By applying momentum (Figure 6) we build up a head of steam in a direction and then allow
gravity to roll us faster and faster down the hill.
Get used to seeing momentum when using SGD — it is used in the majority of neural network
experiments that apply SGD.
The problem with momentum is that once you develop a head of steam, the train can easily get out of control and roll right over our local minimum and back up the hill again.
Nesterov acceleration accounts for this and helps us recognize when the loss landscape starts
sloping back up again.
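To make the two update rules concrete, here is a rough NumPy sketch; the quadratic stand-in loss, the 0.9 momentum coefficient, and the 0.01 learning rate are all arbitrary illustrative choices:

```python
import numpy as np

def grad_fn(w):
    # Stand-in gradient: pretend the loss is ||w||^2, so its gradient is 2w
    return 2.0 * w

lr, mu = 0.01, 0.9                       # learning rate and momentum coefficient

# Classic momentum: accumulate a velocity from past gradients, then step
w, v = np.array([5.0, -3.0]), np.zeros(2)
for _ in range(200):
    v = mu * v - lr * grad_fn(w)
    w = w + v

# Nesterov acceleration: evaluate the gradient at the "look-ahead" position
# w + mu * v, which lets the update correct course before overshooting
w, v = np.array([5.0, -3.0]), np.zeros(2)
for _ in range(200):
    v = mu * v - lr * grad_fn(w + mu * v)
    w = w + v
```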
Nearly all deep learning libraries that contain an SGD implementation also include momentum and Nesterov acceleration terms.
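In Keras/TensorFlow, for example, both terms are simply arguments on the SGD optimizer (the values below are illustrative starting points you would tune, not recommendations from this tutorial):

```python
from tensorflow.keras.optimizers import SGD

# SGD with a momentum term and Nesterov acceleration enabled
opt = SGD(learning_rate=0.01, momentum=0.9, nesterov=True)

# The optimizer is then passed to model.compile(...) as usual, e.g.:
# model.compile(loss="categorical_crossentropy", optimizer=opt, metrics=["accuracy"])
```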
Momentum is nearly always a good idea. Nesterov acceleration works in some situations and not
in others. You’ll want to treat them as hyperparameters you need to tune when training your
neural networks (i.e., pick values for each, run an experiment, log the results, update the
parameters, and repeat until you find a set of hyperparameters that yields good results).
Furthermore, we have an entire set of tutorials on hyperparameter optimization, which you can find here (https://pyimagesearch.com/2021/05/17/introduction-to-hyperparameter-tuning-with-scikit-learn-and-python/).
Summary
In this tutorial, you learned about gradient descent and its variations, most notably Stochastic Gradient Descent (SGD).
SGD is the workhorse of deep learning. All optimizers, including Adam, Adadelta, RMSprop, etc.,
have their roots in SGD — each of these optimizers provides tweaks and variations to SGD,
ideally improving convergence and making the model more stable during training.
We’ll cover these more advanced optimizers soon, but for the time being, understand that SGD is
the basis of all of them.
We can further improve SGD by including a momentum term (nearly always recommended).
Occasionally, Nesterov acceleration can further improve SGD (dependent on your specific
project).
To download the source code to this post (and be notified when future tutorials are published
here on PyImageSearch), simply enter your email address in the form below!