Deep Learning Unit 1
Deep Learning Unit 1
Unit -1
DL
ML vs DL
So, What do you
think, what is Deep
Learning
Deep learning is a machine
learning technique that learns
features and task directly from
the data, where data may by
images, text or sound!
∑ 𝒘𝒊 𝒙𝒊
𝒊= 𝒙
Applying Activation
Function
What is an Activation Function?
Activation functions are an extremely important feature of the artificial neural networks.
They basically decide whether a neuron should be activated or not. Whether the
information that the neuron is receiving is relevant for the given information or should it
be ignored.
The activation function is the non linear transformation that we do over the input
signal. This transformed output is then seen to the next layer of neurons as input.
What we have done here is that we have simply replaced the horizontal line with a non-zero, non-horizontal line. Here
a is a small value like 0.01 or so.
Sigmoid Function?
Pronounced “tanch,” tanh is a hyperbolic trigonometric function
The tangent represents a ratio between the opposite and adjacent sides of a right triangle,
tanh represents the ratio of the hyperbolic sine to the hyperbolic cosine: tanh(x) = sinh(x) /
cosh(x)
Unlike the Sigmoid function, the normalized range of tanh is –1 to 1 The advantage of tanh is
that it can deal more easily with negative numbers
Softmax Function (for Multiple
Classification)?
Softmax function calculates the probabilities distribution of the event over ‘n’ different events. In general way of
saying, this function will calculate the probabilities of each target class over all possible target classes. Later the
calculated probabilities will be helpful for determining the target class for the given inputs.
The main advantage of using Softmax is the output probabilities range. The range will 0 to 1, and the sum of all
the probabilities will be equal to one. If the softmax function used for multi-classification model it returns the
probabilities of each class and the target class will have the high probability.
The formula computes the exponential (e-power) of the given input value and the sum of exponential values of
all the values in the inputs. Then the ratio of the exponential of the input value and the sum of exponential
values is the output of the softmax function.
Activation Function Example
How Neural Network Work
and
Back Propagation in deep learning
How Neural Network Work with many neurons
What is Gradient Descent (BGD)
Gradient Descent is an optimization technique that is used to improve deep learning
and neural network-based models by minimizing the cost function.
Gradient Descent is a process that occurs in the backpropagation phase where the
goal is to continuously resample the gradient of the model’s parameter in the
opposite direction based on the weight w, updating consistently until we reach
the global minimum of function J(w).
More Precisely,
Gradient descent is an algorithm, which is used to iterate through different
combinations of weights in an optimal way.....to find the best combination of
weights which has a minimum error.
Brute force algorithm
Curse of dimensionality
Brute Force Algorithms refers to a programming style that does not include any shortcuts to
improve performance, but instead relies on sheer computing power to try all possibilities until the
solution to a problem is found. A classic example is the traveling salesman problem (TSP).
What is Gradient Descent
What is Gradient Descent
Useful link
https://towardsdatascience.com/understanding-the-mathematics-b
ehind-gradient-descent-dde5dc9be06e
Stochastic gradient descent
The word ‘stochastic‘ means a system or a process that is linked with a random
probability. Hence, in Stochastic Gradient Descent, a few samples are selected
randomly instead of the whole data set for each iteration. In Gradient Descent,
there is a term called “batch” which denotes the total number of samples from a
dataset that is used for calculating the gradient for each iteration. In typical
Gradient Descent optimization, like Batch Gradient Descent, the batch is taken to
be the whole dataset. Although, using the whole dataset is really useful for
getting to the minima in a less noisy or less random manner, but the problem
arises when our datasets get really huge.
Stochastic gradient descent
Stochastic gradient descent (often abbreviated SGD) is an iterative method for
optimizing an objective function with suitable smoothness properties (e.g. differentiable
or subdifferentiable). ~Convex Loss function~
Stochastic gradient descent
Stochastic gradient descent
Mini Batch gradient descent
Mini-batch gradient descent is a variation of the gradient descent algorithm
that splits the training dataset into small batches that are used to calculate
model error and update model coefficients.
Implementations may choose to sum the gradient over the mini-batch which
further reduces the variance of the gradient.