Lect 5 - Non-Linear Activation Functions

The document discusses the importance of activation functions in neural networks, which introduce non-linearity and enable the model to learn complex patterns. It covers various activation functions such as Sigmoid, Hyperbolic Tangent, ReLU, and their variants, along with their characteristics and use cases in different layers of a neural network. The summary emphasizes that choosing the right activation function is crucial for effective learning and performance in deep learning models.


NONLINEAR ACTIVATION FUNCTIONS

Dr. Umarani Jayaraman


Non Linear Functions
Activation Functions
What is an activation function?
 Every neuron performs two operations:
 Summation: a linear combination of the inputs X with the weights W.
 Non-linear activation function f: the purpose of the activation function is to introduce non-linearity into the output of a neuron (see the sketch below).
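A minimal NumPy sketch of these two operations, not taken from the slides; the function and variable names (neuron_forward, w, b) are illustrative only.

```python
import numpy as np

def neuron_forward(x, w, b, activation):
    """Forward pass of one neuron: summation followed by a non-linear activation."""
    z = np.dot(w, x) + b      # summation: linear combination of inputs x with weights w
    return activation(z)      # non-linearity applied to the summed input

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))   # one possible activation
print(neuron_forward(np.array([0.5, -1.2]), np.array([0.8, 0.3]), 0.1, sigmoid))
```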
What is an activation function?
 An activation function is a mathematical function used in neural networks to introduce non-linearity into the model.
 It enables the network to learn and model complex patterns in data.
Why activation functions?
 Without an activation function, a neural network would behave like a linear regression model, regardless of its depth.
 The activation function applies a non-linear transformation to the input, making the network capable of learning and performing more complex tasks.
 Activation functions help the network learn complex decision boundaries for classification and regression tasks.
Characteristics of activation functions
 An ideal activation function is both non-linear and differentiable.
 The non-linear behavior of an activation function allows the neural network to learn non-linear relationships in the data.
 It should be continuous, differentiable, non-decreasing, and easy to compute.
 Differentiability is important because it allows us to backpropagate the model's error during training to optimize the weights.
Step/Threshold Function
 While this was the original activation function, first developed when neural networks were invented, it is no longer used in neural network architectures because it is incompatible with back-propagation.
 Back-propagation allows us to find the optimal weights for the model using a version of gradient descent.
 Unfortunately, the derivative of the step activation function cannot be used to update the weights, since it is 0 everywhere.
Step/Threshold Function
 Problem: not compatible with gradient descent via back-propagation, since its derivative is zero (see the sketch below).
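A hedged NumPy sketch of the step activation and its (zero) derivative; the function names are illustrative, not from the slides.

```python
import numpy as np

def step(z):
    """Step/threshold activation: 1 when the summed input is non-negative, else 0."""
    return np.where(z >= 0, 1.0, 0.0)

def step_derivative(z):
    """The derivative is 0 everywhere it is defined, so back-propagation
    receives no gradient signal and the weights are never updated."""
    return np.zeros_like(z)
```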
Sigmoid (logistic)
 The sigmoid function is a commonly used non-linear function.
 However, it has fallen out of practice in real-world neural networks due to a problem known as the vanishing gradient.
Sigmoid (logistic)
 Problems: vanishing gradient at the edges.
 Output is not zero-centered.
Sigmoid (logistic)
 The sigmoid function outputs values between 0 and 1.
 The output is not zero-centered.
 Sigmoid saturates and kills gradients.
 At the top and bottom of the sigmoid curve the function changes very slowly, so the slope (gradient) is close to zero.
 Because of this, when x is very small or very large the slope is nearly zero → there is almost no learning (see the sketch below).
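A brief NumPy illustration of this saturation, using sigmoid(z) = 1/(1+e^(-z)) and its derivative sigmoid(z)*(1-sigmoid(z)); a sketch, not the lecture's own code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative(z):
    s = sigmoid(z)
    return s * (1.0 - s)            # peaks at 0.25 when z = 0

for z in (-10.0, 0.0, 10.0):
    print(z, sigmoid(z), sigmoid_derivative(z))
# at z = -10 or z = 10 the derivative is about 0.000045: the function has
# saturated, so almost no gradient flows back and learning stalls
```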
Sigmoid (logistic)
 When should we use the sigmoid activation function?
 If we want an output value between 0 and 1, use sigmoid at the output-layer neurons only.
 For binary classification problems, sigmoid is used.
 Otherwise sigmoid is not preferred.
Hyperbolic Tangent
 Problems: vanishing gradient at the edges.
Hyperbolic Tangent
 Its output is zero-centered because its range is between -1 and 1, i.e. -1 < output < 1.
 Hence optimization is easier, and in practice tanh is usually preferred over the sigmoid function.
 But it still suffers from the vanishing gradient problem.
Hyperbolic Tangent
What is a zero-centered distribution?
Why is a zero-centered activation function important?

 Better weight distribution: zero-centered functions help ensure that the weights are updated in all directions, leading to a more effective exploration of the loss landscape.
 This can help avoid certain neurons becoming "stuck" and not contributing to learning (a small demonstration follows below).
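To make the "stuck" behavior concrete, here is a hedged NumPy sketch (the variable names are hypothetical): when the previous layer uses a sigmoid, all of its outputs are positive, so the gradients of the loss with respect to a downstream neuron's incoming weights all share one sign, and those weights can only move together in one direction per update.

```python
import numpy as np

rng = np.random.default_rng(0)
a_prev = 1.0 / (1.0 + np.exp(-rng.normal(size=4)))   # sigmoid outputs: all positive
upstream = 0.7                                        # hypothetical dL/dz for one downstream neuron

grad_w = upstream * a_prev   # dL/dw_i = upstream * a_prev_i
print(grad_w)                # every entry is positive: all weights move in the same direction
# with a zero-centered activation such as tanh, a_prev has mixed signs,
# so the weight updates can point in different directions within one step
```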
Hyperbolic Tangent
 When should we use it?
 It is usually used in the hidden layers of a neural network.
 Since its values lie between -1 and 1, the mean of the hidden-layer activations comes out to be 0 or very close to it.
 Hence it helps to center the data by bringing the mean close to 0, which makes learning for the next layer much easier.
Activation functions - sigmoid, tanh and linear, and their derivatives
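The figure referenced on this slide is not reproduced here; below is a small matplotlib sketch (assuming standard NumPy/matplotlib, not the lecture's own code) that plots the same three activations and their derivatives.

```python
import numpy as np
import matplotlib.pyplot as plt

z = np.linspace(-5, 5, 200)
sig = 1.0 / (1.0 + np.exp(-z))

curves = {
    "sigmoid": (sig, sig * (1 - sig)),
    "tanh":    (np.tanh(z), 1 - np.tanh(z) ** 2),
    "linear":  (z, np.ones_like(z)),
}

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for name, (f, df) in curves.items():
    axes[0].plot(z, f, label=name)    # the activation itself
    axes[1].plot(z, df, label=name)   # its derivative
axes[0].set_title("activation functions")
axes[1].set_title("derivatives")
axes[0].legend(); axes[1].legend()
plt.show()
```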
Inverse Tangent
 Output is zero-centered.
ReLU (Rectified Linear Unit)
 This is one of the most popular activation functions, widely used since 2017.
 It avoids and rectifies the vanishing gradient problem. Almost all deep learning models use ReLU nowadays.
 ReLU can, however, result in dead neurons (see the sketch below).
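A hedged NumPy sketch of ReLU, f(z) = max(0, z), and its derivative; the comment on dead neurons summarizes the bullet above.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def relu_derivative(z):
    # 1 for positive inputs, 0 for negative inputs; a neuron whose summed
    # input stays negative receives zero gradient forever -- a "dead" neuron
    return (np.asarray(z) > 0).astype(float)
```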
ReLU Variants
 Due to its popularity, a number of variants have been proposed that provide an incremental benefit over the standard ReLU:
 Leaky ReLU, Parametric ReLU
 Maxout, ELU
ReLU Variants- Leaky ReLU
 Leaky ReLU fixes the problem of dead neurons that occurs with ReLU.
 It introduces a small slope for negative inputs to keep the updates alive.
 Its limitation is that it should only be used within the hidden layers of a neural network model.
ReLU Variants- Parametric ReLU
ReLU Variants- Maxout Function
 We then have another variant, built from both ReLU and Leaky ReLU, called the Maxout function.
ELU (Exponential Linear Units)
 No dead neurons.
 Output is zero-centered.
ReLU Variants- ELU (Exponential Linear Units)
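Hedged NumPy sketches of the variants named above (Maxout is omitted because it requires several sets of weights per neuron); the parameter names slope and alpha are illustrative.

```python
import numpy as np

def leaky_relu(z, slope=0.01):
    # fixed small slope for z < 0 keeps a gradient flowing (no dead neurons)
    return np.where(z > 0, z, slope * z)

def parametric_relu(z, alpha):
    # same shape as Leaky ReLU, but alpha is learned during training
    return np.where(z > 0, z, alpha * z)

def elu(z, alpha=1.0):
    # smooth exponential branch for z < 0; negative outputs pull the
    # mean activation towards zero (closer to zero-centered)
    return np.where(z > 0, z, alpha * (np.exp(z) - 1.0))
```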
Identity function – for output layer
 The following activation functions should only be used on the output layer.
 Use: regression.
Softmax- for output layer
 The softmax function is commonly used as the output activation function for multi-class classification.
 It scales the preceding inputs to a range between 0 and 1 and normalizes the output layer so that the sum of all output neurons is equal to one.
 As a result, we can treat the softmax output as a categorical probability distribution.
 This allows the model to express a degree of confidence in each class prediction.
Softmax- for output layer
 Note: we use the exponential function to ensure that all values in the summation are positive.
 Use: classification.
Softmax/ Normalized Exponential Function
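A hedged NumPy sketch of this normalized exponential; the max-subtraction is a common numerical-stability trick and is not mentioned on the slides.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))   # subtracting the max avoids overflow; the result is unchanged
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)
print(probs, probs.sum())       # non-negative values that sum to 1.0
```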
Summary
 Activation Functions: comparison of the functions covered, from the Step function through Softmax.
Summary
 Choosing an Activation Function
 Hidden Layers: typically use ReLU or its variants (Leaky ReLU, Swish).
 Output Layer:
 Regression: Linear activation.
 Binary Classification: Sigmoid.
 Multiclass Classification: Softmax.
 Activation functions are essential for deep learning, as they enable networks to capture complex patterns, relationships, and features in data (see the sketch below).
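As a hedged illustration of these per-layer choices (not from the lecture), here is a minimal Keras model for a hypothetical 10-class problem with 784-dimensional inputs; the layer sizes are arbitrary.

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),  # hidden layer: ReLU
    tf.keras.layers.Dense(64, activation="relu"),                       # hidden layer: ReLU
    tf.keras.layers.Dense(10, activation="softmax"),                    # multiclass output: softmax
])
# a regression head would instead end with Dense(1, activation="linear");
# a binary classifier with Dense(1, activation="sigmoid")
```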
Sources:
 https://www.jeremyjordan.me/neural-networks-activation-functions/
 https://missinglink.ai/guides/neural-network-concepts/7-types-neural-network-activation-functions-right/
 https://ml-cheatsheet.readthedocs.io/en/latest/activation_functions.html
 https://github.com/MlvPrasadOfficial/ineuron.ai/blob/master/IPYNB%20FILES%20DL/Activation%20Functions.ipynb
Sources- Sigmoid function
 https://deepai.org/machine-learning-glossary-and-terms/sigmoidal-nonlinearity
 https://www.analyticsvidhya.com/blog/2020/12/beginners-take-how-logistic-regression-is-related-to-linear-regression/
 https://towardsdatascience.com/activation-functions-neural-networks-1cbd9f8d91d6
Thank you
Linear function definition
Non-linear function definition
