Lect 5 - Non-Linear Activation Functions

The document discusses the importance of activation functions in neural networks, which introduce non-linearity and enable the model to learn complex patterns. It covers various activation functions such as Sigmoid, Hyperbolic Tangent, ReLU, and their variants, along with their characteristics and use cases in different layers of a neural network. The summary emphasizes that choosing the right activation function is crucial for effective learning and performance in deep learning models.


NONLINEAR ACTIVATION FUNCTIONS

Dr. Umarani Jayaraman


Non Linear Functions
Activation Functions
What is an activation function?
 Every neuron performs two operations:
 Summation: a linear combination of the inputs X with the weights W.
 Non-linear activation function f: the purpose of the activation function is to introduce non-linearity into the output of a neuron (see the sketch below).
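A minimal NumPy sketch of these two operations, not taken from the slides; the function and variable names (neuron_forward, w, b) are illustrative only.

```python
import numpy as np

def neuron_forward(x, w, b, activation):
    """Forward pass of one neuron: summation followed by a non-linear activation."""
    z = np.dot(w, x) + b      # summation: linear combination of inputs x with weights w
    return activation(z)      # non-linearity applied to the summed input

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))   # one possible activation
print(neuron_forward(np.array([0.5, -1.2]), np.array([0.8, 0.3]), 0.1, sigmoid))
```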
What is an activation function?
 An activation function is a mathematical function used in neural networks to introduce non-linearity into the model.
 It enables the network to learn and model complex patterns in data.
Why activation functions?
 Without an activation function, a neural network would behave like a linear regression model, regardless of its depth.
 The activation function applies a non-linear transformation to the input, making the network capable of learning and performing more complex tasks.
 Activation functions help the network learn complex decision boundaries for classification and regression tasks.
Characteristics of activation functions
 An ideal activation function is both non-linear and differentiable.
 The non-linear behavior of an activation function allows the neural network to learn non-linear relationships in the data.
 It should be continuous, differentiable, non-decreasing, and easy to compute.
 Differentiability is important because it allows us to backpropagate the model's error during training to optimize the weights.
Step/Threshold Function
 While this was the original activation function, first developed when neural networks were invented, it is no longer used in neural network architectures because it is incompatible with back-propagation.
 Back-propagation allows us to find the optimal weights for the model using a version of gradient descent.
 Unfortunately, the derivative of the step activation function cannot be used to update the weights, since it is 0 everywhere.
Step/Threshold Function
 Problem: not compatible with gradient descent via back-propagation, since its derivative is zero (see the sketch below).
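A hedged NumPy sketch of the step activation and its (zero) derivative; the function names are illustrative, not from the slides.

```python
import numpy as np

def step(z):
    """Step/threshold activation: 1 when the summed input is non-negative, else 0."""
    return np.where(z >= 0, 1.0, 0.0)

def step_derivative(z):
    """The derivative is 0 everywhere it is defined, so back-propagation
    receives no gradient signal and the weights are never updated."""
    return np.zeros_like(z)
```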
Sigmoid (logistic)
 The sigmoid function is a commonly used non-linear function.
 However, it has fallen out of practice in real-world neural networks due to a problem known as the vanishing gradient.
Sigmoid (logistic)
 Problems: vanishing gradient at the edges.
 Output is not zero-centered.
Sigmoid (logistic)
 The sigmoid function outputs values between 0 and 1.
 The output is not zero-centered.
 Sigmoid saturates and kills gradients.
 At the top and bottom of the sigmoid curve the function changes very slowly, so the slope (gradient) is close to zero.
 Because of this, when x is very small or very large the slope is nearly zero → there is almost no learning (see the sketch below).
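A brief NumPy illustration of this saturation, using sigmoid(z) = 1/(1+e^(-z)) and its derivative sigmoid(z)*(1-sigmoid(z)); a sketch, not the lecture's own code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative(z):
    s = sigmoid(z)
    return s * (1.0 - s)            # peaks at 0.25 when z = 0

for z in (-10.0, 0.0, 10.0):
    print(z, sigmoid(z), sigmoid_derivative(z))
# at z = -10 or z = 10 the derivative is about 0.000045: the function has
# saturated, so almost no gradient flows back and learning stalls
```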
Sigmoid (logistic)
 When should we use the sigmoid activation function?
 If we want an output value between 0 and 1, use sigmoid at the output-layer neurons only.
 For binary classification problems, sigmoid is used.
 Otherwise sigmoid is not preferred.
Hyperbolic Tangent
 Problems: vanishing gradient at the edges.
Hyperbolic Tangent
 Its output is zero-centered because its range is between -1 and 1, i.e. -1 < output < 1.
 Hence optimization is easier, and in practice tanh is usually preferred over the sigmoid function.
 But it still suffers from the vanishing gradient problem.
Hyperbolic Tangent
What is a zero-centered distribution?
Why is a zero-centered activation function important?

 Better weight distribution: zero-centered functions help ensure that the weights are updated in all directions, leading to a more effective exploration of the loss landscape.
 This can help avoid certain neurons becoming "stuck" and not contributing to learning (a small demonstration follows below).
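To make the "stuck" behavior concrete, here is a hedged NumPy sketch (the variable names are hypothetical): when the previous layer uses a sigmoid, all of its outputs are positive, so the gradients of the loss with respect to a downstream neuron's incoming weights all share one sign, and those weights can only move together in one direction per update.

```python
import numpy as np

rng = np.random.default_rng(0)
a_prev = 1.0 / (1.0 + np.exp(-rng.normal(size=4)))   # sigmoid outputs: all positive
upstream = 0.7                                        # hypothetical dL/dz for one downstream neuron

grad_w = upstream * a_prev   # dL/dw_i = upstream * a_prev_i
print(grad_w)                # every entry is positive: all weights move in the same direction
# with a zero-centered activation such as tanh, a_prev has mixed signs,
# so the weight updates can point in different directions within one step
```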
Hyperbolic Tangent
 When should we use it?
 It is usually used in the hidden layers of a neural network.
 Since its values lie between -1 and 1, the mean of the hidden-layer activations comes out to be 0 or very close to it.
 Hence it helps to center the data by bringing the mean close to 0, which makes learning for the next layer much easier.
Activation functions - sigmoid, tanh and linear, and their derivatives
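The figure referenced on this slide is not reproduced here; below is a small matplotlib sketch (assuming standard NumPy/matplotlib, not the lecture's own code) that plots the same three activations and their derivatives.

```python
import numpy as np
import matplotlib.pyplot as plt

z = np.linspace(-5, 5, 200)
sig = 1.0 / (1.0 + np.exp(-z))

curves = {
    "sigmoid": (sig, sig * (1 - sig)),
    "tanh":    (np.tanh(z), 1 - np.tanh(z) ** 2),
    "linear":  (z, np.ones_like(z)),
}

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for name, (f, df) in curves.items():
    axes[0].plot(z, f, label=name)    # the activation itself
    axes[1].plot(z, df, label=name)   # its derivative
axes[0].set_title("activation functions")
axes[1].set_title("derivatives")
axes[0].legend(); axes[1].legend()
plt.show()
```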
Inverse Tangent
 Output is zero-centered.
ReLU (Rectified Linear Unit)
 This is one of the most popular activation functions, widely used since 2017.
 It avoids and rectifies the vanishing gradient problem. Almost all deep learning models use ReLU nowadays.
 ReLU can, however, result in dead neurons (see the sketch below).
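A hedged NumPy sketch of ReLU, f(z) = max(0, z), and its derivative; the comment on dead neurons summarizes the bullet above.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def relu_derivative(z):
    # 1 for positive inputs, 0 for negative inputs; a neuron whose summed
    # input stays negative receives zero gradient forever -- a "dead" neuron
    return (np.asarray(z) > 0).astype(float)
```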
ReLU Variants
 Due to its popularity, a number of variants have been proposed that provide an incremental benefit over the standard ReLU:
 Leaky ReLU, Parametric ReLU
 Maxout, ELU
ReLU Variants- Leaky ReLU
 Leaky ReLU fixes the problem of dead neurons that occurs with ReLU.
 It introduces a small slope for negative inputs to keep the updates alive.
 Its limitation is that it should only be used within the hidden layers of a neural network model.
ReLU Variants- Parametric ReLU
ReLU Variants- Maxout Function
 We then have another variant, built from both ReLU and Leaky ReLU, called the Maxout function.
ELU (Exponential Linear Units)
 No dead neurons.
 Output is zero-centered.
ReLU Variants- ELU (Exponential Linear Units)
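Hedged NumPy sketches of the variants named above (Maxout is omitted because it requires several sets of weights per neuron); the parameter names slope and alpha are illustrative.

```python
import numpy as np

def leaky_relu(z, slope=0.01):
    # fixed small slope for z < 0 keeps a gradient flowing (no dead neurons)
    return np.where(z > 0, z, slope * z)

def parametric_relu(z, alpha):
    # same shape as Leaky ReLU, but alpha is learned during training
    return np.where(z > 0, z, alpha * z)

def elu(z, alpha=1.0):
    # smooth exponential branch for z < 0; negative outputs pull the
    # mean activation towards zero (closer to zero-centered)
    return np.where(z > 0, z, alpha * (np.exp(z) - 1.0))
```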
Identity function – for output layer
 The following activation functions should only be used on the output layer.
 Use: regression.
Softmax- for output layer
 The softmax function is commonly used as the output activation function for multi-class classification.
 It scales the preceding inputs to a range between 0 and 1 and normalizes the output layer so that the sum of all output neurons is equal to one.
 As a result, we can treat the softmax output as a categorical probability distribution.
 This allows the model to express a degree of confidence in each class prediction.
Softmax- for output layer
 Note: we use the exponential function to ensure that all values in the summation are positive.
 Use: classification.
Softmax/ Normalized Exponential Function
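A hedged NumPy sketch of this normalized exponential; the max-subtraction is a common numerical-stability trick and is not mentioned on the slides.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))   # subtracting the max avoids overflow; the result is unchanged
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)
print(probs, probs.sum())       # non-negative values that sum to 1.0
```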
Summary
 Activation Functions: comparison of the functions covered, from the Step function through Softmax.
Summary
 Choosing an Activation Function
 Hidden Layers: typically use ReLU or its variants (Leaky ReLU, Swish).
 Output Layer:
 Regression: Linear activation.
 Binary Classification: Sigmoid.
 Multiclass Classification: Softmax.
 Activation functions are essential for deep learning, as they enable networks to capture complex patterns, relationships, and features in data (see the sketch below).
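As a hedged illustration of these per-layer choices (not from the lecture), here is a minimal Keras model for a hypothetical 10-class problem with 784-dimensional inputs; the layer sizes are arbitrary.

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),  # hidden layer: ReLU
    tf.keras.layers.Dense(64, activation="relu"),                       # hidden layer: ReLU
    tf.keras.layers.Dense(10, activation="softmax"),                    # multiclass output: softmax
])
# a regression head would instead end with Dense(1, activation="linear");
# a binary classifier with Dense(1, activation="sigmoid")
```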
Sources:
 https://www.jeremyjordan.me/neural-networks-activation-functions/
 https://missinglink.ai/guides/neural-network-concepts/7-types-neural-network-activation-functions-right/
 https://ml-cheatsheet.readthedocs.io/en/latest/activation_functions.html
 https://github.com/MlvPrasadOfficial/ineuron.ai/blob/master/IPYNB%20FILES%20DL/Activation%20Functions.ipynb
Sources- Sigmoid function
 https://deepai.org/machine-learning-glossary-and-terms/sigmoidal-nonlinearity
 https://www.analyticsvidhya.com/blog/2020/12/beginners-take-how-logistic-regression-is-related-to-linear-regression/
 https://towardsdatascience.com/activation-functions-neural-networks-1cbd9f8d91d6
Thank you
Linear function definition
Non-linear function definition
