Activation Functions
• Activation functions are a crucial component of
artificial neural networks, used to introduce non-
linearity into the model.
• They determine whether a neuron should be activated
(fire) or not, based on the weighted sum of inputs.
• There are several types of activation functions, each
with its own characteristics and use cases.
• To put it in simple terms, an artificial neuron calculates the 'weighted sum' of its inputs and adds a bias to produce the net input, as shown in the sketch below.
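A minimal sketch of this net-input computation, with illustrative values for the inputs, weights, and bias (the specific numbers are only for demonstration):
import numpy as np
# Illustrative inputs, weights, and bias
x = np.array([0.5, -1.2, 3.0])   # inputs
w = np.array([0.8, 0.1, -0.4])   # weights
b = 0.2                          # bias
# Net input: weighted sum of the inputs plus the bias
net_input = np.dot(w, x) + b
print(net_input)  # an unbounded real number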
Activation Functions
• Now the value of the net input can be anything from -inf to +inf.
• The neuron does not know how to bound this value and thus cannot decide the firing pattern.
• The activation function is therefore an important part of an artificial neural network.
• It decides whether a neuron should be activated or not, and in doing so it bounds the value of the net input.
The activation function is a non-linear transformation that we
do over the input before sending it to the next layer of neurons
or finalizing it as output.
Types of Activation Functions
• Several different types of activation functions are used in Machine Learning.
• Some of them are explained next.
Step Function
• The Step Function is one of the simplest kinds of activation functions.
• In this, we consider a threshold value, and if the value of the net input, say y, is greater than the threshold, then the neuron is activated.
• Mathematically,
f(y) = 1 if y > threshold, and f(y) = 0 otherwise.
Given below is the graphical representation of the step function.
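A minimal sketch that plots the step function in the same style as the later code examples, assuming a threshold of 0 (the threshold value here is only illustrative):
import numpy as np
import matplotlib.pyplot as plt
def step(x, threshold=0.0):
    # 1 when the net input is above the threshold, 0 otherwise
    return np.where(x > threshold, 1.0, 0.0)
x = np.linspace(-5, 5, 100)
y = step(x)
plt.plot(x, y)
plt.title("Step Function")
plt.grid()
plt.show()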
Sigmoid Function
• The sigmoid function is a widely used activation function. It is defined as:
sigmoid(x) = 1 / (1 + exp(-x))
Sigmoid Function
• This is a smooth function and is continuously
differentiable.
• The biggest advantage that it has over the step and linear functions is that it is non-linear.
• This is an incredibly cool feature of the sigmoid function.
• It essentially means that when multiple neurons have the sigmoid function as their activation function, the output is non-linear as well.
• The function ranges from 0 to 1 and has an S shape.
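A minimal plotting sketch of the sigmoid function, following the same pattern as the code examples later in this section (sigmoid here is just an illustrative helper name):
import numpy as np
import matplotlib.pyplot as plt
def sigmoid(x):
    # Smooth, S-shaped curve bounded between 0 and 1
    return 1 / (1 + np.exp(-x))
x = np.linspace(-5, 5, 100)
y = sigmoid(x)
plt.plot(x, y)
plt.title("Sigmoid Function")
plt.grid()
plt.show()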
ReLU
• The ReLU function is the Rectified Linear Unit.
• It is the most widely used activation function. It is defined as:
relu(x) = max(0, x)
ReLU
• Graphically, the ReLU function looks like the plot produced by the following sketch.
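A minimal plotting sketch in the same style as the other code examples in this section (relu here is an illustrative helper name):
import numpy as np
import matplotlib.pyplot as plt
def relu(x):
    # Passes positive inputs through unchanged and maps negative inputs to 0
    return np.maximum(0, x)
x = np.linspace(-5, 5, 100)
y = relu(x)
plt.plot(x, y)
plt.title("Rectified Linear Unit (ReLU)")
plt.grid()
plt.show()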
ReLU
• The main advantage of using the ReLU
function over other activation functions is that
it does not activate all the neurons at the
same time.
• This means that if the input is negative, ReLU converts it to zero and the neuron does not get activated.
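As a quick illustration of this behaviour, a small hypothetical example applying the ReLU rule to a few net-input values (the values are only illustrative):
import numpy as np
net_inputs = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
activations = np.maximum(0, net_inputs)
print(activations)  # only the positive inputs stay non-zero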
Leaky ReLU
• The Leaky ReLU function is an improved version of the ReLU function.
• Instead of defining the function as 0 for x less than 0, we define it as a small linear component of x. It can be defined as:
leaky_relu(x) = x for x >= 0, and leaky_relu(x) = alpha * x for x < 0,
where alpha is a small constant (0.01 in the code that follows).
Leaky ReLU
• Graphically, the Leaky ReLU function can be plotted with the following code:
import numpy as np
import matplotlib.pyplot as plt
def leaky_relu(x, alpha=0.01):
    # Small linear slope alpha for negative inputs instead of a hard 0
    return np.where(x >= 0, x, alpha * x)
x = np.linspace(-5, 5, 100)
y = leaky_relu(x)
plt.plot(x, y)
plt.title("Leaky Rectified Linear Unit (Leaky ReLU)")
plt.grid()
plt.show()
Parametric Rectified Linear Unit (PReLU)
import numpy as np
import matplotlib.pyplot as plt
def prelu(x, a=0.01):
    # In PReLU the slope a is a learnable parameter; here it is fixed for plotting
    return np.where(x >= 0, x, a * x)
x = np.linspace(-5, 5, 100)
y = prelu(x)
plt.plot(x, y)
plt.title("Parametric Rectified Linear Unit (PReLU)")
plt.grid()
plt.show()
Exponential Linear Unit (ELU)
import numpy as np
import matplotlib.pyplot as plt
def elu(x, alpha=1.0):
    # Identity for x >= 0, smooth exponential saturation towards -alpha for x < 0
    return np.where(x >= 0, x, alpha * (np.exp(x) - 1))
x = np.linspace(-5, 5, 100)
y = elu(x)
plt.plot(x, y)
plt.title("Exponential Linear Unit (ELU)")
plt.grid()
plt.show()
Exponential Linear Unit (ELU)
• In this program, we define the ELU activation
function using the formula
elu(x, alpha) = x for x >= 0,
and elu(x, alpha) = alpha * (exp(x) - 1) for x < 0.
• You can adjust the alpha parameter to control the negative part of the curve, which saturates towards -alpha for very negative inputs.
• The code then creates a range of x values,
computes the corresponding y values using
the ELU function, and plots the result.
Scaled Exponential Linear Unit (SELU)
• The Scaled Exponential Linear Unit (SELU) is a
self-normalizing activation function that can
maintain mean activations close to 0 and
standard deviations close to 1 during training.
• Here's a Python implementation of the SELU
activation function:
Scaled Exponential Linear Unit (SELU)
import numpy as np
import matplotlib.pyplot as plt
def selu(x, alpha=1.67326, scale=1.0507):
    # Scaled ELU with fixed alpha and scale constants from the SELU definition
    return scale * np.where(x > 0, x, alpha * (np.exp(x) - 1))
x = np.linspace(-5, 5, 100)
y = selu(x)
plt.plot(x, y)
plt.title("Scaled Exponential Linear Unit (SELU)")
plt.grid()
plt.show()
Scaled Exponential Linear Unit (SELU)
• In this implementation, we define the SELU
activation function using the formula
selu(x, alpha, scale) = scale * x for x >= 0,
and selu(x, alpha, scale) = scale * (alpha * (exp(x) - 1)) for
x < 0.
• The alpha and scale parameters are specific values
that are part of the SELU definition.
• The code then creates a range of x values,
computes the corresponding y values using the
SELU function, and plots the result.
Swish Function
• The Swish activation function is defined as:
swish(x) = x / (1 + exp(-x)),
that is, x multiplied by the sigmoid of x.
Swish Function
import numpy as np
import matplotlib.pyplot as plt
def swish(x):
    # Equivalent to x * sigmoid(x)
    return x / (1 + np.exp(-x))
x = np.linspace(-5, 5, 100)
y = swish(x)
plt.plot(x, y)
plt.title("Swish Activation Function")
plt.grid()
plt.show()
Swish Function
• In this code, we define the Swish function
using the formula provided.
• We create a range of x values, calculate the
corresponding y values using the Swish
function, and plot the curve.
• The Swish function is known for being smooth
and continuous, allowing it to be a viable
choice as an activation function in neural
networks.