
Unit- II Deep Learning (UDS21401J)

BACKPROPAGATION

Backpropagation is an algorithm that backpropagates the errors from the output
nodes to the input nodes. Therefore, it is simply referred to as the backward
propagation of errors.

It computes the gradient of the loss function with respect to the network weights.
It is far more efficient than naively computing the gradient with respect to each
weight separately. This efficiency makes it possible to use gradient methods to train
multi-layer networks and update weights to minimize loss; variants such as
gradient descent or stochastic gradient descent are often used.

It is used in many applications of neural networks in data mining, such as character
recognition and signature verification.

Features of Backpropagation:

1. It is the gradient descent method as used in the case of a simple perceptron
network with a differentiable unit.
2. It is different from other networks in the process by which the
weights are calculated during the learning period of the network.
3. Training is done in three stages:
 the feed-forward of the input training pattern
 the calculation and backpropagation of the error
 the updating of the weights

Working of Backpropagation:

Neural networks use supervised learning to generate output vectors from the input
vectors that the network operates on. The network compares the generated output
with the desired output and generates an error report if the result does not match.
It then adjusts the weights according to the error report to obtain the desired output.

Backpropagation Algorithm:

Step 1: Inputs X arrive through the preconnected path.
Step 2: The input is modeled using real weights W. Weights are usually chosen
randomly.
Step 3: Calculate the output of each neuron from the input layer, through the hidden
layer, to the output layer.
Step 4: Calculate the error in the outputs:
Backpropagation Error = Actual Output – Desired Output
Step 5: From the output layer, go back to the hidden layer to adjust the weights so as
to reduce the error.
Step 6: Repeat the process until the desired output is achieved. A worked sketch of
these steps follows below.
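
As a concrete illustration of these steps, here is a minimal sketch in Python, assuming NumPy, a tiny network with one hidden layer, sigmoid activations, and made-up input and target values (all names and numbers are illustrative, not part of the original material):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Steps 1-2: example input X and randomly chosen weights
rng = np.random.default_rng(0)
X = np.array([[0.5, 0.1]])          # one sample with 2 features
y = np.array([[1.0]])               # desired output
W1 = rng.normal(size=(2, 3))        # input -> hidden weights
W2 = rng.normal(size=(3, 1))        # hidden -> output weights
lr = 0.1                            # learning rate

# Step 3: forward pass from the input layer to the output layer
h = sigmoid(X @ W1)                 # hidden-layer activations
out = sigmoid(h @ W2)               # network output

# Step 4: error at the output
error = out - y

# Step 5: propagate the error backwards and adjust the weights
delta_out = error * out * (1 - out)              # sigmoid derivative at the output
delta_hidden = (delta_out @ W2.T) * h * (1 - h)  # error pushed back to the hidden layer
W2 -= lr * (h.T @ delta_out)
W1 -= lr * (X.T @ delta_hidden)

# Step 6: in practice, the forward and backward passes are repeated until the error is small.
```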

Types of Backpropagation

There are two types of backpropagation networks.

 Static backpropagation: Static backpropagation is a network designed to
map static inputs to static outputs. These types of networks are capable of
solving static classification problems such as OCR (Optical Character
Recognition).
 Recurrent backpropagation: Recurrent backpropagation is another network,
used for fixed-point learning. Activation in recurrent backpropagation is fed
forward until a fixed value is reached. Static backpropagation provides an
instant mapping, while recurrent backpropagation does not.

Advantages:

 It is simple, fast, and easy to program.
 It has no parameters to tune apart from the number of inputs.
 It is flexible and efficient.
 There is no need for users to learn any special functions.
Disadvantages:

 It is sensitive to noisy data and irregularities; noisy data can lead to inaccurate
results.
 Performance is highly dependent on the input data.
 Training can be time-consuming.
 A matrix-based approach is preferred over a mini-batch approach.

GRADIENT DESCENT
Gradient descent is an optimization algorithm commonly used in machine
learning to minimize the cost or loss function during the training of a model,
aiming to find the optimal set of parameters. It is a numerical optimization
algorithm that finds the optimal parameters—weights and biases—of a
neural network by minimizing a defined cost function.
The learning happens during backpropagation while training the neural-network-based
model. Gradient descent is the procedure used to optimize the weights and biases
based on the cost function. The cost function evaluates the difference between the
actual and predicted outputs.

Gradient descent is an optimization algorithm used in machine learning to
minimize the cost function by iteratively adjusting parameters in the direction of
the negative gradient, aiming to find the optimal set of parameters.

The cost function represents the discrepancy between the predicted output of the
model and the actual output. The goal of gradient descent is to find the set of
parameters that minimizes this discrepancy and improves the model’s
performance.

The two types of gradient descent are


1. batch gradient descent
2. stochastic gradient descent.

Batch gradient descent updates the model’s parameters using the entire training
set in each iteration, while stochastic gradient descent updates the parameters
using only one training sample at a time.
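
A minimal sketch of this difference, assuming NumPy, a toy one-parameter linear model y ≈ w·x, and made-up data (everything here is illustrative):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])   # underlying relationship: y = 2x
lr = 0.01                            # learning rate

# Batch gradient descent: one update per pass, using the entire training set
w = 0.0
for epoch in range(100):
    grad = np.mean(2 * (w * x - y) * x)   # gradient of the mean squared error
    w -= lr * grad

# Stochastic gradient descent: one update per training sample
w_sgd = 0.0
for epoch in range(100):
    for xi, yi in zip(x, y):
        grad = 2 * (w_sgd * xi - yi) * xi  # gradient for a single sample
        w_sgd -= lr * grad

print(w, w_sgd)   # both estimates approach 2.0
```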

How Does Gradient Descent Work?

1. Gradient descent is an optimization algorithm used to minimize the cost
function of a model.
2. The cost function measures how well the model fits the training data and is
defined based on the difference between the predicted and actual values.
3. The gradient of the cost function is the derivative with respect to the
model’s parameters and points in the direction of the steepest ascent.
4. The algorithm starts with an initial set of parameters and updates them in
small steps to minimize the cost function.
5. In each iteration of the algorithm, the gradient of the cost function with
respect to each parameter is computed.
6. The gradient tells us the direction of the steepest ascent, and by moving in
the opposite direction, we can find the direction of the steepest descent.
7. The size of the step is controlled by the learning rate, which determines
how quickly the algorithm moves towards the minimum.
8. The process is repeated until the cost function converges to a minimum,
indicating that the model has reached the optimal set of parameters.
9. There are different variations of gradient descent, including batch gradient
descent, stochastic gradient descent, and mini-batch gradient descent, each
with its own advantages and limitations.
10. Efficient implementation of gradient descent is essential for achieving good
performance in machine learning tasks. The choice of the learning rate and
the number of iterations can significantly impact the performance of the
algorithm.
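
The steps above can be traced in a minimal sketch, assuming a single-parameter cost function J(w) = (w - 3)^2 whose minimum is known to lie at w = 3 (the function and all values are chosen purely for illustration):

```python
# Gradient descent on the cost J(w) = (w - 3)^2, whose gradient is dJ/dw = 2 * (w - 3).
def gradient(w):
    return 2 * (w - 3)

w = 0.0                  # step 4: start from an initial parameter value
learning_rate = 0.1      # step 7: controls the size of each update
for iteration in range(50):                 # step 8: repeat until convergence
    w = w - learning_rate * gradient(w)     # steps 5-6: move against the gradient

print(w)   # close to 3, the minimizer of the cost function
```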
Challenges of Deep Learning
DATA ISSUE:-

Large volumes of data are necessary for deep learning. Additionally, the more
accurate and powerful models will need more parameters, which calls for more
data.
When we hear "Big Data," we might wonder how it differs from the more
common "data." The term "data" refers to any unprocessed character or symbol
that can be recorded on media or transmitted via electronic signals by a computer.
Raw data, however, is useless until it is processed somehow.
Before we jump into the challenges of Big Data, let’s start with the five ‘V’s of Big
Data.
The Five ‘V’s of Big Data

Big Data is simply a catchall term used to describe data too large and complex to
store in traditional databases. The “five ‘V’s” of Big Data are:

 Volume – The amount of data generated


 Velocity - The speed at which data is generated, collected and analyzed
 Variety - The different types of structured, semi-structured and unstructured data
 Value - The ability to turn data into useful insights
 Veracity - Trustworthiness in terms of quality and accuracy
Challenges of Big Data:-

A. Storage
With vast amounts of data generated daily, the greatest challenge is storage
(especially when the data is in different formats) within legacy systems.
Unstructured data cannot be stored in traditional databases.
B. Processing
Processing big data refers to the reading, transforming, extraction, and formatting
of useful information from raw information. The input and output of information in
unified formats continue to present difficulties.
C. Security
Security is a big concern for organizations. Non-encrypted information is at risk of
theft or damage by cyber-criminals. Therefore, data security professionals must
balance access to data against maintaining strict security protocols.
D. Finding and Fixing Data Quality Issues
Many of you are probably dealing with challenges related to poor data quality, but
solutions are available. The following are some approaches to fixing data problems:
A. Correct information in the original database.
B. Repair the original data source to resolve any data inaccuracies.
C. Use highly accurate methods of determining who someone is.

E. Scaling Big Data Systems


Database sharding, memory caching, moving to the cloud and separating read-only
and write-active databases are all effective scaling methods. While each one of
those approaches is fantastic on its own, combining them will lead you to the next
level.
F. Evaluating and Selecting Big Data Technologies
Companies are spending millions on new big data technologies, and the market for
such tools is expanding rapidly. In recent years, however, the IT industry has
caught on to big data and analytics potential. The trending technologies include the
following:
A. Hadoop Ecosystem
B. Apache Spark
C. NoSQL Databases
D. R Software
E. Predictive Analytics
F. Prescriptive Analytics
G. Big Data Environments
In an extensive data set, data is constantly being ingested from various sources,
making it more dynamic than a data warehouse. The people in charge of the big
data environment can quickly lose track of where each data collection came from
and what it contains.
H. Real-Time Insights
The term "real-time analytics" describes the practice of performing analyses on
data as a system is collecting it. Decisions may be made more efficiently and with
more accurate information thanks to real-time analytics tools, which use logic and
mathematics to deliver insights on this data quickly.
I. Data Validation
Before using data in a business process, its integrity, accuracy, and structure must
be validated. The output of a data validation procedure can be used for further
analysis, BI, or even to train a machine learning model.
J. Healthcare Challenges
Electronic health records (EHRs), genomic sequencing, medical research,
wearables, and medical imaging are just a few examples of the many sources of
health-related big data.

Overfitting in neural networks

What is Overfitting?

Overfitting is an undesirable machine learning behavior that occurs when the
machine learning model gives accurate predictions for training data but not for new
data.
When data scientists use machine learning models for making predictions, they
first train the model on a known data set. Then, based on this information, the
model tries to predict outcomes for new data sets.
An overfit model can give inaccurate predictions and cannot perform well for all
types of new data.
Why does overfitting occur?

You only get accurate predictions if the machine learning model generalizes to all
types of data within its domain. Overfitting occurs when the model cannot
generalize and fits too closely to the training dataset instead. Overfitting happens
due to several reasons, such as:
• The training data size is too small and does not contain enough data samples to
accurately represent all possible input data values.
• The training data contains large amounts of irrelevant information, called noisy
data.
• The model trains for too long on a single sample set of data.
• The model complexity is high, so it learns the noise within the training data.
Overfitting examples

Consider a use case where a machine learning model has to analyze photos and
identify the ones that contain dogs in them. If the machine learning model was
trained on a data set in which the majority of photos showed dogs outside in parks,
it may learn to use grass as a feature for classification and may not recognize a
dog inside a room.

Another overfitting example is a machine learning algorithm that predicts a
university student's academic performance and graduation outcome by analyzing
several factors like family income, past academic performance, and academic
qualifications of parents. However, the test data only includes candidates from a
specific gender or ethnic group. In this case, overfitting causes the algorithm's
prediction accuracy to drop for candidates with gender or ethnicity outside of the
test dataset.
How can you detect overfitting?

The best method to detect overfit models is by testing the machine learning models
on more data with comprehensive representation of possible input data values and
types.
Typically, part of the training data is used as test data to check for overfitting. A
high error rate in the testing data indicates overfitting. One method of testing for
overfitting is given below.
K-fold cross-validation
Cross-validation is one of the testing methods. In this method, data scientists
divide the training set into K equally sized subsets or sample sets called folds. The
training process consists of a series of iterations. During each iteration, the steps
are:
1. Keep one subset as the validation data and train the machine learning model
on the remaining K-1 subsets.
2. Observe how the model performs on the validation sample.
3. Score model performance based on output data quality.

Iterations repeat until you test the model on every sample set. You then average the
scores across all iterations to get the final assessment of the predictive model.
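
A minimal sketch of this procedure, assuming NumPy, K = 5, made-up data, and a hypothetical train_and_score helper standing in for whichever model you actually use:

```python
import numpy as np

def train_and_score(train_X, train_y, val_X, val_y):
    # Hypothetical placeholder: train a model on the K-1 training folds
    # and return its score on the held-out validation fold.
    return 0.0

K = 5
X = np.random.rand(100, 4)     # 100 samples, 4 features (made-up data)
y = np.random.rand(100)

indices = np.arange(len(X))
np.random.shuffle(indices)
folds = np.array_split(indices, K)          # K roughly equal subsets ("folds")

scores = []
for k in range(K):
    val_idx = folds[k]                      # step 1: keep one fold for validation
    train_idx = np.concatenate([folds[i] for i in range(K) if i != k])
    scores.append(train_and_score(X[train_idx], y[train_idx],
                                  X[val_idx], y[val_idx]))   # steps 2-3

print(np.mean(scores))   # final assessment: the average score across all folds
```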
How can you prevent overfitting?
1. Early stopping
Early stopping pauses the training phase before the machine learning model learns
the noise in the data. However, getting the timing right is important; otherwise, the
model will still not give accurate results.
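
A minimal sketch of early stopping, assuming hypothetical train_one_epoch and validation_loss helpers and a "patience" of three epochs without improvement (all names are placeholders, not part of the original text):

```python
def train_one_epoch(model):
    pass            # hypothetical: one pass of training over the data

def validation_loss(model):
    return 1.0      # hypothetical: loss measured on held-out validation data

def train_with_early_stopping(model, max_epochs=100, patience=3):
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        train_one_epoch(model)
        loss = validation_loss(model)
        if loss < best_loss:
            best_loss = loss                   # validation improved: keep training
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break                          # stop before the model starts fitting noise
    return model
```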
2. Pruning
You might identify several features or parameters that impact the final prediction
when you build a model. Feature selection—or pruning—identifies the most
important features within the training set and eliminates irrelevant ones.
For example, to predict if an image is an animal or human, you can look at various
input parameters like face shape, ear position, body structure, etc. You may
prioritize face shape and ignore the shape of the eyes.

3. Regularization
Regularization is a collection of training/optimization techniques that seek to
reduce overfitting. These methods try to eliminate those factors that do not impact
the prediction outcomes by grading features based on importance.
For example, mathematical calculations apply a penalty value to features with
minimal impact. Consider a statistical model attempting to predict the housing
prices of a city in 20 years. Regularization would give a lower penalty value to
features like population growth and average annual income but a higher penalty
value to the average annual temperature of the city.
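
As one concrete regularization technique, here is a minimal sketch of an L2 weight penalty added to a mean-squared-error loss, assuming NumPy (lambda_reg and the linear model are illustrative choices, not the only option):

```python
import numpy as np

def loss_with_l2(w, X, y, lambda_reg=0.1):
    predictions = X @ w
    data_loss = np.mean((predictions - y) ** 2)   # how well the model fits the data
    penalty = lambda_reg * np.sum(w ** 2)         # penalty on large weights
    return data_loss + penalty

# The gradient gains an extra 2 * lambda_reg * w term, which during training
# shrinks the weights of low-importance features towards zero.
```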

4. Ensembling
Ensembling combines predictions from several separate machine learning
algorithms. Some models are called weak learners because their results are often
inaccurate. Ensemble methods combine all the weak learners to get more accurate
results. They use multiple models to analyze sample data and pick the most
accurate outcomes.
The two main ensemble methods are bagging and boosting. Boosting trains
different machine learning models one after another to get the final result, while
bagging trains them in parallel.

5. Data augmentation
Data augmentation is a machine learning technique that changes the sample data
slightly every time the model processes it. You can do this by changing the input
data in small ways.
When done in moderation, data augmentation makes the training sets appear
unique to the model and prevents the model from learning their characteristics.
For example, you can apply transformations such as translation, flipping, and rotation
to input images.
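
A minimal sketch of such transformations using plain NumPy array operations (a horizontal flip and a 90-degree rotation are shown; real pipelines typically also use translations and arbitrary-angle rotations):

```python
import numpy as np

def augment(image, rng):
    """Return a slightly altered copy of an image stored as an (H, W, C) array."""
    if rng.random() < 0.5:
        image = np.fliplr(image)   # horizontal flip
    if rng.random() < 0.5:
        image = np.rot90(image)    # rotate 90 degrees in the image plane
    return image

rng = np.random.default_rng(0)
image = np.zeros((32, 32, 3))      # made-up placeholder image
augmented = augment(image, rng)
```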

Hyperparameters of Neural Networks


A training process involves selecting the best/optimal hyperparameters that are
used by learning algorithms to provide the best result.
Hyperparameters are defined as the parameters that are explicitly defined by
the user to control the learning process.

Hyperparameters of a neural network are variables that determine the network’s
architecture and behavior during training. They include the number of layers, the
number of nodes in each layer, the activation functions, learning rate, batch size,
regularization parameters, dropout rate, optimizer choice, and weight initialization
methods.

Difference between Model Parameter and Model Hyperparameter:-

Model Parameters:

Model parameters are configuration variables that are internal to the model, and a
model learns them on its own. Examples include the weights or coefficients of
independent variables in a linear regression model, the weights or coefficients of
independent variables in an SVM, the weights and biases of a neural network, and
cluster centroids in clustering. Some key points for model parameters are as follows:
a) They are used by the model for making predictions.
b) They are learned by the model from the data itself.
c) These are usually not set manually.
d) These are part of the model and key to a machine learning algorithm.

Model Hyperparameters:

Hyperparameters are those parameters that are explicitly defined by the user to
control the learning process. Some key points for model hyperparameters are as follows:
These are usually defined manually by the machine learning engineer.
One cannot know the exact best value of a hyperparameter for a given problem;
the best value can be determined either by rules of thumb or by trial and error.
Some examples of hyperparameters are the learning rate for training a neural
network and the batch size.

Categories of Hyperparameters
Broadly hyperparameters can be divided into two categories, which are given
below:
A. Hyperparameter for Optimization
B. Hyperparameter for Specific Models

Hyperparameter for Optimization


The process of selecting the best hyperparameters to use is known as
hyperparameter tuning, and the tuning process is also known as hyperparameter
optimization. Optimization parameters are used for optimizing the model.

Some of the popular optimization parameters are given below:

Learning Rate: The learning rate is the hyperparameter in optimization algorithms
that controls how much the model needs to change in response to the estimated
error each time the model's weights are updated.
Selecting an optimal learning rate is a challenging task: if the learning rate is very
small, it may slow down the training process, and if the learning rate is too large,
the model may not be optimized properly.
a) Batch Size: To enhance the speed of the learning process, the training set is
divided into different subsets, which are known as batches.
b) Number of Epochs: An epoch can be defined as a complete cycle through the
training data. An epoch represents an iterative learning process.
The number of epochs varies from model to model, and various models are
trained for more than one epoch. The number of epochs is increased as long
as the validation error keeps decreasing; if there is no improvement in the
validation error over consecutive epochs, that indicates you should stop
increasing the number of epochs. The sketch below shows where these
settings appear in a simple training loop.
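
A minimal sketch, assuming NumPy, made-up data, and hypothetical train_step and validation_error placeholders (these names stand in for a real training setup):

```python
import numpy as np

learning_rate = 0.01     # how far each weight update moves
batch_size = 32          # number of samples used per weight update
num_epochs = 20          # complete passes over the training set

def train_step(batch_X, batch_y, learning_rate):
    pass                 # hypothetical: forward pass, backpropagation, weight update

def validation_error():
    return 0.0           # hypothetical: error measured on held-out data

X = np.random.rand(1000, 10)        # made-up training data
y = np.random.rand(1000)

for epoch in range(num_epochs):
    for start in range(0, len(X), batch_size):
        batch_X = X[start:start + batch_size]
        batch_y = y[start:start + batch_size]
        train_step(batch_X, batch_y, learning_rate)
    # In practice, stop increasing the number of epochs once
    # validation_error() stops improving.
```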

Why is a neural network called a black box?

Neural networks, particularly deep neural networks, are called black boxes because
their internal workings can be complex and difficult to interpret. Here are a few reasons
why neural networks are referred to as black boxes:
 Complexity of Internal Operations: Neural networks consist of many
layers and nodes, each with numerous parameters. The interactions and
transformations that occur within the network during the learning process
are complex and not easily understandable. The sheer number of
parameters and the non-linear nature of the operations make it challenging to
intuitively grasp how the network arrives at a specific output.
 Lack of Interpretability: Understanding why a neural network makes a
specific decision or prediction can be difficult. While it may be possible to
observe the input and output, determining the exact reasons for the network's
decision can be difficult to pin down. This lack of interpretability is
similar to treating the network as a black box where the internal processes
are not transparent.
 High Dimensionality: Neural networks often operate in high-dimensional
spaces, making it impractical for humans to visualize or comprehend the
relationships between inputs and outputs. As a result, the inner workings of
the network remain obscure.
 Non-linearity: Neural networks apply non-linear transformations to the
input data, and this non-linearity contributes to the complexity of their
behavior. Understanding how small changes in input affect the output is not
straightforward due to these non-linear transformations.
 Learning from Data: Neural networks learn from data, adjusting their
parameters based on patterns and relationships within the training set. While
this ability to learn complex patterns is a strength, it also means that the
learned representations may not be easily interpretable by humans.

Lack of Flexibility:-

Deep learning is generally known for its flexibility and ability to learn complex
representations from data. Deep neural networks, which form the foundation of
deep learning, are capable of automatically extracting hierarchical features from
raw input data, making them suitable for a wide range of tasks such as image
recognition, natural language processing, and reinforcement learning.

However, there are certain challenges or perceptions related to flexibility that
might be associated with deep learning:

1. Data Dependence: Deep learning models often require large amounts of labeled
data to perform well. This dependency on data can be seen as a limitation,
especially in scenarios where obtaining labeled data is expensive or time-
consuming.

2. Computational Requirements: Training deep neural networks, especially large
ones, can be computationally intensive. This may limit their applicability in
resource-constrained environments.

3. Interpretability: Deep learning models, particularly deep neural networks with
many layers, are sometimes criticized for being "black boxes" due to the
complexity of their internal representations. Understanding and interpreting the
decisions made by these models can be challenging.

4. Hyperparameter Tuning: Configuring deep neural networks involves choosing
various hyperparameters, such as the learning rate, number of layers, and layer
sizes. Tuning these hyperparameters can be time-consuming and requires expertise.

Multitasking:-

Deep learning models can be adapted for multitasking or handling multiple tasks
simultaneously. Multitasking in deep learning refers to training a model to perform
more than one distinct task using a shared set of parameters. There are a few ways
in which deep learning models can be designed to support multitasking:

1. Multi-Task Learning (MTL): In multi-task learning, a single neural network is
trained to perform multiple tasks at the same time. The network is designed with
shared layers that are responsible for capturing common features across tasks, and
task-specific layers that are responsible for handling the unique aspects of each
task. This shared representation allows the model to learn a more generalized and
robust set of features.
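
A minimal sketch of this idea (forward pass only), assuming NumPy: one shared hidden layer feeds two task-specific output heads. All sizes, names, and data are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((4, 8))               # 4 samples, 8 input features (made up)

W_shared = rng.normal(size=(8, 16))  # shared layer: common features for all tasks
W_task_a = rng.normal(size=(16, 3))  # head for task A (e.g. a 3-class classifier)
W_task_b = rng.normal(size=(16, 1))  # head for task B (e.g. a regression output)

hidden = np.maximum(0, X @ W_shared) # shared representation (ReLU)
out_a = hidden @ W_task_a            # task A output
out_b = hidden @ W_task_b            # task B output
# During joint training, the losses from both heads are combined, and the
# resulting gradients update W_shared as well as the task-specific weights.
```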

2. Joint Training: Instead of training separate models for each task, a deep
learning model can be trained jointly on multiple tasks. During training, the model
is presented with data from all tasks, and the optimization process updates the
shared parameters to improve performance on all tasks simultaneously.

3. Transfer Learning: Transfer learning involves training a model on one task and
then transferring the learned knowledge to another related task. This can be
considered a form of multitasking, where the model leverages knowledge gained
from one task to improve performance on another. Pre-trained models, such as
those trained on large image datasets, are often fine-tuned for specific tasks in this
way.

4. Attention Mechanisms: Attention mechanisms can be used to focus on
different aspects of the input data for different tasks. This allows the model to
selectively attend to relevant information for each task, enhancing its ability to
multitask effectively.

Applications of Deep Learning in Cybersecurity:-

1. Intrusion Detection and Prevention Systems (IDS/IPS)
These systems detect malicious network activities, prevent intruders from
accessing the systems, and alert the user. Typically, threats are recognized by known
signatures and generic attack patterns. This is useful against threats like data
breaches.
Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) can be
applied to create smarter IDS/IPS systems by analyzing the traffic with better
accuracy, reducing the number of false alerts.

2. Dealing with Malware


Traditional malware solutions such as regular firewalls detect malware by using a
signature-based detection system. Deep learning algorithms are capable of
detecting more advanced threats and are not reliant on remembering known
signatures and common attack patterns.

3. Spam and Social Engineering Detection


Natural Language Processing (NLP), a deep learning technique, can help you to
easily detect and deal with spam and other forms of social engineering. NLP learns
normal forms of communication and language patterns and uses various statistical
models to detect and block spam.

4. Network Traffic Analysis


Deep learning ANNs are showing promising results in analyzing HTTPS network
traffic to look for malicious activities. This is very useful to deal with many cyber
threats such as SQL injections and DOS attacks.

5. User Behavior Analytics


Tracking and analyzing user activities and behaviors is an important security
practice for any organization. It is much more challenging than recognizing
traditional malicious activities against the networks since it bypasses security
measures.

Neuron

A neuron is a fundamental unit of a neural network, which is inspired by the
structure and functioning of biological neurons in the human brain. Neurons in
deep learning are also referred to as nodes or artificial neurons. They play a crucial
role in processing information and making predictions within a neural network.

Basic overview of the key components of a neuron in deep learning:

1. Input: (Also called input signal, feature, data point, predictor)
Neurons receive input signals from the features or other neurons. Each input is
associated with a weight, which represents the strength of the connection between
the input and the neuron.
2. Weight: (Also called connection strength, synaptic weight, parameter,
coefficient)
Weights are parameters that the neural network learns during the training
process. They determine the impact of each input on the neuron's output. Adjusting
these weights is a crucial aspect of training the network to make accurate
predictions.
3. Summation Function: (Also called aggregation function, weighted sum,
summation, combination)
The neuron calculates the weighted sum of its inputs. This involves multiplying
each input by its corresponding weight and summing up these products.

4. Activation Function: (Also called transfer function, nonlinearity, squashing
function, activation)
The weighted sum is then passed through an activation function. The activation
function introduces non-linearity to the neuron, allowing the neural network to
model complex relationships in the data. Common activation functions include
sigmoid, hyperbolic tangent (tanh), and rectified linear unit (ReLU).
Output = Activation(Weighted Sum)
5. Output: (Also called result, prediction, neuron output, activation output,
response)
The output of the neuron is the result of the activation function applied to the
weighted sum. This output is then passed to other neurons in the subsequent layers
of the neural network.
In a deep neural network, neurons are organized into layers, including input layers,
hidden layers, and output layers.
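
Putting these components together, here is a minimal sketch of a single neuron in NumPy (the input values, weights, and bias are made-up numbers chosen only for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

inputs = np.array([0.5, -1.2, 3.0])     # 1. input signals (features)
weights = np.array([0.8, 0.1, -0.4])    # 2. connection strengths
bias = 0.2

weighted_sum = np.dot(inputs, weights) + bias   # 3. summation function
output = sigmoid(weighted_sum)                  # 4. activation function -> 5. output
print(output)
```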
Weight

Weight refers to a parameter associated with a connection between two neurons or
nodes in a neural network. Each connection between neurons has an associated
weight that influences the information flow between them. These weights play a
crucial role in the learning process of a neural network.

1. Connection Strength:
A weight represents the strength or intensity of the connection between two
neurons. It determines the impact of the input signal from one neuron on the output
signal of another.
2. Learnable Parameter:
In the training phase of a neural network, the weights are learnable parameters.
They are initialized randomly and then adjusted iteratively during the training
process to minimize the difference between the predicted output and the actual
target output.
3. Influence on Neuron Activation:
The weighted sum of inputs to a neuron, including the associated weights, is
computed as part of the neuron's activation. This weighted sum is then passed
through an activation function to determine the neuron's output.

4. Role in Learning:
During the training process, the neural network adjusts the weights to minimize the
error in its predictions. This is typically done using optimization algorithms like
gradient descent, where the weights are updated in the direction that reduces the
error.
5. Modeling Relationships:
The weights allow the neural network to model complex relationships and patterns
in the input data. By adjusting the weights, the network can learn to give more or
less importance to specific features, capturing the underlying structure of the data.
6. Bias Term:
In addition to weights, a neuron may have an associated bias term. The bias allows
the neuron to produce an output even when all inputs are zero, providing flexibility
in modeling.
Bias

Bias is an additional parameter associated with each neuron in a neural network.
While weights represent the strength of connections between neurons, biases
provide neurons with the flexibility to produce an output even when all inputs are
zero. The bias term allows the neural network to model more complex
relationships and capture patterns that might not be evident from the raw input data
alone.
Here are the key points about bias in deep learning:
1. Introduction of Offsets:
The bias term introduces an offset or constant value to the input of a neuron. This
is particularly useful when all the input values are zero, preventing the neuron from
being stuck at zero output.
2. Learnable Parameter:
Similar to weights, the bias is a learnable parameter that is adjusted during the
training process. The neural network learns the appropriate values for biases to
minimize the difference between its predictions and the actual target values.
3. Impact on Activation:
The bias term is added to the weighted sum of inputs before passing through the
activation function. Mathematically, this can be expressed as:
Output = Activation(Weighted Sum + Bias) = Activation(w1·x1 + w2·x2 + … + wn·xn + b)

4. Flexibility in Modeling:
Biases provide each neuron with a certain degree of independence from the input
data. This flexibility is important for the network to adapt and capture relationships
that may not be clear in the raw input features.
5. Role in Training:
During the training process, biases are adjusted along with weights to optimize the
network's performance on a specific task. Optimization algorithms, such as
gradient descent, are used to update both weights and biases iteratively.

What is an Activation Function?

An activation function is a function that determines a neuron's output from its
inputs, allowing the network to find patterns and relationships in sets of data.
Different activation functions are used depending on the desired impact and
performance of the neural network. Activation functions are applied at the nodes of
a network's three types of layers: input layers, hidden layers, and output layers.

Why is an Activation Function Important?

Activation functions are important because they can add linearity or non-linearity
to a neural network. Activation functions allow information to be represented in a
way that patterns and relationships in data can be extracted. Since not all data is
linear, activation functions allow users to find patterns in multidimensional
information. Because activation functions can handle multidimensional data, they
allow for the analysis of images, audio, and video.

What are the Main Types of Activation Functions?

Linear
Linear activation functions are represented as f(x) = x; they only deliver a
proportional range of activations and cannot capture complex data. This means
that complex patterns and information cannot be found using linear activation
functions. Linear functions are good for simple sets of data that can be easily
interpreted.
Binary Step
Binary step activation functions are able to handle slightly more complex data, but
cannot be used for problems with multi-step (multi-class) classifications.
Non-Linear
Non-linear functions are the most used and make it simple for a neural network to
separate information. There are several different kinds of non-linear functions that
are used depending on the results needed. The most common non-linear functions
are Sigmoid, Tanh, and ReLU.
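
Minimal NumPy definitions of the activation functions mentioned above (the threshold of 0 in the binary step is a common convention used here for illustration):

```python
import numpy as np

def linear(x):
    return x                          # f(x) = x

def binary_step(x):
    return np.where(x >= 0, 1, 0)     # activates only above the threshold

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))   # squashes values into (0, 1)

def tanh(x):
    return np.tanh(x)                 # squashes values into (-1, 1)

def relu(x):
    return np.maximum(0, x)           # zero for negative inputs, identity otherwise
```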

When to Use an Activation Function?

Binary step activation functions are used to determine whether a neuron should be
activated, which is decided by whether the input is greater than the threshold.
Linear activation functions are used when the activation is proportional to the
input. This is suitable for simple tasks with easy interpretability; for more complex
patterns one of the other kinds of activation functions will need to be used.
Non-linear functions are used when a complex set of data needs to be interpreted.
They can be used on multi-step classification problems, and information is
represented in a way that patterns and relationships can be determined.

Activation Function Terms

Neural Network: A neural network is a series of algorithms that function similarly
to a human brain to find patterns and relationships in sets of data.
Linearity (Linear): Information that follows the pattern of a straight line; it is not
complex.
Input Layer: Provides information from outside into the network; no computation
takes place in this layer.
Hidden Layer: All computation is performed in the hidden layers, and the results
are passed on to the output layer.
Output Layer: The information that was learned by the network is shared with
the outside through this layer.
Forward Propagation in Neural Networks

Forward propagation is where input data is fed through a network, in a forward
direction, to generate an output. The data is accepted by hidden layers and
processed, as per the activation function, and moves to the successive layer. The
forward flow of data is designed to avoid data moving in a circular motion, which
would not generate an output.
During forward propagation, pre-activation and activation take place at each
hidden and output layer node of a neural network. The pre-activation is the
calculation of the weighted sum (plus the bias). The activation function is then
applied to this weighted sum to make the neural network flow non-linearly.
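
A minimal sketch of forward propagation through one hidden layer and one output layer, assuming NumPy (layer sizes, weights, and activation choices are illustrative):

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.random((1, 4))                           # one input sample with 4 features

W1, b1 = rng.normal(size=(4, 5)), np.zeros(5)    # input -> hidden layer
W2, b2 = rng.normal(size=(5, 1)), np.zeros(1)    # hidden -> output layer

z1 = x @ W1 + b1        # pre-activation at the hidden layer (weighted sum plus bias)
a1 = relu(z1)           # activation at the hidden layer
z2 = a1 @ W2 + b2       # pre-activation at the output layer
output = sigmoid(z2)    # network output
print(output)
```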

How Does Forward Propagation Relate to Backpropagation?

In order to be trained, a neural network relies on both forward and backward
propagation.

Backpropagation is used in machine learning and data mining to improve
prediction accuracy through derivatives calculated by backward propagation.
Backward propagation is the process of moving from right (output layer) to left
(input layer). Forward propagation is the way data moves from left (input layer) to
right (output layer) in the neural network.

A neural network can be understood as a collection of connected input/output
nodes. The accuracy of a node is expressed as a loss function or error rate.
Backpropagation calculates the slope of the loss function with respect to the
weights in the neural network.

Backpropagation Algorithms in Neural Networks

Backpropagation is used in neural networks to improve output. A neural network is
a collection of connected input and output nodes. Each node's accuracy is
expressed as a loss function, which is also known as an error rate.

Backpropagation calculates the mathematical gradient, or slope, of the error rate
with respect to the other weights in the neural network. Based on the
calculations, neural network nodes with high error rates are given less weight than
nodes with lower error rates, which are given more weight. Weights determine
how much influence an input will have on an output.
Backpropagation trains a neural network by assigning random weights to the
connections and analyzing where the error in the system increases. When errors
occur, the difference between the model output and the actual output is calculated.
Once calculated, a different weight is assigned and the system is run again, to see if
the error is minimized. If the error is not minimized, then an update of parameters
is required.

To update parameters, weights and biases are adjusted. A bias can be thought of as
the weight on an additional input that is always assigned the value of 1. After the
parameters are updated, the process is run again. Once the error is at
a minimum, the model is ready to start predicting.

Types of Backpropagation

There are two types of backpropagation: static and recurrent.

Static Backpropagation

A static backpropagation network aims to produce a map of static inputs to fixed
outputs. This type of network can solve static classification problems such as
optical character recognition (OCR), which allows computers to understand written
documents.

Recurrent Backpropagation

Recurrent backpropagation is used in data mining to find a fixed value. Once the
fixed value is found, the error is computed and then run through a backpropagation
algorithm.

The difference between the two types of backpropagation is that static mapping is
immediate and recurrent backpropagation takes a longer time to map.

Why Use Backpropagation

Backpropagation increases the accuracy of predictions as it is able to calculate
derivatives quickly. Backpropagation algorithms are intended to develop learning
algorithms for multilayer feedforward neural networks. This algorithm is trained to
capture mapping, which in turn aids in data mining and machine learning.
Backpropagation increases efficiency by reducing the errors found in a network.
