Deep Learning
UNIT-I
SYLLABUS:
Introduction: Various paradigms of learning problems, Perspectives and Issues in deep
learning framework, review of fundamental learning techniques. Feed forward neural
network: Artificial Neural Network, activation function, multi-layer neural network
These paradigms provide different lenses through which to understand and study the
learning process, whether in the context of human cognition, education, or artificial
intelligence. Different paradigms may be more suitable for different types of learning
tasks or situations.
Perspectives:
1. Representation Learning:
• Perspective: Deep learning is often celebrated for its ability to
automatically learn hierarchical representations from data. This means that the
model can learn to extract meaningful features at different levels of
abstraction.
2. End-to-End Learning:
• Perspective: Deep learning enables end-to-end learning, where the
model learns to perform a task without explicit feature engineering. This can
simplify the development process and improve performance.
3. Scalability:
• Perspective: Deep learning models can scale with the amount of data
and computational resources available. This scalability is particularly
beneficial for handling large and complex datasets.
4. Transfer Learning:
• Perspective: Transfer learning is a powerful perspective in deep
learning, allowing models pre-trained on one task to be fine-tuned for another
task. This leverages knowledge gained from one domain to improve
performance in another.
5. Diversity of Architectures:
• Perspective: Deep learning encompasses a wide range of architectures,
including convolutional neural networks (CNNs) for image tasks, recurrent
neural networks (RNNs) for sequence tasks, and transformer architectures for
natural language processing. This diversity allows for specialized solutions
for different types of data and tasks.
1. Feedforward Neural Networks:
• The most basic type of neural network, with an input layer, one or more hidden layers, and an output layer. Feedforward neural networks serve as the foundation for more advanced architectures in deep learning and have applications in various domains, including image and speech recognition, natural language processing, and regression tasks (a minimal sketch follows after this list).
2. Recurrent Neural Networks (RNNs):
• Networks with connections that form cycles, allowing them to capture
sequential dependencies in data. Suitable for tasks involving time-series
data.
3. Convolutional Neural Networks (CNNs):
• Specialized networks designed for processing grid-like data, such as
images. They use convolutional layers to automatically learn hierarchical
features.
4. Radial Basis Function Networks (RBFNs):
• Networks that use radial basis functions as activation functions, often
employed in pattern recognition and approximation tasks.
5. Generative Adversarial Networks (GANs):
• A pair of networks, a generator and a discriminator, that are trained
simultaneously through adversarial training. GANs are used for generating new
data instances.
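For a concrete picture of the feedforward case above, here is a minimal NumPy sketch of a forward pass through a single hidden layer; the layer sizes, random weights, and sigmoid activation are illustrative assumptions, not taken from the notes.

```python
import numpy as np

def sigmoid(z):
    # Squashes each value into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative sizes: 3 inputs -> 4 hidden units -> 2 outputs.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # hidden-layer weights and biases
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)   # output-layer weights and biases

x = np.array([0.5, -1.2, 3.0])                  # one input example
h = sigmoid(W1 @ x + b1)                        # hidden-layer activations
y = sigmoid(W2 @ h + b2)                        # network output
print(y)
```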
4. Activation Functions:
• Activation functions introduce non-linearity into a neural network, allowing it to learn complex patterns; common choices include sigmoid, tanh, ReLU, and softmax.
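A minimal sketch (illustrative, not from the syllabus text) of these common activation functions in NumPy:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # output in (0, 1)

def tanh(z):
    return np.tanh(z)                 # output in (-1, 1)

def relu(z):
    return np.maximum(0.0, z)         # zero for negative inputs

def softmax(z):
    e = np.exp(z - np.max(z))         # subtract max for numerical stability
    return e / e.sum()                # outputs sum to 1

z = np.array([-1.0, 0.0, 2.0])
print(sigmoid(z), tanh(z), relu(z), softmax(z))
```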
UNIT-II
6. Validation:
• Evaluate on Validation Set: Periodically assess the model's performance on
the validation set to avoid overfitting.
3. Regularization Techniques:
• L1 and L2 Regularization: Add regularization terms to the loss function to
penalize large weights. Helps prevent overfitting.
• Dropout: Randomly deactivate neurons during training to improve
generalization.
4. Early Stopping:
• Monitor the model's performance on the validation set during training and stop when it no longer improves, keeping the best weights seen so far.
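A minimal self-contained sketch of the early-stopping rule; the validation losses and patience value below are made-up numbers for illustration:

```python
# Illustrative validation losses over epochs (made-up numbers).
val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.53, 0.51, 0.56, 0.58, 0.60]

best_loss, best_epoch, patience, bad_epochs = float("inf"), -1, 3, 0

for epoch, val_loss in enumerate(val_losses):
    if val_loss < best_loss:                 # improvement: remember this epoch's model
        best_loss, best_epoch, bad_epochs = val_loss, epoch, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:           # stop after 3 epochs with no improvement
            break

print(f"stop at epoch {epoch}, restore weights from epoch {best_epoch}")
```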
By applying these techniques, practitioners aim to strike a balance between fitting the
training data well and ensuring the neural network generalizes effectively to unseen
data. The selection and careful tuning of these components contribute to the success of
risk minimization during neural network training.
REGULARIZATION
Regularization is a technique used in machine learning to prevent overfitting and
improve the generalization of a model. Overfitting occurs when a model learns the
training data too well, including its noise and outliers, and performs poorly on new,
unseen data.
There are several types of regularization, and two common ones are L1 regularization
and L2 regularization:
1. L1 Regularization (Lasso): It adds the absolute values of the coefficients as
a penalty term to the objective function. It tends to produce sparse weight vectors,
encouraging some weights to become exactly zero, effectively performing feature
selection.
2. L2 Regularization (Ridge): It adds the squared values of the coefficients as
a penalty term to the objective function. It discourages large weights but does not
usually lead to sparse weight vectors.
Regularized Loss = Loss + λ × Regularization Term
Here, the Loss term represents the original loss function used for training the model.
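As a hedged illustration of how the penalty term is added in practice (the weight values, λ, and data loss below are made up for the example):

```python
import numpy as np

w = np.array([0.5, -2.0, 0.0, 3.0])    # illustrative model weights
lam = 0.01                              # regularization strength λ
data_loss = 0.42                        # placeholder value for the original loss

l1_penalty = lam * np.sum(np.abs(w))    # L1 (Lasso): encourages sparse weights
l2_penalty = lam * np.sum(w ** 2)       # L2 (Ridge): discourages large weights

loss_l1 = data_loss + l1_penalty        # Regularized Loss = Loss + λ × penalty
loss_l2 = data_loss + l2_penalty
print(loss_l1, loss_l2)
```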
CONDITIONAL RANDOM FIELDS (CRFs)
1. Features: Features are functions of the input and output variables that capture the dependencies in the data. The choice of features is crucial in CRFs, and they are used to define the energy function.
2. Training: CRFs are trained by maximizing the conditional likelihood of the
training data. Gradient-based optimization methods, such as stochastic gradient
descent, are often employed to estimate the parameters.
Conditional Random Fields have been successfully applied to various tasks, including
part-of-speech tagging, named entity recognition, image segmentation, and biological
sequence analysis. They provide a way to model dependencies between output
variables in a principled probabilistic framework.
LINEAR CHAIN:
A "linear chain" in the context of machine learning and graphical models typically
refers to a sequence of interconnected variables. This structure is commonly
encountered in tasks where there is a temporal or sequential relationship among the
variables. One of the most prevalent examples is in the field of natural language
processing (NLP), where words or tokens in a sentence follow a linear order.
When dealing with linear chains in the context of probabilistic graphical models, two
main types are often discussed: Hidden Markov Models (HMMs) and linear chain
Conditional Random Fields (CRFs).
1. Hidden Markov Models (HMMs): HMMs are a type of probabilistic model
that deals with sequences of observations. They assume that there is an underlying
sequence of hidden states, and each hidden state generates an observation. The
transitions between hidden states form a linear chain. HMMs are widely used for
tasks such as part-of-speech tagging, speech recognition, and bioinformatics.
2. Linear Chain Conditional Random Fields (CRFs): As mentioned earlier,
CRFs are probabilistic graphical models used for structured prediction tasks. Linear
chain CRFs specifically model dependencies in sequences. They are often applied to
problems where the output labels have a sequential or temporal order. For example,
in named entity recognition, the goal is to label each word in a sentence with its entity
type, and the dependencies between adjacent words form a linear chain.
The linear chain structure simplifies the modeling and inference processes, making it
computationally more feasible. Both HMMs and linear chain CRFs have been used in
various applications where sequential relationships need to be considered, and they
have proven effective in capturing dependencies within sequences of data.
PARTITION FUNCTION:
The partition function is a concept commonly used in statistical mechanics and
probability theory, particularly in the context of Gibbs distributions and Boltzmann
distributions. It plays a crucial role in determining the probabilities of different
states in a system.
1. Statistical Mechanics:
• In statistical mechanics, the partition function is associated with the
probability distribution of different microstates of a physical system. It is
denoted by Z and is defined as the sum (or integral) of the exponential of the
negative energy over all possible states of the system.
Mathematically, for a system with discrete energy levels, the partition function is
given by:
Z = ∑_(all states) e^(−βE)
• Here, E is the energy of a state, β is the inverse temperature, and the
sum is taken over all possible states of the system.
• The partition function is fundamental for calculating thermodynamic
properties such as free energy, entropy, and specific heat.
2. Probability Theory:
• In probability theory, particularly in the context of graphical models
like Markov Random Fields (MRFs) and Conditional Random Fields (CRFs),
the partition function is used to normalize the distribution over possible
configurations.
• In the case of CRFs, for example, the partition function ensures that
the probabilities assigned to all possible output sequences sum to 1. It helps in
defining a valid probability distribution over the output space.
• Mathematically, for a conditional distribution P(Y|X) in a CRF, the partition function is often denoted as Z(X); it sums the unnormalized scores over all possible output sequences Y, so that P(Y|X) = score(Y, X) / Z(X) is a valid distribution.
In both cases, the partition function serves to normalize the distribution and ensure that it represents a valid probability distribution. The specific form of the partition function depends on the context in which it is used, such as statistical mechanics or probabilistic graphical models.
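A small numerical sketch of the Boltzmann case above; the energy levels and temperature are invented values for illustration:

```python
import numpy as np

energies = np.array([0.0, 1.0, 2.0])     # illustrative energies of three states
beta = 1.0                                # inverse temperature β

weights = np.exp(-beta * energies)        # unnormalized Boltzmann weights e^(−βE)
Z = weights.sum()                         # partition function: sum over all states
probs = weights / Z                       # normalized probabilities

print(Z, probs, probs.sum())              # probs.sum() == 1.0
```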
MARKOV NETWORK
A Markov Network, also known as a Markov Random Field (MRF), is a type of
probabilistic graphical model that represents dependencies between variables in a
structured way. Markov Networks are commonly used in various fields, including
machine learning, computer vision, and statistical physics.
3. Cliques and Factorization:
• A clique is a subset of nodes such that there is an edge between every pair of nodes in the subset. The joint distribution is factorized as the product of factors defined over the cliques of the graph.
• Mathematically, for a set of variables X, the factor associated with a clique C is denoted as ϕ_C(X_C).
4. Potential Functions:
• The factors are often represented by potential functions, which assign a
non-negative value to each possible assignment of values to the variables in the
clique. The joint distribution is then proportional to the product of these
potential functions.
• For a set of variables X, the potential function associated with a clique C is often denoted as ϕ_C(X_C), and the joint distribution is given by the normalized product of these potential functions: P(X) = (1/Z) ∏_C ϕ_C(X_C), where Z is the partition function.
5. Global Markov Property:
• The global Markov property states that a variable is conditionally
independent of all other variables in the network given its neighbors. This is
a consequence of the local Markov property and the factorization of the joint
distribution.
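To make the clique factorization concrete, here is a tiny hypothetical example with two binary variables and a single pairwise potential; the potential values are invented for illustration:

```python
from itertools import product

# Illustrative pairwise potential φ(x1, x2) over two binary variables.
phi = {(0, 0): 3.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}   # favors agreement

Z = sum(phi[x] for x in product([0, 1], repeat=2))           # partition function
joint = {x: phi[x] / Z for x in phi}                         # P(x1, x2) = φ(x1, x2) / Z

print(Z, joint)   # the probabilities sum to 1
```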
BELIEF PROPAGATION
Belief Propagation (BP) is an algorithm used for making approximate inferences in
graphical models, such as Bayesian Networks and Markov Random Fields. It is
particularly useful for solving problems related to marginal probabilities and making
predictions in models with complex dependencies.
HIDDEN MARKOV MODELS (HMMs)
Here are the key components and concepts associated with Hidden Markov Models:
1. States:
• An HMM consists of a set of hidden states, which represent the
underlying, unobservable processes or conditions of the system. The system
is assumed to be in one of these states at any given time.
2. Transitions:
• Transitions between hidden states are governed by transition probabilities, which specify the likelihood of moving from one hidden state to another at each time step.
Hidden Markov Models are versatile and can be used for various applications, such as
speech recognition, part-of-speech tagging, bioinformatics (e.g., gene prediction), and
financial modeling. Their ability to model sequential data and handle uncertainty
makes them valuable in scenarios where the underlying processes are not directly
observable.
ENTROPY IN DEEP LEARNING
In the context of deep learning and neural networks, the concept of entropy is often
associated with two main areas: softmax layer and generative models.
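For instance, a hedged sketch (not from the notes) of the cross-entropy computed from a softmax output layer:

```python
import numpy as np

logits = np.array([2.0, 1.0, 0.1])        # illustrative raw scores from the output layer
target = np.array([1.0, 0.0, 0.0])        # one-hot label: the true class is the first one

exp = np.exp(logits - logits.max())       # softmax with numerical stability
probs = exp / exp.sum()

cross_entropy = -np.sum(target * np.log(probs))   # low when the true class gets high probability
print(probs, cross_entropy)
```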
UNIT-III
Deep learning:
Deep learning is a branch of machine learning which is based on artificial neural
networks. It is capable of learning complex patterns and relationships within data. In
deep learning, we don't need to explicitly program everything. It has become increasingly popular in recent years due to advances in processing power and the availability of large datasets. Deep learning is based on artificial neural networks (ANNs), also known as deep neural networks (DNNs). These neural networks are
inspired by the structure and function of the human brain’s biological neurons, and
they are designed to learn from large amounts of data.
1. Deep Learning is a subfield of Machine Learning that involves the use of
neural networks to model and solve complex problems. Neural networks are
modeled after the structure and function of the human brain and consist of layers
of interconnected nodes that process and transform data.
2. The key characteristic of Deep Learning is the use of deep neural networks,
which have multiple layers of interconnected nodes. These networks can learn
complex representations of data by discovering hierarchical patterns and features
in the data. Deep Learning algorithms can automatically learn and improve from
data without the need for manual feature engineering.
3. Deep Learning has achieved significant success in various fields, including
image recognition, natural language processing, speech recognition, and
recommendation systems. Some of the popular Deep Learning architectures
include Convolutional Neural Networks (CNNs), Recurrent Neural Networks
(RNNs), and Deep Belief Networks (DBNs).
4. Training deep neural networks typically requires a large amount of data and
computational resources. However, the availability of cloud computing and the
development of specialized hardware, such as Graphics Processing Units (GPUs),
has made it easier to train deep neural networks.
In summary, Deep Learning is a subfield of Machine Learning that involves the use
of deep neural networks to model and solve complex problems. Deep Learning has
achieved significant success in various fields, and its use is expected to continue to grow as more data and more powerful computing resources become available.
DROPOUT
Dropout refers to data, or noise, that is intentionally dropped from a neural network to improve processing and time to results.
A neural network is software attempting to emulate the actions of the human brain.
The human brain contains billions of neurons that fire electrical and chemical signals
to each other to coordinate thoughts and life functions. A neural network uses a
software equivalent of these neurons, called units. Each unit receives signals from
other units and then computes an output that it passes onto other neuron/units, or
nodes, in the network.
The challenge for software-based neural networks is they must find ways to reduce the
noise of billions of neuron nodes communicating with each other, so the networks'
processing capabilities aren't overrun. To do this, a network eliminates all
communications that are transmitted by its neuron nodes not directly related to the
problem or training that it's working on. The term for this neuron node elimination
is dropout.
Dropout layers
When data scientists apply dropout to a neural network, they consider the nature of
this random processing. They make decisions about which data noise to exclude and
then apply dropout to the different layers of a neural network as follows:
• Intermediate or hidden layers. These are the layers of processing after data ingestion. These layers are hidden because we can't exactly see what they do. The layers, which could be one or many, process data and then pass along intermediate -- but not final -- results that they send to other neurons for additional processing. Because much of this intermediate processing will end up as noise, data scientists use dropout to exclude some of it.
• Output layer. This is the final, visible processing output from all neuron
units. Dropout is not used on this layer.
[Figure omitted: the different layers of a neural network before and after dropout has been applied.]
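A minimal sketch of how dropout is commonly applied to a hidden layer during training (the so-called inverted-dropout form; the activations and drop rate are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
h = rng.normal(size=8)                 # illustrative hidden-layer activations
drop_rate = 0.5                        # probability of dropping each unit

mask = rng.random(h.shape) >= drop_rate          # randomly keep roughly half of the units
h_train = (h * mask) / (1.0 - drop_rate)         # rescale so the expected value is unchanged

# At inference time dropout is turned off and all units are used.
h_test = h
print(h_train, h_test)
```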
Examples and uses of dropout
Here's another real-world example that shows how dropout works: A biochemical
company wants to design a new molecular structure that will enable it to produce a
revolutionary form of plastic. The company already knows the individual elements
that will comprise the molecule. What it doesn't know is the correct formulation of
these elements.
CNN architecture
Convolutional Neural Network consists of multiple layers like the input layer,
Convolutional layer, Pooling layer, and fully connected layers.
The Convolutional layer applies filters to the input image to extract features, the
Pooling layer downsamples the image to reduce computation, and the fully
connected layer makes the final prediction. The network learns the optimal filters
through backpropagation and gradient descent.
How do convolutional neural networks work?
Convolutional neural networks are distinguished from other neural networks by their
superior performance with image, speech, or audio signal inputs. They have three
main types of layers, which are:
• Convolutional layer
• Pooling layer
• Fully-connected (FC) layer
The convolutional layer is the first layer of a convolutional network. While
convolutional layers can be followed by additional convolutional layers or pooling
layers, the fully-connected layer is the final layer. With each layer, the CNN increases
in its complexity, identifying greater portions of the image. Earlier layers focus on
simple features, such as colors and edges. As the image data progresses through the
layers of the CNN, it starts to recognize larger elements or shapes of the object until it
finally identifies the intended object.
Convolutional layer
The convolutional layer is the core building block of a CNN, and it is where the
majority of computation occurs. It requires a few components, which are input data, a
filter, and a feature map. Let’s assume that the input will be a color image, which is
made up of a matrix of pixels in 3D. This means that the input will have three
dimensions—a height, width, and depth—which correspond to RGB in an image. We
also have a feature detector, also known as a kernel or a filter, which will move across
the receptive fields of the image, checking if the feature is present. This process is
known as a convolution.
Note that the weights in the feature detector remain fixed as it moves across the
image, which is also known as parameter sharing. Some parameters, like the weight
values, adjust during training through the process of backpropagation and gradient
descent. However, there are three hyperparameters which affect the volume size of
the output that need to be set before the training of the neural network begins. These
include:
1. The number of filters affects the depth of the output. For example, three
distinct filters would yield three different feature maps, creating a depth of three.
2. Stride is the distance, or number of pixels, that the kernel moves over the input matrix. While stride values of two or greater are rare, a larger stride yields a smaller output.
3. Zero-padding is usually used when the filters do not fit the input image. This sets all elements that fall outside of the input matrix to zero, producing a larger or equally sized output. Common types of padding include valid padding (no padding), same padding (padding so that the output is the same size as the input), and full padding (extra padding that increases the size of the output).
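To illustrate how a filter slides over the input, here is a hedged sketch of a single-channel convolution; the 4x4 input, 2x2 filter, stride of 1, and absence of padding are made-up choices:

```python
import numpy as np

image = np.arange(16, dtype=float).reshape(4, 4)   # illustrative 4x4 single-channel input
kernel = np.array([[1.0, 0.0],
                   [0.0, -1.0]])                   # illustrative 2x2 feature detector
stride = 1

out_size = (image.shape[0] - kernel.shape[0]) // stride + 1   # 3x3 feature map here
feature_map = np.zeros((out_size, out_size))

for i in range(out_size):
    for j in range(out_size):
        patch = image[i*stride:i*stride+2, j*stride:j*stride+2]   # receptive field
        feature_map[i, j] = np.sum(patch * kernel)                # dot product with the filter

print(feature_map)
```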
Pooling layer
• Max pooling: As the filter moves across the input, it selects the pixel with the
maximum value to send to the output array. As an aside, this approach tends to
be used more often compared to average pooling.
• Average pooling: As the filter moves across the input, it calculates the
average value within the receptive field to send to the output array.
While a lot of information is lost in the pooling layer, it also has a number of benefits
to the CNN. They help to reduce complexity, improve efficiency, and limit risk of
overfitting.
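A small NumPy sketch of 2x2 max pooling and average pooling applied to a hypothetical 4x4 feature map:

```python
import numpy as np

fmap = np.arange(16, dtype=float).reshape(4, 4)   # illustrative 4x4 feature map

# Reshape into a grid of 2x2 windows, then reduce each window to a single value.
blocks = fmap.reshape(2, 2, 2, 2).swapaxes(1, 2)  # shape (2, 2, 2, 2)
max_pooled = blocks.max(axis=(2, 3))              # keeps the largest value in each window
avg_pooled = blocks.mean(axis=(2, 3))             # keeps the average of each window

print(max_pooled)   # 2x2 output
print(avg_pooled)
```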
Fully-connected layer
The name of the fully-connected layer aptly describes itself. As mentioned earlier, the
pixel values of the input image are not directly connected to the output layer in
partially connected layers. However, in the fully-connected layer, each node in the
output layer connects directly to a node in the previous layer.
This layer performs the task of classification based on the features extracted through
the previous layers and their different filters. While convolutional and pooling
layers tend to use ReLu functions, FC layers usually leverage a softmax activation
function to classify inputs appropriately, producing a probability from 0 to 1.
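Putting the three layer types together, here is a hedged sketch of a tiny CNN in PyTorch; the layer sizes, 32x32 RGB input, and 10 output classes are illustrative assumptions, and the notes do not prescribe a particular framework:

```python
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)   # convolutional layer
        self.pool = nn.MaxPool2d(2)                             # 2x2 max pooling
        self.fc = nn.Linear(8 * 16 * 16, num_classes)           # fully-connected layer

    def forward(self, x):
        x = torch.relu(self.conv(x))          # ReLU after convolution
        x = self.pool(x)                      # downsample 32x32 -> 16x16
        x = x.flatten(1)                      # flatten for the FC layer
        return torch.softmax(self.fc(x), 1)   # class probabilities between 0 and 1

out = TinyCNN()(torch.randn(1, 3, 32, 32))    # one fake 32x32 RGB image
print(out.shape, out.sum())                   # probabilities sum to 1 per example
```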
A recurrent neural network (RNN) is a type of artificial neural network which uses
sequential data or time series data. These deep learning algorithms are commonly used
for ordinal or temporal problems, such as language translation, natural language
processing (nlp), speech recognition, and image captioning; they are incorporated into
popular applications such as Siri, voice search, and Google Translate. Like
feedforward and convolutional neural networks (CNNs), recurrent neural networks
utilize training data to learn. They are distinguished by their “memory” as they take
information from prior inputs to influence the current input and output. While
traditional deep neural networks assume that inputs and outputs are independent of
each other, the output of recurrent neural networks depends on the prior elements within the sequence.
The Recurrent Neural Network consists of multiple fixed activation function units,
one for each time step. Each unit has an internal state which is called the hidden
state of the unit. This hidden state signifies the past knowledge that the network
currently holds at a given time step. This hidden state is updated at every time step to
signify the change in the knowledge of the network about the past. The hidden state
is updated using the following recurrence relation:-
The formula for calculating the current state:
h_t = f(h_(t-1), x_t)
where:
h_t -> current state
h_(t-1) -> previous state
x_t -> input state
Formula for applying the activation function (tanh):
h_t = tanh(W_hh · h_(t-1) + W_xh · x_t)
where:
W_hh -> weight at recurrent neuron
W_xh -> weight at input neuron
The formula for calculating output:
y_t = W_hy · h_t
where:
y_t -> output
W_hy -> weight at output layer
These parameters are updated using Backpropagation. However, since RNN works
on sequential data here we use an updated backpropagation which is known as
Backpropagation through time.
5. Once all the time steps are completed the final current state is used to
calculate the output.
6. The output is then compared to the actual output i.e the target output and the
error is generated.
7. The error is then back-propagated to the network to update the weights and
hence the network (RNN) is trained using Backpropagation through time.
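A minimal NumPy sketch of the recurrence h_t = tanh(W_hh · h_(t-1) + W_xh · x_t) unrolled over a short made-up sequence; the sizes and random weights are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
W_xh = rng.normal(size=(4, 3)) * 0.1   # input -> hidden weights
W_hh = rng.normal(size=(4, 4)) * 0.1   # hidden -> hidden (recurrent) weights
W_hy = rng.normal(size=(2, 4)) * 0.1   # hidden -> output weights

xs = [rng.normal(size=3) for _ in range(5)]   # a toy sequence of 5 time steps
h = np.zeros(4)                               # initial hidden state

for x_t in xs:
    h = np.tanh(W_hh @ h + W_xh @ x_t)        # h_t = tanh(W_hh·h_(t-1) + W_xh·x_t)

y = W_hy @ h                                   # output from the final hidden state
print(y)
```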
Types Of RNN
There are four types of RNNs based on the number of inputs and outputs in the
network.
1. One to One
2. One to Many
3. Many to One
4. Many to Many
One to One
This type of RNN behaves the same as a simple neural network; it is also known as a Vanilla Neural Network. In this neural network, there is only one input and one output.
One To Many
In this type of RNN, there is one input and many outputs associated with it. One of
the most used examples of this network is Image captioning where given an image
we predict a sentence having Multiple words.
Many to One
In this type of network, many inputs are fed to the network at several states of the network, generating only one output. This type of network is used in problems like sentiment analysis, where we give multiple words as input and predict only the sentiment of the sentence as output.
Many to Many
In this type of neural network, there are multiple inputs and multiple outputs
corresponding to a problem. One Example of this Problem will be language
translation. In language translation, we provide multiple words from one language as
input and predict multiple words from the second language as output.
UNIT-IV
PROBABILISTIC NEURAL NETWORK (PNN)
The following are the major types of difficulties that researchers have attempted to address with PNN:
the hidden neuron’s category. The values for the class that the pattern neurons
represent are added together.
Decision Layer
The output layer compares the weighted votes accumulated in the pattern layer for
each target category and utilizes the largest vote to predict the target category.
Advantages
• PNNs are much faster to train than multilayer perceptron networks and are relatively insensitive to outliers.
Disadvantages
• When it comes to classifying new cases, PNNs are slower than multilayer
perceptron networks.
• PNN requires extra memory to store the model.