
DEEP LEARNING

Deep learning is the branch of machine learning that is based on artificial neural network architectures. An artificial neural network (ANN) uses layers of interconnected nodes, called neurons, that work together to process and learn from the input data.
Use Cases:
1. Computer Vision
   Examples: object detection, image classification, image segmentation.

2. NLP
   Examples: automatic text generation, speech recognition, sentiment analysis.

3. Reinforcement Learning
   Examples: game playing, robotics, etc.
What is Machine Learning?
Machine learning is a part of artificial intelligence and a growing technology that enables machines to learn from past data and perform a given task automatically.

Machine learning allows computers to learn from experience on their own, using statistical methods to improve performance and predict the output without being explicitly programmed.

Some useful ML algorithms are:


1. Decision Tree algorithm
2. Naïve Bayes
3. Random Forest
4. K-means clustering
5. KNN algorithm
How does Machine Learning work?

The working of machine learning models can be understood through the example of identifying an image as a cat or a dog. To do this, the ML model takes images of both cats and dogs as input, extracts different features of the images such as shape, height, nose, and eyes, applies a classification algorithm, and predicts the output.
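
Below is a minimal sketch of this classical workflow, assuming scikit-learn and NumPy are available; extract_features() is a hypothetical stand-in for the manual feature-extraction step.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def extract_features(image):
    # Hypothetical hand-crafted features: mean and spread of pixel values
    return [image.mean(), image.std()]

rng = np.random.default_rng(0)
images = rng.random((20, 8, 8))        # 20 toy 8x8 grayscale "images"
labels = rng.integers(0, 2, size=20)   # 0 = cat, 1 = dog (dummy labels)

X = [extract_features(img) for img in images]    # manual feature extraction
model = DecisionTreeClassifier().fit(X, labels)  # classification algorithm
print(model.predict([extract_features(images[0])]))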
How does Deep Learning work?

We can understand the working of deep learning with the same example of identifying cats vs. dogs. The deep learning model takes the images as input and feeds them directly to the algorithm, without requiring any manual feature extraction step. The images pass through the different layers of the artificial neural network, which predicts the final output.
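
For contrast, a minimal sketch of the deep learning version, assuming TensorFlow/Keras is installed: the raw pixels go straight into the network, with no manual feature extraction step.

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

images = np.random.random((20, 8, 8))      # raw toy images
labels = np.random.randint(0, 2, size=20)  # 0 = cat, 1 = dog (dummy labels)

model = keras.Sequential([
    layers.Flatten(input_shape=(8, 8)),     # raw pixels in
    layers.Dense(16, activation="relu"),    # hidden layer learns features
    layers.Dense(2, activation="softmax"),  # cat vs. dog probabilities
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.fit(images, labels, epochs=2, verbose=0)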
Differences: Scalars, Vectors, Matrices, and Tensors
Scalars:
A scalar is just a single number or value. It’s the simplest form of data, with no dimensions (0D).
Example: Scalars can represent parameters like a bias in a neural network or a single activation value.
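
In the same notation as the examples below:

scalar = 5 # A scalar: a single value (0D), e.g., a bias term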

Vectors:
A vector is a one-dimensional array of numbers. It has only one axis (1D).
Example: A vector can represent the weights of a single layer in a neural network, or a set of features from a single input (e.g., a list of pixel values from a grayscale image).

vector = [3, 7, 5] # A vector with 3 elements


Matrices:
A matrix is a two-dimensional array of numbers. It has two axes (2D) – rows and
columns.
Example: A matrix can represent the weights between two layers of a neural network or an image’s pixel grid in image processing tasks.

matrix = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]] # A 3x3 matrix
Tensors:
A tensor is a generalized concept of arrays that can have more than two dimensions. It is a multi-dimensional array of numbers.
Example: In deep learning, tensors are the primary data structure. For example, a batch of images can be represented as a 4D tensor (batch size x height x width x channels).

# A 3D tensor representing 2 matrices (2 layers, each with 2x3 dimensions)
tensor = [[[1, 2, 3], [4, 5, 6]],
          [[7, 8, 9], [10, 11, 12]]]
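
A sketch using NumPy (an assumption; any array library works) of the 4D batch tensor described above:

import numpy as np

batch = np.zeros((32, 28, 28, 3))  # batch size x height x width x channels
print(batch.ndim)                  # 4 axes -> a 4D tensor
print(batch.shape)                 # (32, 28, 28, 3)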
Gradient-Based Optimization:
• Gradient descent is the most commonly used optimization algorithm in ML models. It minimizes the loss function, i.e., the error between the predicted output and the actual target value.
• It helps to find the local maximum and local minimum of a function.

How does Gradient Descent work?

Before examining the working principle of gradient descent, we should know some basic concepts for finding the slope of a line from linear regression. The equation for simple linear regression is given as:

Y = mX + c

where 'm' represents the slope of the line, and 'c' represents the intercept on the y-axis. A minimal implementation is sketched after the list below.
• The starting point is just an arbitrary point used to evaluate performance. At this starting point, we take the first derivative (slope) and use a tangent line to calculate the steepness of that slope. This slope informs the updates to the parameters (weights and bias).

• The slope is steepest at the starting (arbitrary) point; as new parameters are generated, the steepness gradually reduces until it approaches the lowest point, which is called the point of convergence.

• If we move in the direction of the negative gradient of the function at the current point, we approach the local minimum of that function.

• If we move in the direction of the positive gradient of the function at the current point, we approach the local maximum of that function.
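
A minimal gradient descent sketch for the line Y = mX + c, using NumPy (an assumption). Each step moves m and c along the negative gradient of the mean squared error until the updates converge.

import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0])
Y = 2.0 * X + 1.0                    # data generated from m=2, c=1

m, c = 0.0, 0.0                      # arbitrary starting point
learning_rate = 0.05

for _ in range(1000):
    error = (m * X + c) - Y
    grad_m = 2 * np.mean(error * X)  # dLoss/dm
    grad_c = 2 * np.mean(error)      # dLoss/dc
    m -= learning_rate * grad_m      # step toward the negative gradient
    c -= learning_rate * grad_c

print(m, c)                          # approaches m=2, c=1 (convergence)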
Types:
1. Batch Gradient Descent:
In batch gradient descent, the entire training dataset is used to compute the gradient and update the model parameters (such as weights and bias) at each iteration.
2. Stochastic Gradient Descent (SGD):
In SGD, only one training example is used to compute the gradient and update the parameters at each iteration. This can be faster than batch gradient descent but may lead to more noise in the updates.
3. Mini-batch Gradient Descent:
In mini-batch gradient descent, a small batch of training examples is used to compute the gradient and update the parameters at each iteration. This is a good compromise between batch gradient descent and SGD: it can be faster than batch gradient descent and less noisy than SGD. A sketch of all three variants follows below.
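
A sketch (NumPy assumed) showing that the three variants differ only in how many examples feed each gradient step: batch_size = len(X) gives batch gradient descent, batch_size = 1 gives SGD, and anything in between is mini-batch.

import numpy as np

def gradient_step(m, c, xb, yb, lr=0.05):
    error = (m * xb + c) - yb
    return m - lr * 2 * np.mean(error * xb), c - lr * 2 * np.mean(error)

X = np.array([1.0, 2.0, 3.0, 4.0])
Y = 2.0 * X + 1.0
m, c = 0.0, 0.0
batch_size = 2                       # try 4 (batch) or 1 (SGD)

for _ in range(500):
    idx = np.random.permutation(len(X))        # shuffle each epoch
    for start in range(0, len(X), batch_size):
        b = idx[start:start + batch_size]
        m, c = gradient_step(m, c, X[b], Y[b])

print(m, c)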
Underfitting:
A statistical model or a machine learning algorithm is said to underfit when the model is too simple to capture the complexities of the data.

Reasons:
1. The model is too simple, so it may not be capable of representing the complexities in the data.
2. The size of the training data is not enough.

Techniques to Reduce Underfitting:
1. Increase model complexity.
2. Increase the amount of training data.
3. Increase the number of epochs.
Overfitting:
A statistical model is said to be overfitted when it does not make accurate predictions on testing data. When a model is trained with so much data, it starts learning from the noise and inaccurate data entries in the dataset.

Reasons:
1. High variance and low bias.
2. The model is too complex.
3. The training data contains noise or inaccurate entries.

Techniques to Reduce Overfitting:
1. Reduce model complexity.
2. Apply ridge (L2) or lasso (L1) regularization.
3. Improve the quality of the training data.
Hyperparameters:
These are settings or configurations that you can adjust before training a model. They control how the learning process works but are not learned from the data itself.

Think of them like the dials and switches on a machine (a minimal sketch follows the list):

• Learning rate: how fast the model learns from the data.
• Batch size: how much data the model processes at once during training.
• Number of epochs: how many times the model sees the full dataset during training.
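
A minimal sketch, assuming TensorFlow/Keras: the hyperparameters below are set before training begins, while the weights are what training itself learns.

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

learning_rate = 0.01  # how fast the model learns
batch_size = 8        # how much data per update step
epochs = 5            # how many passes over the dataset

X = np.random.random((64, 4))
y = np.random.randint(0, 2, size=64)

model = keras.Sequential([layers.Dense(8, activation="relu"),
                          layers.Dense(1, activation="sigmoid")])
model.compile(optimizer=keras.optimizers.SGD(learning_rate=learning_rate),
              loss="binary_crossentropy")
model.fit(X, y, batch_size=batch_size, epochs=epochs, verbose=0)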

Validation Set:
A validation set is a portion of the data used to evaluate a model while it is being trained. The data are typically split into training and testing sets, with part of the training data held out for validation.
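
A sketch of one common split, assuming scikit-learn: two successive calls to train_test_split carve out a validation set alongside the test set.

import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(50, 2)
y = np.arange(50)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train,
                                                  test_size=0.25)
# Result: 60% training, 20% validation, 20% testing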
Bias:
> Bias is the difference between the actual value and the value predicted by the model.
> High bias leads to underfitting.
> It gives a larger error on training as well as testing data.
Variance:
> Variance measures the spread in the data from its mean position.
> A model with high variance is very complex and fits the training data too closely.
> High variance leads to overfitting of the data.
Deep Feed Forward Neural Network:
> A Deep Feedforward Neural Network is a type of artificial neural network
where connections between the nodes do not form cycles.
> The network consists of an input layer, one or more hidden layers, and an output
layer.
> Information flows in one direction—from input to output—hence the name
“feedforward.”
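
A minimal NumPy sketch (an assumption) of one forward pass: information flows input -> hidden -> output, with no cycles.

import numpy as np

def relu(z):
    return np.maximum(0, z)

x = np.array([0.5, -1.2, 3.0])                 # input layer (3 features)

W1 = np.random.randn(3, 4); b1 = np.zeros(4)   # input -> hidden weights
W2 = np.random.randn(4, 2); b2 = np.zeros(2)   # hidden -> output weights

hidden = relu(x @ W1 + b1)                     # hidden layer activations
output = hidden @ W2 + b2                      # output layer (2 values)
print(output)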
Types Of Neural Networks:
> Deep Learning models are able to automatically learn features from the data.
> The most widely used architectures in deep learning are,

1.Feedforward neural networks (FNNs) are the simplest type of ANN, with a linear flow
of information through the network. FNNs have been widely used for tasks such as image
classification, speech recognition, and natural language processing.

2. Convolutional Neural Networks (CNNs) are designed specifically for image and video recognition tasks. CNNs are able to automatically learn features from the images.

3. Recurrent Neural Networks (RNNs) are a type of neural network that is able to process sequential data, such as time series and natural language. RNNs are able to maintain an internal state that captures information about the previous inputs. (Minimal sketches of all three architectures follow below.)
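
Minimal sketches of the three architectures, assuming TensorFlow/Keras; the layer sizes are arbitrary.

from tensorflow import keras
from tensorflow.keras import layers

# 1. Feedforward network: dense layers, linear flow of information
fnn = keras.Sequential([
    layers.Dense(64, activation="relu", input_shape=(784,)),
    layers.Dense(10, activation="softmax"),
])

# 2. Convolutional network: learns spatial features from images
cnn = keras.Sequential([
    layers.Conv2D(16, 3, activation="relu", input_shape=(28, 28, 1)),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),
])

# 3. Recurrent network: keeps an internal state across a sequence
rnn = keras.Sequential([
    layers.SimpleRNN(32, input_shape=(None, 8)),  # (timesteps, features)
    layers.Dense(1),
])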
Regularization:
> Regularization is a technique used to prevent overfitting in machine
learning models.
> Regularization adds a penalty (or regularization term) to the loss function
(the function that measures the error of the model). This discourages the model from
learning overly complex patterns.
> Common types of regularization:

•L1 Regularization (Lasso): Adds the sum of the absolute values of the weights as a
penalty. It encourages some weights to be zero, making the model sparse.

•L2 Regularization (Ridge): Adds the sum of the squared values of the weights as a penalty. It discourages large weights, keeping the weights small. (A sketch of both penalties follows below.)
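
A sketch (NumPy assumed) of how each penalty is added to an ordinary loss; lam controls the penalty strength.

import numpy as np

def loss_with_penalty(pred, target, weights, lam=0.01, kind="l2"):
    data_loss = np.mean((pred - target) ** 2)    # ordinary MSE
    if kind == "l1":
        penalty = lam * np.sum(np.abs(weights))  # Lasso: pushes weights to 0
    else:
        penalty = lam * np.sum(weights ** 2)     # Ridge: keeps weights small
    return data_loss + penalty

w = np.array([0.5, -2.0, 0.0])
print(loss_with_penalty(np.array([1.0, 2.0]), np.array([1.5, 1.0]), w))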
OPTIMIZATION:
Optimization is the process of minimizing the loss function of a model. In machine
learning, the goal of optimization is to find the best model parameters (weights) that reduce the
error between the model’s predictions and the actual data.

•Goal: To find the best parameters for the model that minimize error or loss.
•How it works: Optimization algorithms adjust the model’s parameters step by step to
minimize the loss function.

Common optimization techniques:


•Gradient Descent: Moves in the direction of the negative gradient (i.e., the slope) of the loss
function to reduce the error.

•Stochastic Gradient Descent (SGD): A variant of gradient descent where updates are made using one data point at a time, which is faster per update and can handle large datasets.
