
deep learning


An Internship Report On

Deep Learning / NLP / Artificial Intelligence


Short -Term Internship

Submitted for partial fulfilment of the requirements for the award of degree of

Bachelor of Technology

In

COMPUTER SCIENCE AND ENGINEERING


by
A.Santhosh - 22KP1A0504

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

NRI INSTITUTE OF TECHNOLOGY

(APPROVED BY AICTE & AFFILIATED TO JNTU-KAKINADA)

Visadala (p), Medikonduru (M), GUNTUR- 522438 ,0863-2344300


DECLARATION

This is to certify that the internship course entitled “Deep Learning / NLP /
Artificial Intelligence” was done by Santhosh Appari (22KP1A0504) at the
Department of Computer Science and Engineering, NRI INSTITUTE OF TECHNOLOGY,
affiliated to Jawaharlal Nehru Technological University Kakinada, and that the
results embodied in this report have not been submitted to any other University
for the same purpose.

Date:

Place: Guntur A.Santhosh - 22KP1A0504

Signature of the Candidate:


CERTIFICATE

This certificate attests that the following report accurately represents the work completed by A. SANTHOSH,
Registration Number 22KP1A0504, during the academic year 2023-2024, covering the time period from December
2023 to February 2024, as part of the CYBERSECURITY VIRTUAL INTERNSHIP PROGRAMME.

Signature of the HOD


K.NAGESWARAO
(Prof. Department Of CSE )

ABSTRACT

Deep learning, a subset of artificial intelligence (AI), has revolutionized various fields by
enabling machines to learn from large amounts of data and improve their performance over
time without explicit programming. At its core, deep learning utilizes neural networks with
multiple layers, known as deep neural networks, to model complex patterns and
representations.

These models have achieved groundbreaking results in areas such as computer vision, natural
language processing, speech recognition, and autonomous systems. The availability of big
data, advancements in computational power, and innovative algorithms have accelerated
deep learning's impact across industries, from healthcare to finance and entertainment.

Despite its successes, challenges remain in terms of interpretability, ethical concerns, and the
need for vast computational resources. As deep learning continues to evolve, it holds the
potential to unlock further advancements in AI, leading to smarter, more autonomous systems
capable of tackling increasingly complex real-world problems.

Deep learning, a key subset of artificial intelligence (AI), has made significant strides in natural
language processing (NLP), revolutionizing how machines understand, generate, and interact
with human language. By leveraging deep neural networks, particularly architectures like
transformers, deep learning models can learn complex patterns from vast amounts of textual
data, enabling advancements in machine translation, sentiment analysis, question answering,
and text generation.


LETTER OF UNDERTAKING
To
The Principal
NRI Institute of Technology
Visadala,
Guntur.
Subject: Submission of Internship Report on the Deep Learning/NLP/Artificial
Intelligence Internship on the ExcelR Edtech platform.
Dear Sir,
I am pleased to submit my internship report on the “Deep Learning/NLP/Artificial
Intelligence Internship” as per your instruction, to fulfil the requirements of the Degree of
Bachelor of Technology in CSE from Jawaharlal Nehru Technological University, Kakinada.
While preparing this report, I have tried my level best to include all the relevant information,
explanations, things I learned from the internship courses, and my contribution to this programme,
to make the report informative and comprehensive. It would not have been possible to complete
this report without your assistance, for which I am very thankful. Working online for six weeks on
the Deep Learning/NLP/Artificial Intelligence Internship was amazing and a huge learning
opportunity for me. It was also a great experience to prepare this report, and I will be available for
any clarification, if required. Therefore, I hope that you will be kind enough to accept my
Internship Report and oblige thereby.

Yours Obediently,
A.SANTHOSH

ID:22KP1A0504
EMAIL: santhoshappari05@gmail.com

ACKNOWLEDGEMENT

We take this opportunity to express our deepest gratitude and appreciation
to all those people who made this Internship work easier with words of
encouragement, motivation, discipline, and faith by offering different places
to look to expand my ideas and help me towards the successful completion
of this Internship work.

First and foremost, we express our deep gratitude to Mr. Rajendra Pradesh,
Chairman, NRI Institute of Technology for providing necessary facilities
throughout the Computer Science & Engineering program.

We express our sincere thanks to Dr. K.Koteswarao, Principal, NRI Institute
of Technology for his constant support and cooperation throughout the
Computer Science & Engineering program.

We express our sincere gratitude to Mr. K.Nageswarao, Professor & HOD,
Computer Science and Engineering, NRI Institute of Technology for his
constant encouragement, motivation and faith by offering different places to
look to expand my ideas.

We would like to take this opportunity to express our thanks to the teaching
and non- teaching staff in the Department of Computer Science &
Engineering, NRIIT for their invaluable help and support.

A.SANTHOSH-22KP1A0504

Table of Contents:
Deep Learning / NLP / Artificial Intelligence

Module    Content                               Dates
Module 1  Fundamentals of Deep Learning         10-6-24 to 19-6-24
          1. Artificial Intelligence
          2. Machine Learning
          3. Overfitting & Underfitting
Module 2  Introduction of Deep Learning         20-6-24 to 29-6-24
          1. Neural Network
          2. Introduction to Keras
          3. Generative Adversarial Networks
Module 3  Types of Neural Network               1-7-24 to 10-7-24
          1. Convolutional Neural Network
          2. Recurrent Neural Network
          3. Artificial Neural Network
Module 4  Applications of Deep Learning         11-7-24 to 20-7-24
          1. Natural Language Processing
          2. Deep Reinforcement Learning
          3. Convolutional Layers

Fundamentals of DeepLearning

Artificial Intelligence

➢ The question of whether computers might be taught to "think" was first posed
in the 1950s by a small group of pioneers in the developing discipline of
computer science. The implications of this question are still being researched
today.
➢ A succinct definition of the field is the endeavour to automate intellectual
processes typically carried out by humans. As a result, AI is a broad area that
comprises machine learning and deep learning as well as methods that involve
no learning at all. For instance, early chess programmes used only hardcoded
rules created by programmers and were not machine learning applications.
➢ For many years, experts thought that AI could be achieved by having
programmers write a large number of explicit rules for computers to follow.
This approach is called symbolic AI, and it was the main way of doing AI from
the 1950s to the late 1980s.
➢ Symbolic AI reached its peak popularity in the 1980s, when expert systems
were very popular.

1.1. Artificial intelligence, machine learning, and deep learning
Artificial intelligence (AI), machine learning (ML), and deep learning (DL) are
terms that are often used interchangeably, but they actually have different
meanings.
• Artificial intelligence is a broad term that refers to the ability of machines to
perform tasks that are typically associated with human intelligence, such as
learning, reasoning, and problem-solving.
• Machine learning is a subset of AI that involves the development of
algorithms that can learn from data without being explicitly programmed.
Machine learning algorithms are trained on large datasets, and they can then
be used to make predictions or decisions about new data.
• Deep learning is a subset of machine learning that uses artificial neural
networks to learn from data. Neural networks are inspired by the human brain,
and they can be used to solve complex problems that would be difficult or
impossible to solve with traditional machine learning algorithms.
In other words, AI is the umbrella term, ML is a subset of AI, and DL is a subset
of ML.
Here are some examples of how AI, ML, and DL are being used today:
• AI is being used to develop self-driving cars, facial recognition software, and spam
filters.
• ML is being used to predict customer behaviour, optimize product
recommendations, and personalize marketing campaigns.
• DL is being used to develop natural language processing (NLP) models, image
recognition algorithms, and medical diagnosis tools.
AI, ML, and DL are all rapidly growing fields, and they are having a major impact on our
lives. As these technologies continue to develop, we can expect to see even more innovative
and groundbreaking applications in the years to come.
Machine Learning:
Machine learning is a type of artificial intelligence (AI) that allows software
applications to become more accurate in predicting outcomes without being explicitly
programmed to do so. Machine learning algorithms use historical data as input to predict
new output values. In other words, machine learning algorithms can learn from data and
improve their performance over time. This is in contrast to traditional programming, where
software applications are explicitly programmed to perform specific tasks.

Neural networks are a type of mathematical model that is inspired by the human
brain. They are made up of layers of interconnected nodes, and they can learn to represent
complex patterns in data. In deep learning, neural networks are used to learn successive
layers of increasingly meaningful representations. This is done by feeding the network a
large amount of data, and then adjusting the weights of the connections between the nodes
until the network is able to correctly classify or predict the data.
The term "neural network" is a reference to neurobiology, but neural networks are
not models of the brain. The brain is a complex organ, and we do not yet fully understand
how it works. Neural networks are much simpler models, and they are not designed to be a
complete representation of the brain.

There are many different types of loss functions. Some common loss functions include:
• Mean squared error (MSE): This is a loss function that measures the squared
difference between the predictions and the target.
• Cross-entropy: This is a loss function that is used for classification problems. It
measures the difference between the predicted probabilities and the true probabilities.
• Huber loss: This is a loss function that is less sensitive to outliers than MSE.
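The three loss functions listed above can be sketched in a few lines of plain Python. This is only an illustrative sketch: the function names, the sample values, and the Huber threshold `delta=1.0` are assumptions chosen for the example and are not taken from the internship material.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error: average of the squared differences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def cross_entropy(true_probs, pred_probs):
    """Cross-entropy between a true and a predicted probability distribution."""
    # Terms with a true probability of 0 contribute nothing and are skipped.
    return -sum(t * math.log(p) for t, p in zip(true_probs, pred_probs) if t > 0)

def huber(y_true, y_pred, delta=1.0):
    """Huber loss: quadratic for small errors, linear for large ones."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        err = abs(t - p)
        if err <= delta:
            total += 0.5 * err ** 2
        else:
            total += delta * (err - 0.5 * delta)
    return total / len(y_true)

# Hypothetical regression targets and predictions; the last prediction is an outlier.
targets = [1.0, 2.0, 3.0]
predictions = [1.0, 2.0, 6.0]
mse_value = mse(targets, predictions)
huber_value = huber(targets, predictions)

# Hypothetical 3-class example: the true class is the second one.
ce_value = cross_entropy([0.0, 1.0, 0.0], [0.2, 0.7, 0.1])

print(mse_value, huber_value, ce_value)
```

On this sample the single large error dominates the MSE, while the Huber loss grows only linearly with it, which is what "less sensitive to outliers" means in practice.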

Four branches of machine learning
There are four main branches of machine learning:
1. Supervised Learning: In supervised learning, the algorithm is trained on a labelled
dataset, where each input data point is associated with the corresponding target
or output label. The goal of the algorithm is to learn a mapping from inputs to
outputs, enabling it to make predictions on new, unseen data. Common tasks in
supervised learning include classification (assigning labels to input data) and
regression (predicting numerical values).
2. Unsupervised Learning: Unsupervised learning involves training algorithms on an
unlabelled dataset, where the data does not have predefined output labels. The
goal of unsupervised learning is to discover patterns, structures, or relationships
within the data. Clustering, where the algorithm groups similar data points
together, and dimensionality reduction, which aims to simplify data while
preserving essential characteristics, are examples of unsupervised learning tasks.
3. Semi-Supervised Learning: Semi-supervised learning is a combination of
supervised and unsupervised learning. The algorithm is trained on a dataset that
contains both labelled and unlabelled data. The labelled data provides some
information for guidance, and the unlabelled data helps the algorithm learn more
about the underlying structure of the data, often leading to better performance
when labelled data is limited or expensive to obtain.
4. Reinforcement Learning: Reinforcement learning is different from the previous
three types as it involves an agent that interacts with an environment to achieve a
goal. The agent takes actions in the environment and receives feedback in the
form of rewards or penalties. The goal of the agent is to learn a policy or strategy
that maximizes the cumulative reward over time. Reinforcement learning is
commonly used in applications such as game playing, robotics, and autonomous
systems.

Overfitting and underfitting


Overfitting and underfitting are two common problems that can occur in machine
learning. Overfitting occurs when a model learns the training data too well and
starts to memorize the noise and outliers in the data. This can lead to the model
performing poorly on new data that it has not seen before. Underfitting occurs
when a model does not learn the training data well enough and is unable to make
accurate predictions.
Here are some examples of overfitting and underfitting:

• Overfitting: A model that is trained to predict whether a patient has cancer
might learn to memorize the specific features of the training data that are
associated with cancer. This would allow the model to make accurate predictions
on the training data, but it would also cause the model to perform poorly on new
data that does not have the same features.

• Underfitting: A model that is trained to predict the price of a house might not
learn the relationship between the features of the house and its price. This would
cause the model to make inaccurate predictions on both the training data and
new data.
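The difference between the two problems can be made concrete with a small experiment. The sketch below uses made-up data (an assumption for illustration: the true relationship is y = 2x and the training labels carry noise of plus or minus 1) and compares three hypothetical models: one that memorizes the training set, one that ignores the input, and a least-squares line.

```python
def mse(model, xs, ys):
    """Mean squared error of a model (a function of x) on a dataset."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical data: noisy training labels, noise-free test labels (y = 2x).
train_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [1.0, 1.0, 5.0, 5.0, 9.0, 9.0]
test_x = [0.5, 1.5, 2.5, 3.5, 4.5]
test_y = [1.0, 3.0, 5.0, 7.0, 9.0]

def memorizer(x):
    """Overfitting: return the label of the nearest training point, noise included."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]

mean_x = sum(train_x) / len(train_x)
mean_y = sum(train_y) / len(train_y)

def constant(x):
    """Underfitting: ignore the input and always predict the mean label."""
    return mean_y

# Closed-form least-squares fit of a straight line to the training data.
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(train_x, train_y))
         / sum((x - mean_x) ** 2 for x in train_x))
intercept = mean_y - slope * mean_x

def line(x):
    """A model with about the right capacity for this data."""
    return slope * x + intercept

results = {}
for name, model in [("memorizer", memorizer), ("constant", constant), ("line", line)]:
    results[name] = (mse(model, train_x, train_y), mse(model, test_x, test_y))
    print(name, "train MSE:", results[name][0], "test MSE:", results[name][1])
```

The memorizing model has zero training error but a larger test error than the line, which is the signature of overfitting; the constant model has a high error on both sets, which is the signature of underfitting.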

Neural Network:
Neural networks extract identifying features from data without any pre-programmed
understanding. Network components include neurons, connections,
weights, biases, propagation functions, and a learning rule. Neurons receive inputs
and are governed by thresholds and activation functions. Connections carry weights
and biases that regulate information transfer. Learning, the adjustment of weights
and biases, occurs in three stages: input computation, output generation, and
iterative refinement, which enhances the network’s proficiency in diverse tasks.
These stages are:
1. The neural network is stimulated by an environment.
2. The free parameters of the neural network are then changed as a result of this
stimulation.
3. The neural network then responds in a new way to the environment because of
the changes in its free parameters.

Working of a Neural Network
Neural networks are complex systems that mimic some features of the functioning of
the human brain. A network is composed of an input layer, one or more hidden layers,
and an output layer, each made up of artificial neurons that are connected to one
another. The two stages of the basic process are forward propagation and
backpropagation.

Forward Propagation
• Input Layer: Each feature in the input layer is represented by a node on the
network, which receives input data.
• Weights and Connections: The weight of each neuronal connection indicates
how strong the connection is. Throughout training, these weights are changed.
• Hidden Layers: Each hidden layer neuron processes inputs by multiplying them
by weights, adding them up, and then passing them through an activation
function. By doing this, non-linearity is introduced, enabling the network to
recognize intricate patterns.
• Output: The final result is produced by repeating the process until the output
layer is reached.
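The forward pass described above can be sketched for a tiny network in plain Python. The layer sizes, weights, and biases below are arbitrary values assumed for illustration, and the helper name `dense` is hypothetical.

```python
import math

def relu(a):
    return max(0.0, a)

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def dense(inputs, weights, biases, activation):
    """One layer: each neuron computes activation(weighted sum of inputs + bias).

    weights[j] holds the weights on the connections into neuron j.
    """
    return [
        activation(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]

# Input layer: two features.
x = [1.0, 2.0]

# Hidden layer: two neurons with ReLU activation (assumed weights).
hidden = dense(x, [[0.5, -1.0], [1.0, 1.0]], [0.0, -1.0], relu)

# Output layer: one neuron with sigmoid activation (assumed weights).
output = dense(hidden, [[1.0, -1.0]], [0.0], sigmoid)

print(hidden, output)
```

The first hidden neuron receives a negative weighted sum and is switched off by ReLU, so only the second hidden neuron contributes to the output. Without the activation functions the two layers would collapse into a single linear map.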
Backpropagation
• Loss Calculation: The network’s output is compared against the true target values,
and a loss function is used to compute the difference. For a regression problem,
the Mean Squared Error (MSE) is commonly used as the cost function.

Loss Function:
$$\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
where $y_i$ is the target value, $\hat{y}_i$ is the prediction, and $n$ is the number of samples.
• Gradient Descent: The network then uses gradient descent to reduce the
loss. To lower the error, each weight is changed in proportion to the derivative of
the loss with respect to that weight.
• Adjusting Weights: The weights at each connection are adjusted by propagating
these derivatives backward through the network, which is the process known as
backpropagation.
• Training: During training with different data samples, the entire process of
forward propagation, loss calculation, and backpropagation is repeated iteratively,
enabling the network to adapt and learn patterns from the data.
• Activation Functions: Non-linearity is introduced into the model by activation
functions such as the rectified linear unit (ReLU) or sigmoid. Whether a neuron
“fires” is decided from its total weighted input.
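The training loop described above (forward pass, loss, gradients, weight update) can be sketched for the simplest possible case: a single linear neuron trained with gradient descent on the mean squared error. The data, learning rate, and number of epochs are assumptions chosen so that the example converges; they are not values from the course.

```python
def train(xs, ys, lr=0.05, epochs=2000):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Forward propagation: compute predictions with the current parameters.
        preds = [w * x + b for x in xs]
        # Derivatives of the MSE with respect to w and b.
        grad_w = sum(2.0 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / n
        grad_b = sum(2.0 * (p - y) for p, y in zip(preds, ys)) / n
        # Gradient descent step: move against the gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

w, b = train(xs, ys)
print(w, b)
```

In a multi-layer network the same update is applied to every weight, and backpropagation is the bookkeeping that obtains each derivative by applying the chain rule layer by layer.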

Introduction to Keras
Keras is a deep-learning framework for Python that provides a convenient way to
define and train almost any kind of deep-learning model. Keras was initially
developed for researchers, with the aim of enabling fast experimentation.
Keras has the following key features:
❖ It allows the same code to run seamlessly on CPU or GPU.
❖ It has a user-friendly API that makes it easy to quickly prototype deep-
learning models.
❖ It has built-in support for convolutional networks (for computer vision),
recurrent networks (for sequence processing), and any combination of both.
❖ It supports arbitrary network architectures: multi-input or multi-output
models, layer sharing, model sharing, and so on.
This means Keras is appropriate for building essentially any deep learning
model, from a generative adversarial network to a neural Turing machine. Keras is
distributed under the permissive MIT license, which means it can be freely used in
commercial projects. It’s compatible with any version of Python from 2.7 to 3.6
(as of mid-2017).
Keras has well over 200,000 users, ranging from academic researchers and
engineers at both startups and large companies to graduate students and
hobbyists. Keras is used at Google, Netflix, Uber, CERN, Yelp, Square, and
hundreds of startups working on a wide range of problems. Keras is also a popular
framework on Kaggle, the machine-learning competition website, where almost
every recent deep-learning competition has been won using Keras models.

Generative Adversarial Networks:

Generative Adversarial Networks (GANs) are a powerful class of neural
networks used for unsupervised learning. A GAN is made up of two neural
networks, a discriminator and a generator, which use adversarial training to produce
artificial data that closely resembles actual data.
• The Generator turns random noise samples into candidate data and attempts to
fool the Discriminator, which is tasked with accurately distinguishing between
generated and genuine data.
• Realistic, high-quality samples are produced as a result of this competitive
interaction, which drives both networks to improve.
• GANs are proving to be highly versatile artificial intelligence tools, as evidenced
by their extensive use in image synthesis, style transfer, and text-to-image
synthesis.
• They have also revolutionized generative modeling.

Through adversarial training, these models engage in a competitive interplay until the
generator becomes adept at creating realistic samples, fooling the discriminator
approximately half the time.
Generative Adversarial Networks (GANs) can be broken down into three parts:
• Generative: To learn a generative model, which describes how data is generated
in terms of a probabilistic model.
• Adversarial: The word adversarial refers to setting one thing up against another.
This means that, in the context of GANs, the generative result is compared with
the actual images in the data set. A mechanism known as a discriminator is used
to apply a model that attempts to distinguish between real and fake images.
• Networks: Use deep neural networks as artificial intelligence (AI) algorithms for
training purposes.
Architecture of GANs
A Generative Adversarial Network (GAN) is composed of two primary parts,
which are the Generator and the Discriminator.
Generator Model
The generator model is the key element responsible for creating fresh, realistic
data in a Generative Adversarial Network (GAN). The generator takes random noise
as input and converts it into complex data samples, such as text or images. It is
commonly implemented as a deep neural network.
The generator’s ability to generate high-quality, varied samples that can fool the
discriminator is what makes it successful.
Discriminator Model
The discriminator model is an artificial neural network used in Generative
Adversarial Networks (GANs) to differentiate between generated and actual input. By
evaluating input samples and assigning each a probability of authenticity, the
discriminator functions as a binary classifier.
When dealing with image data, its architecture usually uses convolutional layers;
other modalities use structures suited to them. The aim of the discriminator's
training is to maximize its capacity to accurately identify generated samples as fake
and real samples as authentic. The discriminator grows increasingly discerning as a
result of its interaction with the generator, which in turn helps the GAN produce
extremely realistic-looking synthetic data overall.
MinMax Loss

In a Generative Adversarial Network (GAN), the minimax loss formula is given by:
$$\min_G \max_D V(G, D) = \mathbb{E}_{x \sim p_{data}}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$
Where,
• G is the generator network and D is the discriminator network.
• x represents actual data samples obtained from the true data distribution $p_{data}(x)$.
• z represents random noise sampled from a prior distribution $p_z(z)$ (usually a normal
or uniform distribution).
• D(x) is the discriminator’s estimated probability that the actual data sample x
is real.
• D(G(z)) is the discriminator’s estimated probability that generated data coming
from the generator is authentic.
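To get a feel for the value that the discriminator maximizes and the generator minimizes, the two expectations can be estimated by averaging over a batch of discriminator outputs. The sketch below is illustrative only: the function name `gan_value` and the batches of scores are assumptions, and no actual networks are trained.

```python
import math

def gan_value(d_real, d_fake):
    """Batch estimate of the minimax objective.

    d_real: discriminator outputs D(x) on real samples.
    d_fake: discriminator outputs D(G(z)) on generated samples.
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

# Hypothetical scores from a discriminator that separates real from fake well.
confident = gan_value([0.9, 0.95, 0.99], [0.1, 0.05, 0.01])

# Hypothetical scores from a discriminator that cannot tell the two apart.
fooled = gan_value([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])

print(confident, fooled)
```

A discriminator that separates the two kinds of samples well pushes the value toward 0, its maximum. When the generator is good enough that the discriminator can only output 0.5 for everything, the value drops to -log 4, which matches the earlier remark that a well-trained generator fools the discriminator about half the time.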

How does a GAN work?


The steps involved in how a GAN works:
1. Initialization: Two neural networks are created: a Generator (G) and a
Discriminator (D).
• G is tasked with creating new data, like images or text, that closely
resembles real data.
• D acts as a critic, trying to distinguish between real data (from a training
dataset) and the data generated by G.
2. Generator’s First Move: G takes a random noise vector as input. This noise
vector contains random values and acts as the starting point for G’s creation
process. Using its internal layers and learned patterns, G transforms the noise
vector into a new data sample, like a generated image.
3. Discriminator’s Turn: D receives two kinds of inputs:
• Real data samples from the training dataset.
• The data samples generated by G in the previous step.
D’s job is to analyze each input and determine whether it is real data or
something G generated. It outputs a probability score between 0 and 1. A score
of 1 indicates the data is likely real, and 0 suggests it is fake.
4. The Learning Process: Now, the adversarial part comes in:
• If D correctly identifies real data as real (score close to 1) and generated
data as fake (score close to 0), D is doing its job well and its loss is small,
while G receives the signal that its samples are not yet convincing.
• However, the key is to continuously improve. If D identified everything
correctly all the time, G would never get better. So, the goal is for G to
eventually trick D.
5. Generator’s Improvement:
• When D mistakenly labels G’s creation as real (score close to 1), it’s a sign
that G is on the right track. In this case, G is rewarded with a low loss,
while D receives a penalty for being fooled and updates its weights.
• This feedback helps G improve its generation process to create more
realistic data.
6. Discriminator’s Adaptation:
• Conversely, when D correctly identifies G’s fake data (score close to 0), G
receives no reward and D’s discrimination abilities are reinforced.
• This ongoing duel between G and D refines both networks over time.

Convolutional Neural Network

A Convolutional Neural Network (CNN) is an extended version of artificial neural
networks (ANN) that is predominantly used to extract features from grid-like matrix
datasets, for example visual datasets such as images or videos, where data patterns
play an extensive role.
CNN Architecture
Convolutional Neural Network consists of multiple layers like the input layer,
Convolutional layer, Pooling layer, and fully connected layers.

Simple CNN architecture


The Convolutional layer applies filters to the input image to extract features, the Pooling
layer downsamples the image to reduce computation, and the fully connected layer
makes the final prediction. The network learns the optimal filters through
backpropagation and gradient descent.
How Do Convolutional Layers Work?
Convolutional Neural Networks, or convnets, are neural networks that share their
parameters. Imagine you have an image. It can be represented as a cuboid having a
length and width (the dimensions of the image) and a height (i.e., the channels, as
images generally have red, green, and blue channels).

Now imagine taking a small patch of this image and running a small neural network,
called a filter or kernel, on it, with say K outputs, and representing them vertically.
Now slide that neural network across the whole image. As a result, we get another
image with a different width, height, and depth. Instead of just R, G, and B channels
we now have more channels but a smaller width and height. This operation is called
convolution. If the patch size were the same as that of the image, it would be a
regular neural network. Because of this small patch, we have fewer weights.
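The sliding-patch idea can be sketched for a single-channel image and a single kernel in plain Python. The image and kernel values are made up for illustration. Like most deep-learning libraries, the sketch slides the kernel without flipping it (strictly speaking a cross-correlation), and it uses no padding and a stride of 1, which are assumptions of this example.

```python
def conv2d(image, kernel):
    """Slide a kernel over a 2D image (no padding, stride 1)."""
    image_h, image_w = len(image), len(image[0])
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    feature_map = []
    for r in range(image_h - kernel_h + 1):
        row = []
        for c in range(image_w - kernel_w + 1):
            # The same kernel weights are reused at every position (parameter sharing).
            row.append(sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kernel_h)
                for j in range(kernel_w)
            ))
        feature_map.append(row)
    return feature_map

# Hypothetical 4x4 single-channel image and 2x2 kernel.
image = [
    [1, 2, 3, 0],
    [4, 5, 6, 1],
    [7, 8, 9, 2],
    [0, 1, 2, 3],
]
kernel = [
    [1, 0],
    [0, -1],
]

feature_map = conv2d(image, kernel)
for row in feature_map:
    print(row)
```

The 4x4 input shrinks to a 3x3 feature map, and the layer has only four weights regardless of the image size. A real convolutional layer applies K such kernels, each spanning all input channels, which is where the extra output channels come from.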

Image source: Deep Learning Udacity

Recurrent Neural Networks (RNNs):


In traditional neural networks, inputs and outputs are treated independently. However,
tasks like predicting the next word in a sentence require information from previous
words to make accurate predictions. To address this limitation, Recurrent Neural
Networks (RNNs) were developed.
Recurrent Neural Networks introduce a mechanism where the output from one step is
fed back as input to the next, allowing them to retain information from previous inputs.
This design makes RNNs well-suited for tasks where context from earlier steps is
essential, such as predicting the next word in a sentence.

Recurrent Neural Network
In simple terms, RNNs apply the same network to each element in a sequence while
preserving and passing on relevant information, which enables them to learn temporal
dependencies that conventional neural networks cannot.
Key Components of RNNs
1. Recurrent Neurons
The fundamental processing unit in a Recurrent Neural Network (RNN) is the recurrent
unit, which is informally referred to as a “recurrent neuron.” Recurrent units hold a
hidden state that maintains information about previous inputs in a sequence. By
feeding back their hidden state, recurrent units can “remember” information from
prior steps, allowing them to capture dependencies across time.

Recurrent Neuron
2. RNN Unfolding

RNN unfolding, or “unrolling,” is the process of expanding the recurrent structure over
time steps. During unfolding, each step of the sequence is represented as a separate
layer in a series, illustrating how information flows across each time step. This unrolling
enables backpropagation through time (BPTT), a learning process where errors are
propagated across time steps to adjust the network’s weights, enhancing the RNN’s
ability to learn dependencies within sequential data.

RNN Unfolding
Variants of Recurrent Neural Networks (RNNs)
There are several variations of RNNs, each designed to address specific challenges or
optimize for certain tasks:
1. Vanilla RNN
This simplest form of RNN consists of a single hidden layer, where weights are shared
across time steps. Vanilla RNNs are suitable for learning short-term dependencies but
are limited by the vanishing gradient problem, which hampers long-sequence learning.
2. Bidirectional RNNs
Bidirectional RNNs process inputs in both forward and backward directions, capturing
both past and future context for each time step. This architecture is ideal for tasks
where the entire sequence is available, such as named entity recognition and question
answering.
3. Long Short-Term Memory Networks (LSTMs)
Long Short-Term Memory Networks (LSTMs) introduce a memory mechanism to
overcome the vanishing gradient problem. Each LSTM cell has three gates:
• Input Gate: Controls how much new information should be added to the cell
state.

• Forget Gate: Decides what past information should be discarded.
• Output Gate: Regulates what information should be output at the current step.
This selective memory enables LSTMs to handle long-term dependencies,
making them ideal for tasks where earlier context is critical.
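The role of the three gates can be sketched with a single LSTM step on scalar values. This follows the commonly used LSTM formulation, in which a tanh "candidate" value is computed alongside the gates; the parameter names (`wi`, `ui`, `bi`, and so on) and all numeric values are hypothetical and chosen only for illustration.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def lstm_step(x, h_prev, c_prev, p):
    """One LSTM step on scalars; p is a dict of (hypothetical) parameters."""
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # input gate
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # forget gate
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # output gate
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])  # candidate value
    c = f * c_prev + i * g      # keep part of the old cell state, add new information
    h = o * math.tanh(c)        # expose part of the cell state as the output
    return h, c

# Assumed parameters: a large forget-gate bias keeps the old cell state,
# and a very negative input-gate bias blocks new information.
remember = {
    "wi": 0.5, "ui": 0.5, "bi": -100.0,
    "wf": 0.5, "uf": 0.5, "bf": 100.0,
    "wo": 0.5, "uo": 0.5, "bo": 0.0,
    "wg": 0.5, "ug": 0.5, "bg": 0.0,
}

c_prev = 0.7
h, c = lstm_step(x=1.0, h_prev=0.0, c_prev=c_prev, p=remember)
print(h, c)
```

With the forget gate saturated near 1 and the input gate near 0, the cell state passes through the step essentially unchanged. This is the mechanism that lets an LSTM carry information across many time steps.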
4. Gated Recurrent Units (GRUs)
Gated Recurrent Units (GRUs) simplify LSTMs by combining the input and forget gates
into a single update gate and streamlining the output mechanism. This design is
computationally efficient, often performing similarly to LSTMs, and is useful in tasks
where simplicity and faster training are beneficial.
Recurrent Neural Network Architecture
RNNs share similarities in input and output structures with other deep learning
architectures but differ significantly in how information flows from input to output.
Unlike traditional deep neural networks, where each dense layer has distinct weight
matrices, RNNs use shared weights across time steps, allowing them to remember
information over sequences.
In RNNs, the hidden state $H_i$ is calculated for every input $X_i$ to retain sequential
dependencies. The computations follow these core formulas:
Hidden State Calculation:
$$h = \sigma(U \cdot X + W \cdot h_{t-1} + B)$$
Here, $h$ represents the current hidden state, $U$ and $W$ are weight matrices,
and $B$ is the bias.
Output Calculation:
$$Y = O(V \cdot h + C)$$
The output $Y$ is calculated by applying $O$, an activation function, to the weighted
hidden state, where $V$ and $C$ represent weights and bias.
Overall Function:
$$Y = f(X, h, W, U, V, B, C)$$
This function defines the entire RNN operation, where the state matrix $S$ holds each
element $s_i$ representing the network’s state at each time step $i$.
Key Parameters in RNNs:
• Weight Matrices: $W, U, V$
• Bias Terms: $B, C$
These parameters remain consistent across all time steps, enabling the network to
model sequential dependencies more efficiently, which is essential for tasks like
language processing, time-series forecasting, and more.

Recurrent Neural Architecture


How does RNN work?
In a RNN, each time step consists of units with a fixed activation function. Each unit
contains an internal hidden state, which acts as memory by retaining information from
previous time steps, thus allowing the network to store past knowledge. The hidden
state htht is updated at each time step to reflect new input, adapting the network’s
understanding of previous inputs.
Updating the Hidden State in RNNs
The current hidden state htht depends on the previous state ht−1ht−1 and the current
input xtxt, and is calculated using the following relations:
1. State Update:
ht=f(ht−1,xt)ht=f(ht−1,xt)
where:
• htht is the current state
• ht−1ht−1 is the previous state
• xtxt is the input at the current time step
2. Activation Function Application:
27
ht=tanh⁡(Whh⋅ht−1+Wxh⋅xt)ht=tanh(Whh⋅ht−1+Wxh⋅xt)
Here, WhhWhh is the weight matrix for the recurrent neuron, and WxhWxh is the weight
matrix for the input neuron.
3. Output Calculation:
yt=Why⋅htyt=Why⋅ht
where ytyt is the output and WhyWhy is the weight at the output layer.
These parameters are updated using backpropagation. However, because an RNN operates
on sequential data, a modified form of backpropagation is used, which is known
as backpropagation through time.
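To show how the state update is unrolled over a sequence, the sketch below runs a one-unit RNN over three inputs. It is only an illustration: the scalar weights `w_hh`, `w_xh`, `w_hy`, the input sequence, and the initial state `h0` are hypothetical values chosen for readability.

```python
import math

def rnn_forward(xs, w_hh, w_xh, w_hy, h0=0.0):
    """Run a one-unit RNN over the sequence xs; return (states, outputs)."""
    h = h0
    states, outputs = [], []
    for x_t in xs:  # the same weights are reused at every time step
        h = math.tanh(w_hh * h + w_xh * x_t)  # h_t = tanh(W_hh h_{t-1} + W_xh x_t)
        states.append(h)
        outputs.append(w_hy * h)              # y_t = W_hy h_t
    return states, outputs

# Only the first input is non-zero; later states still carry its trace.
states, outputs = rnn_forward([1.0, 0.0, 0.0], w_hh=0.5, w_xh=1.0, w_hy=2.0)
```

Although the second and third inputs are zero, `states[1]` and `states[2]` remain positive because each state is computed from the previous one. This is the memory effect described above.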
Backpropagation Through Time (BPTT) in RNNs
In a Recurrent Neural Network (RNN), data flows sequentially, where each time step’s
output depends on the previous time step. This ordered data structure necessitates
applying backpropagation across all hidden states, or time steps, in sequence. This
unique approach is called Backpropagation Through Time (BPTT), essential for
updating network parameters that rely on temporal dependencies.

Backpropagation Through Time (BPTT) In RNN


BPTT Process in RNNs
The loss function $L(\theta)$ is dependent on the final hidden state $h_3$, but each hidden
state relies on the preceding states, forming a sequential chain:
• $h_3$ depends on $h_2$ and the weight matrix $W$
• $h_2$ depends on $h_1$ and $W$
• $h_1$ depends on $h_0$ and $W$, where $h_0$ is the initial, constant state
This dependency chain is managed by backpropagating the gradients across each state
in the sequence.
Calculating Gradients Through Time Steps
1. Simplified Gradient Calculation for One Row
For simplicity, we apply backpropagation to only one row:
$\frac{\partial L(\theta)}{\partial W} = \frac{\partial L(\theta)}{\partial h_3} \frac{\partial h_3}{\partial W}$
We already know how to compute the first factor, $\frac{\partial L(\theta)}{\partial h_3}$, as it is the same as in any
simple deep neural network backpropagation.
However, we will see how to apply backpropagation to the second factor, $\frac{\partial h_3}{\partial W}$.
2. Handling Dependencies in Sequential Layers:
Since $h_3$ is defined as:
$h_3 = \sigma(W \cdot h_2 + b)$
where $\sigma$ is the activation function, we need to calculate $\frac{\partial h_3}{\partial W}$,
considering its dependency on previous hidden states.
3. Gradient Calculation with Explicit and Implicit Parts:
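The total derivative of $h_3$ with respect to $W$ is broken into:
• Explicit: $\frac{\partial^{+} h_3}{\partial W}$, treating all other inputs as constants
• Implicit: Summing over all indirect paths from $h_3$ to $W$
Thus, we calculate:
$\frac{\partial h_3}{\partial W} = \frac{\partial^{+} h_3}{\partial W} + \frac{\partial h_3}{\partial h_2} \frac{\partial h_2}{\partial W}$
$= \frac{\partial^{+} h_3}{\partial W} + \frac{\partial h_3}{\partial h_2} \left[ \frac{\partial^{+} h_2}{\partial W} + \frac{\partial h_2}{\partial h_1} \frac{\partial h_1}{\partial W} \right]$
$= \frac{\partial^{+} h_3}{\partial W} + \frac{\partial h_3}{\partial h_2} \frac{\partial^{+} h_2}{\partial W} + \frac{\partial h_3}{\partial h_2} \frac{\partial h_2}{\partial h_1} \left[ \frac{\partial^{+} h_1}{\partial W} \right]$
4. Short-Circuiting Paths for Simplification:
For simplicity, we reduce the paths by short-circuiting, yielding:
$\frac{\partial h_3}{\partial W} = \frac{\partial^{+} h_3}{\partial W} + \frac{\partial h_3}{\partial h_2} \frac{\partial^{+} h_2}{\partial W} + \frac{\partial h_3}{\partial h_1} \frac{\partial^{+} h_1}{\partial W}$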
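The decomposition into explicit and implicit parts can be checked numerically. The sketch below assumes a scalar weight `w`, a scalar bias `b`, a sigmoid activation, and hypothetical starting values. It sums the explicit derivative at each step multiplied by the path from $h_3$ back to that step, then compares the result with a finite-difference estimate.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(w, b, h0, steps=3):
    """Return [h0, h1, ..., h_steps] for h_t = sigmoid(w * h_{t-1} + b)."""
    hs = [h0]
    for _ in range(steps):
        hs.append(sigmoid(w * hs[-1] + b))
    return hs

def dh3_dw_bptt(w, b, h0):
    """dh3/dw as the sum over k of (dh3/dh_k) * (explicit dh_k/dw)."""
    hs = forward(w, b, h0)
    # For the sigmoid, sigma'(z_t) = h_t * (1 - h_t); index 0 is unused.
    dsig = [None] + [h * (1.0 - h) for h in hs[1:]]
    total = 0.0
    path = 1.0  # dh3/dh_k, starting from dh3/dh3 = 1
    for k in (3, 2, 1):
        explicit = dsig[k] * hs[k - 1]  # d+h_k/dw with h_{k-1} held constant
        total += path * explicit
        path *= dsig[k] * w             # extend the path by dh_k/dh_{k-1}
    return total

def dh3_dw_numeric(w, b, h0, eps=1e-6):
    """Central finite-difference estimate of dh3/dw."""
    return (forward(w + eps, b, h0)[3] - forward(w - eps, b, h0)[3]) / (2 * eps)

w, b, h0 = 0.8, 0.1, 0.5  # hypothetical values
analytic = dh3_dw_bptt(w, b, h0)
numeric = dh3_dw_numeric(w, b, h0)
```

The two values should agree to within numerical error, since the loop accumulates exactly the three terms of the short-circuited formula.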

Artificial Neural Networks:


Artificial Neural Networks contain artificial neurons, which are called units. These units
are arranged in a series of layers that together constitute the whole Artificial Neural
Network in a system. A layer can have only a dozen units or millions of units, depending
on how complex the network needs to be to learn the hidden patterns in the dataset.
Commonly, an Artificial Neural Network has an input layer, an output layer, and one or
more hidden layers. The input layer receives data from the outside world which the
neural network needs to analyze or learn about.
In the majority of neural networks, units are interconnected from one layer to another.
Each of these connections has weights that determine the influence of one unit on
another unit. As the data transfers from one unit to another, the neural network learns
more and more about the data, which eventually results in an output from the output
layer.

Neural Networks Architecture


The structures and operations of human neurons serve as the basis for artificial neural
networks, which are also known as neural networks or neural nets. The input layer of an
artificial neural network is the first layer; it receives input from external sources and
passes it to the hidden layer, which is the second layer. In the hidden layer, each
neuron receives input from the neurons of the previous layer, computes the weighted sum,
and sends it to the neurons in the next layer. The connections are weighted, meaning
that the effect of each input from the previous layer is amplified or reduced by the
weight assigned to it. These weights are adjusted during the training process to
improve model performance.
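The weighted-sum computation performed by each layer can be sketched as below. The network shape, weights, and inputs are hypothetical, and the use of ReLU in the hidden layer is an assumption made for the example; the text above does not prescribe a particular activation.

```python
def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(weighted sum + bias) per unit."""
    return [
        activation(sum(w * x for w, x in zip(unit_weights, inputs)) + b)
        for unit_weights, b in zip(weights, biases)
    ]

def relu(z):
    return max(0.0, z)

def identity(z):
    return z

# Hypothetical network: 3 inputs -> 2 hidden units -> 1 output unit.
hidden_w = [[0.5, 0.5, 0.0], [1.0, -1.0, 0.0]]
hidden_b = [0.0, 0.0]
output_w = [[1.0, -2.0]]
output_b = [0.5]

x = [1.0, 2.0, 3.0]
hidden = dense(x, hidden_w, hidden_b, relu)      # weighted sums, then ReLU
y = dense(hidden, output_w, output_b, identity)  # weighted sum of hidden units
```

Each row of a weight matrix holds the weights of the connections into one unit, so a larger weight gives the corresponding input more influence on that unit. Training would adjust these numbers; here they are fixed by hand.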

Natural Language Processing (NLP):


NLP powers many applications that use language, such as text translation, voice
recognition, text summarization, and chatbots. You may have used some of these
applications yourself, such as voice-operated GPS systems, digital assistants, speech-
to-text software, and customer service bots. NLP also helps businesses improve their
efficiency, productivity, and performance by simplifying complex tasks that involve
language.

NLP Techniques
NLP encompasses a wide array of techniques aimed at enabling computers to
process and understand human language. These techniques can be categorized into several
broad areas, each addressing different aspects of language processing. Here are some
of the key NLP techniques:
1. Text Processing and Preprocessing In NLP
• Tokenization: Dividing text into smaller units, such as words or sentences.
• Stemming and Lemmatization: Reducing words to their base or root forms.
• Stopword Removal: Removing common words (like “and”, “the”, “is”) that may
not carry significant meaning.
• Text Normalization: Standardizing text, including case normalization, removing
punctuation, and correcting spelling errors.
2. Syntax and Parsing In NLP
• Part-of-Speech (POS) Tagging: Assigning parts of speech to each word in a
sentence (e.g., noun, verb, adjective).
• Dependency Parsing: Analyzing the grammatical structure of a sentence to
identify relationships between words.
• Constituency Parsing: Breaking down a sentence into its constituent parts or
phrases (e.g., noun phrases, verb phrases).
3. Semantic Analysis
• Named Entity Recognition (NER): Identifying and classifying entities in text, such
as names of people, organizations, locations, dates, etc.
• Word Sense Disambiguation (WSD): Determining which meaning of a word is
used in a given context.
• Coreference Resolution: Identifying when different words refer to the same entity
in a text (e.g., “he” refers to “John”).
4. Information Extraction
• Entity Extraction: Identifying specific entities and their relationships within the
text.
• Relation Extraction: Identifying and categorizing the relationships between
entities in a text.
5. Text Classification in NLP
• Sentiment Analysis: Determining the sentiment or emotional tone expressed in a
text (e.g., positive, negative, neutral).
• Topic Modeling: Identifying topics or themes within a large collection of
documents.
• Spam Detection: Classifying text as spam or not spam.
6. Language Generation
• Machine Translation: Translating text from one language to another.
• Text Summarization: Producing a concise summary of a larger text.
• Text Generation: Automatically generating coherent and contextually relevant
text.
7. Speech Processing
• Speech Recognition: Converting spoken language into text.
• Text-to-Speech (TTS) Synthesis: Converting written text into spoken language.
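The text preprocessing steps listed under item 1 (normalization, tokenization, stopword removal, and stemming) can be sketched with the Python standard library. The stopword list and the suffix-stripping rule are deliberately tiny, hypothetical stand-ins; a real pipeline would use a full stopword list and an established stemmer or lemmatizer.

```python
import re

# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"and", "the", "is", "a", "of"}

def normalize(text):
    """Lowercase the text and replace punctuation with spaces."""
    return re.sub(r"[^\w\s]", " ", text.lower())

def tokenize(text):
    """Split normalized text into word tokens on whitespace."""
    return text.split()

def remove_stopwords(tokens):
    """Drop tokens that appear in the stopword list."""
    return [t for t in tokens if t not in STOPWORDS]

def naive_stem(token):
    """Crude suffix stripping; NOT a real stemming algorithm."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    """Normalize, tokenize, remove stopwords, then stem each token."""
    tokens = remove_stopwords(tokenize(normalize(text)))
    return [naive_stem(t) for t in tokens]

result = preprocess("The cat is chasing the mice, and the dogs barked!")
# result == ['cat', 'chas', 'mice', 'dog', 'bark']
```

The token `chas` shows why stemming differs from lemmatization: a stemmer cuts suffixes mechanically and may produce non-words, whereas a lemmatizer would map `chasing` to the dictionary form `chase`.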
Deep Reinforcement Learning
Deep Reinforcement Learning (DRL) is a revolutionary Artificial Intelligence
methodology that combines reinforcement learning and deep neural networks. By
iteratively interacting with an environment and making choices that maximise
cumulative rewards, it enables agents to learn sophisticated strategies. Agents are able
to directly learn rules from sensory inputs thanks to DRL, which makes use of deep
learning’s ability to extract complex features from unstructured data.
Core Components of Deep Reinforcement Learning
Deep Reinforcement Learning (DRL) building blocks include all the aspects that power
learning and empower agents to make wise judgements in their surroundings. Effective
learning frameworks are produced by the cooperative interactions of these elements.
The following are the essential elements:
• Agent: The decision-maker or learner who engages with the environment. The
agent acts in accordance with its policy and gains experience over time to
improve its ability to make decisions.
• Environment: The system outside of the agent that it interacts with. Based
on the actions the agent takes, it gives the agent feedback in the form of
rewards or penalties.

• State: A depiction of the current circumstance or environmental state at a
certain moment. The agent chooses its activities and makes decisions based on
the state.
• Action: A choice the agent makes that causes a change in the state of the
system. The policy of the agent guides the selection of actions.Reward: A scalar
feedback signal from the environment that shows whether an agent’s behaviour
in a specific state is desirable. The agent is guided by rewards to learn positive
behaviour.\
• Value Function: This function calculates the anticipated cumulative reward an
agent can obtain from a specific state while adhering to a specific policy. It is
beneficial in assessing and contrasting states and policies.
• Model: A depiction of the dynamics of the environment that enables the agent to
simulate potential results of actions and states. Models are useful for planning
and forecasting.
These core components collectively form the foundation of Deep Reinforcement
Learning, empowering agents to learn strategies, make intelligent decisions, and adapt
to dynamic environments.
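How these components interact can be illustrated with a very small example. The sketch below uses tabular Q-learning rather than a deep network, so it is a simplification of DRL: the table `q` plays the role that a neural network would play. The five-state line environment, the reward scheme, and all hyperparameters are hypothetical.

```python
import random

N_STATES = 5      # states 0..4 on a line; state 4 is the goal
ACTIONS = (0, 1)  # 0 = move left, 1 = move right

def env_step(state, action):
    """Environment: return (next_state, reward, done) for the chosen action."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done

def train(episodes=500, alpha=0.5, gamma=0.9, max_steps=100, seed=0):
    """Agent: learn action values Q[state][action] from interaction."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            # Behaviour is purely exploratory; Q-learning is off-policy,
            # so it can still learn the values of the greedy policy.
            action = rng.choice(ACTIONS)
            next_state, reward, done = env_step(state, action)
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q

q = train()
# Greedy policy: in each non-goal state, pick the action with the higher value.
greedy_policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
```

Here `env_step` is the environment, the integer `state` is the state, `ACTIONS` are the actions, the returned `reward` is the reward signal, and `q` is a learned value function from which the policy is read off. In a deep variant, a neural network would estimate these values from raw inputs instead of storing one entry per state.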

A convolution layer is a type of neural network layer that applies a
convolution operation to the input data. The convolution operation involves a filter (or
kernel) that slides over the input data, performing element-wise multiplications and
summing the results to produce a feature map. This process allows the network to
detect patterns such as edges, textures, and shapes in the input images.
Key Components of a Convolution Layer
1. Filters (Kernels): Filters are small, learnable matrices that extract specific
features from the input data. For example, a filter might detect horizontal edges,
while another might detect vertical edges. During training, the values of these
filters are adjusted to optimize the feature extraction process.
2. Stride: The stride determines how much the filter moves during the convolution
operation. A stride of 1 means the filter moves one pixel at a time, while a stride
of 2 means it moves two pixels at a time. Larger strides result in smaller output
feature maps and faster computations.
3. Padding: Padding involves adding extra pixels around the input data to control
the spatial dimensions of the output feature map. There are two common types
of padding: 'valid' padding, which adds no extra pixels, and 'same' padding, which
adds pixels to ensure the output feature map has the same dimensions as the
input.
4. Activation Function: After the convolution operation, an activation function,
typically the Rectified Linear Unit (ReLU), is applied to introduce non-linearity into
the model. This helps the network learn complex patterns and relationships in
the data.
Steps in a Convolution Layer
1. Initialize Filters:
• Randomly initialize a set of filters with learnable parameters.
2. Convolve Filters with Input:
• Slide the filters across the width and height of the input data, computing the
dot product between the filter and the input sub-region.
3. Apply Activation Function:
• Apply a non-linear activation function to the convolved output to introduce
non-linearity.
4. Pooling (Optional):
• Often followed by a pooling layer (like max pooling) to reduce the spatial
dimensions of the feature map and retain the most important information.
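The steps above can be sketched for a single filter on a single-channel image. This is an illustrative, unoptimized version: the 4x4 image and the vertical-edge kernel are hypothetical, the filter is fixed by hand instead of being randomly initialized and learned, and zero padding of one pixel is used as the 'same' padding for a 3x3 kernel with stride 1.

```python
def pad2d(image, pad):
    """Surround the image with `pad` rows and columns of zeros."""
    width = len(image[0]) + 2 * pad
    rows = [[0.0] * width for _ in range(pad)]
    rows += [[0.0] * pad + list(row) + [0.0] * pad for row in image]
    rows += [[0.0] * width for _ in range(pad)]
    return rows

def conv2d(image, kernel, stride=1, pad=0):
    """Slide the kernel over the image; each output is a dot product."""
    img = pad2d(image, pad)
    k = len(kernel)
    out_h = (len(img) - k) // stride + 1
    out_w = (len(img[0]) - k) // stride + 1
    return [
        [
            sum(
                kernel[i][j] * img[r * stride + i][c * stride + j]
                for i in range(k)
                for j in range(k)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

def relu(feature_map):
    """Apply ReLU element-wise to introduce non-linearity."""
    return [[max(0.0, v) for v in row] for row in feature_map]

def max_pool(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    return [
        [
            max(feature_map[r + i][c + j] for i in range(size) for j in range(size))
            for c in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for r in range(0, len(feature_map) - size + 1, size)
    ]

# Hypothetical image with a dark left half and a bright right half.
image = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
# Hand-written filter that responds to dark-to-bright vertical edges.
kernel = [[-1.0, 0.0, 1.0] for _ in range(3)]

same = relu(conv2d(image, kernel, stride=1, pad=1))  # 4x4, same size as input
strided = conv2d(image, kernel, stride=2, pad=1)     # 2x2, larger stride
pooled = max_pool(same, size=2)                      # 2x2 after max pooling
```

The feature map `same` is largest in the two middle columns, where the window straddles the edge. With a stride of 2 the filter visits fewer positions, so `strided` is smaller than `same`, which matches the description of stride given earlier.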
