deep learning
Submitted for partial fulfilment of the requirements for the award of degree of
Bachelor of Technology
In
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
DECLARATION
This is to certify that the in course entitled “Deep Learning / NLP / Artificial
embodied in this have not been submitted to any other University for the same
purpose.
Date:
NRI INSTITUTE OF TECHNOLOGY
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
CERTIFICATE
This certificate attests that the following report accurately represents the work completed by A.SANTHOSH,
Registration Number 22KP1A0504, during the academic year 2023-2024, covering the time period from December
ABSTRACT
Deep learning, a subset of artificial intelligence (AI), has revolutionized various fields by
enabling machines to learn from large amounts of data and improve their performance over
time without explicit programming. At its core, deep learning utilizes neural networks with
multiple layers, known as deep neural networks, to model complex patterns and
representations.
These models have achieved groundbreaking results in areas such as computer vision, natural
language processing, speech recognition, and autonomous systems. The availability of big
data, advancements in computational power, and innovative algorithms have accelerated
deep learning's impact across industries, from healthcare to finance and entertainment.
Despite its successes, challenges remain in terms of interpretability, ethical concerns, and the
need for vast computational resources. As deep learning continues to evolve, it holds the
potential to unlock further advancements in AI, leading to smarter, more autonomous systems
capable of tackling increasingly complex real-world problems.
Deep learning, a key subset of artificial intelligence (AI), has made significant strides in natural
language processing (NLP), revolutionizing how machines understand, generate, and interact
with human language. By leveraging deep neural networks, particularly architectures like
transformers, deep learning models can learn complex patterns from vast amounts of textual
data, enabling advancements in machine translation, sentiment analysis, question answering,
and text generation.
LETTER OF UNDERTAKING
To
The Principal
NRI Institute of Technology
Visadala,
Guntur.
Subject: Submission of Internship Report on the Deep Learning/NLP/Artificial
Intelligence Internship on the ExcelR Edtech platform.
Dear Sir,
I am pleased to submit my internship report on the “Deep Learning/NLP/Artificial
Intelligence Internship” as per your instruction, to fulfil the requirements of the Degree of
Bachelor of Technology in CSE from Jawaharlal Nehru Technological University, Kakinada.
While preparing this report, I have tried my level best to include all the relevant information,
explanations, things I learned from the internship courses, and my contribution to this programme,
to make the report informative and comprehensive. It would not have been possible to complete
this report without your assistance, for which I am very thankful. Working online for six weeks on
the Deep Learning/NLP/Artificial Intelligence Internship was an amazing experience and a huge
learning opportunity for me. It was also a great experience to prepare this report, and I will be
available for any clarification, if required. I hope that you will be kind enough to accept my
Internship Report and oblige thereby.
Yours Obediently,
A.SANTHOSH
ID:22KP1A0504
EMAIL: santhoshappari05@gmail.com
ACKNOWLEDGEMENT
First and foremost, we express our deep gratitude to Mr. Rajendra Pradesh,
Chairman, NRI Institute of Technology for providing necessary facilities
throughout the Computer Science & Engineering program.
We would like to take this opportunity to express our thanks to the teaching
and non-teaching staff in the Department of Computer Science &
Engineering, NRIIT, for their invaluable help and support.
A.SANTHOSH-22KP1A0504
Table of Contents:
Deep Learning/NLP/Artificial Intelligence
Fundamentals of Deep Learning
Artificial Intelligence
➢ The question of whether computers could be taught to "think" was first posed
in the 1950s by a small group of pioneers in the then-new discipline of
computer science. The implications of this question are still being researched
today.
➢ A succinct definition of the field is the effort to automate intellectual tasks
normally performed by humans. AI is therefore a broad field that encompasses
machine learning and deep learning, but it also includes many approaches that
involve no learning at all. For instance, early chess programs used only
hardcoded rules written by programmers and were not machine learning
applications.
➢ For many years, experts believed that AI could be achieved by having
programmers write a sufficiently large set of explicit rules for computers to
follow. This approach is called symbolic AI, and it was the dominant approach
to AI from the 1950s to the late 1980s.
➢ Symbolic AI reached its peak popularity in the 1980s, during the boom in
expert systems.
1.1. Artificial intelligence, machine learning, and deep learning
Artificial intelligence (AI), machine learning (ML), and deep learning (DL) are
terms that are often used interchangeably, but they have different meanings.
• Artificial intelligence is a broad term that refers to the ability of machines to
perform tasks that are typically associated with human intelligence, such as
learning, reasoning, and problem-solving.
• Machine learning is a subset of AI that involves the development of
algorithms that can learn from data without being explicitly programmed.
Machine learning algorithms are trained on datasets, and they can then
be used to make predictions or decisions about new data.
• Deep learning is a subset of machine learning that uses multi-layer artificial
neural networks to learn from data. Neural networks are loosely inspired by the
human brain, and they can be used to solve complex problems that would be
difficult or impossible to solve with traditional machine learning algorithms.
In other words, AI is the umbrella term, ML is a subset of AI, and DL is a subset of ML.
Here are some examples of how AI, ML, and DL are being used today:
• AI is being used to develop self-driving cars, facial recognition software, and spam
filters.
• ML is being used to predict customer behaviour, optimize product
recommendations, and personalize marketing campaigns.
• DL is being used to develop natural language processing (NLP) models, image
recognition algorithms, and medical diagnosis tools.
AI, ML, and DL are all rapidly growing fields, and they are having a major impact on our
lives. As these technologies continue to develop, we can expect to see even more innovative
and groundbreaking applications in the years to come.
Machine Learning:
Machine learning is a type of artificial intelligence (AI) that allows software
applications to become more accurate in predicting outcomes without being explicitly
programmed to do so. Machine learning algorithms use historical data as input to predict
new output values. In other words, machine learning algorithms can learn from data and
improve their performance over time. This is in contrast to traditional programming, where
software applications are explicitly programmed to perform specific tasks.
Neural networks are a type of mathematical model that is inspired by the human
brain. They are made up of layers of interconnected nodes, and they can learn to represent
complex patterns in data. In deep learning, neural networks are used to learn successive
layers of increasingly meaningful representations. This is done by feeding the network a
large amount of data, and then adjusting the weights of the connections between the nodes
until the network is able to correctly classify or predict the data.
The term "neural network" is a reference to neurobiology, but neural networks are
not models of the brain. The brain is a complex organ, and we do not yet fully understand
how it works. Neural networks are much simpler models, and they are not designed to be a
complete representation of the brain.
There are many different types of loss functions. Some common loss functions include:
• Mean squared error (MSE): This is a loss function that measures the average squared
difference between the predictions and the targets.
• Cross-entropy: This is a loss function that is used for classification problems. It
measures the difference between the predicted probabilities and the true probabilities.
• Huber loss: This is a loss function that is less sensitive to outliers than MSE.
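The three loss functions listed above can be sketched in a few lines of plain Python. This is only an illustrative sketch: the function names, the `delta` threshold for Huber loss, and the sample values are assumptions chosen for the example, not part of the internship material.

```python
import math

def mse(y_true, y_pred):
    """Mean of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def cross_entropy(p_true, p_pred, eps=1e-12):
    """Cross-entropy between a true distribution and predicted probabilities."""
    # eps guards against log(0) when a predicted probability is exactly zero.
    return -sum(t * math.log(max(p, eps)) for t, p in zip(p_true, p_pred))

def huber(y_true, y_pred, delta=1.0):
    """Quadratic for small errors, linear for large ones."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        err = abs(t - p)
        if err <= delta:
            total += 0.5 * err ** 2
        else:
            total += delta * (err - 0.5 * delta)
    return total / len(y_true)

# Made-up data: the last target is an outlier.
targets = [1.0, 2.0, 3.0, 100.0]
predictions = [1.5, 2.0, 2.5, 4.0]

print("MSE:  ", mse(targets, predictions))
print("Huber:", huber(targets, predictions))
print("CE:   ", cross_entropy([0, 1, 0], [0.2, 0.7, 0.1]))
```

On this sample the single outlier dominates the MSE far more than the Huber loss, which is the sense in which Huber loss is less sensitive to outliers.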
Four branches of machine learning
There are four main branches of machine learning:
1. Supervised Learning: In supervised learning, the algorithm is trained on a labelled
dataset, where each input data point is associated with the corresponding target
or output label. The goal of the algorithm is to learn a mapping from inputs to
outputs, enabling it to make predictions on new, unseen data. Common tasks in
supervised learning include classification (assigning labels to input data) and
regression (predicting numerical values).
2. Unsupervised Learning: Unsupervised learning involves training algorithms on an
unlabelled dataset, where the data does not have predefined output labels. The
goal of unsupervised learning is to discover patterns, structures, or relationships
within the data. Clustering, where the algorithm groups similar data points
together, and dimensionality reduction, which aims to simplify data while
preserving essential characteristics, are examples of unsupervised learning tasks.
3. Semi-Supervised Learning: Semi-supervised learning is a combination of
supervised and unsupervised learning. The algorithm is trained on a dataset that
contains both labelled and unlabelled data. The labelled data provides some
information for guidance, and the unlabelled data helps the algorithm learn more
about the underlying structure of the data, often leading to better performance
when labelled data is limited or expensive to obtain.
4. Reinforcement Learning: Reinforcement learning is different from the previous
three types as it involves an agent that interacts with an environment to achieve a
goal. The agent takes actions in the environment and receives feedback in the
form of rewards or penalties. The goal of the agent is to learn a policy or strategy
that maximizes the cumulative reward over time. Reinforcement learning is
commonly used in applications such as game playing, robotics, and autonomous
systems.
• Overfitting: A model that is trained to predict whether a patient has cancer
might memorize specific details of the training data, including noise, instead of
the general patterns associated with cancer. The model then makes accurate
predictions on the training data but performs poorly on new data that does not
share those details.
• Underfitting: A model that is trained to predict the price of a house might be
too simple to learn the relationship between the features of the house and its
price. The model then makes inaccurate predictions on both the training data and
new data.
Neural Network:
Neural networks extract identifying features from data without any pre-programmed
understanding. Network components include neurons, connections, weights, biases,
propagation functions, and a learning rule. Neurons receive inputs, governed by
thresholds and activation functions. Connections carry weights and biases that
regulate information transfer. Learning, that is, adjusting the weights and biases,
occurs in three stages: input computation, output generation, and iterative
refinement, which improves the network’s proficiency across diverse tasks.
These stages are:
1. The neural network is stimulated by a new environment.
2. The free parameters of the neural network are changed as a result of this
stimulation.
3. The neural network then responds in a new way to the environment because of
the changes in its free parameters.
Working of a Neural Network
Neural networks are complex systems that mimic some features of the functioning of
the human brain. A network is composed of an input layer, one or more hidden layers,
and an output layer, each made up of interconnected artificial neurons. The two stages
of the basic process are called forward propagation and backpropagation.
Forward Propagation
• Input Layer: Each feature in the input layer is represented by a node on the
network, which receives input data.
• Weights and Connections: The weight of each neuronal connection indicates
how strong the connection is. Throughout training, these weights are changed.
• Hidden Layers: Each hidden layer neuron processes inputs by multiplying them
by weights, adding them up, and then passing them through an activation
function. By doing this, non-linearity is introduced, enabling the network to
recognize intricate patterns.
• Output: The final result is produced by repeating the process until the output
layer is reached.
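The forward propagation steps above can be sketched as follows. The network shape, the weights, and the helper names (`dense`, `forward`) are hypothetical values chosen only for illustration; a real network would learn its weights during training.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    return max(0.0, z)

def dense(inputs, weights, biases, activation):
    """One layer: each neuron multiplies inputs by weights, sums, adds a bias,
    and passes the result through the activation function."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(activation(z))
    return outputs

def forward(inputs, layers):
    """Repeat the layer computation until the output layer is reached."""
    activations = inputs
    for weights, biases, activation in layers:
        activations = dense(activations, weights, biases, activation)
    return activations

# Hypothetical network: 2 inputs -> 2 hidden neurons (ReLU) -> 1 output (sigmoid).
layers = [
    ([[0.5, -0.2], [0.3, 0.8]], [0.0, -0.1], relu),
    ([[1.0, -1.0]], [0.2], sigmoid),
]

print(forward([1.0, 2.0], layers))
```

Each entry in `layers` holds one row of weights per neuron, so the number of rows sets the width of that layer.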
Backpropagation
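• Loss Calculation: The network’s output is compared with the true target values,
and a loss function is used to compute the difference. For a regression problem,
the Mean Squared Error (MSE) is commonly used as the cost function.
Loss Function: $\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$, where $y_i$ is the
target value, $\hat{y}_i$ is the prediction, and $n$ is the number of samples.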
• Gradient Descent: Gradient descent is then used by the network to reduce the
loss. To lower the inaccuracy, weights are changed based on the derivative of the
loss with respect to each weight.
• Adjusting weights: The weights are adjusted at each connection by applying this
iterative process, or backpropagation, backward across the network.
• Training: During training with different data samples, the entire process of
forward propagation, loss calculation, and backpropagation is done iteratively,
enabling the network to adapt and learn patterns from the data.
• Activation Functions: Non-linearity is introduced into the model by activation
functions such as the rectified linear unit (ReLU) or sigmoid. They determine
whether a neuron “fires” based on its total weighted input.
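The training loop described above (forward propagation, loss calculation, gradient computation, weight update) can be illustrated with the smallest possible case: a single linear neuron trained with MSE. This is a sketch under simplifying assumptions, since with one neuron and no hidden layers the backward pass is a single derivative step; the learning rate, epoch count, and data are made up for the example.

```python
def train_linear_neuron(xs, ys, lr=0.05, epochs=2000):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Forward propagation: compute predictions with the current weights.
        preds = [w * x + b for x in xs]
        # Loss calculation: per-sample error (the loss is the mean of errors squared).
        errors = [p - y for p, y in zip(preds, ys)]
        # Backward step: derivative of the MSE with respect to each parameter.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Gradient descent: move each parameter against its gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Made-up data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

w, b = train_linear_neuron(xs, ys)
print(w, b)
```

In a multi-layer network the same update rule applies, but the gradients for earlier layers are obtained by applying the chain rule backward through the later layers.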
Introduction to Keras
Keras is a deep-learning framework for Python that provides a convenient way to
define and train almost any kind of deep-learning model. Keras was initially
developed for researchers, with the aim of enabling fast experimentation.
Keras has the following key features:
❖ It allows the same code to run seamlessly on CPU or GPU.
❖ It has a user-friendly API that makes it easy to quickly prototype deep-
learning models.
❖ It has built-in support for convolutional networks (for computer vision),
recurrent networks (for sequence processing), and any combination of both.
❖ It supports arbitrary network architectures: multi-input or multi-output
models, layer sharing, model sharing, and so on.
This means Keras is appropriate for building essentially any deep learning
model, from a generative adversarial network to a neural Turing machine. Keras is
distributed under the permissive MIT license, which means it can be freely used in
commercial projects. It’s compatible with any version of Python from 2.7 to 3.6
(as of mid-2017).
Keras has well over 200,000 users, ranging from academic researchers and
engineers at both startups and large companies to graduate students and
hobbyists. Keras is used at Google, Netflix, Uber, CERN, Yelp, Square, and
hundreds of startups working on a wide range of problems. Keras is also a popular
framework on Kaggle, the machine-learning competition website, where almost
every recent deep-learning competition has been won using Keras models.
Through adversarial training, these models engage in a competitive interplay until the
generator becomes adept at creating realistic samples, fooling the discriminator
approximately half the time.
Generative Adversarial Networks (GANs) can be broken down into three parts:
• Generative: To learn a generative model, which describes how data is generated
in terms of a probabilistic model.
• Adversarial: The word adversarial refers to setting one thing up against another.
This means that, in the context of GANs, the generative result is compared with
the actual images in the data set. A mechanism known as a discriminator is used
to apply a model that attempts to distinguish between real and fake images.
• Networks: Use deep neural networks as artificial intelligence (AI) algorithms for
training purposes.
Architecture of GANs
A Generative Adversarial Network (GAN) is composed of two primary parts,
which are the Generator and the Discriminator.
Generator Model
A key element responsible for creating new, realistic data in a Generative
Adversarial Network (GAN) is the generator model. The generator takes random noise
as input and converts it into complex data samples, such as text or images. It is
commonly implemented as a deep neural network.
The generator’s ability to generate high-quality, varied samples that can fool the
discriminator is what makes it successful.
Discriminator Model
A discriminator model is an artificial neural network used in Generative
Adversarial Networks (GANs) to differentiate between generated and real input. By
evaluating input samples and assigning each a probability of authenticity, the
discriminator functions as a binary classifier.
Convolutional layers or pertinent structures for other modalities are usually used in its
architecture when dealing with picture data. Maximizing the discriminator’s capacity to
accurately identify generated samples as fraudulent and real samples as authentic is
the aim of the adversarial training procedure. The discriminator grows increasingly
discriminating as a result of the generator and discriminator’s interaction, which helps
the GAN produce extremely realistic-looking synthetic data overall.
MinMax Loss
In a Generative Adversarial Network (GAN), the minimax loss formula is provided by:
$$\min_G \max_D V(G, D) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$
Where,
• G is the generator network and D is the discriminator network.
• x represents real data samples drawn from the true data distribution $p_{data}(x)$.
• z represents random noise sampled from a prior distribution $p_z(z)$ (usually a
normal or uniform distribution).
• D(x) is the discriminator’s estimated probability that a real sample x is real.
• D(G(z)) is the discriminator’s estimated probability that a generated sample
G(z) is real.
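To make the two terms of the minimax objective concrete, the sketch below estimates the value from a handful of discriminator outputs. The function name `gan_value` and the probability values are assumptions for illustration; in a real GAN these probabilities would come from the discriminator network.

```python
import math

def gan_value(d_real, d_fake):
    """Sample estimate of the minimax value.

    d_real: discriminator outputs D(x) on real samples.
    d_fake: discriminator outputs D(G(z)) on generated samples.
    """
    real_term = sum(math.log(d) for d in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - d) for d in d_fake) / len(d_fake)
    return real_term + fake_term

# A discriminator that separates real from generated samples well.
confident = gan_value(d_real=[0.9, 0.95], d_fake=[0.1, 0.05])

# A discriminator that is fooled half the time (outputs 0.5 everywhere).
fooled = gan_value(d_real=[0.5, 0.5], d_fake=[0.5, 0.5])

print(confident, fooled)
```

The discriminator tries to push this value up, while the generator tries to push it down. When the discriminator outputs 0.5 for every sample, the value is 2·log(0.5), which matches the earlier description of a generator that fools the discriminator about half the time.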
networks (ANN) which is predominantly used to extract features from grid-like
matrix datasets, for example visual datasets such as images or videos, where data
patterns play an extensive role.
CNN Architecture
Convolutional Neural Network consists of multiple layers like the input layer,
Convolutional layer, Pooling layer, and fully connected layers.
Now imagine taking a small patch of this image and running a small neural network,
called a filter or kernel, on it, with, say, K outputs, and representing them vertically.
Now slide that neural network across the whole image; as a result, we get another
image with a different width, height, and depth. Instead of just the R, G, and B
channels, we now have more channels but a smaller width and height. This operation
is called convolution. If the patch size were the same as that of the image, it would
be a regular neural network. Because the patch is small, we have fewer weights.
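The sliding-patch idea can be sketched in plain Python. To keep the example short it assumes a single-channel image, a single filter (K = 1), stride 1, and no padding; the image and kernel values are made up for illustration.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image and take a weighted sum at each position.

    Assumes one channel, stride 1, and no padding, so the output is smaller
    than the input by (kernel size - 1) in each dimension.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            patch_sum = 0.0
            for a in range(kh):
                for b in range(kw):
                    patch_sum += image[i + a][j + b] * kernel[a][b]
            row.append(patch_sum)
        output.append(row)
    return output

image = [
    [1, 2, 3, 0],
    [4, 5, 6, 1],
    [7, 8, 9, 2],
    [1, 0, 1, 3],
]
kernel = [[1, 0], [0, -1]]  # hypothetical 2x2 filter with only 4 weights

feature_map = conv2d(image, kernel)
for row in feature_map:
    print(row)
```

The 4x4 input becomes a 3x3 feature map, and the same four kernel weights are reused at every position, which is why a convolutional layer needs far fewer weights than a fully connected one.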
Recurrent Neural Network
In simple terms, RNNs apply the same network to each element in a sequence. They
preserve and pass on relevant information from one step to the next, enabling them
to learn temporal dependencies that conventional feedforward neural networks cannot.
Key Components of RNNs
1. Recurrent Neurons
The fundamental processing unit in a Recurrent Neural Network (RNN) is a Recurrent
Unit, which is not explicitly called a “Recurrent Neuron.” Recurrent units hold a hidden
state that maintains information about previous inputs in a sequence. Recurrent
units can “remember” information from prior steps by feeding back their hidden state,
allowing them to capture dependencies across time.
Figure: Recurrent Neuron
2. RNN Unfolding
RNN unfolding, or “unrolling,” is the process of expanding the recurrent structure over
time steps. During unfolding, each step of the sequence is represented as a separate
layer in a series, illustrating how information flows across each time step. This unrolling
enables backpropagation through time (BPTT), a learning process where errors are
propagated across time steps to adjust the network’s weights, enhancing the RNN’s
ability to learn dependencies within sequential data.
Figure: RNN Unfolding
Variants of Recurrent Neural Networks (RNNs)
There are several variations of RNNs, each designed to address specific challenges or
optimize for certain tasks:
1. Vanilla RNN
This simplest form of RNN consists of a single hidden layer, where weights are shared
across time steps. Vanilla RNNs are suitable for learning short-term dependencies but
are limited by the vanishing gradient problem, which hampers long-sequence learning.
2. Bidirectional RNNs
Bidirectional RNNs process inputs in both forward and backward directions, capturing
both past and future context for each time step. This architecture is ideal for tasks
where the entire sequence is available, such as named entity recognition and question
answering.
3. Long Short-Term Memory Networks (LSTMs)
Long Short-Term Memory Networks (LSTMs) introduce a memory mechanism to
overcome the vanishing gradient problem. Each LSTM cell has three gates:
• Input Gate: Controls how much new information should be added to the cell
state.
• Forget Gate: Decides what past information should be discarded.
• Output Gate: Regulates what information should be output at the current step.
This selective memory enables LSTMs to handle long-term dependencies,
making them ideal for tasks where earlier context is critical.
4. Gated Recurrent Units (GRUs)
Gated Recurrent Units (GRUs) simplify LSTMs by combining the input and forget gates
into a single update gate and streamlining the output mechanism. This design is
computationally efficient, often performing similarly to LSTMs, and is useful in tasks
where simplicity and faster training are beneficial.
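As an illustration of the three LSTM gates described above, the sketch below runs one step of an LSTM cell with scalar inputs and states. It follows the standard LSTM update equations; the parameter names (`wi`, `ui`, `bi`, and so on) and their values are hypothetical, and a real cell would use weight matrices rather than single numbers.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One time step of a scalar LSTM cell; p maps parameter names to values."""
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # input gate
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # forget gate
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # output gate
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])  # candidate value
    c = f * c_prev + i * g      # keep part of the old cell state, add new information
    h = o * math.tanh(c)        # expose part of the cell state as the hidden state
    return h, c

PARAM_NAMES = ("wi", "ui", "bi", "wf", "uf", "bf",
               "wo", "uo", "bo", "wg", "ug", "bg")

# Hypothetical setting: forget gate almost fully open, input gate almost closed,
# so the cell state is carried forward nearly unchanged.
params = {name: 0.0 for name in PARAM_NAMES}
params.update(bf=10.0, bi=-10.0)

h, c = lstm_step(x=1.0, h_prev=0.0, c_prev=2.0, p=params)
print(h, c)
```

Because the cell state is updated additively and scaled by the forget gate, information can be carried over many steps, which is how the LSTM handles long-term dependencies.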
Recurrent Neural Network Architecture
RNNs share similarities in input and output structures with other deep learning
architectures but differ significantly in how information flows from input to output.
Unlike traditional deep neural networks, where each dense layer has distinct weight
matrices, RNNs use shared weights across time steps, allowing them to remember
information over sequences.
In RNNs, the hidden state $H_i$ is calculated for every input $X_i$ to retain sequential
dependencies. The computations follow these core formulas:
Hidden State Calculation:
$$h = \sigma(U \cdot X + W \cdot h_{t-1} + B)$$
Here, $h$ represents the current hidden state, $U$ and $W$ are weight matrices,
and $B$ is the bias.
Output Calculation:
$$Y = O(V \cdot h + C)$$
The output $Y$ is calculated by applying $O$, an activation function, to the weighted
hidden state, where $V$ and $C$ represent weights and bias.
Overall Function:
$$Y = f(X, h, W, U, V, B, C)$$
This function defines the entire RNN operation, where the state matrix $S$ holds each
element $s_i$ representing the network’s state at each time step $i$.
Key Parameters in RNNs:
• Weight Matrices: $W, U, V$
• Bias Terms: $B, C$
These parameters remain consistent across all time steps, enabling the network to
model sequential dependencies more efficiently, which is essential for tasks like
language processing, time-series forecasting, and more.
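The hidden state and output formulas above can be sketched with scalar values. The sketch reuses the symbols U, W, V, B, and C from the formulas; the choice of tanh for the hidden activation, the identity for the output activation O, and the numeric values are assumptions made for illustration.

```python
import math

def rnn_forward(xs, U, W, V, B, C, h0=0.0):
    """Run a scalar RNN over a sequence, reusing the same parameters at every step."""
    h = h0
    states, outputs = [], []
    for x in xs:
        h = math.tanh(U * x + W * h + B)  # hidden state from input and previous state
        y = V * h + C                     # output; O is taken as the identity here
        states.append(h)
        outputs.append(y)
    return states, outputs

# Made-up sequence: a single non-zero input followed by zeros.
states, outputs = rnn_forward([1.0, 0.0, 0.0], U=1.0, W=0.5, V=2.0, B=0.0, C=0.0)
print(states)
print(outputs)
```

Although the second and third inputs are zero, the hidden state stays non-zero because each step feeds the previous state back in, which is how the network carries information across time steps.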
NLP Techniques
NLP encompasses a wide array of techniques aimed at enabling computers to
process and understand human language. These techniques can be categorized into
several broad areas, each addressing different aspects of language processing. Here
are some of the key NLP techniques:
1. Text Processing and Preprocessing In NLP
• Tokenization: Dividing text into smaller units, such as words or sentences.
• Stemming and Lemmatization: Reducing words to their base or root forms.
• Stopword Removal: Removing common words (like “and”, “the”, “is”) that may
not carry significant meaning.
• Text Normalization: Standardizing text, including case normalization, removing
punctuation, and correcting spelling errors.
2. Syntax and Parsing In NLP
• Part-of-Speech (POS) Tagging: Assigning parts of speech to each word in a
sentence (e.g., noun, verb, adjective).
• Dependency Parsing: Analyzing the grammatical structure of a sentence to
identify relationships between words.
• Constituency Parsing: Breaking down a sentence into its constituent parts or
phrases (e.g., noun phrases, verb phrases).
3. Semantic Analysis
• Named Entity Recognition (NER): Identifying and classifying entities in text, such
as names of people, organizations, locations, dates, etc.
• Word Sense Disambiguation (WSD): Determining which meaning of a word is
used in a given context.
• Coreference Resolution: Identifying when different words refer to the same entity
in a text (e.g., “he” refers to “John”).
4. Information Extraction
• Entity Extraction: Identifying specific entities and their relationships within the
text.
• Relation Extraction: Identifying and categorizing the relationships between
entities in a text.
5. Text Classification in NLP
• Sentiment Analysis: Determining the sentiment or emotional tone expressed in a
text (e.g., positive, negative, neutral).
• Topic Modeling: Identifying topics or themes within a large collection of
documents.
• Spam Detection: Classifying text as spam or not spam.
6. Language Generation
• Machine Translation: Translating text from one language to another.
• Text Summarization: Producing a concise summary of a larger text.
• Text Generation: Automatically generating coherent and contextually relevant
text.
7. Speech Processing
• Speech Recognition: Converting spoken language into text.
• Text-to-Speech (TTS) Synthesis: Converting written text into spoken language.
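The text preprocessing steps listed under the first heading above (tokenization, normalization, stopword removal, stemming) can be sketched with the Python standard library. The stopword list and the suffix-stripping rule are deliberately tiny stand-ins chosen for this example; practical systems use full stopword lists and a proper stemmer or lemmatizer.

```python
import re

# Tiny illustrative stopword list, not a complete one.
STOPWORDS = {"and", "the", "is", "a", "an", "of", "to", "in"}

def tokenize(text):
    """Case-normalize the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def remove_stopwords(tokens):
    """Drop common words that carry little meaning on their own."""
    return [t for t in tokens if t not in STOPWORDS]

def crude_stem(token):
    """Strip a few common suffixes; a rough stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    return [crude_stem(t) for t in remove_stopwords(tokenize(text))]

print(preprocess("The cats and the dog jumped in the garden."))
```

For the sample sentence this yields the tokens cat, dog, jump, and garden: punctuation and case are normalized away, the stopwords are removed, and the remaining words are reduced to rough base forms.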
Deep Reinforcement Learning
Deep Reinforcement Learning (DRL) is a revolutionary Artificial Intelligence
methodology that combines reinforcement learning and deep neural networks. By
iteratively interacting with an environment and making choices that maximise
cumulative rewards, it enables agents to learn sophisticated strategies. Agents are able
to directly learn rules from sensory inputs thanks to DRL, which makes use of deep
learning’s ability to extract complex features from unstructured data.
Core Components of Deep Reinforcement Learning
The building blocks of Deep Reinforcement Learning (DRL) are the elements that drive
learning and allow agents to make good decisions in their environment. Effective
learning frameworks arise from the interaction of these elements.
The following are the essential elements:
• Agent: The decision-maker or learner who engages with the environment. The
agent acts in accordance with its policy and gains experience over time to
improve its ability to make decisions.
• Environment: The system outside of the agent that the agent interacts with. Based
on the actions the agent takes, it gives the agent feedback in the form of
rewards or penalties.
• State: A depiction of the current circumstance or environmental state at a
certain moment. The agent chooses its activities and makes decisions based on
the state.
• Action: A choice the agent makes that causes a change in the state of the
system. The policy of the agent guides the selection of actions.
• Reward: A scalar feedback signal from the environment that shows whether an
agent’s behaviour in a specific state is desirable. The agent is guided by
rewards to learn positive behaviour.
• Value Function: This function calculates the anticipated cumulative reward an
agent can obtain from a specific state while adhering to a specific policy. It is
beneficial in assessing and contrasting states and policies.
• Model: A depiction of the dynamics of the environment that enables the agent to
simulate potential results of actions and states. Models are useful for planning
and forecasting.
These core components collectively form the foundation of Deep Reinforcement
Learning, empowering agents to learn strategies, make intelligent decisions, and adapt
to dynamic environments.
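The interaction between agent, environment, state, action, reward, and value estimates can be sketched with a very small example. This sketch uses tabular Q-learning on a hypothetical five-state corridor, with a lookup table standing in for the deep neural network that DRL would use to estimate values; the environment, hyperparameters, and function names are all assumptions made for illustration.

```python
import random

N_STATES, GOAL = 5, 4   # hypothetical corridor: states 0..4, goal at state 4
ACTIONS = (0, 1)        # 0 = move left, 1 = move right

def step(state, action):
    """Environment: apply the action and return (next_state, reward, done)."""
    next_state = min(state + 1, GOAL) if action == 1 else max(state - 1, 0)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy policy: mostly exploit, sometimes explore; ties are random."""
    if rng.random() < epsilon or q[state][0] == q[state][1]:
        return rng.choice(ACTIONS)
    return 0 if q[state][0] > q[state][1] else 1

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # value estimate per (state, action)
    for _ in range(episodes):
        state, done, steps = 0, False, 0
        while not done and steps < 200:
            action = choose_action(q, state, epsilon, rng)
            next_state, reward, done = step(state, action)
            # Target: immediate reward plus discounted best future value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state, steps = next_state, steps + 1
    return q

q_table = train()
policy = [0 if q[0] > q[1] else 1 for q in q_table[:GOAL]]
print(policy)
```

After training, the greedy policy read from the table moves toward the goal from every non-goal state. In deep reinforcement learning the table is replaced by a neural network, so the same update can be applied to state spaces that are too large to enumerate.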