
21CS743 | DEEP LEARNING

Module-01

Introduction to Deep Learning, Machine Learning Basics

Chapter-01: Introduction to Deep Learning

➢ Deep learning, a subset of machine learning, has revolutionized various fields by enabling systems to learn and make decisions with minimal human intervention.
➢ At its core, deep learning leverages artificial neural networks with multiple layers (hence "deep") to model complex patterns in data.
➢ This introduction provides an overview of deep learning models, their architectures, applications, and significance in today's technological landscape.

What is Deep Learning?

❖ Deep learning involves training artificial neural networks, computational models inspired by the human brain, to recognize patterns and make decisions based on vast amounts of data.
❖ Unlike traditional machine learning, which may require feature engineering and manual intervention, deep learning models automatically discover representations and features from raw data, making them particularly effective for tasks like image and speech recognition.

Core Components of Deep Learning Models

1. Neural Networks: The foundational structure in deep learning, consisting of interconnected layers of nodes (neurons). Each neuron processes input data, applies a transformation, and passes the result to the next layer.

2. Layers:

o Input Layer: Receives the raw data.

o Hidden Layers: Intermediate layers where computations are performed. The "deep" in deep learning refers to the presence of multiple hidden layers.

o Output Layer: Produces the final prediction or classification.


3. Activation Functions: Non-linear functions (e.g., ReLU, Sigmoid, Tanh) applied to neuron outputs to introduce non-linearity, enabling the network to learn complex patterns.

4. Loss Function: Measures the difference between the model's predictions and the actual outcomes, guiding the optimization process.

5. Optimization Algorithms: Techniques (e.g., Stochastic Gradient Descent, Adam) used to adjust the network's weights to minimize the loss function.
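Taken together, these components map almost directly onto a few lines of framework code. Below is a minimal, illustrative sketch in PyTorch (one possible framework, not prescribed by this module); the layer sizes, random batch of data, and hyperparameters are assumptions chosen purely for demonstration.

```python
import torch
import torch.nn as nn

# A small fully connected network: input -> hidden -> hidden -> output
model = nn.Sequential(
    nn.Linear(20, 64),   # input layer -> first hidden layer
    nn.ReLU(),           # non-linear activation
    nn.Linear(64, 64),   # second hidden layer ("deep" = multiple hidden layers)
    nn.ReLU(),
    nn.Linear(64, 3),    # output layer: scores for 3 classes
)

loss_fn = nn.CrossEntropyLoss()                             # loss function
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)   # optimization algorithm

# One training step on a random, purely illustrative batch
x = torch.randn(32, 20)            # 32 examples, 20 features each
y = torch.randint(0, 3, (32,))     # 32 class labels in {0, 1, 2}

pred = model(x)                    # forward pass through the layers
loss = loss_fn(pred, y)            # compare predictions with targets
optimizer.zero_grad()
loss.backward()                    # backpropagation computes gradients
optimizer.step()                   # weights adjusted to reduce the loss
```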

Popular Deep Learning Architectures

1. Convolutional Neural Networks (CNNs):

o Purpose: Primarily used for image and video recognition.

o Key Features: Utilize convolutional layers to automatically and adaptively learn spatial hierarchies of features from input images.

o Applications: Image classification, object detection, facial recognition (a minimal CNN sketch follows this list).

2. Recurrent Neural Networks (RNNs):

o Purpose: Designed for sequential data processing.

o Key Features: Incorporate loops to maintain information across time steps, making them suitable for tasks where context is essential.

o Variants: Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks address issues like vanishing gradients.

o Applications: Language modeling, machine translation, speech recognition.

3. Transformer Models:

o Purpose: Handle sequential data without relying on recurrence.

o Key Features: Utilize self-attention mechanisms to weigh the importance of different parts of the input data.

o Applications: Natural language processing tasks like text generation, translation, and understanding (e.g., GPT, BERT).

4. Generative Adversarial Networks (GANs):

o Purpose: Generate new data samples that resemble a given dataset.

o Key Features: Consist of two networks, a generator and a discriminator, that compete against each other, improving the quality of generated data over time.

o Applications: Image generation, style transfer, data augmentation.

5. Autoencoders:

o Purpose: Learn efficient data encodings in an unsupervised manner.

o Key Features: Comprise an encoder that compresses the data and a decoder that reconstructs it.

o Applications: Dimensionality reduction, anomaly detection, denoising data.
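As a concrete illustration of the first architecture above, the following is a minimal CNN sketch in PyTorch. The input size (28×28 grayscale images), channel counts, and number of classes are assumptions made for the example, not values taken from the module.

```python
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    """Minimal CNN: convolutional layers learn spatial features, a linear layer classifies."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),  # 1 input channel (grayscale)
            nn.ReLU(),
            nn.MaxPool2d(2),                             # 28x28 -> 14x14
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                             # 14x14 -> 7x7
        )
        self.classifier = nn.Linear(32 * 7 * 7, num_classes)

    def forward(self, x):
        x = self.features(x)       # learned spatial hierarchy of features
        x = x.flatten(1)           # flatten for the output layer
        return self.classifier(x)  # class scores

images = torch.randn(8, 1, 28, 28)   # a batch of 8 illustrative 28x28 images
logits = TinyCNN()(images)           # shape: (8, 10)
```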


Applications of Deep Learning

Deep learning models have a wide array of applications across various industries:

• Healthcare: Medical image analysis, drug discovery, personalized treatment plans.

• Automotive: Autonomous driving, driver assistance systems.

• Finance: Fraud detection, algorithmic trading, risk management.

• Entertainment: Content recommendation, video game AI, music composition.

• Natural Language Processing: Chatbots, language translation, sentiment analysis.

• Robotics: Object manipulation, navigation, human-robot interaction.


Advantages of Deep Learning

• Automatic Feature Extraction: Eliminates the need for manual feature engineering, allowing models to learn directly from raw data.

• Scalability: Can handle large volumes of data and complex models with millions of parameters.

• Versatility: Applicable to diverse domains and tasks, from vision and speech to text and beyond.

• Performance: Achieves state-of-the-art results in many benchmark tasks, often surpassing human-level performance.

Challenges and Considerations


• Data Requirements: Deep learning models typically require vast amounts of labeled data, which can be costly and time-consuming to obtain.

• Computational Resources: Training deep models demands significant computational power, often necessitating specialized hardware like GPUs.

• Interpretability: Deep networks are often considered "black boxes," making it difficult to understand how decisions are made.

• Overfitting: Models can become too tailored to training data, reducing their ability to generalize to new, unseen data.

Future of Deep Learning

❖ As technology advances, deep learning continues to evolve with innovations in architectures, optimization techniques, and applications.
❖ Areas like unsupervised and self-supervised learning aim to reduce reliance on labeled data, while efforts in explainable AI seek to make models more transparent.
❖ Additionally, integrating deep learning with other AI fields, such as reinforcement learning and symbolic reasoning, holds promise for creating more robust and versatile intelligent systems.


Historical Trends in Deep Learning

Deep learning, a branch of machine learning, has experienced tremendous growth and
transformation over the decades.

While its core principles date back to the mid-20th century, it has undergone several stages of
advancement due to technological innovations, better algorithms, and increased computational
power. Below is a timeline highlighting key historical trends in deep learning:

1. Early Foundations (1940s–1960s)

The foundation for deep learning lies in early research on neural networks and the imitation of
human cognition in machines. Several key milestones shaped the beginnings of the field:

• 1943: McCulloch and Pitts: The concept of a neuron as a binary classifier was introduced by Warren McCulloch and Walter Pitts. They proposed a mathematical model of a neuron that laid the groundwork for later neural network research.

• 1958: Perceptron by Frank Rosenblatt: The perceptron was a simple neural network designed to perform binary classification tasks. It could learn by adjusting weights based on input-output relationships, similar to modern deep learning models. However, its limitations in handling non-linearly separable data, such as the XOR problem, restricted its capabilities.

• 1960s: Backpropagation Concept Introduced: Although it wasn't widely used until much later, the concept of backpropagation, the algorithm for training multilayer neural networks, was introduced by multiple researchers, including Bryson and Ho.

2. Dormant Period (1970s–1980s)



After initial interest, neural networks entered a period of decline, often called the "AI winter."
There was disappointment in the limitations of single-layer perceptrons, and other machine
learning methods, such as support vector machines (SVMs) and decision trees, gained traction.

• 1970s: The limitations of early neural networks, like the perceptron, led to reduced funding
and enthusiasm for the approach.


• 1980s: Interest was revived through theoretical work, and much of the groundwork for deep learning principles was laid during this period, though it would not be fully realized for decades.

3. The Reawakening of Neural Networks (1980s–1990s)

• 1986: Backpropagation Popularized: The backpropagation algorithm, rediscovered and popularized by Geoffrey Hinton, David Rumelhart, and Ronald J. Williams, enabled the training of multi-layer perceptrons, which overcame the limitations of single-layer models. This development reignited interest in neural networks and laid the groundwork for future deep learning models.

• 1989: Convolutional Neural Networks (CNNs) Introduced: Yann LeCun developed the first CNN, LeNet, designed for image classification tasks. LeNet was able to recognize handwritten digits and was used by banks to process checks, marking one of the earliest practical applications of deep learning.

• 1990s: Recurrent Neural Networks (RNNs): Researchers like Jürgen Schmidhuber and Sepp Hochreiter developed Long Short-Term Memory (LSTM) networks in 1997, solving the problem of vanishing gradients in standard RNNs and allowing neural networks to better handle sequential data.

4. Emergence of Deep Learning (2000s)



• 2006: Deep Belief Networks (DBNs): Geoffrey Hinton and his team proposed the idea of
using deep belief networks, a type of unsupervised deep neural network. This marked the
beginning of modern deep learning, where the goal was to train deeper neural networks
that could learn complex representations.

• 2007–2009: GPU Acceleration: The adoption of Graphics Processing Units (GPUs) for
deep learning computations drastically improved the ability to train deeper networks faster.
This technological breakthrough allowed for more practical training of neural networks
with multiple layers.


5. Breakthrough Era (2010s)

The 2010s are often referred to as the "Golden Age" of deep learning. With the combination of
better hardware (especially GPUs), large datasets, and advanced algorithms, deep learning
achieved state-of-the-art performance across various domains.

• 2012: AlexNet and ImageNet Competition: A deep CNN called AlexNet, developed by Alex Krizhevsky and Geoffrey Hinton, won the ImageNet Large Scale Visual Recognition Challenge by a large margin. This victory demonstrated the power of deep learning in image recognition and spurred widespread interest in the field.

• 2014:

o Generative Adversarial Networks (GANs): Introduced by Ian Goodfellow, GANs became one of the most revolutionary architectures in deep learning. GANs consist of two networks, a generator and a discriminator, that compete against each other, enabling the creation of highly realistic synthetic data.

o VGGNet and ResNet: VGGNet and ResNet were breakthroughs in CNN architectures that allowed for deeper networks to be trained without performance degradation. ResNet's introduction of skip connections solved the problem of vanishing gradients for very deep networks.

• 2017: Transformers and Attention Mechanisms:

o The introduction of the Transformer model by Vaswani et al. transformed the field of natural language processing (NLP). The Transformer, which uses self-attention mechanisms to process sequences in parallel, has since become the foundation of cutting-edge NLP models, including BERT and GPT.

• 2018–2019: Transfer Learning and Pre-trained Models: Large pre-trained models like
BERT (from Google) and GPT-2 (from OpenAI) demonstrated the power of transfer
learning, where a model pre-trained on massive datasets can be fine-tuned for specific tasks
with smaller datasets, drastically reducing training time and improving performance.


6. Modern Trends (2020s and Beyond)

The 2020s have seen deep learning evolve further, with a focus on more efficient models, ethical
AI practices, and novel applications.

• Transformer Dominance: The transformer architecture has become ubiquitous, particularly in NLP. Models like GPT-3 (2020) and ChatGPT have demonstrated unprecedented language generation abilities, paving the way for practical AI applications in content generation, summarization, and conversational AI.

• Deep Reinforcement Learning: Deep learning has been integrated with reinforcement learning to create AI agents capable of mastering complex environments. Breakthroughs like AlphaGo and AlphaZero (developed by DeepMind) demonstrate the potential of AI in learning strategies through trial and error in dynamic environments.

• Ethics and Interpretability: As deep learning models are increasingly deployed in real-world applications, attention has shifted toward ensuring fairness, reducing biases, and improving the interpretability of these "black box" models.

• Resource Efficiency: There has been a growing interest in optimizing deep learning models to make them more resource-efficient, addressing concerns about the environmental impact of training massive models. Techniques like pruning, quantization, and distillation aim to reduce the computational and energy demands of deep learning models.

Chapter-02: Machine Learning Basics

Machine learning allows computers to learn from data to improve their performance on certain
tasks. The main components of machine learning are the task (T), the performance measure (P),
and the experience (E). These three elements form the basis of any machine learning algorithm.

1. The Task (T)

The task in machine learning is the problem that we want the system to solve. It could be
recognizing images, predicting numbers, translating languages, or even detecting fraud. The task
doesn’t include learning itself but refers to the goal or action we want the machine to perform.

Some common tasks include:

• Classification: The algorithm assigns an input (like an image) into one of several categories. For example, identifying whether an image is of a cat or a dog is a classification task.

• Regression: The algorithm predicts a continuous value, like forecasting house prices or stock market trends.

• Transcription: The algorithm converts unstructured data into a structured format, such as recognizing text in images (optical character recognition) or converting speech into text.

• Machine Translation: Translating text from one language to another, like English to French.

• Anomaly Detection: Finding unusual patterns or behaviors, such as detecting fraud in transactions.

• Structured Output: Tasks where the output involves multiple values that are connected, such as generating captions for images.

• Synthesis and Sampling: The algorithm creates new data that is similar to the training data, like generating realistic images or audio.

• Imputation of Missing Values: Predicting missing data points based on the available information.

• Denoising: Cleaning up corrupted data by predicting what the original data was before it got corrupted.

• Density Estimation: Learning the probability distribution that explains how data points are spread out in the dataset.

2. The Performance Measure (P)

The performance measure tells us how well the machine learning algorithm is doing. It helps us compare the system's predictions with the actual results. Different tasks require different performance measures.

For example, in classification tasks, the performance measure might be accuracy, which tells us how many predictions were correct. Alternatively, we can measure the error rate, which counts how many predictions were wrong. In some cases, we may want a more detailed performance measure, such as giving partial credit for partially correct answers.

For tasks that don't involve predicting categories (like density estimation), accuracy isn't useful, so we use other performance measures, like log-probability.
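As a small illustration, accuracy and error rate can be computed directly from predictions and labels; the labels below are invented purely for the example.

```python
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 1])   # actual classes (illustrative)
y_pred = np.array([1, 0, 0, 1, 0, 1])   # model's predicted classes

accuracy = np.mean(y_pred == y_true)     # fraction of correct predictions
error_rate = 1.0 - accuracy              # fraction of wrong predictions

print(f"accuracy = {accuracy:.2f}, error rate = {error_rate:.2f}")
# accuracy = 0.83, error rate = 0.17
```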

3. The Experience (E)

The experience refers to the data that the algorithm learns from. There are different types of experiences:

• Supervised Learning: The system is trained using data that includes both input features and their corresponding outputs or labels. For example, training a model with labeled images of cats and dogs, so it learns to classify them.

• Unsupervised Learning: The system is trained using data without labels. It tries to find patterns or structure in the data, such as grouping similar data points together (clustering) or estimating the data distribution (density estimation).

• Semi-Supervised Learning: Some examples in the training data have labels, but others don't. This is useful when getting labeled data is difficult or expensive.

• Reinforcement Learning: The system learns by interacting with an environment and receiving feedback based on its actions. This approach is used in robotics and game playing, where the system gets rewards or penalties based on the decisions it makes.

Example: Linear Regression

To make the concept clearer, we can look at an example of a machine learning task called linear regression, which predicts a continuous value. In linear regression, the algorithm uses the input data (represented as a vector) to predict a value by calculating a linear combination of the input features.

For example, if you want to predict the price of a house based on its size and location, the algorithm might use a linear function to estimate the price. The output is calculated by multiplying the input features by their corresponding weights and summing them up.

The weights are the parameters that the algorithm adjusts during training. The goal is to find the weights that minimize the mean squared error (MSE), which measures how far off the predictions are from the actual values.
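A minimal NumPy sketch of this idea follows, using the closed-form least-squares (normal-equations) solution to minimize the MSE; the house-price data is invented purely for illustration.

```python
import numpy as np

# Illustrative training data: [size in sq. ft., location score] -> price
X = np.array([[1000, 3], [1500, 2], [2000, 4], [2500, 3]], dtype=float)
y = np.array([200_000, 250_000, 340_000, 400_000], dtype=float)

X = np.hstack([X, np.ones((X.shape[0], 1))])   # add a bias (intercept) column

# Least-squares weights, equivalent to the normal equations w = (X^T X)^(-1) X^T y
w, *_ = np.linalg.lstsq(X, y, rcond=None)

predictions = X @ w                            # weighted sum of input features
mse = np.mean((predictions - y) ** 2)          # mean squared error on training data
print(w, mse)
```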
Supervised Learning Algorithms

Supervised learning algorithms learn to map inputs (x) to outputs (y) using a training set. These outputs (labels) often have to be provided by a human, but they can also be collected automatically.
C
1. Probabilistic Supervised Learning

Most supervised learning algorithms estimate the probability of an output y given an input x, written p(y | x). This can be done using maximum likelihood estimation, which finds the best parameters θ for a distribution.

2. Logistic Regression


In linear regression, we predict continuous values using a normal distribution.

• For classification tasks (e.g., binary classification), we predict a class by squashing the output into a probability between 0 and 1 using the logistic sigmoid function σ(θᵀx).

• This technique is known as logistic regression. Despite its name, it is used for classification, not regression.


3. Finding Optimal Weights

• Linear regression allows us to compute optimal weights using a simple formula (normal
equations).

• Logistic regression does not have a closed-form solution. Instead, the optimal weights are
found by minimizing the negative log-likelihood (NLL) using gradient descent.
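The following is a minimal NumPy sketch of this approach: logistic-regression weights fitted by gradient descent on the negative log-likelihood. The synthetic data, learning rate, and iteration count are illustrative assumptions, not part of the module.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                  # 200 examples, 3 features (random, illustrative)
true_w = np.array([2.0, -1.0, 0.5])
y = (X @ true_w + rng.normal(scale=0.5, size=200) > 0).astype(float)  # noisy binary labels

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.zeros(3)                                # weights to be learned
lr = 0.1
for _ in range(500):
    p = sigmoid(X @ w)                         # predicted P(y = 1 | x)
    grad = X.T @ (p - y) / len(y)              # gradient of the negative log-likelihood
    w -= lr * grad                             # one gradient descent step

p = np.clip(sigmoid(X @ w), 1e-12, 1 - 1e-12)  # clip to keep the logs finite
nll = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
print(w, nll)
```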

4. k-Nearest Neighbours (k-NN)

• k-NN is a non-parametric algorithm used for classification or regression. It doesn't have a traditional training phase; instead, it stores all training data.

• At test time, it finds the k nearest neighbors of a test point and predicts the output by averaging their values (see the sketch below).

• For classification, it averages over one-hot encoded vectors to get a probability distribution over classes.

• Strength: k-NN can handle large datasets well and achieve high accuracy with enough training examples.

• Weakness: It struggles with small datasets and computational efficiency, especially with irrelevant features, as it treats all features equally.
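A from-scratch sketch of k-NN classification is shown below; the tiny two-feature dataset and the choice of Euclidean distance and k = 3 are assumptions made for illustration.

```python
import numpy as np

def knn_predict(X_train, y_train, x_query, k=3):
    """Classify one query point by majority vote among its k nearest training points."""
    dists = np.linalg.norm(X_train - x_query, axis=1)   # distance to every stored example
    nearest = np.argsort(dists)[:k]                     # indices of the k closest points
    votes = np.bincount(y_train[nearest])               # count class labels among neighbors
    return np.argmax(votes)

# Tiny illustrative dataset: two features, two classes
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.8]])
y_train = np.array([0, 0, 1, 1])

print(knn_predict(X_train, y_train, np.array([1.1, 0.9]), k=3))   # -> 0
```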

5. Decision Trees

• Decision Trees divide the input space into regions based on decisions made at each node of the tree. Internal nodes make binary decisions, and leaf nodes map each region to a constant output.

• Strength: They are easy to understand and interpret.

• Weakness: Decision trees may struggle with problems where decision boundaries aren't axis-aligned, requiring many nodes to approximate simple boundaries.
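A brief illustration using scikit-learn (a library choice assumed here, not prescribed by the module): the printed tree shows that every internal node tests a single feature against a threshold, which is why non-axis-aligned boundaries need many such splits.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Tiny invented dataset: each internal node will test one feature against a threshold
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [2, 2], [2, 3]])
y = np.array([0, 0, 0, 0, 1, 1])

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(export_text(tree, feature_names=["f0", "f1"]))   # axis-aligned split at each node
print(tree.predict([[2, 2], [0, 1]]))                  # -> [1 0]
```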


Unsupervised Learning Algorithms

Unsupervised learning algorithms deal with data that contains only features and no labeled targets.
They aim to extract meaningful patterns or structures from the data without human supervision,
and they are often used for tasks like clustering, density estimation, and learning data
representations.

1. Goals of Unsupervised Learning

The main goal in unsupervised learning is often to find the best representation of the data. A
good representation preserves the most important information about the data while simplifying it
or making it easier to work with.

2. Types of Representations
There are three common types of data representations:

• Low-Dimensional Representations: Compress the data into fewer dimensions while retaining as much information as possible.

• Sparse Representations: Map the data into a higher-dimensional space where most of the values are zero. This structure makes the representation more efficient and reduces redundancy.

• Independent Representations: Try to separate the underlying sources of variation in the data, making the features statistically independent.

3. Benefits of Good Representations

• Reducing the dimensionality of the data helps with compression and makes it easier to find
and use the key features.

• Sparse and independent representations make the data easier to interpret and process in
machine learning algorithms.


Principal Component Analysis (PCA)

Principal Component Analysis (PCA) is an unsupervised learning algorithm used for dimensionality reduction and data representation. It finds a lower-dimensional representation of the data while preserving as much information as possible.

1. Goals of PCA

PCA reduces the dimensionality of the data while ensuring that the new representation's features
are decorrelated (no linear correlations between the features). It is a step toward achieving
statistical independence of the features, though PCA only removes linear relationships.

2. How PCA Works


• Linear Transformation: PCA projects the data onto new axes that capture the directions of maximum variance in the data.

• The algorithm learns an orthogonal transformation that projects an input x to a new representation z = xᵀW, where W is a matrix of principal components (the directions of maximum variance).

• The first principal component explains the most variance in the data, and each subsequent component captures the remaining variance while being orthogonal to the previous ones.

3. Covariance and Dimensionality Reduction



• PCA transforms the data such that the covariance matrix of the new representation is diagonal, meaning the new features are uncorrelated.

• It uses eigenvectors of the data's covariance matrix or singular value decomposition (SVD) to find the directions of maximum variance.

• The result is a compact, decorrelated representation of the data that can be used for further analysis while minimizing information loss.
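A minimal NumPy sketch of PCA via the eigendecomposition of the covariance matrix is given below; the randomly mixed data is purely illustrative, and the same components could equally be obtained from the SVD of the centred data.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))            # 100 examples, 5 features (illustrative)
X = X @ rng.normal(size=(5, 5))          # mix the features so they become correlated

Xc = X - X.mean(axis=0)                  # centre the data first
cov = np.cov(Xc, rowvar=False)           # covariance matrix of the original features

eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvectors = directions of variance
order = np.argsort(eigvals)[::-1]        # sort by variance explained, largest first
W = eigvecs[:, order[:2]]                # keep the top-2 principal components

Z = Xc @ W                               # new representation z = x^T W (per example)
print(np.round(np.cov(Z, rowvar=False), 6))   # ~diagonal: new features are decorrelated
```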


k-Means Clustering

k-Means clustering is a simple and widely used unsupervised learning algorithm. It divides a
dataset into k clusters, grouping examples that are close to each other in the feature space. Each
data point is assigned to the nearest cluster, and the algorithm iteratively refines these clusters.

1. How k-Means Works

• The algorithm begins by initializing k centroids (cluster centers), which are assigned random values.

• Assignment Step: Each data point is assigned to the nearest centroid, forming clusters.

• Update Step: Each centroid is recalculated as the mean of the points assigned to it.

• This process repeats until the centroids no longer change significantly, signaling convergence (a minimal sketch follows).
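A from-scratch sketch of these two alternating steps is shown below; the two-cluster synthetic data is invented for illustration, and the sketch deliberately omits refinements such as empty-cluster handling and multiple restarts.

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Minimal k-means: alternate assignment and centroid-update steps."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]   # random initial centroids
    for _ in range(n_iters):
        # Assignment step: each point goes to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        # Update step: each centroid becomes the mean of its assigned points
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):   # convergence: centroids stop moving
            break
        centroids = new_centroids
    return centroids, labels

# Two well-separated blobs of points (illustrative)
X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + [6, 6]])
centroids, labels = kmeans(X, k=2)
print(centroids)
```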
C
2. One-Hot Representation

• k-means clustering provides a one-hot representation for each data point. If a point belongs to cluster i, its representation vector h has a 1 at position i and 0 everywhere else.

• This is an example of a sparse representation because only one element in the vector is non-zero for each point.

• However, this representation is limited because it treats clusters as mutually exclusive and doesn't capture relationships between different clusters.


3. Limitations of k-Means

• Ill-posed Problem: There is no single, definitive way to evaluate how well the clustering
reflects real-world structures. For example, clustering based on vehicle color (red vs. gray)
is as valid as clustering based on type (car vs. truck), but each reveals different information.

• Lack of Fine-Grained Similarity: k-means provides a strict one-hot output, which doesn't capture nuanced similarities between examples. For instance, it can't show that red cars are more similar to gray cars than to gray trucks.

4. Comparison with Distributed Representations

• In contrast to one-hot encoding, a distributed representation captures multiple attributes for each data point. For example, vehicles could be described by both color and type (e.g., car or truck), allowing for more detailed comparisons.

• Distributed representations are more flexible and can capture complex relationships between data points, reducing the burden on the algorithm to find a single attribute for clustering.