DL Unit 4

COURSE MATERIAL
SUBJECT DEEP LEARNING
UNIT 4
COURSE B.TECH
COMPUTER SCIENCE & ENGINEERING

DEPARTMENT (AI&ML)
SEMESTER 5
Version V-1
PREPARED / REVISED DATE 23.08.2023
1|D L - U N I T - I V
BTECH_CSE-SEM 31
TABLE OF CONTENTS – UNIT
3
S. NO. CONTENTS PAGE NO.
1 COURSE OBJECTIVES 1
2 PREREQUISITES 1
3 SYLLABUS 1
4 COURSE OUTCOMES 1
5 CO - PO/PSO MAPPING 1
6 LESSON PLAN 2
7 ACTIVITY BASED LEARNING 2
8 LECTURE NOTES 2
4.1 INTRODUCTION TO CONVOLUTIONAL NETWORKS 5
4.2 CONVOLUTIONAL OPERATION 5
4.3 POOLING 11
4.4 CONVOLUTION 13
4.5 BASIC CONVOLUTION FUNCTIONS 15
4.6 STRUCTURED OUTPUTS 19
4.7 DATA TYPES 21
4.8 EFFICIENT CONVOLUTION ALGORITHMS 23
4.9 RANDOM OR UNSUPERVISED FEATURES 23
4.10 BASIS FOR CONVOLUTIONAL NETWORKS 24
1. Course Objectives
The objectives of this course are:
1. To demonstrate the major technology trends driving Deep Learning.
2. To build, train and apply fully connected neural networks.
3. To implement efficient neural networks.
4. To analyze the key parameters and hyperparameters in a neural network's
architecture.
5. To apply concepts of Deep Learning to solve real-world problems.
2. Prerequisites
This course is intended for senior undergraduate and junior graduate students who
have a proper understanding of
• Python Programming Language
• Calculus
• Linear Algebra
• Probability Theory
Although it would be helpful, knowledge about classical machine learning is NOT
required.
3. Syllabus
UNIT 4
Introduction to Convolutional Networks: The convolution operation, Pooling,
Convolution, Basic convolution functions, Structured outputs, Data types, Efficient
convolution algorithms, Random or unsupervised features, Basis for convolutional
networks.
4. Course outcomes
1. Demonstrate the mathematical foundation of neural networks.
2. Describe the machine learning basics.
3. Differentiate the architectures of deep neural networks.
4. Build a convolutional neural network.
5. Build and train RNNs and LSTMs.
5. Co-PO / PSO Mapping

Machine
PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 P10 PO11 PO12 PSO1 PSO2
Tools
CO1 3 2
CO2 3 2
CO3 3 3 2 2 3 2 2
CO4 3 3 2 2 3 2 2
CO5
6. Lesson Plan
Lecture No. Weeks Topics to be covered References
1 Introduction to Convolutional networks: T1, R1
2 The convolution operation, Pooling T1, R1

1
3 Convolution T1, R1
4 Basic convolution functions T1, R1
5 Structured outputs T1, R1
6 Data types T1, R1

2
7 Efficient convolution algorithms T1, R1
8 Random or unsupervised features T1, R1
9 Random or unsupervised features T1, R1
10 Random or unsupervised features T1, R1

3
11 Basis for convolutional networks T1, R1
12 Basis for convolutional networks T1, R1
7. Activity Based Learning
1. The DL course is associated with a laboratory. Different open-ended problem
statements are given to each student to carry out experiments using the Google
Colab tool. You will learn the foundations of Deep Learning, understand how to build
neural networks, and learn how to lead successful machine learning projects. You will
learn about convolutional networks, RNNs, LSTMs, etc.
2. You will work on case studies from healthcare, autonomous driving, sign language
reading, music generation, and natural language processing. You will master not only
the theory, but also see how it is applied in industry.
8. Lecture Notes
4.1 INTRODUCTION TO CONVOLUTIONAL NETWORK

Introduction:
A convolutional neural network (CNN) is a neural network that has one or more
convolutional layers. CNNs are used mainly for image processing, classification and
segmentation, and also for other autocorrelated data.
A convolution is essentially sliding a filter over the input. One helpful way to think
about convolutions is this quote from Dr Prasad Samarakoon: "A convolution can be
thought as 'looking at a function's surroundings to make better/accurate predictions
of its outcome.'"
Rather than looking at an entire image at once to find certain features, it can be more
effective to look at smaller portions of the image.
Common uses for CNNs
The most common use for CNNs is image classification, for example identifying
satellite images that contain roads or classifying handwritten letters and digits. There
are other quite mainstream tasks, such as image segmentation and signal processing,
at which CNNs perform well.
CNNs have been used for language understanding in Natural Language Processing (NLP) and
speech recognition, although Recurrent Neural Nets (RNNs) are often used for NLP.
A CNN can also be implemented as a U-Net architecture, which is essentially two
almost mirrored CNNs, resulting in a network whose architecture can be drawn in a U
shape. U-Nets are used where the output needs to be of similar size to the input, such
as in segmentation and image improvement.
4.2. CONVOLUTIONAL OPERATION:
The convolution operates on the input with a kernel (weights) to produce
an output map given by:
Continuous domain convolution:
$$(f * g)(t) = \int_{-\infty}^{\infty} f(\tau)\, g(t - \tau)\, d\tau$$
Let us break down the formula.
The steps involved are:
• Express each function in terms of a dummy variable τ.
• Reflect the function g, i.e. g(τ) → g(-τ).
• Add a time offset, i.e. g(-τ) → g(t-τ). Adding the offset shifts the reflected
function to the right by t units (by convention, a negative offset shifts it to the left).
• Multiply f and g point-wise and accumulate the results to get the output at instant
t. Basically, we are calculating the area of overlap between f and the shifted g.
For our application, we are interested in the discrete domain formulation:
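1-D discrete convolution:
$$(f * g)[n] = \sum_{m=-\infty}^{\infty} f[m]\, g[n - m]$$
2-D discrete convolution:
$$(f * g)[i, j] = \sum_{m=-\infty}^{\infty} \sum_{n=-\infty}^{\infty} f[m, n]\, g[i - m, j - n]$$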
When the kernel is not flipped in its domain, we obtain the cross-
correlation operation. The basic difference between the two operations is that
convolution is commutative in nature, i.e. f and g can be interchanged without
changing the output. Cross-correlation is not commutative. This difference is
highlighted in the image below:
Although these equations imply that the domains of both f and g are infinite, in
practice these two functions are non-zero only in a finite region. As a result, the
output is non-zero only in a finite region (where the non-zero regions
of f and g overlap).
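The difference between convolution and cross-correlation can be checked numerically. The following is a minimal sketch using NumPy; the arrays f and g are made-up example values, not taken from the notes.

```python
import numpy as np

f = np.array([1.0, 2.0, 3.0, 4.0])   # example signal (assumed values)
g = np.array([1.0, 0.0, -1.0])       # example kernel (assumed values)

# Convolution flips the kernel; swapping the arguments gives the same result.
conv_fg = np.convolve(f, g)
conv_gf = np.convolve(g, f)

# Cross-correlation does not flip the kernel; swapping the arguments
# reverses the output instead of leaving it unchanged.
corr_fg = np.correlate(f, g, mode="full")
corr_gf = np.correlate(g, f, mode="full")

print(conv_fg, conv_gf)
print(corr_fg, corr_gf)
```

Both outputs have length 4 + 3 - 1 = 6, which is the finite region where the non-zero parts of f and g overlap.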
The intuition for convolution in 1-D can be extended to n-dimensions by nesting the
convolution operations. Vincent Dumoulin and Francesco Visin provide an in depth
analysis of how input and output shapes and computations are tied. Below is their
visualization of a 2-D convolution operation:
Fig: 2D convolution (source)
The 1D convolution operation can be represented as a matrix-vector product. The
kernel matrix is obtained by composing the weights into a Toeplitz matrix.
A Toeplitz matrix has the property that values along all diagonals are constant.
The general structure of a Toeplitz matrix
Using the Toeplitz matrix of the kernel for matrix-vector implementation of convolution
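As an illustration of the matrix-vector view, here is a small sketch that builds the Toeplitz matrix for a 1D kernel and compares the product with NumPy's direct convolution. The helper name conv_matrix and the example values are assumptions made for this sketch.

```python
import numpy as np

def conv_matrix(kernel, n):
    """Toeplitz matrix T such that T @ x equals the full convolution of x with kernel."""
    k = len(kernel)
    T = np.zeros((n + k - 1, n))
    for col in range(n):
        # Each column holds the kernel shifted down by one more row,
        # so entry T[i, j] = kernel[i - j] is constant along every diagonal.
        T[col:col + k, col] = kernel
    return T

x = np.array([1.0, 2.0, 3.0, 4.0])   # example input (assumed values)
w = np.array([1.0, 0.0, -1.0])       # example kernel (assumed values)
T = conv_matrix(w, len(x))

print(T)
print(T @ x)
```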
To extend this principle to 2D input, we first need to unroll the 2D input into a 1D
vector. Once this is done, the kernel needs to be modified as before but this time
resulting in a block-circulant matrix. What’s that?
Fig: A general circulant matrix
A circulant matrix is a special case of a Toeplitz matrix where each row is a circular
shift of the previous row. To see that it is a special case of the Toeplitz matrix is trivial.
Fig: A block circulant matrix (each Hi is a matrix)

A matrix which is circulant with respect to its sub-matrices is called a block circulant
matrix. If each of the submatrices is itself circulant, the matrix is called doubly block-
circulant matrix.
Now, given a 2D kernel, we can create the block-circulant matrix that will allow a
matrix-vector implementation of convolution as below:
Convince yourself that convolving a 3x3 kernel with a 4x4 input (a 16x1
unrolled vector) results in a 2x2 output (a 4x1 vector) [refer to gif above], and hence
the required kernel matrix must be of shape 4x16.
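The 4x16 shape can be verified with a short sketch. The helper name sliding_window_matrix is hypothetical, and the matrix below applies the kernel without flipping it (the cross-correlation form); passing kernel[::-1, ::-1] instead would give the flipped, true convolution.

```python
import numpy as np

def sliding_window_matrix(kernel, in_h, in_w):
    """Matrix M such that M @ image.ravel() gives the unrolled 'valid' output."""
    k_h, k_w = kernel.shape
    out_h, out_w = in_h - k_h + 1, in_w - k_w + 1
    M = np.zeros((out_h * out_w, in_h * in_w))
    for i in range(out_h):
        for j in range(out_w):
            row = i * out_w + j                      # index of output pixel (i, j)
            for a in range(k_h):
                for b in range(k_w):
                    # input pixel (i + a, j + b) in the unrolled 1D vector
                    M[row, (i + a) * in_w + (j + b)] = kernel[a, b]
    return M

kernel = np.arange(1.0, 10.0).reshape(3, 3)          # example 3x3 kernel
image = np.arange(16.0).reshape(4, 4)                # example 4x4 input
M = sliding_window_matrix(kernel, 4, 4)
out = (M @ image.ravel()).reshape(2, 2)

# Reference result computed by sliding the kernel directly over the image.
direct = np.array([[np.sum(image[i:i + 3, j:j + 3] * kernel) for j in range(2)]
                   for i in range(2)])
print(M.shape, out)
```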
4.3. POOLING:
Pooling is nothing other than downsampling of an image. The most common pooling
layer filter is of size 2x2 (applied with a stride of 2), which discards three-fourths of the
activations. The role of the pooling layer is to reduce the resolution of the feature map
while retaining the features of the map required for classification, giving some
translational and rotational invariance. In addition to robustness to spatial variation,
pooling reduces the computation cost by a great deal.
Backpropagation is used to train networks containing pooling operations: gradients are
passed back through the pooling layer. The smaller feature maps again help the processor
to process things faster.
There are many pooling techniques. They are as follows:
i) Max pooling, where we take the largest of the pixel values of a segment.
ii) Mean pooling, where we take the mean of the pixel values of a segment.
iii) Avg pooling, where we take the average of the pixel values of a segment (another
name for mean pooling).
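A minimal sketch of 2x2 max and average pooling with non-overlapping windows is given below. The function name pool2d and the input values are assumptions for illustration only.

```python
import numpy as np

def pool2d(x, size=2, mode="max"):
    """Non-overlapping pooling over size x size windows of a 2D array."""
    h, w = x.shape
    h, w = h - h % size, w - w % size          # drop rows/cols that do not fill a window
    # blocks[i, a, j, b] == x[size * i + a, size * j + b]
    blocks = x[:h, :w].reshape(h // size, size, w // size, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    return blocks.mean(axis=(1, 3))

x = np.array([[1., 2., 5., 6.],
              [3., 4., 7., 8.],
              [9., 10., 13., 14.],
              [11., 12., 15., 16.]])
max_out = pool2d(x, mode="max")
avg_out = pool2d(x, mode="avg")
print(max_out)
print(avg_out)
```

The 4x4 input has 16 activations and each pooled output has 4, so three-fourths of the activations are discarded.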
As cross-validation is expensive for big networks, over-fitting in a modern
neural network is remedied through two routes:
• Reducing the number of parameters by representing the model more
effectively.
• Regularization.
So the dominant architecture in recent times for image classification
is the convolutional neural network, where the number of parameters is reduced
effectively through convolution in the initial layers, with fully connected
layers at the very end of the network.
Usually, regularization is performed through data augmentation, dropout or batch
normalization. Most of these regularization techniques are difficult to implement
in convolutional layers. So, alternatively, this responsibility can be carried by the
pooling layers in a convolutional neural network.
There are three variants of the pooling operation, depending on the route of regularization
technique:
Stochastic pooling:
A randomly picked activation within each pooling region is used, rather than a
deterministic pooling operation, to regularize the network. Stochastic pooling
reduces the feature size but gives up the role of selecting features judiciously for
the sake of regularization, although the clipping of negative outputs by the ReLU activation
helps to carry some of the selection responsibility.
Overlapping pooling:
The overlapping pooling operation shares responsibility for local connections beyond the
size of the previous convolutional filter, which breaks the orthogonal responsibility between
the pooling layer and the convolutional layer. So, no information is gained if pooling windows
overlap.
Fractional pooling:
Reduction ratio of filter size due to pooling can be controlled by a fractional pooling
concept, which helps to increase the depth of the network. Unlike stochastic pooling,
the randomness is related to the choice of pooling regions, not the way pooling is
performed inside each of the pooling regions.
There are other variants of pooling as follows:
- Min pooling
- wavelet pooling
- tree pooling
- max-avg pooling
- spatial pyramid pooling
Pooling makes the network more robust to small translations and to small changes in
shape, size and scale. Max pooling is the variant predominantly used in object recognition.
4.4. CONVOLUTION:
Convolution is an orderly procedure where two sources of information are
intertwined; it's an operation that changes a function into something else.
Convolutions have been used for a long time, typically in image processing to blur and
sharpen images, but also to perform other operations (e.g. enhance edges and
emboss). CNNs enforce a local connectivity pattern between neurons of
adjacent layers.
CNNs make use of filters (also known as kernels), to detect what features, such as
edges, are present throughout an image.
There are four main operations in a CNN:
• Convolution
• Non-Linearity (ReLU)
• Pooling or Sub-Sampling
• Classification (Fully Connected Layer)
The first layer of a Convolutional Neural Network is always a Convolutional
Layer. Convolutional layers apply a convolution operation to the input, passing the
result to the next layer. A convolution converts all the pixels in its receptive field into
a single value.
For example, if you apply a convolution to an image (without padding), you decrease
the image size as well as bring all the information in the field together into a
single pixel. The final output of the convolutional layer is a feature map, which can be
flattened into a vector. Based on the type of problem we need to solve and on the kind
of features we are looking to learn, we can use different kinds of convolutions.
The 2D Convolution Layer

The most common type of convolution is the 2D convolution layer, which is
usually abbreviated as conv2D. A filter or kernel in a conv2D layer "slides" over the
2D input data, performing an elementwise multiplication at each position and summing
the results into a single output pixel. The kernel performs the same operation for
every location it slides over, transforming a 2D matrix of features into a different 2D
matrix of features.
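The sliding, multiply-and-sum behaviour can be sketched in a few lines. This is an illustrative loop-based version, not how deep learning libraries implement conv2D; the function name conv2d, the toy image and the edge kernel are assumptions. As in most libraries, the kernel is not flipped.

```python
import numpy as np

def conv2d(image, kernel):
    """Slide the kernel over the image; multiply elementwise and sum at each position."""
    k_h, k_w = kernel.shape
    out_h = image.shape[0] - k_h + 1
    out_w = image.shape[1] - k_w + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + k_h, j:j + k_w] * kernel)
    return out

image = np.zeros((5, 5))
image[:, 2:] = 1.0                                   # dark left half, bright right half
vertical_edge = np.array([[-1.0, 0.0, 1.0]] * 3)     # responds to left-to-right brightening
feature_map = conv2d(image, vertical_edge)
print(feature_map)
```

The feature map is large where the window straddles the dark-to-bright boundary and zero where the window is uniformly bright.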
The Dilated or Atrous Convolution
This operation expands the window size without increasing the number of weights by
inserting zero values into the convolution kernels. Dilated or atrous convolutions can be
used in real-time applications and in applications with limited processing power,
as the RAM requirements are less intensive.
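The zero-insertion idea can be sketched as follows. The helper name dilate_kernel and the parameter name rate are assumptions for this example; real libraries usually skip input positions rather than materializing the zeros.

```python
import numpy as np

def dilate_kernel(kernel, rate):
    """Insert (rate - 1) zeros between neighbouring kernel weights."""
    k_h, k_w = kernel.shape
    out = np.zeros(((k_h - 1) * rate + 1, (k_w - 1) * rate + 1))
    out[::rate, ::rate] = kernel
    return out

kernel = np.arange(1.0, 10.0).reshape(3, 3)   # example 3x3 kernel with 9 weights
dilated = dilate_kernel(kernel, rate=2)
print(dilated)
```

With a rate of 2, the 3x3 kernel covers a 5x5 window while still holding only 9 non-zero weights.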
Separable Convolutions
There are two main types of separable convolutions: spatial separable convolutions
and depthwise separable convolutions.
The spatial separable convolution deals primarily with the spatial dimensions of an
image and kernel: the width and the height. Unlike spatial separable
convolutions, depthwise separable convolutions work with kernels that cannot be
"factored" into two smaller kernels. As a result, they are more frequently used.
Transposed Convolutions
These types of convolutions are also known as deconvolutions or fractionally strided
convolutions. A transposed convolutional layer carries out a regular convolution but
reverts its spatial transformation.
4.5. BASIC CONVOLUTION FUNCTIONS:
In practical implementations of the convolution operation, certain modifications are
made which deviate from the discrete convolution formula mentioned above:
In general, a convolution layer consists of the application of several different kernels to the
input. This allows the extraction of several different features at all locations in the input.
This means that in each layer, not just a single kernel (filter) is applied. Multiple kernels
(filters), usually a power of 2 in number, are used as different feature detectors.
The input is generally not real-valued but instead vector-valued (e.g. RGB values at
each pixel or the feature values computed by the previous layer at each pixel
position). Multi-channel convolutions are commutative only if the number of output and
input channels is the same.
In order to allow for calculation of features at a coarser level, strided convolutions
can be used. The effect of strided convolution is the same as that of a
convolution followed by a downsampling stage. This can be used to reduce the representation
size.
Fig: 2D convolution 3x3 kernel and stride of 2 units (source)
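The equivalence between a strided convolution and a unit-stride convolution followed by downsampling can be checked with a small sketch. The function name conv2d, the random input and the 7x7 size are assumptions chosen for illustration; the kernel is applied without flipping.

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    """'Valid' sliding-window convolution that moves the kernel by `stride` pixels."""
    k_h, k_w = kernel.shape
    out_h = (image.shape[0] - k_h) // stride + 1
    out_w = (image.shape[1] - k_w) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            out[i, j] = np.sum(image[r:r + k_h, c:c + k_w] * kernel)
    return out

rng = np.random.default_rng(0)
image = rng.random((7, 7))
kernel = rng.random((3, 3))

strided = conv2d(image, kernel, stride=2)               # computed directly with stride 2
downsampled = conv2d(image, kernel, stride=1)[::2, ::2]  # stride 1, then keep every 2nd pixel
print(strided.shape, downsampled.shape)
```

The strided version is cheaper because it never computes the outputs that the downsampling step would throw away.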
Zero padding helps to make output dimensions and kernel size independent.
Three common zero padding strategies are:
i) valid: The output is computed only at places where the entire kernel lies inside the
input. Essentially, no zero padding is performed. For a kernel of size k in any dimension,
an input of size m in that dimension becomes m-k+1 in the output. This shrinkage
restricts architecture depth.
ii) same: The input is zero padded such that the spatial size of the input and output
is the same. Essentially, for a dimension where the kernel size is k, the input is padded by
k-1 zeros in total in that dimension. Since the number of output units connected to border
pixels is less than that for centre pixels, it may under-represent border pixels.
iii) full: The input is padded by enough zeros such that each input pixel is
connected to the same number of output units, giving an output of size m+k-1.
In terms of test set accuracy, the optimal padding is
usually somewhere between same and valid.
Fig: valid (left), same (middle) and full (right) padding (source). The extreme left one is for
stride=2.
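The three output sizes can be confirmed in one dimension with NumPy, whose np.convolve function happens to use the same three mode names. The sizes m = 8 and k = 3 are arbitrary example values.

```python
import numpy as np

m, k = 8, 3                        # input size and kernel size (example values)
x = np.arange(1.0, m + 1)
w = np.ones(k)

# Output length for each padding strategy.
sizes = {mode: len(np.convolve(x, w, mode=mode)) for mode in ("valid", "same", "full")}
print(sizes)
```

The lengths follow the formulas above: m-k+1 for valid, m for same and m+k-1 for full.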
Besides locally-connected layers and tiled convolution, another extension can be to
restrict the kernels to operate on certain input channels. One way to implement this
is to connect the first m input channels to the first n output channels, the next m
input channels to the next n output channels and so on. This method decreases the
number of parameters in the model without decreasing the number of output units.
When the max pooling operation is applied to a locally connected layer or tiled
convolution, the model has the ability to become transformation invariant because
adjacent filters have the freedom to learn a transformed version of the same
feature. This is essentially similar to the property leveraged by pooling over channels
rather than spatially.
Bias terms can be used in different ways in the convolution stage. For locally
connected layer and tiled convolution, we can use a bias per output unit and kernel
respectively. In case of traditional convolution, a single bias term per output channel
is used. If the input size is fixed, a bias per output unit may be used to counter the
effect of regional image statistics and smaller activations at the boundary due to zero
padding.
4.6. STRUCTURED OUTPUTS:
Convolutional networks can be trained to output a high-dimensional structured output
rather than just a classification score. A good example is the task of image
segmentation, where each pixel needs to be associated with an object class. Here the
output is the same size (spatially) as the input. The model outputs a
tensor S where S[i,j,k] is the probability that pixel (j,k) belongs to class i.
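To make the layout of S concrete, here is a small sketch that turns per-pixel class scores into probabilities with a softmax over the class axis. The function name pixelwise_softmax is hypothetical, and the random scores only stand in for the output of a real network.

```python
import numpy as np

def pixelwise_softmax(scores):
    """scores has shape (classes, height, width); softmax is taken over the class axis."""
    shifted = scores - scores.max(axis=0, keepdims=True)   # subtract max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=0, keepdims=True)

rng = np.random.default_rng(0)
scores = rng.normal(size=(3, 4, 5))    # 3 classes, 4x5 image (stand-in for network output)
S = pixelwise_softmax(scores)          # S[i, j, k]: probability that pixel (j, k) is class i
labels = S.argmax(axis=0)              # most likely class for each pixel
print(S.shape, labels.shape)
```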
To produce an output map of the same size as the input map, only same-padded
convolutions can be stacked. Alternatively, a coarser segmentation map
can be obtained by allowing the output map to shrink spatially.
The output of the first labelling stage can be refined successively by another
convolutional model. If the models use tied parameters, this gives rise to a type
of recursive model as shown below. (H¹, H², H³ share parameters)
Fig: Recursive refinement of the segmentation map
The output can be further processed under the assumption that contiguous regions
of pixels will tend to belong to the same label. Graphical models can describe this
relationship. Alternatively, CNNs can learn to optimize the graphical model's training
objective.
Another model that has gained popularity for segmentation tasks (especially in the
medical imaging community) is the U-Net. The up-convolution mentioned is just a
direct upsampling by repetition followed by a convolution with same padding.
Fig: U-Net architecture for medical image segmentation (source)
4.7. DATA TYPES
The data used with a convolutional network usually consist of several channels, each
channel being the observation of a different quantity at some point in space or time.
One advantage to convolutional networks is that they can also process inputs with
varying spatial extents.
When the output is accordingly variable sized, no extra design change needs to be
made. If however the output is fixed sized, as in the classification task, a pooling stage
with kernel size proportional to the input size needs to be used.
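One simple way to get a fixed-size output from inputs of varying size is to let the pooling window cover the whole feature map (global pooling), which is the extreme case of a window that scales with the input. The sketch below assumes this choice; the function name global_average_pool and the shapes are made up for illustration.

```python
import numpy as np

def global_average_pool(feature_maps):
    """Average each channel over its full spatial extent: (C, H, W) -> (C,)."""
    return feature_maps.mean(axis=(1, 2))

small = np.ones((8, 10, 10))     # 8 channels from a small input image
large = np.ones((8, 37, 52))     # 8 channels from a larger input image

pooled_small = global_average_pool(small)
pooled_large = global_average_pool(large)
print(pooled_small.shape, pooled_large.shape)
```

Both inputs yield a vector of length 8, so the same classification layer can follow regardless of the input size.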
Fig: Different data types based on the number of spatial dimensions and channels
4.8. EFFICIENT CONVOLUTION ALGORITHMS:
In some problem settings, performing convolution as pointwise multiplication in the
frequency domain can provide a speed-up compared to direct computation. This
is a result of the following property of convolution:
Convolution in the source domain is multiplication in the frequency domain.
$$\mathcal{F}\{f * g\} = \mathcal{F}\{f\} \cdot \mathcal{F}\{g\}$$
F is the transformation operation (the Fourier transform).
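This property can be checked numerically with NumPy's FFT routines. The sketch below uses random example signals and zero-pads both to the length of the full convolution so that the circular convolution computed by the FFT matches the linear one.

```python
import numpy as np

rng = np.random.default_rng(0)
f = rng.random(16)               # example signal
g = rng.random(5)                # example kernel
n = len(f) + len(g) - 1          # length of the full linear convolution

# Transform both (zero-padded to length n), multiply pointwise, transform back.
via_fft = np.fft.irfft(np.fft.rfft(f, n) * np.fft.rfft(g, n), n)
direct = np.convolve(f, g)
print(np.max(np.abs(via_fft - direct)))
```

The speed-up only appears for large kernels; for the small kernels common in CNNs, direct computation is often faster.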
When a d-dimensional kernel can be broken into the outer product of d vectors, the
kernel is said to be separable. The corresponding convolution operations are more
efficient when implemented as d 1-dimensional convolutions rather than a direct d-
dimensional convolution. Note however, it may not always be possible to express a
kernel as an outer product of lower dimensional kernels.
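As a sketch of the separable case, the kernel below is built as the outer product of two 1D vectors (a Sobel-style example chosen for illustration), and the 2D convolution is compared with two successive 1D convolutions. The helper name conv2d_full is an assumption.

```python
import numpy as np

def conv2d_full(x, k):
    """Direct 2D full convolution: out[i, j] = sum over a, b of k[a, b] * x[i - a, j - b]."""
    out = np.zeros((x.shape[0] + k.shape[0] - 1, x.shape[1] + k.shape[1] - 1))
    for a in range(k.shape[0]):
        for b in range(k.shape[1]):
            out[a:a + x.shape[0], b:b + x.shape[1]] += k[a, b] * x
    return out

u = np.array([1.0, 2.0, 1.0])     # vertical smoothing vector
v = np.array([1.0, 0.0, -1.0])    # horizontal difference vector
kernel = np.outer(u, v)           # separable 3x3 kernel (rank 1)

x = np.random.default_rng(0).random((6, 6))
direct = conv2d_full(x, kernel)

# Two 1D passes: convolve every row with v, then every column with u.
rows = np.apply_along_axis(lambda r: np.convolve(r, v), 1, x)
separable = np.apply_along_axis(lambda c: np.convolve(c, u), 0, rows)
print(np.max(np.abs(direct - separable)))
```

For a 3x3 kernel the direct version uses 9 multiplications per output pixel, while the two 1D passes use 3 + 3 = 6.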
This is not to be confused with depthwise separable convolution. That method restricts
convolution kernels to operate on only one input
channel at a time, followed by 1x1 convolutions on all channels of the intermediate
output.
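The saving in weights can be seen with simple arithmetic. The sketch below counts weights (ignoring biases) for an assumed example layer with a 3x3 kernel, 64 input channels and 128 output channels; the function names are hypothetical.

```python
def standard_conv_params(k, c_in, c_out):
    """Every output channel has its own k x k kernel for every input channel."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """One k x k kernel per input channel, then a 1x1 convolution mixing channels."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise

standard = standard_conv_params(3, 64, 128)
separable = depthwise_separable_params(3, 64, 128)
print(standard, separable, standard / separable)
```

For this example layer the depthwise separable version needs roughly 8 times fewer weights.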
Devising faster ways of performing convolution or approximate convolution without
harming the accuracy of the model is an active area of research.
4.9. RANDOM OR UNSUPERVISED FEATURES
To reduce the computational cost of training the CNN, we can use features not
learned by supervised training.
Random initialization has been shown to create filters that are frequency selective
and translation invariant. This can be used to inexpensively select the model
architecture. Randomly initialize several CNN architectures and just train the last
classification layer. Once a winner is determined, that model can be fully trained in a
supervised manner.
Hand designed kernels may be used; e.g. to detect edges at different orientations
and intensities.
Unsupervised training of kernels may be performed; e.g. applying k-means clustering
to image patches and using the centroids as convolutional kernels. Unsupervised
pre-training may offer a regularization effect (not well established). It may also allow for
training of larger CNNs because of the reduced computation cost.
Another approach for CNN training is greedy layer-wise pretraining, most notably
used in the convolutional deep belief network. For example, in the case of multi-layer
perceptrons, starting with the first layer, each layer is trained in isolation. Once the
first layer is trained, its output is stored and used as input for training the next layer,
and so on.
4.10. BASIS FOR CONVOLUTIONAL NETWORKS

Hubel and Wiesel studied the activity of neurons in a cat’s brain in response to visual
stimuli. Their work characterized many aspects of brain function.
In a simplified view, we have:
The light entering the eye stimulates the retina. The image then passes through the
optic nerve and a region of the brain called the LGN (lateral geniculate nucleus).
V1 (primary visual cortex):
The image produced on the retina is transported to the V1 with minimal processing.
The properties of V1 that have been replicated in CNNs are:
• The V1 response is localized spatially, i.e. the upper image stimulates the cells in
the upper region of V1 [localized kernel].
• V1 has simple cells whose activity is a linear function of the input in a small
neighbourhood [convolution].
• V1 has complex cells whose activity is invariant to shifts in the position of the
feature [pooling] as well as some changes in lighting which cannot be
captured by spatial pooling [cross-channel pooling].
There are several stages of V1-like operations [stacking convolutional layers].
In the medial temporal lobe, we find grandmother cells. These cells respond to
specific concepts and are invariant to several transforms of the input. In the medial
temporal lobe, researchers also found neurons spiking on a particular concept, e.g.
the Halle Berry neuron fires when looking at a photo/drawing of Halle Berry or even
reading the text Halle Berry. Of course, there are neurons which spike at other
concepts like Bill Clinton, Jennifer Aniston, etc.
The medial temporal neurons are more generic than CNN in that they respond even
to specific ideas. A closer match to the function of the last layers of a CNN is the IT
(inferotemporal cortex). When viewing an object, information flows from the retina,
through LGN, V1, V2, V4 and reaches IT. This happens within 100ms. When a person
continues to look at an object, the brain sends top-down feedback signals to affect
lower level activation.
Some of the major differences between the human visual system (HVS) and the CNN
model are:
• The human eye is low resolution except in a region called the fovea.
Essentially, the eye does not receive the whole image at high resolution but
stitches several patches together through eye movements called saccades.
This attention-based gazing of the input image is an active research problem.
Note: attention mechanisms have been shown to work on natural language tasks.
• The HVS integrates several senses, while CNNs are only visual.
• The HVS processes rich 3D information, and can also determine relations between
objects. CNNs for such tasks are in their early stages.
• The feedback from higher levels to V1 has not been incorporated into CNNs with
substantial improvement.
• While the CNN can capture firing rates in the IT, the similarity between intermediate
computations is not established. The brain probably uses different activation and
pooling functions. Even the linearity of filter response is doubtful, as recent models for
V1 involve quadratic filters.
Neuroscience tells us very little about the training procedure. Backpropagation, which
is a standard training mechanism today, is not inspired by neuroscience and is
sometimes considered biologically implausible.
Fig: The heatmap of a 2D Gabor filter (source)
In order to determine the filter parameters used by neurons, a process called
reverse correlation is used. The neuron activations are measured by an electrode
when viewing several white noise images, and a linear model is used to approximate
this behaviour. It has been shown experimentally that the weights of the fitted model
of V1 neurons are described by Gabor functions. If we go by the simplified version of
the HVS, if the simple cells detect Gabor-like features, then complex cells learn a
function of simple cell outputs which is invariant to certain translations and
magnitude changes.
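A Gabor function is a Gaussian envelope multiplied by a cosine wave. The sketch below generates one such filter using a common parameterization (Gaussian widths, frequency, phase and orientation); the function name gabor, the parameter names and the default values are assumptions for illustration, not values fitted to neurons.

```python
import numpy as np

def gabor(size=21, alpha=1.0, beta_x=0.02, beta_y=0.02, freq=0.5, phase=0.0, theta=0.0):
    """2D Gabor filter: Gaussian envelope times a cosine along the rotated x axis."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate the coordinate system by the orientation angle theta.
    x_r = x * np.cos(theta) + y * np.sin(theta)
    y_r = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-beta_x * x_r**2 - beta_y * y_r**2)
    return alpha * envelope * np.cos(freq * x_r + phase)

vertical = gabor(theta=0.0)            # stripes vary along the horizontal axis
diagonal = gabor(theta=np.pi / 4)      # same filter rotated by 45 degrees
print(vertical.shape, vertical[10, 10])
```

Changing theta rotates the stripes, freq changes their spacing, and beta_x and beta_y control how quickly the response fades away from the centre.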
A wide variety of statistical learning algorithms (from unsupervised (sparse coding) to
deep learning (first layer features)) learn features with Gabor-like functions when
applied to natural images. This goes to show that while no algorithm can be touted
as the right method based on Gabor-like feature detectors, a lack of such features
may be taken as a bad sign.
Fig: (Left) Gabor functions with different values of the parameters that control the
coordinate system. (Middle) Weights learned by an unsupervised learning algorithm.
(Right) Convolution kernels learned by the first layer of a fully supervised
convolutional maxout network.
9. Practice Quiz
1. Supervised learning and unsupervised clustering both require at least one
a) hidden attribute
b) output attribute
c) input attribute
d) categorical attribute
2. When did deep learning start?
a) 1989
b) 1943
c) 1978
d) 1962
3. Computer systems are designed by
a) simplifying requirements of system
b) breaking of the system into smaller self-contained co-operating
subsystems
c) breaking up the systems into independent parts
d) modular design
4. Who is the father of deep learning?
a) Ilya Sutskever
b) Frank Rosenblatt
c) David Rumelhart
d) none
5. How many layers are deep learning algorithms constructed with?
a) 3
b) 1
c) 4
d) 7
6. Which of the following is a subset of machine learning?
a) SciPy
b) NumPy
c) deep learning
d) none
7. First layer of deep learning
a) hidden layer
b) outer layer
c) none
d) inner layer
8. RNN stands for
a) report neural networks
b) recurrent neural networks
c) receives neural networks
d) recording neural networks
9. Which of the following is/are common uses of RNNs?
A. Businesses: Help securities traders to generate analytic reports
B. Detect fraudulent credit-card transactions
C. Provide a caption for images
D. All of the above
10. Which of the following is well suited for perceptual tasks?
A. Feed-forward neural networks
B. Recurrent neural networks
C. Convolutional neural networks
D. Reinforcement Learning
11. CNN is mostly used when there is
A. structured data
B. unstructured data
C. Both A and B
D. None of the above
12. Which neural network has only one hidden layer between the input and output?
A. Shallow neural network
B. Deep neural network
C. Feed-forward neural networks
D. Recurrent neural networks
13. Which of the following is/are Limitations of deep learning?
A. Data labeling
B. Obtain huge training datasets
C. Both A and B
D. None of the above
14. Deep learning algorithms are _____ more accurate than machine
learning algorithms in image classification.
A. 33%
B. 37%
C. 40%
D. 41%
