
CNN


A Convolutional Neural Network (CNN) is a type of deep learning neural network architecture
commonly used in computer vision. Computer vision is a field of artificial intelligence that enables a
computer to understand and interpret images and other visual data.

A Convolutional Neural Network (CNN) is an extension of the artificial neural network (ANN) that
is predominantly used to extract features from grid-like data, for example visual datasets such as
images or videos, where patterns in the data play a major role.

CNN Architecture

A Convolutional Neural Network consists of multiple layers: the input layer, convolutional layers,
pooling layers, and fully connected layers.

The convolutional layer applies filters to the input image to extract features, the pooling layer
downsamples the image to reduce computation, and the fully connected layer makes the final
prediction. The network learns the optimal filters through backpropagation and gradient descent.

How Convolutional Layers Work

Convolutional neural networks, or ConvNets, are neural networks that share their parameters. Imagine
you have an image. It can be represented as a cuboid with a length and width (the dimensions of the
image) and a height (the channels, as images generally have red, green, and blue channels).

Now imagine taking a small patch of this image and running a small neural network, called a filter or
kernel, on it, with, say, K outputs, and representing those outputs vertically. Now slide that neural
network across the whole image. As a result, we get another image with a different width, height, and
depth. Instead of just R, G, and B channels, we now have more channels but a smaller width and height.
This operation is called convolution. If the patch size is the same as that of the image, it is a regular
neural network. Because the patch is small and the same weights are reused at every position, we have
fewer weights.
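The saving in weights can be checked with a little arithmetic. The sketch below is only an illustration: the 32 x 32 x 3 image, the 5 x 5 patch, and K = 12 outputs are assumed sizes rather than values from the text above, the helper names are hypothetical, and bias terms are ignored.

```python
def dense_weights(height, width, channels, outputs):
    """Weights needed when every input value connects to every output."""
    return height * width * channels * outputs


def conv_weights(patch, channels, outputs):
    """Weights for `outputs` filters of size patch x patch x channels.

    The same filter weights are reused at every image position.
    """
    return patch * patch * channels * outputs


# Assumed sizes for illustration: a 32 x 32 RGB image and K = 12 outputs.
full = dense_weights(32, 32, 3, 12)  # patch as large as the whole image
small = conv_weights(5, 3, 12)       # 5 x 5 patch slid across the image

print("regular network weights:", full)
print("convolution weights:", small)
```

When the patch is as large as the image, `conv_weights(32, 3, 12)` gives the same count as `dense_weights(32, 32, 3, 12)`, which matches the remark that such a layer is a regular neural network.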
Image source: Deep Learning Udacity

Mathematical Overview of Convolution

Now let’s talk about the mathematics involved in the convolution process.

• Convolutional layers consist of a set of learnable filters (or kernels) with a small width and
height and the same depth as the input volume (3 if the input layer is an image input).

• For example, suppose we run a convolution on an image with dimensions 34 x 34 x 3. The possible
filter sizes are a x a x 3, where ‘a’ can be a value such as 3, 5, or 7, but is small compared to
the image dimension.

• During the forward pass, we slide each filter across the whole input volume step by step. The
step size is called the stride (often 1 or 2, and occasionally larger, such as 3 or 4, for
high-dimensional images). At each step we compute the dot product between the kernel weights
and the patch of the input volume.

• As we slide the filters, we get a 2-D output for each filter. Stacking these outputs together
gives an output volume with a depth equal to the number of filters. The network learns all
the filters.
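The forward pass described in the points above can be sketched in plain Python. This is a minimal, unoptimized illustration and not a reference implementation: it assumes no zero padding, a square filter, and nested lists indexed as [row][column][channel], and the names `conv_output_size` and `conv2d` are hypothetical helpers introduced here.

```python
def conv_output_size(n, f, stride=1, padding=0):
    """Output width or height for input size n and filter size f."""
    return (n + 2 * padding - f) // stride + 1


def conv2d(image, filters, stride=1):
    """Slide each filter over the image and take dot products (no padding).

    image:   nested list indexed [row][col][channel]
    filters: list of filters, each indexed [row][col][channel]
    returns: nested list indexed [row][col][filter]
    """
    height, width, depth = len(image), len(image[0]), len(image[0][0])
    f = len(filters[0])
    out_h = conv_output_size(height, f, stride)
    out_w = conv_output_size(width, f, stride)
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            values = []
            for kernel in filters:
                # Dot product between the kernel and the current image patch.
                total = 0.0
                for di in range(f):
                    for dj in range(f):
                        for c in range(depth):
                            pixel = image[i * stride + di][j * stride + dj][c]
                            total += kernel[di][dj][c] * pixel
                values.append(total)
            row.append(values)
        output.append(row)
    return output


# A 34 x 34 x 3 input filled with ones and two 3 x 3 x 3 filters (toy values).
image = [[[1.0] * 3 for _ in range(34)] for _ in range(34)]
ones_filter = [[[1.0] * 3 for _ in range(3)] for _ in range(3)]
zero_filter = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]

out = conv2d(image, [ones_filter, zero_filter], stride=1)
print(len(out), len(out[0]), len(out[0][0]))
```

With two filters the output depth is 2, and each output value is the sum of 3 x 3 x 3 = 27 products. Strictly speaking, this sliding dot product is cross-correlation, which is what deep learning libraries usually compute under the name convolution.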

Layers Used to Build ConvNets

A complete convolutional neural network architecture is also known as a ConvNet. A ConvNet is a
sequence of layers, and every layer transforms one volume into another through a differentiable
function.
Types of layers:
Let’s take an example by running a ConvNet on an image of dimension 32 x 32 x 3.

• Input Layer: This is the layer through which we give input to our model. In a CNN, the input is
generally an image or a sequence of images. This layer holds the raw input of the image with
width 32, height 32, and depth 3.

• Convolutional Layers: This is the layer used to extract features from the input dataset. It
applies a set of learnable filters, known as kernels, to the input images. The filters/kernels
are small matrices, usually of shape 2×2, 3×3, or 5×5. Each filter slides over the input image
data and computes the dot product between the kernel weights and the corresponding input
image patch. The output of this layer is referred to as feature maps. Suppose we use a total of
12 filters for this layer. We then get an output volume of dimension 32 x 32 x 12 (assuming
stride 1 and zero padding that preserves the width and height).

• Activation Layer: By applying an activation function to the output of the preceding layer,
activation layers add nonlinearity to the network. The layer applies an element-wise activation
function to the output of the convolution layer. Some common activation functions are ReLU:
max(0, x), Tanh, Leaky ReLU, etc. The volume remains unchanged, hence the output volume will
have dimensions 32 x 32 x 12.

• Pooling Layer: This layer is periodically inserted in the ConvNet. Its main function is to
reduce the size of the volume, which makes the computation faster, reduces memory use, and
also helps reduce overfitting. Two common types of pooling layers are max pooling and average
pooling. If we use a max pool with 2 x 2 filters and stride 2, the resultant volume will be of
dimension 16 x 16 x 12.

Image source: cs231n.stanford.edu

• Flattening: The resulting feature maps are flattened into a one-dimensional vector after the
convolution and pooling layers so they can be passed into a fully connected layer for
classification or regression.

• Fully Connected Layers: These take the input from the previous layer and compute the final
classification or regression output.

Image source: cs231n.stanford.edu

• Output Layer: For classification tasks, the output from the fully connected layers is then fed
into a logistic function such as sigmoid or softmax, which converts the output for each class
into a probability score for that class.
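The walk through the layers above can be traced in a short plain-Python sketch. It is a simplified illustration under stated assumptions: the convolution is assumed to keep width and height (stride 1 with zero padding), `num_classes=10` is a hypothetical value not given in the text, and the helper names are introduced here only for the example.

```python
import math


def relu(x):
    """ReLU activation: max(0, x)."""
    return max(0.0, x)


def max_pool_2x2(feature_map):
    """2 x 2 max pooling with stride 2 on one 2-D feature map (even sizes)."""
    return [
        [max(feature_map[i][j], feature_map[i][j + 1],
             feature_map[i + 1][j], feature_map[i + 1][j + 1])
         for j in range(0, len(feature_map[0]), 2)]
        for i in range(0, len(feature_map), 2)
    ]


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the max avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def trace_shapes(height=32, width=32, channels=3, num_filters=12,
                 num_classes=10):
    """Follow the volume shape through the layers described above."""
    shapes = [("input", (height, width, channels))]
    # Assumption: stride 1 and zero padding, so width and height are kept.
    shapes.append(("convolution", (height, width, num_filters)))
    shapes.append(("activation", (height, width, num_filters)))
    height, width = height // 2, width // 2  # 2 x 2 max pool, stride 2
    shapes.append(("pooling", (height, width, num_filters)))
    shapes.append(("flatten", (height * width * num_filters,)))
    shapes.append(("fully connected", (num_classes,)))
    shapes.append(("output", (num_classes,)))
    return shapes


for name, shape in trace_shapes():
    print(name, shape)

# Toy 4 x 4 feature map: activation first, then pooling.
feature_map = [[1, -2, 3, 0], [4, 5, -6, 7], [0, 1, -2, -3], [-1, -2, -3, -4]]
activated = [[relu(v) for v in row] for row in feature_map]
pooled = max_pool_2x2(activated)

# Toy class scores turned into probabilities.
probs = softmax([2.0, 1.0, 0.1])
print(pooled, probs)
```

The traced shapes follow the example in the text: 32 x 32 x 3 at the input, 32 x 32 x 12 after convolution and activation, 16 x 16 x 12 after pooling, and a vector of 16 x 16 x 12 = 3072 values after flattening.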
