
Images, Neural Networks, CNNs


Images and convolutional neural networks

By: DIVAKAR KESHRI


PhD NIT TRICHY

Computer vision

Computer vision = giving computers the ability to understand visual information
Examples:
○ A robot that can move around obstacles by
analysing the input of its camera(s)
○ A computer system finding images of cats
among millions of images on the Internet

From picture to pixels

An image has to be digitized for computer processing

It is turned into millions of “pixel” elements

0.49411765 0.49411765 0.4745098 0.49019608 0.4745098

0.49411765 0.49411765 0.5058824 0.49411765 0.49803922

0.49803922 0.49411765 0.4862745 0.47058824 0.49411765

0.5019608 0.49803922 0.49803922 0.49019608 0.50980395

0.50980395 0.5058824 0.52156866 0.50980395 0.5058824

Picture source: https://pixabay.com/en/kitty-cat-kid-cat-domestic-cat-2948404/


Each pixel is a set of numbers quantifying the color of that element
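The grid values shown on this slide look like 8-bit intensities divided by 255 (for example, 126 / 255 ≈ 0.49411765). That reading is an assumption, not something the slide states. A minimal sketch of that scaling, using hypothetical raw values:

```python
# Hypothetical 8-bit intensities (0..255) for one row of pixels.
# Assumption: the slide's floats come from dividing such values by 255.
raw_row = [126, 126, 121, 125, 121]


def normalize(row, max_value=255):
    """Scale integer intensities to floats in the range [0, 1]."""
    return [value / max_value for value in row]


scaled = normalize(raw_row)
print([round(v, 8) for v in scaled])
```

With these inputs the rounded values line up with the first row of numbers on the slide.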
From pixels to … understanding?

0.49411765 0.49411765 0.4745098 0.49019608 0.4745098
0.49411765 0.49411765 0.5058824 0.49411765 0.49803922
0.49803922 0.49411765 0.4862745 0.47058824 0.49411765
0.5019608 0.49803922 0.49803922 0.49019608 0.50980395
0.50980395 0.5058824 0.52156866 0.50980395 0.5058824

“There’s a cat among some flowers in the grass”

● This is easy for humans


● But for AI it’s actually one of the harder problems!
● How do you transform that grid of numbers into understanding… or even something useful?
Image understanding
• Humans are so good at vision that it’s not even considered intelligence
Convolutional neural networks
Convolutional neural network (CNN, ConvNet)
● Dense or fully-connected: each neuron connected
to all neurons in previous layer
● CNN: only connected to a small “local” set of
neurons
● Radically reduces the number of network connections
Figure: dense layer vs. convolutional layer
Convolution for image data
Figure: 3✕3 image area, 3✕3 weights (conv. kernel), output neuron
● Image represented as 2D grid of values
● Each output neuron connected to a small 2D area in the image
● Output value = weighted sum of inputs
● Idea: nearby pixels are related ⇒ we can learn local relationships of pixels
Image source: https://mlnotebook.github.io/post/CNN1/
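The "weighted sum of inputs" for a single output neuron can be written out directly. This is a minimal sketch: the patch values and the kernel weights below are made up for illustration, and in a trained network the weights would be learned.

```python
def weighted_sum(patch, kernel):
    """Multiply a patch and a kernel element-wise and add up the products."""
    return sum(
        p * w
        for patch_row, kernel_row in zip(patch, kernel)
        for p, w in zip(patch_row, kernel_row)
    )


# Hypothetical 3x3 image area and 3x3 weights (a simple vertical-edge kernel).
patch = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0, -1],
          [1, 0, -1],
          [1, 0, -1]]

output_value = weighted_sum(patch, kernel)
print(output_value)  # (1-3) + (4-6) + (7-9) = -6
```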
Convolution for image data
Figure: image input, 3✕3 weights (conv. kernel), feature map
● We repeat for each output neuron
● Weights stay the same (shared weights)
● Border effect: without padding, the output area is smaller
● Outputs form a “feature map”
Image source: https://mlnotebook.github.io/post/CNN1/
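Repeating the weighted sum at every position with the same kernel gives a feature map. The sketch below assumes stride 1 and no padding, and it computes what deep learning libraries call convolution (strictly speaking cross-correlation, since the kernel is not flipped). The image and kernel are hypothetical.

```python
def conv2d_valid(image, kernel):
    """Slide one kernel over a 2D image without padding (stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1   # border effect: output is smaller
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # The same kernel weights are reused at every position.
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        feature_map.append(row)
    return feature_map


# Hypothetical 5x5 image: dark on the left (0), bright on the right (1).
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hypothetical 3x3 kernel that responds to dark-to-bright vertical edges.
kernel = [[-1, 0, 1] for _ in range(3)]

feature_map = conv2d_valid(image, kernel)
for row in feature_map:
    print(row)
```

A 5✕5 input with a 3✕3 kernel yields a 3✕3 feature map, and the response is strongest where the window covers the edge.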
A real example

Image from: http://cs.nyu.edu/~fergus/tutorials/deep_learning_cvpr12/fergus_dl_tutorial_final.pptx


Side note: color images
● Example: 256 ✕ 256 color image with 3 color channels (red, green, and blue)
⇒ single image is a 3D tensor: 256 ✕ 256 ✕ 3
● Example: 5 ✕ 5 convolution is actually also a 3D tensor: 5 ✕ 5 ✕ 3
● Slides over width and height, but covers the full color depth
Convolution for image data
Figure: image 256✕256✕3, K kernels each 5✕5(✕3), K feature maps each 252✕252✕1
● We can repeat for different sets of weights (kernels)
● Each learns a different “feature”
● Typically: edges, corners, etc.
● Each outputs a feature map
Convolution for image data
Figure: image 256✕256✕3, K kernels each 5✕5(✕3), output tensor 252✕252✕K
● We stack the feature maps into a single tensor
● Depth of output tensor = number of kernels K
● Tensor is the output of the entire convolutional layer
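To make the shapes concrete, here is a rough plain-Python sketch of a whole convolutional layer: K kernels, each covering the full color depth, with the K results stacked along the last axis. It assumes stride 1, no padding and no bias. The 8✕8✕3 image and the constant kernels are hypothetical stand-ins for the 256✕256✕3 image and learned kernels on the slide.

```python
def conv_layer(image, kernels):
    """Apply K kernels (each kh x kw x C) to an H x W x C image, no padding.

    Returns a nested list of shape (H - kh + 1) x (W - kw + 1) x K.
    """
    height, width, channels = len(image), len(image[0]), len(image[0][0])
    kh, kw = len(kernels[0]), len(kernels[0][0])
    output = []
    for i in range(height - kh + 1):
        out_row = []
        for j in range(width - kw + 1):
            # One value per kernel: these K values form the output depth.
            values = []
            for kernel in kernels:
                total = 0.0
                for di in range(kh):
                    for dj in range(kw):
                        for c in range(channels):
                            total += image[i + di][j + dj][c] * kernel[di][dj][c]
                values.append(total)
            out_row.append(values)
        output.append(out_row)
    return output


def constant_kernel(value, size=5, channels=3):
    """Hypothetical size x size x channels kernel filled with one value."""
    return [[[value] * channels for _ in range(size)] for _ in range(size)]


image = [[[1.0, 1.0, 1.0] for _ in range(8)] for _ in range(8)]  # 8 x 8 x 3
kernels = [constant_kernel(1.0), constant_kernel(0.5)]           # K = 2

out = conv_layer(image, kernels)
shape = (len(out), len(out[0]), len(out[0][0]))
print(shape)  # 8 - 5 + 1 = 4 per side, depth K = 2
```

The same arithmetic gives the slide's numbers: 256 - 5 + 1 = 252, so the output tensor is 252✕252✕K.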
Convolution in layers: intuition
● We can then add another convolutional layer
● This operates on the previous layer’s output tensor (feature maps)
● Features layered from simple to more complex
Figure: network output “cat”
Figure: learned low-level features → learned mid-level features → learned high-level features → learned classifier → “cat”

Image from lecture by Yann LeCun, original from Zeiler & Fergus (2013)
Image datasets
• Color image mini-batches are 4D tensors: width ✕ height ✕ color channels ✕ samples
• Plenty of big datasets for training exist, e.g., ImageNet with 1.2 million images in 1000 classes
• Data augmentation for small datasets: generate more training data by transforming existing data
• E.g., shifting, rotation, cropping, scaling, adding noise, etc.
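A few of the augmentation transforms mentioned above can be sketched on a tiny grayscale image stored as a nested list. The helper names are hypothetical and chosen for illustration. Real pipelines usually rely on library routines and pick transform parameters at random.

```python
def flip_horizontal(image):
    """Mirror the image left to right."""
    return [list(reversed(row)) for row in image]


def shift_right(image, pixels, fill=0):
    """Shift every row right by `pixels`, filling the gap with `fill`."""
    width = len(image[0])
    return [([fill] * pixels + row)[:width] for row in image]


def crop(image, top, left, height, width):
    """Cut out a height x width window starting at (top, left)."""
    return [row[left:left + width] for row in image[top:top + height]]


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]

# Each transformed copy is an extra training sample with the same label.
augmented = [
    flip_horizontal(image),
    shift_right(image, 1),
    crop(image, 0, 0, 2, 2),
]
for sample in augmented:
    print(sample)
```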
Convolutional layers
• Input: tensor of size N × Wi × Hi × Ci
• Hyperparameters:
• K: number of filters
• w, h: kernel size
• padding: how to handle image borders
• activation function
• Output: tensor of size N × Wo × Ho × K
• In tf.keras:
layers.Conv2D(filters, kernel_size, padding=..., activation=...)

(there are also Conv1D and Conv3D)
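How the hyperparameters determine the output size and the number of trainable weights can be worked out with a little arithmetic. The two helper functions below are hypothetical (they are not part of tf.keras) and assume stride 1, with "valid" meaning no padding and "same" meaning the spatial size is preserved.

```python
def conv2d_output_shape(n, w_in, h_in, filters, kernel_w, kernel_h,
                        padding="valid"):
    """Output shape N x Wo x Ho x K of a stride-1 convolutional layer."""
    if padding == "valid":      # no padding: borders shrink the output
        w_out = w_in - kernel_w + 1
        h_out = h_in - kernel_h + 1
    elif padding == "same":     # zero padding keeps the spatial size
        w_out, h_out = w_in, h_in
    else:
        raise ValueError(f"unknown padding: {padding!r}")
    return (n, w_out, h_out, filters)


def conv2d_param_count(c_in, filters, kernel_w, kernel_h, use_bias=True):
    """Trainable parameters: one w x h x Ci kernel (plus a bias) per filter."""
    weights = kernel_w * kernel_h * c_in * filters
    return weights + (filters if use_bias else 0)


# Example: mini-batch of 32 color images, 256 x 256 x 3, K = 16 filters of 5 x 5.
print(conv2d_output_shape(32, 256, 256, 16, 5, 5, padding="valid"))
print(conv2d_output_shape(32, 256, 256, 16, 5, 5, padding="same"))
print(conv2d_param_count(3, 16, 5, 5))
```

Note that the parameter count does not depend on the image size, only on the kernel size, the input depth and the number of filters.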
Pooling layers

• Used to reduce the spatial resolution
• independently on each channel
• reduce complexity and number of parameters
• MAX operator most common
• sometimes also AVERAGE
• In tf.keras:
layers.MaxPooling2D(pool_size)
layers.AveragePooling2D(pool_size)
Image from http://cs231n.github.io/convolutional-networks/
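As a rough illustration of pooling on one channel, the sketch below splits a feature map into non-overlapping windows and keeps either the maximum or the average of each. It assumes the stride equals the pool size and that leftover rows or columns are dropped. The function names and the input values are hypothetical.

```python
def pool2d(feature_map, pool_size=2, op=max):
    """Non-overlapping pooling over one channel; drops leftover rows/columns."""
    pooled = []
    for i in range(0, len(feature_map) - pool_size + 1, pool_size):
        row = []
        for j in range(0, len(feature_map[0]) - pool_size + 1, pool_size):
            window = [
                feature_map[i + di][j + dj]
                for di in range(pool_size)
                for dj in range(pool_size)
            ]
            row.append(op(window))
        pooled.append(row)
    return pooled


def average(values):
    return sum(values) / len(values)


feature_map = [[1, 3, 2, 4],
               [5, 6, 7, 8],
               [3, 2, 1, 0],
               [1, 2, 3, 4]]

max_pooled = pool2d(feature_map)              # MAX operator
avg_pooled = pool2d(feature_map, op=average)  # AVERAGE operator
print(max_pooled)  # [[6, 8], [3, 4]]
print(avg_pooled)  # [[3.75, 5.25], [2.0, 2.0]]
```

A 4✕4 feature map becomes 2✕2, so each spatial dimension is halved.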
Other layers
• Flatten
• flattens the input into a vector
(typically before dense layers)
• Dropout
• works the same way as with dense layers
• In tf.keras:
layers.Flatten()
layers.Dropout(rate)
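What these two layers do to the data can be sketched in plain Python. The functions below are illustrative stand-ins, not the tf.keras implementations. The dropout shown is the common "inverted" variant used during training, which rescales the surviving values by 1 / (1 - rate).

```python
import random


def flatten(tensor):
    """Recursively flatten a nested list into one flat list (a vector)."""
    if not isinstance(tensor, list):
        return [tensor]
    flat = []
    for item in tensor:
        flat.extend(flatten(item))
    return flat


def dropout(values, rate, rng):
    """Zero each value with probability `rate`; scale the rest by 1/(1-rate)."""
    keep = 1.0 - rate
    return [v / keep if rng.random() >= rate else 0.0 for v in values]


# Hypothetical 2 x 2 x 2 stack of feature maps.
feature_maps = [[[1, 2], [3, 4]],
                [[5, 6], [7, 8]]]

vector = flatten(feature_maps)
dropped = dropout(vector, rate=0.5, rng=random.Random(0))
print(vector)   # [1, 2, 3, 4, 5, 6, 7, 8]
print(dropped)  # about half the entries zeroed, the rest doubled
```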
Typical architecture

1. Input layer = image pixels
2. Convolution
3. ReLU
4. Pooling
(repeat steps 2–4 one or more times)
5. One or more fully connected layers (+ReLU)
6. Final fully connected layer to get to the number of classes we want
7. Softmax to get probability distribution over classes
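The seven steps map fairly directly onto tf.keras layers. The following is a sketch under stated assumptions, not a recommended architecture: the input size (32✕32✕3), the filter counts, the dense layer width and the number of classes are all arbitrary choices made for illustration. TensorFlow is imported inside the function so that the plain-Python softmax helper, included to show what step 7 computes, works without it.

```python
import math


def softmax(logits):
    """Turn raw class scores into a probability distribution."""
    shifted = [x - max(logits) for x in logits]   # for numerical stability
    exps = [math.exp(x) for x in shifted]
    total = sum(exps)
    return [e / total for e in exps]


def build_cnn(input_shape=(32, 32, 3), num_classes=10):
    """Build a small CNN following the seven steps listed above."""
    # Imported lazily: calling this function requires TensorFlow.
    from tensorflow import keras
    from tensorflow.keras import layers

    return keras.Sequential([
        keras.Input(shape=input_shape),             # 1. input = image pixels
        layers.Conv2D(32, (3, 3), padding="same"),  # 2. convolution
        layers.Activation("relu"),                  # 3. ReLU
        layers.MaxPooling2D((2, 2)),                # 4. pooling
        layers.Conv2D(64, (3, 3), padding="same"),  # steps 2-4 repeated
        layers.Activation("relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),        # 5. fully connected + ReLU
        layers.Dense(num_classes),                  # 6. one score per class
        layers.Softmax(),                           # 7. probabilities
    ])


probabilities = softmax([1.0, 2.0, 3.0])
print(probabilities)
```

With TensorFlow installed, `build_cnn().summary()` prints the layer-by-layer output shapes and parameter counts.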
CNN architectures and applications

AlexNet
VGG
Inception / GoogLeNet
ResNet
DenseNet
Large-scale CNNs with pre-trained weights
Figure: extracted features, retrain, replace output layer
• For many applications, an existing CNN can be re-used instead of training a new model from scratch: extract features from a suitable layer, or retrain the top layers with new data
• Keras contains several models trained with ImageNet:
• Xception, VGG16, VGG19, ResNet50, InceptionV3,
InceptionResNetV2, MobileNet, DenseNet, NASNet
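One way to re-use such a model is to freeze the pre-trained convolutional part and train only a new output layer. The sketch below uses VGG16 from `keras.applications`. The choice of VGG16, the 224✕224✕3 input size and the pooling layer before the new classifier are assumptions made for illustration. Calling the function needs TensorFlow and downloads the ImageNet weights on first use. Input preprocessing (for example `keras.applications.vgg16.preprocess_input`) is left out for brevity.

```python
def build_transfer_model(num_classes, input_shape=(224, 224, 3)):
    """Pre-trained VGG16 features with a new, trainable output layer."""
    # Imported lazily: calling this function requires TensorFlow.
    from tensorflow import keras
    from tensorflow.keras import layers

    # VGG16 without its original 1000-class classifier on top.
    base = keras.applications.VGG16(
        weights="imagenet", include_top=False, input_shape=input_shape
    )
    base.trainable = False  # keep the pre-trained weights fixed

    inputs = keras.Input(shape=input_shape)
    features = base(inputs, training=False)             # extracted features
    pooled = layers.GlobalAveragePooling2D()(features)
    outputs = layers.Dense(num_classes, activation="softmax")(pooled)  # new output layer
    return keras.Model(inputs, outputs)
```

To retrain the top layers as well, one option is to set `base.trainable = True` for the last few layers after the new output layer has been trained, typically with a small learning rate.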
Computer vision applications

Image credit: Li Fei-Fei et al
Image credit: Noh et al, Learning Deconvolution Network for Semantic Segmentation
Some selected applications
• Object detection: https://pjreddie.com/darknet/yolo/
• Semantic segmentation: https://www.youtube.com/watch?v=qWl9idsCuLQ
• Human pose estimation: https://www.youtube.com/watch?v=pW6nZXeWlGM
• Video recognition: https://valossa.com/
• Digital pathology: https://www.aiforia.com/
