Week5_Computer_Vision

Uploaded by

albertadi412
Copyright
© All Rights Reserved
CSCI218: Foundations of Artificial Intelligence
Human Vision System

Robot Vision System

Image Formation

Simple Image Feature

Image Color Histogram
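As a rough illustration of a colour histogram, the sketch below counts how many pixels fall into each quantised (R, G, B) bin. The function name `colour_histogram`, the choice of 4 bins per channel, and the tiny 4-pixel image are assumptions made for this example, not part of the lecture.

```python
def colour_histogram(pixels, bins_per_channel=4):
    """Count pixels per quantised (R, G, B) bin.

    pixels: iterable of (r, g, b) tuples with values in 0..255.
    Returns a flat list of bins_per_channel**3 counts.
    """
    bin_width = 256 / bins_per_channel
    hist = [0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        # Map each channel value to a bin index, clamped to the last bin.
        rb = min(int(r / bin_width), bins_per_channel - 1)
        gb = min(int(g / bin_width), bins_per_channel - 1)
        bb = min(int(b / bin_width), bins_per_channel - 1)
        hist[(rb * bins_per_channel + gb) * bins_per_channel + bb] += 1
    return hist


# Hypothetical image: two reddish pixels, one green, one blue.
image = [(255, 0, 0), (250, 10, 5), (0, 255, 0), (0, 0, 255)]
hist = colour_histogram(image)
```

Because the histogram ignores where pixels are located, it is unchanged by rotating or flipping the image, which is one reason it is used as a simple global feature.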


Simple Image Feature

Edge
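One common way to find edges is to look for large intensity gradients. The sketch below uses the Sobel operator as an example; the slides do not say which edge detector they show, so the operator and the toy step-edge image are assumptions.

```python
import math

# Sobel kernels for horizontal (x) and vertical (y) intensity change.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(img):
    """Gradient magnitude at every interior pixel of a grey-level image."""
    h, w = len(img), len(img[0])
    out = [[0.0] * (w - 2) for _ in range(h - 2)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(SOBEL_X[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(SOBEL_Y[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            out[y - 1][x - 1] = math.hypot(gx, gy)
    return out


# Toy image with a vertical step edge: dark left half, bright right half.
img = [[0, 0, 0, 10, 10, 10] for _ in range(5)]
edges = sobel_magnitude(img)
```

The response is zero in the flat regions and large only in the two columns that straddle the step.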

Simple Image Feature

Texture (e.g., Gray-Level Co-Occurrence Matrix (GLCM))


- Characterises how often pairs of pixels with specific values and in a specified spatial relationship occur in an image
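The counting described above can be sketched directly. In this minimal example the spatial relationship is "the pixel one step to the right" (`dx=1, dy=0`), the image has 4 grey levels, and the matrix is not symmetrised or normalised; all of these are choices made for illustration.

```python
def glcm(img, dx=1, dy=0, levels=4):
    """Grey-level co-occurrence matrix for the offset (dx, dy).

    matrix[a][b] counts how often a pixel with value a has a neighbour
    with value b at position (x + dx, y + dy).
    """
    h, w = len(img), len(img[0])
    matrix = [[0] * levels for _ in range(levels)]
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                matrix[img[y][x]][img[ny][nx]] += 1
    return matrix


# Small test image with grey levels 0..3.
img = [[0, 0, 1, 1],
       [0, 0, 1, 1],
       [0, 2, 2, 2],
       [2, 2, 3, 3]]
horizontal = glcm(img)
```

Texture descriptors such as contrast or homogeneity are then computed from this matrix rather than from the raw pixels.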

Simple Image Feature

Optical Flow: Whenever there is relative movement between the camera and one or more
objects in the scene, the resulting apparent motion in the image is called optical flow.
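A very simple way to estimate this apparent motion for one image patch is block matching: search the next frame for the displacement that best matches the patch. This is only one possible approach and is not taken from the slides; the function name, the search range, and the synthetic frames are assumptions for this sketch.

```python
def block_match(frame1, frame2, top, left, size=3, search=2):
    """Return the (dx, dy) shift of the patch at (top, left) in frame1
    that minimises the sum of squared differences in frame2."""
    h, w = len(frame2), len(frame2[0])
    best, best_cost = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y0, x0 = top + dy, left + dx
            # Skip candidate positions that fall outside the frame.
            if y0 < 0 or x0 < 0 or y0 + size > h or x0 + size > w:
                continue
            cost = sum((frame1[top + j][left + i] - frame2[y0 + j][x0 + i]) ** 2
                       for j in range(size) for i in range(size))
            if best_cost is None or cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best


def make_frame(top, left, size=8):
    """Black frame containing one textured 3x3 object at (top, left)."""
    frame = [[0] * size for _ in range(size)]
    for j in range(3):
        for i in range(3):
            frame[top + j][left + i] = j * 3 + i + 1
    return frame


frame1 = make_frame(2, 2)
frame2 = make_frame(3, 4)   # object moved 2 pixels right, 1 pixel down
flow = block_match(frame1, frame2, top=2, left=2)
```

Repeating this for every patch gives a coarse flow field with one displacement vector per patch.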

Simple Image Feature

Segmentation of natural images

Classifying Images

Important sources of appearance variation

Classifying Images

Why convolutional neural networks classify images well

Detecting Objects

Faster RCNN for object detection

The 3D World

Binocular stereopsis

Using Computer Vision

Understanding what people are doing

Using Computer Vision

Automated image captioning

Using Computer Vision

Visual question-answering
Using Computer Vision

Reconstruction from many views

Using Computer Vision

Geometry from a single view

Using Computer Vision

Making pictures

Using Computer Vision

Image Transformation (Paired)

Using Computer Vision

Image Transformation (Unpaired)

Using Computer Vision

Image Transformation (Style transfer)

Using Computer Vision

Image Generation (by GAN)

Using Computer Vision

Controlling movement with vision


Using Computer Vision

Navigation

Image Analysis
§ Overview of Image Analysis
§ Collecting and Representing Image
§ Image Recognition
§ Bag-of-Visual-Words model
§ Deep Convolutional Neural Networks
Overview of Image Analysis
§ Image analysis
§ Refers to the representation, processing, and modelling of visual data to derive useful insights
§ Suffers from the semantic gap
§ Visual data (image, video, …) is unstructured
§ Semantic gap
§ The gap between the high-level concepts used by humans and the low-level features used by computers
Overview of Image Analysis
§ Image recognition (in a narrow sense)
§ Image classification
§ Object detection, localisation, tracking
§ Scene segmentation and reconstruction
§ Image search and retrieval
Overview of Image Analysis
§ Image classification

[Figure: examples of face recognition, OCR, scene recognition, and object recognition]

Overview of Image Analysis
§ Object detection, localisation, tracking

Object detection and localization

Object tracking (https://www.youtube.com/watch?v=dKpRsdYSCLQ)


Overview of Image Analysis
§ Scene segmentation and reconstruction

[Farabet et al. PAMI 2013]

http://twd20g.blogspot.com.au/2011/12/this-work-presents-novel-system-that.html
https://www.3dflow.net/elementsCV/S4.xhtml
Image Analysis Steps
§ Collection and labelling
§ Collect representative images for a given task and label the ground truth
§ Image representation
§ Select and/or design appropriate image representations (invariant and discriminative)
§ Image analysis techniques
§ Apply and/or design appropriate analysis techniques for the given tasks (classification, detection, tracking, segmentation, etc.)
Representing Image
§ Why is representing images difficult?
§ Scale, rotation, illumination, occlusion, background clutter, deformation, …
§ Invariant and Discriminative representation

[Figure: example images labelled "Cat"]
Representing Image
§ Traditional representation (before year 2000)
§ Hand-crafted, global features
§ Intensity, colour, texture, shape, structure, etc.

[Figures: colour histogram in an RGB space; face recognition with raw pixel intensities]
Representing Image
§ Days of the BoVW model (2000 ~ 2012)
§ SIFT, HOG, SURF, CENTRIST, filter-based, …
§ Invariant to view angle, scale, illumination, ...

SIFT (Scale Invariant Feature Transform)

http://www.robots.ox.ac.uk/~vgg/software/
Image courtesy of David Lowe, IJCV04
Deep Learning Model
Convolutional Neural Networks (CNNs)
§ A special multi-stage architecture inspired by the visual system
§ Higher stages compute more global, more invariant features
Deep Learning Model

https://www.datasciencecentral.com/lenet-5-a-classic-cnn-architecture/
Convolution

§ For standard 2D convolution:

Filter

§ The stride is 1.
§ The height and width are changed as:
$$W_{\text{out}} = \frac{W_{\text{in}} - W_{\text{filter}}}{\text{Stride}} + 1 = \frac{5 - 3}{1} + 1 = 3$$
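The size rule, output = (input − filter) / stride + 1, can be checked with a small sketch. It assumes a 5×5 input and a 3×3 filter, matching the numbers 5 and 3 in the formula; the image values and the all-ones filter are made up. Like most CNN libraries, the code slides the filter without flipping it (cross-correlation).

```python
def conv_output_size(w_in, w_filter, stride=1):
    """Output width/height of a convolution without padding."""
    return (w_in - w_filter) // stride + 1


def conv2d_valid(img, kernel, stride=1):
    """Slide a square kernel over img (no padding, no kernel flip)."""
    k = len(kernel)
    out_h = conv_output_size(len(img), k, stride)
    out_w = conv_output_size(len(img[0]), k, stride)
    return [[sum(kernel[j][i] * img[y * stride + j][x * stride + i]
                 for j in range(k) for i in range(k))
             for x in range(out_w)]
            for y in range(out_h)]


img = [[y * 5 + x for x in range(5)] for y in range(5)]   # 5x5 input
kernel = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]                # 3x3 filter
out = conv2d_valid(img, kernel)                           # 3x3 output
```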
Convolution

Zero-padding is needed to keep the image size.

With padding, the output width/height becomes:

$$W_{\text{out}} = \frac{W_{\text{in}} - W_{\text{filter}} + 2 \times \text{Padding}}{\text{Stride}} + 1$$
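As a quick check of the padded size rule, output = (input − filter + 2 × padding) / stride + 1, the sketch below pads a 5×5 image with one ring of zeros so that a 3×3 filter with stride 1 gives a 5×5 output again. The helper names and example sizes are assumptions for illustration.

```python
def conv_output_size(w_in, w_filter, stride=1, padding=0):
    """Output width/height of a convolution with zero-padding."""
    return (w_in - w_filter + 2 * padding) // stride + 1


def zero_pad(img, padding):
    """Surround a 2D image with `padding` rows and columns of zeros."""
    width = len(img[0]) + 2 * padding
    blank_rows = [[0] * width for _ in range(padding)]
    middle = [[0] * padding + list(row) + [0] * padding for row in img]
    return [list(r) for r in blank_rows] + middle + [list(r) for r in blank_rows]


img = [[1] * 5 for _ in range(5)]
padded = zero_pad(img, 1)                                # 7x7 after padding
same_size = conv_output_size(5, 3, stride=1, padding=1)  # back to 5
```

For stride 1 and an odd filter width, a padding of (filter − 1) / 2 keeps the output the same size as the input.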
Convolution Layers
In convolution layers:
§ Filters are called Kernels and become 3D. The parameters of
kernels (i.e., weights) are to be learned.

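Kernel 1, ..., Kernel N, each of size $K_h \times K_w \times C_{\text{in}}$

Input: $H \times W \times C_{\text{in}}$; output: $H \times W \times C_{\text{out}}$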
Convolution Layers
In convolution layers:
§ Feature maps are the outputs of each layer. The number of feature maps is the number of channels.

Feature map 1, ..., Feature map N

Input: $H \times W \times C_{\text{in}}$; output: $H \times W \times C_{\text{out}}$
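To make the link between 3D kernels and feature maps concrete, here is a minimal sketch of one convolution layer: each of the N kernels spans all input channels and produces exactly one feature map, so the output has N channels. The list layout (`inp[c][y][x]`, `kernels[n][c][j][i]`), the function names, and the toy values are assumptions. The sketch uses no padding and no bias, so the output is spatially smaller than the input.

```python
def conv_layer(inp, kernels):
    """Stride-1 convolution layer without padding.

    inp:     C_in input channels, each H x W   -> inp[c][y][x]
    kernels: N kernels, each C_in x K x K      -> kernels[n][c][j][i]
    returns: N feature maps, one per kernel    -> out[n][y][x]
    """
    c_in, h, w = len(inp), len(inp[0]), len(inp[0][0])
    k = len(kernels[0][0])
    out = []
    for kernel in kernels:
        # Each output value sums over the kernel window in ALL input channels.
        fmap = [[sum(kernel[c][j][i] * inp[c][y + j][x + i]
                     for c in range(c_in) for j in range(k) for i in range(k))
                 for x in range(w - k + 1)]
                for y in range(h - k + 1)]
        out.append(fmap)
    return out


def num_weights(n_kernels, c_in, k):
    """Number of learnable kernel weights in the layer (biases not counted)."""
    return n_kernels * c_in * k * k


inp = [[[1] * 3 for _ in range(3)] for _ in range(2)]    # C_in = 2, 3x3 each
kernels = [[[[1, 1], [1, 1]] for _ in range(2)],         # kernel 1: all ones
           [[[2, 2], [2, 2]] for _ in range(2)]]         # kernel 2: all twos
feature_maps = conv_layer(inp, kernels)                  # 2 maps, 2x2 each
```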
Convolutional Neural Networks

§ Multi-stage Architecture
Convolution
Non-linearity
Pooling
Convolutional Neural Networks
Convolution
- A set of filters is convolved with the input
- Weights are shared across the input space (translation equivariance)

Input
Filters
Feature Map
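Translation equivariance can be seen in a one-dimensional sketch: because the same filter weights are applied at every position, shifting the input shifts the feature map by the same amount. The signal and the filter below are made-up values.

```python
def conv1d(signal, kernel):
    """Apply one shared filter at every position (no padding, no flip)."""
    k = len(kernel)
    return [sum(kernel[i] * signal[x + i] for i in range(k))
            for x in range(len(signal) - k + 1)]


signal = [0, 0, 1, 2, 1, 0, 0, 0]
shifted = [0] + signal[:-1]          # same pattern, moved one step right
kernel = [1, 0, -1]

out = conv1d(signal, kernel)
out_shifted = conv1d(shifted, kernel)
# The response pattern is the same, just moved one step right.
```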
Convolutional Neural Networks
Non-linearity

Sigmoid: $f(x) = 1/(1 + e^{-x})$
Tanh: $f(x) = (e^{x} - e^{-x})/(e^{x} + e^{-x})$
ReLU: $f(x) = \max(x, 0)$
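The three non-linearities can be written out directly from their definitions. This is a plain scalar sketch for illustration; real frameworks apply them element-wise to whole feature maps.

```python
import math


def sigmoid(x):
    """Squashes x into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x):
    """Squashes x into the range (-1, 1)."""
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))


def relu(x):
    """Keeps positive values and sets negative values to zero."""
    return max(x, 0.0)


values = [-2.0, 0.0, 2.0]
activated = [(sigmoid(v), tanh(v), relu(v)) for v in values]
```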


Convolutional Neural Networks

Spatial pooling
§ Non-overlapping / overlapping regions
§ Max or sum
§ Invariance to small transformations

Max pooling

Sum/Average pooling
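A minimal sketch of spatial pooling over a single feature map follows. The window size, stride, and feature-map values are assumptions; a stride equal to the window size gives non-overlapping regions, and a smaller stride gives overlapping ones.

```python
def pool2d(fmap, size=2, stride=2, mode="max"):
    """Pool a 2D feature map with a size x size window."""
    h, w = len(fmap), len(fmap[0])
    out = []
    for y in range(0, h - size + 1, stride):
        row = []
        for x in range(0, w - size + 1, stride):
            window = [fmap[y + j][x + i]
                      for j in range(size) for i in range(size)]
            if mode == "max":
                row.append(max(window))
            elif mode == "sum":
                row.append(sum(window))
            elif mode == "average":
                row.append(sum(window) / len(window))
            else:
                raise ValueError("mode must be 'max', 'sum' or 'average'")
        out.append(row)
    return out


fmap = [[1, 3, 2, 4],
        [5, 6, 7, 8],
        [3, 2, 1, 0],
        [1, 2, 3, 4]]
max_pooled = pool2d(fmap, mode="max")
sum_pooled = pool2d(fmap, mode="sum")
avg_pooled = pool2d(fmap, mode="average")
overlapping = pool2d(fmap, size=2, stride=1, mode="max")
```

With max pooling, moving a strong response by one pixel inside its window leaves the pooled value unchanged, which is the source of the invariance to small transformations.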
Deep Learning Model
CNNs: ImageNet Breakthrough

[Krizhevsky et al. NIPS 2012]


● Krizhevsky et al. won the 2012 ImageNet classification challenge with a much bigger ConvNet
○ deeper: 7 stages vs 3 before
○ larger: 60 million parameters vs 1 million before
○ 16.4% error (top-5) vs 26.2% error for the next best entry

● This was made possible by:

○ fast hardware: GPU-optimized code
○ big dataset: 1.2 million images vs thousands before
○ better regularization: dropout and others

Image courtesy of Deng et al.
Deep Learning Model
Learned Features of CNNs

[Matthew D. Zeiler et al. ECCV 2014]


Deep Learning Model

Object detection (Source: Rich feature hierarchies for accurate object detection and semantic segmentation, CVPR 2014)

Face Recognition (Source: DeepFace: Closing the Gap to Human-Level Performance in Face Verification, CVPR 2014)
Deep Learning Model

§ Directly use pre-trained CNNs


§ Which layer to use?
§ How to pool the features in a convolutional layer?
Deep Learning Model

§ Directly use pre-trained CNNs


§ Which layer to use?
Convolutional layer
Fully connected layer
Deep Learning Model
§ Fine-tune pre-trained CNNs
§ To incorporate extra information from the images of a new recognition task
§ Make the pre-trained CNNs adapt to this new task

[Figure: pre-trained CNNs are fine-tuned on a new recognition task]

Image courtesy of Deng et al.


http://people.csail.mit.edu/bzhou/
Summary
§ Computer vision is a key component of AI
§ Image analysis is an important and broad area
§ Feature representation is key for image analysis
§ Deep Learning techniques are now widely used
Acknowledgement

The lecture slides are based on the materials from ai.berkeley.edu


Thank you. Questions?
