CCS349 - IMAGE AND VIDEO
ANALYTICS
Dr. Arthi. A
Professor & Head
Department of Artificial Intelligence and
Data Science
Source:
1. Milan Sonka, Vaclav Hlavac, Roger Boyle, "Image Processing, Analysis, and Machine Vision", 4th edition, Thomson Learning, 2013.
2. Vaibhav Verdhan, "Computer Vision Using Deep Learning: Neural Network Architectures with Python and Keras", Apress, 2021 (Units III, IV, and V).
OBJECTIVES
1: To understand the basics of image processing techniques for computer vision.
2: To learn the techniques used for image pre-processing.
3: To discuss the various object detection techniques.
4: To understand the various object recognition mechanisms.
5: To elaborate on the video analytics techniques.
SYLLABUS
UNIT - I
INTRODUCTION
UNIT - II
IMAGE PRE-PROCESSING
UNIT - III
OBJECT DETECTION USING MACHINE LEARNING
UNIT - IV
FACE RECOGNITION AND GESTURE RECOGNITION
UNIT - V
VIDEO ANALYTICS
Binary data (a 4×4 binary image):
0 0 1 1
0 1 1 0
1 1 0 1
1 1 1 1
Image Types
Intensity Image or Monochrome Image
Each pixel corresponds to light intensity, normally represented in gray scale (gray level).
Every pixel is represented by 8 bits: 2^8 = 256 gray levels, from 0 (black) to 255 (white); in between are the different shades of gray.
The (gray-scale) image function values correspond to brightness at image points.
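To make the 0–255 gray-level range concrete, here is a minimal Python sketch of an 8-bit intensity image (NumPy is assumed; it is not part of the slides):

```python
import numpy as np

# An 8-bit monochrome image: each pixel is one of 2**8 = 256 gray levels,
# 0 = black, 255 = white, and the values in between are shades of gray.
image = np.array([[  0,  64, 128],
                  [ 64, 128, 192],
                  [128, 192, 255]], dtype=np.uint8)

print(image.min(), image.max())   # 0 255
print(image.dtype)                # uint8: one byte (8 bits) per pixel
```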
Image Types
Color Table
• High-Level Processes:
• Image analysis and computer vision.
High-level computer vision tries to imitate human cognition and the ability to make decisions according to the information contained in the image.
High-level knowledge would be related to the 'shape' of a cow and the subtle interrelationships between the different parts of that shape, and their (inter-)dynamics.
Computerized Processes Types
• High-level vision begins with some form of formal model of the world; the 'reality' perceived in the form of digitized images is then compared to the model.
• A match is attempted, and when differences emerge, partial matches (or subgoals) are sought that overcome them; the computer switches to low-level image processing to find the information needed to update the model. This process is repeated iteratively, and 'understanding' an image thereby becomes a co-operation between top-down and bottom-up processes. A feedback loop is introduced in which high-level partial results create tasks for low-level image processing, and the iterative image understanding process should eventually converge to the global goal.
Image Representation
• Low-level Image Processing
• High-level Image Understanding

• Low-Level Processes:
An image is captured by a sensor (such as a camera) and digitized; then the computer suppresses noise (image pre-processing) and maybe enhances some object features which are relevant to understanding the image. Edge extraction is an example of processing carried out at this stage.
Computerized Processes Types
• Mid-Level Processes:
• Inputs, generally, are images. Outputs are attributes extracted from those images (edges, contours, identity of individual objects).
• Tasks:
• Segmentation (partitioning an image into regions or objects)
• Description of those objects to reduce them to a form suitable for computer processing
• Classification (recognition) of objects
Image
• The image on the retina or on a camera sensor is intrinsically two-dimensional (2D).
• Image processing often deals with static images, in which time is constant.
• A monochromatic static image is represented by a continuous image function f(x, y) whose arguments are co-ordinates in the plane.
• An image to be processed by computer must be represented using an appropriate discrete data structure, for example, a matrix.
Image
• An image captured by a sensor is expressed as a continuous function f(x, y) of two co-ordinates in the plane.
• Image digitization means that the function f(x, y) is sampled into a matrix with M rows and N columns.
• Image quantization assigns to each continuous sample an integer value; the continuous range of the image function f(x, y) is split into K intervals.
• The finer the sampling (i.e., the larger M and N) and quantization (the larger K), the better the approximation of the continuous image function f(x, y) achieved (see the sketch below).
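A small sketch of sampling and quantization under these definitions, assuming NumPy; the function names `sample` and `quantize` are illustrative, not from the text:

```python
import numpy as np

def sample(f, step):
    """Coarser sampling: keep every `step`-th row and column (smaller M, N)."""
    return f[::step, ::step]

def quantize(f, k):
    """Quantize the 0-255 brightness range into k intervals, mapped back to 0-255."""
    levels = np.floor(f / 256.0 * k)              # interval index 0 .. k-1
    return (levels * (255.0 / (k - 1))).astype(np.uint8)

f = np.random.randint(0, 256, size=(256, 256), dtype=np.uint8)
g = quantize(sample(f, 4), 8)     # a 64x64 image with K = 8 gray levels
print(g.shape, np.unique(g).size)
```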
Image
• One infinitely small sampling point in the grid corresponds to one picture element, also called a pixel or image element, in the digital image.
• In a three-dimensional image, an image element is called a voxel (volume element).
• The set of pixels together covers the entire image; however, the pixel captured by a real digitization device has finite size.
• The pixel is a unit which is not further divisible from the image analysis point of view.
• We shall often refer to a pixel as a 'point'.
Digital image properties
1. Metric and topological properties of digital images
2. Histograms
3. Entropy
4. Visual perception of the image
   Contrast
   Acuity: visual acuity (VA) is a measure of the ability of the eye to distinguish shapes and the details of objects at a given distance.
   Perceptual grouping
5. Image quality
6. Noise in images
   Signal-to-noise ratio (SNR)
   Quantization noise
   Impulse noise
1. Metric and topological properties of digital images
A digital image consists of picture elements of finite size; these pixels carry information about the brightness of a particular location in the image.
Pixels are arranged into a rectangular sampling grid.
A digital image is represented by a two-dimensional matrix whose elements are natural numbers corresponding to the quantization levels in the brightness scale.
Any function D satisfying the following three conditions is a 'distance' (or a metric):
D(p, q) ≥ 0, with D(p, q) = 0 if and only if p = q (identity),
D(p, q) = D(q, p) (symmetry),
D(p, r) ≤ D(p, q) + D(q, r) (triangle inequality).
1. Metric and topological properties of digital images
'Chessboard' distance, D8:
If moves in diagonal directions are allowed in the digitization grid, we obtain the distance D8, or 'chessboard' distance. D8 is equal to the minimal number of king moves on the chessboard from one point to another (see the sketch below).
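As a concrete illustration, here is a small Python sketch of the three standard pixel metrics used in Sonka et al.: the Euclidean distance D_E, the 'city block' distance D4 (minimal number of horizontal and vertical moves), and the 'chessboard' distance D8. The function names are illustrative:

```python
import math

def d_euclidean(p, q):
    # D_E: straight-line distance between the two grid points
    return math.hypot(p[0] - q[0], p[1] - q[1])

def d4(p, q):
    # D4, 'city block' distance: minimal number of horizontal/vertical moves
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def d8(p, q):
    # D8, 'chessboard' distance: minimal number of king moves
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

p, q = (0, 0), (3, 4)
print(d_euclidean(p, q), d4(p, q), d8(p, q))   # 5.0 7 4
```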
Digital image properties
Pixel adjacency is another important concept in digital images.
Two pixels (p, q) are called 4-neighbors if they have distance D4(p, q) = 1.
Analogously, 8-neighbors have D8(p, q) = 1.
A path from pixel P to pixel Q is a sequence of points A1, A2, ..., An, where A1 = P, An = Q, and Ai+1 is a neighbor of Ai for i = 1, ..., n − 1; a region is then a set of pixels in which there is a path between any pair of its pixels, all of whose pixels also belong to the set.
If there is a path between two pixels in the set of pixels in the image, these pixels are called contiguous.
Image
• The brightness of a pixel is a very simple property which can be used to find objects in some images;
• if, for example, a pixel is darker than some predefined value (threshold), then it belongs to the object.
• All such points which are also contiguous constitute one object (a minimal sketch follows below).
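A minimal sketch of this idea, assuming 4-connectivity; `object_region` is a hypothetical helper, not a function from the text. It grows the contiguous set of below-threshold pixels starting from a seed pixel:

```python
from collections import deque

def object_region(image, seed, threshold):
    """Return the contiguous object containing `seed`: all 4-connected
    pixels darker than `threshold` (breadth-first region growing)."""
    rows, cols = len(image), len(image[0])
    region, queue = {seed}, deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r-1, c), (r+1, c), (r, c-1), (r, c+1)):  # 4-neighbors
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in region
                    and image[nr][nc] < threshold):
                region.add((nr, nc))
                queue.append((nr, nc))
    return region

img = [[200, 200, 200],
       [200,  10,  20],
       [200,  30, 200]]
print(sorted(object_region(img, (1, 1), 50)))  # the three dark, contiguous pixels
```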
Subjective
Subjective methods are often used in television technology,
where the ultimate criterion is the perception of a selected
group of professional and lay viewers.
Matrices
• The most common data structure for low-level image representation.
• Elements of the matrix are integer numbers.
• Image data of this kind are usually the direct output of the image-capturing device.
Data Structure for Image Analysis
The representations can be stratified into four levels [Ballard and Brown, 1982]. However, there are no strict borders between them, and a more detailed classification of the representational levels is used in some applications.
These four representational levels are ordered from signals at a low level of abstraction to the description that a human can perceive.
The information flow between the levels may be bi-directional, and some representations can be omitted for some specific uses.
1. The first, lowest representational level, iconic images, consists of images containing the original data: integer matrices with data about pixel brightness. Images of this kind are also the output of pre-processing operations (e.g., filtration or edge sharpening) used for highlighting some aspects of the image which are important for further treatment.
Data Structure for Image Analysis
2. In the second representational level, segmented images, the image is divided into parts; segmented parts are groups of pixels that probably belong to the same object, for instance, the segments corresponding to the faces of bodies. Knowledge of the application domain is useful at this stage, since it makes it easier to deal with problems in the image data such as noise and blur.
3. The third representational level consists of geometric representations, holding knowledge of 2D and 3D shapes. The quantification of a shape is very difficult, and very important too. Geometric representations are useful while doing general and complex simulations of the influence of illumination and motion on real objects. We also need them for the transition between natural, raster images (gained, for example, by a TV camera) and data used in computer graphics (CAD – Computer-Aided Design, DTP – desktop publishing).
Data Structure for Image Analysis
4. The fourth level of representation of image data consists of relational models. They are able to treat data more efficiently and at a higher level of abstraction. A priori knowledge about the case being solved is usually used in processing of this kind; AI techniques are often explored, and the information gained from the image may be represented by semantic nets or frames [Nilsson, 1982].
An example will illustrate such prior knowledge. Imagine a satellite image of a piece of land, and the task of counting planes standing at an airport. Prior knowledge here is the position of the airport, which can be deduced, for instance, from a map, or from relations to other objects in the image (e.g., to roads, lakes, urban areas).
Data Structure for Image Analysis
• A matrix is the most common data structure for low-level representation of an image.
• Elements of the matrix are integer numbers corresponding to brightness, or to another property of the corresponding pixel of the sampling grid.
• Image data of this kind are usually the direct output of the image-capturing device.
• Pixels of both rectangular and hexagonal sampling grids can be represented by a matrix.
• The correspondence between data and matrix elements is obvious for a rectangular grid; with a hexagonal grid, every even row in the image is shifted half a pixel to the right.
Data Structure for Image Analysis
The matrix is a full representation of the image, independent of the contents of the image data.
Data Structure for Image Analysis
Co-occurrence matrix: represents an estimate of the probability of two pixels appearing in a spatial relationship in which a pixel (i1, j1) has intensity z and a pixel (i2, j2) has intensity y.
Suppose that the probability depends only on a certain spatial relation r between a pixel of brightness z and a pixel of brightness y; then information about the relation r is recorded in the square co-occurrence matrix Cr, whose dimensions correspond to the number of brightness levels of the image.
To reduce the number of matrices Cr, some simplifying assumptions are introduced: first consider only direct neighbors, and then treat relations as symmetrical (without orientation).
The following algorithm calculates the co-occurrence matrix Cr from the image f(i, j).
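A possible Python rendering of that algorithm, assuming NumPy and the symmetric direct-neighbor relation described above; the name `cooccurrence` is illustrative:

```python
import numpy as np

def cooccurrence(f, levels):
    """Co-occurrence matrix Cr for the symmetric 'direct neighbor' relation:
    count pairs of 4-adjacent pixels with brightnesses z and y."""
    c = np.zeros((levels, levels), dtype=np.int64)
    rows, cols = f.shape
    for i in range(rows):
        for j in range(cols):
            for di, dj in ((0, 1), (1, 0)):        # right and down neighbors
                i2, j2 = i + di, j + dj
                if i2 < rows and j2 < cols:
                    z, y = f[i, j], f[i2, j2]
                    c[z, y] += 1
                    c[y, z] += 1                   # symmetric: count both orders
    return c

f = np.array([[0, 0, 1],
              [0, 1, 1]])
print(cooccurrence(f, 2))
```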
Data Structure for Image Analysis
Integral image construction: the main use of integral image data structures is in the rapid calculation of simple rectangle image features at multiple scales. Features of this kind are used for rapid object identification and for object tracking (see the sketch below).
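A minimal NumPy sketch of the idea: build the integral image with cumulative sums, then evaluate any rectangle sum with four lookups, independent of the rectangle's size. The helper names are illustrative:

```python
import numpy as np

def integral_image(f):
    """ii(r, c) = sum of all pixels above and to the left of (r, c), inclusive."""
    return f.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, top, left, bottom, right):
    """Sum of f over the rectangle [top..bottom] x [left..right] in four lookups."""
    total = ii[bottom, right]
    if top > 0:
        total -= ii[top - 1, right]
    if left > 0:
        total -= ii[bottom, left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1, left - 1]
    return total

f = np.arange(16).reshape(4, 4)
ii = integral_image(f)
print(rect_sum(ii, 1, 1, 2, 2), f[1:3, 1:3].sum())   # both 30
```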
The RGB Color Model
• We can represent this model using the unit cube defined on R, G, and B axes, as shown in the following figure.
• The parameter names I and Q refer to the modulation methods used to encode the color information on this carrier. An amplitude-modulation encoding (the "in-phase" signal) transmits the I value, using about 1.3 MHz of the bandwidth, and a phase-modulation encoding (the "quadrature" signal), using about 0.5 MHz, carries the Q value.
Transformations Between RGB and YIQ Color Spaces
• Conversely, an NTSC video signal is converted to RGB color values using an NTSC decoder, which first separates the video signal into the YIQ components, and then converts the YIQ values to RGB values. The conversion from YIQ space to RGB space is accomplished with the inverse of the RGB-to-YIQ transformation, as sketched below.
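As a sketch, here are the commonly quoted NTSC coefficients (exact values vary slightly between references) and the decoder's inverse, in Python with NumPy:

```python
import numpy as np

# Commonly quoted NTSC RGB-to-YIQ coefficients (e.g., Hearn & Baker);
# treat the exact values as reference-dependent.
RGB_TO_YIQ = np.array([[0.299,  0.587,  0.114],
                       [0.596, -0.275, -0.321],
                       [0.212, -0.528,  0.311]])

def rgb_to_yiq(rgb):
    return RGB_TO_YIQ @ rgb

def yiq_to_rgb(yiq):
    # Decoder side: apply the inverse of the encoding transformation
    return np.linalg.inv(RGB_TO_YIQ) @ yiq

rgb = np.array([1.0, 0.5, 0.25])
print(np.allclose(yiq_to_rgb(rgb_to_yiq(rgb)), rgb))   # True: a round trip
```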
• Similarly, a combination of cyan and yellow ink produces green light, and a combination of magenta and yellow ink yields red light. The CMY printing process often uses a collection of four ink dots, which are arranged in a close pattern, somewhat as an RGB monitor uses three phosphor dots.
• Thus, in practice, the CMY color model is referred to as the CMYK model, where K is the black color parameter. One ink dot is used for each of the primary colors (cyan, magenta, and yellow), and one ink dot is black.
• A black dot is included because reflected light from the cyan, magenta, and yellow inks typically produces only shades of gray.
Transformations Between CMY and RGB Color Spaces
• We can express the conversion from an RGB representation to a CMY representation with the matrix transformation [C, M, Y]ᵀ = [1, 1, 1]ᵀ − [R, G, B]ᵀ, where the white point in RGB space is represented as the unit column vector; the conversion from a CMY color representation back to an RGB representation is [R, G, B]ᵀ = [1, 1, 1]ᵀ − [C, M, Y]ᵀ, as sketched below.
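A minimal sketch of these two transformations in Python; since each one subtracts the color from the white point (the unit column vector), the conversion reduces to componentwise subtraction:

```python
import numpy as np

def rgb_to_cmy(rgb):
    # Subtract from the white point [1, 1, 1] in RGB space
    return 1.0 - np.asarray(rgb, dtype=float)

def cmy_to_rgb(cmy):
    # The inverse transformation has the same form
    return 1.0 - np.asarray(cmy, dtype=float)

print(rgb_to_cmy([1.0, 0.0, 0.0]))   # pure red -> C=0, M=1, Y=1
```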
• In HSV space, saturation S is measured along a horizontal axis, and the value parameter V is measured along a vertical axis through the center of the hexcone. Hue is represented as an angle about the vertical axis, ranging from 0° at red through 360°.
• Vertices of the hexagon are separated by 60° intervals. Yellow is at 60°, green at 120°, and cyan (opposite the red point) is at H = 180°.
• Complementary colors are 180° apart. Saturation parameter S is used to designate the purity of a color. A pure color (spectral color) has the value S = 1.0, and decreasing S values tend toward the grayscale line (S = 0) at the center of the hexcone.
• Value V varies from 0 at the apex of the hexcone to 1.0 at the top plane. The apex of the hexcone is the black point. At the top plane, colors have their maximum intensity. When V = 1.0 and S = 1.0, we have the pure hues.
• Parameter values for the white point are V = 1.0 and S = 0. To get a dark blue, for instance, V could be set to 0.4 with S = 1.0 and H = 240°.
• Similarly, when white is to be added to the selected hue, parameter S is decreased while keeping V constant. A light blue could be designated with S = 0.3 while V = 1.0 and H = 240° (see the check below).
• The human eye can distinguish about 128 different hues and about 130 different tints (saturation levels). For each of these, a number of shades (value settings) can be detected, depending on the hue selected.
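These parameter settings can be checked with Python's standard colorsys module, which expects H, S, and V each normalized to [0, 1]:

```python
import colorsys

# H = 240 degrees -> 240/360 in colorsys's normalized hue scale
dark_blue  = colorsys.hsv_to_rgb(240/360, 1.0, 0.4)   # S = 1.0, V = 0.4
light_blue = colorsys.hsv_to_rgb(240/360, 0.3, 1.0)   # S = 0.3, V = 1.0
print(dark_blue)    # (0.0, 0.0, 0.4): a dark blue
print(light_blue)   # (0.7, 0.7, 1.0): blue with white added
```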
• The vertical axis in the HLS model is called lightness, L. At L = 0, we have black, and at L = 1.0, we have white. Grayscale values are along the L axis, and the pure colors lie on the L = 0.5 plane. Saturation parameter S again specifies the purity of a color. This parameter varies from 0 to 1.0, and pure colors are those for which S = 1.0 and L = 0.5. As S decreases, more white is added to a color. The grayscale line is at S = 0.
• To specify a color, we begin by selecting hue angle H. Then a particular shade, tint, or tone for that hue is obtained by adjusting parameters L and S. We obtain a lighter color by increasing L, and we obtain a darker color by decreasing L. When S is decreased, the spatial color point moves toward the grayscale line (see the check below).
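A quick check of these rules with Python's standard colorsys module; note that its argument order is (H, L, S), each normalized to [0, 1]:

```python
import colorsys

pure_blue = colorsys.hls_to_rgb(240/360, 0.5, 1.0)   # pure hue: L = 0.5, S = 1.0
lighter   = colorsys.hls_to_rgb(240/360, 0.8, 1.0)   # increasing L lightens the color
gray      = colorsys.hls_to_rgb(240/360, 0.5, 0.0)   # S = 0: on the grayscale line
print(pure_blue)   # (0.0, 0.0, 1.0)
print(lighter)     # (0.6, 0.6, 1.0)
print(gray)        # (0.5, 0.5, 0.5)
```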