CCS349 - IMAGE AND VIDEO
ANALYTICS
Dr. Arthi. A
Professor & Head
Department of Artificial Intelligence and
Data Science
Source:
1. Milan Sonka, Vaclav Hlavac, Roger Boyle, "Image Processing, Analysis, and Machine Vision", 4th edition, Thomson Learning, 2013.
2. Vaibhav Verdhan, "Computer Vision Using Deep Learning: Neural Network Architectures with Python and Keras", Apress, 2021 (Units III, IV, and V).
OBJECTIVES
1: To understand the basics of image processing techniques for computer vision.
2: To learn the techniques used for image pre-processing.
3: To discuss the various object detection techniques.
4: To understand the various object recognition mechanisms.
5: To elaborate on the video analytics techniques.
SYLLABUS
UNIT - I
INTRODUCTION
UNIT - II
IMAGE PRE-PROCESSING
UNIT - III
OBJECT DETECTION USING MACHINE LEARNING
UNIT - IV
FACE RECOGNITION AND GESTURE RECOGNITION
UNIT - V
VIDEO ANALYTICS
Binary data (a 4×4 binary image):
0 0 1 1
0 1 1 0
1 1 0 1
1 1 1 1
Image Types
Intensity Image or Monochrome Image
Each pixel corresponds to light intensity, normally represented in gray scale (gray level).
Every pixel is represented by 8 bits: 2^8 = 256 gray levels, from 0 (black) to 255 (white); in between are the different shades of gray.
The (gray-scale) image function values correspond to brightness at image points.
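To make the 0–255 gray-level range concrete, here is a minimal Python sketch of an 8-bit intensity image (NumPy is assumed; it is not part of the slides):

```python
import numpy as np

# An 8-bit monochrome image: each pixel is one of 2**8 = 256 gray levels,
# 0 = black, 255 = white, and the values in between are shades of gray.
image = np.array([[  0,  64, 128],
                  [ 64, 128, 192],
                  [128, 192, 255]], dtype=np.uint8)

print(image.min(), image.max())   # 0 255
print(image.dtype)                # uint8: one byte (8 bits) per pixel
```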
Image Types
Color Table
• High-Level Processes:
• Image analysis and computer vision.
High-level computer vision tries to imitate human cognition and the ability to make decisions according to the information contained in the image.
High-level knowledge would be related to the 'shape' of a cow and the subtle interrelationships between the different parts of that shape, and their (inter-)dynamics.
Computerized Processes Types
• High-level vision begins with some form of formal model of the world; the 'reality' perceived in the form of digitized images is then compared to the model.
• A match is attempted, and when differences emerge, partial matches (or subgoals) are sought that overcome them; the computer switches to low-level image processing to find the information needed to update the model. This process is repeated iteratively, and 'understanding' an image thereby becomes a co-operation between top-down and bottom-up processes. A feedback loop is introduced in which high-level partial results create tasks for low-level image processing, and the iterative image understanding process should eventually converge to the global goal.
Image Representation
• Low-level Image Processing
• High-level Image Understanding

• Low-Level Processes:
An image is captured by a sensor (such as a camera) and digitized; then the computer suppresses noise (image pre-processing) and maybe enhances some object features which are relevant to understanding the image. Edge extraction is an example of processing carried out at this stage.
Computerized Processes Types
• Mid-Level Processes:
• Inputs, generally, are images. Outputs are attributes extracted from those images (edges, contours, identity of individual objects).
• Tasks:
• Segmentation (partitioning an image into regions or objects)
• Description of those objects to reduce them to a form suitable for computer processing
• Classification (recognition) of objects
Image
• The image on the retina or on a camera sensor is intrinsically two-dimensional (2D).
• Image processing often deals with static images, in which time is constant.
• A monochromatic static image is represented by a continuous image function f(x, y) whose arguments are co-ordinates in the plane.
• An image to be processed by computer must be represented using an appropriate discrete data structure, for example, a matrix.
Image
• An image captured by a sensor is expressed as a continuous function f(x, y) of two co-ordinates in the plane.
• Image digitization means that the function f(x, y) is sampled into a matrix with M rows and N columns.
• Image quantization assigns to each continuous sample an integer value; the continuous range of the image function f(x, y) is split into K intervals.
• The finer the sampling (i.e., the larger M and N) and quantization (the larger K), the better the approximation of the continuous image function f(x, y) achieved (see the sketch below).
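A small sketch of sampling and quantization under these definitions, assuming NumPy; the function names `sample` and `quantize` are illustrative, not from the text:

```python
import numpy as np

def sample(f, step):
    """Coarser sampling: keep every `step`-th row and column (smaller M, N)."""
    return f[::step, ::step]

def quantize(f, k):
    """Quantize the 0-255 brightness range into k intervals, mapped back to 0-255."""
    levels = np.floor(f / 256.0 * k)              # interval index 0 .. k-1
    return (levels * (255.0 / (k - 1))).astype(np.uint8)

f = np.random.randint(0, 256, size=(256, 256), dtype=np.uint8)
g = quantize(sample(f, 4), 8)     # a 64x64 image with K = 8 gray levels
print(g.shape, np.unique(g).size)
```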
Image
• One infinitely small sampling point in the grid corresponds to one picture element, also called a pixel or image element, in the digital image.
• In a three-dimensional image, an image element is called a voxel (volume element).
• The set of pixels together covers the entire image; however, the pixel captured by a real digitization device has finite size.
• The pixel is a unit which is not further divisible from the image analysis point of view.
• We shall often refer to a pixel as a 'point'.
Digital image properties
1. Metric and topological properties of digital images
2. Histograms
3. Entropy
4. Visual perception of the image
   Contrast
   Acuity: visual acuity (VA) is a measure of the ability of the eye to distinguish shapes and the details of objects at a given distance.
   Perceptual grouping
5. Image quality
6. Noise in images
   Signal-to-noise ratio (SNR)
   Quantization noise
   Impulse noise
1. Metric and topological properties of digital images
A digital image consists of picture elements of finite size; these pixels carry information about the brightness of a particular location in the image.
Pixels are arranged into a rectangular sampling grid.
A digital image is represented by a two-dimensional matrix whose elements are natural numbers corresponding to the quantization levels in the brightness scale.
Any function D satisfying the following three conditions is a 'distance' (or a metric):
D(p, q) ≥ 0, with D(p, q) = 0 if and only if p = q (identity),
D(p, q) = D(q, p) (symmetry),
D(p, r) ≤ D(p, q) + D(q, r) (triangle inequality).
1. Metric and topological properties of digital images
'Chessboard' distance, D8:
If moves in diagonal directions are allowed in the digitization grid, we obtain the distance D8, or 'chessboard' distance. D8 is equal to the minimal number of king moves on the chessboard from one point to another (see the sketch below).
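As a concrete illustration, here is a small Python sketch of the three standard pixel metrics used in Sonka et al.: the Euclidean distance D_E, the 'city block' distance D4 (minimal number of horizontal and vertical moves), and the 'chessboard' distance D8. The function names are illustrative:

```python
import math

def d_euclidean(p, q):
    # D_E: straight-line distance between the two grid points
    return math.hypot(p[0] - q[0], p[1] - q[1])

def d4(p, q):
    # D4, 'city block' distance: minimal number of horizontal/vertical moves
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def d8(p, q):
    # D8, 'chessboard' distance: minimal number of king moves
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

p, q = (0, 0), (3, 4)
print(d_euclidean(p, q), d4(p, q), d8(p, q))   # 5.0 7 4
```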
Digital image properties
Pixel adjacency is another important concept in digital images.
Two pixels (p, q) are called 4-neighbors if they have distance D4(p, q) = 1.
Analogously, 8-neighbors have D8(p, q) = 1.
A path from pixel P to pixel Q is a sequence of points A1, A2, ..., An, where A1 = P, An = Q, and Ai+1 is a neighbor of Ai for i = 1, ..., n − 1; a region is then a set of pixels in which there is a path between any pair of its pixels, all of whose pixels also belong to the set.
If there is a path between two pixels in the set of pixels in the image, these pixels are called contiguous.
Image
• The brightness of a pixel is a very simple property which can be used to find objects in some images;
• if, for example, a pixel is darker than some predefined value (threshold), then it belongs to the object.
• All such points which are also contiguous constitute one object (a minimal sketch follows below).
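A minimal sketch of this idea, assuming 4-connectivity; `object_region` is a hypothetical helper, not a function from the text. It grows the contiguous set of below-threshold pixels starting from a seed pixel:

```python
from collections import deque

def object_region(image, seed, threshold):
    """Return the contiguous object containing `seed`: all 4-connected
    pixels darker than `threshold` (breadth-first region growing)."""
    rows, cols = len(image), len(image[0])
    region, queue = {seed}, deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r-1, c), (r+1, c), (r, c-1), (r, c+1)):  # 4-neighbors
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in region
                    and image[nr][nc] < threshold):
                region.add((nr, nc))
                queue.append((nr, nc))
    return region

img = [[200, 200, 200],
       [200,  10,  20],
       [200,  30, 200]]
print(sorted(object_region(img, (1, 1), 50)))  # the three dark, contiguous pixels
```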
Subjective
Subjective methods are often used in television technology,
where the ultimate criterion is the perception of a selected
group of professional and lay viewers.
Matrices
• The most common data structure for low-level image representation.
• Elements of the matrix are integer numbers.
• Image data of this kind are usually the direct output of the image-capturing device.
Data Structure for Image Analysis
The representations can be stratified into four levels [Ballard and Brown, 1982]. However, there are no strict borders between them, and a more detailed classification of the representational levels is used in some applications.
These four representational levels are ordered from signals at a low level of abstraction to the description that a human can perceive.
The information flow between the levels may be bi-directional, and some representations can be omitted for some specific uses.
1. The first, lowest representational level, iconic images, consists of images containing the original data: integer matrices with data about pixel brightness. Images of this kind are also the output of pre-processing operations (e.g., filtration or edge sharpening) used for highlighting some aspects of the image which are important for further treatment.
Data Structure for Image Analysis
2. In the second representational level, segmented images, the image is divided into parts; segmented parts are groups of pixels that probably belong to the same object, for instance, the segments corresponding to the faces of bodies. Knowledge of the application domain is useful at this stage, since it makes it easier to deal with problems in the image data such as noise and blur.
3. The third representational level consists of geometric representations, holding knowledge of 2D and 3D shapes. The quantification of a shape is very difficult, and very important too. Geometric representations are useful while doing general and complex simulations of the influence of illumination and motion on real objects. We also need them for the transition between natural, raster images (gained, for example, by a TV camera) and data used in computer graphics (CAD – Computer-Aided Design, DTP – desktop publishing).
Data Structure for Image Analysis
4. The fourth level of representation of image data consists of relational models. They are able to treat data more efficiently and at a higher level of abstraction. A priori knowledge about the case being solved is usually used in processing of this kind; AI techniques are often explored, and the information gained from the image may be represented by semantic nets or frames [Nilsson, 1982].
An example will illustrate such prior knowledge. Imagine a satellite image of a piece of land, and the task of counting planes standing at an airport. Prior knowledge here is the position of the airport, which can be deduced, for instance, from a map, or from relations to other objects in the image (e.g., to roads, lakes, urban areas).
Data Structure for Image Analysis
• A matrix is the most common data structure for low-level representation of an image.
• Elements of the matrix are integer numbers corresponding to brightness, or to another property of the corresponding pixel of the sampling grid.
• Image data of this kind are usually the direct output of the image-capturing device.
• Pixels of both rectangular and hexagonal sampling grids can be represented by a matrix.
• The correspondence between data and matrix elements is obvious for a rectangular grid; with a hexagonal grid, every even row in the image is shifted half a pixel to the right.
Data Structure for Image Analysis
The matrix is a full representation of the image, independent of the contents of the image data.
Data Structure for Image Analysis
Co-occurrence matrix: represents an estimate of the probability of two pixels appearing in a spatial relationship in which a pixel (i1, j1) has intensity z and a pixel (i2, j2) has intensity y.
Suppose that the probability depends only on a certain spatial relation r between a pixel of brightness z and a pixel of brightness y; then information about the relation r is recorded in the square co-occurrence matrix Cr, whose dimensions correspond to the number of brightness levels of the image.
To reduce the number of matrices Cr, some simplifying assumptions are introduced: first consider only direct neighbors, and then treat relations as symmetrical (without orientation).
The following algorithm calculates the co-occurrence matrix Cr from the image f(i, j).
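A possible Python rendering of that algorithm, assuming NumPy and the symmetric direct-neighbor relation described above; the name `cooccurrence` is illustrative:

```python
import numpy as np

def cooccurrence(f, levels):
    """Co-occurrence matrix Cr for the symmetric 'direct neighbor' relation:
    count pairs of 4-adjacent pixels with brightnesses z and y."""
    c = np.zeros((levels, levels), dtype=np.int64)
    rows, cols = f.shape
    for i in range(rows):
        for j in range(cols):
            for di, dj in ((0, 1), (1, 0)):        # right and down neighbors
                i2, j2 = i + di, j + dj
                if i2 < rows and j2 < cols:
                    z, y = f[i, j], f[i2, j2]
                    c[z, y] += 1
                    c[y, z] += 1                   # symmetric: count both orders
    return c

f = np.array([[0, 0, 1],
              [0, 1, 1]])
print(cooccurrence(f, 2))
```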
Data Structure for Image Analysis
Integral image construction: the main use of integral image data structures is in the rapid calculation of simple rectangle image features at multiple scales. Features of this kind are used for rapid object identification and for object tracking (see the sketch below).
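A minimal NumPy sketch of the idea: build the integral image with cumulative sums, then evaluate any rectangle sum with four lookups, independent of the rectangle's size. The helper names are illustrative:

```python
import numpy as np

def integral_image(f):
    """ii(r, c) = sum of all pixels above and to the left of (r, c), inclusive."""
    return f.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, top, left, bottom, right):
    """Sum of f over the rectangle [top..bottom] x [left..right] in four lookups."""
    total = ii[bottom, right]
    if top > 0:
        total -= ii[top - 1, right]
    if left > 0:
        total -= ii[bottom, left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1, left - 1]
    return total

f = np.arange(16).reshape(4, 4)
ii = integral_image(f)
print(rect_sum(ii, 1, 1, 2, 2), f[1:3, 1:3].sum())   # both 30
```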
The RGB Color Model
• We can represent this model using the unit cube defined on R, G, and B axes, as shown in the following figure.
• The parameter names I and Q refer to the modulation methods used to encode the color information on this carrier. An amplitude-modulation encoding (the "in-phase" signal) transmits the I value, using about 1.3 MHz of the bandwidth, and a phase-modulation encoding (the "quadrature" signal), using about 0.5 MHz, carries the Q value.
Transformations Between RGB and YIQ Color Spaces
• Conversely, an NTSC video signal is converted to RGB color values using an NTSC decoder, which first separates the video signal into the YIQ components, and then converts the YIQ values to RGB values. The conversion from YIQ space to RGB space is accomplished with the inverse of the RGB-to-YIQ transformation, as sketched below.
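As a sketch, here are the commonly quoted NTSC coefficients (exact values vary slightly between references) and the decoder's inverse, in Python with NumPy:

```python
import numpy as np

# Commonly quoted NTSC RGB-to-YIQ coefficients (e.g., Hearn & Baker);
# treat the exact values as reference-dependent.
RGB_TO_YIQ = np.array([[0.299,  0.587,  0.114],
                       [0.596, -0.275, -0.321],
                       [0.212, -0.528,  0.311]])

def rgb_to_yiq(rgb):
    return RGB_TO_YIQ @ rgb

def yiq_to_rgb(yiq):
    # Decoder side: apply the inverse of the encoding transformation
    return np.linalg.inv(RGB_TO_YIQ) @ yiq

rgb = np.array([1.0, 0.5, 0.25])
print(np.allclose(yiq_to_rgb(rgb_to_yiq(rgb)), rgb))   # True: a round trip
```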
• Similarly, a combination of cyan and yellow ink produces green light, and a combination of magenta and yellow ink yields red light. The CMY printing process often uses a collection of four ink dots, which are arranged in a close pattern, somewhat as an RGB monitor uses three phosphor dots.
• Thus, in practice, the CMY color model is referred to as the CMYK model, where K is the black color parameter. One ink dot is used for each of the primary colors (cyan, magenta, and yellow), and one ink dot is black.
• A black dot is included because reflected light from the cyan, magenta, and yellow inks typically produces only shades of gray.
Transformations Between CMY and RGB Color Spaces
• We can express the conversion from an RGB representation to a CMY representation with the matrix transformation [C, M, Y]ᵀ = [1, 1, 1]ᵀ − [R, G, B]ᵀ, where the white point in RGB space is represented as the unit column vector; the conversion from a CMY color representation back to an RGB representation is [R, G, B]ᵀ = [1, 1, 1]ᵀ − [C, M, Y]ᵀ, as sketched below.
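A minimal sketch of these two transformations in Python; since each one subtracts the color from the white point (the unit column vector), the conversion reduces to componentwise subtraction:

```python
import numpy as np

def rgb_to_cmy(rgb):
    # Subtract from the white point [1, 1, 1] in RGB space
    return 1.0 - np.asarray(rgb, dtype=float)

def cmy_to_rgb(cmy):
    # The inverse transformation has the same form
    return 1.0 - np.asarray(cmy, dtype=float)

print(rgb_to_cmy([1.0, 0.0, 0.0]))   # pure red -> C=0, M=1, Y=1
```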
• In HSV space, saturation S is measured along a horizontal axis, and the value parameter V is measured along a vertical axis through the center of the hexcone. Hue is represented as an angle about the vertical axis, ranging from 0° at red through 360°.
• Vertices of the hexagon are separated by 60° intervals. Yellow is at 60°, green at 120°, and cyan (opposite the red point) is at H = 180°.
• Complementary colors are 180° apart. Saturation parameter S is used to designate the purity of a color. A pure color (spectral color) has the value S = 1.0, and decreasing S values tend toward the grayscale line (S = 0) at the center of the hexcone.
• Value V varies from 0 at the apex of the hexcone to 1.0 at the top plane. The apex of the hexcone is the black point. At the top plane, colors have their maximum intensity. When V = 1.0 and S = 1.0, we have the pure hues.
• Parameter values for the white point are V = 1.0 and S = 0. To get a dark blue, for instance, V could be set to 0.4 with S = 1.0 and H = 240°.
• Similarly, when white is to be added to the selected hue, parameter S is decreased while keeping V constant. A light blue could be designated with S = 0.3 while V = 1.0 and H = 240° (see the check below).
• The human eye can distinguish about 128 different hues and about 130 different tints (saturation levels). For each of these, a number of shades (value settings) can be detected, depending on the hue selected.
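These parameter settings can be checked with Python's standard colorsys module, which expects H, S, and V each normalized to [0, 1]:

```python
import colorsys

# H = 240 degrees -> 240/360 in colorsys's normalized hue scale
dark_blue  = colorsys.hsv_to_rgb(240/360, 1.0, 0.4)   # S = 1.0, V = 0.4
light_blue = colorsys.hsv_to_rgb(240/360, 0.3, 1.0)   # S = 0.3, V = 1.0
print(dark_blue)    # (0.0, 0.0, 0.4): a dark blue
print(light_blue)   # (0.7, 0.7, 1.0): blue with white added
```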
• The vertical axis in the HLS model is called lightness, L. At L = 0, we have black, and at L = 1.0, we have white. Grayscale values are along the L axis, and the pure colors lie on the L = 0.5 plane. Saturation parameter S again specifies the purity of a color. This parameter varies from 0 to 1.0, and pure colors are those for which S = 1.0 and L = 0.5. As S decreases, more white is added to a color. The grayscale line is at S = 0.
• To specify a color, we begin by selecting hue angle H. Then a particular shade, tint, or tone for that hue is obtained by adjusting parameters L and S. We obtain a lighter color by increasing L, and we obtain a darker color by decreasing L. When S is decreased, the spatial color point moves toward the grayscale line (see the check below).
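A quick check of these rules with Python's standard colorsys module; note that its argument order is (H, L, S), each normalized to [0, 1]:

```python
import colorsys

pure_blue = colorsys.hls_to_rgb(240/360, 0.5, 1.0)   # pure hue: L = 0.5, S = 1.0
lighter   = colorsys.hls_to_rgb(240/360, 0.8, 1.0)   # increasing L lightens the color
gray      = colorsys.hls_to_rgb(240/360, 0.5, 0.0)   # S = 0: on the grayscale line
print(pure_blue)   # (0.0, 0.0, 1.0)
print(lighter)     # (0.6, 0.6, 1.0)
print(gray)        # (0.5, 0.5, 0.5)
```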