EEE6218 Visual Information Processing (VIP) : Topic 01: Digital Imaging
Colour Vision
The electromagnetic spectrum is expressed in terms of wavelength or frequency.
[Figure: the electromagnetic spectrum plotted against frequency (Hz, 10^21 down to 10^4) and wavelength (um, 10^-7 up to 10^10), spanning gamma rays, X-rays, ultraviolet, visible light, infrared, microwaves, and radio.]
Speed of light, c = lambda x f, approximately 3 x 10^8 m/s
The eyeball is about 20 mm in diameter and is enclosed by three main membranes: the cornea and sclera (the outer cover), the choroid, and the retina.
The cornea is a tough, transparent tissue that covers the anterior surface. The sclera is the opaque membrane that covers the remainder of the eyeball.
The choroid contains a network of blood vessels that provide nutrition to the eye. The choroid coat is heavily pigmented to limit extraneous light entering the eye and to reduce backscatter.
The iris (pupil) can contract or dilate from 2 mm to 8 mm; the front of the iris contains the visible pigment of the eye, while the back contains a black pigment.
[Figure: CIE colour chart.]
The density of rods and cones across a cross-section of the right eye is shown in the figure.
The nerves to the rods and cones exit the eye through the optic nerve, which gives rise to the blind spot.
The green and red cones are concentrated in the fovea centralis. The "blue" cones have the highest sensitivity and are mostly found outside the fovea, leading to some distinctions in the eye's blue perception.
Brightness Adaptation
The human eye's ability to discriminate between different intensity levels is an
important consideration for HVS models.
Brightness is the intensity as perceived by the HVS.
Although the range of light intensity levels to which the HVS can adapt is huge, it
cannot operate over such a large range simultaneously.
It accommodates this large variation by changing its overall sensitivity. This is
called brightness adaptation.
Daylight vision (cone vision) adapts much more rapidly to changing light levels.
E.g., adjusting to a change such as coming indoors out of sunlight takes a few seconds.
E.g., you can look into a flashlight in the daytime, but not at night.
The perceived brightness is not a simple function of intensity. This can be
explained using two phenomena:
1. Mach bands
2. Contrast sensitivity
[Figure: Mach bands; the perceived brightness overshoots at the boundaries between regions of constant intensity.]
Contrast Sensitivity
The perceived brightness depends on the contrast with respect to the background.
Compare the shade of the ring before and after placing the solid bar.
Acuity
Recognition acuity: the ability to identify objects. Not a suitable engineering measure.
Detection acuity: the ability to perceive that an object is present (the limit is about 10 photons).
Resolution (or visual) acuity: the ability to resolve fine detail (e.g., the Snellen chart); the clarity or clearness of one's vision.
HVS Model
The HVS model addresses 3 main sensitivity variations as a function of:
- light level
- spatial frequency
- signal content

Sensitivity s = 1 / C, where C = contrast
Contrast C = (Lmax - Lmin) / Lmean, where L = luminance
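The two definitions above can be checked numerically. This is an illustrative sketch: the mean luminance is taken here as the midpoint of Lmax and Lmin, which is an assumption (the notes do not define Lmean).

```python
def contrast(l_max, l_min):
    """Contrast C = (Lmax - Lmin) / Lmean.
    Lmean is assumed to be the midpoint of Lmax and Lmin."""
    l_mean = (l_max + l_min) / 2
    return (l_max - l_min) / l_mean

def sensitivity(c):
    """Sensitivity s = 1 / C."""
    return 1 / c

c = contrast(120.0, 80.0)   # luminance swinging around a mean of 100
print(c)                    # 0.4
print(sensitivity(c))       # 2.5
```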
Variable          | Function                            | Modelled component in the eye
Light level       | Amplitude nonlinearity              | Retina
Spatial frequency | Contrast Sensitivity Function (CSF) | -
Signal content    | Masking                             | Neural circuitry
[Figure: a sinusoidal test pattern of spatial frequency s. The input intensity swings between white and black along distance x; the output of the imaging system swings by a smaller amount.]

Modulation: M(s) = (Imax - Imin) / (Imax + Imin)

Modulation transfer function: mtf(s) = M(s)out / M(s)in x 100%

[Figure: mtf(s) falls from 1 at low spatial frequency towards 0 at the cut-off spatial frequency.]
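As a quick numeric sketch of the two formulas above (the intensity values are made up for illustration):

```python
def modulation(i_max, i_min):
    """Modulation M(s) = (Imax - Imin) / (Imax + Imin)."""
    return (i_max - i_min) / (i_max + i_min)

def mtf(m_out, m_in):
    """Modulation transfer function: mtf(s) = M(s)out / M(s)in x 100%."""
    return m_out / m_in * 100

m_in = modulation(200, 0)    # input swings fully between black and white: 1.0
m_out = modulation(150, 50)  # the system attenuates the swing: 0.5
print(mtf(m_out, m_in))      # 50.0 (percent)
```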
The spatial-domain quantities are analogous to familiar time-domain ones:

Time domain                                 | Spatial domain
temporal frequency                          | spatial frequency
amplitude response (to an impulse function) | modulation transfer function

[Figure: the imaging chain from scene to observer: scene -> optical system -> camera -> studio -> Tx -> Rx -> monitor (picture tube) -> observer.]

mtf(s)system = mtf(s)optics x mtf(s)camera x ... x mtf(s)pic tube

[Figure: system contrast sensitivity against spatial frequency (cycles/degree, shown from 10 to 100); the cut-off is ~70 cycles/degree. Example gratings at 1 cycle/degree and 2 cycles/degree.]
Temporal sensitivity
Temporal sensitivity refers to our response to the timing of visual signals.
Consider a light that is pulsing, such as the blinking cursor on a computer screen. It
is easy to discern that the light is blinking because of the low frequency, or rate at
which it flashes. Imagine gradually increasing the frequency. As the light blinks
faster and faster, it would eventually reach a rate where it would no longer be
possible to detect that the light is flashing, it would look like a "solid", or continuous
light. We say that our visual system has fused the flicker. The frequency at which
that occurs is called the critical flicker frequency (CFF).
Chief among the physical factors that determine the rate at which flicker is fused is
the location of the stimulus within the visual field. The peripheral receptor fields
(composed primarily of rods) behave quite differently with respect to flicker than do
the foveal cones.
A flickering image on the fovea can be detected even if it blinks extremely rapidly; it
is only fused when it reaches a rate of 50 to 70 Hz, depending on the adaptation level.
What resolution is needed for TV or computer displays to match the visual acuity
of humans?
You will need to discover some additional information to answer this question.
Note
Conventional TV systems have the following specification; how does this fit in with
your estimates?
Number of active TV lines = 575
Aspect ratio (picture width : picture height) = 4:3
Frames per second = 25; fields per second = 50

Small-angle approximation: theta = tan^(-1)(y/x) ~ y/x when y/x << 1
The minimum angle we can resolve is 1/12 degree.
Image Acquisition
[Figure: an illumination source lights an object; the imaging system projects the object onto its internal image plane, where a 2D array of sensors produces the output digitized image.]
Image Acquisition
We denote images by 2D functions:
f(x,y), the amplitude of f at coordinates x and y.
(What coordinate system?)
When an image is generated from a physical process, its values are
proportional to the energy radiated by the physical source.
Therefore we can say f(x,y) must be non-zero and finite:
0 < f(x,y) < infinity
Monochrome images
The intensity of a monochrome image at any coordinate (x,y) is called the
gray level, l. We can write
l = f(x,y)
We can define the lower and upper bounds for l.
Image Representation
To generate digital images from the sensed data (a voltage waveform output),
we need to convert the continuous signal into digital form.
Remember digitisation of data? Two processes:
1. Sampling
2. Quantisation
In a 2D array of sensors, the spatial sampling is determined by the
arrangement of the sensors (we will discuss this later).
If there is no motion, the output of each sensor is quantised and the gray-level
values are mapped into discrete levels.
That means we can represent a digital image as a matrix (an example of an
image formed using an M x N array of sensors):
f(x,y) = [ f(0,0)     f(0,1)     ...  f(0,N-1)
           f(1,0)     f(1,1)     ...  f(1,N-1)
           ...
           f(M-1,0)   f(M-1,1)   ...  f(M-1,N-1) ]
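The matrix view can be sketched with a tiny synthetic example; the values here are an illustrative gray ramp, not data from the notes.

```python
# Represent a digital image as an M x N matrix f(x, y) of quantised gray levels.
M, N = 3, 4  # 3 rows (height), 4 columns (width): a toy example
image = [[(x * N + y) * 255 // (M * N - 1) for y in range(N)] for x in range(M)]
for row in image:
    print(row)
# f(0,0) is the top-left sample; f(M-1, N-1) is the bottom-right.
print(image[0][0], image[M - 1][N - 1])  # 0 255
```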
Spatial Resolution
An example: the same image sampled at 256x256, 128x128, 64x64, and 32x32.

Common picture formats:
- 90 x 72
- QCIF: 180 x 144
- CIF: 360 x 288 (VCD resolution)
- 4CIF: 720 x 576 (DVD resolution)
- SDTV, EDTV, HDTV, UHDTV (4 x HDTV)

Compute the aspect ratio: are they 4:3 or 16:9?
p = progressive (non-interlaced); i = interlaced.
How can you compute the total number of bits required to represent an N-bit
image of resolution P x Q?
256 levels: 8 bits
16 levels: 4 bits
4 levels: 2 bits
2 levels: 1 bit
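The bit-count question above works out to N x P x Q. A minimal sketch:

```python
def total_bits(n_bits, p, q):
    """Total storage for an N-bit image of resolution P x Q: N x P x Q bits."""
    return n_bits * p * q

# e.g. an 8-bit (256-level) 256 x 256 image:
bits = total_bits(8, 256, 256)
print(bits)       # 524288 bits
print(bits // 8)  # 65536 bytes (64 KB)
```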
[Figure: a gray-level ramp covering the values 0 to 255, displayed at decreasing bit depths to show the banding introduced by coarser quantisation.]
An Example
[Figure: colour-matching experiment; three primaries C1, C2, and C3 are mixed in varying amounts to match a test colour Cm.]
A demo at http://www.cs.rit.edu/~ncs/color/a_chroma.html
Match to Cm by
varying C1, etc.
[Figure: an original colour image and its Red, Green, and Blue components.]
Elements of a Camera
[Figure: a three-chip camera. The lens system feeds a beam splitter built from coloured semi-silvered mirrors, which directs light onto three CCDs (charge-coupled devices): a red detector, a green detector, and a blue detector.]
[Figure: chromaticity diagram showing hue, saturation, the red, green, and blue primaries, and the reference white.]
Due to the positions of the primary colours and the reference white, producing
the effect of white light to the viewer requires
0.30R + 0.59G + 0.11B
Colour Representation
Colour space models:
1. RGB (additive primaries): the human eye; image capture in cameras.
2. CMY (subtractive primaries): the printing industry.
3. HSB or HSV (Hue-Saturation-Brightness): computer graphics.
4. YCbCr (luma and 2 chroma components, blue chroma and red chroma):
   used since colour television broadcasting was introduced.
RGB -> YCbCr
Y = 0.30 R + 0.59 G + 0.11B
Cb = B - Y
Cr = R - Y
YCbCr -> RGB
B = Cb + Y
R = Cr + Y
G = (Y - 0.30 R - 0.11 B)/0.59
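The conversion above can be sketched directly. Note this is the notes' simplified form: broadcast standards such as ITU-R BT.601 additionally scale and offset Cb and Cr, which is not done here.

```python
def rgb_to_ycbcr(r, g, b):
    """Forward conversion as given in the notes (unscaled colour differences)."""
    y = 0.30 * r + 0.59 * g + 0.11 * b
    return y, b - y, r - y

def ycbcr_to_rgb(y, cb, cr):
    """Inverse conversion as given in the notes."""
    b = cb + y
    r = cr + y
    g = (y - 0.30 * r - 0.11 * b) / 0.59
    return r, g, b

y, cb, cr = rgb_to_ycbcr(255, 0, 0)             # pure red
print(round(y, 1), round(cb, 1), round(cr, 1))  # 76.5 -76.5 178.5
print(ycbcr_to_rgb(y, cb, cr))                  # recovers approximately (255, 0, 0)
```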
[Figure: a colour image shown in RGB and as its Cb and Cr components.]
Display Systems
Class Exercise
What resolution is needed for TV or computer displays to match the
visual acuity of humans?
Note
Conventional TV systems have the following specification:
Number of active tv lines = 575
Aspect ratio (picture width:picture height) = 4:3
Frames per second = 25;
fields per second = 50
How does this fit in with your estimates?
theta = tan^(-1)(y/x) ~ y/x when y/x << 1

We also know the aspect ratio W : H = 4:3 and that the design viewing
distance is about 8H (eight picture heights). Hence:

Vertical viewing angle: H / 8H = 1/8 radian, so H = 7 degrees
Horizontal viewing angle: W / 8H = (4/3) x (1/8) radian, so W = 10 degrees
With a spatial frequency cut-off (both H and W) of ~60 cycles/degree, we can
compute the number of cycles the eye can resolve in both directions:
No. of cycles/picture height = 60 x 7 = ~420
No. of cycles/picture width = 60 x 10 = ~600
Consider a picture composed of discrete elements:
picture elements = pixels = pels.
Now we need to convert the above measurements into pixels.
The Nyquist sampling limit suggests 2 samples per cycle, hence:
No. of pixels/picture height = ~840
No. of pixels/picture width = ~1200
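The steps above can be sketched in a few lines, using the notes' rounded viewing angles of 7 and 10 degrees:

```python
cutoff = 60                # spatial-frequency cut-off, cycles/degree
theta_h, theta_w = 7, 10   # viewing angles in degrees at a distance of 8H
cycles_h = cutoff * theta_h   # ~420 cycles per picture height
cycles_w = cutoff * theta_w   # ~600 cycles per picture width
pixels_h = 2 * cycles_h       # Nyquist: 2 samples per cycle
pixels_w = 2 * cycles_w
print(pixels_w, pixels_h)     # 1200 840
```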
If the temporal cut-off frequency of human vision is ~70 Hz, then we need 140
images per second (remember Nyquist) to give the impression of smoothly moving
images.
This means the bandwidth requirements for monochrome and colour television
are:
Bandwidth
____ pixels/line = ____ pixels/frame
____ bytes/pixel
____ bytes/pixel x ____ pixels/frame = ____ bytes/frame
Now, using the mains frequency and interlacing, we can have a 25 frames/sec
frame rate. This means a data rate of
____ bytes/frame x ____ frames/sec = ____ bytes/sec
Chrominance subsampling
There are three Y Cb Cr formats: 4:4:4, 4:2:2, and 4:2:0.

[Figure: sampling grids showing the positions of the Y, Cb, and Cr samples on each line for the three formats.]

For a W x H image with N bits per sample:

4:4:4  Y: W x H,  Cb: W x H,      Cr: W x H
       Total bits = N(WH + WH + WH) = 3NWH

4:2:2  Y: W x H,  Cb: W/2 x H,    Cr: W/2 x H
       Total bits = N(WH + WH/2 + WH/2) = 2NWH

4:2:0  Y: W x H,  Cb: W/2 x H/2,  Cr: W/2 x H/2
       Total bits = N(WH + WH/4 + WH/4) = 1.5NWH
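The three totals can be computed with one helper; the 4CIF frame size used in the call is just an example taken from the resolutions listed earlier.

```python
def frame_bits(w, h, n, fmt):
    """Bits per frame for a W x H image at N bits/sample in a YCbCr format.
    Subsampling factors from the notes: 4:4:4 -> 3NWH, 4:2:2 -> 2NWH, 4:2:0 -> 1.5NWH."""
    s = {"4:4:4": 3.0, "4:2:2": 2.0, "4:2:0": 1.5}[fmt]
    return s * n * w * h

for fmt in ("4:4:4", "4:2:2", "4:2:0"):
    print(fmt, frame_bits(720, 576, 8, fmt))
# 4:4:4 9953280.0, 4:2:2 6635520.0, 4:2:0 4976640.0
```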
pixels/line x lines/frame = H x W pixels/frame (the number of pixels in the Y channel)
N bits/pixel (bpp)
=> S x N x H x W bits/frame
F frames/sec
=> S x N x H x W x F bits/sec
Considering the chosen subsampling format, S = ?
Another Example
Derive the bit rate for video transmission over mobile networks
using the QCIF 4:2:0 format.
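A sketch of this derivation, using the QCIF size given earlier in these notes (180 x 144), S = 1.5 for 4:2:0, and N = 8 bits/sample. The frame rate of 6.25 fps is an assumption borrowed from the later class exercise.

```python
W, H = 180, 144   # QCIF, as defined earlier in these notes
S = 1.5           # 4:2:0 subsampling factor
N = 8             # bits per sample
F = 6.25          # frames/sec (assumed; the rate used in the later exercise)

bits_per_frame = S * N * W * H
bit_rate = bits_per_frame * F
print(bits_per_frame)  # 311040.0 bits/frame
print(bit_rate)        # 1944000.0 bits/sec, far above a ~128 kbit/s mobile
                       # link, which is why compression is needed
```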
Summary
Spatial resolution: the pixel concept, W x H.
Aspect ratio W : H = 4:3.
Used visual acuity (with a viewing distance of 8H) to find:
H = 7 degrees, W = 10 degrees
W = 10o
2. Compute the data rate for transmitting QCIF 4:2:0, 6.25 fps video over a
mobile communication link.
What is available:
Capacity of a DVD: ~4.5 Gbytes
Standard modem: ~20 Kbits/sec
Broadband modem: ~1 Mbits/sec
Mobile phone: ~128 Kbits/sec
Homework: MATLAB
MATLAB Preliminaries
A good tutorial: MATLAB Primer
http://www.math.toronto.edu/mpugh/primer.pdf
Useful commands: help, lookfor, helpdesk, who, whos
How do you read an image into MATLAB? (hint: lookfor image and
help <command name>)
What image file formats can be read in MATLAB?
Homework: MATLAB
Exercise 1:
Download the image testimage.png from MOLE into your PC
Read testimage.png into MATLAB
What are the image dimensions? How many colour components? What
data type has been used?
Display image using MATLAB
Now convert the image into its luminance format
Save the luminance image and find out the file extension of the saved file
Clear the memory space
Load back the saved image
What is the average luminance value?
Reduce the resolution of the image by 2 and display
What is the current bit depth resolution? Reduce the bit depth resolution
by 2 and display.
Display a 100x128 rectangular region starting from the point (200,128) of
the original luminance image in a new window.
Homework: DSP
Next Lecture: Filtering and Transforms
Revise