Digital Image Processing Question Answer Bank
UNIT I
PART- A
1. Define Image
An image may be defined as a two-dimensional light-intensity function f(x, y), where x and
y denote spatial coordinates and the amplitude (value) of f at any point (x, y) is called the
intensity, gray level, or brightness of the image at that point.
3. Define Brightness
Brightness of an object is its perceived luminance, which depends on the luminance of the
surround. Two objects with different surroundings may have identical luminance but different brightness.
27. Find the number of bits required to store a 256 X 256 image with 32 gray levels
32 gray levels = 2^5, so k = 5 bits per pixel
256 × 256 × 5 = 327,680 bits
28. Write the expression to find the number of bits to store a digital image?
The number of bits required to store a digital image of size M × N with 2^k gray levels is
b = M × N × k
When M = N, this equation becomes
b = N²k
PART-C
The problem domain in this example consists of pieces of mail and the objective is to
read the address on each piece. Thus the desired output in this case is a stream of
alphanumeric characters.
The first step in the process is image acquisition, that is, acquiring a digital image. To
do so requires an imaging sensor and the capability to digitize the signal produced by the
sensor.
After the digital image has been obtained, the next step deals with preprocessing
that image. The key function of preprocessing is to improve the image in ways that increase the
chances for success of the subsequent processes.
The next stage deals with segmentation. Broadly defined, segmentation partitions
an input image into its constituent parts or objects. Its key role here is to extract
individual characters and words from the background.
The output of the segmentation stage usually is raw pixel data, constituting either
the boundary of a region or all the points in the region itself.
Choosing a representation is only part of the solution for transforming raw data
into a form suitable for subsequent computer processing. Description, also called feature
selection, deals with extracting features that result in some quantitative information of
interest or that are basic for differentiating one class of objects from another.
The last stage involves recognition and interpretation. Recognition is the process
that assigns a label to an object based on the information provided by its descriptors.
Interpretation involves assigning meaning to an ensemble of recognized objects.
Knowledge about a problem domain is coded into an image processing system in
the form of a knowledge database. This knowledge may be as simple as detailing regions of
an image where the information of interest is known to be located, thus limiting the search
that has to be conducted in seeking that information.
The knowledge base also can be quite complex such as an interrelated list of all
major possible defects in a materials inspection problem or an image database containing
high resolution satellite images of a region in connection with change detection
application.
Although we do not discuss image display explicitly at this point it is
important to keep in mind that viewing the results of image processing can take place at
the output of any step.
GB Sector (120° ≤ H < 240°). If the given value of H is in this sector, we first subtract 120°
from it:
H = H - 120°
Then the RGB components are
R = I(1 - S)
G = I[1 + S cos H / cos(60° - H)]
B = 3I - (R + G)
BR Sector (240° ≤ H ≤ 360°). Finally, if H is in this range, we subtract 240° from it:
H = H - 240°
Then the RGB components are
G = I(1 - S)
B = I[1 + S cos H / cos(60° - H)]
R = 3I - (G + B)
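As a quick check of these sector formulas, the following is a minimal Python sketch of the HSI-to-RGB conversion, assuming H is given in degrees and S, I are normalized to [0, 1]; the function name is illustrative.

```python
import math

def hsi_to_rgb(H, S, I):
    """Sector-based HSI -> RGB conversion (H in degrees, S and I in [0, 1])."""
    H = H % 360.0
    if H < 120.0:                      # RG sector
        B = I * (1 - S)
        R = I * (1 + S * math.cos(math.radians(H)) / math.cos(math.radians(60 - H)))
        G = 3 * I - (R + B)
    elif H < 240.0:                    # GB sector
        H -= 120.0
        R = I * (1 - S)
        G = I * (1 + S * math.cos(math.radians(H)) / math.cos(math.radians(60 - H)))
        B = 3 * I - (R + G)
    else:                              # BR sector
        H -= 240.0
        G = I * (1 - S)
        B = I * (1 + S * math.cos(math.radians(H)) / math.cos(math.radians(60 - H)))
        R = 3 * I - (G + B)
    return R, G, B

# Example: a fully saturated red (H = 0, S = 1, I = 1/3) maps to approximately (1, 0, 0).
print(hsi_to_rgb(0.0, 1.0, 1.0 / 3.0))
```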
6. Describe the basic relationship between the pixels
2-D Mathematical preliminaries
• Neighbours of a pixel
• Adjacency, Connectivity, Regions and Boundaries
• Distance measures
Neighbours of a pixel
• A pixel p at coordinates (x,y) has four horizontal and vertical neighbours whose
coordinates are given by
(x+1,y), (x-1,y), (x,y+1), (x,y-1).
• This set of pixels, called the 4-neighbours of p, is denoted by N4(p). Each pixel is
a unit distance from (x,y) and some of the neighbours of p lie outside the digital
image if (x,y) is on the border of the image.
• The four diagonal neighbours of p have coordinates
(x+1,y+1), (x+1,y-1), (x-1,y+1), (x-1,y-1)
• And are denoted by ND(p). These points together with the 4-neighbours are called
the 8-neighbours of p, denoted by N8(p).
Adjacency, Connectivity, Regions and Boundaries
Three types of adjacency:
• 4-adjacency. Two pixels p and q with values from V are 4-adjacent if q is in the
set N4(p).
• 8-adjacency. Two pixels p and q with values from V are 8-adjacent if q is in the
set N8(p).
• m-adjacency. Two pixels p and q with values from V are m-adjacent if
q is in N4(p), or q is in ND(p) and the set N4(p) ∩ N4(q) has no pixels whose
values are from V.
• A (digital) path (or curve) from pixel p with coordinates (x,y) to pixel q with
coordinates (s,t) is a sequence of distinct pixels with coordinates
(x0,y0), (x1,y1),................(xn,yn)
Where (x0,y0) = (x,y), (xn,yn) = (s,t), and pixels (xi,yi) and (xi-1,yi-1) are adjacent for
1 ≤ i ≤ n. Here n is the length of the path.
Distance measures
• For pixels p,q and z with coordinates (x,y), (s,t) and (v,w) respectively, D is a
distance function or metric if
D(p,q)>=0 (D(p,q)=0 iff p=q),
D(p,q) = D(q,p) and
D(p,z) <= D(p,q) + D(q,z)
• The Euclidean distance between p and q is defined as,
De(p,q) = [(x-s)² + (y-t)²]^(1/2)
• The D4 distance (also called city-block distance) between p and q is defined as
D4(p,q) = |x-s|+|y-t|
• The D8 distance (also called chessboard distance) between p and q is defined as
D8(p,q) = max(|x-s|, |y-t|)
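The three distance measures can be computed directly; the following small Python sketch (helper names are illustrative) shows them for the points p = (0,0) and q = (3,4).

```python
def d4(p, q):
    """City-block distance D4."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def d8(p, q):
    """Chessboard distance D8."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

def de(p, q):
    """Euclidean distance De."""
    return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

p, q = (0, 0), (3, 4)
print(d4(p, q), d8(p, q), de(p, q))   # 7 4 5.0
```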
UNIT II
1. What is the need for transform?
The need for a transform arises because most signals or images are available in the spatial (or time)
domain, i.e. they are measured as a function of position or time. This representation is not always the best.
For many image processing applications, a mathematical transform is applied to the signal or image to
obtain further information that is not readily available in the original domain.
2. Symmetry
W_N^(k + N/2) = -W_N^k

(Slant transform) The N × N slant matrix S_N is generated recursively from S_{N/2}:

S_N = (1/√2) ×
[  1     0    0ᵀ           1     0    0ᵀ          ]
[  a_N   b_N  0ᵀ          -a_N   b_N  0ᵀ          ]   [ S_{N/2}     0     ]
[  0     0    I_{(N/2)-2}  0     0    I_{(N/2)-2} ] × [                   ]
[  0     1    0ᵀ           0    -1    0ᵀ          ]   [   0      S_{N/2}  ]
[ -b_N   a_N  0ᵀ           b_N   a_N  0ᵀ          ]
[  0     0    I_{(N/2)-2}  0     0   -I_{(N/2)-2} ]
A sample (pattern) vector is written as a column: x = [x1, x2, …, xn]ᵀ.

U = Σ_{m=1}^{r} √λ_m ψ_m φ_mᵀ

This equation is called the singular value decomposition of an image.
PART-B
1. Write short notes on Discrete Cosine Transform (DCT)
• 1-D DCT
The 1-D DCT is defined as
C(u) = α(u) Σ_{x=0}^{N-1} f(x) cos[(2x+1)uπ/2N], for u = 0,1,2,…,N-1
The inverse DCT is defined as
f(x) = Σ_{u=0}^{N-1} α(u) C(u) cos[(2x+1)uπ/2N], for x = 0,1,2,…,N-1
In both cases α(u) = √(1/N) for u = 0 and √(2/N) for u = 1,2,…,N-1.
• 2-D DCT
The 2-D DCT is defined as
C(u,v) = α(u) α(v) Σ_{x=0}^{N-1} Σ_{y=0}^{N-1} f(x,y) cos[(2x+1)uπ/2N] cos[(2y+1)vπ/2N]
for u,v = 0,1,2,…,N-1
The inverse DCT is defined as
f(x,y) = Σ_{u=0}^{N-1} Σ_{v=0}^{N-1} α(u) α(v) C(u,v) cos[(2x+1)uπ/2N] cos[(2y+1)vπ/2N]
for x,y = 0,1,2,…,N-1
In both cases α(u) = √(1/N) for u = 0 and √(2/N) for u = 1,2,…,N-1.
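A minimal Python/NumPy sketch of the 1-D DCT pair exactly as defined above (not an optimized implementation); it checks that the inverse recovers the original sequence. Function names are illustrative.

```python
import numpy as np

def dct_1d(f):
    """1-D DCT computed directly from the definition above."""
    N = len(f)
    x = np.arange(N)
    C = np.empty(N)
    for u in range(N):
        alpha = np.sqrt(1.0 / N) if u == 0 else np.sqrt(2.0 / N)
        C[u] = alpha * np.sum(f * np.cos((2 * x + 1) * u * np.pi / (2 * N)))
    return C

def idct_1d(C):
    """Inverse 1-D DCT from the definition above."""
    N = len(C)
    alpha = np.full(N, np.sqrt(2.0 / N)); alpha[0] = np.sqrt(1.0 / N)
    u = np.arange(N)
    f = np.empty(N)
    for x in range(N):
        f[x] = np.sum(alpha * C * np.cos((2 * x + 1) * u * np.pi / (2 * N)))
    return f

f = np.array([52.0, 55, 61, 66, 70, 61, 64, 73])
print(np.allclose(idct_1d(dct_1d(f)), f))   # True: the transform pair is invertible
```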
• Features:
4. Rotation
In polar coordinates x = r cosθ, y = r sinθ, u = ω cosφ, v = ω sinφ, f(x,y) and F(u,v)
become f(r,θ) and F(ω,φ) respectively. Rotating f(x,y) by an angle θ0 rotates F(u,v) by
the same angle; similarly, rotating F(u,v) rotates f(x,y) by the same angle.
i.e.  f(r, θ + θ0) ⇔ F(ω, φ + θ0)
5. Distributivity and scaling
• Distributivity:
The Discrete Fourier Transform and its inverse are distributive over addition but
not over multiplication.
F[f1(x,y)+f2(x,y)]=F[f1(x,y)]+F[f2(x,y)]
F[f1(x,y).f2(x,y)]≠F[f1(x,y)].F[f2(x,y)]
• Scaling
For two scalars a and b,
af(x,y) ⇔ aF(u,v)  and  f(ax, by) ⇔ (1/|ab|) F(u/a, v/b)
6. Laplacian
The Laplacian of a two-variable function f(x,y) is defined as ∇²f(x,y) = ∂²f/∂x² + ∂²f/∂y²
7. Convolution and Correlation
• Convolution
The convolution of two functions f(x) and g(x), denoted f(x)*g(x), is defined by the
integral f(x)*g(x) = ∫_{-∞}^{∞} f(α) g(x-α) dα, where α is a dummy variable of integration.
Convolution of f(x) and g(x) in the spatial domain corresponds to multiplication of their
transforms F(u) and G(u) in the frequency domain (the convolution theorem):
i.e.  f(x)*g(x) ⇔ F(u)G(u)
• Correlation
The correlation of two functions f(x) and g(x), denoted f(x)∘g(x), is defined
by the integral f(x)∘g(x) = ∫_{-∞}^{∞} f*(α) g(x+α) dα, where α is a dummy variable and f* is the complex conjugate of f.
For the discrete case, fe(x)∘ge(x) = (1/M) Σ_{m=0}^{M-1} fe*(m) ge(x+m), where
fe(x) = { f(x), 0 ≤ x ≤ A-1
        { 0,    A ≤ x ≤ M-1
ge(x) = { g(x), 0 ≤ x ≤ B-1
        { 0,    B ≤ x ≤ M-1
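The convolution theorem stated above can be verified numerically. The sketch below (a small illustration, assuming circular/periodic convolution of zero-padded length-M sequences; names are illustrative) compares direct convolution with multiplication of DFTs.

```python
import numpy as np

def circular_convolution_direct(f, g):
    """Circular convolution computed straight from the definition."""
    M = len(f)
    return np.array([sum(f[a] * g[(x - a) % M] for a in range(M)) for x in range(M)])

def circular_convolution_dft(f, g):
    """Same result via the convolution theorem: inverse DFT of F(u)G(u)."""
    return np.real(np.fft.ifft(np.fft.fft(f) * np.fft.fft(g)))

# Both sequences are zero-padded to M >= A + B - 1 to avoid wraparound error.
f = np.array([1.0, 2.0, 3.0, 0.0, 0.0])
g = np.array([1.0, 1.0, 0.0, 0.0, 0.0])
print(circular_convolution_direct(f, g))   # [1. 3. 5. 3. 0.]
print(circular_convolution_dft(f, g))      # same values, confirming f*g <=> F(u)G(u)
```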
• 1D Hadamard Transform
H(u) = (1/N) Σ_{x=0}^{N-1} f(x) (-1)^[Σ_{i=0}^{n-1} b_i(x) b_i(u)]
     = Σ_{x=0}^{N-1} f(x) g(x,u)
where g(x,u) = (1/N) (-1)^[Σ_{i=0}^{n-1} b_i(x) b_i(u)] is the Hadamard kernel, N = 2^n, and
b_k(z) is the kth bit in the binary representation of z.
• 2D Hadamard Transform
H(u,v) = (1/N) Σ_{x=0}^{N-1} Σ_{y=0}^{N-1} f(x,y) (-1)^[Σ_{i=0}^{n-1} (b_i(x) b_i(u) + b_i(y) b_i(v))]
       = Σ_{x=0}^{N-1} Σ_{y=0}^{N-1} f(x,y) g(x,y,u,v)
where g(x,y,u,v) = (1/N) (-1)^[Σ_{i=0}^{n-1} (b_i(x) b_i(u) + b_i(y) b_i(v))]
Similarly,
f(x,y) = (1/N) Σ_{u=0}^{N-1} Σ_{v=0}^{N-1} H(u,v) (-1)^[Σ_{i=0}^{n-1} (b_i(x) b_i(u) + b_i(y) b_i(v))]
       = Σ_{u=0}^{N-1} Σ_{v=0}^{N-1} H(u,v) h(x,y,u,v)
where h(x,y,u,v) = (1/N) (-1)^[Σ_{i=0}^{n-1} (b_i(x) b_i(u) + b_i(y) b_i(v))] is the inverse
kernel. Therefore the forward and inverse kernels are the same.
• Ordered Hadamard Transform
1D Ordered Hadamard Transform
H(u) = (1/N) Σ_{x=0}^{N-1} f(x) (-1)^[Σ_{i=0}^{n-1} b_i(x) p_i(u)]
     = Σ_{x=0}^{N-1} f(x) g(x,u)
where g(x,u) = (1/N) (-1)^[Σ_{i=0}^{n-1} b_i(x) p_i(u)] and p_i(u) is obtained by reordering
the bits of u so that the basis functions appear in order of increasing sequency.
• 2D Ordered HT Pair
H(u,v) = (1/N) Σ_{x=0}^{N-1} Σ_{y=0}^{N-1} f(x,y) (-1)^[Σ_{i=0}^{n-1} (b_i(x) p_i(u) + b_i(y) p_i(v))]
       = Σ_{x=0}^{N-1} Σ_{y=0}^{N-1} f(x,y) g(x,y,u,v)
f(x,y) = (1/N) Σ_{u=0}^{N-1} Σ_{v=0}^{N-1} H(u,v) (-1)^[Σ_{i=0}^{n-1} (b_i(x) p_i(u) + b_i(y) p_i(v))]
       = Σ_{u=0}^{N-1} Σ_{v=0}^{N-1} H(u,v) h(x,y,u,v)
When N=8
u v 0 1 2 3 4 5 6 7
0 + + + + + + + +
1 + + + + - - - -
2 + + - - + + - -
3 + + - - - - + +
4 + - + - + - + -
5 + - + - - + - +
6 + - - + + - - +
7 + - - + - + + -
2D Walsh Transform
W(u,v) = (1/N) Σ_{x=0}^{N-1} Σ_{y=0}^{N-1} f(x,y) Π_{i=0}^{n-1} (-1)^[b_i(x) b_{n-1-i}(u) + b_i(y) b_{n-1-i}(v)]
The forward transformation kernel is g(x,y,u,v) = (1/N) Π_{i=0}^{n-1} (-1)^[b_i(x) b_{n-1-i}(u) + b_i(y) b_{n-1-i}(v)]
so that W(u,v) = Σ_{x=0}^{N-1} Σ_{y=0}^{N-1} f(x,y) g(x,y,u,v)
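A small NumPy sketch of the (unordered) Hadamard kernel defined above; the Walsh/ordered variants only reorder the rows of the same ±1 matrix. Function names are illustrative, and the inverse scaling reflects the 1-D convention used here (forward kernel carries the 1/N factor).

```python
import numpy as np

def bits(z, n):
    """b_i(z): the i-th bit of z, for i = 0..n-1."""
    return [(z >> i) & 1 for i in range(n)]

def hadamard_kernel(N):
    """N x N (unordered) Hadamard kernel g(x,u) = (1/N)(-1)^sum_i b_i(x) b_i(u), N = 2^n."""
    n = N.bit_length() - 1
    G = np.empty((N, N))
    for u in range(N):
        for x in range(N):
            e = sum(bx * bu for bx, bu in zip(bits(x, n), bits(u, n)))
            G[u, x] = (-1) ** e
    return G / N

N = 8
G = hadamard_kernel(N)
f = np.arange(N, dtype=float)
H = G @ f                     # forward 1-D transform H(u) = sum_x f(x) g(x,u)
f_back = N * G @ H            # inverse uses the same +/-1 pattern, without the 1/N factor
print(np.allclose(f_back, f)) # True
```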
Fig. Basic set of the discrete cosine transform. The numbers correspond to the rows of the
transform matrix.
Also, the basis matrices show increased variation as we go from the top-left
matrix, corresponding to the θ_00 coefficient, to the bottom-right matrix, corresponding
to the θ_(N-1)(N-1) coefficient.
The DCT is closely related to the discrete Fourier transform(DFT) and DCT can
be obtained from DFT. In terms of compression, the DCT performs better than the DFT.
In the DFT, to find the Fourier coefficients for a sequence of length N, we assume
that the sequence is periodic with period N. If the actual sequence outside this interval
behaves differently from this periodic extension, the assumption introduces sharp discontinuities
at the beginning and the end of the sequence. In order to represent these sharp discontinuities,
the DFT needs nonzero coefficients for the high-frequency components. Because these
components are needed only at the two end points of the sequence, their effect needs to
be cancelled out at other points in the sequence. Thus, the DFT adjusts other coefficients
accordingly. When we discard the high frequency coefficients during the compression
process, the coefficients that were cancelling out the high-frequency effect in other parts
of the sequence result in the introduction of additional distortion.
The DCT can be obtained using the DFT by mirroring the original N-point
sequence to obtain a 2N-point sequence. The DCT is simply the first N points of the
resulting 2N-point DFT. When we take the DFT of the 2N-point mirrored sequence, we
again have to assume periodicity. Here it does not introduce any sharp discontinuities at
the edges.
The DCT is better at energy compaction for most correlated sources when
compared to the DFT. For Markov sources with a high correlation coefficient
ρ = E[x_n x_{n+1}] / E[x_n²],
the compaction ability of the DCT is very close to that of the KLT. As many sources can
be modelled as Markov sources with high values of ρ, this superior compaction ability
has made the DCT the most popular transform. It is a part of many international
standards, including JPEG,MPEG and CCITT H.261.
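The mirroring construction described above can be checked numerically with the short sketch below, which compares the DCT obtained from a 2N-point FFT of the mirrored sequence against the direct definition given earlier in this unit. Helper names are illustrative; the phase-factor identity used in the comment follows from expanding the DFT of the mirrored sequence.

```python
import numpy as np

def dct_via_mirrored_dft(f):
    """DCT of f from the 2N-point DFT of the mirrored sequence [f, reversed f]."""
    N = len(f)
    Y = np.fft.fft(np.concatenate([f, f[::-1]]))   # mirroring avoids a discontinuity at the seam
    u = np.arange(N)
    alpha = np.where(u == 0, np.sqrt(1.0 / N), np.sqrt(2.0 / N))
    # sum_x f(x) cos[(2x+1)u pi / 2N] = 0.5 * Re{ exp(-j pi u / 2N) * Y(u) }
    return alpha * 0.5 * np.real(np.exp(-1j * np.pi * u / (2 * N)) * Y[:N])

def dct_direct(f):
    """Reference: the DCT definition from earlier in this unit."""
    N = len(f)
    x = np.arange(N)
    return np.array([(np.sqrt(1.0 / N) if u == 0 else np.sqrt(2.0 / N))
                     * np.sum(f * np.cos((2 * x + 1) * u * np.pi / (2 * N)))
                     for u in range(N)])

f = np.random.rand(16)
print(np.allclose(dct_via_mirrored_dft(f), dct_direct(f)))   # True
```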
UNIT III
5. Define histogram.
The histogram of a digital image with gray levels in the range [0, L-1] is a
discrete function h(rk)=nk.
rk-kth gray level
nk-number of pixels in the image having gray level rk.
Sk = T(rk) = Σ_{j=0}^{k} Pr(rj) = Σ_{j=0}^{k} nj/n, where k = 0,1,2,…,L-1
This transformation is called histogram equalization.
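A minimal NumPy sketch of discrete histogram equalization based on the cumulative sum above, assuming an 8-bit image (L = 256); the cumulative distribution is scaled by L-1 so the output is again a gray-level image. Function names are illustrative.

```python
import numpy as np

def histogram_equalize(image, L=256):
    """Discrete histogram equalization: s_k = (L-1) * sum_{j<=k} n_j / n."""
    hist = np.bincount(image.ravel(), minlength=L)       # n_k for each gray level
    cdf = np.cumsum(hist) / image.size                   # running sum of n_j / n
    mapping = np.round((L - 1) * cdf).astype(np.uint8)   # scale back to [0, L-1]
    return mapping[image]

# Example: a dark 4x4 image concentrated in low gray levels gets spread over [0, 255].
img = np.array([[52, 55, 61, 59],
                [79, 61, 76, 61],
                [62, 59, 55, 77],
                [63, 65, 66, 52]], dtype=np.uint8)
print(histogram_equalize(img))
```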
Composite Laplacian sharpening masks (center coefficient A+4 using the 4-neighbours, and A+8 using the 8-neighbours):
 0  -1   0        -1  -1  -1
-1  A+4 -1        -1  A+8 -1
 0  -1   0        -1  -1  -1
f(x,y) → [ H ] → (+) ← η(x,y) → g(x,y)

A system (degradation) operator H, together with an additive white noise term η(x,y),
operates on an input image f(x,y) to produce a degraded image g(x,y).
23. Give the relation for degradation model for continuous function
g(x,y) = ∫_{-∞}^{∞} ∫_{-∞}^{∞} f(α,β) h(x-α, y-β) dα dβ + η(x,y)
31. Which is the most frequent method to overcome the difficulty to formulate the
spatial relocation of pixels?
Tie points (control points) are the most frequently used approach; they are subsets of pixels whose
locations in the input (distorted) and output (corrected) images are known precisely.
32. What are the three methods of estimating the degradation function?
1. Observation
2. Experimentation
3. Mathematical modeling.
Rayleigh noise: mean μ = a + √(πb/4); variance σ² = b(4-π)/4
PART-B
1.Discuss different mean filters
Arithmetic mean filter
f^(x,y) = (1/mn) Σ_{(s,t)∈Sxy} g(s,t)
Geometric mean filter
An image restored using a geometric mean filter is given by the expression
f^(x,y) = [ Π_{(s,t)∈Sxy} g(s,t) ]^(1/mn)
Here, each restored pixel is given by the product of the pixels in the subimage
window, raised to the power 1/mn.
• Harmonic filters
The harmonic mean filtering operation is given by the expression
f^(x,y) = mn / Σ_{(s,t)∈Sxy} [1/g(s,t)]
• Contra harmonic mean filter
Contra harmonic mean filtering operation yields a restored image based on the
expression
f^(x,y) = Σ_{(s,t)∈Sxy} g(s,t)^(Q+1) / Σ_{(s,t)∈Sxy} g(s,t)^Q
where Q is called the order of the filter. This filter is well suited for
reducing or virtually eliminating the effects of salt-and-pepper noise.
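A rough Python sketch of the contraharmonic mean filter of order Q described above; the window size, the edge padding, and the small epsilon guarding negative powers are implementation choices rather than part of the definition.

```python
import numpy as np

def contraharmonic_filter(g, m=3, n=3, Q=1.5):
    """Contraharmonic mean filter over an m x n window.
    Q > 0 removes pepper noise, Q < 0 removes salt noise, Q = 0 gives the arithmetic mean."""
    pad_r, pad_c = m // 2, n // 2
    padded = np.pad(g.astype(float), ((pad_r, pad_r), (pad_c, pad_c)), mode='edge')
    out = np.zeros(g.shape, dtype=float)
    for i in range(g.shape[0]):
        for j in range(g.shape[1]):
            window = padded[i:i + m, j:j + n] + 1e-12   # avoid 0 raised to a negative power
            out[i, j] = np.sum(window ** (Q + 1)) / np.sum(window ** Q)
    return out

noisy = np.array([[100, 100, 100],
                  [100,   0, 100],     # a single pepper (0) pixel
                  [100, 100, 100]], dtype=float)
print(contraharmonic_filter(noisy, Q=1.5).round(1))   # the pepper pixel is pulled back toward 100
```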
G(u,v)=H(u,v)F(u,v)+N(u,v)
Where the terms in capital letters are the Fourier transforms of the corresponding
terms in the previous equation.
Example neighbourhood values: 2, 3, 4, 5, 7, 9, 10, 20, 30
-Take the median value (here 7).
-The median filter is a non-linear spatial filter.
1)median filtering smoothing
2)Max filter
3)Min filter
Max filter:
R = max{ g(s,t) | (s,t) ∈ Sxy }
-The max filter is useful for finding the brightest points in an image.
Min filter:
R = min{ g(s,t) | (s,t) ∈ Sxy }
-It is useful for finding the darkest points in an image.
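A small sketch of median/max/min (order-statistics) filtering over a 3×3 neighbourhood; the function and parameter names are illustrative, and edge pixels are handled by replicating the border.

```python
import numpy as np

def order_statistic_filter(g, size=3, stat='median'):
    """Order-statistics filtering: 'median' smooths impulse noise,
    'max' finds the brightest points, 'min' finds the darkest points."""
    k = size // 2
    padded = np.pad(g, k, mode='edge')
    out = np.empty_like(g)
    pick = {'median': np.median, 'max': np.max, 'min': np.min}[stat]
    for i in range(g.shape[0]):
        for j in range(g.shape[1]):
            out[i, j] = pick(padded[i:i + size, j:j + size])
    return out

g = np.array([[2, 3, 4], [5, 7, 9], [10, 20, 30]])
print(order_statistic_filter(g, stat='median'))   # the centre stays at the median value 7
```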
PART-C
1.Explain Histogram processing
• The histogram of a digital image with gray levels in the range [0, L-1] is the
discrete function p(rk) = nk/n, where rk is the kth gray level, nk is the number of
pixels having that gray level, n is the total number of pixels in the image, and k = 0,1,2,…,L-1.
• P(rk) gives an estimate of the probability of occurrence of gray level rk. The figure
shows the histograms of four basic types of images.
Figure: Histogram corresponding to four basic image types
Histogram Equalization
• Let the variable r represent the gray levels in the image to be enhanced. The pixel
values are treated as continuous quantities normalized to lie in the interval [0,1], with r = 0
representing black and r = 1 representing white.
• We consider transformations of the form
• s = T(r) …………………………………(1)
• which produce a level s for every pixel value r in the original image. T must satisfy the
conditions:
o T(r) is single-valued and monotonically increasing in the interval
0 ≤ r ≤ 1, and
o 0 ≤ T(r) ≤ 1 for 0 ≤ r ≤ 1.
Condition 1 preserves the order from black to white in the gray
scale.
Condition 2 guarantees a mapping that is consistent with the
allowed range of pixel values.
The inverse transformation is r = T⁻¹(s), 0 ≤ s ≤ 1 ………………………..(2)
• The probability density function of the transformed gray level is
Ps(s) = [Pr(r) dr/ds] evaluated at r = T⁻¹(s) …………………….(3)
• Consider the transformation function
s = T(r) = ∫₀^r Pr(w) dw, 0 ≤ r ≤ 1 …………………….(4)
where w is a dummy variable of integration.
From Eqn (4) the derivative of s with respect to r is
ds/dr = Pr(r)
Substituting dr/ds = 1/Pr(r) into Eqn (3) yields
Ps(s) = 1, 0 ≤ s ≤ 1, i.e. the transformed levels have a uniform density.
Histogram Specification
• The histogram equalization method does not lend itself to interactive applications.
• Let Pr(r) and Pz(z) be the original and desired probability density functions. Suppose
histogram equalization is first applied to the original image:
s = T(r) = ∫₀^r Pr(w) dw ………………………………….(5)
W1 W2 W3
W4 W5 W6
W7 W8 W9
• Denoting the gray level of pixels under the mask at any location by
z1,z2,z3……,z9, the response of a linear mask is
R=w1z1+ w2z2 +………..+w9z9
Smoothing Filters
• Lowpass Spatial filtering:
The filter must have all positive coefficients.
The response is the sum of the gray levels of nine pixels, which
could cause R to fall outside the valid gray-level range.
The solution is to scale the sum by dividing R by 9. Masks of this
form are called neighbourhood averaging masks.
1 1 1
1/9 1 1 1
1 1 1
• Median filtering:
To achieve noise reduction rather than blurring,
the gray level of each pixel is replaced by the median of the gray
levels in the neighbourhood of that pixel.
Sharpening Filters
• Basic highpass spatial filtering:
The filter should have positive coefficients near the center and
negative coefficients in the outer periphery.
The sum of the coefficients is 0.
This eliminates the zero-frequency term, reducing significantly the
global contrast of the image.
-1 -1 -1
1/9* -1 8 -1
-1 -1 -1
• High-boost filtering (see the sketch after this list):
The definition is
High-boost = (A)(Original) - Lowpass
           = (A-1)(Original) + Original - Lowpass
           = (A-1)(Original) + Highpass
• Derivative Filters:
Averaging is analogous to integration; differentiation can be expected
to have the opposite effect and thus sharpen the image.
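A short sketch of the high-boost definition above, built from the 3×3 neighbourhood-averaging lowpass mask; the value A = 1.2 is just an example, and the function name is illustrative.

```python
import numpy as np

def high_boost(image, A=1.2):
    """High-boost filtering: (A)(original) - lowpass = (A-1)(original) + highpass."""
    img = image.astype(float)
    padded = np.pad(img, 1, mode='edge')
    lowpass = np.zeros_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            # 3x3 neighbourhood averaging (the 1/9 mask above)
            lowpass[i, j] = padded[i:i + 3, j:j + 3].mean()
    return A * img - lowpass

img = np.array([[10, 10, 10],
                [10, 50, 10],
                [10, 10, 10]], dtype=float)
print(high_boost(img, A=1.2))   # the central bright detail is emphasised relative to the flat background
```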
ln[f(x,y)] = ln[i(x,y) r(x,y)]
F{ln[f(x,y)]} = F{ln[i(x,y)]} + F{ln[r(x,y)]}
where F{ln[i(x,y)]} and F{ln[r(x,y)]} are the Fourier transforms of the log-illumination and
log-reflectance components respectively.
• The inverse (exponential) operation yields the desired enhanced image, denoted by
g(x,y).
Rayleigh noise: mean μ = a + √(πb/4); variance σ² = b(4-π)/4
Gamma noise:
The PDF is
P(z) = aᵇ z^(b-1) e^(-az) / (b-1)!   for z ≥ 0
       0                             for z < 0
mean μ = b/a
variance σ² = b/a²
Exponential noise
The PDF is
P(z) = a e^(-az)   for z ≥ 0
       0           for z < 0
mean μ = 1/a
variance σ² = 1/a²
Uniform noise:
The PDF is
P(z) = 1/(b-a)   if a ≤ z ≤ b
       0         otherwise
mean μ = (a+b)/2
variance σ² = (b-a)²/12
Impulse noise:
The PDF is
P(Z) =Pa for z=a
Pb for z=b
0 Otherwise
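The noise models above can be sampled directly with NumPy's random generators; the sketch below is only illustrative and the parameter values are arbitrary examples (the Rayleigh scale is derived from the PDF form used in these notes).

```python
import numpy as np
rng = np.random.default_rng(0)

def add_impulse_noise(image, Pa=0.05, Pb=0.05, a=0, b=255):
    """Salt-and-pepper (impulse) noise: value a with probability Pa, b with probability Pb."""
    noisy = image.copy()
    r = rng.random(image.shape)
    noisy[r < Pa] = a                       # pepper
    noisy[(r >= Pa) & (r < Pa + Pb)] = b    # salt
    return noisy

# Sampling the other PDFs above (parameter names follow the notes):
rayl = 2 + rng.rayleigh(scale=np.sqrt(4 / 2), size=10_000)   # Rayleigh with a = 2, b = 4
expo = rng.exponential(scale=1 / 0.5, size=10_000)           # exponential with a = 0.5
unif = rng.uniform(low=0, high=8, size=10_000)               # uniform on [a, b] = [0, 8]
print(expo.mean(), unif.mean())          # roughly 1/a = 2 and (a+b)/2 = 4

img = np.full((4, 4), 128, dtype=np.uint8)
print(add_impulse_noise(img, Pa=0.2, Pb=0.2))
```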
UNIT IV
1. What is segmentation?
Segmentation subdivides an image into its constituent regions or objects. The level
to which the subdivision is carried depends on the problem being solved; that is,
segmentation should stop when the objects of interest in an application have been isolated.
7. What is edge?
An edge is a set of connected pixels that lie on the boundary between two regions.
In practice, edges are more closely modeled as having a ramp-like profile. The slope of the ramp is
inversely proportional to the degree of blurring in the edge.
PART –B
1. Write short notes on image segmentation.
• Segmentation subdivides an image into its constituent regions or objects. The level
to which the subdivision is carried depends on the problem being solved.
• Examples: In autonomous air to ground target acquisition applications identifying
vehicles on a road is of interest.
• The first step is to segment the road from the image and then to segment the
elements of the road down to objects of a range of sizes that correspond to potential
vehicles.
• In target acquisition, the system designer has no control over the environment.
• So the usual approach is to focus on selecting the types of sensors most likely to
enhance the objects of interest .
• Example is the use of infrared imaging to detect objects with a strong heat
signature,such as tanks in motion.
• Segmentation algorithms for monochrome images are based on one of the two
basic properties of gray level values . They are discontinuity and similarity.
• Based on the first category ,the approach is based on abrupt changes in gray level
and the areas of interest based on this category are detection of isolated points
and detection of lines and edges in an image.
• Based on the second category the approach is based on thresholding, region
growing and region splitting and merging .
• The concept of segmenting an image based on discontinuity or similarity of the
gray level values of its pixels is applicable to both static and dynamic images.
∇f ≡ grad(f) ≡ [gx, gy]ᵀ = [∂f/∂x, ∂f/∂y]ᵀ
The real-time automatic images processing and pattern recognition are very important for
many problems in medicine, physics, geology, space research, military applications and
so on. For example, it is necessary for pilots and drivers for immediate decision-making
in poor visibility conditions. An approach to image enhancement through artificial neural
network (ANN) processing is proposed. The ANN is used for image enhancement through
approximation of an image transform function T. This function is approximated with the use of
an ANN which is trained evolutionarily during the processing of test images. Each ANN is
genetically encoded as the list of its connections. Truncation selection is used for parental
subpopulation formation. Original crossover and mutation operators, which respect
structures of the ANNs undergoing recombination and mutation, are used. Nodes with
sigmoid activation functions are considered. The population size adapts to the properties
of evolution during the algorithm run using simple resizing strategy. In this application
pixel-by-pixel brightness processing with use of ANN paradigm is adopted. The topology
of ANN is tuned simultaneously with connections weights. The ANN approximating T
function should have three input nodes and one output node. During the training we
evaluate each ANN with respect to the visual quality of the processed images.
The artificial neural network training stage, using a single 128×128-pixel image, takes
about 70 seconds on an Intel Pentium IV 3 GHz processor. After completion of the
learning process the obtained artificial neural network is ready to process
arbitrary images that were not presented during the training. The processing time for a
512×512-pixel image is about 0.25 seconds. The ANN, as a rule, included 3 input nodes,
one or more hidden nodes and one output node.
PART-C
Here, P(Ri) is a logical predicate defined over the points in set Ri and Ф is the null set.
Condition (a) indicates that the segmentation must be complete that is every pixel must
be in a region.
Condition (b) requires that points in a region must be connected in some predefined
sense.
Condition(c) indicates that the regions must be disjoint.
Condition(d) deals with the properties that must be satisfied by the pixels in a segmented
region.
Region Growing:
As its name implies region growing is a procedure that groups pixel or subregions
into larger regions based on predefined criteria. The basic approach is to start with a set
of “seed” points and from these grow regions.
If the result of these computation shows clusters of values, the pixels whose properties
place them near the centroid of these clusters can be used as seeds.
Descriptors alone can yield misleading results if connectivity or adjacency information is
not used in the region growing process.
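A minimal sketch of seed-based region growing using 4-connectivity and a simple gray-level similarity predicate; the threshold value and function name are illustrative choices, not part of the general definition.

```python
import numpy as np
from collections import deque

def region_grow(img, seed, threshold=10):
    """Grow a region from a seed pixel: 4-connected neighbours are appended while their
    gray level differs from the seed value by no more than `threshold`."""
    grown = np.zeros(img.shape, dtype=bool)
    seed_val = float(img[seed])
    queue = deque([seed])
    grown[seed] = True
    while queue:
        x, y = queue.popleft()
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):   # N4 neighbours
            if 0 <= nx < img.shape[0] and 0 <= ny < img.shape[1] and not grown[nx, ny]:
                if abs(float(img[nx, ny]) - seed_val) <= threshold:
                    grown[nx, ny] = True
                    queue.append((nx, ny))
    return grown

img = np.array([[10, 12, 80, 82],
                [11, 13, 81, 83],
                [10, 12, 79, 80]])
print(region_grow(img, seed=(0, 0), threshold=5).astype(int))   # the left (dark) region grows, the right does not
```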
Region Splitting and Merging:
The procedure just discussed grows regions from a set of seed points. An
alternative is to subdivide an image initially into a set of arbitrary, disjoint regions and
then merge and/or split the regions in an attempt to satisfy the conditions.
Fig. Partitioned image: regions R1, R2, R3 and R4, with R4 split into subregions R41, R42, R43, R44.
1. Split into four disjoint quadrants any region Ri for which P(Ri)=FALSE.
2. Merge any adjacent regions Rj and Rk for which P(RjURk)=TRUE.
3. Stop when no further merging or splitting is possible.
The mean and standard deviation of the pixels in a region can be used to quantify the texture of the region.
Role of thresholding:
We introduced a simple model in which an image f(x,y) is formed as the
product of a reflectance component r(x,y) and an illumination component i(x,y).
Consider the computer-generated reflectance function.
The histogram of this function is clearly bimodal and could be portioned easily by
placing a single global threshold, T, in the histogram valley.
Multiplying the reflectance function by the illumination function.
Original valley was virtually eliminated, making segmentation by a single
threshold an impossible task.
Although we seldom have the reflectance function by itself to work with, this
simple illustration shows that the reflective nature of objects and background can
be such that they are separable.
ƒ(x,y)=i(x,y)r(x,y)
Taking the natural logarithm of this equation yields a sum:
z(x,y) = ln f(x,y)
        = ln i(x,y) + ln r(x,y)
        = i′(x,y) + r′(x,y)
If i′(x,y) and r′(x,y) are independent random variables, the histogram of z(x,y) is
given by the convolution of the histograms of i′(x,y) and r′(x,y).
But if i′(x,y) has a broader histogram, the convolution process smears the
histogram of r′(x,y), yielding a histogram for z(x,y) whose shape could be quite
different from that of the histogram of r′(x,y).
The degree of distortion depends on the broadness of the histogram of i′(x,y),
which in turn depends on the nonuniformity of the illumination function.
We have dealt with the logarithm of ƒ(x,y), instead of dealing with the image
function directly.
When access to the illumination source is available, a solution frequently used in
practice to compensate for nonuniformity is to project the illumination pattern
onto a constant, white reflective surface.
This yields an image g(x,y)=ki(x,y), where k is a constant that depends on the
surface and i(x,y) is the illumination pattern.
For any image ƒ(x,y)=i(x,y)r(x,y) obtained from the same illumination function,
simply dividing ƒ(x,y) by g(x,y) yields a normalized function h(x,y)=
ƒ(x,y)/g(x,y)= r(x,y)/k.
Thus, if r(x,y) can be segmented by using a single threshold T, then h(x,y) can be
segmented by using single threshold of value T/k.
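A toy sketch of the normalization just described: dividing f(x,y) by the projected illumination pattern g(x,y) = k i(x,y) and thresholding h(x,y) = r(x,y)/k with a single global value. All numbers and names below are made up purely for illustration.

```python
import numpy as np

def segment_with_illumination_pattern(f, g, t_over_k):
    """Normalize f(x,y) = i(x,y) r(x,y) by the recorded pattern g(x,y) = k i(x,y),
    then threshold h(x,y) = f/g = r(x,y)/k; a threshold T on r(x,y) becomes T/k on h."""
    h = f.astype(float) / np.maximum(g.astype(float), 1e-12)   # avoid division by zero
    return h > t_over_k

# Object reflectance 0.9 on a 0.2 background under a strong illumination ramp;
# thresholding f directly would fail, but h = r/k separates the object cleanly.
i = np.linspace(1.0, 10.0, 8).reshape(1, -1).repeat(4, axis=0)   # illumination i(x,y)
r = np.full((4, 8), 0.2); r[1:3, 2:6] = 0.9                      # reflectance r(x,y)
f = i * r                                                        # observed image
g = 2.0 * i                                                      # pattern on a white surface, k = 2
print(segment_with_illumination_pattern(f, g, t_over_k=0.5 / 2.0).astype(int))
```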
A pattern class is a family of patterns that share some common properties. Pattern
classes are denoted w1, w2, …, wM, where M is the number of classes.
Three principal pattern arrangements used in practice are vectors (for quantitative
descriptions) and strings and trees (for structural descriptions).
Pattern vectors are represented by bold lowercase letters such as x, y, and z, where
each component xi represents the ith descriptor. Pattern vectors are represented as
columns (i.e. n × 1 matrices), or in the equivalent form x = (x1, x2, …, xn)ᵀ, where T indicates transposition.
The nature of the pattern vector depends on the measurement technique used to describe
the physical pattern itself.
Ex. If we want to describe three types of iris flowers (Iris setosa, virginica, and
versicolor) by measuring the width and length of the petals, each flower is represented by the
vector x = [x1, x2]ᵀ, where x1 and x2 correspond to petal width and length respectively. The three
pattern classes are w1, w2, w3, corresponding to the three varieties.
Because the petals of all flowers vary in width and length to some degree the pattern
vectors describing three flowers also will vary, not only between different classes ,but
also with in a class.
The result of this classic feature selection problem shows that the degree of class
separability depends strongly on the choice of pattern measurements selected for an
application.
UNIT V
PART B
Compression: It is the process of reducing the size of the given data or an image. It will
help us to reduce the storage space required to store an image or File.
Data Redundancy:
Data or words that either provide no relevant information or simply
restate what is already known are said to be data redundancy.
The compression ratio is Cr = N1/N2.
Types of Redundancy
3. Psychovisual Redundancy:
Certain information simply has less relative importance than other information in the
normal visual processing. This information is called Psycovisual Redundant.
In this approach the labels for the DC and AC coefficients are coded differently using
Huffman codes. The DC coefficient values are partitioned into categories. The categories are
then Huffman coded. The AC coefficient labels are generated in a slightly different manner. There
are two special codes: End-of-block (EOB) and ZRL.
Category 0:  0
Category 1: -1, 1
Category 2: -3, -2, 2, 3
Category 3: -7, …, -4, 4, …, 7
…
Table: sample table for obtaining the Huffman code for a given label value and run length
To design a Huffman code, we first sort the letters in descending order of probability.
Entropy = H(A) = - Σ_{k=1}^{M} p(a_k) log₂ p(a_k)
        = 0.7667
Find the efficiency
Efficiency = η = Entropy / average length
           = 0.284%
Find redundancy
PART -C
The general compression system model consists of two structural blocks, broadly classified as follows:
1. An Encoder
2. A Decoder.
f(x,y) → Encoder → Channel → Decoder → f^(x,y)
An input image f(x,y) is fed into the encoder, which creates a set of symbols; after
transmission over the channel, the encoded representation is fed into the decoder.
The source encoder removes the input redundancies. The channel
encoder increases the noise immunity of the source encoder's output. If the channel
between the encoder and decoder is noise free, then the channel encoder and decoder can be
omitted.
Mapper → Quantizer → Symbol encoder   (stages of the source encoder)
MAPPER:
It transforms the input data in to a format designed to reduce the interpixel redundancy in
the input image.
QUANTIZER:
It reduces the accuracy of the mapper’s output.
SYMBOL ENCODER:
It creates a fixed or variable length code to represent the quantizer’s output
and maps the output in accordance with the code.
Symbol decoder → Inverse mapper   (stages of the source decoder)
SYMBOL DECODER:
It performs the inverse operation of the source encoder’s symbol encoder,
and the inverse mapper then maps the blocks back to the spatial domain.
This selection guarantees that the largest coefficient will lie in the interval [T0, 2T0). In
each pass, the threshold Ti is reduced to half the value it had in the previous pass:
Ti = (1/2) T(i-1)
For a given value of Ti, we assign one of four possible labels to the coefficients:
1. significant positive (sp)
2. significant negative (sn)
3. zerotree root (zr)
4. isolated zero (iz)
The coefficients labeled significant are simply those that fall in the outer levels of the
quantizer and are assigned an initial reconstructed value of 1.5Ti or -1.5Ti, depending on
whether the coefficient is positive or negative.
Introduction:
-The basic structure of the compression algorithm proposed by mpeg is very similar to
that of ITU-T H.261
-In mpeg the blocks are organized in macro blocks which are defined in the same manner
as that of H.261 algorithm
-The mpeg standard initially had applications that require digital storage and retrieval as a
major focus
Frames
I-Frames
-Mpeg includes some frames periodically that are coded without any reference to the past
Frames. These frames are called I-frames
-I-frames do not use temporal correlation for prediction, so they compress less but allow
random access. Thus the number of frames between two consecutive I-frames is a trade-off
between compression efficiency and convenience.
P and B frames
-In order to improve the compression efficiency mpeg1 algorithm contains two other
types of frames: predictive coded and bidirectionally predictive coded frames
-Generally the compression efficiency of P-frames is substantially higher than that of I-frames
Anchor frames
-The I and P frames are sometimes called anchor frames
-To compensate for the reduction in the amount of compression due to the frequent use of
I-frames, the MPEG standard introduced B-frames
Group of pictures(GOP)
-GOP is a small random access unit in the video sequence
-The GOP structure is set up as a tradeoff between the high compression efficiency of
-Motion compensated coding and the coding and the fast picture acquisition capability of
periodic intra-only processing
-The format for mpeg is very flexible however the mpeg committee has provided some
suggested value for the various parameters
-For Mpeg 1 these suggested values are called the constraint parameter bitstream
MPEG2
-It takes a toolkit approach providing a number of subsets each containing different
options
-For a particular application the user can select from a set of profiles and levels
Types of profiles
-Simple
-Main
-Snr-scalable
-Spatially scalable
-High
-The simple profile does not use B-frames; the removal of B-frames makes the decoder
requirements simpler.
MPEG 4
-Provides a more abstract approach to the coding of multimedia. The standard views the
multimedia scene as a collection of objects. These objects can be coded independently.
-Language called the binary format for scenes based on the virtual reality modeling
language has been developed by Mpeg.
-The protocol for managing the elementary streams and their multiplexed version called
the delivery multimedia integration framework is a part of Mpeg4
-The different objects that makeup the scene are coded and sent to the multiplexer
-The information about the presence of these objects is also provided to the motion
compensator predictor
-It is also used in facial animation controlled by facial definition parameter
-It allows for object scalability.
Fig. MPEG-4 coder blocks: inverse quantizer (Q⁻¹), inverse DCT, frame store with switch,
predictors 1–3, shape coding, and motion estimation.
MPEG7:
-Focus on the development of a multimedia content description interface seems to be
somewhat removed from the study of data compression
-These activities relate to the core principles of data compression which is the
development of compact descriptions of information
In vector quantization we group the source output into blocks or vectors. This vector of
source outputs forms the input to the vector quantizer. At both the encoder and decoder of
the vector quantizer, we have a set of L-dimensional vectors called the codebook of the
vector quantizer. The vectors in this codebook are known as code-vectors. Each code
vector is assigned a binary index.
At the encoder, the input vector is compared to each code-vector in order to find
the code vector closest to the input vector
In order to inform the decoder about which code vector was found to be the
closest to the input vector, we transmit or store the binary index of the code-vector.
Because the decoder has exactly the same codebook, it can retrieve the code vector
Although the encoder has to perform a considerable amount of computation in
order to find the closest reproduction vector to the vector of source outputs, the decoding
consists of a table lookup. This makes vector quantization a very attractive encoding
scheme for applications in which the resources available for decoding are considerably
less than the resources available for encoding.
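A minimal NumPy sketch of the encoder/decoder just described: nearest-code-vector search at the encoder and a simple table lookup at the decoder. The codebook and source vectors are made-up examples, and the function names are illustrative.

```python
import numpy as np

def vq_encode(vectors, codebook):
    """For each input vector, return the index of the closest code-vector (Euclidean distance)."""
    d = np.linalg.norm(vectors[:, None, :] - codebook[None, :, :], axis=2)   # (num_vectors, codebook_size)
    return np.argmin(d, axis=1)

def vq_decode(indices, codebook):
    """Decoding is just a table lookup in the shared codebook."""
    return codebook[indices]

codebook = np.array([[0.0, 0.0], [10.0, 10.0], [0.0, 10.0], [10.0, 0.0]])   # L = 2, four code-vectors
source = np.array([[1.2, 0.7], [9.1, 9.8], [8.7, 1.1]])
idx = vq_encode(source, codebook)
print(idx)                        # indices that would be transmitted or stored, e.g. [0 1 3]
print(vq_decode(idx, codebook))   # reconstruction at the decoder
```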
We want to have the sub interval (tag) in the full [0,1) interval
E1: [0, 0.5),  E1(x) = 2x
E2: [0.5, 1),  E2(x) = 2(x - 0.5)
This process of generating the bits of the tag without waiting to see the entire sequence is
called incremental encoding.
Solution:
• first element is 1
Initialize u0=1 l0=0
l1=0+(1-0)0=0
u1=0+(1-0)0.8=0.8
The interval [0, 0.8) is confined to neither the upper nor the lower half of the unit interval, so we proceed to the next element.
• Second element 3
l2 = 0 + (0.8 - 0)(0.82) = 0.656
u2 = 0 + (0.8 - 0)(1.0) = 0.8
The interval [0.656, 0.8) lies in the upper half of the unit interval, so send the bit 1 and rescale with E2:
l2 = 2(0.656 - 0.5) = 0.312
u2 = 2(0.8 - 0.5) = 0.6
• Third element 2
l3 = 0.312 + (0.6 - 0.312)(0.8) = 0.5424
u3 = 0.312 + (0.6 - 0.312)(0.82) = 0.54816
The interval [0.5424, 0.54816) lies in the upper half, so send 1 and rescale with E2:
l3 = 2(0.5424 - 0.5) = 0.0848
u3 = 2(0.54816 - 0.5) = 0.09632
The interval [0.0848, 0.09632) lies in the lower half, so send 0 and rescale with E1:
l3 = 2(0.0848) = 0.1696
u3 = 2(0.09632) = 0.19264
The interval [0.1696, 0.19264) lies in the lower half, so send 0 and rescale with E1:
l3 = 2(0.1696) = 0.3392
u3 = 2(0.19264) = 0.38528
The interval [0.3392, 0.38528) lies in the lower half, so send 0 and rescale with E1:
l3 = 2(0.3392) = 0.6784
u3 = 2(0.38528) = 0.77056
The interval [0.6784, 0.77056) lies in the upper half, so send 1 and rescale with E2:
l3 = 2(0.6784 - 0.5) = 0.3568
u3 = 2(0.77056 - 0.5) = 0.54112
The interval [0.3568, 0.54112) is confined to neither half of the unit interval, so we proceed to the next element.
• Fourth element 1
l4 = 0.3568 + (0.54112 - 0.3568)(0) = 0.3568
u4 = 0.3568 + (0.54112 - 0.3568)(0.8) = 0.504256
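The incremental encoding steps worked out above can be reproduced with the short sketch below, assuming the same three-letter model (cumulative probabilities 0.8, 0.82, 1.0) and the sequence 1 3 2 1; like the worked example, it ignores the underflow (E3) case.

```python
# Cumulative distribution for the source assumed in the example: F(0)=0, F(1)=0.8, F(2)=0.82, F(3)=1.0
F = {0: 0.0, 1: 0.8, 2: 0.82, 3: 1.0}

def encode_incremental(seq):
    l, u = 0.0, 1.0
    bits = []
    for s in seq:
        width = u - l
        u = l + width * F[s]        # new upper end uses F(s)
        l = l + width * F[s - 1]    # new lower end uses F(s-1)
        # emit bits as soon as the interval is confined to one half (E1/E2 mappings)
        while True:
            if u <= 0.5:            # lower half: E1(x) = 2x, emit 0
                bits.append(0)
                l, u = 2 * l, 2 * u
            elif l >= 0.5:          # upper half: E2(x) = 2(x - 0.5), emit 1
                bits.append(1)
                l, u = 2 * (l - 0.5), 2 * (u - 0.5)
            else:
                break
    return bits, (l, u)

bits, interval = encode_incremental([1, 3, 2, 1])
print(bits)       # [1, 1, 0, 0, 0, 1], matching the bits sent in the worked example
print(interval)   # final interval (0.3568, 0.504256)
```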
Quantization
The JPEG algorithm uses uniform midtread quantization to quantize the various
coefficients. The quantizer step sizes are organized in a table called the quantization table,
as shown below.
Table: Sample Quantization table
16 11 10 16 24 40 51 61
12 12 14 19 26 58 60 55
14 13 16 24 40 57 69 56
14 17 22 29 51 87 80 62
18 22 37 56 68 109 103 77
24 35 55 64 81 104 113 92
49 64 78 87 103 121 120 101
72 92 95 98 112 100 103 99
The label corresponding to the quantized value of the transform coefficient θij is obtained
as
lij = ⌊θij/Qij + 0.5⌋
where Qij is the (i,j)th element of the quantization table. The reconstructed value is
obtained by multiplying the label by the corresponding entry in the quantization table.
Table: The quantizer labels
21 0 0 0 0 0 0 0
-9 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
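A small sketch of the quantize/dequantize step using the sample quantization table above; the input coefficients are made-up values chosen so that the first column reproduces the labels 21, -9, 3 shown in the label table.

```python
import numpy as np

Q = np.array([[16, 11, 10, 16, 24, 40, 51, 61],
              [12, 12, 14, 19, 26, 58, 60, 55],
              [14, 13, 16, 24, 40, 57, 69, 56],
              [14, 17, 22, 29, 51, 87, 80, 62],
              [18, 22, 37, 56, 68, 109, 103, 77],
              [24, 35, 55, 64, 81, 104, 113, 92],
              [49, 64, 78, 87, 103, 121, 120, 101],
              [72, 92, 95, 98, 112, 100, 103, 99]])

def quantize(theta, Q):
    """Uniform midtread quantization: l_ij = floor(theta_ij / Q_ij + 0.5)."""
    return np.floor(theta / Q + 0.5).astype(int)

def dequantize(labels, Q):
    """Reconstruction: multiply each label by the corresponding quantization step."""
    return labels * Q

theta = np.zeros((8, 8))
theta[0, 0], theta[1, 0], theta[2, 0] = 340.0, -110.0, 45.0   # hypothetical DCT coefficients
labels = quantize(theta, Q)
print(labels[:3, 0])                 # [21 -9  3], as in the label table above
print(dequantize(labels, Q)[:3, 0])  # reconstructed values 336, -108, 42
```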
Coding
In this approach the labels for the DC and AC coefficients are coded differently
using Huffman codes. The DC coefficient values are partitioned into categories. The
categories are then Huffman coded. The AC coefficient labels are generated in a slightly different
manner. There are two special codes: End-of-block (EOB) and ZRL.
Category 0:  0
Category 1: -1, 1
Category 2: -3, -2, 2, 3
Category 3: -7, …, -4, 4, …, 7
Table: sample table for obtaining the Huffman code for a given label value and run length
• The model that gives rise to run-length coding is the Capon model [40], a two-state
Markov model with states Sw and Sb.
• The transition probabilities p(w/b) and p(b/w), and the probabilities of being in
each state, p(Sw) and p(Sb), completely specify this model.
• For facsimile images, p(w/w) and p(w/b) are generally significantly higher than
p(b/w) and p(b/b).
• The Markov model is represented by the state diagram.
• The entropy computed using a probability model with the iid assumption is significantly
higher than the entropy obtained using the Markov model.
• Let us try to interpret what the model says about the structure of the data .
• The highly skewed nature of the probabilities p(b/w) and p(w/w),and to a lesser
extent p( w/b) and p(b/b), says that once a pixel takes on a particular color, it is
highly likely that the following pixels will also be of the same color
• So, rather than code the color of each pixel separately , we can simply code
the length of the runs of each color .
• For example, if we had 190 white pixels followed by 30 black pixels ,
followed by another 210 white pixels , instead of coding the 430 pixels
individually, we would code the sequence 190, 30, 210, along with an
indication of the color of the first string of pixels .
• Coding the lengths of runs instead of coding individual values is called run-
length coding
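A minimal run-length encoder matching the example above (190 white pixels, 30 black, 210 white); the chosen representation, the first colour followed by the run lengths, is one simple option and the function name is illustrative.

```python
def run_length_encode(pixels):
    """Encode a binary (black/white) scan line as (first colour, list of run lengths)."""
    if not pixels:
        return None, []
    runs, current, count = [], pixels[0], 0
    for p in pixels:
        if p == current:
            count += 1
        else:
            runs.append(count)      # close the previous run
            current, count = p, 1   # start a new run of the other colour
    runs.append(count)              # close the final run
    return pixels[0], runs

line = ['w'] * 190 + ['b'] * 30 + ['w'] * 210
print(run_length_encode(line))   # ('w', [190, 30, 210]) instead of 430 individual pixel values
```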