Digital Image Processing
Image Negatives:
The negative of an image with gray levels in the range [0, L-1] is obtained by using the negative
transformation shown in Fig.1.1, which is given by the expression
s = L - 1 - r.
Reversing the intensity levels of an image in this manner produces the equivalent of a
photographic negative. This type of processing is particularly suited for enhancing white or gray
detail embedded in dark regions of an image, especially when the black areas are dominant in
size.
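As a quick illustration, the negative transformation is a one-line operation on an array of pixel values. The following is a minimal NumPy sketch (a hypothetical helper; it assumes an 8-bit gray-level image, so L = 256):

```python
import numpy as np

def negative(img, L=256):
    """Photographic negative: s = L - 1 - r for every pixel."""
    img = np.asarray(img, dtype=np.int32)      # widen to avoid unsigned wrap-around
    return (L - 1 - img).astype(np.uint8)

# Example: a faint gray detail embedded in a dominant dark region
img = np.zeros((64, 64), dtype=np.uint8)
img[20:30, 20:30] = 40                         # dim detail
neg = negative(img)                            # detail now stands out on a light background
```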
Fig.1.1 Some basic gray-level transformation functions used for image enhancement
Log Transformations:
The general form of the log transformation is
s = c log(1 + r),
where c is a constant, and it is assumed that r ≥ 0. The shape of the log curve in Fig. 1.1 shows
that this transformation maps a narrow range of low gray-level values in the input image into a
wider range of output levels. The opposite is true of higher values of input levels. We would use a
transformation of this type to expand the values of dark pixels in an image while compressing the
higher-level values. The opposite is true of the inverse log transformation.
Any curve having the general shape of the log functions shown in Fig. 1.1 would accomplish this
spreading/compressing of gray levels in an image. In fact, the power-law transformations
discussed in the next section are much more versatile for this purpose than the log
transformation. However, the log function has the important characteristic that it compresses the
dynamic range of images with large variations in pixel values. A classic illustration of an
application in which pixel values have a large dynamic range is the Fourier spectrum. At the
moment, we are concerned only with the image characteristics of spectra. It is not unusual to
encounter spectrum values that range from 0 to 10^6 or higher. While processing numbers such as
these presents no problems for a computer, image display systems generally will not be able to
reproduce faithfully such a wide range of intensity values. The net effect is that a significant
degree of detail will be lost in the display of a typical Fourier spectrum.
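As a sketch of how this compression is used in practice, the snippet below applies s = c log(1 + r) to a spectrum-like array, choosing c so that the output fills the displayable range [0, 255] (the constant and the synthetic data are illustrative assumptions):

```python
import numpy as np

def log_transform(img, out_max=255):
    """s = c * log(1 + r), with c chosen so the result spans [0, out_max]."""
    r = np.asarray(img, dtype=np.float64)
    c = out_max / np.log1p(r.max())
    return (c * np.log1p(r)).astype(np.uint8)

# A Fourier-spectrum-like array whose values span several orders of magnitude
spectrum = np.abs(np.fft.fft2(np.random.rand(128, 128))) ** 2
display = log_transform(spectrum)   # small values expanded, large peaks compressed
```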
Power-Law Transformations:
Power-law transformations have the basic form
s = c r^γ,
where c and γ are positive constants. Sometimes the equation is written as s = c (r + ε)^γ
to account for an offset (that is, a measurable output when the input is zero). However, offsets
typically are an issue of display calibration and as a result they are normally ignored. Plots
of s versus r for various values of γ are shown in Fig. 1.2. As in the case of the log
transformation, power-law curves with fractional values of γ map a narrow range of dark input
values into a wider range of output values, with the opposite being true for higher values of input
levels. Unlike the log function, however, we notice here a family of possible transformation
curves obtained simply by varying γ. As expected, we see in Fig. 1.2 that curves generated with
values of γ > 1 have exactly the opposite effect as those generated with values of γ < 1. Finally, we
note that the equation reduces to the identity transformation when c = γ = 1. A variety of devices used for
image capture, printing, and display respond according to a power law. By convention, the
exponent in the power-law equation is referred to as gamma. The process used to correct this
power-law response phenomenon is called gamma correction. For example, cathode ray tube
(CRT) devices have an intensity-to-voltage response that is a power function, with exponents
varying from approximately 1.8 to 2.5. With reference to the curve for γ = 2.5 in Fig. 1.2, we see
that such display systems would tend to produce images that are darker than intended.
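A minimal sketch of gamma correction for such a display follows; the exponent 2.5 and the 8-bit range are assumptions for illustration, and the correction simply applies the inverse exponent 1/γ before the image is sent to the display:

```python
import numpy as np

def gamma_correct(img, gamma=2.5, c=1.0):
    """Pre-distort an 8-bit image with s = c * r**(1/gamma) so that a display
    whose response is r**gamma reproduces the intended intensities."""
    r = np.asarray(img, dtype=np.float64) / 255.0      # normalize to [0, 1]
    s = c * np.power(r, 1.0 / gamma)
    return np.clip(s * 255.0, 0, 255).astype(np.uint8)
```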
Fig.1.2 Plots of the equation s = c r^γ for various values of γ (c = 1 in all cases).
The principal advantage of piecewise linear functions over the types of functions we have
discussed above is that the form of piecewise functions can be arbitrarily complex. In fact, as we
will see shortly, a practical implementation of some important transformations can be formulated
only as piecewise functions. The principal disadvantage of piecewise functions is that their
specification requires considerably more user input.
Contrast stretching:
Figure 1.3 (a) shows a typical transformation used for contrast stretching.
The locations of points (r1, s1) and (r2, s2) control the shape of the transformation function.

Fig.1.3 Contrast Stretching (a) Form of transformation function (b) A low-contrast image
(c) Result of contrast stretching (d) Result of thresholding.

If r1 = s1 and r2 = s2, the transformation is a linear function that produces no changes in
gray levels. If r1=r2,s1=0 and s2=L-1, the transformation becomes a thresholding function that
creates a binary image, as illustrated in Fig. 1.3 (b). Intermediate values of (r1 , s1) and (r2 , s2)
produce various degrees of spread in the gray levels of the output image, thus affecting its
contrast. In general, r1 ≤ r2 and s1 ≤ s2 is assumed so that the function is single valued and
monotonically increasing.This condition preserves the order of gray levels, thus preventing the
creation of intensity artifacts in the processed image.
Figure 1.3 (b) shows an 8-bit image with low contrast. Fig. 1.3(c) shows the result of contrast
stretching, obtained by setting (r1 , s1) = (rmin , 0) and (r2 , s2) = (rmax , L-1) where rmin and rmax
denote the minimum and maximum gray levels in the image, respectively. Thus, the
transformation function stretched the levels linearly from their original range to the full range [0,
L-1]. Finally, Fig. 1.3 (d) shows the result of using the thresholding function defined
previously, with r1 = r2 = m, the mean gray level in the image. The original image on which these
results are based is a scanning electron microscope image of pollen, magnified approximately 700
times.
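The piecewise-linear stretch described above can be sketched as follows (a hypothetical helper; choosing (r1, s1) = (rmin, 0) and (r2, s2) = (rmax, L-1) reproduces the full-range stretch, while r1 = r2 = m gives the thresholding case):

```python
import numpy as np

def contrast_stretch(img, r1, s1, r2, s2, L=256):
    """Piecewise-linear transformation through (r1, s1) and (r2, s2), r1 <= r2, s1 <= s2."""
    r = np.asarray(img, dtype=np.float64)
    s = np.piecewise(
        r,
        [r < r1, (r >= r1) & (r <= r2), r > r2],
        [lambda x: s1 * x / max(r1, 1),
         lambda x: s1 + (s2 - s1) * (x - r1) / max(r2 - r1, 1),
         lambda x: s2 + (L - 1 - s2) * (x - r2) / max(L - 1 - r2, 1)])
    return np.clip(s, 0, L - 1).astype(np.uint8)

# Full-range stretch of a low-contrast 8-bit image:
# out = contrast_stretch(img, img.min(), 0, img.max(), 255)
```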
Gray-level slicing:
Highlighting a specific range of gray levels in an image often is desired. Applications include
enhancing features such as masses of water in satellite imagery and enhancing flaws in X-ray
images. There are several ways of doing level slicing, but most of them are variations of two
basic themes. One approach is to display a high value for all gray levels in the range of interest
and a low value for all other gray levels. This transformation, shown in Fig. 1.4 (a), produces a
binary image. The second approach, based on the transformation shown in Fig. 1.4 (b), brightens
the desired range of gray levels but preserves the background and gray-level tonalities in the
image. Figure 1.4(c) shows a gray-scale image, and Fig. 1.4 (d) shows the result of using the
transformation in Fig. 1.4 (a). Variations of the two transformations shown in Fig. 1.4 are easy to
formulate.
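Both slicing variants reduce to simple array masking; a minimal sketch (the range [A, B] and the highlight level are illustrative assumptions):

```python
import numpy as np

def slice_binary(img, A, B, high=255, low=0):
    """Display a high value for gray levels in [A, B] and a low value for all others."""
    img = np.asarray(img)
    return np.where((img >= A) & (img <= B), high, low).astype(np.uint8)

def slice_preserve(img, A, B, high=255):
    """Brighten gray levels in [A, B] but preserve all other levels."""
    img = np.asarray(img)
    return np.where((img >= A) & (img <= B), high, img).astype(np.uint8)
```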
Fig.1.4 (a) This transformation highlights range [A, B] of gray levels and reduces all others
to a constant level (b) This transformation highlights range [A, B] but preserves all other
levels (c) An image (d) Result of using the transformation in (a).
Bit-plane slicing:
Instead of highlighting gray-level ranges, highlighting the contribution made to the total image
appearance by specific bits might be desired. Suppose that each pixel in an image is represented
by 8 bits. Imagine that the image is composed of eight 1-bit planes, ranging from plane 0 for the
least significant bit to plane 7 for the most significant bit. In terms of 8-bit bytes, plane 0
contains all the lowest-order bits of the bytes comprising the pixels in the image and plane 7
contains all the high-order bits. Figure 1.5 illustrates these ideas, and Fig. 1.7 shows the various
bit planes for the image shown in Fig. 1.6. Note that the higher-order bits (especially the top
four) contain the majority of the visually significant data. The other bit planes contribute to more
subtle details in the image. Separating a digital image into its bit planes is useful for analyzing
the relative importance played by each bit of the image, a process that aids in determining the
adequacy of the number of bits used to quantize each pixel.
In terms of bit-plane extraction for an 8-bit image, it is not difficult to show that the (binary)
image for bit-plane 7 can be obtained by processing the input image with a thresholding gray-
level transformation function that (1) maps all levels in the image between 0 and 127 to one level
(for example, 0); and (2) maps all levels between 128 and 255 to another (for example, 255).
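Bit-plane extraction follows directly from this description; a minimal sketch for an 8-bit image:

```python
import numpy as np

def bit_plane(img, plane):
    """Extract bit plane `plane` (0 = least significant, 7 = most significant)
    as a binary image scaled to {0, 255}."""
    img = np.asarray(img, dtype=np.uint8)
    return (((img >> plane) & 1) * 255).astype(np.uint8)

# Plane 7 is equivalent to thresholding at 128: levels 0-127 -> 0, levels 128-255 -> 255.
# plane7 = bit_plane(img, 7)
```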
Fig.1.7 The eight bit planes of the image in Fig.1.6. The number at the bottom, right of each
image identifies the bit plane.
The term spatial domain refers to the aggregate of pixels composing an image. Spatial domain
methods are procedures that operate directly on these pixels. Spatial domain processes will be
denoted by the expression
g(x, y) = T[f(x, y)],
where f(x, y) is the input image, g(x, y) is the processed image, and T is an operator on f, defined
over some neighborhood of (x, y). In addition,T can operate on a set of input images, such as
performing the pixel-by-pixel sum of K images for noise reduction.
The principal approach in defining a neighborhood about a point (x, y) is to use a square or
rectangular subimage area centered at (x, y), as Fig.2.1 shows. The center of the subimage is
moved from pixel to pixel starting, say, at the top left corner. The operator T is applied at each
location (x, y) to yield the output, g, at that location.The process utilizes only the pixels in the
area of the image spanned by the neighborhood.
Although other neighborhood shapes, such as approximations to a circle, sometimes are used,
square and rectangular arrays are by far the most predominant because of their ease of
implementation. The simplest form of T is when the neighborhood is of size 1*1 (that is, a single
pixel). In this case, g depends only on the value of f at (x, y), and T becomes a gray-level (also
called an intensity or mapping) transformation function of the form
s = T(r),
where, for simplicity in notation, r and s are variables denoting, respectively, the gray level of
f(x, y) and g(x, y) at any point (x, y). For example, if T(r) has the form shown in Fig. 2.2(a), the
effect of this transformation would be to produce an image of higher contrast than the original by
darkening the levels below m and brightening the levels above m in the original image. In this
technique, known as contrast stretching, the values of r below m are compressed by the
transformation function into a narrow range of s, toward black.The opposite effect takes place for
values of r above m. In the limiting case shown in Fig. 2.2(b), T(r) produces a two-level (binary)
image. A mapping of this form is called a thresholding function. Some fairly simple, yet
powerful, processing approaches can be formulated with gray-level transformations. Because
enhancement at any point in an image depends only on the gray level at that point, techniques in
this category often are referred to as point processing.
Larger neighborhoods allow considerably more flexibility. The general approach is to use a
function of the values of f in a predefined neighborhood of (x, y) to determine the value of g at
(x, y).One of the principal approaches in this formulation is based on the use of so-called masks
(also referred to as filters, kernels, templates, or windows). Basically, a mask is a small (say,
3*3) 2-D array, such as the one shown in Fig. 2.1, in which the values of the mask coefficients
determine the nature of the process, such as image sharpening.
The histogram of a digital image with gray levels in the range [0, L-1] is a discrete function h(rk)
= nk, where rk is the kth gray level and nk is the number of pixels in the image having gray level
rk. It is common practice to normalize a histogram by dividing each of its values by the total
number of pixels in the image, denoted by n. Thus, a normalized histogram is given by
p(rk) = nk / n,
for k = 0, 1, ..., L-1. Loosely speaking, p(rk) gives an estimate of the probability of occurrence
of gray level rk. Note that the sum of all components of a normalized histogram is equal to 1.
Histograms are the basis for numerous spatial domain processing techniques.Histogram
manipulation can be used effectively for image enhancement. Histograms are simple to calculate
in software and also lend themselves to economic hardware implementations, thus making them
a popular tool for real-time image processing.
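Computing a normalized histogram is a short operation in NumPy; the sketch below assumes an 8-bit image:

```python
import numpy as np

def normalized_histogram(img, L=256):
    """Return p(rk) = nk / n for k = 0, ..., L-1."""
    img = np.asarray(img, dtype=np.uint8)
    counts = np.bincount(img.ravel(), minlength=L)   # nk for each gray level rk
    return counts / img.size                         # p(rk); the values sum to 1
```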
As an illustration, Fig. 3 shows the histograms of four basic types of images. The horizontal axis of each
histogram plot corresponds to gray-level values, rk. The vertical axis corresponds to values of h(rk) = nk, or p(rk) = nk/n if the values are
normalized. Thus, as indicated previously, these histogram plots are simply plots of h(rk) = nk
versus rk or p(rk) = nk/n versus rk.
Fig.3 Four basic image types: dark, light, low contrast, high contrast, and their
corresponding histograms.
We note in the dark image that the components of the histogram are concentrated on the low
(dark) side of the gray scale. Similarly, the components of the histogram of the bright image are
biased toward the high side of the gray scale. An image with low contrast has a histogram that
will be narrow and will be centered toward the middle of the gray scale. For a monochrome
image this implies a dull, washed-out gray look. Finally, we see that the components of the
histogram in the high-contrast image cover a broad range of the gray scale and, further, that the
distribution of pixels is not too far from uniform, with very few vertical lines being much higher
than the others. Intuitively, it is reasonable to conclude that an image whose pixels tend to
occupy the entire range of possible gray levels and, in addition, tend to be distributed
uniformly, will have an appearance of high contrast and will exhibit a large variety of gray tones.
The net effect will be an image that shows a great deal of gray-level detail and has high dynamic
range. It will be shown shortly that it is possible to develop a transformation function that can
automatically achieve this effect, based only on information available in the histogram of the
input image.
Consider for a moment continuous functions, and let the variable r represent the gray levels of
the image to be enhanced. We assume that r has been normalized to the interval [0, 1], with r=0
representing black and r=1 representing white. Later, we consider a discrete formulation and
allow pixel values to be in the interval [0, L-1]. For any r satisfying the aforementioned
conditions, we focus attention on transformations of the form
s = T(r),  0 ≤ r ≤ 1,
that produce a level s for every pixel value r in the original image. For reasons that will become
obvious shortly, we assume that the transformation function T(r) satisfies the following
conditions:
(a) T(r) is single valued and monotonically increasing in the interval 0 ≤ r ≤ 1; and
(b) 0 ≤ T(r) ≤ 1 for 0 ≤ r ≤ 1.
The requirement in (a) that T(r) be single valued is needed to guarantee that the inverse
transformation will exist, and the monotonicity condition preserves the increasing order from
black to white in the output image.A transformation function that is not monotonically increasing
could result in at least a section of the intensity range being inverted, thus producing some
inverted gray levels in the output image. Finally, condition (b) guarantees that the output gray
levels will be in the same range as the input levels. Figure 4.1 gives an example of a
transformation function that satisfies these two conditions. The inverse transformation from s
back to r is denoted
r = T⁻¹(s),  0 ≤ s ≤ 1.
It can be shown by example that even if T(r) satisfies conditions (a) and (b), it is possible that the
corresponding inverse T⁻¹(s) may fail to be single valued.
Fig.4.1 A gray-level transformation function that is both single valued and monotonically
increasing.
The gray levels in an image may be viewed as random variables in the interval [0, 1].One of the
most fundamental descriptors of a random variable is its probability density function (PDF).Let
pr(r) and ps(s) denote the probability density functions of random variables r and s,
respectively,where the subscripts on p are used to denote that pr and ps are different functions.A
basic result from elementary probability theory is that, if pr(r) and T(r) are known and T⁻¹(s)
satisfies condition (a), then the probability density function ps(s) of the transformed variable s
can be obtained using a rather simple formula:
ps(s) = pr(r) |dr/ds|.
Thus, the probability density function of the transformed variable, s, is determined by the gray-
level PDF of the input image and by the chosen transformation function. A transformation
function of particular importance in image processing has the form
s = T(r) = ∫₀^r pr(w) dw,
where w is a dummy variable of integration. The right side of the equation above is recognized as the
cumulative distribution function (CDF) of random variable r. Since probability density functions
are always positive, and recalling that the integral of a function is the area under the function, it
follows that this transformation function is single valued and monotonically increasing, and,
therefore, satisfies condition (a). Similarly, the integral of a probability density function for
variables in the range [0, 1] also is in the range [0, 1], so condition (b) is satisfied as well.
Given transformation function T(r), we find ps(s) by applying the formula above. We know from basic calculus
(Leibniz's rule) that the derivative of a definite integral with respect to its upper limit is simply
the integrand evaluated at that limit. In other words,
ds/dr = dT(r)/dr = pr(r).
Substituting this result for dr/ds, and keeping in mind that all probability values are positive,
yields
ps(s) = pr(r) |dr/ds| = pr(r) · [1 / pr(r)] = 1,  0 ≤ s ≤ 1.
Because ps(s) is a probability density function, it follows that it must be zero outside the interval
[0, 1] in this case because its integral over all values of s must equal 1.We recognize the form of
ps(s) as a uniform probability density function. Simply stated, we have demonstrated that
performing the transformation function yields a random variable s characterized by a uniform
probability density function. It is important to note that T(r) depends
on pr(r) but, as the result above shows, the resulting ps(s) always is uniform, independent of the
form of pr(r). For discrete values we deal with probabilities and summations instead of
probability density functions and integrals. The probability of occurrence of gray level rk in an
image is approximated by
pr(rk) = nk / n,  k = 0, 1, ..., L-1,
where, as noted at the beginning of this section, n is the total number of pixels in the image, nk is
the number of pixels that have gray level rk, and L is the total number of possible gray levels in
the image. The discrete version of the transformation function given above is
sk = T(rk) = Σ (j=0 to k) pr(rj) = Σ (j=0 to k) nj / n,  k = 0, 1, ..., L-1.
Thus, a processed (output) image is obtained by mapping each pixel with level rk in the input
image into a corresponding pixel with level sk in the output image. As indicated earlier, a plot of
pr (rk) versus rk is called a histogram. The transformation (mapping) is called histogram
equalization or histogram linearization. It is not difficult to show that the transformation in Eq.
satisfies conditions (a) and (b) stated previously. Unlike its continuous counterpart, it cannot be
proved in general that this discrete transformation will produce the discrete equivalent of a
uniform probability density function, which would be a uniform histogram.
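The discrete transformation sk = T(rk) translates into a small lookup-table computation; a sketch for an 8-bit image, with the cumulative sum scaled back to [0, L-1]:

```python
import numpy as np

def histogram_equalize(img, L=256):
    """Map each level rk to sk = (L-1) * sum_{j<=k} p(rj), rounded to an integer."""
    img = np.asarray(img, dtype=np.uint8)
    p = np.bincount(img.ravel(), minlength=L) / img.size   # p(rk)
    cdf = np.cumsum(p)                                      # T(rk)
    lut = np.round((L - 1) * cdf).astype(np.uint8)          # sk for every possible rk
    return lut[img]                                         # per-pixel table lookup
```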
Fig.4.2 (a) Images from Fig.3 (b) Results of histogram equalization. (c) Corresponding
histograms.
Let us return for a moment to continuous gray levels r and z (considered continuous random
variables), and let pr(r) and pz(z) denote their corresponding continuous probability density
functions. In this notation, r and z denote the gray levels of the input and output (processed)
images, respectively. We can estimate pr(r) from the given input image, while pz(z) is the
specified probability density function that we wish the output image to have.
Let s be a random variable with the property
s = T(r) = ∫₀^r pr(w) dw,
where w is a dummy variable of integration. Suppose next that we define a random variable z with the property
G(z) = ∫₀^z pz(t) dt = s,
where t is a dummy variable of integration. It then follows from these two equations that
G(z) = T(r) and, therefore, that z must satisfy the condition
z = G⁻¹(s) = G⁻¹(T(r)).
The transformation T(r) can be obtained once pr(r) has been estimated from the input image.
Similarly, the transformation function G(z) can be obtained because pz(z) is given. Assuming
that G-1 exists and that it satisfies conditions (a) and (b) in the histogram equalization process,
the above three equations show that an image with a specified probability density function can be
obtained from an input image by using the following procedure:
(1) Obtain the transformation function T(r) from the histogram of the input image (histogram equalization).
(2) Obtain the transformation function G(z) from the specified pz(z).
(3) Obtain the inverse transformation function G⁻¹.
(4) Obtain the output image by applying z = G⁻¹(T(r)) to all the pixels in the input image.
The result of this procedure will be an image whose gray levels, z, have the specified probability
density function pz(z). Although the procedure just described is straightforward in principle, it is
seldom possible in practice to obtain analytical expressions for T(r) and for G-1. Fortunately, this
problem is simplified considerably in the case of discrete values.The price we pay is the same as
in histogram equalization,where only an approximation to the desired histogram is achievable. In
spite of this, however, some very useful results can be obtained even with crude approximations.
In the discrete case, the histogram-equalization transformation of the input image is
sk = T(rk) = Σ (j=0 to k) pr(rj) = Σ (j=0 to k) nj / n,  k = 0, 1, ..., L-1,
where n is the total number of pixels in the image, nj is the number of pixels with gray level rj,
and L is the number of discrete gray levels. Similarly, the discrete formulation is obtained from
the given histogram pz(zi), i = 0, 1, 2, ..., L-1, and has the form
vk = G(zk) = Σ (i=0 to k) pz(zi) = sk,  k = 0, 1, ..., L-1.
As in the continuous case, we are seeking values of z that satisfy this equation. The variable vk was
added here for clarity in the discussion that follows. Finally, the discrete version of the inverse
mapping is given by
zk = G⁻¹(sk),  k = 0, 1, ..., L-1,
or
zk = G⁻¹(T(rk)),  k = 0, 1, ..., L-1.
Implementation:
We start by noting the following: (1) Each set of gray levels {rj}, {sj}, and {zj}, j = 0, 1, 2, ..., L-
1, is a one-dimensional array of dimension L x 1. (2) All mappings from r to s and from s to z
are simple table lookups between a given pixel value and these arrays. (3) Each of the elements
of these arrays, for example, sk, contains two important pieces of information: the subscript k
denotes the location of the element in the array, and s denotes the value at that location. (4) We
need to be concerned only with integer pixel values. For example, in the case of an 8-bit image,
L=256 and the elements of each of the arrays just mentioned are integers between 0 and 255.This
implies that we now work with gray level values in the interval [0, L-1] instead of the normalized
interval [0, 1] that we used before to simplify the development of histogram processing
techniques.
In order to see how histogram matching actually can be implemented, consider Fig. 5(a),
ignoring for a moment the connection shown between this figure and Fig. 5(c). Figure 5(a)
shows a hypothetical discrete transformation function s=T(r) obtained from a given image. The
first gray level in the image, r1 , maps to s1 ; the second gray level, r2 , maps to s2 ; the kth level rk
maps to sk; and so on (the important point here is the ordered correspondence between these
values). Each value sj in the array is precomputed, so the process of mapping simply uses the
actual value of a pixel as an index in an array to determine the corresponding value of s.This
process is particularly easy because we are dealing with integers. For example, the s mapping for
an 8-bit pixel with value 127 would be found in the 128th position in array {sj} (recall that we
start at 0) out of the possible 256 positions. If we stopped here and mapped the value of each
pixel of an input image by the method just described, the output would be a histogram-equalized image.

Fig.5. (a) Graphical interpretation of mapping from rk to sk via T(r). (b) Mapping of zq to
its corresponding value vq via G(z) (c) Inverse mapping from sk to its corresponding value
of zk.

In order to implement
histogram matching we have to go one step further. Figure 5(b) is a hypothetical transformation
function G obtained from a given histogram pz(z). For any zq , this transformation function yields
a corresponding value vq. This mapping is shown by the arrows in Fig. 5(b). Conversely, given
any value vq, we would find the corresponding value zq from G-1. In terms of the figure, all this
means graphically is that we would reverse the direction of the arrows to map vq into its
corresponding zq. However, we know from the definition that v=s for corresponding subscripts,
so we can use exactly this process to find the zk corresponding to any value sk that we computed
previously from the equation sk = T(rk) .This idea is shown in Fig.5(c).
Since we really do not have the z’s (recall that finding these values is precisely the objective of
histogram matching),we must resort to some sort of iterative scheme to find z from s.The fact
that we are dealing with integers makes this a particularly simple process. Basically, because vk =
sk, we have that the z’s for which we are looking must satisfy the equation G(zk)=sk, or (G(zk)-
sk)=0. Thus, all we have to do to find the value of zk corresponding to sk is to iterate on values of
z such that this equation is satisfied for k = 0, 1, 2, ..., L-1. We do not have to find the inverse of
G because we are going to iterate on z. Since we are dealing with integers, the closest we can get
to satisfying the equation (G(zk) - sk) = 0 is to let zk = ẑ for each value of k, where ẑ is the
smallest integer in the interval [0, L-1] such that
(G(ẑ) - sk) ≥ 0,  k = 0, 1, 2, ..., L-1.
Given a value sk, all this means conceptually in terms of Fig. 5(c) is that we would start with ẑ = 0 and
increase it in integer steps until the inequality above is satisfied, at which point we let zk = ẑ.
Repeating this process for all values of k would yield all the required mappings from s to z, which constitutes the
implementation of histogram matching. In practice, we would not have to start with ẑ = 0 each time because the values
of sk are known to increase monotonically. Thus, for k = k+1, we would start with ẑ = zk and
increment in integer values from there.
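The whole matching procedure, including the search for the smallest ẑ with G(ẑ) ≥ sk, can be sketched as follows for 8-bit images (`target_hist` is a hypothetical specified histogram pz(z) that sums to 1; since G is monotonically increasing, a sorted search replaces the explicit iteration):

```python
import numpy as np

def histogram_match(img, target_hist, L=256):
    """Map r -> s by equalization, then s -> z, where z is the smallest
    integer satisfying G(z) >= s."""
    img = np.asarray(img, dtype=np.uint8)
    p_r = np.bincount(img.ravel(), minlength=L) / img.size
    s = np.round((L - 1) * np.cumsum(p_r))           # sk = T(rk)
    G = np.round((L - 1) * np.cumsum(target_hist))   # G(zq), precomputed table
    # Smallest z with G(z) >= sk; G is nondecreasing, so searchsorted applies.
    z = np.searchsorted(G, s, side="left").clip(0, L - 1).astype(np.uint8)
    return z[img]
```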
The histogram processing methods discussed in the previous two sections are global, in the sense
that pixels are modified by a transformation function based on the gray-level content of an entire
image. Although this global approach is suitable for overall enhancement, there are cases in
which it is necessary to enhance details over small areas in an image. The number of pixels in
these areas may have negligible influence on the computation of a global transformation whose
shape does not necessarily guarantee the desired local enhancement. The solution is to devise
transformation functions based on the gray-level distribution—or other properties—in the
neighborhood of every pixel in the image.
The procedure is to define a square or rectangular neighborhood and move the center of this
area from pixel to pixel. At each location, the histogram of the points in the neighborhood is
computed and either a histogram equalization or histogram specification transformation function
is obtained. This function is then used to map the gray level of the pixel centered in the
neighborhood. The center of the neighborhood region is then moved to an
adjacent pixel location and the procedure is repeated. Since only one new row or column of the
neighborhood changes during a pixel-to-pixel translation of the region, updating the histogram
obtained in the previous location with the new data introduced at each motion step is possible.
This approach has obvious advantages over repeatedly computing the histogram over all pixels
in the neighborhood region each time the region is moved one pixel location. Another approach
sometimes used to reduce computation is to utilize nonoverlapping regions, but this method
usually produces an undesirable checkerboard effect.
The difference between two images f(x, y) and h(x, y), expressed as
g(x, y) = f(x, y) - h(x, y),
is obtained by computing the difference between all pairs of corresponding pixels from f and h.
The key usefulness of subtraction is the enhancement of differences between images. The higher-
order bit planes of an image carry a significant amount of visually relevant detail, while the
lower planes contribute more to fine (often imperceptible) detail. Figure 7(a) shows the fractal
image used earlier to illustrate the concept of bit planes. Figure 7(b) shows the result of
discarding (setting to zero) the four least significant bit planes of the original image. The images
are nearly identical visually, with the exception of a very slight drop in overall contrast due to
less variability of the gray-level values in the image of Fig. 7(b). The pixel-by-pixel difference
between these two images is shown in Fig. 7(c). The differences in pixel values are so small that
the difference image appears nearly black when displayed on an 8-bit display. In order to bring
out more detail,we can perform a contrast stretching transformation. We chose histogram
equalization, but an appropriate power-law transformation would have done the job also. The
result is shown in Fig. 7(d). This is a very useful image for evaluating the effect of setting to zero
the lower-order planes.
Fig.7 (a) Original fractal image (b) Result of setting the four lower-order bit planes to zero
(c) Difference between (a) and(b) (d) Histogram equalized difference image.
One of the most commercially successful and beneficial uses of image subtraction is in the area
of medical imaging called mask mode radiography. In this case h(x, y), the mask, is an X-ray
image of a region of a patient’s body captured by an intensified TV camera (instead of traditional
X-ray film) located opposite an X-ray source.The procedure consists of injecting a contrast
medium into the patient’s bloodstream, taking a series of images of the same anatomical region
as h(x, y), and subtracting this mask from the series of incoming images after injection of the
contrast medium. The net effect of subtracting the mask from each sample in the incoming
stream of TV images is that the areas that are different between f(x, y) and h(x, y) appear in the
output image as enhanced detail. Because images can be captured at TV rates, this procedure in
essence gives a movie showing how the contrast medium propagates through the various arteries
in the area being observed.
Consider a noisy image g(x, y) formed by the addition of noise h(x, y) to an original image
f(x, y); that is,
g(x, y) = f(x, y) + h(x, y),
where the assumption is that at every pair of coordinates (x, y) the noise is uncorrelated and has
zero average value. The objective of the following procedure is to reduce the noise content by
adding a set of noisy images, {gi(x, y)}. If the noise satisfies the constraints just stated, it can be
shown that if an image ḡ(x, y) is formed by averaging K different noisy images,
ḡ(x, y) = (1/K) Σ (i=1 to K) gi(x, y),
then it follows that
E{ḡ(x, y)} = f(x, y)
and
σ²ḡ(x, y) = (1/K) σ²h(x, y),
where E{ḡ(x, y)} is the expected value of ḡ, and σ²ḡ(x, y) and σ²h(x, y) are the variances of ḡ and
h at coordinates (x, y). As K increases, the above equations indicate that the variability (noise) of
the pixel values at each location (x, y) decreases. Because E{ḡ(x, y)} = f(x, y), this means that
ḡ(x, y) approaches f(x, y) as the number of noisy images used in the averaging process increases.
In practice, the images gi(x, y) must be registered (aligned) in order to avoid the introduction of
blurring and other artifacts in the output image.
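A short simulation makes the effect of K concrete (the clean image and the Gaussian noise parameters are illustrative assumptions):

```python
import numpy as np

def average_images(noisy_stack):
    """g_bar(x, y) = (1/K) * sum_i g_i(x, y) over a stack of K registered images."""
    stack = np.asarray(noisy_stack, dtype=np.float64)
    return stack.mean(axis=0)

# Example: averaging K noisy copies reduces the noise standard deviation by sqrt(K).
f = np.full((64, 64), 100.0)                                # clean image
K = 16
noisy = [f + np.random.normal(0.0, 20.0, f.shape) for _ in range(K)]
g_bar = average_images(noisy)                               # residual noise std ~ 20 / 4
```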
Some neighborhood operations work with the values of the image pixels in the neighborhood and
the corresponding values of a subimage that has the same dimensions as the neighborhood. The
subimage is called a filter, mask, kernel, template, or window, with the first three terms being the
most prevalent terminology. The values in a filter subimage are referred to as coefficients, rather
than pixels. The concept of filtering has its roots in the use of the Fourier transform for signal
processing in the so-called frequency domain. We use the term spatial filtering to differentiate
this type of process from the more traditional frequency domain filtering.
The mechanics of spatial filtering are illustrated in Fig.9.1. The process consists simply of
moving the filter mask from point to point in an image. At each point (x, y), the response of the
filter at that point is calculated using a predefined relationship. The response is given by a sum
of products of the filter coefficients and the corresponding image pixels in the area spanned by
the filter mask. For the 3 x 3 mask shown in Fig. 9.1, the result (or response), R, of linear
filtering with the filter mask at a point (x, y) in the image is
R = w(-1,-1) f(x-1, y-1) + w(-1,0) f(x-1, y) + ... + w(0,0) f(x, y) + ... + w(1,0) f(x+1, y) + w(1,1) f(x+1, y+1),
which we see is the sum of products of the mask coefficients with the corresponding pixels
directly under the mask. Note in particular that the coefficient w(0, 0) coincides with image
value f(x, y), indicating that the mask is centered at (x, y) when the computation of the sum of
products takes place. For a mask of size m x n,we assume that m=2a+1 and n=2b+1,where a and
b are nonnegative integers.
Fig.9.1 The mechanics of spatial filtering. The magnified drawing shows a 3X3 mask and
the image section directly under it; the image section is shown displaced out from under the
mask for ease of readability.
In general, linear filtering of an image f of size M x N with a filter mask of size m x n is given by
the expression
g(x, y) = Σ (s=-a to a) Σ (t=-b to b) w(s, t) f(x+s, y+t),
where, from the previous paragraph, a=(m-1)/2 and b=(n-1)/2. To generate a complete filtered
image this equation must be applied for x=0,1,2,……, M-1 and y=0,1,2,……, N-1. In this way,
we are assured that the mask processes all pixels in the image. It is easily verified when m=n=3
that this expression reduces to the example given in the previous paragraph.
The process of linear filtering is similar to a frequency domain concept called convolution. For
this reason, linear spatial filtering often is referred to as “convolving a mask with an image.”
Similarly, filter masks are sometimes called convolution masks. The term convolution kernel
also is in common use. When interest lies on the response, R, of an m x n mask at any point
(x,y), and not on the mechanics of implementing mask convolution, it is common practice to
simplify the notation by using the following expression:
R = w1 z1 + w2 z2 + ... + wmn zmn = Σ (i=1 to mn) wi zi,
where the w's are mask coefficients, the z's are the values of the image gray levels corresponding
to those coefficients, and mn is the total number of coefficients in the mask. For the 3 x 3 general
mask shown in Fig. 9.2 the response at any point (x, y) in the image is
R = w1 z1 + w2 z2 + ... + w9 z9 = Σ (i=1 to 9) wi zi.
When the center of the filter mask approaches the border of the image, part of the mask falls
outside the image. A simple remedy is to pad the image with rows and columns of 0's (or some
other constant gray level), or to pad by replicating rows or columns. This keeps the size of the
filtered image the same as the original, but the values of the padding
will have an effect near the edges that becomes more prevalent as the size of the mask
increases.The only way to obtain a perfectly filtered result is to accept a somewhat smaller
filtered image by limiting the excursions of the center of the filter mask to a distance no less than
(n-1)/2 pixels from the border of the original image.
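The double-sum expression maps directly onto code; the sketch below uses zero padding and explicit loops to keep the mechanics visible (library routines such as scipy.ndimage would normally be used instead):

```python
import numpy as np

def linear_filter(f, w):
    """g(x, y) = sum_s sum_t w(s, t) f(x+s, y+t), with zero padding at the borders."""
    f = np.asarray(f, dtype=np.float64)
    w = np.asarray(w, dtype=np.float64)
    m, n = w.shape                       # mask size, assumed odd in both dimensions
    a, b = (m - 1) // 2, (n - 1) // 2
    fp = np.pad(f, ((a, a), (b, b)))     # zero padding keeps the output M x N
    g = np.zeros_like(f)
    for x in range(f.shape[0]):
        for y in range(f.shape[1]):
            g[x, y] = np.sum(w * fp[x:x + m, y:y + n])
    return g
```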
Smoothing filters are used for blurring and for noise reduction. Blurring is used in preprocessing
steps, such as removal of small details from an image prior to (large) object extraction, and
bridging of small gaps in lines or curves. Noise reduction can be accomplished by blurring with a
linear filter and also by non-linear filtering.
The output (response) of a smoothing, linear spatial filter is simply the average of the pixels
contained in the neighborhood of the filter mask. These filters sometimes are called averaging
filters. The idea behind smoothing filters is straightforward.By replacing the value of every pixel
in an image by the average of the gray levels in the neighborhood defined by the filter mask, this
process results in an image with reduced “sharp” transitions in gray levels. Because random
noise typically consists of sharp transitions in gray levels, the most obvious application of
smoothing is noise reduction.However, edges (which almost always are desirable features of an
image) also are characterized by sharp transitions in gray levels, so averaging filters have the
undesirable side effect that they blur edges. Another application of this type of process includes
the smoothing of false contours that result from using an insufficient number of gray levels.
Figure 10.1 shows two 3 x 3 smoothing filters. Use of the first filter yields the standard average
of the pixels under the mask. This can best be seen by substituting the coefficients of the mask into
R = (1/9) Σ (i=1 to 9) zi,
which is the average of the gray levels of the pixels in the 3 x 3 neighborhood defined by the
mask. Note that, instead of being 1/9, the coefficients of the filter are all 1's. The idea here is that
it is computationally more efficient to have coefficients valued 1. At the end of the filtering
process the entire image is divided by 9. An m x n mask would have a normalizing constant
equal to 1/mn.
A spatial averaging filter in which all coefficients are equal is sometimes called a box filter.
The second mask shown in Fig.10.1 is a little more interesting. This mask yields a so-called
weighted average, terminology used to indicate that pixels are multiplied by different
coefficients, thus giving more importance (weight) to some pixels at the expense of others. In the
mask shown in Fig. 10.1(b) the pixel at the center of the mask is multiplied by a higher value
than any other, thus giving this pixel more importance in the calculation of the average.The other
pixels are inversely weighted as a function of their distance from the center of the mask. The
diagonal terms are further away from the center than the orthogonal neighbors (by a factor of √2)
and, thus, are weighed less than these immediate neighbors of the center pixel. The basic strategy
behind weighing the center point the highest and then reducing the value of the coefficients as a
function of increasing distance from the origin is simply an attempt to reduce blurring in the
smoothing process. We could have picked other weights to accomplish the same general
objective. However, the sum of all the coefficients in the mask of Fig. 10.1(b) is equal to 16, an
attractive feature for computer implementation because it is an integer power of 2. In practice,
it is difficult in general to see differences between images smoothed by using either of the masks
in Fig. 10.1, or similar arrangements, because the area these masks span at any one location in an
image is so small.
The general implementation for filtering an M x N image with a weighted averaging filter of size
m x n (m and n odd) is given by the expression
g(x, y) = [ Σ (s=-a to a) Σ (t=-b to b) w(s, t) f(x+s, y+t) ] / [ Σ (s=-a to a) Σ (t=-b to b) w(s, t) ].
Order-statistics filters are nonlinear spatial filters whose response is based on ordering (ranking)
the pixels contained in the image area encompassed by the filter, and then replacing the value of
the center pixel with the value determined by the ranking result. The best-known example in this
category is the median filter, which, as its name implies, replaces the value of a pixel by the
median of the gray levels in the neighborhood of that pixel (the original value of the pixel is
included in the computation of the median). Median filters are quite popular because, for certain
types of random noise, they provide excellent noise-reduction capabilities, with considerably less
blurring than linear smoothing filters of similar size. Median filters are particularly effective in
the presence of impulse noise, also called salt-and-pepper noise because of its appearance as
white and black dots superimposed on an image.
The median, ε, of a set of values is such that half the values in the set are less than or equal to ε,
and half are greater than or equal to ε. In order to perform median filtering at a point in an image,
we first sort the values of the pixel in question and its neighbors, determine their median, and
assign this value to that pixel. For example, in a 3 x 3 neighborhood the median is the 5th largest
value, in a 5 x 5 neighborhood the 13th largest value, and so on. When several values in a
neighborhood are the same, all equal values are grouped. For example, suppose that a 3 x 3
neighborhood has values (10, 20, 20, 20, 15, 20, 20, 25, 100). These values are sorted as (10, 15,
20, 20, 20, 20, 20, 25, 100), which results in a median of 20. Thus, the principal function of
median filters is to force points with distinct gray levels to be more like their neighbors. In fact,
isolated clusters of pixels that are light or dark with respect to their neighbors, and whose area is
less than n²/2 (one-half the filter area), are eliminated by an n x n median filter. In this case
“eliminated” means forced to the median intensity of the neighbors. Larger clusters are affected
considerably less.
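A minimal median-filter sketch with an n x n neighborhood (border pixels are replicated here; in practice a routine such as scipy.ndimage.median_filter does the same job):

```python
import numpy as np

def median_filter(img, n=3):
    """Replace each pixel with the median of its n x n neighborhood."""
    f = np.asarray(img, dtype=np.float64)
    a = n // 2
    padded = np.pad(f, a, mode="edge")            # replicate border pixels
    out = np.empty_like(f)
    for x in range(f.shape[0]):
        for y in range(f.shape[1]):
            out[x, y] = np.median(padded[x:x + n, y:y + n])
    return out

# Example from the text: the median of (10, 20, 20, 20, 15, 20, 20, 25, 100) is 20.
```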
11. What is meant by the Gradient and the Laplacian? Discuss their role in
image enhancement.
The approach basically consists of defining a discrete formulation of the second-order derivative
and then constructing a filter mask based on that formulation. We are interested in isotropic
filters, whose response is independent of the direction of the discontinuities in the image to
which the filter is applied. In other words, isotropic filters are rotation invariant, in the sense that
rotating the image and then applying the filter gives the same result as applying the filter to the
image first and then rotating the result.
It can be shown (Rosenfeld and Kak [1982]) that the simplest isotropic derivative operator is the
Laplacian, which, for a function (image) f(x, y) of two variables, is defined as
∇²f = ∂²f/∂x² + ∂²f/∂y².
Because derivatives of any order are linear operations, the Laplacian is a linear operator. In order
to be useful for digital image processing, this equation needs to be expressed in discrete form.
There are several ways to define a digital Laplacian using neighborhoods. Taking
into account that we now have two variables, we use the following notation for the partial
second-order derivative in the x-direction:
∂²f/∂x² = f(x+1, y) + f(x-1, y) - 2f(x, y),
and, similarly, in the y-direction:
∂²f/∂y² = f(x, y+1) + f(x, y-1) - 2f(x, y).
The digital implementation of the two-dimensional Laplacian is obtained by summing these two components:
∇²f = [f(x+1, y) + f(x-1, y) + f(x, y+1) + f(x, y-1)] - 4f(x, y).
This equation can be implemented using the mask shown in Fig.11.1(a), which gives an isotropic
result for rotations in increments of 90°.
The diagonal directions can be incorporated in the definition of the digital Laplacian by adding
two more terms to Eq., one for each of the two diagonal directions.The form of each new term is
the same as either Eq.
Fig.11.1. (a) Filter mask used to implement the digital Laplacian (b) Mask used to
implement an extension of this equation that includes the diagonal neighbors. (c) and (d)
Two other implementations of the Laplacian.
but the coordinates are along the diagonals. Since each diagonal term also contains a –2f(x, y)
term, the total subtracted from the difference terms now would be –8f(x, y). The mask used to
implement this new definition is shown in Fig.11.1(b). This mask yields isotropic results for
increments of 45°. The other two masks shown in Fig. 11 also are used frequently in practice.
They are based on a definition of the Laplacian that is the negative of the one we used here. As
such, they yield equivalent results, but the difference in sign must be kept in mind when
combining (by addition or subtraction) a Laplacian-filtered image with another image.
Because the Laplacian is a derivative operator, its use highlights gray-level discontinuities in an
image and deemphasizes regions with slowly varying gray levels.This will tend to produce
images that have grayish edge lines and other discontinuities, all superimposed on a dark,
featureless background.Background features can be “recovered” while still preserving the
sharpening effect of the Laplacian operation simply by adding the original and Laplacian images.
As noted in the previous paragraph, it is important to keep in mind which definition of the
Laplacian is used. If the definition used has a negative center coefficient, then we subtract, rather
than add, the Laplacian image to obtain a sharpened result. Thus, the basic way in which we use
the Laplacian for image enhancement is as follows:
g(x, y) = f(x, y) - ∇²f(x, y)   if the center coefficient of the Laplacian mask is negative,
g(x, y) = f(x, y) + ∇²f(x, y)   if the center coefficient of the Laplacian mask is positive.
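A sketch of this sharpening rule using the mask of Fig. 11.1(a) (negative center coefficient, so the Laplacian image is subtracted); scipy.ndimage.convolve is used for the mask operation:

```python
import numpy as np
from scipy.ndimage import convolve

def laplacian_sharpen(img):
    """g(x, y) = f(x, y) - Laplacian(f), using the 4-neighbor mask with center -4."""
    f = np.asarray(img, dtype=np.float64)
    mask = np.array([[0,  1, 0],
                     [1, -4, 1],
                     [0,  1, 0]], dtype=np.float64)
    lap = convolve(f, mask, mode="nearest")
    return np.clip(f - lap, 0, 255).astype(np.uint8)
```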
First derivatives in image processing are implemented using the magnitude of the gradient. For a
function f(x, y), the gradient of f at coordinates (x, y) is defined as the two-dimensional column
vector
∇f = [Gx, Gy]^T = [∂f/∂x, ∂f/∂y]^T,
whose magnitude is
∇f = mag(∇f) = [Gx² + Gy²]^(1/2).
The components of the gradient vector itself are linear operators, but the magnitude of this vector
obviously is not because of the squaring and square root operations. On the other hand, the
partial derivatives are not rotation invariant (isotropic), but the magnitude of the gradient vector
is. Although it is not strictly correct, the magnitude of the gradient vector often is referred to as
the gradient.
The computational burden of implementing over an entire image is not trivial, and it is common
practice to approximate the magnitude of the gradient by using absolute values instead of squares
and square roots:
∇f ≈ |Gx| + |Gy|.
This equation is simpler to compute and it still preserves relative changes in gray levels, but the
isotropic feature property is lost in general. However, as in the case of the Laplacian, the
isotropic properties of the digital gradient defined in the following paragraph are preserved only
for a limited number of rotational increments that depend on the masks used to approximate the
derivatives. As it turns out, the most popular masks used to approximate the gradient give the
same result only for vertical and horizontal edges and thus the isotropic properties of the gradient
are preserved only for multiples of 90°.
If we use absolute values, then substituting the quantities in the equations gives us the following
approximation to the gradient:
∇f ≈ |z9 - z5| + |z8 - z6|.
This equation can be implemented with the two masks shown in Figs. 11.2(b) and (c). These
masks are referred to as the Roberts cross-gradient operators. Masks of even size are awkward to
implement. The smallest filter mask in which we are interested is of size 3 x 3. An approximation
using absolute values, still at point z5, but using a 3 x 3 mask, is
∇f ≈ |(z7 + 2z8 + z9) - (z1 + 2z2 + z3)| + |(z3 + 2z6 + z9) - (z1 + 2z4 + z7)|.
The difference between the third and first rows of the 3 x 3 image region approximates the
derivative in the x-direction, and the difference between the third and first columns approximates
the derivative in the y-direction. The masks shown in Figs. 11.2(d) and (e), called the Sobel
operators, can be used to implement these two differences. The idea behind using a weight value of 2 is to achieve some smoothing by giving
more importance to the center point. Note that the coefficients in all the masks shown in Fig.
11.2 sum to 0, indicating that they would give a response of 0 in an area of constant gray level,
as expected of a derivative operator.
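A sketch of the gradient magnitude with the Sobel masks and the absolute-value approximation |Gx| + |Gy|:

```python
import numpy as np
from scipy.ndimage import convolve

def sobel_gradient(img):
    """Approximate the gradient magnitude as |Gx| + |Gy| using the Sobel masks."""
    f = np.asarray(img, dtype=np.float64)
    gx_mask = np.array([[-1, -2, -1],
                        [ 0,  0,  0],
                        [ 1,  2,  1]], dtype=np.float64)   # third row minus first row
    gy_mask = gx_mask.T                                    # third column minus first column
    gx = convolve(f, gx_mask, mode="nearest")
    gy = convolve(f, gy_mask, mode="nearest")
    return np.abs(gx) + np.abs(gy)
```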
Fig.11.2 A 3 x 3 region of an image (the z’s are gray-level values) and masks used to
compute the gradient at the point labeled z5. All mask coefficients sum to zero, as expected of
a derivative operator.
The frequency domain methods of image enhancement are based on the convolution theorem. This is
represented as
g(x, y) = h(x, y) * f(x, y),
where
g(x, y) = resultant image,
h(x, y) = position-invariant operator, and
f(x, y) = input image.
The Fourier transform representation of the equation above is
G(u, v) = H(u, v) F(u, v).
The function H(u, v) in this equation is called the transfer function. It is used to boost the edges of the input
image f(x, y) to emphasize the high-frequency components.
The different frequency domain methods for image enhancement are as follows.
1. Contrast stretching.
2. Clipping and thresholding.
3. Digital negative.
4. Intensity level slicing and
5. Bit extraction.
1. Contrast Stretching:
Due to non-uniform lighting conditions, there may be poor contrast between the background and
the feature of interest. Figure 1.1 (a) shows the contrast stretching transformations.
In the area of stretching the slope of transformation is considered to be greater than unity. The
parameters of stretching transformations i.e., a and b can be determined by examining the
histogram of the image.
2. Clipping and Thresholding:
Clipping is considered as a special case of contrast stretching in which the
parameters are α = γ = 0. Clipping is useful for the reduction of noise in input signals known to lie in the
range [a, b].
The threshold of an image is selected by means of its histogram. Consider the image shown in
figure 1.2.
Fig. 1.2
The histogram in figure 1.2(b) consists of two peaks, corresponding to the background and the object. The
threshold is selected at the abscissa of the histogram minimum (D1). Thresholding at D1 separates the background
from the object, converting the image into its respective binary form. The thresholding transformations
are shown in figure 1.3.
Fig.1.3
3. Digital Negative:
The digital negative of an image is achieved by reverse scaling of its gray levels according to the
transformation s = L - 1 - r. Digital negatives are especially useful in the display of medical images.
Fig.1.4
4. Intensity Level Slicing:
Images that contain gray levels lying between the background intensity and that of other objects
may require the intensity of the object range to be reduced or highlighted. This process of changing
intensity levels is done with the help of intensity level slicing. The histogram of the input image
and its corresponding intensity level slicing transformation are shown in figure 1.5.
Fig.1.5
5. Bit Extraction:
When an image is uniformly quantized, the nth most significant bit can be extracted and
displayed.
Spatial domain methods are expressed as
g(x, y) = T[f(x, y)],
where
f(x, y) is the input image,
g(x, y) is the processed image, and
T is the operator on f defined over some neighborhood of (x, y).
Frequency domain techniques are based on the convolution theorem. Let g(x, y) be the image
formed by the convolution of an image f(x, y) and a linear, position-invariant operator h(x, y), i.e.,
g(x, y) = h(x, y) * f(x, y).
Applying the convolution theorem yields
G(u, v) = H(u, v) F(u, v),
where G, H and F are the Fourier transforms of g, h and f respectively. In the terminology of
linear system theory, the transform H(u, v) is called the transfer function of the process. The edges in
f(x, y) can be boosted by using H(u, v) to emphasize the high-frequency components of F(u, v).
Lowpass Filter:
The edges and other sharp transitions (such as noise) in the gray levels of an image contribute
significantly to the high-frequency content of its Fourier transform. Hence blurring (smoothing)
is achieved in the frequency domain by attenuating a specified range of high-frequency
components in the transform of a given image; that is,
G(u, v) = H(u, v) F(u, v),
where F(u, v) is the Fourier transform of an image to be smoothed. The problem is to select a
filter transfer function H (u, v) that yields G (u, v) by attenuating the high-frequency components
of F (u, v). The inverse transform then will yield the desired smoothed image g (x, y).
Ideal Filter:
A 2-D ideal lowpass filter (ILPF) is one whose transfer function satisfies the relation
H(u, v) = 1   if D(u, v) ≤ Do
H(u, v) = 0   if D(u, v) > Do,
where Do is a specified nonnegative quantity, and D(u, v) is the distance from point (u, v) to the
origin of the frequency plane; that is,
D(u, v) = (u² + v²)^(1/2).
Figure 3(a) shows a 3-D perspective plot of H(u, v) as a function of u and v. The name ideal
filter indicates that all frequencies inside a circle of radius
Fig.3 (a) Perspective plot of an ideal lowpass filter transfer function; (b) filter cross
section.
Do are passed with no attenuation, whereas all frequencies outside this circle are completely
attenuated.
The lowpass filters are radially symmetric about the origin. For this type of filter,
specifying a cross section extending as a function of distance from the origin along a radial line
is sufficient, as Fig. 3 (b) shows. The complete filter transfer function can then be generated by
rotating the cross section 360° about the origin. Specification of radially symmetric filters
centered on the N x N frequency square is based on the assumption that the origin of the Fourier
transform has been centered on the square.
For an ideal lowpass filter cross section, the point of transition between H(u, v) =
1 and H(u, v) = 0 is often called the cutoff frequency. In the case of Fig.3 (b), for example, the
cutoff frequency is Do. As the cross section is rotated about the origin, the point Do traces a
circle giving a locus of cutoff frequencies, all of which are a distance Do from the origin. The
cutoff frequency concept is quite useful in specifying filter characteristics. It also serves as a
common base for comparing the behavior of different types of filters.
The sharp cutoff frequencies of an ideal lowpass filter cannot be realized with electronic
components, although they can certainly be simulated in a computer.
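Simulating the ILPF in a computer is straightforward; a sketch that centers the spectrum with fftshift and zeroes everything farther than Do from the origin (the cutoff value and the image array are assumptions):

```python
import numpy as np

def ideal_lowpass(img, D0):
    """Pass frequencies with D(u, v) <= D0 and attenuate the rest completely."""
    f = np.asarray(img, dtype=np.float64)
    M, N = f.shape
    F = np.fft.fftshift(np.fft.fft2(f))              # centered transform
    u = np.arange(M) - M // 2
    v = np.arange(N) - N // 2
    D = np.sqrt(u[:, None] ** 2 + v[None, :] ** 2)   # distance from the centered origin
    H = (D <= D0).astype(np.float64)                 # ideal lowpass transfer function
    g = np.fft.ifft2(np.fft.ifftshift(H * F))
    return np.real(g)
```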
Butterworth filter:
The transfer function of the Butterworth lowpass filter (BLPF) of order n and with cutoff frequency
locus at a distance Do from the origin is defined by the relation
H(u, v) = 1 / [1 + (D(u, v) / Do)^(2n)].
A perspective plot and cross section of the BLPF function are shown in figure 4.
Fig.4 (a) A Butterworth lowpass filter (b) radial cross section for n = 1.
Unlike the ILPF, the BLPF transfer function does not have a sharp discontinuity that establishes
a clear cutoff between passed and filtered frequencies. For filters with smooth transfer functions,
defining a cutoff frequency locus at points for which H (u, v) is down to a certain fraction of its
maximum value is customary. In the case of above Eq. H (u, v) = 0.5 (down 50 percent from its
maximum value of 1) when D (u, v) = Do. Another value commonly used is 1/√2 of the
maximum value of H(u, v). The following simple modification yields the desired value when D
(u, v) = Do:
H(u, v) = 1 / [1 + (√2 - 1)(D(u, v) / Do)^(2n)].
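Only the transfer function changes relative to the ideal filter; a sketch of the BLPF on the same centered frequency grid:

```python
import numpy as np

def butterworth_lowpass_H(shape, D0, n=1):
    """H(u, v) = 1 / (1 + (D(u, v) / D0)**(2n)) on a centered frequency grid."""
    M, N = shape
    u = np.arange(M) - M // 2
    v = np.arange(N) - N // 2
    D = np.sqrt(u[:, None] ** 2 + v[None, :] ** 2)
    return 1.0 / (1.0 + (D / D0) ** (2 * n))
```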
5. Discuss about Ideal High Pass Filter and Butterworth High Pass filter.
An image can be blurred by attenuating the high-frequency components of its Fourier transform.
Because edges and other abrupt changes in gray levels are associated with high-frequency
components, image sharpening can be achieved in the frequency domain by a high pass filtering
process, which attenuates the low-frequency components without disturbing high-frequency
information in the Fourier transform.
Ideal filter:
A 2-D ideal high pass filter (IHPF) is one whose transfer function satisfies the relation
H(u, v) = 0   if D(u, v) ≤ Do
H(u, v) = 1   if D(u, v) > Do,
where Do is the cutoff distance measured from the origin of the frequency plane. Figure 5.1
shows a perspective plot and cross section of the IHPF function. This filter is the opposite of the
ideal lowpass filter, because it completely attenuates all frequencies inside a circle of radius Do
while passing, without attenuation, all frequencies outside the circle. As in the case of the ideal
lowpass filter, the IHPF is not physically realizable.
Fig.5.1 Perspective plot and radial cross section of ideal high pass filter
Butterworth filter:
The transfer function of the Butterworth highpass filter (BHPF) of order n and with cutoff
frequency locus at a distance Do from the origin is defined by the relation
H(u, v) = 1 / [1 + (Do / D(u, v))^(2n)].
Figure 5.2 shows a perspective plot and cross section of the BHPF function. Note that when
D(u, v) = Do, H(u, v) is down to ½ of its maximum value. As in the case of the Butterworth
lowpass filter, common practice is to select the cutoff frequency locus at points for which
H(u, v) is down to 1/√2 of its maximum value.
6. Discuss about Gaussian High Pass and Gaussian Low Pass Filter.
The transfer function of the Gaussian lowpass filter (GLPF) is given by
H(u, v) = e^(-D²(u, v) / 2σ²),
where D(u, v) is the distance from the origin of the Fourier transform.
Fig.6.1 (a) Perspective plot of a GLPF transfer function, (b) Filter displayed as an image,
(c) Filter radial cross sections for various values of Do.
σ is a measure of the spread of the Gaussian curve. By letting σ = Do, we can express the filter in
a more familiar form in terms of the cutoff frequency:
H(u, v) = e^(-D²(u, v) / 2Do²),
where Do is the cutoff frequency. When D(u, v) = Do, the filter is down to 0.607 of its
maximum value.
The transfer function of the Gaussian highpass filter (GHPF) with cutoff frequency locus at a
distance Do from the origin is given by
H(u, v) = 1 - e^(-D²(u, v) / 2Do²).
The figure 6.2 shows a perspective plot, image, and cross section of the GHPF function.
Fig.6.2. Perspective plot, image representation, and cross section of a typical Gaussian high
pass filter
Even the filtering of the smaller objects and thin bars is cleaner with the Gaussian filter.
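The Gaussian pair is a short variation on the same centered grid; a sketch returning both the lowpass transfer function and its highpass complement:

```python
import numpy as np

def gaussian_filters_H(shape, D0):
    """Return (GLPF, GHPF): exp(-D^2 / (2*D0^2)) and 1 minus that function."""
    M, N = shape
    u = np.arange(M) - M // 2
    v = np.arange(N) - N // 2
    D2 = u[:, None] ** 2 + v[None, :] ** 2
    H_lp = np.exp(-D2 / (2.0 * D0 ** 2))
    return H_lp, 1.0 - H_lp
```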
It can be shown that the Fourier transform of the sum of second derivatives of f(x, y) satisfies
F[∂²f(x, y)/∂x² + ∂²f(x, y)/∂y²] = -(u² + v²) F(u, v).
The expression inside the brackets on the left side of the above equation is recognized as the
Laplacian of f(x, y). Thus, we have the important result
F[∇²f(x, y)] = -(u² + v²) F(u, v),
which simply says that the Laplacian can be implemented in the frequency domain by using the
filter
H(u, v) = -(u² + v²).
As in all filtering operations, the assumption is that the origin of F(u, v) has been centered by
performing the operation f(x, y)(-1)^(x+y) prior to taking the transform of the image. If f (and F)
are of size M x N, this operation shifts the center transform so that (u, v) = (0, 0) is at point
(M/2, N/2) in the frequency rectangle. As before, the center of the filter function also needs to be
shifted:
H(u, v) = -[(u - M/2)² + (v - N/2)²].
The Laplacian-filtered image in the spatial domain is obtained by computing the inverse Fourier
transform of H(u, v) F(u, v):
∇²f(x, y) = F⁻¹{ -[(u - M/2)² + (v - N/2)²] F(u, v) }.
Conversely, computing the Laplacian in the spatial domain and computing the Fourier transform
of the result is equivalent to multiplying F(u, v) by H(u, v). We express this dual relationship in
the familiar Fourier-transform-pair notation
∇²f(x, y)  ⇔  -[(u - M/2)² + (v - N/2)²] F(u, v).
The spatial domain Laplacian filter function obtained by taking the inverse Fourier transform of
Eq. has some interesting properties, as Fig.7 shows. Figure 7(a) is a 3-D perspective plot. The
function is centered at (M/2, N/2), and its value at the top of the dome is zero. All other values
are negative. Figure 7(b) shows H(u, v) as an image, also centered. Figure 7(c) is the Laplacian in the spatial domain, obtained by multiplying H(u, v) by (−1)^(u+v), taking the inverse Fourier transform, and multiplying the real part of the result by (−1)^(x+y). Figure 7(d) is a zoomed section at about the origin of Fig. 7(c). Figure 7(e) is a horizontal gray-level profile passing through
center of the zoomed section. Finally, Fig.7 (f) shows the mask to implement the definition of the
discrete Laplacian in the spatial domain.
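A short sketch of the whole frequency-domain procedure (added for illustration; the normalization of the Laplacian before the subtraction is one common choice, not prescribed by the text):

```python
import numpy as np

def laplacian_sharpen(f):
    """Compute the frequency-domain Laplacian of f and return the enhanced image f - lap."""
    M, N = f.shape
    F = np.fft.fftshift(np.fft.fft2(f))              # center the transform
    u = np.arange(M) - M / 2
    v = np.arange(N) - N / 2
    H = -(u[:, None] ** 2 + v[None, :] ** 2)         # centered Laplacian filter H(u, v)
    lap = np.fft.ifft2(np.fft.ifftshift(H * F)).real
    lap = lap / (np.abs(lap).max() + 1e-12)          # scale so the subtraction is well behaved
    return f - lap
```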
Fig.7 (a) 3-D plot of Laplacian in the frequency domain, (b) Image representation of (a), (c)
Laplacian in the spatial domain obtained from the inverse DFT of (b) (d) Zoomed section
of the origin of (c). (e) Gray-level profile through the center of (d). (f) Laplacian mask
A horizontal profile through the center of this mask has the same basic shape as the profile in
Fig. 7(e) (that is, a negative value between two smaller positive values). We form an enhanced
image g(x, y) by subtracting the Laplacian from the original image: g(x, y) = f(x, y) − ∇²f(x, y).
All the filtered images have one thing in common: Their average background intensity has been
reduced to near black. This is due to the fact that the highpass filters we applied to those images
eliminate the zero-frequency component of their Fourier transforms. In fact, enhancement using
the Laplacian does precisely this, by adding back the entire image to the filtered result.
Sometimes it is advantageous to increase the contribution made by the original image to the
overall filtered result. This approach, called high-boost filtering, is a generalization of unsharp
masking. Unsharp masking consists simply of generating a sharp image by subtracting from an
image a blurred version of itself. Using frequency domain terminology, this means obtaining a
highpass-filtered image by subtracting from the image a lowpass-filtered version of itself. That is,
fhp(x, y) = f(x, y) − flp(x, y).
High-boost filtering generalizes this by multiplying the original image by a constant A ≥ 1 before subtracting the blurred (lowpass-filtered) image:
fhb(x, y) = A f(x, y) − flp(x, y).
Thus, high-boost filtering gives us the flexibility to increase the contribution made by the image to the overall enhanced result. This equation may be written as
fhb(x, y) = (A − 1) f(x, y) + fhp(x, y).
This result is based on a highpass rather than a lowpass image. When A = 1, high-boost filtering reduces to regular highpass filtering. As A increases past 1, the contribution made by the image
itself becomes more dominant.
We have Fhp (u,v) = F (u,v) – Flp (u,v). But Flp (u,v) = Hlp (u,v)F(u,v), where Hlp is the transfer
function of a lowpass filter. Therefore, unsharp masking can be implemented directly in the frequency domain by using the composite filter Hhp(u, v) = 1 − Hlp(u, v), and high-boost filtering can be implemented with the composite filter Hhb(u, v) = (A − 1) + Hhp(u, v), with A ≥ 1. The process consists of multiplying this filter by the (centered) transform of the input
image and then taking the inverse transform of the product. Multiplication of the real part of this
result by (−1)^(x+y) gives us the high-boost filtered image fhb(x, y) in the spatial domain.
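The composite high-boost filter can be sketched as follows (illustrative; a Gaussian lowpass filter is assumed for Hlp, and A = 1.5 is an arbitrary example value):

```python
import numpy as np

def high_boost(f, D0, A=1.5):
    """High-boost filtering in the frequency domain with Hhb = (A - 1) + Hhp and a Gaussian Hhp."""
    M, N = f.shape
    F = np.fft.fftshift(np.fft.fft2(f))
    u = np.arange(M) - M / 2
    v = np.arange(N) - N / 2
    D2 = u[:, None] ** 2 + v[None, :] ** 2
    H_hp = 1.0 - np.exp(-D2 / (2.0 * D0 ** 2))       # highpass = 1 - lowpass
    H_hb = (A - 1.0) + H_hp                          # composite high-boost filter
    return np.fft.ifft2(np.fft.ifftshift(H_hb * F)).real
```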
Homomorphic filtering:
The illumination-reflectance model can be used to develop a frequency domain procedure for
improving the appearance of an image by simultaneous gray-level range compression and
contrast enhancement. An image f(x, y) can be expressed as the product of its illumination and reflectance components: f(x, y) = i(x, y) r(x, y).
This equation cannot be used directly to operate separately on the frequency components of illumination and reflectance, because the Fourier transform of a product is not the product of the transforms; in other words, ℑ{f(x, y)} ≠ ℑ{i(x, y)} ℑ{r(x, y)}.
Suppose, however, that we define z(x, y) = ln f(x, y) = ln i(x, y) + ln r(x, y). Then
Z(u, v) = Fi(u, v) + Fr(u, v),
where Fi(u, v) and Fr(u, v) are the Fourier transforms of ln i(x, y) and ln r(x, y), respectively. If we process Z(u, v) by means of a filter function H(u, v), then
S(u, v) = H(u, v) Z(u, v) = H(u, v) Fi(u, v) + H(u, v) Fr(u, v),
where S(u, v) is the Fourier transform of the result. In the spatial domain,
s(x, y) = ℑ⁻¹{S(u, v)} = ℑ⁻¹{H(u, v) Fi(u, v)} + ℑ⁻¹{H(u, v) Fr(u, v)}.
By letting i'(x, y) = ℑ⁻¹{H(u, v) Fi(u, v)} and r'(x, y) = ℑ⁻¹{H(u, v) Fr(u, v)}, we now have
s(x, y) = i'(x, y) + r'(x, y).
Finally, as z(x, y) was formed by taking the logarithm of the original image f(x, y), the inverse (exponential) operation yields the desired enhanced image, denoted by g(x, y); that is,
g(x, y) = e^s(x, y) = e^i'(x, y) e^r'(x, y) = io(x, y) ro(x, y),
where io(x, y) = e^i'(x, y) and ro(x, y) = e^r'(x, y)
are the illumination and reflectance components of the output image. The enhancement approach
using the foregoing concepts is summarized in Fig. 9.1. This method is based on a special case of
a class of systems known as homomorphic systems. In this particular application, the key to the
approach is the separation of the illumination and reflectance components achieved in this manner. The
homomorphic filter function H (u, v) can then operate on these components separately.
Fig.9.2 Cross section of a circularly symmetric filter function. D(u, v) is the distance from the origin of the centered transform.
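A compact sketch of the chain ln → DFT → H(u, v) → inverse DFT → exp (illustrative only; the Gaussian-shaped high-frequency-emphasis filter and the parameter values γL, γH, c, and Do are assumptions chosen to reproduce the cross-section shape of Fig. 9.2):

```python
import numpy as np

def homomorphic(f, D0=30.0, gamma_L=0.5, gamma_H=2.0, c=1.0):
    """Homomorphic filtering: attenuate illumination (low frequencies), boost reflectance (high)."""
    M, N = f.shape
    z = np.log1p(f.astype(float))                    # z = ln(f + 1) avoids ln(0)
    Z = np.fft.fftshift(np.fft.fft2(z))
    u = np.arange(M) - M / 2
    v = np.arange(N) - N / 2
    D2 = u[:, None] ** 2 + v[None, :] ** 2
    H = (gamma_H - gamma_L) * (1.0 - np.exp(-c * D2 / D0 ** 2)) + gamma_L
    s = np.fft.ifft2(np.fft.ifftshift(H * Z)).real
    return np.expm1(s)                               # invert the logarithm
```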
Gradient operators:
First-order derivatives of a digital image are based on various approximations of the 2-D gradient. The gradient of an image f(x, y) at location (x, y) is defined as the vector
∇f = [Gx, Gy]ᵀ = [∂f/∂x, ∂f/∂y]ᵀ.
It is well known from vector analysis that the gradient vector points in the direction of maximum rate of change of f at coordinates (x, y). An important quantity in edge detection is the magnitude of this vector, denoted by ∇f, where
∇f = mag(∇f) = [Gx² + Gy²]^(1/2).
This quantity gives the maximum rate of increase of f(x, y) per unit distance in the direction of ∇f. It is a common (although not strictly correct) practice to refer to ∇f also as the gradient. The direction of the gradient vector also is an important quantity. Let α(x, y) represent the direction angle of the vector ∇f at (x, y). Then, from vector analysis,
α(x, y) = tan⁻¹(Gy / Gx),
where the angle is measured with respect to the x-axis. The direction of an edge at (x, y) is perpendicular to the direction of the gradient vector at that point. Computation of the gradient of an image is based on obtaining the partial derivatives ∂f/∂x and ∂f/∂y at every pixel location.
Let the 3x3 area shown in Fig. 1.1 (a) represent the gray levels in a neighborhood of an image.
One of the simplest ways to implement a first-order partial derivative at point z5 is to use the following Roberts cross-gradient operators: Gx = z9 − z5 and Gy = z8 − z6.
These derivatives can be implemented for an entire image by using the masks shown in Fig.
1.1(b). Masks of size 2 X 2 are awkward to implement because they do not have a clear center.
An approach using masks of size 3 X 3 is given by Gx = (z7 + z8 + z9) − (z1 + z2 + z3) and Gy = (z3 + z6 + z9) − (z1 + z4 + z7) (the Prewitt operators).
Fig.1.1 A 3 X 3 region of an image (the z’s are gray-level values) and various masks used to
compute the gradient at point labeled z5.
A slight variation of these two equations uses a weight of 2 in the center coefficient: Gx = (z7 + 2z8 + z9) − (z1 + 2z2 + z3) and Gy = (z3 + 2z6 + z9) − (z1 + 2z4 + z7). A weight value of 2 is used to achieve some smoothing by giving more importance to the center point. Figures 1.1(f) and (g), called the Sobel operators, are used to implement these two equations.
equations. The Prewitt and Sobel operators are among the most used in practice for computing
digital gradients. The Prewitt masks are simpler to implement than the Sobel masks, but the latter
have slightly superior noise-suppression characteristics, an important issue when dealing with
derivatives. Note that the coefficients in all the masks shown in Fig. 1.1 sum to 0, indicating that
they give a response of 0 in areas of constant gray level, as expected of a derivative operator.
The masks just discussed are used to obtain the gradient components Gx and Gy. Computation of
the gradient requires that these two components be combined. However, this implementation is
not always desirable because of the computational burden required by squares and square roots.
An approach used frequently is to approximate the gradient by absolute values: ∇f ≈ |Gx| + |Gy|.
This equation is much more attractive computationally, and it still preserves relative changes in gray levels. The price paid is that the resulting filters are not, in general, isotropic (invariant to rotation). However, this is not an issue when masks such as the Prewitt and Sobel masks are used to compute Gx and Gy.
It is possible to modify the 3 X 3 masks in Fig. 1.1 so that they have their strongest responses
along the diagonal directions. The two additional Prewitt and Sobel masks for detecting
discontinuities in the diagonal directions are shown in Fig. 1.2.
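For reference, a small NumPy sketch of the horizontal and vertical Sobel responses and the absolute-value gradient approximation (illustrative; border pixels are simply left at zero):

```python
import numpy as np

def sobel_gradient(f):
    """Return the gradient approximation |Gx| + |Gy| computed with the 3 x 3 Sobel masks."""
    f = f.astype(float)
    gx = np.zeros_like(f)
    gy = np.zeros_like(f)
    # Neighborhood slices laid out as z1..z9 in Fig. 1.1(a) (z5 is the center pixel).
    z1, z2, z3 = f[:-2, :-2], f[:-2, 1:-1], f[:-2, 2:]
    z4, z6 = f[1:-1, :-2], f[1:-1, 2:]
    z7, z8, z9 = f[2:, :-2], f[2:, 1:-1], f[2:, 2:]
    gx[1:-1, 1:-1] = (z7 + 2 * z8 + z9) - (z1 + 2 * z2 + z3)
    gy[1:-1, 1:-1] = (z3 + 2 * z6 + z9) - (z1 + 2 * z4 + z7)
    return np.abs(gx) + np.abs(gy)
```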
The Laplacian:
For a 3 X 3 region, one of the two forms encountered most frequently in practice is ∇²f = 4z5 − (z2 + z4 + z6 + z8), where the z's are defined in Fig. 1.1(a). A digital approximation including the diagonal neighbors is given by ∇²f = 8z5 − (z1 + z2 + z3 + z4 + z6 + z7 + z8 + z9).
Masks for implementing these two equations are shown in Fig. 1.3. We note from these masks
that the implementations of Eqns. are isotropic for rotation increments of 90° and 45°,
respectively.
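The 4-neighbor form can be written directly with array slicing, as in this small sketch (illustrative; borders are left at zero):

```python
import numpy as np

def laplacian_4(f):
    """4-neighbor digital Laplacian: 4*z5 - (z2 + z4 + z6 + z8)."""
    f = f.astype(float)
    out = np.zeros_like(f)
    out[1:-1, 1:-1] = (4 * f[1:-1, 1:-1]
                       - f[:-2, 1:-1] - f[2:, 1:-1]   # z2 (above) and z8 (below)
                       - f[1:-1, :-2] - f[1:-1, 2:])  # z4 (left) and z6 (right)
    return out
```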
In practice, optics, sampling, and other image acquisition imperfections yield edges that
are blurred, with the degree of blurring being determined by factors such as the quality of the
image acquisition system, the sampling rate, and illumination conditions under which the image
is acquired. As a result, edges are more closely modeled as having a "ramp like" profile, such as
the one shown in Fig.2.1 (b).
Fig.2.1 (a) Model of an ideal digital edge (b) Model of a ramp edge. The slope of the ramp is inversely proportional to the degree of blurring in the edge.
The slope of the ramp is inversely proportional to the degree of blurring in the edge. In this
model, we no longer have a thin (one pixel thick) path. Instead, an edge point now is any point
contained in the ramp, and an edge would then be a set of such points that are connected. The
"thickness" of the edge is determined by the length of the ramp, as it transitions from an initial to
a final gray level. This length is determined by the slope, which, in turn, is determined by the
degree of blurring. This makes sense: Blurred edges tend to be thick and sharp edges tend to be
thin. Figure 2.2(a) shows the image from which the close-up in Fig. 2.1(b) was extracted. Figure
2.2(b) shows a horizontal gray-level profile of the edge between the two regions. This figure also
shows the first and second derivatives of the gray-level profile. The first derivative is positive at
the points of transition into and out of the ramp as we move from left to right along the profile; it
is constant for points in the ramp; and is zero in areas of constant gray level. The second
derivative is positive at the transition associated with the dark side of the edge, negative at the
transition associated with the light side of the edge, and zero along the ramp and in areas of
constant gray level. The signs of the derivatives in Fig. 2.2(b) would be reversed for an edge that
transitions from light to dark.
We conclude from these observations that the magnitude of the first derivative can be used to
detect the presence of an edge at a point in an image (i.e. to determine if a point is on a ramp).
Similarly, the sign of the second derivative can be used to determine whether an edge pixel lies
on the dark or light side of an edge. We note two additional properties of the second derivative
around an edge: A) It produces two values for every edge in an image (an undesirable feature);
and B) an imaginary straight line joining the extreme positive and negative values of the second
derivative would cross zero near the midpoint of the edge. This zero-crossing property of the
second derivative is quite useful for locating the centers of thick edges.
Fig.2.2 (a) Two regions separated by a vertical edge (b) Detail near the edge, showing a
gray-level profile, and the first and second derivatives of the profile.
One of the simplest approaches for linking edge points is to analyze the characteristics of pixels
in a small neighborhood (say, 3 X 3 or 5 X 5) about every point (x, y) in an image that has been
labeled an edge point. All points that are similar according to a set of predefined criteria are
linked, forming an edge of pixels that share those criteria.
The two principal properties used for establishing similarity of edge pixels in this kind of
analysis are (1) the strength of the response of the gradient operator used to produce the edge
pixel; and (2) the direction of the gradient vector. The first property is given by the value of ∇f. Thus, an edge pixel with coordinates (xo, yo) in a predefined neighborhood of (x, y) is similar in magnitude to the pixel at (x, y) if
|∇f(x, y) − ∇f(xo, yo)| ≤ E,
where E is a nonnegative threshold. The direction (angle) of the gradient vector is given by the equation for α(x, y). An edge pixel at (xo, yo) in the predefined neighborhood of (x, y) has an angle similar to the pixel at (x, y) if
|α(x, y) − α(xo, yo)| < A,
where A is a nonnegative angle threshold. The direction of the edge at (x, y) is perpendicular to
the direction of the gradient vector at that point.
A point in the predefined neighborhood of (x, y) is linked to the pixel at (x, y) if both magnitude
and direction criteria are satisfied. This process is repeated at every location in the image. A
record must be kept of linked points as the center of the neighborhood is moved from pixel to
pixel. A simple bookkeeping procedure is to assign a different gray level to each set of linked
edge pixels.
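The bookkeeping can be sketched as follows (an illustrative single-pass simplification; a full implementation would also merge labels that later turn out to belong to the same connected set):

```python
import numpy as np

def link_edges(mag, ang, edge_mask, E, A, win=1):
    """Assign a gray level (label) to sets of edge pixels linked by magnitude/angle similarity.

    mag, ang  : gradient magnitude and direction arrays
    edge_mask : boolean array marking pixels already labeled as edge points
    E, A      : nonnegative magnitude and angle thresholds
    win       : half-width of the (2*win + 1) x (2*win + 1) neighborhood
    """
    M, N = mag.shape
    labels = np.zeros((M, N), dtype=int)
    next_label = 1
    for x in range(M):
        for y in range(N):
            if not edge_mask[x, y]:
                continue
            if labels[x, y] == 0:                     # start a new set of linked pixels
                labels[x, y] = next_label
                next_label += 1
            for i in range(max(0, x - win), min(M, x + win + 1)):
                for j in range(max(0, y - win), min(N, y + win + 1)):
                    if edge_mask[i, j] and labels[i, j] == 0 \
                       and abs(mag[x, y] - mag[i, j]) <= E \
                       and abs(ang[x, y] - ang[i, j]) < A:
                        labels[i, j] = labels[x, y]   # both criteria satisfied: link the pixel
    return labels
```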
In this process, points are linked by determining first if they lie on a curve of specified shape. We
now consider global relationships between pixels. Given n points in an image, suppose that we
want to find subsets of these points that lie on straight lines. One possible solution is to first find
all lines determined by every pair of points and then find all subsets of points that are close to
particular lines. The problem with this procedure is that it involves finding n(n − 1)/2 ≈ n² lines and then performing n·(n(n − 1))/2 ≈ n³ comparisons of every point to all lines. This approach is
computationally prohibitive in all but the most trivial applications.
Hough [1962] proposed an alternative approach, commonly referred to as the Hough transform.
Consider a point (xi, yi) and the general equation of a straight line in slope-intercept form,
yi = a.xi + b. Infinitely many lines pass through (xi, yi) but they all satisfy the equation
yi = a.xi + b for varying values of a and b. However, writing this equation as b = -a.xi + yi, and
considering the ab-plane (also called parameter space) yields the equation of a single line for a
fixed pair (xi, yi). Furthermore, a second point (xj, yj) also has a line in parameter space
associated with it, and this line intersects the line associated with (xi, yi) at (a', b'), where a' is the
slope and b' the intercept of the line containing both (xi, yi) and (xj, yj) in the xy-plane. In fact, all
points contained on this line have lines in parameter space that intersect at (a', b'). Figure 3.1
illustrates these concepts.
Fig.3.2 Subdivision of the parameter plane for use in the Hough transform
The computational attractiveness of the Hough transform arises from subdividing the parameter
space into so-called accumulator cells, as illustrated in Fig. 3.2, where (amin, amax) and (bmin, bmax) are the expected ranges of slope and intercept values. The cell at coordinates (i, j), with accumulator value A(i, j), corresponds to the square associated with parameter space coordinates (ai, bj).
Initially, these cells are set to zero. Then, for every point (xk, yk) in the image plane, we let the
parameter a equal each of the allowed subdivision values on the a-axis and solve for the
corresponding b using the equation b = - xk a + yk .The resulting b’s are then rounded off to the
nearest allowed value in the b-axis. If a choice of ap results in solution bq, we let A (p, q) = A (p,
q) + 1. At the end of this procedure, a value of Q in A (i, j) corresponds to Q points in the xy-
plane lying on the line y = ai x + bj. The number of subdivisions in the ab-plane determines the
accuracy of the colinearity of these points. Note that subdividing the a-axis into K increments
gives, for every point (xk, yk), K values of b corresponding to the K possible values of a. With n
image points, this method involves nK computations. Thus the procedure just discussed is linear
in n, and the product nK does not approach the number of computations discussed at the
beginning unless K approaches or exceeds n.
A problem with using the equation y = ax + b to represent a line is that the slope
approaches infinity as the line approaches the vertical. One way around this difficulty is to use
the normal representation of a line:
x cosθ + y sinθ = ρ
Digital Image Processing Question & Answers
Figure 3.3(a) illustrates the geometrical interpretation of the parameters used. The use of this
representation in constructing a table of accumulators is identical to the method discussed for the
slope-intercept representation. Instead of straight lines, however, the loci are sinusoidal curves in
the ρθ-plane. As before, Q collinear points lying on a line x cosθj + y sinθj = ρi yield Q sinusoidal curves that intersect at (ρi, θj) in the parameter space. Incrementing θ and solving for the corresponding ρ gives Q entries in accumulator A(i, j) associated with the cell determined by (ρi, θj). Figure 3.3 (b) illustrates the subdivision of the parameter space.
Fig.3.3 (a) Normal representation of a line (b) Subdivision of the ρθ-plane into cells
The range of angle θ is ±90°, measured with respect to the x-axis. Thus with reference to Fig. 3.3
(a), a horizontal line has θ = 0°, with ρ being equal to the positive x-intercept. Similarly, a
vertical line has θ = 90°, with ρ being equal to the positive y-intercept, or θ = −90°, with ρ being equal to the negative y-intercept.
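An accumulator for the normal (ρ, θ) representation can be sketched as below (illustrative; the numbers of ρ and θ subdivisions are arbitrary choices):

```python
import numpy as np

def hough_accumulate(edge_points, shape, n_theta=180, n_rho=200):
    """Fill a (rho, theta) accumulator from a list of (x, y) edge-point coordinates."""
    M, N = shape
    thetas = np.deg2rad(np.linspace(-90.0, 89.0, n_theta))   # theta range is +/- 90 degrees
    rho_max = np.hypot(M, N)
    acc = np.zeros((n_rho, n_theta), dtype=int)
    for x, y in edge_points:
        rho = x * np.cos(thetas) + y * np.sin(thetas)         # rho for every allowed theta
        idx = np.round((rho + rho_max) / (2 * rho_max) * (n_rho - 1)).astype(int)
        acc[idx, np.arange(n_theta)] += 1                     # A(i, j) = A(i, j) + 1
    rhos = np.linspace(-rho_max, rho_max, n_rho)
    return acc, rhos, thetas
```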
In this process we have a global approach for edge detection and linking based on representing
edge segments in the form of a graph and searching the graph for low-cost paths that correspond
to significant edges. This representation provides a rugged approach that performs well in the
presence of noise.
We begin the development with some basic definitions. A graph G = (N,U) is a finite, nonempty
set of nodes N, together with a set U of unordered pairs of distinct elements of N. Each pair (ni,
nj) of U is called an arc. A graph in which the arcs are directed is called a directed graph. If an
arc is directed from node ni to node nj, then nj is said to be a successor of the parent node ni. The
process of identifying the successors of a node is called expansion of the node. In each graph we
define levels, such that level 0 consists of a single node, called the start or root node, and the
nodes in the last level are called goal nodes. A cost c (ni, nj) can be associated with every arc (ni,
nj). A sequence of nodes n1, n2... nk, with each node ni being a successor of node ni-1 is called a
path from n1 to nk. The cost of the entire path is the sum of the costs of its arcs: c = c(n1, n2) + c(n2, n3) + ... + c(nk-1, nk).
The following discussion is simplified if we define an edge element as the boundary between
two pixels p and q, such that p and q are 4-neighbors, as Fig.3.4 illustrates. Edge elements are
identified by the xy-coordinates of points p and q. In other words, the edge element in Fig. 3.4 is
defined by the pairs (xp, yp) (xq, yq). Consistent with this definition, an edge is a sequence of
connected edge elements.
We can illustrate how the concepts just discussed apply to edge detection using the
3 X 3 image shown in Fig. 3.5 (a). The outer numbers are pixel coordinates and the numbers in brackets represent gray-level values.
Fig.3.5 (a) A 3 X 3 image region, (b) Edge segments and their costs, (c) Edge corresponding to the lowest-cost path in the graph shown in Fig. 3.6
Each edge element, defined
by pixels p and q, has an associated cost, defined as c(p, q) = H − [f(p) − f(q)],
where H is the highest gray-level value in the image (7 in this case), and f(p) and f(q) are the
gray-level values of p and q, respectively. By convention, the point p is on the right-hand side of
the direction of travel along edge elements. For example, the edge segment (1, 2) (2, 2) is
between points (1, 2) and (2, 2) in Fig. 3.5 (b). If the direction of travel is to the right, then p is
the point with coordinates (2, 2) and q is point with coordinates (1, 2); therefore, c (p, q) = 7 - [7
- 6] = 6. This cost is shown in the box below the edge segment. If, on the other hand, we are
traveling to the left between the same two points, then p is point (1, 2) and q is (2, 2). In this case
the cost is 8, as shown above the edge segment in Fig. 3.5(b). To simplify the discussion, we
assume that edges start in the top row and terminate in the last row, so that the first element of an
edge can be only between points (1, 1), (1, 2) or (1, 2), (1, 3). Similarly, the last edge element has
to be between points (3, 1), (3, 2) or (3, 2), (3, 3). Keep in mind that p and q are 4-neighbors, as
noted earlier. Figure 3.6 shows the graph for this problem. Each node (rectangle) in the graph
corresponds to an edge element from Fig. 3.5. An arc exists between two nodes if the two
corresponding edge elements taken in succession can be part of an edge.
Fig. 3.6 Graph for the image in Fig.3.5 (a). The lowest-cost path is shown dashed.
As in Fig. 3.5 (b), the cost of each edge segment, is shown in a box on the side of the arc leading
into the corresponding node. Goal nodes are shown shaded. The minimum cost path is shown
dashed, and the edge corresponding to this path is shown in Fig. 3.5 (c).
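The cost rule used in this example is easy to check numerically (a tiny sketch; H = 7 as in Fig. 3.5):

```python
def edge_cost(f_p, f_q, H=7):
    """Cost of an edge element, with p on the right-hand side of the direction of travel."""
    return H - (f_p - f_q)

# Travelling right between (1, 2) and (2, 2): p has value 7, q has value 6 -> cost 6.
# Travelling left between the same points: p has value 6, q has value 7 -> cost 8.
print(edge_cost(7, 6), edge_cost(6, 7))    # prints: 6 8
```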
Because of its intuitive properties and simplicity of implementation, image thresholding enjoys a
central position in applications of image segmentation.
Global Thresholding:
The simplest of all thresholding techniques is to partition the image histogram by using a single
global threshold, T. Segmentation is then accomplished by scanning the image pixel by pixel and
labeling each pixel as object or background, depending on whether the gray level of that pixel is
greater or less than the value of T. As indicated earlier, the success of this method depends
entirely on how well the histogram can be partitioned.
Fig.4.1 (a) Original image, (b) Image histogram, (c) Result of global
thresholding with T midway between the maximum and minimum gray levels.
Figure 4.1(a) shows a simple image, and Fig. 4.1(b) shows its histogram. Figure 4.1(c) shows
the result of segmenting Fig. 4.1(a) by using a threshold T midway between the maximum and
minimum gray levels. This threshold achieved a "clean" segmentation by eliminating the
shadows and leaving only the objects themselves. The objects of interest in this case are darker
than the background, so any pixel with a gray level ≤ T was labeled black (0), and any pixel with a gray level > T was labeled white (255). The key objective is merely to generate a binary image,
so the black-white relationship could be reversed. The type of global thresholding just described
can be expected to be successful in highly controlled environments. One of the areas in which
this often is possible is in industrial inspection applications, where control of the illumination
usually is feasible.
The following algorithm can be used to obtain T automatically:
1. Select an initial estimate for T.
2. Segment the image using T. This will produce two groups of pixels: G1, consisting of all pixels with gray-level values > T, and G2, consisting of pixels with values ≤ T.
3. Compute the average gray-level values µ1 and µ2 for the pixels in regions G1 and G2.
4. Compute a new threshold value: T = ½ (µ1 + µ2).
5. Repeat steps 2 through 4 until the difference in T in successive iterations is smaller than a predefined parameter To.
When there is reason to believe that the background and object occupy comparable areas in the
image, a good initial value for T is the average gray level of the image. When objects are small
compared to the area occupied by the background (or vice versa), then one group of pixels will
dominate the histogram and the average gray level is not as good an initial choice. A more
appropriate initial value for T in cases such as this is a value midway between the maximum and
minimum gray levels. The parameter To is used to stop the algorithm after changes become small
in terms of this parameter. This is used when speed of iteration is an important issue.
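The iterative procedure is only a few lines of NumPy (an illustrative sketch; the image f is assumed to be a gray-level array and To a small positive number):

```python
import numpy as np

def iterative_threshold(f, T_stop=0.5):
    """Estimate a global threshold T by iterating between the means of the two pixel groups."""
    T = f.mean()                                     # initial estimate: average gray level
    while True:
        g1 = f[f > T]                                # group 1: values above T
        g2 = f[f <= T]                               # group 2: values at or below T
        if g1.size == 0 or g2.size == 0:             # degenerate split: stop
            return T
        T_new = 0.5 * (g1.mean() + g2.mean())        # new threshold: midpoint of the two means
        if abs(T_new - T) < T_stop:                  # change smaller than To: done
            return T_new
        T = T_new
```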
Imaging factors such as uneven illumination can transform a perfectly segmentable histogram
into a histogram that cannot be partitioned effectively by a single global threshold. An approach
for handling such a situation is to divide the original image into subimages and then utilize a
different threshold to segment each subimage. The key issues in this approach are how to
subdivide the image and how to estimate the threshold for each resulting subimage. Since the
threshold used for each pixel depends on the location of the pixel in terms of the subimages, this
type of thresholding is adaptive.
Fig.5 (a) Original image, (b) Result of global thresholding. (c) Image subdivided into
individual subimages (d) Result of adaptive thresholding.
We illustrate adaptive thresholding with an example. Figure 5(a) shows the image, which we
concluded could not be thresholded effectively with a single global threshold. In fact, Fig. 5(b)
shows the result of thresholding the image with a global threshold manually placed in the valley
of its histogram. One approach to reduce the effect of nonuniform illumination is to subdivide
the image into smaller subimages, such that the illumination of each subimage is approximately
uniform. Figure 5(c) shows such a partition, obtained by subdividing the image into four equal
parts, and then subdividing each part by four again. All the subimages that did not contain a
boundary between object and background had variances of less than 75. All subimages
containing boundaries had variances in excess of 100. Each subimage with variance greater than
100 was segmented with a threshold computed for that subimage using the algorithm. The initial
value for T in each case was selected as the point midway between the minimum and maximum
gray levels in the subimage. All subimages with variance less than 100 were treated as one
composite image, which was segmented using a single threshold estimated using the same
algorithm. The result of segmentation using this procedure is shown in Fig. 5(d).
With the exception of two subimages, the improvement over Fig. 5(b) is evident. The boundary
between object and background in each of the improperly segmented subimages was small and
dark, and the resulting histogram was almost unimodal.
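A simplified sketch of the subdivision scheme (illustrative; here each high-variance block is thresholded at the midpoint of its own gray-level range rather than by the full iterative algorithm, and a 4 x 4 grid of subimages is assumed):

```python
import numpy as np

def adaptive_threshold(f, blocks=4, var_thresh=100.0):
    """Threshold each subimage separately; low-variance blocks share one common threshold."""
    M, N = f.shape
    bh, bw = M // blocks, N // blocks
    out = np.zeros((M, N), dtype=np.uint8)
    low_var = []                                     # subimages without an object/background boundary
    for i in range(blocks):
        for j in range(blocks):
            sub = f[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw]
            if sub.var() > var_thresh:               # block contains a boundary: own threshold
                T = 0.5 * (float(sub.min()) + float(sub.max()))
                out[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw] = (sub > T) * 255
            else:
                low_var.append((i, j))
    if low_var:                                      # pool the remaining blocks and threshold once
        pooled = np.concatenate([f[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw].ravel()
                                 for i, j in low_var])
        T = 0.5 * (float(pooled.min()) + float(pooled.max()))
        for i, j in low_var:
            sub = f[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw]
            out[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw] = (sub > T) * 255
    return out
```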
It is intuitively evident that the chances of selecting a "good" threshold are enhanced
considerably if the histogram peaks are tall, narrow, symmetric, and separated by deep valleys.
One approach for improving the shape of histograms is to consider only those pixels that lie on
or near the edges between objects and the background. An immediate and obvious improvement
is that histograms would be less dependent on the relative sizes of objects and the background.
For instance, the histogram of an image composed of a small object on a large background area
(or vice versa) would be dominated by a large peak because of the high concentration of one type
of pixels.
If only the pixels on or near the edge between object and the background were used, the
resulting histogram would have peaks of approximately the same height. In addition, the
probability that any of those given pixels lies on an object would be approximately equal to the
probability that it lies on the background, thus improving the symmetry of the histogram peaks.
The principal problem with the approach just discussed is the implicit assumption that the edges between objects and background are known. This information clearly is not available during
segmentation, as finding a division between objects and background is precisely what
segmentation is all about. However, an indication of whether a pixel is on an edge may be
obtained by computing its gradient. In addition, use of the Laplacian can yield information
regarding whether a given pixel lies on the dark or light side of an edge. The average value of the
Laplacian is 0 at the transition of an edge, so in practice the valleys of histograms formed from the pixels selected by a gradient/Laplacian criterion can be expected to be sparsely populated. The gradient and Laplacian can be combined to produce a three-level image,
s(x, y) = 0 if ∇f < T; s(x, y) = + if ∇f ≥ T and ∇²f ≥ 0; s(x, y) = − if ∇f ≥ T and ∇²f < 0,
where the symbols 0, +, and − represent any three distinct gray levels, T is a threshold, and the
gradient and Laplacian are computed at every point (x, y). For a dark object on a light
background, the use of this equation produces an image s(x, y) in which (1) all pixels that are not on an edge (as determined by ∇f being less than T) are labeled 0; (2) all pixels on the dark side of
an edge are labeled +; and (3) all pixels on the light side of an edge are labeled -. The symbols +
and - in Eq. above are reversed for a light object on a dark background. Figure 6.1 shows the
labeling produced by Eq. for an image of a dark, underlined stroke written on a light background.
The information obtained with this procedure can be used to generate a segmented,
binary image in which 1's correspond to objects of interest and 0's correspond to the background.
The transition (along a horizontal or vertical scan line) from a light background to a dark object
must be characterized by the occurrence of a - followed by a + in s (x, y). The interior of the
object is composed of pixels that are labeled either 0 or +. Finally, the transition from the object
back to the background is characterized by the occurrence of a + followed by a -. Thus a
horizontal or vertical scan line containing a section of an object has the following structure: (…)(−, +)(0 or +)(+, −)(…),
where (…) represents any combination of +, -, and 0. The innermost parentheses contain object
points and are labeled 1. All other pixels along the same scan line are labeled 0, with the
exception of any other sequence of (- or +) bounded by (-, +) and (+, -).
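The three-level labeling itself is a one-liner per label (a small sketch, assuming the gradient and Laplacian arrays have already been computed):

```python
import numpy as np

def three_level_label(grad, lap, T):
    """Label pixels '0' (not on an edge), '+' (dark side of an edge) or '-' (light side)."""
    s = np.full(grad.shape, '0', dtype='<U1')
    on_edge = grad >= T
    s[on_edge & (lap >= 0)] = '+'
    s[on_edge & (lap < 0)] = '-'
    return s
```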
Figure 6.2 (a) shows an image of an ordinary scenic bank check. Figure 6.3 shows the histogram
as a function of gradient values for pixels with gradients greater than 5. Note that this histogram
has two dominant modes that are symmetric, nearly of the same height, and are separated by a
distinct valley. Finally, Fig. 6.2(b) shows the segmented image obtained with T at or near the
midpoint of the valley. Note that this example is an illustration of local thresholding, because the
value of T was determined from a histogram of the gradient and Laplacian, which are local
properties.
The objective of segmentation is to partition an image into regions. So far, we have approached this problem by finding boundaries between regions based on discontinuities in gray levels, or by thresholds based on the distribution of pixel properties, such as gray-level values or color. Region-based segmentation, discussed next, finds the regions directly.
Basic Formulation:
Let R represent the entire image region. We may view segmentation as a process that partitions
R into n subregions, R1, R2, ..., Rn, such that: (a) the union of all Ri equals R; (b) each Ri is a connected region; (c) Ri ∩ Rj = ∅ for all i and j, i ≠ j; (d) P(Ri) = TRUE for i = 1, 2, ..., n; and (e) P(Ri ∪ Rj) = FALSE for any adjacent regions Ri and Rj.
Here, P(Ri) is a logical predicate defined over the points in set Ri and ∅ is the null set.
Condition (a) indicates that the segmentation must be complete; that is, every pixel must be in a
region. Condition (b) requires that points in a region must be connected in some predefined
sense. Condition (c) indicates that the regions must be disjoint. Condition (d) deals with the
properties that must be satisfied by the pixels in a segmented region—for example P (Ri) =
TRUE if all pixels in Ri have the same gray level. Finally, condition (e) indicates that adjacent regions Ri and Rj are different in the sense of predicate P.
Region Growing:
As its name implies, region growing is a procedure that groups pixels or subregions into larger
regions based on predefined criteria. The basic approach is to start with a set of "seed" points and
from these grow regions by appending to each seed those neighboring pixels that have properties
similar to the seed (such as specific ranges of gray level or color). When a priori information is
not available, the procedure is to compute at every pixel the same set of properties that ultimately
will be used to assign pixels to regions during the growing process. If the result of these
computations shows clusters of values, the pixels whose properties place them near the centroid
of these clusters can be used as seeds.
The selection of similarity criteria depends not only on the problem under consideration, but also
on the type of image data available. For example, the analysis of land-use satellite imagery
depends heavily on the use of color. This problem would be significantly more difficult, or even
impossible, to handle without the inherent information available in color images. When the
images are monochrome, region analysis must be carried out with a set of descriptors based on
gray levels and spatial properties (such as moments or texture).
Basically, growing a region should stop when no more pixels satisfy the criteria for inclusion in
that region. Criteria such as gray level, texture, and color, are local in nature and do not take into
account the "history" of region growth. Additional criteria that increase the power of a region-
growing algorithm utilize the concept of size, likeness between a candidate pixel and the pixels
grown so far (such as a comparison of the gray level of a candidate and the average gray level of
the grown region), and the shape of the region being grown. The use of these types of descriptors
is based on the assumption that a model of expected results is at least partially available.
Figure 7.1 (a) shows an X-ray image of a weld (the horizontal dark region) containing several
cracks and porosities (the bright, white streaks running horizontally through the middle of the
image). We wish to use region growing to segment the regions of the weld failures. These
segmented features could be used for inspection, for inclusion in a database of historical studies,
for controlling an automated welding system, and for other numerous applications.
Fig.7.1 (a) Image showing defective welds, (b) Seed points, (c) Result of region growing, (d) Boundaries of segmented defective welds (in black).
The first order of business is to determine the initial seed points. In this application, it is known that pixels of defective welds tend to have the maximum allowable digital value (255 in this case). Based on this information, we selected as starting points all pixels having values of 255. The points thus extracted from the original image are shown in Fig. 7.1(b). Note that many of
the points are clustered into seed regions.
The next step is to choose criteria for region growing. In this particular
example we chose two criteria for a pixel to be annexed to a region: (1) The absolute gray-level
difference between any pixel and the seed had to be less than 65. This number is based on the
histogram shown in Fig. 7.2 and represents the difference between 255 and the location of the
first major valley to the left, which is representative of the highest gray level value in the dark
weld region. (2) To be included in one of the regions, the pixel had to be 8-connected to at least
one pixel in that region.
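A breadth-first sketch of this particular growing rule (illustrative; the image f is assumed to be an 8-bit gray-level array and the two criteria above are hard-coded):

```python
import numpy as np
from collections import deque

def region_grow(f, seed_value=255, diff=65):
    """Grow 8-connected regions from every pixel whose value equals seed_value."""
    M, N = f.shape
    grown = np.zeros((M, N), dtype=bool)
    seeds = list(zip(*np.where(f == seed_value)))
    for x, y in seeds:
        grown[x, y] = True
    queue = deque(seeds)
    while queue:
        x, y = queue.popleft()
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                i, j = x + dx, y + dy
                if 0 <= i < M and 0 <= j < N and not grown[i, j] \
                   and abs(int(f[i, j]) - seed_value) < diff:    # criterion (1): within 65 of the seed value
                    grown[i, j] = True                           # criterion (2): 8-connected to the region
                    queue.append((i, j))
    return grown
```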
Region Splitting and Merging:
The procedure just discussed grows regions from a set of seed points. An alternative is to subdivide an image initially into a set of arbitrary, disjoint regions and then merge and/or split the regions in an attempt to satisfy the conditions stated above. A split and merge algorithm that iteratively works toward satisfying these constraints is developed next.
Let R represent the entire image region and select a predicate P. One approach for segmenting R
is to subdivide it successively into smaller and smaller quadrant regions so that, for any region
Ri, P(Ri) = TRUE. We start with the entire region. If P(R) = FALSE, we divide the image into
quadrants. If P is FALSE for any quadrant, we subdivide that quadrant into subquadrants, and so
on. This particular splitting technique has a convenient representation in the form of a so-called
quadtree (that is, a tree in which nodes have exactly four descendants), as illustrated in Fig. 7.3.
Note that the root of the tree corresponds to the entire image and that each node corresponds to a
subdivision. In this case, only R4 was subdivided further.
If only splitting were used, the final partition likely would contain adjacent regions with identical
properties. This drawback may be remedied by allowing merging as well as splitting. Satisfying the constraints requires merging only adjacent regions whose combined pixels satisfy the
predicate P. That is, two adjacent regions Rj and Rk are merged only if P (Rj U Rk) = TRUE.
The preceding discussion may be summarized by the following procedure, in which, at any step
we
1. Split into four disjoint quadrants any region Ri, for which P (Ri) = FALSE.
2. Merge any adjacent regions Rj and Rk for which P (Rj U Rk) = TRUE.
3. Stop when no further merging or splitting is possible.
Several variations of the preceding basic theme are possible. For example, one possibility is to
split the image initially into a set of blocks. Further splitting is carried out as described
previously, but merging is initially limited to groups of four blocks that are descendants in the
quadtree representation and that satisfy the predicate P. When no further mergings of this type
are possible, the procedure is terminated by one final merging of regions satisfying step 2. At this
point, the merged regions may be of different sizes. The principal advantage of this approach is
that it uses the same quadtree for splitting and merging, until the final merging step.
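The splitting step maps directly onto a recursive quadtree; the sketch below (illustrative, with a standard-deviation predicate chosen as an example of P, and a simple greedy pass standing in for the merging step) shows the overall structure:

```python
import numpy as np

def split_merge(f, std_max=10.0, min_size=2):
    """Quadtree split until P(Ri) holds, then greedily merge adjacent leaves with similar means."""
    leaves = []                                       # (r0, c0, r1, c1, mean) of each accepted region

    def split(r0, c0, r1, c1):
        block = f[r0:r1, c0:c1]
        if block.std() <= std_max or (r1 - r0) <= min_size or (c1 - c0) <= min_size:
            leaves.append((r0, c0, r1, c1, float(block.mean())))
            return
        rm, cm = (r0 + r1) // 2, (c0 + c1) // 2       # split the region into four quadrants
        for rs, cs, re, ce in ((r0, c0, rm, cm), (r0, cm, rm, c1),
                               (rm, c0, r1, cm), (rm, cm, r1, c1)):
            split(rs, cs, re, ce)

    split(0, 0, *f.shape)
    labels = np.zeros(f.shape, dtype=int)
    means = []
    for idx, (r0, c0, r1, c1, m) in enumerate(leaves, start=1):
        labels[r0:r1, c0:c1] = idx
        means.append(m)
    # Merge pass: a neighboring leaf whose mean is close is assumed to satisfy P for the union.
    for idx, (r0, c0, r1, c1, m) in enumerate(leaves, start=1):
        window = labels[max(0, r0 - 1):r1 + 1, max(0, c0 - 1):c1 + 1]
        for jdx in np.unique(window):
            if jdx > idx and abs(means[jdx - 1] - m) <= std_max:
                labels[labels == jdx] = idx
    return labels
```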