05 - Introduction To Digital Image Processing
http://ebooks.cambridge.org/
Paul Suetens
Digital images

Visible light is essentially electromagnetic radiation with wavelengths between 400 and 700 nm. Each wavelength corresponds to a different color. On the other hand, a particular color does not necessarily correspond to a single wavelength. Purple light, for example, is a combination of red and blue light. In general, a color is characterized by a spectrum of different wavelengths.

The human retina contains three types of photoreceptor cone cells that transform the incident light with different color filters. Because there are three types of cone receptors, three numbers are necessary and sufficient to describe any perceptible color. Hence, it is possible to produce an arbitrary color by superimposing appropriate amounts of three primary colors, each with its specific spectral curve. In an additive color reproduction system, such as a color monitor, these three primaries are red, green, and blue light. The color is then specified by the amounts of red, green, and blue. Equal amounts of red, green, and blue give white (see Figure 1.1(a)). Ideal white light has a flat spectrum in which all wavelengths are present. In practice, white light sources approximate this property.

In a subtractive color reproduction system, such as printing or painting, these three primaries typically are cyan, magenta, and yellow. Cyan is the color of a material, seen in white light, that absorbs red but reflects green and blue, and can thus be obtained by additive mixing of equal amounts of green and blue light. Similarly, magenta is the result of the absorption of green light and consists of equal amounts of red and blue light, and yellow is the result of the absorption of blue and consists of equal amounts of red and green light. Therefore, subtractive mixing of cyan and magenta gives blue, subtractive mixing of cyan and yellow gives green, and subtractive mixing of yellow and magenta gives red. Subtractive mixing of yellow, cyan, and magenta produces black (only absorption and no reflection) (see Figure 1.1(b)).

Figure 1.1 Color mixing: (a) additive color mixing, (b) subtractive color mixing.

Note that equal distances in physical intensity are not perceived as equal distances in brightness. Intensity levels must be spaced logarithmically, rather than linearly, to achieve equal steps in perceived brightness. Hue refers to the dominant wavelength in the spectrum, and represents the different colors. Saturation describes the amount of white light present in the spectrum. If no white light is present, the saturation is 100%. Saturation distinguishes colorful tones from pastel tones at the same hue. In the color cone of Figure 1.2, equal distances between colors by no means correspond to equal perceptual differences. The Commission Internationale de l'Eclairage (CIE) has defined perceptually more uniform color spaces like $L^*u^*v^*$ and $L^*a^*b^*$. A discussion of the pros and cons of different color spaces is beyond the scope of this textbook.

Figure 1.2 Hue, brightness, and saturation.

While chromatic light needs three descriptors or numbers to characterize it, achromatic light, as produced by a black-and-white monitor, has only one descriptor, its brightness or gray value. Achromatic light is light with a saturation of 0%. It contains only white light.

Given a set of possible gray levels or colors and a (rectangular) grid, a digital image attributes a gray value (i.e., brightness) or a color (i.e., hue, saturation and brightness) to each of the grid points or pixels. In a digital image, the gray levels are integers. Although brightness values are continuous in real life, in a digital image we have only a limited number of gray levels at our disposal. The conversion from analog samples to discrete-valued samples is called quantization. Figure 1.3 shows the same image using two different quantizations. When too few gray values are used, contouring appears: the image is reduced to an artificial looking height map. How many gray values are needed to produce a continuous looking image? Assume that n + 1 gray values are displayed with corresponding physical intensities I0, I1, ..., In, where I0 is the lowest attainable intensity and In the maximum intensity. The ratio In/I0 is called the dynamic range. The human eye cannot distinguish subsequent intensities Ij and Ij+1 if they differ by less than 1%, i.e., if Ij+1 ≤ 1.01 Ij. A continuous looking brightness scale therefore requires In ≤ 1.01ⁿ I0, that is, n ≥ log(In/I0)/log(1.01). For a dynamic range of 100 the required number of gray values is 463, and a dynamic range of 1000 requires 694 different gray values. Most digital medical images today use 4096 gray values (12 bpp). The problem with too many gray values, however, is that small differences in brightness cannot be perceived on the display. This problem can be overcome, for example, by expanding a small gray value interval into a larger one by using a suitable gray value transformation, as discussed on p. 4 below.

In the process of digital imaging, the continuous looking world has to be captured onto the finite number of pixels of the image grid. The conversion from a continuous function to a discrete function, retaining only the values at the grid points, is called sampling and is discussed in detail in Appendix A, p. 228.

Much information about an image is contained in its histogram. The histogram h of an image is a probability distribution on the set of possible gray levels. The probability of a gray value v is given by its relative frequency in the image, that is,

$$h(v) = \frac{\text{number of pixels having gray value } v}{\text{total number of pixels}}. \tag{1.1}$$

Image quality

The resolution of a digital image is sometimes wrongly defined as the linear pixel density (expressed in dots per inch). This is, however, only an upper bound for the resolution. Resolution is also determined by the imaging process: the more blurring, the lower the resolution. Factors that contribute to the unsharpness of an image are (1) the characteristics of the imaging system, such as the focal spot and the amount of detector blur, (2) the scene characteristics and geometry, such as the shape of the subject, its position and motion, and (3) the viewing conditions.

Resolution can be defined as follows. When imaging a very small, bright point on a dark background, this dot will normally not appear as sharp in the image as it is in reality.
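Stepping back to the histogram of Eq. (1.1): it is straightforward to compute for an integer-valued image. Below is a minimal numpy sketch; the function name, the 12-bit range, and the synthetic test image are illustrative assumptions, not from the text.

```python
import numpy as np

def histogram(image: np.ndarray, num_levels: int = 4096) -> np.ndarray:
    """Relative-frequency histogram h(v) of Eq. (1.1) for an integer-valued image."""
    counts = np.bincount(image.ravel(), minlength=num_levels)  # pixels with gray value v
    return counts / image.size                                 # relative frequency: sums to 1

# Example on a synthetic 12-bit image
rng = np.random.default_rng(0)
img = rng.integers(0, 4096, size=(512, 512))
h = histogram(img)
assert np.isclose(h.sum(), 1.0)
```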
The contrast is defined by (1) the imaging process, such as the source intensity and the absorption efficiency or sensitivity of the capturing device, (2) the scene characteristics, such as the physical properties, size and shape of the object, and the use of contrast agents, and (3) the viewing conditions, such as the room illumination and display equipment. Because the OTF drops off for larger frequencies, the contrast of very small objects will be influenced by the resolution as well.

A third quality factor is image noise. The emission and detection of light and all other types of electromagnetic waves are stochastic processes. Because of the statistical nature of imaging, noise is always present. It is the random component in the image. If the noise level is high compared with the image intensity of an object, the meaningful information is lost in the noise. An important measure, obtained from signal theory, is therefore the signal-to-noise ratio (SNR or S/N). In the terminology of images this is the contrast-to-noise ratio (CNR). Both contrast and noise are frequency dependent. An estimate of the noise can be obtained by making a flat-field image, i.e., an image without an object between the source and the detector. The noise amplitude as a function of spatial frequency can be calculated from the square root of the so-called Wiener spectrum, which is the Fourier transform of the autocorrelation of a flat-field image.

Artifacts are artificial image features, such as dust or scratches in photographs. Examples in medical images are metal streak artifacts in computed tomography (CT) images and geometric distortions in magnetic resonance (MR) images. Artifacts may also be introduced by digital image processing, such as edge enhancement. Because artifacts may hamper the diagnosis or yield incorrect measurements, it is important to avoid them or at least understand their origin. In the following chapters, image resolution, noise, contrast, and artifacts will be discussed for each of the imaging modalities.

Basic image operations

In this section a number of basic mathematical operations on images are described. They can be employed for image enhancement, analysis and visualization. The aim of medical image enhancement is to allow the clinician to perceive better all the relevant diagnostic information present in the image. In digital radiography, for example, 12-bit images with 4096 possible gray levels are available. As discussed above, it is physically impossible for the human eye to distinguish all these gray values at once in a single image. Consequently, not all the diagnostic information encoded in the image may be perceived. Meaningful details must have a sufficiently high contrast to allow the clinician to detect them easily. The larger the number of gray values in the image, the more important this issue becomes, as lower contrast features may become available in the image data. Therefore, image enhancement will not become less important as the quality of digital image capturing systems improves. On the contrary, it will gain importance.

Gray level transformations

Given a digital image I that attributes a gray value (i.e., brightness) to each of the pixels (i, j), a gray level transformation is a function g that transforms each gray level I(i, j) to another value I'(i, j) independent of the position (i, j). Hence, for all pixels (i, j)

$$I'(i, j) = g(I(i, j)). \tag{1.2}$$

In practice, g is an increasing function. Instead of transforming gray values it is also possible to operate on color (i.e., hue, saturation and brightness). In that case three of these transformations are needed to transform colors to colors.

Note that, in this textbook, the notation I is used not only for the physical intensity but also for the gray value (or color), which are usually not identical. The gray value can represent brightness (logarithm of the intensity, see p. 1), relative signal intensity or any other derived quantity. Nevertheless, the terms intensity and intensity image are loosely used as synonyms for gray value and gray value image.

If pixel (i1, j1) appears brighter than pixel (i2, j2) in the original image, this relation holds after the gray level transformation. The main use of such a gray level transformation is to increase the contrast in some regions of the image. The price to be paid is a decreased contrast in other parts of the image. Indeed, in a region containing pixels with gray values in the range where the slope of g is larger than 1, the difference between these gray values increases. In regions with gray values in the range with slope smaller than 1, gray values come closer together and different values may even become identical after the transformation. Figure 1.6 shows an example of such a transformation.
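On an integer-valued image, a gray level transformation is in practice just a lookup table. Below is a minimal numpy sketch; the names are hypothetical, and the gamma curve is my own stand-in for a monotonically increasing g with slope greater than 1 in the dark range, qualitatively like the transformation of Figure 1.6.

```python
import numpy as np

M = 4095  # maximal gray value of a 12-bit image

def apply_gray_level_transform(image: np.ndarray, g: np.ndarray) -> np.ndarray:
    """Apply Eq. (1.2), I'(i, j) = g(I(i, j)), with g given as a lookup table."""
    return g[image]  # position independent: every pixel goes through the same g

# A gamma curve with exponent < 1 has slope > 1 at the dark end: it stretches
# contrast in dark areas and compresses it in bright areas.
t = np.arange(M + 1)
g_lut = np.round(M * (t / M) ** 0.5).astype(np.uint16)

img = np.random.default_rng(1).integers(0, M + 1, size=(256, 256))
enhanced = apply_gray_level_transform(img, g_lut)
```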
Figure 1.6 A gray level transformation that increases the contrast in dark areas and decreases the contrast in bright regions. It can be used when the clinically relevant information is situated in the dark areas, such as the lungs in this example: (b) the original image, (c) the transformed image.
Figure 1.7 (a) Window/leveling with l = 1500, w = 1000. (b) Thresholding with tr = 1000.
A particular and popular transformation is the window/level operation (see Figure 1.7(a)). In this operation, an interval or window is selected, determined by the window center or level l and the window width w. Explicitly,

$$g_{l,w}(t) = \begin{cases} 0 & \text{for } t < l - \dfrac{w}{2} \\[4pt] \dfrac{M}{w}\left(t - l + \dfrac{w}{2}\right) & \text{for } l - \dfrac{w}{2} \le t \le l + \dfrac{w}{2} \\[4pt] M & \text{for } t > l + \dfrac{w}{2}, \end{cases} \tag{1.3}$$

where M is the maximal available gray value. Contrast outside the window is lost completely, whereas the portion of the range lying inside the window is stretched to the complete gray value range.

An even simpler operation is thresholding (Figure 1.7(b)). Here all gray levels up to a certain threshold tr are set to zero, and all gray levels above the threshold equal the maximal gray value:

$$g_{tr}(t) = \begin{cases} 0 & \text{for } t \le tr \\ M & \text{for } t > tr. \end{cases} \tag{1.4}$$

These operations can be very useful for images with a bimodal histogram (see Figure 1.8).

Multi-image operations

A simple operation is adding or subtracting images in a pixelwise way. For two images I1 and I2, the sum I+ and the difference I− are defined as

$$I_+(i, j) = I_1(i, j) + I_2(i, j) \tag{1.5}$$

$$I_-(i, j) = I_1(i, j) - I_2(i, j). \tag{1.6}$$

If these operations yield values outside the available gray value range, the resulting image can be brought back into that range by a linear transformation. The average of n images is defined as

$$I_{av}(i, j) = \frac{1}{n}\left(I_1(i, j) + \cdots + I_n(i, j)\right). \tag{1.7}$$

Averaging can be useful to decrease the noise in a sequence of images of a motionless object (Figure 1.9). The random noise averages out, whereas the object remains unchanged (if the images match perfectly).
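Returning to Eqs. (1.3) and (1.4): both are one-liners on an array. Below is a minimal numpy sketch; the function names and the 12-bit maximum M are illustrative assumptions.

```python
import numpy as np

M = 4095  # maximal available gray value (12 bpp)

def window_level(t: np.ndarray, l: float, w: float) -> np.ndarray:
    """Eq. (1.3): stretch the window [l - w/2, l + w/2] to the full range [0, M]."""
    out = (M / w) * (t.astype(float) - l + w / 2)
    return np.clip(out, 0, M)  # 0 below the window, M above it

def threshold(t: np.ndarray, tr: float) -> np.ndarray:
    """Eq. (1.4): 0 up to the threshold tr, M above it."""
    return np.where(t > tr, M, 0)

img = np.random.default_rng(2).integers(0, M + 1, size=(256, 256))
wl = window_level(img, l=1500, w=1000)  # the setting of Figure 1.7(a)
th = threshold(img, tr=1000)            # the setting of Figure 1.7(b)
```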
Averaging can also be used for color images by processing the different channels independently like gray level images. Subtraction can be used to get rid of the background in two similar images. For example, in blood vessel imaging (angiography), two images are made, one without a contrast agent and another with contrast agent injected in the blood vessels. Subtraction of these two images yields a pure image of the blood vessels because the subtraction deletes the other anatomical features. Figure 1.10 shows an example.

Figure 1.10 (a) Radiographic image after injection of a contrast agent. (b) Mask image, that is, the same exposure before contrast injection. (c) Subtraction of (a) and (b), followed by contrast enhancement. (Courtesy of Professor G. Wilms, Department of Radiology.)
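The multi-image operations of Eqs. (1.5)-(1.7), together with the linear rescaling mentioned above, can be sketched in a few lines of numpy. The function names and the schematic subtraction workflow below are my own illustration, not the book's code.

```python
import numpy as np

M = 4095  # available gray value range [0, M]

def rescale(image: np.ndarray) -> np.ndarray:
    """Bring an image back into [0, M] by a linear transformation."""
    lo, hi = image.min(), image.max()
    if hi == lo:
        return np.zeros_like(image, dtype=float)
    return (image - lo) * (M / (hi - lo))

def average(images: list[np.ndarray]) -> np.ndarray:
    """Eq. (1.7): pixelwise mean; averages out noise in a motionless sequence."""
    return sum(img.astype(float) for img in images) / len(images)

def subtract(contrast_img: np.ndarray, mask_img: np.ndarray) -> np.ndarray:
    """Schematic digital subtraction angiography: contrast image minus mask image.

    Both images are assumed registered, same shape, same exposure.
    """
    diff = contrast_img.astype(float) - mask_img.astype(float)  # Eq. (1.6), may be < 0
    return rescale(diff)                                        # back into [0, M]
```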
To map the 3D image data onto the 2D image a projective transformation is needed. Assuming a pinhole camera, such as an X-ray tube, any 3D point (x, y, z) is mapped onto its 2D projection point (u', v') by the projective matrix (more details on p. 216):

$$\begin{pmatrix} u \\ v \\ w \end{pmatrix} = \begin{pmatrix} f_x & \kappa_x & u_0 & 0 \\ \kappa_y & f_y & v_0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix}, \qquad \begin{pmatrix} u' \\ v' \\ 1 \end{pmatrix} = \begin{pmatrix} u/w \\ v/w \\ 1 \end{pmatrix}. \tag{1.11}$$

Using homogeneous coordinates the above geometric transformations can all be represented by matrices. In some cases, however, it might be necessary to use more flexible transformations. For example, the comparison of images at different moments, such as in follow-up studies, may be hampered by patient movement, organ deformations, e.g., differences in bladder and rectum filling, or breathing. Another example is the geometric distortion of magnetic resonance images resulting from undesired deviations of the magnetic field (see p. 92). Geometric transformations are discussed further in Chapter 7.

Filters

Linear filters

From linear system theory (see Eq. (A.22)), we know that an image I(i, j) can be written as follows:

$$I(i, j) = \sum_{k,l} I(k, l)\,\delta(i - k, j - l). \tag{1.12}$$

For a linear shift-invariant transformation L (see also Eq. (A.31)),

$$L(I)(i, j) = \sum_{k,l} I(k, l)\,L(\delta)(i - k, j - l) = \sum_{k,l} I(k, l)\,f(i - k, j - l) = \sum_{k,l} f(k, l)\,I(i - k, j - l) = f * I(i, j), \tag{1.13}$$

where f = L(δ) is the kernel of the filter and ∗ denotes the convolution. In practice, the flipped kernel h defined as h(i, j) = f(−i, −j) is usually used. Hence, Eq. (1.13) can be rewritten as

$$L(I)(i, j) = f * I(i, j) = \sum_{k,l} f(k, l)\,I(i - k, j - l) = \sum_{k,l} h(k, l)\,I(i + k, j + l) = h \bullet I(i, j), \tag{1.14}$$

where h • I is the cross-correlation of h and I. If the filter is symmetric, which is often the case, cross-correlation and convolution are identical.

A cross-correlation of an image I(i, j) with a kernel h has the following physical meaning. The kernel h is used as an image template or mask that is shifted across the image. For every image pixel (i, j), the template pixel h(0, 0), which typically lies in the center of the mask, is superimposed onto this pixel (i, j), and the values of the template and image that correspond to the same positions are multiplied. Next, all these values are summed. A cross-correlation emphasizes patterns in the image similar to the template.

Often local filters with only a few pixels in diameter are used. A simple example is the 3 × 3 mask with values 1/9 at each position (Figure 1.11). This filter performs an averaging on the image, making it smoother and removing some noise. The filter gives the same weight to the center pixel as to its neighbors. A softer way of smoothing the image is to give a high weight to the center pixel and less weight to pixels further away from the central pixel.

Figure 1.11 The 3 × 3 averaging mask, with value 1/9 at each position.
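A direct, if slow, implementation of the cross-correlation of Eq. (1.14) makes the shift-multiply-sum recipe explicit. Below is a minimal numpy sketch; the names are mine, and the boundary is extended by repeating edge pixels, a choice discussed further below.

```python
import numpy as np

def cross_correlate(image: np.ndarray, h: np.ndarray) -> np.ndarray:
    """Eq. (1.14): out(i, j) = sum over (k, l) of h(k, l) * image(i + k, j + l).

    The kernel h is assumed to have odd size, with h(0, 0) at its center.
    """
    m, n = h.shape
    pm, pn = m // 2, n // 2
    padded = np.pad(image.astype(float), ((pm, pm), (pn, pn)), mode="edge")
    out = np.zeros(image.shape, dtype=float)
    for k in range(-pm, pm + 1):
        for l in range(-pn, pn + 1):
            # shift the image by (k, l) and accumulate the weighted sum
            out += h[k + pm, l + pn] * padded[pm + k : pm + k + image.shape[0],
                                              pn + l : pn + l + image.shape[1]]
    return out

# The 3 x 3 averaging mask of Figure 1.11 (symmetric, so correlation == convolution)
box = np.full((3, 3), 1.0 / 9.0)
img = np.random.default_rng(3).integers(0, 4096, size=(128, 128))
smoothed = cross_correlate(img, box)
```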
A suitable filter for such a weighted smoothing is the discretized Gaussian function

$$g(\vec{r}\,) = \frac{1}{2\pi\sigma^2}\, e^{-r^2/2\sigma^2}, \qquad \vec{r} = (i, j). \tag{1.15}$$

Small values are put to zero in order to produce a local filter. The Fourier transform of the Gaussian is again a Gaussian. In the Fourier domain, convolution with a filter becomes multiplication. Taking this into account, it is clear that a Gaussian filter attenuates the high frequencies in the image. These averaging filters are therefore also called low-pass filters. In contrast, filters that emphasize high frequencies are called high-pass filters. A high-pass filter can be constructed simply from a low-pass one by subtracting the low-pass filter g from the identity filter δ. A high-pass filter enhances small-scale variations in the image. It extracts edges and fine textures. An example of low-pass and high-pass filtering is shown in Figure 1.12.

Other types of linear filters are differential operators such as the gradient and the Laplacian. However, these operations are not defined on discrete images. Because derivatives are defined on differentiable functions, the computation is performed by first fitting a differentiable function through the discrete data set. This can be obtained by convolving the discrete image with a continuous function f. The derivative of this result is evaluated at the points (i, j) of the original sampling grid. For the 1D partial derivative this sequence of operations can be written as follows:

$$\frac{\partial}{\partial x} I(i, j) \approx \left.\frac{\partial}{\partial x} \sum_{k,l} I(k, l)\, f(x - k, y - l)\right|_{x=i,\,y=j} = \sum_{k,l} \frac{\partial f}{\partial x}(i - k, j - l)\, I(k, l). \tag{1.16}$$

Hence, the derivative is approximated by a convolution with a filter that is the sampled derivative of some differentiable function f(r). This procedure can now be used further to approximate the gradient and the Laplacian of a digital image:

$$\nabla I = \nabla f * I, \qquad \nabla^2 I = \nabla^2 f * I, \tag{1.17}$$

where it is understood that we use the discrete convolution. If f is a Gaussian g, the following differential convolution operators are obtained:

$$\nabla g(\vec{r}\,) = -\frac{1}{\sigma^2}\, g(\vec{r}\,) \cdot \vec{r}, \qquad \nabla^2 g(\vec{r}\,) = \frac{1}{\sigma^4}\,\left(r^2 - 2\sigma^2\right) g(\vec{r}\,). \tag{1.18}$$

For σ = 0.5, this procedure yields approximately the following 3 × 3 filters (see Figure 1.13):

$$\text{Gaussian:}\ \begin{pmatrix} 0.01 & 0.08 & 0.01 \\ 0.08 & 0.64 & 0.08 \\ 0.01 & 0.08 & 0.01 \end{pmatrix} \qquad \frac{\partial}{\partial x}:\ \begin{pmatrix} 0.05 & 0 & -0.05 \\ 0.34 & 0 & -0.34 \\ 0.05 & 0 & -0.05 \end{pmatrix}$$

$$\frac{\partial}{\partial y}:\ \begin{pmatrix} 0.05 & 0.34 & 0.05 \\ 0 & 0 & 0 \\ -0.05 & -0.34 & -0.05 \end{pmatrix} \qquad \nabla^2:\ \begin{pmatrix} 0.3 & 0.7 & 0.3 \\ 0.7 & -4 & 0.7 \\ 0.3 & 0.7 & 0.3 \end{pmatrix} \tag{1.19}$$
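These templates can be checked numerically. The sketch below samples Eqs. (1.15) and (1.18) on a 3 × 3 grid for σ = 0.5 (pure numpy; the function names are mine). The raw samples only approximate the printed values since, as noted below, the printed templates are adapted so that their entries sum exactly to 1 and 0.

```python
import numpy as np

def gaussian_kernel(sigma: float, radius: int = 1) -> np.ndarray:
    """Sample the 2D Gaussian of Eq. (1.15) on a (2*radius + 1)^2 grid."""
    i, j = np.mgrid[-radius : radius + 1, -radius : radius + 1]
    r2 = i**2 + j**2
    return np.exp(-r2 / (2 * sigma**2)) / (2 * np.pi * sigma**2)

def log_kernel(sigma: float, radius: int = 1) -> np.ndarray:
    """Sample the Laplacian of Gaussian of Eq. (1.18): (r^2 - 2 sigma^2) g(r) / sigma^4."""
    i, j = np.mgrid[-radius : radius + 1, -radius : radius + 1]
    r2 = i**2 + j**2
    return (r2 - 2 * sigma**2) * gaussian_kernel(sigma, radius) / sigma**4

print(np.round(gaussian_kernel(0.5), 2))
# close to the printed Gaussian mask (Eq. (1.19) adapts its entries to sum exactly to 1)
print(np.round(log_kernel(0.5), 2))
# off-center samples ~0.69 and ~0.28 (printed as 0.7 and 0.3); the raw center ~ -5.09
# is adapted to -4 in Eq. (1.19) so that the entries sum to 0
```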
Note that the integral of a Gaussian over the whole spatial domain must be 1, and for the gradient and Laplacian this must be 0. To satisfy this condition, the numbers in the templates above, which are spatially limited, were adapted.

The Laplacian of a Gaussian is sometimes approximated by a difference of Gaussians with different values of σ. This can be derived from Eq. (1.18). Rewriting it as

$$\nabla^2 g(\vec{r}\,) = \left(\frac{r^2}{\sigma^4} + \frac{2}{\sigma^2}\right) g(\vec{r}\,) - \frac{4}{\sigma^2}\, g(\vec{r}\,) \tag{1.20}$$

shows us that the second term is proportional to the original Gaussian g, while the first term drops off more slowly because of the r² and acts as if it were a Gaussian with a larger value of σ (the 2/σ² added to the r²/σ⁴ makes it a monotonically decreasing function in the radial direction).

Popular derivative filters are the Sobel operator for the first derivative, and the average − δ for the Laplacian, which use integer filter elements:

$$\text{Sobel:}\ \begin{pmatrix} 1 & 0 & -1 \\ 2 & 0 & -2 \\ 1 & 0 & -1 \end{pmatrix} \qquad \text{average} - \delta:\ \begin{pmatrix} 1 & 1 & 1 \\ 1 & -8 & 1 \\ 1 & 1 & 1 \end{pmatrix} \tag{1.21}$$

Note that, if we compute the convolution of an image with a filter, it is necessary to extend the image at its boundaries because pixels lying outside the image will be addressed by the convolution algorithm. This is best done in a smooth way, for example by repeating the boundary pixels. If not, artifacts appear at the boundaries after the convolution.

As an application of linear filtering, let us discuss edge enhancement using unsharp masking. Figure 1.14 shows an example. As already mentioned, a low-pass filter g can be used to split an image I into two parts: a smooth part g ∗ I, and the remaining high-frequency part I − g ∗ I containing the edges or image details. Hence

$$I = g * I + (I - g * I). \tag{1.22}$$

Note that I − g ∗ I is a crude approximation of the Laplacian of I. Unsharp masking enhances the image details by emphasizing the high-frequency part and assigning it a higher weight. For some α > 0, the output image I' is then given by

$$I' = g * I + (1 + \alpha)(I - g * I) = I + \alpha(I - g * I) = (1 + \alpha)I - \alpha\, g * I. \tag{1.23}$$

The parameter α controls the strength of the enhancement, and the parameter σ is responsible for the size of the frequency band that is enhanced. The smaller the value of σ, the more unsharp masking focuses on the finest details.
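As a concrete illustration, here is a minimal numpy sketch of Eq. (1.23). This is not the book's implementation: the separable Gaussian blur, the function names, and the parameter values are my own assumptions.

```python
import numpy as np

def gaussian_blur(image: np.ndarray, sigma: float) -> np.ndarray:
    """Convolve with a sampled, normalized Gaussian (boundary pixels repeated)."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    k1d = np.exp(-x**2 / (2 * sigma**2))
    k1d /= k1d.sum()
    padded = np.pad(image.astype(float), radius, mode="edge")
    # The Gaussian is separable: filter along rows, then along columns.
    rows = sum(k1d[t + radius] * padded[:, radius + t : radius + t + image.shape[1]]
               for t in range(-radius, radius + 1))
    return sum(k1d[t + radius] * rows[radius + t : radius + t + image.shape[0], :]
               for t in range(-radius, radius + 1))

def unsharp_mask(image: np.ndarray, alpha: float, sigma: float) -> np.ndarray:
    """Eq. (1.23): I' = (1 + alpha) * I - alpha * (g * I)."""
    return (1 + alpha) * image - alpha * gaussian_blur(image, sigma)

img = np.random.default_rng(4).integers(0, 4096, size=(256, 256)).astype(float)
sharp = unsharp_mask(img, alpha=1.0, sigma=2.0)  # larger alpha: stronger enhancement
```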
Figure 1.15 (a) Original karyotype (chromosome image). (b) Image smoothed with a Gaussian filter. (c) Image filtered with a median filter.
Nonlinear filters

Not every goal can be achieved by using linear filters. Many problems are better solved with nonlinear methods. Consider, for example, the denoising problem. As explained above, the averaging filter removes noise in the image. The output image is, however, much smoother than the input image. In particular, edges are smeared out and may even disappear. To avoid smoothing, it can therefore be better to calculate the median instead of the mean value in a small window around each pixel. This procedure better preserves the edges (check this with paper and pencil on a step edge). Figure 1.15 shows an example on a chromosome image.
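A minimal numpy sketch of such a median filter follows; the function name and window handling are my own, with boundary pixels repeated as recommended above. The small demo reproduces the paper-and-pencil check on a step edge.

```python
import numpy as np

def median_filter(image: np.ndarray, size: int = 3) -> np.ndarray:
    """Replace each pixel by the median of the size x size window around it."""
    r = size // 2
    padded = np.pad(image, r, mode="edge")  # repeat boundary pixels
    # Stack all window offsets, then take the median across them.
    windows = np.stack([padded[r + k : r + k + image.shape[0],
                               r + l : r + l + image.shape[1]]
                        for k in range(-r, r + 1)
                        for l in range(-r, r + 1)])
    return np.median(windows, axis=0)

# The paper-and-pencil check from the text, on a step edge:
step = np.tile([0, 0, 0, 4000, 4000, 4000], (5, 1))
print(median_filter(step)[2])  # [0 0 0 4000 4000 4000]: the edge is preserved
# a 3 x 3 averaging filter would instead produce ~1333 and ~2667 at the edge
```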
Multiscale image processing

In the previous sections a number of basic image operations have been described that can be employed for image enhancement and analysis (see for example Figures 1.6 and 1.14).

Gray value transformations (Figure 1.6), such as the widespread window/level operation, increase the contrast in a subpart of the gray value scale. They are quite useful for low-contrast objects situated in the enhanced gray value band. Unfortunately, features outside this gray value interval are attenuated instead of enhanced. In addition, gray value transformations do not make use of the spatial relationship among object pixels and therefore equally enhance meaningful and meaningless features such as noise.

Spatial operations overcome this problem. Differential operations, such as unsharp masking (Figure 1.14), enhance gray value variations or edges, whereas other operations, such as spatial averaging and median filtering, reduce the noise. However, they focus on features of a particular size because of the fixed size of the mask, which is a parameter that must be chosen. Figure 1.16 shows the effect of the filter size for unsharp masking. Using a low-pass filter, the image is split into a low-pass and a remaining high-pass part. Next, the high-pass part is emphasized, both parts are added again (Eq. (1.23)), and the result is normalized to the available gray value range. If the filter size is small, this procedure emphasizes small-scale features and suppresses gray value variations that extend over larger areas in the image. With a large-size filter, large image features are enhanced at the expense of the small details. With this method, the following problems are encountered.

• The image operation is tuned to a particular frequency band that is predetermined by the choice of the filter size. However, diagnostic information is available at all scales in the image and is not limited to a particular frequency band.

• Gray value variations in the selected frequency band are intensified equally. This is desired for low-contrast features but unnecessary for high-contrast features that are easily perceivable.
It is clear that a method is needed that is independent of the spatial extent or scale of the image features and emphasizes the amplitude of only the low-contrast features. Multiscale image processing has been studied extensively, not only by computer scientists but also by neurophysiologists. It is well known that the human visual system makes use of a multiscale approach. However, this theory is beyond the scope of this textbook. More about multiscale image analysis can be found, for example, in [1].

[1] B. M. ter Haar Romeny. Front-End Vision and Multi-Scale Image Analysis: Multi-Scale Computer Vision Theory and Applications Written in Mathematica. Volume 27 of Computational Imaging and Vision. Springer, 2003.