Fundamental Steps in Digital Image Processing
There are 11 fundamental steps in digital image processing (DIP), and each step may have sub-steps. The fundamental steps in DIP are described below.
1. Image Acquisition
This is the first fundamental step in digital image processing. Image acquisition could be as simple as being given an image that is already in digital form. Generally, the image acquisition stage also involves pre-processing, such as scaling.
2. Image Enhancement
Image enhancement is among the simplest and most appealing areas of digital image processing. Basically, the idea behind enhancement techniques is to bring out detail that is obscured, or simply to highlight certain features of interest in an image, for example by changing brightness and contrast.
3. Image Restoration
Image restoration is an area that also deals with improving the appearance of an image.
However, unlike enhancement, which is subjective, image restoration is objective, in the sense
that restoration techniques tend to be based on mathematical or probabilistic models of image
degradation.
4. Color Image Processing
Color image processing is an area that has been gaining in importance because of the significant increase in the use of digital images over the Internet. This includes color modeling and processing in the digital domain.
5. Wavelets and Multi-Resolution Processing
Wavelets are the foundation for representing images in various degrees of resolution. Images are subdivided successively into smaller regions for data compression and for pyramidal representation.
6. Compression
Compression deals with techniques for reducing the storage required to save an image or the
bandwidth to transmit it. Compressing data is particularly necessary when images are transmitted over the Internet.
7. Morphological Processing
Morphological processing deals with tools for extracting image components that are useful in the
representation and description of shape.
8. Segmentation
Segmentation procedures partition an image into its constituent parts or objects. In general,
autonomous segmentation is one of the most difficult tasks in digital image processing. A rugged
segmentation procedure brings the process a long way toward successful solution of imaging
problems that require objects to be identified individually.
9. Representation and Description
Representation and description almost always follow the output of a segmentation stage, which
usually is raw pixel data, constituting either the boundary of a region or all the points in the
region itself. Choosing a representation is only part of the solution for transforming raw data into
a form suitable for subsequent computer processing. Description deals with extracting attributes
that result in some quantitative information of interest or are basic for differentiating one class of
objects from another.
10. Object recognition
Recognition is the process that assigns a label, such as “vehicle”, to an object based on its descriptors.
11. Knowledge Base
Knowledge may be as simple as detailing regions of an image where the information of interest
is known to be located, thus limiting the search that has to be conducted in seeking that
information. The knowledge base also can be quite complex, such as an interrelated list of all
major possible defects in a materials inspection problem or an image database containing high-
resolution satellite images of a region in connection with change-detection applications.
(10 mark)
Introduction to JPEG Compression (10 MARK)
JPEG is an image compression standard which was developed by the Joint Photographic Experts Group. In 1992, it was accepted as an international standard. JPEG is a lossy image compression method. JPEG compression uses the DCT (Discrete Cosine Transform) method for the coding transformation. It allows the degree of compression to be adjusted, providing a tradeoff between storage size and image quality.
Following are the steps of JPEG image compression:
Step 1: The input image is divided into small blocks of 8x8 pixels, i.e. 64 units per block. Each unit of the image is called a pixel.
Step 2: JPEG uses the [Y, Cb, Cr] model instead of the [R, G, B] model, so in the second step RGB is converted into YCbCr.
Step 3: After the color conversion, the data is forwarded to the DCT. The DCT uses a cosine function and does not use complex numbers. It converts the information in a block of pixels from the spatial domain to the frequency domain.
DCT Formula (for an 8x8 block):
F(u, v) = (1/4) C(u) C(v) Σ_{x=0}^{7} Σ_{y=0}^{7} f(x, y) cos[(2x+1)uπ/16] cos[(2y+1)vπ/16],
where C(u) = 1/√2 for u = 0 and C(u) = 1 otherwise.
Step 4: Humans are largely unable to perceive the aspects of an image that lie at high frequencies. After the DCT, the important information is concentrated in the low-frequency coefficients, so only values up to a certain frequency need to be preserved accurately. Quantization is used to reduce the number of bits per sample.
There are two types of Quantization:
1. Uniform Quantization
2. Non-Uniform Quantization
Step 5: A zigzag scan is used to map the 8x8 matrix to a 1x64 vector. Zigzag scanning groups the low-frequency coefficients at the top of the vector and the high-frequency coefficients at the bottom. Because the quantized matrix contains a large number of zeros, the zigzag ordering collects them into long runs that are easy to encode.
Step 6: The next step is vectoring; differential pulse code modulation (DPCM) is applied to the DC component. The DC component is large and varies, but it is usually close to that of the previous block, so DPCM encodes the difference between the DC value of the current block and that of the previous block.
Step 7: In this step, run-length encoding (RLE) is applied to the AC components. This is done because the AC components contain a lot of zeros. They are encoded as (skip, value) pairs, in which skip is the number of zeros preceding a non-zero coefficient and value is the actual coded value of that non-zero coefficient.
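To make Steps 3-5 concrete, here is a minimal sketch (assuming Python with NumPy; the random 8x8 block and the use of the standard JPEG luminance quantization table are illustrative assumptions, not part of the notes above) of the block-level pipeline: level shift, 2-D DCT, quantization, and zigzag scanning.

import numpy as np

def dct2_8x8(block):
    # Orthonormal 2-D DCT-II of an 8x8 block, straight from the DCT formula
    n = np.arange(8)
    C = np.cos((2 * n[None, :] + 1) * n[:, None] * np.pi / 16) / 2.0   # C[u, x]
    C[0, :] /= np.sqrt(2)
    return C @ block @ C.T

# Standard JPEG luminance quantization table
Q = np.array([[16, 11, 10, 16,  24,  40,  51,  61],
              [12, 12, 14, 19,  26,  58,  60,  55],
              [14, 13, 16, 24,  40,  57,  69,  56],
              [14, 17, 22, 29,  51,  87,  80,  62],
              [18, 22, 37, 56,  68, 109, 103,  77],
              [24, 35, 55, 64,  81, 104, 113,  92],
              [49, 64, 78, 87, 103, 121, 120, 101],
              [72, 92, 95, 98, 112, 100, 103,  99]])

block = np.random.randint(0, 256, (8, 8)).astype(float) - 128   # Step 1 block + level shift
coeffs = dct2_8x8(block)                                        # Step 3: spatial -> frequency
quantized = np.round(coeffs / Q).astype(int)                    # Step 4: quantization

# Step 5: zigzag scan, ordering the 64 coefficients from low to high frequency
order = sorted(((u, v) for u in range(8) for v in range(8)),
               key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else p[1]))
zigzag = [quantized[u, v] for u, v in order]
print(zigzag)   # typically ends in a long run of zeros, ready for DPCM/RLE (Steps 6-7)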
( 5 mark)
Smoothing Spatial Filter: ( 5 mark)
A smoothing filter is used for blurring and noise reduction in an image. Blurring is a pre-processing step for removing small details, and noise reduction is accomplished by blurring.
Types of Smoothing Spatial Filter:
1. Linear Filter (Mean Filter)
2. Order Statistics (Non-linear) filter
These are explained below.
1. Mean Filter:
A linear spatial filter simply averages the pixels contained in the neighborhood of the filter mask. The idea is to replace the value of every pixel in an image by the average of the grey levels in the neighborhood defined by the filter mask.
Types of Mean filter:
(i) Averaging filter: It is used to reduce detail in an image. All coefficients are equal.
(ii) Weighted averaging filter: In this filter, pixels are multiplied by different coefficients; the center pixel is multiplied by a higher value than in the averaging filter.
2. Order Statistics Filter:
It is based on ordering (ranking) the pixels contained in the image area encompassed by the filter. It replaces the value of the center pixel with the value determined by the ranking result. Edges are better preserved with this kind of filtering.
Types of Order statistics filter:
(i) Minimum filter: The 0th percentile filter is the minimum filter. The value of the center pixel is replaced by the smallest value in the window.
(ii) Maximum filter: The 100th percentile filter is the maximum filter. The value of the center pixel is replaced by the largest value in the window.
(iii) Median filter: Each pixel in the image is considered in turn. First, the neighboring pixels are sorted, and then the original value of the pixel is replaced by the median of the list.
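As an illustration, the filters above map directly onto standard OpenCV calls; this is a minimal sketch assuming OpenCV and NumPy are available, and "input.png" is only a placeholder file name.

import cv2
import numpy as np

img = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)   # placeholder grayscale input

mean_filtered   = cv2.blur(img, (3, 3))      # 3x3 averaging (mean) filter
median_filtered = cv2.medianBlur(img, 3)     # 3x3 median filter; preserves edges better

kernel = np.ones((3, 3), np.uint8)
min_filtered = cv2.erode(img, kernel)        # 3x3 minimum (0th percentile) filter
max_filtered = cv2.dilate(img, kernel)       # 3x3 maximum (100th percentile) filter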
BASIC RELATIONSHIP BETWEEN PIXELS ( 10 mark):
Pixel: In digital imaging, a pixel (or picture element) is a single point in a raster image.
The pixel is the smallest addressable screen element; it is the smallest unit of picture that
can be controlled. Each pixel has its own address. The address of a pixel corresponds to
its coordinates. Pixels are normally arranged in a 2-dimensional grid, and are often
represented using dots or squares. Each pixel is a sample of an original image; more
samples typically provide more accurate representations of the original. The intensity of
each pixel is variable. In color image systems, a color is typically represented by three or
four component intensities such as red, green, and blue, or cyan, magenta, yellow, and
black.
The word pixel is based on a contraction of pix ("pictures") and el (for "element");
similar formations with el for "element" include the words: voxel and texel.
Bits per pixel
The number of distinct colors that can be represented by a pixel depends on the number of bits
per pixel (bpp). A 1 bpp image uses 1 bit for each pixel, so each pixel can be either on or off.
Each additional bit doubles the number of colors available, so a 2 bpp image can have 4 colors,
and a 3 bpp image can have 8 colors:
24 bpp: 2^24 ≈ 16.8 million colors ("true color").
For color depths of 15 or more bits per pixel, the depth is normally the sum of the bits allocated to each of the red, green, and blue components. High color, usually meaning 16 bpp, normally has five bits for red and blue and six bits for green, as the human eye is more sensitive to errors in green than in the other two primary colors. For applications involving transparency, the 16 bits may be divided into five bits each of red, green, and blue, with one bit left for transparency. A 24-bit depth allows 8 bits per component. On some systems, 32-bit depth is available: this means that each 24-bit pixel has an extra 8 bits to describe its opacity (for purposes of combining with another image).
In the spatial domain, a processed image g(x, y) is obtained from the input image f(x, y) by an operator T, i.e. g(x, y) = T[f(x, y)]. The operator T applied on f(x, y) may be defined over:
(i) A single pixel (x, y). In this case T is a grey-level transformation (or mapping) function.
(ii) Some neighborhood of (x, y).
(iii) A set of input images instead of a single image.
The result of such a transformation can be to produce an image of higher contrast than the original, by darkening the levels below some value m and brightening the levels above m in the original image. This technique is known as contrast stretching.
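A minimal sketch of one such grey-level transformation is given below; the particular function T(r) = 1 / (1 + (m/r)^E) is one common choice for contrast stretching, and the values of m and E used here are illustrative assumptions.

import numpy as np

def contrast_stretch(img, m=128.0, E=4.0):
    # Darken grey levels below m and brighten levels above m (8-bit input)
    r = img.astype(float) / 255.0                              # normalize to [0, 1]
    s = 1.0 / (1.0 + (m / 255.0 / np.maximum(r, 1e-6)) ** E)
    return np.round(s * 255).astype(np.uint8)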
Thresholding ( 5 mark)
Thresholding is a very popular segmentation technique, used for separating an object from its background. Various techniques used to threshold grayscale (8-bit) images are described below.
The process of thresholding involves comparing each pixel value of the image (pixel intensity) to a specified threshold. This divides all the pixels of the input image into 2 groups:
1. Pixels having intensity value lower than threshold.
2. Pixels having intensity value greater than threshold.
These 2 groups are now given different values, depending on various segmentation types.
OpenCV supports 5 different thresholding schemes on grayscale (8-bit) images using the function:
double threshold(InputArray src, OutputArray dst, double thresh, double maxval, int type)
Parameters:
InputArray src: input image (Mat, 8-bit or 32-bit)
OutputArray dst: output image (same size as input)
double thresh: the threshold value
double maxval: maxVal, used by the binary thresholding types
int type*: specifies the type of threshold to be used (0-4)
*A list of thresholding types is given below.
Input Image
The input RGB image is first converted to a grayscale image before thresholding is done.
Thresholding types
Binary Threshold (int type = 0)
Of the two groups obtained earlier, the group whose members have pixel intensity greater than the set threshold is assigned “maxVal”, or in the case of an 8-bit grayscale image, a value of 255 (white).
The members of the remaining group have their pixel intensities set to 0 (black).
In other words, if the pixel intensity value at (x, y) in the source image is greater than the threshold, the value at (x, y) in the final image is set to “maxVal”, else it is set to 0.
Inverted Binary Threshold (int type = 1)
The inverted binary threshold is the same as the binary threshold, with one essential difference: in inverted binary thresholding, the group having pixel intensities greater than the set threshold is assigned ‘0’, whereas the remaining pixels, with intensities less than the threshold, are set to “maxVal”.
If the pixel intensity value at (x, y) in the source image is greater than the threshold, the value in the final image is set to “0”, else it is set to “maxVal”.
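A minimal sketch of these two thresholding types, using the Python binding of the OpenCV threshold() function shown above; the file name and the threshold value 127 are illustrative assumptions.

import cv2

gray = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)   # 8-bit grayscale input (placeholder name)

# type 0 (THRESH_BINARY): pixels > 127 become 255 (maxVal), the rest become 0
ret, binary = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)

# type 1 (THRESH_BINARY_INV): pixels > 127 become 0, the rest become 255 (maxVal)
ret, binary_inv = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY_INV)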
Lossless image data storage is used by many PACS systems and removes any doubt regarding potential loss in diagnostic quality of medical images. Lossless compression involves removing redundant data; a commonly known example is the Graphics Interchange Format (GIF) file format. The degree of compression is limited, however, and is typically in the range of 1.5:1 to 3:1.
Lossy image compression methods remove potentially relevant pixel data from image
files. The compression ratio improves to greater than 10:1. Joint Photographic Experts
Group (JPEG) uses lossy image compression and is widely applied in digital
photography, with minimal impact on image quality. JPEG is also supported by the
DICOM standard. Studies have shown that some lossy image compression techniques can
be used effectively in medical imaging without an impact on diagnostic relevance.
Lossy image compression that does not affect a particular diagnostic task is referred to as
diagnostically acceptable irreversible compression. The ACR does not have a general
advisory statement on the type or amount of irreversible compression to be used to
achieve diagnostically acceptable irreversible compression, and only methods defined
and supported by the DICOM standard should be used, such as JPEG, JPEG-2000, or
Moving Picture Experts Group. In addition, the US Food and Drug Administration (FDA)
requires that images with lossy compression are labeled, including the compression ratio
and method used. The FDA prohibits the use of lossy compression of digital
mammograms for interpretation, although lossy compression can be used for prior
comparison studies.
Compression
Data and image compression facilitates transmission and storage of retinal images. The
time needed for transmission can also be dramatically reduced by image compression.
Compression may be used if algorithms have undergone clinical validation. Image data
can be compressed using a variety of standards, including JPEG, JPEG Lossless, JPEG
2000, and Run-length encoding (RLE). The International Standards Organization
(ISO/IEC JTC1/SC2/WG10) has prepared an International Standard, ISO/IS-15444–1
(JPEG 2000 Part 1), for the digital compression and coding of continuous-tone still
images. This standard is known as the JPEG 2000 Standard. Digital Imaging and
Communication in Medicine (DICOM) recognizes JPEG and JPEG 2000 for lossy
compression of medical images.18 ATA recommends that the compression types and
ratios should be periodically reviewed to ensure appropriate clinical image quality and
diagnostic accuracy. Some studies have attempted to look at the effect of various levels
of compression on the quality of the image with both subjective and objective
parameters.19,20 The level of acceptable compression ranges from 1 : 28 to 1 : 52.19,20
There are satellites that use 3D image acquisition techniques in order to build models of
different surfaces.
Depending on the field of work, a major factor involved in image acquisition in image
processing sometimes is the initial setup and long-term maintenance of the hardware used
to capture the images. The actual hardware device can be anything from a desktop
scanner to a massive optical telescope. If the hardware is not properly configured and
aligned, then visual artifacts can be produced that can complicate the image processing.
Improperly set up hardware also may provide images that are of such low quality that they
cannot be salvaged even with extensive processing. All of these elements are vital to
certain areas, such as comparative image processing, which looks for specific differences
between image sets.
One of the forms of image acquisition in image processing is known as real-time image
acquisition. This usually involves retrieving images from a source that is automatically
capturing images. Real-time image acquisition creates a stream of files that can be
automatically processed, queued for later work, or stitched into a single media format.
One common technology that is used with real-time image processing is known as
background image acquisition, which describes both software and hardware that can
quickly preserve the images flooding into a system.
There are some advanced methods of image acquisition in image processing that actually
use customized hardware. Three-dimensional (3D) image acquisition is one of these
methods. This can require the use of two or more cameras that have been aligned at precisely described points around a target, forming a sequence of images that can be
aligned to create a 3D or stereoscopic scene, or to measure distances. Some satellites use
3D image acquisition techniques to build accurate models of different surfaces.
• An ideal low-pass filter (ILPF) with cutoff frequency r0 has the transfer function H(u, v) = 1 if D(u, v) ≤ r0 and H(u, v) = 0 if D(u, v) > r0, where D(u, v) is the distance of the point (u, v) from the origin of the frequency plane.
Note that the origin (0, 0) is at the center and not the corner of the image (recall the
“fftshift” operation).
• The abrupt transition from 1 to 0 of the transfer function H(u, v) cannot be realized in practice using electronic components. However, it can be simulated on a computer. [Figure: ideal LPF with r0 = 57; ideal LPF examples.]
• Notice the severe ringing effect in the blurred images, which is a characteristic of ideal filters. It is due to the discontinuity in the filter transfer function. [Figure: original image and LPF images with r0 = 57, 36, and 26.]
Choice of cutoff frequency in ideal LPF
• The cutoff frequency r0 of the ideal LPF determines the amount of frequency components passed by the filter.
• The smaller the value of r0, the more image components are eliminated by the filter.
• In general, the value of r0 is chosen such that most components of interest are passed through, while most components not of interest are eliminated.
• Usually, these are conflicting requirements. We will see more details of this in image restoration.
• A useful way to establish a set of standard cut-off frequencies is to compute circles
which enclose a specified fraction of the total image power.
• Suppose P_T = Σ_{u=0}^{M−1} Σ_{v=0}^{N−1} P(u, v), where P(u, v) = |F(u, v)|², is the total image power.
• Consider a circle of radius r0(α) as a cutoff frequency with respect to a threshold α, such that the sum of P(u, v) over all (u, v) inside the circle equals α·P_T.
• We can then fix a threshold α and obtain the appropriate cutoff frequency r0(α).
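The bullets above translate into a short frequency-domain filtering sketch; assuming a grayscale image held in a NumPy array, an ideal LPF with cutoff r0 can be simulated as follows.

import numpy as np

def ideal_lowpass(img, r0):
    M, N = img.shape
    F = np.fft.fftshift(np.fft.fft2(img))             # centre the spectrum (the fftshift above)
    u = np.arange(M) - M // 2
    v = np.arange(N) - N // 2
    D = np.sqrt(u[:, None] ** 2 + v[None, :] ** 2)    # distance D(u, v) from the origin
    H = (D <= r0).astype(float)                       # ideal LPF transfer function
    g = np.fft.ifft2(np.fft.ifftshift(H * F))         # filter and transform back
    return np.real(g)                                 # smaller r0 -> stronger blurring and ringing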
HISTOGRAM PROCESSING:
The histogram of a digital image with gray levels in the range [0, L-1] is a
discrete function of the form
H(rk)=nk
where rk is the kth gray level and nk is the number of pixels in the image
having the level rk. A normalized histogram is given by the equation
P(rk) = nk / n
where n is the total number of pixels in the image. P(rk) gives an estimate of the probability of occurrence of gray level rk. The sum of all components of a normalized histogram is equal to 1.
The histogram plots are simple plots of H(rk)=nk versus rk.
In the dark image the components of the histogram are concentrated on
the low (dark) side of the gray scale. In case of bright image, the histogram
components are biased towards the high side of the gray scale. The
histogram of a low contrast image will be narrow and will be centered
towards the middle of the gray scale.
The components of the histogram in the high contrast image cover a broad range of the gray scale. The net effect of this will be an image that shows a great deal of gray-level detail and has a high dynamic range.
In a histogram graph, the horizontal axis is used to represent the tonal variations, whereas the vertical axis is used to represent the number of pixels with that particular tone. Black and dark areas are represented on the left side of the horizontal axis, medium grey in the middle, and the vertical axis represents the size of each such area.
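As a small illustration (the file name is a placeholder), the histogram H(rk) = nk and its normalized form can be computed and plotted as follows.

import cv2
import numpy as np
import matplotlib.pyplot as plt

img = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)             # 8-bit grayscale image
hist, _ = np.histogram(img.ravel(), bins=256, range=(0, 256))   # H(rk) = nk
p = hist / img.size                                             # normalized histogram, sums to 1

plt.bar(np.arange(256), hist)        # horizontal axis: gray level rk, vertical axis: count nk
plt.xlabel("gray level rk")
plt.ylabel("number of pixels nk")
plt.show()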
Applications of Histograms
1. In digital image processing, histograms are used for simple calculations in
software.
2. It is used to analyze an image. Properties of an image can be predicted by the
detailed study of the histogram.
3. The brightness of the image can be adjusted by having the details of its
histogram.
4. The contrast of the image can be adjusted according to the need by having
details of the x-axis of a histogram.
5. It is used for image equalization. Gray level intensities are expanded along the x-
axis to produce a high contrast image.
6. Histograms are used in thresholding as it improves the appearance of the image.
7. If we have input and output histogram of an image, we can determine which type
of transformation is applied in the algorithm.
Histogram Stretching
In histogram stretching, the contrast of an image is increased. The contrast of an image is defined by the difference between the maximum and minimum values of pixel intensity.
If we want to increase the contrast of an image, the histogram of that image is stretched so that it fully covers the available dynamic range.
From the histogram of an image, we can check whether the image has low or high contrast.
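A minimal min-max stretching sketch, assuming an 8-bit grayscale image that contains at least two distinct grey levels, is shown below.

import numpy as np

def histogram_stretch(img):
    # Linearly map the grey levels so the histogram covers the full 0-255 range
    lo, hi = float(img.min()), float(img.max())          # assumes hi > lo
    out = (img.astype(float) - lo) / (hi - lo) * 255.0
    return np.round(out).astype(np.uint8)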
Histogram Equalization
Histogram equalization is used for equalizing all the pixel values of an image. The transformation is done in such a way that a uniform, flattened histogram is produced.
Histogram equalization increases the dynamic range of pixel values and tends towards an equal count of pixels at each level, which produces an approximately flat histogram and a high-contrast image.
While stretching a histogram, its shape remains the same, whereas in histogram equalization the shape of the histogram changes, and it generates only one unique output image.
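A minimal sketch of histogram equalization is given below, both with OpenCV's built-in equalizeHist() and with the underlying cumulative-distribution mapping; the input file name is a placeholder assumption.

import cv2
import numpy as np

img = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)

# Built-in OpenCV routine
equalized = cv2.equalizeHist(img)

# The same idea written out: map each level through the cumulative distribution
hist, _ = np.histogram(img.ravel(), bins=256, range=(0, 256))
cdf = hist.cumsum() / img.size                # running sum of P(rk)
lut = np.round(255 * cdf).astype(np.uint8)    # s_k = 255 * sum of P(rj) for j <= k
equalized_manual = lut[img]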
Discrete Fourier Transform (10 mark)
Working with the Fourier transform on a computer usually involves a form of the
transform known as the discrete Fourier transform (DFT). A discrete transform is a
transform whose input and output values are discrete samples, making it convenient
for computer manipulation. There are two principal reasons for using this form of the
transform:
The input and output of the DFT are both discrete, which makes it convenient
for computer manipulations.
There is a fast algorithm for computing the DFT known as the fast Fourier
transform (FFT).
The DFT is usually defined for a discrete function f(x, y) that is nonzero only over the finite region 0 ≤ x ≤ M−1 and 0 ≤ y ≤ N−1. The two-dimensional M-by-N DFT and inverse M-by-N DFT relationships are given by
F(u, v) = Σ_{x=0}^{M−1} Σ_{y=0}^{N−1} f(x, y) e^{−j2π(ux/M + vy/N)},  for u = 0, …, M−1 and v = 0, …, N−1
f(x, y) = (1/(MN)) Σ_{u=0}^{M−1} Σ_{v=0}^{N−1} F(u, v) e^{j2π(ux/M + vy/N)},  for x = 0, …, M−1 and y = 0, …, N−1
Example
1. Construct a 30-by-30 matrix f to work with (as an assumption here, a small white rectangle on a black background):
f = zeros(30,30);
f(5:24,13:17) = 1;    % assumed test image; any 30-by-30 matrix will do
2. Compute and visualize the 30-by-30 DFT of f with these commands:
F = fft2(f);
F2 = log(abs(F));
imshow(F2,[-1 5],'notruesize'); colormap(jet); colorbar
Entropy
When we observe the possibilities of the occurrence of an event, and how surprising or uncertain it would be, we are trying to get an idea of the average information content coming from the source of the event.
Entropy can be defined as a measure of the average information content per source symbol. Claude Shannon, the “father of Information Theory”, provided a formula for it as follows:
H = − Σ_i p_i log_b(p_i)
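As an illustration, the entropy of the grey-level distribution of an 8-bit image can be computed with a sketch like this (base-2 logarithm, giving bits per pixel); the image array img is assumed to be 8-bit grayscale.

import numpy as np

def image_entropy(img):
    hist, _ = np.histogram(img.ravel(), bins=256, range=(0, 256))
    p = hist / hist.sum()               # probability p_i of each grey level
    p = p[p > 0]                        # skip levels that never occur (log 0 undefined)
    return -np.sum(p * np.log2(p))      # H = -sum_i p_i log2(p_i)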
Mutual Information
Let us consider a channel whose output is Y and input is X.
Let the prior uncertainty about the input X be H(X) (this is assumed before the input is applied).
To know the uncertainty about the input that remains after the output is observed, let us consider the conditional entropy, given that Y = y_k:
H(X | Y = y_k) = Σ_{j=0}^{J−1} p(x_j | y_k) log2[1 / p(x_j | y_k)]
Averaging this over all output symbols y_k gives the conditional entropy
H(X | Y) = Σ_{k=0}^{K−1} Σ_{j=0}^{J−1} p(x_j | y_k) p(y_k) log2[1 / p(x_j | y_k)]
= Σ_{k=0}^{K−1} Σ_{j=0}^{J−1} p(x_j, y_k) log2[1 / p(x_j | y_k)]
Now, considering both uncertainty conditions (before and after applying the input), we come to know that the difference, i.e. H(X) − H(X|Y), must represent the uncertainty about the channel input that is resolved by observing the channel output. This is called the Mutual Information of the channel.
Denoting the Mutual Information as I(X; Y), we can write the whole thing in an equation, as follows:
I(X; Y) = H(X) − H(X|Y)
Hence, this is the equational representation of Mutual Information.
The joint entropy of X and Y is given by
H(X, Y) = Σ_{j=0}^{J−1} Σ_{k=0}^{K−1} p(x_j, y_k) log2[1 / p(x_j, y_k)]
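A minimal numeric sketch of these quantities, using an assumed joint distribution p(x_j, y_k), is shown below.

import numpy as np

p_xy = np.array([[0.25, 0.10],       # assumed joint probabilities p(x_j, y_k)
                 [0.05, 0.60]])

p_x = p_xy.sum(axis=1)                          # marginal p(x_j)
p_y = p_xy.sum(axis=0)                          # marginal p(y_k)

H_x   = -np.sum(p_x * np.log2(p_x))             # prior uncertainty H(X)
H_x_y = -np.sum(p_xy * np.log2(p_xy / p_y))     # conditional entropy H(X | Y)
H_xy  = -np.sum(p_xy * np.log2(p_xy))           # joint entropy H(X, Y)

print("I(X;Y) =", H_x - H_x_y)                  # uncertainty resolved by observing Y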