Module 5
Image Segmentation
Introduction
Segmentation is the process of partitioning a digital image into multiple regions and extracting a meaningful region known as the region of interest (ROI). Regions of interest vary with applications. For example, if the goal of a doctor is to analyse a tumour in a computed tomography (CT) image, then the tumour in the image is the ROI.
If the image application aims to recognize the iris in an eye image, then the iris in the eye image
is the required ROI. Segmentation of ROI in real-world images is the first major hurdle for
effective implementation of image processing applications as the segmentation process is often
difficult. Hence, the success or failure of the extraction of ROI ultimately influences the success
of image processing applications. No single universal segmentation algorithm exists for
segmenting the ROI in all images. Therefore, the user has to try many segmentation algorithms and pick the one that performs best for the given requirement.
Image segmentation algorithms are based on either the discontinuity principle or the similarity principle.
1) Discontinuity Principle
The idea behind the discontinuity principle is to extract regions that differ in properties such as intensity, colour, texture, or any other image statistic. Most often, abrupt changes in intensity between regions result in the extraction of edges.
2) Similarity Principle
The idea behind the similarity principle is to group pixels based on a common property in order to extract a coherent region.
An image can be partitioned into many regions R1, R2, R3, ..., Rn. For example, the image R in Figure 5.1(a) is divided into three subregions R1, R2 and R3, as shown in Figures 5.1(b) and
Figure 5.1(c). A subregion or sub-image is a portion of the whole region R. The identified
subregions should exhibit characteristics such as uniformity and homogeneity with respect to
colour, texture, intensity, or any other statistical property. In addition, the boundaries that separate
the regions should be simple and clear.
20 20 10 10 10
20 20 10 10 10
20 20 10 10 10
15 15 10 10 10
15 15 10 10 10

Figure 5.1: Image Segmentation (a) Original image with the pixel values shown above (b) Pixels that form a region (c) Image with three regions R1, R2 and R3
1) If the subregions are combined, the original region can be obtained. Mathematically, it can be stated that ⋃ Ri = R for i = 1, 2, ..., n. For example, if the three regions R1, R2 and R3 of Figure 5.1(c) are combined, the whole region R is obtained.
2) The subregions Ri should be connected. In other words, the region cannot be open ended during
the tracing process.
3) The regions R1, R2, ..., Rn are disjoint, that is, they do not share any common pixels. Mathematically, it can be stated as Ri ∩ Rj = ∅ for all i and j, where i ≠ j. Otherwise, there is no justification for the regions to exist separately.
4) Each region satisfies a predicate or a set of predicates based on intensity or other image statistics; that is, the predicate P can be colour, grey scale value, texture, or any other image statistic. Mathematically, this is stated as P(Ri) = TRUE.
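As an illustration of these conditions, the following minimal Python sketch (assuming NumPy; the array R reproduces the 5 × 5 example of Figure 5.1, and grouping by grey value stands in for the predicate P) groups pixels by grey value and checks that the resulting subregions cover the image without overlapping.

import numpy as np

# 5 x 5 example image of Figure 5.1(a)
R = np.array([[20, 20, 10, 10, 10],
              [20, 20, 10, 10, 10],
              [20, 20, 10, 10, 10],
              [15, 15, 10, 10, 10],
              [15, 15, 10, 10, 10]])

# Group pixels by a common property (here, equal grey value).
# Each boolean mask is one subregion Ri.
regions = {v: (R == v) for v in np.unique(R)}

# Union of the subregions gives back R, and no pixel belongs to two regions.
coverage = sum(mask.astype(int) for mask in regions.values())
assert np.all(coverage == 1)

for value, mask in regions.items():
    print("Region with grey value", value, "has", mask.sum(), "pixels")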
There are different ways of classifying the segmentation algorithms. Figure 5.2 illustrates these
ways. One way is to classify the algorithms based on user interaction required for extracting the
ROI. Another way is to classify them based on the pixel relationships.
Based on user interaction, the segmentation algorithms can be classified into the following three
categories.
1. Manual
2. Automatic
3. Semi-automatic
1. Manual
In the manual method, the object of interest is observed by an expert, who traces its ROI boundaries with the help of software. Hence, the decisions related to segmentation are made by the human observers. Many software systems assist experts in tracing the boundaries and extracting them. By using these software systems, the experts outline the object.
The outline can be either an open or closed contour. Some software systems provide additional
help by connecting the open tracings automatically to give a closed region. These closed outlines
are then converted into a series of control points. These control points are then connected by a spline.
The advantage of the control points is that even if there is a displacement, the software system
ensures that they are always connected. Finally, the software provides help to the user in extracting
the closed regions.
Boundary tracing is a subjective process and hence variations exist among opinions of different
experts in the field, leading to problems in reproducing the same results. In addition, a manual
method of extraction is time consuming, highly subjective, prone to human error, and has poor
intra-observer reproducibility.
2. Automatic
Automatic segmentation algorithms are a preferred choice as they segment the structure of objects
without any human intervention. Automatic segmentation is preferred if the task needs to be carried out for a large number of images.
3. Semi-Automatic
Region growing techniques are semi-automatic algorithms where the initial seeds are given by
the human observer in the region that needs to be segmented. The rest of the process is carried out automatically by the program.
These algorithms can be called assisted manual segmentation algorithms.
Segmentation algorithms can also be classified based on the similarity relationships of a pixel with its neighbouring pixels. The similarity relationships can be based on colour, texture, brightness, or any other image statistic. On this basis, the algorithms are classified as contextual and non-contextual.
1. Contextual Algorithms
Contextual algorithms group pixels together based on common properties by exploiting the
relationships that exist among the pixels. These are also known as region-based or global
algorithms. In region-based algorithms, the pixels are grouped based on some sort of similarity
that exists between them.
2. Non-Contextual Algorithms
Non-contextual algorithms are also known as pixel-based or local algorithms. These algorithms
ignore the relationship that exists between the pixels or features. The idea is to identify the
discontinuities that are present in the image such as isolated lines and edges. These are then simply
grouped into a region based on some global-level property. Intensity-based thresholding is a
good example of this method.
Detection of Discontinuities
The three basic types of grey level discontinuities in a digital image are the following:
1. Points
2. Lines
3. Edges
1. Point Detection
An isolated point is a point whose grey level is significantly different from its background in a homogeneous area. Point detection uses a 3 × 3 mask whose coefficients are denoted z1 to z9:

z1 z2 z3
z4 z5 z6
z7 z8 z9
The mask is superimposed onto the image and the convolution process is applied. The response of the mask is given as

R = ∑_{k=1}^{9} z_k f_k

where the f_k values are the grey level values of the pixels covered by the mask. A threshold value T is used to identify the points. A point is said to be detected at the location on which the mask is centered if |R| ≥ T, where T is a non-negative integer. The coefficients of a point detection mask are shown in Figure 5.4.
1 1 1
1 -8 1
1 1 1
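A small sketch of point detection with this mask, assuming NumPy and SciPy are available (the test image and the threshold value are invented for the demonstration):

import numpy as np
from scipy.ndimage import convolve

# Point-detection mask of Figure 5.4; its coefficients sum to zero.
mask = np.array([[ 1,  1,  1],
                 [ 1, -8,  1],
                 [ 1,  1,  1]])

def detect_points(image, T):
    # A point is detected where the absolute mask response |R| >= T.
    R = convolve(image.astype(float), mask, mode='nearest')
    return np.abs(R) >= T

# A single bright pixel in a flat background is reported as an isolated point.
img = np.full((7, 7), 10.0)
img[3, 3] = 200.0
print(np.argwhere(detect_points(img, T=500)))   # -> [[3 3]]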
2. Line Detection
In line detection, four types of masks are used to get the responses R1, R2, R3 and R4 for the horizontal, vertical, +45° and −45° directions, respectively. The masks are shown in Figure 5.5(a). These masks are applied to the image. The response of mask i is given as

R_i = ∑_{k=1}^{9} z_k f_k,  i = 1, 2, 3, 4

R1 is the response for moving the horizontal mask from the left to the right of the image, R2 is the response for moving the vertical mask from the top to the bottom of the image, R3 is the response of the mask along the +45° line, and R4 is the response of the mask with respect to a line of −45°. Suppose at a certain line in the image |R_i| > |R_j| for all j ≠ i; then that line is more likely to be associated with the orientation of mask i. The final maximum response is defined by

R = max(|R1|, |R2|, |R3|, |R4|)
M1 = [−1 −1 −1; 2 2 2; −1 −1 −1]   (horizontal)

M2 = [−1 2 −1; −1 2 −1; −1 2 −1]   (vertical)

M3 = [−1 −1 2; −1 2 −1; 2 −1 −1]   (+45°)

M4 = [2 −1 −1; −1 2 −1; −1 −1 2]   (−45°)

Figure 5.5: Line detection (a) Masks for line detection
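A possible implementation of this scheme, assuming NumPy and SciPy (the test image is an invented example), convolves the image with all four masks and keeps the maximum absolute response per pixel:

import numpy as np
from scipy.ndimage import convolve

# The four line-detection masks of Figure 5.5(a).
masks = {
    'horizontal': np.array([[-1, -1, -1], [ 2,  2,  2], [-1, -1, -1]]),
    'vertical':   np.array([[-1,  2, -1], [-1,  2, -1], [-1,  2, -1]]),
    '+45':        np.array([[-1, -1,  2], [-1,  2, -1], [ 2, -1, -1]]),
    '-45':        np.array([[ 2, -1, -1], [-1,  2, -1], [-1, -1,  2]]),
}

def detect_lines(image):
    # Response of every mask; keep the maximum |R_i| and the winning mask index.
    resp = np.stack([np.abs(convolve(image.astype(float), m, mode='nearest'))
                     for m in masks.values()])
    return resp.max(axis=0), resp.argmax(axis=0)

img = np.zeros((7, 7))
img[:, 3] = 100                                   # a vertical line
magnitude, best = detect_lines(img)
print(list(masks)[best[3, 3]], magnitude[3, 3])   # -> vertical 600.0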
Edge Detection
Edges play a very important role in many image processing applications. They provide an outline
of the object. In the physical plane, edges correspond to the discontinuities in the depth, surface
orientation, change in material properties, and light variations. These variations are present in the
image as grey scale discontinuities. An edge is a set of connected pixels that lies on the
boundary between two regions that differ in grey value. The pixels on an edge are called edge
points. A reasonable definition of an edge requires the ability to measure grey level transitions in a meaningful manner. Most edges are unique in space, that is, their position and orientation remain
the same in space when viewed from different points. When an edge is detected, the unnecessary
details are removed, while only the important structural information is retained. In short, an edge
is a local concept which represents only significant intensity transitions.
An original image and its edges are shown in Figure 5.6 (a) and 5.6 (b) respectively.
Figure 5.6 : Edge Detection (a) Original Image (b) Extracted edges
An edge is typically extracted by computing the derivative of the image function. This consists of two parts: the magnitude of the derivative, which indicates the strength of the edge, and the direction of the derivative, which indicates the orientation of the edge.
Some of the edges that are normally encountered in image processing are as follows:
1. Step Edge
2. Ramp Edge
3. Spike Edge
4. Roof Edge
Step edge is an abrupt intensity change. Ramp edge represents a gradual change in intensity. Spike edge represents a quick change that immediately returns to the original intensity level. Roof edge represents a change that is not instantaneous but occurs over a short distance.
The stages involved in edge detection are as follows:
1) Filtering
2) Differentiation
3) Localization
1. Filtering
It is better to filter the input image to get maximum performance for the edge detectors. This
stage may be performed either explicitly or implicitly. It involves smoothing, where the noise is
suppressed without affecting the true edges. In addition, this phase uses a filter to enhance the
quality of the edges in the image. Normally, Gaussian filters are used as they are proven to be very
effective for real-time images.
2. Differentiation
This phase distinguishes the edge pixels from other pixels. The idea of edge detection is to find
the difference between two neighbourhood pixels. If the pixels have the same value, the difference
is zero. This means that there is no transition between the pixels. A non-zero difference indicates
the presence of an edge point. A point is defined as an edge point (or edge) if its first derivative is
greater than the user-specified threshold and encounters a sign change (zero crossing) in the second
derivative.
∂f/∂x = lim(Δx→0) [f(x) − f(x − Δx)] / Δx
Images are discrete. Hence, Δx must be discrete and is at least 1. Therefore, the discrete derivative ∂f/∂x is equal to f(x) − f(x−1). If the intensities are the same, the derivative is 0. In the case of second derivatives, the zero crossings indicate the presence of edges.
Example 5.1:
Consider a one dimensional image f(x)=60 60 60 100 100 100 . What are the first and second
derivatives?
Solution:
The first derivative is f(x+1)- f(x). Therefore, the first derivative of the function is given as
0 0 40 0 0
The number 40 is due to the difference (100 − 60 = 40); the remaining values are all zeros. The second derivative is the difference of successive values of the first derivative. This is given as
0 40 -40 0
The number 40 is due to difference (40-0) and -40 is due to the difference (0-40). Remaining values
are all zeros.
The highest magnitude 40 shows the presence of an edge in the first derivative. It can be observed
that the sign change in the second derivative represents the edge. This sign change is important
and is called a zero crossing.
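The derivatives of Example 5.1 can be reproduced with a few lines of NumPy (a simple verification sketch under that assumption):

import numpy as np

f = np.array([60, 60, 60, 100, 100, 100], dtype=float)

first = np.diff(f)         # f(x+1) - f(x)                     -> [ 0.  0. 40.  0.  0.]
second = np.diff(first)    # differences of the first derivative -> [ 0. 40. -40.  0.]
print(first, second)

# The edge lies at the zero crossing, i.e. where the second derivative changes sign.
zero_crossing = np.where(np.sign(second[:-1]) * np.sign(second[1:]) < 0)[0]
print(zero_crossing)       # -> [1], between the +40 and -40 entries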
Images are two dimensional. Hence, the gradient vector of f(x,y) is also two dimensional. The gradient of an image f(x,y) at location (x,y) is a vector that consists of the partial derivatives of f(x,y), as follows:
∇f(x,y) = [∂f(x,y)/∂x, ∂f(x,y)/∂y] = [gx, gy]

where gx = ∂f(x,y)/∂x and gy = ∂f(x,y)/∂y. The magnitude of the gradient is approximated as

∇f(x,y) ≈ |gx| + |gy|

and the direction of the gradient is

θ = tan⁻¹(gy / gx)
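These quantities can be computed directly from pixel differences. The sketch below (an illustration assuming NumPy; the step-edge test image is invented) computes gx, gy, the magnitude approximation |gx| + |gy| and the direction θ:

import numpy as np

def gradient(image):
    gx = np.zeros_like(image, dtype=float)
    gy = np.zeros_like(image, dtype=float)
    gx[:, :-1] = np.diff(image, axis=1)     # f(x+1, y) - f(x, y)
    gy[:-1, :] = np.diff(image, axis=0)     # f(x, y+1) - f(x, y)
    magnitude = np.abs(gx) + np.abs(gy)     # |gx| + |gy| approximation
    direction = np.arctan2(gy, gx)          # theta = tan^-1(gy / gx)
    return magnitude, direction

img = np.tile([0.0, 0, 0, 50, 50, 50], (6, 1))   # a vertical step edge
mag, theta = gradient(img)
print(mag[3, 2], np.degrees(theta[3, 2]))        # -> 50.0 0.0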
3. Localization
In this stage, the detected edges are localized. The localization process involves determining the
exact location of the edge. In addition, this stage involves edge thinning and edge linking steps to
ensure that the edge is sharp and connected. The sharp and connected edges are then displayed.
The prerequisite for the localization stage is normalization of the gradient magnitude. The calculated gradient can be scaled to a specific range, say 0 to K, by performing this operation. For example, the value of the constant K may be an integer, say 100. N(x,y) is called the normalized edge image and is given as

N(x,y) = [G(x,y) / max_{i=1,...,n; j=1,...,n} G(i,j)] × K
The normalized magnitude can be compared with a threshold value T to generate the edge map.
The edge map is then displayed or stored for further image processing operations.
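A compact sketch of this normalization and thresholding step (illustrative only; the constant K = 100 follows the text, while the threshold T is an example value):

import numpy as np

def normalized_edge_map(G, K=100, T=30):
    # Scale the gradient magnitude G to the range 0..K, then threshold.
    N = G / G.max() * K          # N(x, y) = G(x, y) / max G(i, j) * K
    return N, N >= T             # normalized image and binary edge map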
The edge detection process is implemented in all kinds of edge detectors. In image processing, four types of edge detection operators are available. They are
1. Derivative filters
2. Template matching filters
3. Gaussian derivative filters
4. Pattern fit filters
Derivative filters use the differentiation technique to detect the edges. Template matching filters use templates that resemble the target shapes and match them with the image. Gradient operations are
isotropic in nature as they detect edges in all directions. Hence, template matching filters are used
to perform directional smoothing as they are very sensitive to directions. If there is a match
between the target shape or directions and the masks, then a maximum gradient value is produced.
By rotating the template in all eight directions, masks that are sensitive in all directions, called
compass masks are produced. Point detection and line detection masks are good examples of
template matching filters. Gaussian derivatives are very effective for real-time images and are used
along with the derivative filters. The pattern fit filter is another approach, where the image is considered as a topographic surface, with the pixel value representing altitude. The aim is to fit a pattern (surface) over a neighbourhood of a pixel, from which the edge strength is calculated. The properties of the edge points are calculated based on the fitted parameters.
The gradient operator is defined as

∇ = [∂/∂x, ∂/∂y]

Applying this to the image f, one gets

∇f = [∂f/∂x, ∂f/∂y]
The differences between the pixels are quantified by the gradient magnitude. The direction of the
greatest change is given by the gradient vector, which gives the direction of the edge. Since the gradient is defined for continuous functions, discrete versions of the derivatives must be used for images. This is done by computing differences. The approaches in 1D are as follows, where Δx is the movement in the x direction:
Backward difference = [f(x) − f(x − Δx)] / Δx

Forward difference = [f(x + Δx) − f(x)] / Δx

Central difference = [f(x + Δx) − f(x − Δx)] / (2Δx)
These differences can be obtained by applying the corresponding one-dimensional masks to the image, assuming Δx = 1. The two gradient components are gx = ∂f/∂x and gy = ∂f/∂y.
Let f(x,y) and f(x+1,y) be neighbouring pixels. The difference between the adjacent pixels is obtained by applying the mask [1 −1] directly to the image. This is defined mathematically as

∂f/∂x = f(x+1, y) − f(x, y)
Roberts Operator
Roberts kernels are derivatives with respect to the diagonal elements. Hence, they are called cross-gradient operators. They are based on the cross-diagonal differences. The approximation of the Roberts operator can be given mathematically as

gx = ∂f/∂x = (z9 − z5) and gy = ∂f/∂y = (z8 − z6)

The corresponding 2 × 2 masks are

gx = [−1 0; 0 1] and gy = [0 −1; 1 0]
The magnitude of this vector can be calculated as

∇f(x,y) = √(gx² + gy²)

Since the magnitude calculation involves a square root operation, the common practice is to approximate the gradient with absolute values, which are simpler to implement:

∇f(x,y) ≈ |gx| + |gy|
The generic gradient-based algorithm proceeds as described below.
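A minimal sketch of such a gradient-based detector, assuming NumPy and SciPy and using the Roberts masks defined above (the threshold T is an illustrative parameter):

import numpy as np
from scipy.ndimage import convolve

# Roberts cross-gradient masks
gx_mask = np.array([[-1, 0], [0, 1]])
gy_mask = np.array([[ 0, -1], [1, 0]])

def gradient_edge_detector(image, T):
    # 1. Convolve the image with the two orthogonal masks.
    f = image.astype(float)
    gx = convolve(f, gx_mask, mode='nearest')
    gy = convolve(f, gy_mask, mode='nearest')
    # 2. Combine the responses into a gradient magnitude.
    magnitude = np.abs(gx) + np.abs(gy)
    # 3. Threshold the magnitude to obtain the edge map.
    return magnitude >= T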
Prewitt Operator
The Prewitt method takes the central difference of the neighbouring pixels. This difference can be represented mathematically as

∂f/∂x = [f(x+1, y) − f(x−1, y)] / 2
The central difference can be obtained using the mask [-1 0 +1]. This method is very sensitive to
noise. Hence, to reduce the effect of noise, the Prewitt method performs some averaging. The resulting approximation, using a 3 × 3 mask, is known as the Prewitt operator. Its masks are as follows:
Mx = [−1 −1 −1; 0 0 0; 1 1 1] and My = [−1 0 1; −1 0 1; −1 0 1]
Sobel Operator
The Sobel operator also relies on central differences. It can be viewed as an approximation of the first derivative of a Gaussian; applying the 3 × 3 Sobel mask is equivalent to taking the first derivative of a Gaussian-blurred image. Since convolution is both commutative and associative,

∂/∂x (f * G) = f * (∂G/∂x)

The Sobel masks are
Mx = [−1 −2 −1; 0 0 0; 1 2 1] and My = [−1 0 1; −2 0 2; −1 0 1]
Additional masks can be used to detect the edges in the diagonal directions:

Mx = [0 1 2; −1 0 1; −2 −1 0] and My = [−2 −1 0; −1 0 1; 0 1 2]
The edge masks can be extended to 5 × 5, 7 × 7, etc. An extended mask generally gives better performance. An original image and the results of applying the Roberts, Sobel and Prewitt masks are shown in Figures 5.10(a)-5.10(d). It can be observed that the results of the masks vary.
Figure 5.10: Edge detection using first-order operators (a) Original image (b) Roberts edge detection (c) Prewitt edge detection (d) Sobel edge detection
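In practice the Sobel masks are usually applied through a library routine; the sketch below (assuming SciPy, whose scipy.ndimage.sobel applies the same 3 × 3 kernels, and using an illustrative threshold) computes a Sobel edge map.

import numpy as np
from scipy.ndimage import sobel

def sobel_edges(image, T):
    f = image.astype(float)
    gy = sobel(f, axis=0)            # derivative along the rows (vertical change)
    gx = sobel(f, axis=1)            # derivative along the columns (horizontal change)
    magnitude = np.hypot(gx, gy)     # sqrt(gx**2 + gy**2)
    return magnitude >= T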
Gradient masks are isotropic and insensitive to directions. Sometimes it is necessary to design
direction sensitive filters. Such filters are called template matching filters. Some template
matching masks are
1. Kirsch Masks
2. Robinson compass mask
3. Frei-Chen Masks
1) Kirsch Masks
Kirsch masks are called compass masks because they are obtained by taking one mask and rotating
it to the eight major directions: north, north-west, west, south-west, south, south-east, east and north-east. For example, the mask for the north direction is

N = [5 5 5; −3 0 −3; −3 −3 −3]

and the masks for the remaining seven directions are obtained by rotating these coefficients around the centre of the mask.
Each mask is applied to the image and the convolution process is carried out. The magnitude of
the final edge is the maximum of the responses of all eight masks. The edge direction is the direction associated with the mask that produces the maximum magnitude.
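One way to implement this is sketched below (assuming NumPy and SciPy; the eight masks are generated by rotating the border coefficients of the north mask, which is an implementation choice rather than part of the original text):

import numpy as np
from scipy.ndimage import convolve

# Border positions of a 3x3 mask, listed clockwise from the top-left corner.
ring = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
north = [5, 5, 5, -3, -3, -3, -3, -3]      # 5s along the top row, centre stays 0

def kirsch_masks():
    masks = []
    for r in range(8):                     # one rotation per compass direction
        m = np.zeros((3, 3))
        for (i, j), v in zip(ring, np.roll(north, r)):
            m[i, j] = v
        masks.append(m)
    return masks

def kirsch_edges(image):
    # Edge magnitude = maximum response over the eight compass masks;
    # edge direction = index of the mask that produced it.
    responses = np.stack([convolve(image.astype(float), m, mode='nearest')
                          for m in kirsch_masks()])
    return responses.max(axis=0), responses.argmax(axis=0)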
2) Robinson Compass Masks
The Robinson edge operator also uses compass masks. The base mask is the Sobel-like kernel

[−1 0 1; −2 0 2; −1 0 1]

and the spatial masks for the remaining directions are obtained by rotating its coefficients around the centre of the mask.
3) Frei-Chen Masks
Any 3 × 3 image block can be considered as the weighted sum of the nine Frei-Chen masks. The weights are obtained by a process called projection, performed by overlaying the 3 × 3 block onto each mask and summing the products of the coincident terms. The first four masks represent the edge subspace, the next four represent the line subspace, and the last one represents the average subspace. Figures 5.11(a)-5.11(d) show an original image and the results obtained by using the Kirsch, Robinson compass, and Frei-Chen masks respectively.
Figure 5.11: Template matching masks (a) Original image (b) Image obtained using Kirsch mask
Figure 5.11 : (c) Image obtained using Robinson compass mask (d) Image obtained using
Frei-Chen mask
Edges are considered to be present in the first derivative when the edge magnitude is large compared to the threshold value. In the case of the second derivative, an edge pixel is present at the location where the second derivative is zero. This is equivalent to saying that f″(x) has a zero crossing, which can be observed as a sign change in the pixel differences. The Laplacian algorithm is one such zero-crossing algorithm.
The problem with Laplacian masks is that they are sensitive to noise as there is no magnitude
checking - even a small ripple causes the method to generate an edge point. Therefore, it is
necessary to filter the image before the edge detection process is applied. This method produces
two-pixel thick edges, although generally, one-pixel thick edges are preferred. However, the
advantage is that there is no need for the edge thinning process as the zero-crossings themselves
specify the location of the edge points. The main advantage is that these operators are rotationally
invariant.
The ∇² operator is called the Laplacian operator. The Laplacian of the 2D function f(x,y) is defined as

∇²f = ∂²f/∂x² + ∂²f/∂y²

Since the gradient is a vector, two orthogonal filters are required. However, since the Laplacian operator is a scalar, a single mask is sufficient for the edge detection process. The discrete Laplacian estimate is given as

∂²f/∂x² = f(x+1, y) + f(x−1, y) − 2f(x, y)

Similarly,

∂²f/∂y² = f(x, y+1) + f(x, y−1) − 2f(x, y)

so that

∇²f = f(x+1, y) + f(x−1, y) + f(x, y+1) + f(x, y−1) − 4f(x, y)
Laplacian masks are shown in Figures 5.12(a)-5.12(d). The mask shown in Figure 5.12(a) is
sensitive to horizontal and vertical edges. It can be observed that the sum of the elements amounts
to zero. To recognize the diagonal edges, the mask shown in Figure 5.12 (b) is used. This mask is
obtained by rotating the mask of Figure 5.12(a) by 45°. The addition of these two kernels results in a variant of the Laplacian mask shown in Figure 5.12(c). Subtracting twice the mask of Figure 5.12(a) from the mask shown in Figure 5.12(b) yields another variant mask, as shown in Figure 5.12(d).
Figure 5.12: Different Laplacian masks (a) Laplacian filter (b) 45° rotated mask (c) Variant 1 (d) Variant 2
The Laplacian operations are seldom used in practice because they produce double edges and are
extremely sensitive to noise. However, the idea of zero crossing is useful if it is combined with a smoothing operation to minimize the sensitivity to noise. The result of applying the Laplacian method on Figure 5.13(a) is shown in Figure 5.13(b).
Figure 5.13: Laplacian method (a) Original image (b) Result of applying Laplacian mask
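A sketch of Laplacian filtering with zero-crossing detection, assuming NumPy and SciPy and the 4-neighbour Laplacian mask of Figure 5.12(a):

import numpy as np
from scipy.ndimage import convolve

# 4-neighbour Laplacian mask: f(x+1,y) + f(x-1,y) + f(x,y+1) + f(x,y-1) - 4 f(x,y)
laplacian_mask = np.array([[ 0,  1,  0],
                           [ 1, -4,  1],
                           [ 0,  1,  0]])

def laplacian_zero_crossings(image):
    L = convolve(image.astype(float), laplacian_mask, mode='nearest')
    edges = np.zeros(image.shape, dtype=bool)
    # Mark a pixel when the Laplacian response changes sign towards a neighbour.
    edges[:, :-1] |= np.sign(L[:, :-1]) * np.sign(L[:, 1:]) < 0
    edges[:-1, :] |= np.sign(L[:-1, :]) * np.sign(L[1:, :]) < 0
    return edges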
To minimize the noise susceptibility of the Laplacian operator, the Laplacian of Gaussian(LoG)
operator is often preferred. As a first step, the given image is blurred using the Gaussian operator
and then the Laplacian operator is used. The Gaussian function reduces the noise, and hence the Laplacian detects fewer false edges.
To suppress the noise, the image is convolved with the Gaussian smoothing function before using
the Laplacian for edge detection.
Convolving the image with the Gaussian and then applying the Laplacian is equivalent to convolving the image with a single Laplacian of Gaussian kernel, since ∇²(G * f) = (∇²G) * f. As σ increases, wider convolution masks are required for better performance of the edge operator.
A sample image and its result after application of the LoG operator are shown in Figures 5.14(a) and 5.14(b), respectively.
Figure 5.14 : Laplacian of Gaussian operator (a) Original image (b) Result of applying LoG
operator
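SciPy provides the combined smoothing-and-Laplacian operation directly as scipy.ndimage.gaussian_laplace; a minimal LoG sketch based on it (the contrast threshold is an illustrative choice, not part of the text):

import numpy as np
from scipy.ndimage import gaussian_laplace

def log_edges(image, sigma=1.0, contrast=1.0):
    # Smooth with a Gaussian of width sigma and apply the Laplacian in one step.
    L = gaussian_laplace(image.astype(float), sigma=sigma)
    edges = np.zeros(image.shape, dtype=bool)
    # Keep zero crossings whose local contrast exceeds the threshold.
    edges[:, :-1] |= (np.sign(L[:, :-1]) != np.sign(L[:, 1:])) & \
                     (np.abs(L[:, :-1] - L[:, 1:]) > contrast)
    edges[:-1, :] |= (np.sign(L[:-1, :]) != np.sign(L[1:, :])) & \
                     (np.abs(L[:-1, :] - L[1:, :]) > contrast)
    return edges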
Combined Detection
The information obtained from different orthogonal operators can be combined to get better results.
One method is to combine the first and second-order derivatives. This can be achieved by
implementing the idea of scale space. Gaussian kernels with various sigma values can be used to capture information about the image at different scales, and this information can be combined to get the edge map.
The LoG filter can be approximated by taking two differently sized Gaussians. The Difference of Gaussians (DoG) filter is expressed as the difference between these two Gaussian kernels:

DoG(x, y) = G_σ1(x, y) − G_σ2(x, y)

So the given image has to be convolved with a mask that is obtained by subtracting two Gaussian masks with two different σ values. If the ratio σ1/σ2 is between 1 and 2, the edge detection operator yields good performance.
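A minimal DoG sketch, assuming SciPy's gaussian_filter (the sigma values are illustrative, chosen so that their ratio lies between 1 and 2):

import numpy as np
from scipy.ndimage import gaussian_filter

def difference_of_gaussians(image, sigma1=1.0, sigma2=1.6):
    # Blur the image with two Gaussians of different widths and subtract;
    # the result approximates the LoG response.
    f = image.astype(float)
    return gaussian_filter(f, sigma1) - gaussian_filter(f, sigma2)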
Canny Edge Detection
The Canny approach is based on optimizing the trade-off among the following performance criteria:
1) Good Edge Detection – The algorithm should detect only the real edge points and
discard all false edge points.
2) Good Edge Localization - The algorithm should have the ability to produce edge points
that are closer to the real edges.
3) Only one response to each edge – The algorithm should not produce any false, double
or spurious edges.
The steps of the Canny algorithm are as follows:
1) First convolve the image with a Gaussian filter. Compute the gradient of the resultant smoothed image. Store the edge magnitude and edge orientation separately in two arrays M(x,y) and θ(x,y), respectively.
2) The next step is to thin the edges. This is done using a process called non-maxima
suppression. Examining every edge point orientation is a computationally intensive task.
To avoid such intense computations, the gradient direction is reduced to just four sectors.
The range of 0°-360° is divided into eight equal portions, and two opposite portions are designated as one sector. Therefore there will be four sectors. The gradient direction of the edge point is first approximated to one of these sectors. After the sector is finalized, consider a point (x, y) with magnitude M(x, y). The edge magnitudes M(x1, y1) and M(x2, y2) of the two neighbouring pixels that fall along the same gradient direction are considered. If the magnitude M(x, y) is less than the magnitude at (x1, y1) or (x2, y2), then the value is suppressed, that is, it is set to zero; otherwise the value is retained.
3) Apply hysteresis thresholding. The idea behind hysteresis thresholding is that only a large
amount of change in the gradient magnitude matters in edge detection and small changes
do not affect the quality of edge detection. This method uses two thresholds, t0
and t1. If the gradient magnitude is greater than the value t1, it is considered as a definite
edge point and is accepted. If the gradient magnitude is less than t0, the point is discarded. If the edge gradient is between t0 and t1, it is considered as either weak or strong based on the context. This is implemented by creating two images
using two thresholds t0 and t1. Low threshold creates a situation where noisier edge points
are accepted. A high value of the threshold removes many potential edge points. So this
process first thresholds the image with low and high thresholds to create two separate
images. The image created using the high threshold will contain edges, but gaps will be present. So the image created using the low threshold is consulted and the 8-neighbours of its edge points are examined. The gaps of the high-threshold image are thus bridged using the edge points of the low-threshold image. This process ensures that the edges are linked properly to generate a complete contour of the image (see the sketch after this list).
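A compact sketch of the hysteresis step described above (an illustration assuming NumPy and SciPy; here scipy.ndimage.label is used to propagate strong edges to 8-connected weak edges, which is one possible way to implement the linking):

import numpy as np
from scipy.ndimage import label

def hysteresis_threshold(magnitude, t0, t1):
    weak = magnitude >= t0                     # low-threshold image
    strong = magnitude >= t1                   # high-threshold image
    # Label 8-connected components of the weak map and keep only those
    # components that contain at least one strong (definite) edge point.
    labels, n = label(weak, structure=np.ones((3, 3), dtype=int))
    keep = np.zeros(n + 1, dtype=bool)
    keep[np.unique(labels[strong])] = True
    keep[0] = False                            # label 0 is the background
    return keep[labels]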
Figure 5.15: Canny edge detection (a) Original image (b) Canny edge detection at σ = 1