
Image Enhancement


• Image Enhancement

Image enhancement is carried out to improve the appearance of certain image features to assist in
human interpretation and analysis. You should note that image enhancement is different from the
image preprocessing step: image enhancement highlights image features for the interpreter,
whereas image preprocessing reconstructs a relatively better image from an originally
imperfect/degraded image.
• Thematic Information Extraction
It includes all the processes used for extracting thematic information from images. Image
classification is one such process, which categorises pixels in an image into thematic classes,
such as land cover classes, based on their spectral signatures. Image classification procedures are
further categorised into supervised, unsupervised and hybrid, depending upon the level of human
intervention in the classification process.

IMAGE HISTOGRAM AND ITS SIGNIFICANCE


Most commonly, remote sensing images are 8-bit; hence, the range of DNs varies between 0 and
255. If you tabulate the frequency of occurrence of each DN in an image, it can be graphically
presented as a histogram, wherein the range of DNs is plotted on the abscissa and the frequency of
occurrence of each DN on the ordinate, as shown in Fig. 10.9. In the context of remote sensing, a
histogram can be defined as a plot of the number of pixels at each pixel value within each spectral
band.
Histograms in general are frequency distributions describing the frequency of DNs occurring in an image.
A histogram is basically a graph that represents the maximum range of DNs that a remote sensor captures,
in 256 steps (0 = pure black and 255 = pure white) for 8-bit data. A histogram provides a convenient
summary of brightness (pixel values) in an image. It is used to depict image statistics in an easily
interpreted visual format.

Fig. 10.9

The histogram shown in Fig. 10.9 looks like a mountain peak, and its highest bar represents the
maximum concentration of a particular pixel value. The left-to-right direction of the histogram relates
to the darkness (minimum value at the left) and lightness (maximum value at the right) of the
image, while the up-and-down directions of the histogram (valleys and peaks) correspond to
brightness (or colour in a multispectral image) information. If an image is too dark, the histogram will
show a higher concentration on the left side, and if the image is too bright its histogram will show a
higher concentration on the right side.
You will now have understood that for a low contrast image the histogram is not spread equally, i.e.
the histogram is narrow and tall, covering a short range of pixel values. For a high contrast image,
the histogram has an even spread of pixel values and is short and flat (wide), covering a wide range
of pixel values. The significance of a histogram lies in the fact that it provides an
insight into image contrast and brightness and also into the quality of the image.
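The idea can be illustrated with a short sketch. The following minimal Python/NumPy example (the array `band` is a hypothetical 8-bit band standing in for real image data) tabulates the frequency of each DN and prints the simple statistics that a histogram conveys visually:

```python
import numpy as np

# Hypothetical 8-bit single-band image; random DNs stand in for real data.
band = np.random.randint(0, 256, size=(512, 512), dtype=np.uint8)

# Frequency of occurrence of each DN (0-255).
counts, bin_edges = np.histogram(band, bins=256, range=(0, 256))

# Simple summary statistics that the histogram conveys.
print("min DN:", band.min(), "max DN:", band.max())
print("mean DN:", band.mean(), "std dev:", band.std())
print("most frequent DN:", counts.argmax(), "with", counts.max(), "pixels")
```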

Image enhancement simply refers to the techniques used to increase the human interpretability of
remotely sensed images and the ability to derive meaningful information from them. It improves the
quality and clarity of images. Enhancement operations include removing blurring and noise, increasing
contrast and revealing details. The primary goal of image enhancement is to improve the visual
interpretability of the images. This is done by increasing the apparent distinction between features
in the image. Image enhancement involves the use of a number of statistical and image manipulation
functions available in image processing software. Image features are enhanced by the following
two operations:
• Point Operations: In this operation, the value of a pixel is enhanced independently of the
characteristics of neighbouring pixels within each band.
• Local (Neighbourhood) Operations: In this operation, the value of a pixel is enhanced based on the
brightness values of neighbouring pixels.
Image enhancement techniques based on point operations are also called contrast enhancement
techniques, and the enhancement techniques based on neighbourhood operations are also known as
spatial enhancement techniques.

Contrast Enhancement
Before discussing contrast enhancement, let us first understand the meaning of ‘contrast’.
Contrast is the difference in pixel values of two different objects at the same wavelength. It can be
defined as the ratio of the maximum intensity to the minimum intensity over an image:

C = Lmax / Lmin

where,
C = contrast,
Lmax = maximum intensity value present in an image and
Lmin = minimum intensity value present in the same image.

The images acquired by a digital sensor might use only a limited range of digital numbers (DNs) to
accommodate all the values of reflectance, resulting in a poor quality display of the image. For this
reason, the image gray scale distribution is skewed and it is very difficult to distinguish features
in the image. In order to exploit the full range of the data, the dynamic range of the image pixel
values is increased. This technique is known as contrast enhancement. It involves changing the original
pixel values so that more of the available range is used, which increases the contrast between features
and their backgrounds.

Commonly applied contrast enhancement techniques are the following:


• gray level thresholding
• level slicing and
• contrast stretching.
We shall now discuss each of these techniques.

Gray Level Thresholding


This method is used to segment an input image into two classes - one for those pixels which have
pixel values below an analyst-defined threshold and one for those above it. Thresholding
can be used to prepare a binary mask for an image, as shown in Fig. 12.2, where gray level
thresholding has been carried out on the near-infrared (NIR) band of LISS III data to create a binary
image showing only the land and water pixels.
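As an illustration, a minimal NumPy sketch of gray level thresholding is given below; the `nir` array and the threshold of 60 are assumed values standing in for a real NIR band and an analyst-chosen cut-off:

```python
import numpy as np

# Hypothetical NIR band; in practice this would be read from LISS III data.
nir = np.random.randint(0, 256, size=(400, 400), dtype=np.uint8)

threshold = 60  # assumed analyst-defined DN separating dark water from brighter land
binary_mask = np.where(nir > threshold, 1, 0).astype(np.uint8)

# binary_mask now holds 1 for pixels above the threshold and 0 for those below it.
print("pixels above threshold:", binary_mask.sum(),
      "pixels below threshold:", (binary_mask == 0).sum())
```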

Level Slicing
Level or density slicing is an enhancement technique whereby the DNs distributed along the x axis of an
image histogram are divided into a series of analyst-specified intervals or slices, thereby creating
a new image showing a select number of slices (classes).
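A level slice can be sketched with NumPy's digitize function; the slice boundaries below are assumed for illustration and are not taken from any particular study:

```python
import numpy as np

band = np.random.randint(0, 256, size=(400, 400), dtype=np.uint8)

# Analyst-specified slice boundaries along the DN axis (assumed values).
slice_edges = [0, 50, 100, 150, 200, 256]

# np.digitize assigns each DN to the interval (slice) it falls into,
# producing a new image with one value per slice (class).
sliced = np.digitize(band, bins=slice_edges[1:-1])
print("slices present:", np.unique(sliced))
```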

Contrast Stretching
In a linear contrast stretch, the original range of DNs is expanded proportionately to fill the full
display range (0 to 255 for 8-bit data). For example, if the maximum pixel value in the original image
is 158 and the minimum pixel value is 60, as illustrated in Fig. 12.3a, the ‘stretched’ values would be
set at 255 and 0, respectively (Fig. 12.3b). The intermediate gray level values of the original image
would be similarly and proportionately set to new values. These types of enhancements are best applied
to remotely sensed images with normal or near-normal distribution, meaning that all the brightness
values fall within a narrow range of the image histogram and only one mode is apparent. This approach
of linear contrast stretching is known as the minimum-maximum stretch.
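A minimum-maximum stretch can be written in a few lines of NumPy. The sketch below assumes an input band whose DNs happen to span roughly 60 to 158, echoing the worked example above:

```python
import numpy as np

# Hypothetical band with a narrow DN range (roughly 60-158).
band = np.random.randint(60, 159, size=(400, 400)).astype(np.float64)

dn_min, dn_max = band.min(), band.max()
stretched = (band - dn_min) / (dn_max - dn_min) * 255.0   # proportional rescaling
stretched = stretched.round().astype(np.uint8)

print("before:", dn_min, "-", dn_max, "after:", stretched.min(), "-", stretched.max())
```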

With a typical application of a linear stretch, areas that were dark-toned will appear darker and
areas that were bright-toned will appear even brighter. Linear contrast enhancement also makes
subtle variations within the data more obvious. One major disadvantage of a linear stretch is that it
assigns an equal number of display levels to infrequently occurring values as it does to frequently
occurring values. This can result in certain features being enhanced, while other features, which may
be of greater importance, are not sufficiently enhanced. In these circumstances, we use non-
linear contrast enhancement techniques, as discussed below.

Contrary to the linear stretch, a non-linear stretch expands one portion of the gray scale while
compressing other portions. This sort of enhancement is typically applied with the purpose of
highlighting one part of the brightness range. While spatial information is preserved, quantitative
radiometric information can be lost. Two of the most common non-linear contrast stretching
methods are discussed below:

Fig. 12.3: Illustration of principle of contrast stretch.

Spatial Enhancement
Point operation based enhancement techniques operate on each pixel individually, whereas spatial
enhancement techniques modify pixel values based on the values of surrounding (neighbourhood)
pixels. Spatial enhancement deals largely with spatial frequency. Spatial frequency is the
difference between the highest and lowest values of a contiguous set of pixels. Jensen (1996)
defines spatial frequency as “the number of changes in brightness value per unit distance for any
particular part of an image”. A low spatial frequency image consists of smoothly varying pixel
values, whereas a high spatial frequency image consists of abruptly changing pixel values (Fig. 12.5).

The image enhancement approach based on spatial frequency is popularly known as spatial filtering.
These techniques are used to improve the appearance of an image by transforming or
changing pixel values in the image. In simple terms, image filtering is a process of removing certain
image information to get more detail about particular features. All filtering algorithms involve so-
called neighbourhood processing because they are based on the relationship between neighbouring
pixels rather than on a single pixel. In general, there are three types of filtering techniques, as given
below:

• Low pass filter is used to smooth the image display by emphasising low frequency details in an
image
• High pass filter is used to sharpen the linear appearance of the image objects such as roads and
rivers by emphasising high frequency details in an image and
• Edge detection filter is used to sharpen the edges around the objects.

Low pass filters pass only the low frequency information. They produce images that appear smooth
or blurred when compared to the original data. High pass filters pass only the high frequency
information, or abrupt gray level changes. They produce images that retain the high frequency
information. Edge detection filters delineate edges and make shapes more conspicuous. In another
approach, band pass filters are used, in which a low pass filter is first applied and then a high pass
filter is applied to the image. The low pass filter blocks the very high frequencies and the high pass
filter blocks the low frequency information. Hence the resultant image contains frequencies that are
neither too low nor too high.
The most commonly used filters for digital image filtering rely on the concept of convolution and
operate in the image domain for reasons of simplicity and processing efficiency. For example, low-
pass filters are mainly used to smooth image features and to remove noise, but often at the cost of
degrading image spatial resolution (blurring). In order to remove random noise with minimum
degradation of resolution, various edge-preserving filters have been developed, such as the adaptive
median filter.
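A simple convolution-based sketch of low pass and high pass filtering is given below, using SciPy's ndimage convolution; the 3 × 3 kernels are common textbook choices, not values prescribed by this text:

```python
import numpy as np
from scipy.ndimage import convolve

# Hypothetical single band to be filtered.
band = np.random.randint(0, 256, size=(400, 400)).astype(np.float64)

# 3 x 3 low pass (mean) kernel: smooths / blurs the image.
low_pass_kernel = np.ones((3, 3)) / 9.0
smoothed = convolve(band, low_pass_kernel, mode="nearest")

# 3 x 3 high pass kernel: emphasises abrupt changes in brightness (edges, lines).
high_pass_kernel = np.array([[-1, -1, -1],
                             [-1,  8, -1],
                             [-1, -1, -1]], dtype=np.float64)
sharp_detail = convolve(band, high_pass_kernel, mode="nearest")
```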
Digital image filtering is useful for enhancing lineaments that may represent significant geological
structures such as faults, veins or dykes. It can also enhance image texture for discrimination of
lithology and drainage patterns. Digital image filtering is also used in land use studies, highlighting the
textures of urban features, road systems and agricultural areas.

Image Transformations
Image transformations are operations similar in concept to those for image enhancement. However,
unlike image enhancement operations which are normally applied only to a single channel of data at
a time, image transformations usually involve algebraic operations of multi-layer images. Algebraic
operations such as subtraction, addition, multiplication, division, logarithms, exponentials and
trigonometric functions are applied to transform the original images into new images which display
better or highlight certain features in the image.
Image transformation method is also known as spectral enhancement method. It is important to
note that image transformation techniques require more than one band of data.
In this process, a raw image or a set of images undergoes some mathematical treatment to
produce a new image. The resultant transformed image, generated from two or
more sources, highlights particular features or properties of interest better than the original input
images. Hence, the transformed image may have properties that make it more suited to a particular
purpose than the original input images.
Image transformation techniques are useful for compressing bands having similar kinds of
information into fewer bands and also for extracting new bands of data that are more interpretable to
the human eye. There are a number of image transformation techniques available, but we will discuss
here two kinds of image transformation techniques:
• Arithmetic operations and
• Image fusion.

Arithmetic Operations
One of the techniques of image transformation is to apply simple arithmetic/logical operations to
the image data. Arithmetic operations of addition, subtraction, multiplication and division can be
performed on two or more images of the same geographical area. The prerequisite for the operation
is that all input images should be co-registered. These input images may either be individual bands
from images acquired at different dates or separate image bands from a single multispectral image.
Subtraction of one image from another is used to identify changes that have occurred between
images acquired on different dates. The subtraction reveals differences between the images and is
often used for the purpose of change detection (Fig. 12.6). Multiplication of images is somewhat
different from the other arithmetic operations because it generally involves the use of a real image
and a binary image. The binary image, having two values (0 and 1), is used as a mask. When the two
images are multiplied, the pixels in the real image that are multiplied by zero become zero,
and the others, which are multiplied by one, remain the same.
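These arithmetic operations translate directly into array arithmetic. The sketch below, with hypothetical co-registered arrays, shows subtraction for change detection and multiplication by a binary mask; the masking rule is an assumption for illustration:

```python
import numpy as np

# Hypothetical co-registered bands acquired on two different dates.
image_a = np.random.randint(0, 256, size=(300, 300)).astype(np.int16)
image_b = np.random.randint(0, 256, size=(300, 300)).astype(np.int16)

# Subtraction for change detection (a signed type avoids wrap-around).
difference = image_a - image_b

# Multiplication by a binary mask: pixels multiplied by 0 are removed,
# pixels multiplied by 1 are kept unchanged.
mask = (image_a > 100).astype(np.int16)   # assumed masking rule
masked = image_a * mask
```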

Fig. 12.6: Example of image subtraction operation showing the resultant image and its new values
obtained from subtraction of image B from image A

Image division, or band ratio, is probably the most widely used arithmetic operation. Band ratio is an
image transformation technique that is applied to enhance the contrast between features by dividing
the DN of each pixel in one image band by the DN of the corresponding pixel in another band. The band
ratioing technique is generally applied to a multispectral image. Ratioing is an effective technique for
selectively enhancing spectral features, as shown in Fig. 12.7. Ratio images derived from different band
pairs are often used to generate ratio colour composites in a Red, Green and Blue (RGB) display. Many
indices, such as the normalised difference vegetation index (NDVI), have been developed based on both
differencing and ratio operations.

Band ratio is an effective technique for suppressing topographic shadows. For a given incident angle
of solar radiation, the radiation energy received by a land surface depends on the angle between the
land surface and incident radiation (Fig. 12.8). Therefore, solar illumination on a land surface varies
with terrain slope and aspect, which results in topographic shadows. In a remotely sensed image,
the spectral information is often occluded by sharp variations of topographic shadowing. DNs in
different spectral bands of a multispectral image are proportional to the solar radiation received by
the land surface and its spectral reflectance.

Fig. 12.8: Ratio of pixel values in NIR region to the corresponding pixel value in the visible red region
of the spectrum. The ratios for the illuminated and shaded slopes are very similar, although pixel
value differs by a factor of more than two. Hence, an image made up of NIR:R ratio values at pixel
positions will exhibit a much reduced shadow or topographic effect
Using a combination of band ratio and subtraction techniques, several indices have been developed
which are widely used for geological, ecological and other applications. Of these indices, the
vegetation indices, such as the normalised difference vegetation index (NDVI), are the most popular.
A vegetation index (VI) is a number that is generated by some combination of remote sensing bands
and may have some relationship to the amount of vegetation in a given image pixel. The index is
computed using spectral bands that are sensitive to chlorophyll concentration and
photosynthetic activity.
The concept of vegetation indices is based on empirical evidence that some algebraic combination of
remotely sensed spectral bands can tell us something about vegetation. More than 200 VIs have
been mentioned in the scientific literature; however, only a few of them have been systematically
studied and have biophysical meaning. Each VI is designed to accentuate a particular vegetation
property. Some of the widely used VIs are listed in Table 12.1.

Of the many vegetation indices, we will discuss in detail NDVI, which is the most widely used. NDVI
is the acronym for Normalised Difference Vegetation Index. It is a numerical indicator that uses the
visible and NIR bands of the electromagnetic (EM) spectrum. It is mainly adopted to analyse remote
sensing measurements and assess the ‘greenness’ of the target. NDVI is a very important tool for
studying vegetation from remote sensing satellites, as most multispectral satellite sensors have
visible and infrared channels, which can be utilised to calculate NDVI. For example, NDVI can be
calculated using the red and NIR bands of LISS III data by applying the formula given below:
NDVI = (NIR – Red) / (NIR + Red)
NDVI works in the following way. The pigment in plant leaves, chlorophyll, strongly absorbs visible
light (from 0.4 to 0.7 μm) for use in photosynthesis. The cell structure of leaves, on the other hand,
strongly reflects NIR light (from 0.7 to 1.1 μm). The more leaves a plant has, the more these wavelengths
of light are affected. We can measure the intensity of light coming off the Earth in the visible
and NIR wavelengths and quantify the photosynthetic capacity of vegetation in a given pixel of land
surface. In general, if there is much more reflected radiation in the NIR wavelengths than in the visible
wavelengths, then the vegetation in that pixel is likely to be dense and may contain some type of forest.
If there is very little difference in the intensity of the visible and NIR wavelengths reflected, then the
vegetation is probably sparse and may consist of grassland, tundra or desert. Since we know the
behaviour of plants across the EM spectrum, we can derive NDVI information by focusing on the satellite
bands that are most sensitive to vegetation information (NIR and red). Therefore, the bigger the
difference between NIR and red reflectance, the more vegetation there has to be.

The NDVI algorithm subtracts the red reflectance values from the NIR values and divides the result by
the sum of the NIR and red bands, as in the equation above.
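A minimal NumPy sketch of the NDVI computation is given below; the `red` and `nir` arrays are hypothetical stand-ins for co-registered red and NIR bands:

```python
import numpy as np

# Hypothetical red and NIR bands (DN or reflectance values, strictly positive here).
red = np.random.randint(1, 256, size=(300, 300)).astype(np.float64)
nir = np.random.randint(1, 256, size=(300, 300)).astype(np.float64)

ndvi = (nir - red) / (nir + red)   # values fall between -1 and +1

print("NDVI range:", ndvi.min(), "to", ndvi.max())
dense_vegetation = ndvi > 0.6      # threshold quoted later in the text
```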
This normalised index formulation allows us to cope with the fact that two identical patches of
vegetation could have different values if one were, for example, in bright sunshine and another
under a cloudy sky. The bright pixels would all have larger values and, therefore, a larger absolute
difference between the bands. This is avoided by dividing by the sum of the reflectance values.
Theoretically, calculation of NDVI for a given pixel always results in a number that ranges from minus
one (–1) to plus one (+1); in practice, strongly negative values represent water, values
around zero represent bare soil and values over 0.6 represent dense green vegetation.
NDVI has found wide application in vegetation studies, as it has been used to estimate crop yields,
pasture performance and rangeland carrying capacities, among others. It provides a crude estimate
of vegetation health and a means of monitoring changes in vegetation over time (Fig. 12.9). It can be
used to detect seasonal changes in green biomass but can also be used to detect changes due to human
activities (e.g., logging) or natural disturbances such as wildfire. NDVI has been found to be useful for
continental or global scale vegetation monitoring because it can compensate for changing
illumination conditions, surface slope and viewing aspect.

Fig. 12.9: Healthy vegetation absorbs most of the visible light that hits it and reflects a large portion
of NIR light. Unhealthy or sparse vegetation reflects more visible light and less NIR light

NDVI is often directly related to other ground parameters such as percentage of ground cover,
photosynthetic activity of the plant, surface water, leaf area index and the amount of biomass.
The most popular satellite instrument used for collecting NDVI data is the US National Oceanic and
Atmospheric Administration's (NOAA) Advanced Very High Resolution Radiometer (AVHRR). It is
sensitive to wavelengths from 0.55 to 0.7 μm and from 0.73 to 1.0 μm, both of which are well suited to
NDVI calculation. AVHRR's detectors measure the intensity of light being reflected in these bands.
Landsat TM is also used to calculate NDVI, but because its band wavelengths differ (it uses bands 3 and
4), it is most often used to create images with greater detail covering less area. The Moderate Resolution
Imaging Spectroradiometer (MODIS) sensor of NASA also has a standard NDVI product.
Despite its several uses, there are some limitations of NDVI, as listed below:
• Temporal Resolution
• Atmospheric Interference
• Land Cover Types
• Sparse Vegetation and Soil Type
• Off-Nadir Effects

Several other indices, such as the Enhanced Vegetation Index (EVI) and the Atmospherically Resistant
Vegetation Index (ARVI), have been developed to take these limiting factors into consideration.

Image Fusion
Image fusion is the process of combining relevant information from two or more remotely sensed
images of a scene into a single, highly informative image. The primary reason image fusion has gained
prominence in remote sensing applications is that remote sensing instruments
have design or observational constraints; therefore, a single image is often not sufficient for visual or
machine level information analysis. In satellite imaging, two types of images are available:
panchromatic images, which have higher spatial resolution, and multispectral data, which
have coarser spatial resolution. These two image types are merged (fused) in order to get the
information of both images in one single image. The image fusion technique thus allows
integration of different information sources, and the fused image can have complementary spatial and
spectral resolution characteristics. In other words, the fused image will have the spatial information of
the panchromatic image and the spectral information of the multispectral image.

Image fusion involves transforming a set of low or coarse spatial resolution multispectral (colour)
images to high spatial resolution colour images by fusing a co-registered fine spatial resolution
panchromatic (gray scale) image. Usually, three low-resolution images in the visible spectrum (blue,
green and red) are used as main inputs to produce a high-resolution natural (true) colour image as
shown in Fig. 12.10, where the image 12.10b is a natural colour image with a spatial resolution of 29
m (which has been resampled 400%) and image 12.10a is a panchromatic image with a spatial
resolution of 4 m. By combining these inputs, a high-resolution colour image is produced
(Fig.12.10c). The fused output retains spectral signatures of input colour image and spatial features
of input panchromatic image, and usually the best attributes of both inputs. The final output with its
high spectral and spatial resolution is often as good as high-resolution colour aerial photographs.

There are many methods which are used for image fusion. These methods can be broadly divided
into two categories - spatial and transform domain fusions. Some of the popular spatial domain
methods include intensity hue saturation (IHS) transformation, Brovey method, principal component
analysis (PCA) and high pass filtering based methods. Wavelet transform, Laplace pyramid and
curvelet transform based methods are some of the popular transform domain fusion methods.
In this unit, we will focus on the most widely used IHS transform method.
IHS Image Fusion Technique: It is based on the RGB to IHS colour space transformation. As you know,
an image is displayed on a colour monitor through its three colour guns, which correspond to the three
additive primary colours, i.e. red, green and blue (RGB). When we display three bands of a multiband image
data set, the viewed image is said to be in RGB space. However, it is also possible to define an
alternate colour space that uses three parameters, namely intensity (I), hue (H) and saturation (S),
instead of RGB. The IHS colour space is advantageous in that it presents colours more similarly to how
they are perceived by the human eye.

Intensity is the overall brightness of the scene and varies from black to white; saturation
represents the purity of a colour and also varies linearly; and hue represents the colour, or dominant
wavelength, of a pixel. The intensity (I) component is similar to the panchromatic image. This peculiarity
is used to produce a fusion between panchromatic data having high spatial resolution and multispectral
data characterised by high spectral resolution and less spatial detail. The procedure of IHS based
image fusion can be summarised in the following steps (Fig. 12.11):

Fig. 12.11: RGB-IHS encoding and decoding for image fusion

• Resampling of the multispectral image (having RGB components) to the same pixel size as
the panchromatic image, using the nearest neighbour resampling criterion. This method, unlike
others (bi-linear and cubic), does not introduce radiometric distortion into the images
• Transformation of the resampled multispectral image from RGB to IHS colour space. This step
transforms RGB values to IHS values
• Replacement of the intensity (I) component of the IHS image by the high resolution panchromatic image and
• Reverse IHS to RGB transformation. This step transforms the IHS values back to RGB values.
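A rough sketch of these steps is given below. It uses scikit-image's HSV transformation as a commonly available stand-in for a true IHS transform, and assumes the multispectral RGB image has already been resampled to the panchromatic pixel size and that both inputs are scaled to the range 0–1; in practice the panchromatic band is often histogram-matched to the intensity component before the replacement step:

```python
import numpy as np
from skimage.color import rgb2hsv, hsv2rgb

# Hypothetical inputs: resampled multispectral RGB image and co-registered
# panchromatic band, both in [0, 1] and of the same spatial size.
rgb = np.random.rand(256, 256, 3)
pan = np.random.rand(256, 256)

hsv = rgb2hsv(rgb)          # forward colour-space transformation (HSV here, not true IHS)
hsv[..., 2] = pan           # replace the intensity-like (value) channel with the pan band
fused = hsv2rgb(hsv)        # reverse transformation back to RGB
```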

The resultant fused image is a mixture of the spectral information from the low resolution colour
composite and the high spatial resolution information from the panchromatic image, and it shows the
image features better. Despite being useful, the major limitation of these image fusion
techniques is that they can distort the spectral information of the multispectral data while merging.
Another major disadvantage of fusion approaches is that they can produce spatial distortion in the fused
image. Spectral distortion becomes a negative factor when we go for further processing, such as
classification.

Image Classification

In this unit, we will move a step further and learn how to make more sense of the landscape by
dividing it into separate classes based on surface characteristics. This process is known as image
classification. It involves conversion of raster data into a finite set of classes that represent surface
types in the imagery. It may be used to identify vegetation types, anthropogenic structures, mineral
resources, etc., or transient changes in any of these features. Additionally, a classified raster image can
be converted to vector features (e.g., polygons) in order to compare it with other data sets or to
calculate spatial attributes (e.g., area, perimeter, etc.). Image classification is a very active field of
study, broadly related to the field of pattern recognition.

Classification is the process of assigning spectral classes into information classes. Spectral classes are
groups of pixels that are uniform with respect to their brightness values in the different spectral
channels of data. Information classes are categories of interest that an analyst attempts to identify
in the image on the basis of his knowledge and experience about the area. For example, a remote
sensing image contains spectral signatures of several features present on the ground in terms of
pixels of different values. An interpreter or analyst identifies homogeneous groups of pixels having
similar values and labels the groups as information classes such as water, agriculture, forest, etc.
while generating a thematic map. When this thematic information is extracted with the help of
software, it is known as digital image classification. It is important to note that there could be many
spectral classes within an information class depending upon the nature of features the image
represents or the purpose of the classification. In other words, different spectral classes may be
grouped under one information class.

In short, we can define image classification as the process of assigning all pixels in an image to
particular classes or themes based on the spectral information represented by the digital numbers
(DNs). The classified image comprises a mosaic of pixels, each of which belongs to a particular theme,
and is a thematic map of the original image.

Approaches to Classification
There are two general approaches to image classification:
• Supervised Classification: It is the process of identification of classes within a remote sensing
data with inputs from and as directed by the user in the form of training data, and
• Unsupervised Classification: It is the process of automatic identification of natural groups or
structures within a remote sensing data.

Both the classification approaches differ in the way the classification is performed. In the case of
supervised classification, specific land cover types are delineated based on statistical
characterisation of data drawn from known examples in the image (known as training sites). In
unsupervised classification, however, clustering algorithms are used to uncover the commonly
occurring land cover types, with the analyst providing interpretations of those cover types at a later
stage. Merits and demerits of both the supervised and unsupervised classification methods are
summarised in Table.

Both these methods can be combined together to come up with a ‘hybrid’ approach of image
classification. In the hybrid classification, firstly, an unsupervised classification is performed, then the
result is interpreted using ground referenced information and, finally, original image is reclassified
using a supervised classification with the aid of statistics of unsupervised classification as training
knowledge. This method utilises unsupervised classification in combination with ground referenced
information as a comprehensive training procedure and, therefore, provides more objective and
reliable results.

Table: Merits and Demerits

Stages in Classification
The image classification process consists of the following three stages: training, signature evaluation
and decision making.

Training is the process of generating spectral signature of each class. For example, a forest class may
be defined by minimum and maximum pixel values in different image bands, thus defining a spectral
envelope for it. This simple statistical description of the spectral envelope is known as signature.
Training can be carried out either by an image analyst with guidance from his experience or
knowledge (i.e. supervised training) or by some statistical clustering techniques requiring little input
from image analysts (i.e. unsupervised training).

There are no specific rules regarding the number of training sites per class, but it is advisable to take
several training sites for each class to be mapped. If you take too few training sites, it
may be difficult to obtain a spectral signature which truly represents that class, and if you take a very
large number of training sites, significant time may be wasted in collecting and
evaluating signatures without significantly improving the final signature.

A general rule of thumb is that the number of training pixels for a class should be 10 × n, where n is the
number of bands. You should also remember that the minimum number of pixels in a training site for a
class should be n + 1 (Jensen, 1986).

Signature Evaluation is the checking of spectral signatures for their representativeness of the class
they attempt to describe, and also ensures a minimum of spectral overlap between signatures of
different classes.
Decision Making is the process of assigning all the image pixels to thematic classes using the evaluated
signatures. It is achieved using algorithms known as decision rules. The decision rules set
certain criteria. When the signature of a candidate pixel passes the criteria set for a particular class, the
pixel is assigned to that class. Pixels failing to satisfy any criteria remain unclassified. The term classifier
is widely used as a synonym for the term decision rule.

UNSUPERVISED CLASSIFICATION
As the name implies, this form of classification is done without interpretive guidance from an
analyst. An algorithm automatically organises similar pixel values into groups that become the basis
for different classes. This is entirely based on the statistics of the image data distribution and is often
called clustering.

The process is automatically optimised according to cluster statistics without the use of any
knowledge-based control (i.e. ground referenced data). The method is, therefore, objective and
entirely data driven. It is particularly suited to images of targets or areas where there is no ground
knowledge. Even for a well-mapped area, unsupervised classification may reveal some spectral
features which were not apparent beforehand. The basic steps of unsupervised classification are
shown in Fig.

The result of an unsupervised classification is an image of statistical clusters, and the classified
image still needs interpretation based on knowledge of the thematic contents of the clusters. There are
hundreds of clustering algorithms available for unsupervised classification and their use varies with
efficiency and purpose. K-means and ISODATA are the most widely used algorithms.

K-Means Clustering
The k-means algorithm assigns each pixel to a group based on an initial selection of mean values. The
iterative re-definition of groups continues until the means change by less than a threshold. Pixels
belonging to the groups are then classified using a minimum-distance-to-means or other principle.
The k-means clustering algorithm thus helps split a given unknown dataset into a fixed,
user-defined number (k) of clusters. The objective of the algorithm is to minimise variability within
each cluster.
The data point at the centre of a cluster is known as a centroid. In most image processing
software, each initial centroid is an existing data point in the given input data set, picked at random
such that all centroids are unique. Initially, a randomised set of clusters is produced. Each centroid is
thereafter set to the arithmetic mean of the cluster it defines. The process of classification and centroid
adjustment is repeated until the values of the centroids stabilise. The final centroids are used to produce
the final classification or clustering of the input data, effectively turning a set of initially anonymous data
points into a set of data points, each with a class identity.
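A minimal NumPy implementation of the k-means procedure described above is sketched below; the three-band image and the choice of k = 5 are assumptions for illustration only:

```python
import numpy as np

def kmeans(pixels, k, n_iter=50, seed=0):
    """Minimal k-means clustering; pixels is an (n_samples, n_bands) array."""
    rng = np.random.default_rng(seed)
    # Initial centroids: k distinct pixels picked at random.
    centroids = pixels[rng.choice(len(pixels), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each pixel to its nearest centroid (minimum distance to means).
        dists = np.linalg.norm(pixels[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the arithmetic mean of the pixels assigned to it.
        new_centroids = centroids.copy()
        for i in range(k):
            members = pixels[labels == i]
            if len(members):              # keep the old centroid if a cluster is empty
                new_centroids[i] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break                         # centroids have stabilised
        centroids = new_centroids
    return labels, centroids

# Hypothetical 3-band image flattened into a pixel-by-band table.
image = np.random.rand(100, 100, 3)
labels, centroids = kmeans(image.reshape(-1, 3), k=5)
classified = labels.reshape(100, 100)     # cluster identity per pixel
```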

Advantage
• The main advantage of this algorithm is its simplicity and speed which allows it to run on
large datasets.
Disadvantages
• It does not yield the same result with each run, since the resulting clusters depend on the
initial random assignments
• It is sensitive to outliers, so for such datasets k-medians clustering is used and
• One of the main disadvantages of k-means is that one must specify the number of
clusters as an input to the algorithm.

Outliers in remote sensing images represent observed pixel values that are significantly different
from their neighbourhood pixel values.

ISODATA Clustering
ISODATA (Iterative Self-Organising Data Analysis Technique) clustering method is an extension of k-
means clustering method (ERDAS, 1999). It represents an iterative classification algorithm and is
useful when one is not sure of the number of clusters present in an image. It is iterative because it
makes a large number of passes through the remote sensing dataset until specified results are
obtained. Good results are obtained if all bands in remote sensing image have similar data ranges. It
includes automated merging of similar clusters and splitting of heterogeneous clusters.

The clustering method requires as input the maximum number of clusters required, a
convergence threshold and the maximum number of iterations to be performed. ISODATA clustering
takes place in the following steps:
• k arbitrary cluster means are established
• All pixels are relocated into the closest clusters by computing the distance between each pixel and
each cluster mean
• Centroids of all clusters are recalculated and the above step is repeated until the convergence
threshold is reached and
• Clustering is considered complete only when the number of clusters is within the specified number
and the distances between the clusters meet a prescribed threshold.

Advantages
• It is good at finding “true” clusters within the data
• It is not biased to the top pixels in the image
• It does not require image data to be normally distributed and
• Cluster signatures can be saved, which can be easily incorporated and manipulated along
with supervised spectral signatures.
Disadvantages
• It is time consuming and
• It requires the maximum number of clusters, a convergence threshold and the maximum number of
iterations as inputs to the algorithm.

SUPERVISED CLASSIFICATION
Supervised classification, as the name implies, requires human guidance. An analyst selects a group
of contiguous pixels from part of an image, known as a training area, that defines the DN values in each
channel for a class. A classification algorithm computes certain properties (data attributes) of the set of
training pixels, for example, the mean DN for each channel (Fig. 13.3). Then, the DN values of each pixel
in the image are compared with the attributes of the training set.

This is based on the statistics of training areas representing different ground objects, selected
subjectively by users on the basis of their own knowledge or experience. Classification is controlled
by the users' knowledge but, on the other hand, is constrained and may even be biased by their
subjective view. Classification can, therefore, be misguided by inappropriate or inaccurate training
area information and/or incomplete user knowledge. The steps involved in supervised classification are
given in Fig. 13.5.

We will discuss parallelepiped and maximum likelihood algorithms of supervised image classification.

Parallelepiped Classifier
The parallelepiped classifier uses the class limits stored in each class signature to determine if a given
pixel falls within the class or not. The class limits specify the dimensions (in standard deviation units)
of each side of a parallelepiped surrounding the mean of the class in feature space. If a pixel falls inside
the parallelepiped, it is assigned to the class. However, if a pixel falls within more than one class, it is
put in the overlap class. If a pixel does not fall inside any class, it is assigned to the null class.
In parallelepiped classifiers, an n-dimensional box is constructed around the pixels within each category
of interest (Fig. 13.6). The n-dimensional space defined by each parallelepiped delimits the different
categories.

Classification using this classifier is carried out in the following steps:
Step 1: Define the range of values in each training area and use these ranges to construct an n-
dimensional box (a parallelepiped) around each class.
Step 2: Use the multi-dimensional ranges to assign pixels to the different surface categories. Notice that
there can be overlap between the categories when this simple method is used. One solution to this
problem is to use a stepped decision region boundary.
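A sketch of a parallelepiped classifier in NumPy is given below; the class means, standard deviations and the ±2 standard deviation box width are assumed illustrative values standing in for real training statistics:

```python
import numpy as np

def parallelepiped_classify(pixels, class_means, class_stds, width=2.0):
    """pixels: (n, n_bands); class_means / class_stds: (n_classes, n_bands).
    Returns a class index per pixel, -1 for unclassified, -2 for the overlap class."""
    low = class_means - width * class_stds      # lower box limits per class
    high = class_means + width * class_stds     # upper box limits per class
    # inside[i, c] is True if pixel i falls inside the box of class c.
    inside = np.all((pixels[:, None, :] >= low[None]) &
                    (pixels[:, None, :] <= high[None]), axis=2)
    n_hits = inside.sum(axis=1)
    labels = np.where(n_hits == 1, inside.argmax(axis=1), -1)
    labels[n_hits > 1] = -2                     # pixel falls in more than one box
    return labels

# Hypothetical training statistics for two classes in a 3-band image.
means = np.array([[40.0, 60.0, 30.0], [120.0, 90.0, 150.0]])
stds = np.array([[5.0, 8.0, 4.0], [10.0, 12.0, 9.0]])
pixels = np.random.randint(0, 256, size=(1000, 3)).astype(np.float64)
labels = parallelepiped_classify(pixels, means, stds)
```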

Advantages
• It is a simple and computationally inexpensive method and
• It does not assume a class statistical distribution and includes class variance.
Disadvantages
• It is the least accurate method
• It does not adapt well to elongated (high-covariance) clusters
• It often produces overlapping classes, requiring a second classification step
• It also becomes more cumbersome with an increasing number of channels and
• Pixels falling outside the defined parallelepiped remain unclassified.

Maximum Likelihood Classifier


The maximum likelihood (MXL) classifier is one of the most widely used classifiers in remote sensing.
In this method, a pixel is assigned to the class for which it has the maximum likelihood of membership.
This classification algorithm uses training data to estimate the means and variances of the classes, which
are then used to estimate the probabilities of pixels belonging to different classes. Maximum likelihood
classification considers not only the mean or average values in assigning classification but also the
variability of brightness values in each class around the mean. It is the most powerful of the
classification algorithms as long as accurate training data are provided and certain assumptions
regarding the distributions of classes are valid.

An advantage of this algorithm is that it provides an estimate of overlap areas based on statistics.
This method differs from the parallelepiped classifier, which uses only the maximum and minimum
pixel values. In maximum likelihood classification, the distribution of data in each training set is
described by a mean vector and a covariance matrix. Pixels are assigned the a posteriori probability of
belonging to a given class and are placed in the most ‘‘likely’’ class. This is the only algorithm discussed
here that takes into account the shape of the training set distribution.

Maximum likelihood classifiers use the expected (normal) distribution of DN values to define the
probability of a pixel being within a certain class. Plotting the number of pixels with any given DN
value yields a histogram or distribution of DN values within a particular band. Studies have shown
that for most surfaces DN values from the visible or near-infrared (NIR) region of the electromagnetic
(EM) spectrum have a normal probability distribution. This means we can define curves, based on the
mean and standard deviation of the sample, that describe the normal probability distribution of each
class, and assign each pixel to the category that has the highest statistical probability. The resulting
concentric contours, called equi-probability contours, are derived from an assumed normal distribution
around each training site. Equi-probability contours define the level of statistical confidence in the
classification accuracy: the smaller the contour, the higher the statistical confidence.
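A minimal NumPy sketch of a maximum likelihood decision rule is given below; the per-class mean vectors and covariance matrices are assumed values that would normally be estimated from training data:

```python
import numpy as np

def gaussian_log_likelihood(pixels, mean, cov):
    """Log of the multivariate normal density for each pixel (constants included)."""
    diff = pixels - mean
    inv_cov = np.linalg.inv(cov)
    mahalanobis = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
    log_det = np.linalg.slogdet(cov)[1]
    return -0.5 * (mahalanobis + log_det + len(mean) * np.log(2 * np.pi))

def maximum_likelihood_classify(pixels, means, covs):
    """Assign each pixel to the class with the highest likelihood of membership."""
    scores = np.column_stack([gaussian_log_likelihood(pixels, m, c)
                              for m, c in zip(means, covs)])
    return scores.argmax(axis=1)

# Hypothetical training statistics (mean vector and covariance matrix per class).
means = [np.array([40.0, 60.0, 30.0]), np.array([120.0, 90.0, 150.0])]
covs = [np.eye(3) * 25.0, np.eye(3) * 100.0]
pixels = np.random.rand(1000, 3) * 255
labels = maximum_likelihood_classify(pixels, means, covs)
```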

Advantages
• It is one of the most accurate methods
• It overcomes unclassified pixel problem (subject to threshold values)
• It provides a consistent way to separate pixels in overlap zones between classes and
• Assignment of pixels to classes can be weighted by prior knowledge of the likelihood that a
class is correct.
Disadvantages
• Cluster distributions are assumed to be Gaussian in each class and band. Algorithm requires
enough pixels in each training area to describe a normal population and assumes class
covariance matrices are similar
• Classes not assigned to training sets tend to be misclassified – a particular problem for
mixtures
• It is reliant on the accuracy of training data. Changes in training set of any one class can
affect the decision boundaries with other classes
• It is relatively computationally expensive and
• It is also not practical with imaging spectrometer data.

SIGNATURE EVALUATION

Spectral signatures, however, are not always “pure”, which means the sensor might record some
signatures that are emitted by surrounding objects. A “pure” spectral signature for an individual
material or class can best be determined under laboratory conditions, where the sensor is placed
very close to the target and there is no interference in the closed and controlled environment.
Agencies such as ISRO and the US Department of Agriculture, as well as several universities, maintain
large repositories of spectral signatures. Moreover, many image analysis tools have built-in spectral
libraries.

Ways of Signature Evaluation


One of the most common techniques for feature identification is spectral evaluation. Most image
analysis software provides an interface to plot spectral signatures. With knowledge of the
spectral profile of a given feature, we can go back and change band combinations to make that
feature show up more clearly on the image.

Spectral signatures are evaluated in the following three ways:
• Classification is performed on the pixels within the training samples for each class and is
compared with the classes recorded in the field data at those locations. Ideally, all pixels in a
training sample should classify correctly; in practice, you can expect a high percentage of correctly
classified pixels if the signatures taken are appropriate
• Spectral distance, i.e. separability, is measured by computing divergence, transformed divergence
or the Jeffries-Matusita distance. You can find the mathematics behind the computation of these in the
book by Swain and Davis (1978). It is important to ensure that there is high
separability between signatures from different classes and low separability
among signatures from the training samples of a particular class and
• The mean and standard deviation of each signature are used to plot ellipse diagrams in two or more
dimensions. The plotting allows the analyst to identify similar signatures and hence the classes
which are likely to suffer most from misclassification. If the amount of overlap between a pair of
signatures is large, then those classes are not separable using that image data.

You should note that training samples whose signatures have a negative effect on the
classification outcome need to be either renamed, merged or deleted.

ACCURACY ASSESSMENT
Both supervised and unsupervised classification need direct or indirect information about the surface
characteristics: for unsupervised classification the user must define the classes based on prior
information about the surface, and in the case of supervised classification this information comes from
training samples of the surface. The quality and quantity of training samples, therefore, have a
considerable effect on the accuracy of the classified images.

Once you have an interpreted map, the obvious next step is that you would want to know how
accurate those outputs are, because inaccuracies in the outputs will have a bearing on the map's
utility, and users will have greater confidence in utilising the data if its accuracy is high. Hence,
assessment of accuracy is a very important part of the interpretation, as it not only tells you about the
quality of the maps generated or images classified but also provides you with a benchmark to compare
different interpretation and classification methods.

Accuracy assessment is the final step in the analysis of remote sensing data, and it helps us verify
how accurate our results are. It is carried out once the interpretation/classification has been
completed. Here, we are interested in assessing the accuracy of thematic maps or classified images,
which is known as thematic or classification accuracy. This accuracy is concerned with the
correspondence between the class label and the ‘true’ class. A ‘true’ class is defined as what is observed
on the ground during field surveys. For example, a class labelled as water on a classified image/map is
actually water on the ground.

In order to perform accuracy assessment correctly, we need to compare two sources of information
which include:
• Interpreted map/classified image derived from the remote sensing data and
• Reference map, high resolution images or ground truth data.

The relationship between these two sets of information is commonly expressed in two forms, namely:
• Error matrix, which describes the comparison of these two sources of information and
• Kappa coefficient, which is a multivariate measure of agreement between the rows and
columns of the error matrix.

Error Matrix
An error matrix is a square array of rows and columns in which each row and column represents one
category/class in the interpreted map. Error matrix is also known as confusion matrix, evaluation
matrix, or a contingency table.
Once a classification exercise has been carried out, there is a need to determine the degree of error
in the end product, which comprises the identified categories on the map. Errors are the result of
incorrect labelling of pixels for a category. The most commonly used method of representing the degree
of accuracy of a classification is to build a k × k array, where k represents the number of categories. For
example, in the Table, the left hand side of the table is marked with the categories on the standard (i.e.
reference) map/data. The top of the same table is marked with the same k categories, but these
categories represent the end product of the created map to be evaluated. The values in the matrix
indicate the numbers of pixels. This arrangement establishes a standard form which helps to find
site-specific errors in the end product and is known as an error matrix. The error matrix is useful for
determining the overall errors for each category and the misclassifications by category; as a result it is
also known as a confusion matrix. The strength of a confusion matrix is that it identifies not only the
nature of the classification errors but also their quantities.
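Building an error matrix from a reference image and a classified image can be sketched in a few lines of NumPy; the label arrays below are hypothetical:

```python
import numpy as np

def error_matrix(reference, classified, n_classes):
    """Rows: reference (ground truth) classes; columns: classified classes."""
    matrix = np.zeros((n_classes, n_classes), dtype=int)
    for ref, cls in zip(reference.ravel(), classified.ravel()):
        matrix[ref, cls] += 1
    return matrix

# Hypothetical label images with three classes coded 0, 1 and 2.
reference = np.random.randint(0, 3, size=(50, 50))
classified = np.random.randint(0, 3, size=(50, 50))
print(error_matrix(reference, classified, n_classes=3))
```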

The various components of the confusion matrix, as read from the table, are outlined below:
• Rows correspond to classes in the ground truth map (or test set)
• Columns correspond to classes in the classification result
• Diagonal elements in the matrix represent the number of correctly classified pixels of each
class
• Off-diagonal elements represent misclassified pixels or the classification errors
• Off-diagonal row elements represent ground truth pixels of a certain class which were
excluded from that class during classification. Such errors are also known as errors of
omission or exclusion.
• Off-diagonal column elements represent ground truth pixels of other classes that were
included in a certain classification class. Such errors are also known as errors of commission
or inclusion.
• Numbers in the unclassified column represent ground truth pixels that were not
classified in the classified image.

Accuracy or Producer’s Accuracy
Producer's accuracy is defined as the probability that a reference (ground truth) pixel of a given category
has been correctly classified. The values in the accuracy (producer's accuracy) column present the
accuracies of the categories in the classified image, as shown in the Table.

The water category of the Table, for example, has an accuracy of 0.89, meaning that approximately 89%
of the water ground truth pixels also appear as water pixels in the classified image. The complement of
this statistic is the error of omission (omission error = 100% – producer's accuracy).

The accuracy for the water category of the Table can be calculated as given below:

Total number of correct pixels for water = 240.
Total number of pixels in the water row = 0+20+0+0+0+240+10 = 270.
Hence, accuracy for water = 240/270 = 0.89.

The average accuracy is calculated as given below:

Average accuracy = sum of the producer's accuracies of the individual classes / number of classes

For the data given in the Table, the average accuracy
= (0.83 + 0.71 + 0.58 + 0.56 + 0.88 + 0.89) / 6
≈ 0.74
= 74.25%
This means the average accuracy of the classification shown in the Table is 74.25% (or 0.74).

Reliability or User’s Accuracy


User's accuracy is defined as the probability that a pixel classified on the image actually represents
that category on the ground. The figures in the reliability (user's accuracy) row present the reliability
of the classes in the classified image.

The water category of the Table, for example, has a reliability of 0.86, meaning that approximately 86%
of the water pixels in the classified image actually represent water on the ground. The complement of
this statistic is the error of commission (commission error = 100% – user's accuracy).

The reliability for the water category of the Table can be calculated as shown below:

Total number of correct pixels for water = 240.
Total number of pixels in the water column = 10+10+10+10+0+240 = 280.
Hence, reliability for water = 240/280 = 0.86.

The average reliability is calculated as given below:

Average reliability = sum of the user's accuracies of the individual classes / number of classes

For the data given in the Table, the average reliability
= (0.90 + 0.76 + 0.88 + 0.92 + 0.51 + 0.86) / 6
≈ 0.80
= 80.27%
This indicates that the average reliability of the classification shown in the Table is 80.27% (or 0.80).

From the accuracy and reliability values for the different classes given in the Table, it can be concluded
that the test set classes crop and urban were difficult to classify, as many such test set pixels were
excluded from the crop and urban categories; thus the areas of these classes in the classified image
are probably underestimated. On the other hand, the class open land in the image is not very reliable,
as many test set pixels of other categories were included in the open land category in the classified
image. Thus, the area of the open land category in the classified image is probably overestimated.

Overall Accuracy
We have discussed the individual classes and their accuracies. It is also desirable to calculate a
measure of accuracy for the entire image across all classes present in the classified image. The
collective accuracy of the map for all the classes can be described using the overall accuracy, which is
the proportion of pixels correctly classified.

The overall accuracy is calculated as given below:

Overall accuracy = total number of correctly classified pixels (sum of the diagonal elements) / total
number of reference pixels

For the sample data presented in the Table, the overall accuracy
= (440+220+210+240+230+240) / (490+290+240+260+450+280)
= 1580 / 2010 ≈ 0.79
= 78.6%.
This indicates that the overall accuracy of the classification shown in the Table is approximately 79%.
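The producer's, user's and overall accuracies can be read off an error matrix programmatically. The NumPy sketch below uses a small hypothetical 3 × 3 error matrix (the same numbers as the kappa worked example later in this section), with rows taken as reference classes and columns as classified classes:

```python
import numpy as np

# Hypothetical 3 x 3 error matrix: rows = reference classes, columns = classified.
matrix = np.array([[28,  1,  1],
                   [14, 15,  1],
                   [15,  5, 20]])

correct = np.diag(matrix)                            # correctly classified pixels per class
producers_accuracy = correct / matrix.sum(axis=1)    # diagonal divided by row totals
users_accuracy = correct / matrix.sum(axis=0)        # diagonal divided by column totals
overall_accuracy = correct.sum() / matrix.sum()      # 63 / 100 = 0.63 for this matrix

print("producer's accuracy:", producers_accuracy.round(2))
print("user's accuracy:", users_accuracy.round(2))
print("overall accuracy:", round(overall_accuracy, 2))
```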

Three crucial assumptions are involved in classification accuracy assessment:


• That the reference data are truly representative of the entire classification, which is quite
unlikely
• The reference data and classified image are perfectly co-registered, which is impossible and
• There is no error in the reference data, which again is highly unlikely.

The actual accuracy of our classification is unknown because it is impossible to perfectly assess the
true class of every pixel. It is possible to produce a misleading assessment of classification accuracy.
Depending on how the reference data are collected, our estimate of accuracy may be either
conservative or optimistic.

Therefore, if error matrix is generated by using improper reference data collection methods, then
the assessment can be misleading. Sampling methods used for reference data should be reported in
detail so that potential users can judge whether there may be significant biases in the classification
accuracy assessment.

Kappa Analysis
All these “naïve” accuracy measures can include agreement that arises purely from chance assignment
of pixels to classes and, therefore, do not provide a means to compare accuracies statistically. This
paves the way for the use of other accuracy assessment methods, such as kappa analysis.

Kappa analysis is a discrete multivariate technique used to assess classification accuracy from an error
matrix. It generates a kappa coefficient, or Khat statistic, the value of which ranges between 0
and 1.
The kappa coefficient (Khat) is a measure of the agreement between two maps taking into account all
elements of the error matrix. It is defined in terms of the error matrix as given below:

Khat = (Obs – Exp) / (1 – Exp)

where,
Obs = observed correct, i.e. the proportion of correctly classified pixels reported in the error matrix
(the overall accuracy) and
Exp = expected correct, i.e. the proportion of correct classification expected by chance, derived from
the row and column totals.

Calculation Steps
(Recall that omission error = 100% – producer's accuracy and commission error = 100% – user's accuracy.)

Kappa coefficient is calculated in the following steps:


Step 1: Construction of error (confusion) matrix (e.g., Table)

Step 2: Calculation of observed correct


Grand total = sum of all cell values in the error matrix
= 28+1+1+14+15+1+15+5+20 = 100
Total correct = sum of the diagonal elements = 28+15+20 = 63
Observed correct = total correct / grand total = 63 / 100 = 0.63
Overall accuracy = 63%.
Step 3: Calculation of expected correct
Grand total = sum of the products of the row and column marginals
= 1710+1710+2280+630+630+840+660+660+880
= 10000
Total correct = sum of the products of the diagonal (row × column) marginals
= 1710+630+880
= 3220
Expected correct = total correct / grand total
= 3220/10000
= 0.322

Note: For the calculation of expected correct you need to prepare an error matrix showing products
of row and column marginals as shown in Table 14.4.

Step 4: Calculation of Khat
Now you have the values of observed correct and expected correct:
Observed correct = 0.63
Expected correct = 0.322
As you know,
Khat = (Observed – Expected) / (1 – Expected)
This implies
Khat = (0.63 – 0.322) / (1 – 0.322)
= 0.308/0.678
≈ 0.45.
A kappa coefficient of 0.45 implies that the classification process avoided 45% of the errors that a
completely random classification would generate (Congalton, 1991).
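The same worked example can be reproduced in a few lines of NumPy, as sketched below:

```python
import numpy as np

# The error matrix from the worked example above.
matrix = np.array([[28,  1,  1],
                   [14, 15,  1],
                   [15,  5, 20]])

grand_total = matrix.sum()                                   # 100
observed = np.diag(matrix).sum() / grand_total               # 63 / 100 = 0.63
row_totals = matrix.sum(axis=1)
col_totals = matrix.sum(axis=0)
expected = (row_totals * col_totals).sum() / grand_total**2  # 3220 / 10000 = 0.322

khat = (observed - expected) / (1 - expected)
print(round(khat, 2))                                        # approximately 0.45
```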

Advantages
One of the advantages of using this method is that you can statistically compare two classification
products. For example, two classification maps can be made using different algorithms and the same
reference data can be used to verify them. Two Khat values can be derived, Khat1 and Khat2, and for each
Khat the variance can also be calculated. The kappa coefficient, unlike the overall accuracy, includes
errors of omission and commission.
Another option is the computation of the average mutual information (AMI) alongside kappa.
AMI is based on the use of a posteriori entropies: for one map, given the class identity from the second
map, it allows evaluation of individual class performance. Unlike the percentage correct or kappa, which
measure correctness, AMI measures consistency between two maps. It provides an alternative
viewpoint because it is used to assess the similarity of maps. For example, it can be used to compare the
consistency between maps of the same region that have entirely different themes.
Accuracy assessment is still relatively new and is an evolving area in remote sensing. The
effectiveness of different methods and measurement are still being explored and debated.
