Content-Based Image Retrieval - Some Basics
Gerald Schaefer
Department of Computer Science
Loughborough University
Loughborough, U.K.
gerald.schaefer@ieee.org
1 Introduction
While image libraries are growing at a rapid rate (personal image collections
may contain thousands, commercial image repositories millions of images [1]),
most images remain un-annotated [2], preventing the application of a typical
text-based search. Content-based image retrieval (CBIR) [3, 4] does not require
any extra data, as it extracts image features directly from the image data and
uses these, coupled with a similarity measure, to query image collections. Image
features typically describe the colour, texture, and shape content of the images,
and in this paper we review several well-known descriptors that are employed in
CBIR. Our emphasis is on rather simple image features which nevertheless have
been shown to be effective for CBIR. In Section 2, we discuss some basic colour
image features, while Section 3 focusses on incorporating spatial information
into colour-based retrieval. Section 4 reviews texture image features, whereas in
Section 5, we present some shape-based retrieval techniques. Section 6 concludes
the paper.
2 Colour features
Colour features are the most widely used feature type for CBIR and are at the
heart of various image retrieval search engines such as QBIC [5] and Virage [6].
2.1 Colour moments
The simplest colour descriptors for CBIR are colour moments [7]. The n-th central (normalised) moment of a colour distribution is defined as

M^n(I) = \sqrt[n]{\frac{1}{N} \sum_{x,y} \left( c(x,y) - M^1(I) \right)^n},   (1)

with

M^1(I) = \frac{1}{N} \sum_{x,y} c(x,y),   (2)

where N is the number of pixels of image I and c(x,y) describes the colour of the pixel at location (x,y). The distance between two images is defined as the sum of absolute differences between their moments (L1 norm)

d_{MNT}(I_1, I_2) = \sum_{i=1}^{n} \left| M^i(I_1) - M^i(I_2) \right|.   (3)
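The moment computation of Eqs. (1)–(3) can be sketched as follows; a minimal Python illustration for a single colour channel, where `colour_moment` and `d_mnt` are hypothetical helper names (the signed n-th root for odd moments is one common implementation choice):

```python
def colour_moment(pixels, n):
    """n-th (normalised) colour moment of a list of pixel values, Eqs. (1)-(2)."""
    N = len(pixels)
    mean = sum(pixels) / N                   # M^1(I), Eq. (2)
    if n == 1:
        return mean
    m = sum((c - mean) ** n for c in pixels) / N
    # signed n-th root so odd moments of skewed data keep their sign
    return (abs(m) ** (1.0 / n)) * (1 if m >= 0 else -1)

def d_mnt(pixels1, pixels2, order=3):
    """Sum of absolute moment differences (L1 norm), Eq. (3)."""
    return sum(abs(colour_moment(pixels1, i) - colour_moment(pixels2, i))
               for i in range(1, order + 1))
```

For colour images the same computation would be performed per channel and the per-channel distances summed.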
2.2 Colour histograms
Swain and Ballard [8] introduced the use of colour histograms, which record the frequencies of colours in the image, to describe images in order to perform image retrieval. Indeed, it was Swain and Ballard's work that laid the foundations for the field of CBIR as we know it today. As a distance measure they introduced (the complement of) histogram intersection, defined as

d_{HIS}(I_1, I_2) = 1 - \sum_{k=1}^{N} \min(H_1(k), H_2(k)),   (4)

where H_1 and H_2 are the (normalised) colour histograms of images I_1 and I_2, and N is the number of bins used for representing the histograms. It can be shown [8] that histogram intersection is equivalent to the L1 norm and hence a metric.
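Eq. (4) can be sketched in a few lines of Python; the function names `histogram` and `d_his` are illustrative, and a uniform quantisation of 8-bit values into `bins` bins is assumed:

```python
def histogram(pixels, bins):
    """Normalised colour histogram with `bins` uniform bins over values 0..255."""
    h = [0.0] * bins
    for c in pixels:
        h[c * bins // 256] += 1.0
    return [v / len(pixels) for v in h]

def d_his(h1, h2):
    """Complement of histogram intersection, Eq. (4)."""
    return 1.0 - sum(min(a, b) for a, b in zip(h1, h2))
```

Identical histograms give a distance of 0, histograms with disjoint support a distance of 1.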
An alternative to the L1 norm is to use the Euclidean distance (L2 norm) between two histograms. This approach was taken in the QBIC system [9], which also addresses the problem of possible false negatives due to slight colour shifts by taking into account the similarity between separate histogram bins. This can be expressed in a quadratic form distance measure

d_{QBIC}(I_1, I_2) = (H_1 - H_2) A (H_1 - H_2)^T,   (5)

where H_1 and H_2 are again the two (vectorised) colour histograms, and A is an N \times N matrix containing inter-bin colour differences.
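A minimal sketch of the quadratic form of Eq. (5); the name `d_qbic` is illustrative, and constructing the matrix A from inter-bin colour differences is left to the caller:

```python
def d_qbic(h1, h2, A):
    """Quadratic form distance (H1 - H2) A (H1 - H2)^T, Eq. (5).

    h1, h2: histograms as lists of floats; A: N x N matrix (list of lists).
    """
    diff = [a - b for a, b in zip(h1, h2)]
    n = len(diff)
    return sum(diff[i] * A[i][j] * diff[j]
               for i in range(n) for j in range(n))
```

With A set to the identity matrix the measure reduces to the squared L2 norm between the histograms.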
2.3 Colour signatures
Rather than using colour histograms, a more compact descriptor for encoding the colour distribution of images is a colour signature. A colour signature is a set {(c_1, w_1), (c_2, w_2), \ldots, (c_m, w_m)} where the c_i define colour co-ordinates and the w_i their associated weights (i.e. their relative frequencies in the image). A common way of deriving colour signatures for images is through a clustering process. Once colour signatures for images are determined, these signatures can be compared by a metric known as the earth mover's distance (EMD) [10], which is a flow-based measure defined as

d_{EMD}(I_1, I_2) = \frac{\sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij} d_{ij}}{\sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij}},   (6)

where the flow F = [f_{ij}], which transforms one colour signature into the other, minimises the total work

\sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij} d_{ij}   (7)

subject to

f_{ij} \geq 0, \quad 1 \leq i \leq m, \; 1 \leq j \leq n,
\sum_{j=1}^{n} f_{ij} \leq p_i, \quad 1 \leq i \leq m,
\sum_{i=1}^{m} f_{ij} \leq q_j, \quad 1 \leq j \leq n,   (8)
\sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij} = \min\left( \sum_{i=1}^{m} p_i, \sum_{j=1}^{n} q_j \right),

where S_1 = \{(c_i, p_i)\} and S_2 = \{(c_j, q_j)\} are the colour signatures of images I_1 and I_2, and the d_{ij} denote the colour differences between the colour clusters.
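The optimisation of Eqs. (6)–(8) is a transportation problem. The following is a minimal sketch that solves it with SciPy's general-purpose linear programming solver (an implementation choice for illustration, not the transportation-simplex algorithm of [10]), assuming both signatures have equal total weight so the inequality constraints become equalities:

```python
import numpy as np
from scipy.optimize import linprog

def emd(sig1, sig2):
    """Earth mover's distance between two colour signatures, Eqs. (6)-(8).

    sig1, sig2: lists of (colour, weight) pairs with equal total weight.
    """
    c1, p = zip(*sig1)
    c2, q = zip(*sig2)
    m, n = len(p), len(q)
    # ground distances d_ij: Euclidean distance between cluster colours
    d = np.array([[np.linalg.norm(np.array(a, float) - np.array(b, float))
                   for b in c2] for a in c1])
    # one LP variable per flow f_ij; minimise sum f_ij * d_ij (Eq. (7))
    A_eq, b_eq = [], []
    for i in range(m):                      # row sums equal p_i
        row = np.zeros((m, n)); row[i, :] = 1
        A_eq.append(row.ravel()); b_eq.append(p[i])
    for j in range(n - 1):                  # column sums equal q_j (last redundant)
        col = np.zeros((m, n)); col[:, j] = 1
        A_eq.append(col.ravel()); b_eq.append(q[j])
    res = linprog(d.ravel(), A_eq=np.array(A_eq), b_eq=np.array(b_eq),
                  bounds=(0, None))
    return res.fun / sum(p)                 # Eq. (6): work / total flow
```

For signatures of unequal total weight the inequality constraints of Eq. (8) would have to be kept as inequalities (`A_ub`/`b_ub` in `linprog`).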
3 Spatial colour features

Simple colour features such as colour histograms are fast to compute, and are invariant to rotation and translation as well as robust to scaling and occlusions. On the other hand, they do not carry any information about the spatial distribution of the colours. Consequently, several methods try to address this weakness.
3.1 Colour coherence vectors

Colour coherence vectors (CCVs) [11] refine the colour histogram by classifying each pixel as coherent if it belongs to a sufficiently large connected region of its colour, and as non-coherent (scattered) otherwise, giving a pair of histograms per image. Two CCVs are compared using

d_{CCV}(I_1, I_2) = \sum_{k=1}^{N} \left( |H_1^c(k) - H_2^c(k)| + |H_1^s(k) - H_2^s(k)| \right),   (9)

where H_i^c and H_i^s are the histograms of coherent and non-coherent (scattered) pixels respectively.
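The coherent/scattered split can be sketched as follows; a minimal pure-Python illustration where `ccv` and the region-size threshold `tau` are assumed names, using 4-connected regions found by breadth-first search (one common choice; [11] details the exact procedure):

```python
from collections import deque

def ccv(img, tau):
    """Colour coherence vector: per colour, (coherent, scattered) pixel counts.

    A pixel is coherent if its 4-connected same-colour region has size >= tau.
    img: 2D list of (quantised) colour labels.
    """
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    result = {}
    for y in range(h):
        for x in range(w):
            if seen[y][x]:
                continue
            colour = img[y][x]
            # BFS over the 4-connected region of this colour
            region, queue = [], deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                region.append((cy, cx))
                for ny, nx in ((cy-1, cx), (cy+1, cx), (cy, cx-1), (cy, cx+1)):
                    if 0 <= ny < h and 0 <= nx < w and not seen[ny][nx] \
                            and img[ny][nx] == colour:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            coh, sca = result.get(colour, (0, 0))
            if len(region) >= tau:
                coh += len(region)
            else:
                sca += len(region)
            result[colour] = (coh, sca)
    return result
```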
3.2 Colour correlograms

Colour correlograms (CCRs) [12] capture the spatial correlation of pairs of colours as a function of distance. A colour correlogram is defined as

\gamma^{(k)}_{c_i, c_j}(I) = \Pr_{p_1 \in I_{c_i}, \, p_2 \in I} \left[ p_2 \in I_{c_j} \mid |p_1 - p_2| = k \right],   (10)

with

|p_1 - p_2| = \max(|x_1 - x_2|, |y_1 - y_2|),   (11)

where c_i and c_j denote two colours and p_l = (x_l, y_l) denote pixel locations. In other words, given any pixel of colour c_i in the image, \gamma^{(k)}_{c_i,c_j} gives the probability that a pixel at distance k away is of colour c_j.

As full colour correlograms are expensive both in terms of computation and storage requirements, usually a simpler form called the auto-correlogram (ACR), defined as

\alpha^{(k)}_{c}(I) = \gamma^{(k)}_{c,c}(I),   (12)

is used, i.e. only the spatial correlation of each colour to itself is recorded. Two CCRs are compared using

d_{CCR}(I_1, I_2) = \frac{\sum_{i,j \in [m], \, k \in [d]} \left| \gamma^{(k)}_{c_i,c_j}(I_1) - \gamma^{(k)}_{c_i,c_j}(I_2) \right|}{\sum_{i,j \in [m], \, k \in [d]} \left( 1 + \gamma^{(k)}_{c_i,c_j}(I_1) + \gamma^{(k)}_{c_i,c_j}(I_2) \right)}.   (13)
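The auto-correlogram of Eq. (12) can be computed by brute force; a minimal sketch (the name `auto_correlogram` is illustrative, and the implementation enumerates all positions at exactly L-infinity distance k, as in Eq. (11)):

```python
def auto_correlogram(img, colour, k):
    """alpha^(k)_c: probability that a pixel at L-infinity distance k
    from a pixel of `colour` has that same colour (Eqs. (10)-(12))."""
    h, w = len(img), len(img[0])
    same = total = 0
    for y in range(h):
        for x in range(w):
            if img[y][x] != colour:
                continue
            # all positions at exactly L-infinity distance k
            for dy in range(-k, k + 1):
                for dx in range(-k, k + 1):
                    if max(abs(dy), abs(dx)) != k:
                        continue
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        total += 1
                        if img[ny][nx] == colour:
                            same += 1
    return same / total if total else 0.0
```

A full descriptor would evaluate this for every quantised colour and several distances k.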
3.3 Spatial-chromatic histograms
Spatial-chromatic histograms (SCHs) [13] are another alternative for representing both colour and spatial information. They consist of a colour histogram

h(k) = \frac{|A_k|}{nm},   (14)

where A_k is the set of pixels having the same colour k, and n and m are the dimensions of the image; location information on each colour characterised through its barycentre

b(k) = \left( \frac{1}{n} \frac{1}{|A_k|} \sum_{(x,y) \in A_k} x, \; \frac{1}{m} \frac{1}{|A_k|} \sum_{(x,y) \in A_k} y \right);   (15)

and the standard deviation of the distances of a given colour from its barycentre

\sigma(k) = \sqrt{ \frac{1}{|A_k|} \sum_{p \in A_k} d(p, b(k))^2 }.   (16)

Two SCHs are then compared using

d_{SCH}(I_1, I_2) = \sum_{k=1}^{N} \min(h_{I_1}(k), h_{I_2}(k)) \, \delta(k),   (17)

with

\delta(k) = \frac{1}{2} \left[ \frac{\sqrt{2} - d(b_{I_1}(k), b_{I_2}(k))}{\sqrt{2}} + \frac{\min(\sigma_{I_1}(k), \sigma_{I_2}(k))}{\max(\sigma_{I_1}(k), \sigma_{I_2}(k))} \right].   (18)
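Extracting the per-colour SCH components of Eqs. (14)–(16) can be sketched as follows; a minimal illustration (the name `sch_features` is assumed) working in pixel coordinates normalised by the image dimensions, so that barycentre distances are bounded by \sqrt{2} as Eq. (18) presumes:

```python
def sch_features(img):
    """Per-colour SCH components of Eqs. (14)-(16): histogram value h,
    barycentre b (coordinates normalised by width n and height m), and
    standard deviation sigma of distances to the barycentre."""
    m, n = len(img), len(img[0])          # m rows (height), n columns (width)
    groups = {}
    for y in range(m):
        for x in range(n):
            groups.setdefault(img[y][x], []).append((x / n, y / m))
    feats = {}
    for k, pts in groups.items():
        h = len(pts) / (n * m)                               # Eq. (14)
        bx = sum(p[0] for p in pts) / len(pts)               # Eq. (15)
        by = sum(p[1] for p in pts) / len(pts)
        var = sum((p[0] - bx) ** 2 + (p[1] - by) ** 2 for p in pts) / len(pts)
        feats[k] = (h, (bx, by), var ** 0.5)                 # Eq. (16)
    return feats
```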
4 Texture features
Texture features do not exist at a single pixel but are rather a description of
a neighbourhood of pixels. Texture features often complement colour features
to improve retrieval accuracy in CBIR, and are also attractive since texture is
typically difficult to describe in terms of words.
4.1 Local binary patterns

Local binary patterns (LBP) [14] are a simple yet effective texture analysis technique. Descriptors are assigned on a pixel basis, describing the neighbourhood of each pixel, and a histogram of those descriptors is then formed. In detail, let

G = \begin{pmatrix} g(-1,-1) & g(0,-1) & g(1,-1) \\ g(-1,0) & g(0,0) & g(1,0) \\ g(-1,1) & g(0,1) & g(1,1) \end{pmatrix}   (19)

describe the 3 \times 3 grayscale block of a pixel at location (0,0) and its 8-neighbourhood. The first step is to subtract the value of the central pixel and consider only the resulting values at the neighbouring locations

g'(x,y) = g(x,y) - g(0,0).   (20)

Next, an operator

s(x) = \begin{cases} 1 & \text{for } x \geq 0 \\ 0 & \text{for } x < 0 \end{cases}   (21)

is applied at each location; the resulting binary values are weighted by powers of two and summed to give the LBP code

LBP = \sum_{i=0}^{7} s(g'_i) \, 2^i,   (22)

where the g'_i enumerate the eight neighbouring values. A histogram of the LBP codes over the image then serves as the texture descriptor.
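The steps of Eqs. (19)–(22) can be sketched directly; the neighbour ordering below is one free choice (any fixed ordering works as long as it is used consistently), and `lbp_code`/`lbp_histogram` are illustrative names:

```python
# neighbour offsets (dy, dx), clockwise from the top-left corner
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_code(img, y, x):
    """LBP code of the pixel at (y, x), Eqs. (19)-(22)."""
    centre = img[y][x]
    code = 0
    for i, (dy, dx) in enumerate(OFFSETS):
        if img[y + dy][x + dx] - centre >= 0:   # s(g') from Eq. (21)
            code += 1 << i                      # weight 2^i, Eq. (22)
    return code

def lbp_histogram(img):
    """Histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            hist[lbp_code(img, y, x)] += 1
    return hist
```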
4.2 Co-occurrence matrices

Co-occurrence matrices [15] record how often pairs of grey levels occur at a given offset from each other and are defined as

C_{p,q}(i,j) = \sum_{x=1}^{n} \sum_{y=1}^{m} \begin{cases} 1 & \text{if } I(x,y) = i \text{ and } I(x+p, y+q) = j \\ 0 & \text{otherwise} \end{cases},   (23)

where i and j correspond to image (grey-level) values, and p and q are offset values. Typically, several (p, q) pairs are employed, and from the corresponding co-occurrence matrices several statistical features are derived, such as the entropy

E = -\sum_{i,j} C(i,j) \log C(i,j),

which can then be used as texture descriptors.
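Eq. (23) and the entropy feature can be sketched as follows; a minimal illustration (the names `cooccurrence` and `entropy` are assumed, and the matrix is stored sparsely as a counter and normalised before the entropy is taken):

```python
import math
from collections import Counter

def cooccurrence(img, p, q):
    """Grey-level co-occurrence counts C_{p,q}(i, j), Eq. (23)."""
    h, w = len(img), len(img[0])
    C = Counter()
    for y in range(h):
        for x in range(w):
            if 0 <= x + p < w and 0 <= y + q < h:
                C[(img[y][x], img[y + q][x + p])] += 1
    return C

def entropy(C):
    """Entropy of the normalised co-occurrence matrix (in bits)."""
    total = sum(C.values())
    return -sum((v / total) * math.log2(v / total) for v in C.values())
```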
5 Shape features
Since true shape features would require segmentation, often global shape features or feature distributions are employed in CBIR. Shape features are often combined with colour and/or texture features.
5.1 Edge histograms
A simple yet effective shape feature can be derived by describing edge direction
information [16]. Following an edge detection step using the Canny edge detector [17], a histogram of edge directions (typically in 5 degree steps) is generated,
and then smoothed. Since it is a histogram feature, it can be compared using
e.g. histogram intersection as in Eq. (4).
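A hedged sketch of such an edge direction histogram; for self-containedness, simple central-difference gradients stand in for the Canny detector of [17], the smoothing step is omitted, and the names `edge_direction_histogram` and `threshold` are assumptions:

```python
import math

def edge_direction_histogram(img, bins=72, threshold=1.0):
    """Histogram of gradient directions (5-degree bins for bins=72).

    Central differences replace the Canny detector to keep the sketch
    self-contained; only pixels whose gradient magnitude exceeds
    `threshold` are counted as edge pixels.
    """
    h, w = len(img), len(img[0])
    hist = [0] * bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = img[y][x + 1] - img[y][x - 1]
            gy = img[y + 1][x] - img[y - 1][x]
            if (gx * gx + gy * gy) ** 0.5 < threshold:
                continue
            angle = math.degrees(math.atan2(gy, gx)) % 360.0
            hist[int(angle * bins / 360.0) % bins] += 1
    return hist
```

A vertical step edge, for example, produces gradients pointing along the x axis and so populates the bin around 0 degrees.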
5.2 Image moments
Image moments provide another class of global shape descriptors. The geometric moments of an image I are defined as

m_{pq} = \sum_{y=0}^{M-1} \sum_{x=0}^{N-1} x^p y^q I(x,y).   (24)

Usually, central moments

\mu_{pq} = \sum_{y=0}^{M-1} \sum_{x=0}^{N-1} (x - \bar{x})^p (y - \bar{y})^q I(x,y),   (25)

with

\bar{x} = \frac{m_{10}}{m_{00}}, \quad \bar{y} = \frac{m_{01}}{m_{00}},

are used, i.e. moments where the centre of gravity has been moved to the origin (i.e. \mu_{10} = \mu_{01} = 0). Central moments have the advantage of being invariant to translation.

Normalised central moments, defined by

\eta_{pq} = \frac{\mu_{pq}}{\mu_{00}^{\gamma}}, \quad \gamma = \frac{p+q}{2} + 1, \quad p + q = 2, 3, \ldots,   (26)

are in addition invariant to scaling. Moment combinations that are furthermore independent of rotation are desirable for retrieval. One such set of moment invariants is Hu's original set of moment invariants [18]

M_1 = \eta_{20} + \eta_{02}
M_2 = (\eta_{20} - \eta_{02})^2 + 4\eta_{11}^2
M_3 = (\eta_{30} - 3\eta_{12})^2 + (3\eta_{21} - \eta_{03})^2
M_4 = (\eta_{30} + \eta_{12})^2 + (\eta_{21} + \eta_{03})^2   (27)
M_5 = (\eta_{30} - 3\eta_{12})(\eta_{30} + \eta_{12})[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2] + (3\eta_{21} - \eta_{03})(\eta_{21} + \eta_{03})[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2]
M_6 = (\eta_{20} - \eta_{02})[(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2] + 4\eta_{11}(\eta_{30} + \eta_{12})(\eta_{21} + \eta_{03})
M_7 = (3\eta_{21} - \eta_{03})(\eta_{30} + \eta_{12})[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2] - (\eta_{30} - 3\eta_{12})(\eta_{21} + \eta_{03})[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2]

which can be employed as a shape descriptor for CBIR.
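Eqs. (24)–(27) can be sketched as follows; a minimal illustration (the names `raw_moment` and `hu_moments` are assumed, and only the first four invariants are returned for brevity) whose translation invariance is easy to verify:

```python
def raw_moment(img, p, q):
    """Geometric moment m_pq, Eq. (24); img[y][x] holds I(x, y)."""
    return sum((x ** p) * (y ** q) * v
               for y, row in enumerate(img) for x, v in enumerate(row))

def hu_moments(img):
    """First four of Hu's moment invariants, Eq. (27), computed from
    normalised central moments (Eqs. (25)-(26))."""
    m00 = raw_moment(img, 0, 0)
    xb = raw_moment(img, 1, 0) / m00
    yb = raw_moment(img, 0, 1) / m00
    def mu(p, q):                                   # central moment, Eq. (25)
        return sum(((x - xb) ** p) * ((y - yb) ** q) * v
                   for y, row in enumerate(img) for x, v in enumerate(row))
    def eta(p, q):                                  # normalised, Eq. (26)
        return mu(p, q) / (m00 ** ((p + q) / 2 + 1))
    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    M1 = n20 + n02
    M2 = (n20 - n02) ** 2 + 4 * n11 ** 2
    M3 = (n30 - 3 * n12) ** 2 + (3 * n21 - n03) ** 2
    M4 = (n30 + n12) ** 2 + (n21 + n03) ** 2
    return [M1, M2, M3, M4]
```

Translating a binary shape within the image leaves the central moments, and hence the invariants, unchanged.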
6 Conclusions
In this paper, we have reviewed several basic image features employed for content-based image retrieval. In particular, we have looked at colour, spatial colour, texture and shape features in this context. Further details and other image features are discussed in survey papers such as [3, 4], while more advanced CBIR topics are discussed in [19].
References
1. Osman, T., Thakker, D., Schaefer, G., Lakin, P.: An integrative semantic framework for image annotation and retrieval. In: IEEE/WIC/ACM International Conference on Web Intelligence. (2007) 366–373
2. Rodden, K.: Evaluating Similarity-Based Visualisations as Interfaces for Image Browsing. PhD thesis, University of Cambridge Computer Laboratory (2001)
3. Smeulders, A., Worring, M., Santini, S., Gupta, A., Jain, R.: Content-based image retrieval at the end of the early years. IEEE Trans. Pattern Analysis and Machine Intelligence 22 (2000) 1349–1380
4. Datta, R., Joshi, D., Li, J., Wang, J.Z.: Image retrieval: Ideas, influences, and trends of the new age. ACM Computing Surveys 40 (2008) 1–60
5. Niblack, W., Barber, R., Equitz, W., Flickner, M., Glasman, D., Petkovic, D., Yanker, P.: The QBIC project: Querying images by content using color, texture and shape. In: Conf. on Storage and Retrieval for Image and Video Databases. Volume 1908 of Proceedings of SPIE. (1993) 173–187
6. Bach, J., Fuller, C., Gupta, A., Hampapur, A., Horowitz, B., Humphrey, R., Jain, R.: The Virage image search engine: An open framework for image management. In: Storage and Retrieval for Image and Video Databases. Volume 2670 of Proceedings of SPIE. (1996) 76–87
7. Stricker, M., Orengo, M.: Similarity of color images. In: Conf. on Storage and Retrieval for Image and Video Databases III. Volume 2420 of Proceedings of SPIE. (1995) 381–392
8. Swain, M., Ballard, D.: Color indexing. Int. Journal of Computer Vision 7 (1991) 11–32
9. Faloutsos, C., Equitz, W., Flickner, M., Niblack, W., Petkovic, D., Barber, R.: Efficient and effective querying by image content. Journal of Intelligent Information Systems 3 (1994) 231–262
10. Rubner, Y., Tomasi, C., Guibas, L.: The earth mover's distance as a metric for image retrieval. Int. Journal of Computer Vision 40 (2000) 99–121
11. Pass, G., Zabih, R.: Histogram refinement for content-based image retrieval. In: 3rd IEEE Workshop on Applications of Computer Vision. (1996) 96–102
12. Huang, J., Kumar, S., Mitra, M., Zhu, W.J., Zabih, R.: Image indexing using color correlograms. In: IEEE Int. Conference on Computer Vision and Pattern Recognition. (1997) 762–768
13. Cinque, L., Levialdi, S., Pellicano, A.: Color-based image retrieval using spatial-chromatic histograms. In: IEEE Int. Conf. Multimedia Computing and Systems. (1999) 969–973
14. Ojala, T., Pietikäinen, M., Harwood, D.: A comparative study of texture measures with classification based on feature distributions. Pattern Recognition 29 (1996) 51–59
15. Haralick, R.: Statistical and structural approaches to texture. Proceedings of the IEEE 67 (1979) 786–804
16. Jain, A., Vailaya, A.: Image retrieval using color and shape. Pattern Recognition 29 (1996) 1233–1244
17. Canny, J.: A computational approach to edge detection. IEEE Trans. Pattern Analysis and Machine Intelligence 8 (1986) 679–698
18. Hu, M.: Visual pattern recognition by moment invariants. IRE Transactions on Information Theory 8 (1962) 179–187
19. Schaefer, G.: Content-based image retrieval - advanced topics. In: Int. Conference on Man-Machine Interactions. (2011) (in this volume)