Content-Based Image Retrieval - Some Basics
Gerald Schaefer
Department of Computer Science
Loughborough University
Loughborough, U.K.
gerald.schaefer@ieee.org
1 Introduction
While image libraries are growing at a rapid rate (personal image collections
may contain thousands, commercial image repositories millions of images [1]),
most images remain un-annotated [2], preventing the application of a typical
text-based search. Content-based image retrieval (CBIR) [3, 4] does not require
any extra data, as it extracts image features directly from the image data and
uses these, coupled with a similarity measure, to query image collections. Image
features typically describe the colour, texture, and shape content of the images,
and in this paper we review several well-known descriptors that are employed in
CBIR. Our emphasis is on rather simple image features which nevertheless have
been shown to be effective for CBIR. In Section 2, we discuss some basic colour
image features, while Section 3 focusses on incorporating spatial information
into colour-based retrieval. Section 4 reviews texture image features, whereas in
Section 5, we present some shape-based retrieval techniques. Section 6 concludes
the paper.
2 Colour features
Colour features are the most widely used feature type for CBIR and are at the
heart of various image retrieval search engines such as QBIC [5] and Virage [6].
2.1 Colour moments
The simplest colour descriptors for CBIR are colour moments [7]. The n-th central (normalised) moment of a colour distribution is defined as

M^n(I) = \sqrt[n]{\frac{1}{N} \sum_{x,y} \left( c(x,y) - M^1(I) \right)^n},   (1)

with

M^1(I) = \frac{1}{N} \sum_{x,y} c(x,y),   (2)

where N is the number of pixels of image I and c(x,y) describes the colour of the pixel at location (x,y). The distance between two images is defined as the sum of absolute differences between their moments (L1 norm)

d_{MNT}(I_1, I_2) = \sum_{i=1}^{n} \left| M^i(I_1) - M^i(I_2) \right|.   (3)
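The moment computation of Eqs. (1)–(3) can be sketched as follows; a minimal Python illustration for a single colour channel, where `colour_moment` and `d_mnt` are hypothetical helper names (the signed n-th root for odd moments is one common implementation choice):

```python
def colour_moment(pixels, n):
    """n-th (normalised) colour moment of a list of pixel values, Eqs. (1)-(2)."""
    N = len(pixels)
    mean = sum(pixels) / N                   # M^1(I), Eq. (2)
    if n == 1:
        return mean
    m = sum((c - mean) ** n for c in pixels) / N
    # signed n-th root so odd moments of skewed data keep their sign
    return (abs(m) ** (1.0 / n)) * (1 if m >= 0 else -1)

def d_mnt(pixels1, pixels2, order=3):
    """Sum of absolute moment differences (L1 norm), Eq. (3)."""
    return sum(abs(colour_moment(pixels1, i) - colour_moment(pixels2, i))
               for i in range(1, order + 1))
```

For colour images the same computation would be performed per channel and the per-channel distances summed.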
2.2 Colour histograms
Swain and Ballard [8] introduced the use of colour histograms, which record the frequencies of colours in the image, to describe images in order to perform image retrieval. Indeed, it was Swain and Ballard's work that laid the foundations for the field of CBIR as we know it today. As a distance measure they introduced (the complement of) histogram intersection, defined as

d_{HIS}(I_1, I_2) = 1 - \sum_{k=1}^{N} \min(H_1(k), H_2(k)),   (4)

where H_1 and H_2 are the (normalised) colour histograms of images I_1 and I_2, and N is the number of bins used for representing the histograms. It can be shown [8] that histogram intersection is equivalent to the L1 norm and hence a metric.
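Eq. (4) can be sketched in a few lines of Python; the function names `histogram` and `d_his` are illustrative, and a uniform quantisation of 8-bit values into `bins` bins is assumed:

```python
def histogram(pixels, bins):
    """Normalised colour histogram with `bins` uniform bins over values 0..255."""
    h = [0.0] * bins
    for c in pixels:
        h[c * bins // 256] += 1.0
    return [v / len(pixels) for v in h]

def d_his(h1, h2):
    """Complement of histogram intersection, Eq. (4)."""
    return 1.0 - sum(min(a, b) for a, b in zip(h1, h2))
```

Identical histograms give a distance of 0, histograms with disjoint support a distance of 1.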
An alternative to the L1 norm is to use the Euclidean distance (L2 norm) between two histograms. This approach was taken in the QBIC system [9], which also addresses the problem of possible false negatives due to slight colour shifts by taking into account the similarity between separate histogram bins. This can be expressed in a quadratic form distance measure

d_{QBIC}(I_1, I_2) = (H_1 - H_2) A (H_1 - H_2)^T,   (5)

where H_1 and H_2 are again the two (vectorised) colour histograms, and A is an N \times N matrix containing inter-bin colour differences.
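A minimal sketch of the quadratic form of Eq. (5); the name `d_qbic` is illustrative, and constructing the matrix A from inter-bin colour differences is left to the caller:

```python
def d_qbic(h1, h2, A):
    """Quadratic form distance (H1 - H2) A (H1 - H2)^T, Eq. (5).

    h1, h2: histograms as lists of floats; A: N x N matrix (list of lists).
    """
    diff = [a - b for a, b in zip(h1, h2)]
    n = len(diff)
    return sum(diff[i] * A[i][j] * diff[j]
               for i in range(n) for j in range(n))
```

With A set to the identity matrix the measure reduces to the squared L2 norm between the histograms.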
2.3 Colour signatures
Rather than using colour histograms, a more compact descriptor for encoding the colour distribution of images is a colour signature. A colour signature is a set {(c_1, w_1), (c_2, w_2), \ldots, (c_m, w_m)} where the c_i define colour co-ordinates and the w_i their associated weights (i.e. their relative frequencies in the image). A common way of deriving colour signatures for images is through a clustering process. Once colour signatures for images are determined, these signatures can be compared by a metric known as the earth mover's distance (EMD) [10], which is a flow-based measure defined as

d_{EMD}(I_1, I_2) = \frac{\sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij} d_{ij}}{\sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij}},   (6)

where the flow F = [f_{ij}], which transforms one colour signature into the other, minimises the total work

\sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij} d_{ij}   (7)

subject to

f_{ij} \geq 0, \quad 1 \leq i \leq m, \; 1 \leq j \leq n,
\sum_{j=1}^{n} f_{ij} \leq p_i, \quad 1 \leq i \leq m,
\sum_{i=1}^{m} f_{ij} \leq q_j, \quad 1 \leq j \leq n,   (8)
\sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij} = \min\left( \sum_{i=1}^{m} p_i, \sum_{j=1}^{n} q_j \right),

where S_1 = \{(c_i, p_i)\} and S_2 = \{(c_j, q_j)\} are the colour signatures of images I_1 and I_2, and the d_{ij} denote the colour differences between the colour clusters.
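The optimisation of Eqs. (6)–(8) is a transportation problem. The following is a minimal sketch that solves it with SciPy's general-purpose linear programming solver (an implementation choice for illustration, not the transportation-simplex algorithm of [10]), assuming both signatures have equal total weight so the inequality constraints become equalities:

```python
import numpy as np
from scipy.optimize import linprog

def emd(sig1, sig2):
    """Earth mover's distance between two colour signatures, Eqs. (6)-(8).

    sig1, sig2: lists of (colour, weight) pairs with equal total weight.
    """
    c1, p = zip(*sig1)
    c2, q = zip(*sig2)
    m, n = len(p), len(q)
    # ground distances d_ij: Euclidean distance between cluster colours
    d = np.array([[np.linalg.norm(np.array(a, float) - np.array(b, float))
                   for b in c2] for a in c1])
    # one LP variable per flow f_ij; minimise sum f_ij * d_ij (Eq. (7))
    A_eq, b_eq = [], []
    for i in range(m):                      # row sums equal p_i
        row = np.zeros((m, n)); row[i, :] = 1
        A_eq.append(row.ravel()); b_eq.append(p[i])
    for j in range(n - 1):                  # column sums equal q_j (last redundant)
        col = np.zeros((m, n)); col[:, j] = 1
        A_eq.append(col.ravel()); b_eq.append(q[j])
    res = linprog(d.ravel(), A_eq=np.array(A_eq), b_eq=np.array(b_eq),
                  bounds=(0, None))
    return res.fun / sum(p)                 # Eq. (6): work / total flow
```

For signatures of unequal total weight the inequality constraints of Eq. (8) would have to be kept as inequalities (`A_ub`/`b_ub` in `linprog`).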
3 Spatial colour features

Simple colour features such as colour histograms are fast to compute, and are invariant to rotation and translation as well as robust to scaling and occlusions. On the other hand, they do not carry any information about the spatial distribution of the colours. Consequently, several methods try to address this weakness.
3.1 Colour coherence vectors

Colour coherence vectors (CCVs) [11] refine the colour histogram by classifying each pixel as coherent if it belongs to a sufficiently large connected region of its colour, and as non-coherent (scattered) otherwise, giving a pair of histograms per image. Two CCVs are compared using

d_{CCV}(I_1, I_2) = \sum_{k=1}^{N} \left( |H_1^c(k) - H_2^c(k)| + |H_1^s(k) - H_2^s(k)| \right),   (9)

where H_i^c and H_i^s are the histograms of coherent and non-coherent (scattered) pixels respectively.
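The coherent/scattered split can be sketched as follows; a minimal pure-Python illustration where `ccv` and the region-size threshold `tau` are assumed names, using 4-connected regions found by breadth-first search (one common choice; [11] details the exact procedure):

```python
from collections import deque

def ccv(img, tau):
    """Colour coherence vector: per colour, (coherent, scattered) pixel counts.

    A pixel is coherent if its 4-connected same-colour region has size >= tau.
    img: 2D list of (quantised) colour labels.
    """
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    result = {}
    for y in range(h):
        for x in range(w):
            if seen[y][x]:
                continue
            colour = img[y][x]
            # BFS over the 4-connected region of this colour
            region, queue = [], deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                region.append((cy, cx))
                for ny, nx in ((cy-1, cx), (cy+1, cx), (cy, cx-1), (cy, cx+1)):
                    if 0 <= ny < h and 0 <= nx < w and not seen[ny][nx] \
                            and img[ny][nx] == colour:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            coh, sca = result.get(colour, (0, 0))
            if len(region) >= tau:
                coh += len(region)
            else:
                sca += len(region)
            result[colour] = (coh, sca)
    return result
```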
3.2 Colour correlograms

Colour correlograms (CCRs) [12] capture the spatial correlation of pairs of colours as a function of distance. A colour correlogram is defined as

\gamma^{(k)}_{c_i, c_j}(I) = \Pr_{p_1 \in I_{c_i}, \, p_2 \in I} \left[ p_2 \in I_{c_j} \mid |p_1 - p_2| = k \right],   (10)

with

|p_1 - p_2| = \max(|x_1 - x_2|, |y_1 - y_2|),   (11)

where c_i and c_j denote two colours and p_l = (x_l, y_l) denote pixel locations. In other words, given any pixel of colour c_i in the image, \gamma^{(k)}_{c_i,c_j} gives the probability that a pixel at distance k away is of colour c_j.

As full colour correlograms are expensive both in terms of computation and storage requirements, usually a simpler form called the auto-correlogram (ACR), defined as

\alpha^{(k)}_{c}(I) = \gamma^{(k)}_{c,c}(I),   (12)

is used, i.e. only the spatial correlation of each colour to itself is recorded. Two CCRs are compared using

d_{CCR}(I_1, I_2) = \frac{\sum_{i,j \in [m], \, k \in [d]} \left| \gamma^{(k)}_{c_i,c_j}(I_1) - \gamma^{(k)}_{c_i,c_j}(I_2) \right|}{\sum_{i,j \in [m], \, k \in [d]} \left( 1 + \gamma^{(k)}_{c_i,c_j}(I_1) + \gamma^{(k)}_{c_i,c_j}(I_2) \right)}.   (13)
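The auto-correlogram of Eq. (12) can be computed by brute force; a minimal sketch (the name `auto_correlogram` is illustrative, and the implementation enumerates all positions at exactly L-infinity distance k, as in Eq. (11)):

```python
def auto_correlogram(img, colour, k):
    """alpha^(k)_c: probability that a pixel at L-infinity distance k
    from a pixel of `colour` has that same colour (Eqs. (10)-(12))."""
    h, w = len(img), len(img[0])
    same = total = 0
    for y in range(h):
        for x in range(w):
            if img[y][x] != colour:
                continue
            # all positions at exactly L-infinity distance k
            for dy in range(-k, k + 1):
                for dx in range(-k, k + 1):
                    if max(abs(dy), abs(dx)) != k:
                        continue
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        total += 1
                        if img[ny][nx] == colour:
                            same += 1
    return same / total if total else 0.0
```

A full descriptor would evaluate this for every quantised colour and several distances k.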
3.3 Spatial-chromatic histograms
Spatial-chromatic histograms (SCHs) [13] are another alternative for representing both colour and spatial information. They consist of a colour histogram

h(k) = \frac{|A_k|}{nm},   (14)

where A_k is the set of pixels having the same colour k, and n and m are the dimensions of the image; location information on each colour characterised through its barycentre

b(k) = \left( \frac{1}{n} \frac{1}{|A_k|} \sum_{(x,y) \in A_k} x, \; \frac{1}{m} \frac{1}{|A_k|} \sum_{(x,y) \in A_k} y \right);   (15)

and the standard deviation of the distances of a given colour from its barycentre

\sigma(k) = \sqrt{ \frac{1}{|A_k|} \sum_{p \in A_k} d(p, b(k))^2 }.   (16)

Two SCHs are then compared using

d_{SCH}(I_1, I_2) = \sum_{k=1}^{N} \min(h_{I_1}(k), h_{I_2}(k)) \, \delta(k),   (17)

with

\delta(k) = \frac{1}{2} \left[ \frac{\sqrt{2} - d(b_{I_1}(k), b_{I_2}(k))}{\sqrt{2}} + \frac{\min(\sigma_{I_1}(k), \sigma_{I_2}(k))}{\max(\sigma_{I_1}(k), \sigma_{I_2}(k))} \right].   (18)
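Extracting the per-colour SCH components of Eqs. (14)–(16) can be sketched as follows; a minimal illustration (the name `sch_features` is assumed) working in pixel coordinates normalised by the image dimensions, so that barycentre distances are bounded by \sqrt{2} as Eq. (18) presumes:

```python
def sch_features(img):
    """Per-colour SCH components of Eqs. (14)-(16): histogram value h,
    barycentre b (coordinates normalised by width n and height m), and
    standard deviation sigma of distances to the barycentre."""
    m, n = len(img), len(img[0])          # m rows (height), n columns (width)
    groups = {}
    for y in range(m):
        for x in range(n):
            groups.setdefault(img[y][x], []).append((x / n, y / m))
    feats = {}
    for k, pts in groups.items():
        h = len(pts) / (n * m)                               # Eq. (14)
        bx = sum(p[0] for p in pts) / len(pts)               # Eq. (15)
        by = sum(p[1] for p in pts) / len(pts)
        var = sum((p[0] - bx) ** 2 + (p[1] - by) ** 2 for p in pts) / len(pts)
        feats[k] = (h, (bx, by), var ** 0.5)                 # Eq. (16)
    return feats
```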
4 Texture features
Texture features do not exist at a single pixel but are rather a description of
a neighbourhood of pixels. Texture features often complement colour features
to improve retrieval accuracy in CBIR, and are also attractive since texture is
typically difficult to describe in terms of words.
4.1 Local binary patterns

Local binary patterns (LBP) [14] are a simple yet effective texture analysis technique. Descriptors are assigned on a pixel basis, describing the neighbourhood of each pixel, and a histogram of those descriptors is then formed. In detail, let

G = \begin{pmatrix} g(-1,-1) & g(0,-1) & g(1,-1) \\ g(-1,0) & g(0,0) & g(1,0) \\ g(-1,1) & g(0,1) & g(1,1) \end{pmatrix}   (19)

describe the 3 \times 3 grayscale block of a pixel at location (0,0) and its 8-neighbourhood. The first step is to subtract the value of the central pixel and consider only the resulting values at the neighbouring locations

g'(x,y) = g(x,y) - g(0,0).   (20)

Next, an operator

s(x) = \begin{cases} 1 & \text{for } x \geq 0 \\ 0 & \text{for } x < 0 \end{cases}   (21)

is applied at each location; the resulting binary values are weighted by powers of two and summed to give the LBP code

LBP = \sum_{i=0}^{7} s(g'_i) \, 2^i,   (22)

where the g'_i enumerate the eight neighbouring values. A histogram of the LBP codes over the image then serves as the texture descriptor.
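The steps of Eqs. (19)–(22) can be sketched directly; the neighbour ordering below is one free choice (any fixed ordering works as long as it is used consistently), and `lbp_code`/`lbp_histogram` are illustrative names:

```python
# neighbour offsets (dy, dx), clockwise from the top-left corner
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_code(img, y, x):
    """LBP code of the pixel at (y, x), Eqs. (19)-(22)."""
    centre = img[y][x]
    code = 0
    for i, (dy, dx) in enumerate(OFFSETS):
        if img[y + dy][x + dx] - centre >= 0:   # s(g') from Eq. (21)
            code += 1 << i                      # weight 2^i, Eq. (22)
    return code

def lbp_histogram(img):
    """Histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            hist[lbp_code(img, y, x)] += 1
    return hist
```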
4.2 Co-occurrence matrices

Co-occurrence matrices [15] record how often pairs of grey levels occur at a given offset from each other and are defined as

C_{p,q}(i,j) = \sum_{x=1}^{n} \sum_{y=1}^{m} \begin{cases} 1 & \text{if } I(x,y) = i \text{ and } I(x+p, y+q) = j \\ 0 & \text{otherwise} \end{cases},   (23)

where i and j correspond to image (grey-level) values, and p and q are offset values. Typically, several (p, q) pairs are employed, and from the corresponding co-occurrence matrices several statistical features are derived, such as the entropy

E = -\sum_{i,j} C(i,j) \log C(i,j),

which can then be used as texture descriptors.
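Eq. (23) and the entropy feature can be sketched as follows; a minimal illustration (the names `cooccurrence` and `entropy` are assumed, and the matrix is stored sparsely as a counter and normalised before the entropy is taken):

```python
import math
from collections import Counter

def cooccurrence(img, p, q):
    """Grey-level co-occurrence counts C_{p,q}(i, j), Eq. (23)."""
    h, w = len(img), len(img[0])
    C = Counter()
    for y in range(h):
        for x in range(w):
            if 0 <= x + p < w and 0 <= y + q < h:
                C[(img[y][x], img[y + q][x + p])] += 1
    return C

def entropy(C):
    """Entropy of the normalised co-occurrence matrix (in bits)."""
    total = sum(C.values())
    return -sum((v / total) * math.log2(v / total) for v in C.values())
```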
5 Shape features
Since true shape features would require segmentation, often global shape features or feature distributions are employed in CBIR. Shape features are often combined with colour and/or texture features.
5.1 Edge histograms
A simple yet effective shape feature can be derived by describing edge direction
information [16]. Following an edge detection step using the Canny edge detector [17], a histogram of edge directions (typically in 5 degree steps) is generated,
and then smoothed. Since it is a histogram feature, it can be compared using
e.g. histogram intersection as in Eq. (4).
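A hedged sketch of such an edge direction histogram; for self-containedness, simple central-difference gradients stand in for the Canny detector of [17], the smoothing step is omitted, and the names `edge_direction_histogram` and `threshold` are assumptions:

```python
import math

def edge_direction_histogram(img, bins=72, threshold=1.0):
    """Histogram of gradient directions (5-degree bins for bins=72).

    Central differences replace the Canny detector to keep the sketch
    self-contained; only pixels whose gradient magnitude exceeds
    `threshold` are counted as edge pixels.
    """
    h, w = len(img), len(img[0])
    hist = [0] * bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = img[y][x + 1] - img[y][x - 1]
            gy = img[y + 1][x] - img[y - 1][x]
            if (gx * gx + gy * gy) ** 0.5 < threshold:
                continue
            angle = math.degrees(math.atan2(gy, gx)) % 360.0
            hist[int(angle * bins / 360.0) % bins] += 1
    return hist
```

A vertical step edge, for example, produces gradients pointing along the x axis and so populates the bin around 0 degrees.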
5.2 Image moments
Image moments provide another class of global shape descriptors. The geometric moments of an image I are defined as

m_{pq} = \sum_{y=0}^{M-1} \sum_{x=0}^{N-1} x^p y^q I(x,y).   (24)

Usually, central moments

\mu_{pq} = \sum_{y=0}^{M-1} \sum_{x=0}^{N-1} (x - \bar{x})^p (y - \bar{y})^q I(x,y),   (25)

with

\bar{x} = \frac{m_{10}}{m_{00}}, \quad \bar{y} = \frac{m_{01}}{m_{00}},

are used, i.e. moments where the centre of gravity has been moved to the origin (i.e. \mu_{10} = \mu_{01} = 0). Central moments have the advantage of being invariant to translation.

Normalised central moments, defined by

\eta_{pq} = \frac{\mu_{pq}}{\mu_{00}^{\gamma}}, \quad \gamma = \frac{p+q}{2} + 1, \quad p + q = 2, 3, \ldots,   (26)

are in addition invariant to scaling. Moment combinations that are furthermore independent of rotation are desirable for retrieval. One such set of moment invariants is Hu's original set of moment invariants [18]

M_1 = \eta_{20} + \eta_{02}
M_2 = (\eta_{20} - \eta_{02})^2 + 4\eta_{11}^2
M_3 = (\eta_{30} - 3\eta_{12})^2 + (3\eta_{21} - \eta_{03})^2
M_4 = (\eta_{30} + \eta_{12})^2 + (\eta_{21} + \eta_{03})^2   (27)
M_5 = (\eta_{30} - 3\eta_{12})(\eta_{30} + \eta_{12})[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2] + (3\eta_{21} - \eta_{03})(\eta_{21} + \eta_{03})[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2]
M_6 = (\eta_{20} - \eta_{02})[(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2] + 4\eta_{11}(\eta_{30} + \eta_{12})(\eta_{21} + \eta_{03})
M_7 = (3\eta_{21} - \eta_{03})(\eta_{30} + \eta_{12})[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2] - (\eta_{30} - 3\eta_{12})(\eta_{21} + \eta_{03})[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2]

which can be employed as a shape descriptor for CBIR.
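Eqs. (24)–(27) can be sketched as follows; a minimal illustration (the names `raw_moment` and `hu_moments` are assumed, and only the first four invariants are returned for brevity) whose translation invariance is easy to verify:

```python
def raw_moment(img, p, q):
    """Geometric moment m_pq, Eq. (24); img[y][x] holds I(x, y)."""
    return sum((x ** p) * (y ** q) * v
               for y, row in enumerate(img) for x, v in enumerate(row))

def hu_moments(img):
    """First four of Hu's moment invariants, Eq. (27), computed from
    normalised central moments (Eqs. (25)-(26))."""
    m00 = raw_moment(img, 0, 0)
    xb = raw_moment(img, 1, 0) / m00
    yb = raw_moment(img, 0, 1) / m00
    def mu(p, q):                                   # central moment, Eq. (25)
        return sum(((x - xb) ** p) * ((y - yb) ** q) * v
                   for y, row in enumerate(img) for x, v in enumerate(row))
    def eta(p, q):                                  # normalised, Eq. (26)
        return mu(p, q) / (m00 ** ((p + q) / 2 + 1))
    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    M1 = n20 + n02
    M2 = (n20 - n02) ** 2 + 4 * n11 ** 2
    M3 = (n30 - 3 * n12) ** 2 + (3 * n21 - n03) ** 2
    M4 = (n30 + n12) ** 2 + (n21 + n03) ** 2
    return [M1, M2, M3, M4]
```

Translating a binary shape within the image leaves the central moments, and hence the invariants, unchanged.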
6 Conclusions
In this paper, we have reviewed several basic image features employed for content-based image retrieval. In particular, we have looked at colour, spatial colour, texture and shape features in this context. Further details and other image features are discussed in survey papers such as [3, 4], while more advanced CBIR topics are discussed in [19].
References
1. Osman, T., Thakker, D., Schaefer, G., Lakin, P.: An integrative semantic framework for image annotation and retrieval. In: IEEE/WIC/ACM International Conference on Web Intelligence. (2007) 366–373
2. Rodden, K.: Evaluating Similarity-Based Visualisations as Interfaces for Image Browsing. PhD thesis, University of Cambridge Computer Laboratory (2001)
3. Smeulders, A., Worring, M., Santini, S., Gupta, A., Jain, R.: Content-based image retrieval at the end of the early years. IEEE Trans. Pattern Analysis and Machine Intelligence 22 (2000) 1349–1380
4. Datta, R., Joshi, D., Li, J., Wang, J.Z.: Image retrieval: Ideas, influences, and trends of the new age. ACM Computing Surveys 40 (2008) 1–60
5. Niblack, W., Barber, R., Equitz, W., Flickner, M., Glasman, D., Petkovic, D., Yanker, P.: The QBIC project: Querying images by content using color, texture and shape. In: Conf. on Storage and Retrieval for Image and Video Databases. Volume 1908 of Proceedings of SPIE. (1993) 173–187
6. Bach, J., Fuller, C., Gupta, A., Hampapur, A., Horowitz, B., Humphrey, R., Jain, R.: The Virage image search engine: An open framework for image management. In: Storage and Retrieval for Image and Video Databases. Volume 2670 of Proceedings of SPIE. (1996) 76–87
7. Stricker, M., Orengo, M.: Similarity of color images. In: Conf. on Storage and Retrieval for Image and Video Databases III. Volume 2420 of Proceedings of SPIE. (1995) 381–392
8. Swain, M., Ballard, D.: Color indexing. Int. Journal of Computer Vision 7 (1991) 11–32
9. Faloutsos, C., Equitz, W., Flickner, M., Niblack, W., Petkovic, D., Barber, R.: Efficient and effective querying by image content. Journal of Intelligent Information Systems 3 (1994) 231–262
10. Rubner, Y., Tomasi, C., Guibas, L.: The earth mover's distance as a metric for image retrieval. Int. Journal of Computer Vision 40 (2000) 99–121
11. Pass, G., Zabih, R.: Histogram refinement for content-based image retrieval. In: 3rd IEEE Workshop on Applications of Computer Vision. (1996) 96–102
12. Huang, J., Kumar, S., Mitra, M., Zhu, W.J., Zabih, R.: Image indexing using color correlograms. In: IEEE Int. Conference on Computer Vision and Pattern Recognition. (1997) 762–768
13. Cinque, L., Levialdi, S., Pellicano, A.: Color-based image retrieval using spatial-chromatic histograms. In: IEEE Int. Conf. Multimedia Computing and Systems. (1999) 969–973
14. Ojala, T., Pietikäinen, M., Harwood, D.: A comparative study of texture measures with classification based on feature distributions. Pattern Recognition 29 (1996) 51–59
15. Haralick, R.: Statistical and structural approaches to texture. Proceedings of the IEEE 67 (1979) 786–804
16. Jain, A., Vailaya, A.: Image retrieval using color and shape. Pattern Recognition 29 (1996) 1233–1244
17. Canny, J.: A computational approach to edge detection. IEEE Trans. Pattern Analysis and Machine Intelligence 8 (1986) 679–698
18. Hu, M.: Visual pattern recognition by moment invariants. IRE Transactions on Information Theory 8 (1962) 179–187
19. Schaefer, G.: Content-based image retrieval - advanced topics. In: Int. Conference on Man-Machine Interactions. (2011) (in this volume)