Pattern Recognition 44 (2011) 2109–2122
Contents lists available at ScienceDirect
Pattern Recognition
journal homepage: www.elsevier.com/locate/pr
Content-based image retrieval with relevance feedback using random walks
Samuel Rota Bulo, Massimo Rabbi, Marcello Pelillo
DAIS, Universita Ca’ Foscari Venezia, via Torino 155, 30172 Mestre Venezia, Italy
Article info
Article history:
Received 21 December 2010
Received in revised form 9 March 2011
Accepted 12 March 2011
Available online 21 March 2011
Abstract
In this paper, we propose a novel approach to content-based image retrieval with relevance feedback,
which is based on the random walker algorithm introduced in the context of interactive image
segmentation. The idea is to treat the relevant and non-relevant images labeled by the user at every
feedback round as ‘‘seed’’ nodes for the random walker problem. The ranking score for each unlabeled
image is computed as the probability that a random walker starting from that image will reach a
relevant seed before encountering a non-relevant one. Our method is easy to implement, parameter-free and scales well to large datasets. Extensive experiments on different real datasets with several
image similarity measures show the superiority of our method over different recent approaches.
© 2011 Elsevier Ltd. All rights reserved.
Keywords:
Random walks
Content-based image retrieval
Relevance feedback
1. Introduction
The concept of relevance feedback, developed during the
1960s to improve document retrieval processes [1], consists of
using user feedback to judge the relevance of search results and
therefore improve their quality through iterative steps. This
technique has attracted the content-based image retrieval (CBIR)
community since the early 1990s and is still an active research
topic nowadays because, in contrast to text/document retrieval,
judging the relevance of an image for a user is an almost
instantaneous task. Moreover, by gathering feedback from the
user, a CBIR system can dramatically boost its performance by
reducing the gap between the high-level semantics in the user’s
mind and low-level image descriptors.
Different feedback models have been proposed in the literature (see e.g., [2] for a review): positive feedback, which allows the
user to select only relevant (positive) images; positive–negative
feedback, where the user can specify both relevant and non-relevant (negative) images; positive–neutral–negative feedback,
where also a neutral class is added among the user’s choices;
and feedback with (non)relevance degree, where the user implicitly ranks the images by specifying a degree of (non)relevance.
The new information inferred from the user can then be used
within a short-term-learning or long-term-learning process. The
former uses the user feedback only within the user’s query
context [3,4], while the latter updates the image similarities in
order to benefit from the feedback in future queries [5,6].
Corresponding author.
E-mail addresses: srotabul@dsi.unive.it (S. Rota Bulo),
mrabbi@dsi.unive.it (M. Rabbi), pelillo@dsi.unive.it (M. Pelillo).
0031-3203/$ - see front matter © 2011 Elsevier Ltd. All rights reserved.
doi:10.1016/j.patcog.2011.03.016
In order to take full advantage of the additional information
deriving from the user interaction, an effective learning method
should be adopted in order to identify relevant and non-relevant
images. Moreover, since not all images that have been classified
as relevant by the system can be inspected by the user, an implicit
ranking of the relevant images is necessary. The approaches that
the literature offers can be divided into inductive and transductive ones according to whether unlabeled data is used in the
training stage or not [7]. The inductive approaches are principally
based on support vector machines (SVMs) [8] and boosting [9].
They basically solve a two-class (relevant and non-relevant)
classification problem and rank the images according to the
classification results. The main disadvantage of these approaches
is the low accuracy caused by the small sample size. Transductive
approaches overcome this problem by exploiting also the
information of the unlabeled data. Among them, we find manifold-ranking-based image retrieval (MRBIR) [7], which propagates
a ranking score across the unlabeled data to get the improved
retrieval result, Discriminant-EM [10], which constructs a
generative model by using the unlabeled data to measure
the relevance between query and database images, and multiple
random walk (MRW) [11], which creates two generative
models for the two classes of relevant and non-relevant images
by means of Markov random walks. Additionally, in [12] an
approach based on the graph Laplacian is proposed, which allows
learning the embedding of the manifold enclosing the dataset via
a diffusion map.
In this paper, we propose a novel approach to CBIR with
relevance feedback, which is inspired by the random walker
algorithm for image segmentation introduced by Grady in [13].
Our approach is close in spirit to MRBIR and MRW as it casts the
CBIR problem with relevance feedback into a graph-theoretic
problem, where nodes are images and image similarities represent the graph edge weights. The relevant and non-relevant
images labeled by the user at every feedback round are treated
as ‘‘seed’’ nodes for the random walker problem and the ranking
score at each unlabeled image is computed as the probability that
a random walker starting from that image will reach a relevant
seed before encountering a non-relevant one along the graph.
Among the positive properties of this formulation we have that
the algorithm is parameter-free, provided that image similarities
are given, easy to implement, and scales well to large datasets as it
works also with sparse graph abstractions of the data. Moreover,
although the presented approach is based on the positive–negative
relevance feedback model, it can be easily adapted to the
other models mentioned above. Extensive experiments on different
real datasets with several image similarity measures show the
superiority of our method over different recent approaches.
2. Random walks for CBIR with relevance feedback
The problem of CBIR with relevance feedback can be seen as
the problem of ranking a set of images so that images visually
consistent with a query image appear earlier in the ordering.
The first K images in the ranking are presented to the
user, who has the opportunity of marking them as relevant or
non-relevant if not satisfied with the result. The user’s feedback
can then be used to bridge the semantic gap between
what the user perceives as similar and what the provided low-level
similarities classify as similar.
Since we will model CBIR as a graph-theoretic problem, we
start by introducing some basic notions. A graph is a pair $G=(V,E)$,
where $V$ is the set of vertices (nodes) and $E \subseteq V \times V$ is the set of
edges, each of which connects two vertices. A weighted graph
$G=(V,E,w)$ is a graph with a weight function $w : E \to \mathbb{R}_+$, which
assigns a nonnegative weight to each edge in the graph. We will
denote by $w_{ij}$ the weight associated to edge $(i,j) \in E$. The
(weighted) adjacency matrix of $G$ is given by $W=(w_{ij})$, where we
assume $w_{ij}=0$ if $(i,j) \notin E$, while the (weighted) Laplacian matrix of
$G$ is given by $L=D-W$, where $D=(d_{ij})$ is a diagonal matrix with
$d_{ii}=\sum_{j \in V} w_{ij}$.
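The definitions above can be illustrated with a minimal sketch. It assumes NumPy, a dense symmetric weight matrix, and a hypothetical helper name `graph_laplacian`; the toy weights are invented for illustration and do not come from the paper.

```python
import numpy as np

def graph_laplacian(W):
    """Return L = D - W for a symmetric, nonnegative weight matrix W."""
    W = np.asarray(W, dtype=float)
    if W.ndim != 2 or W.shape[0] != W.shape[1]:
        raise ValueError("W must be a square matrix")
    if (W < 0).any() or not np.allclose(W, W.T):
        raise ValueError("W must be nonnegative and symmetric")
    degrees = W.sum(axis=1)          # d_ii = sum_j w_ij
    return np.diag(degrees) - W      # L = D - W

# Toy example (hypothetical weights): three images, one weak edge.
W_toy = np.array([[0.0, 1.0, 0.2],
                  [1.0, 0.0, 0.5],
                  [0.2, 0.5, 0.0]])
L_toy = graph_laplacian(W_toy)

# For a symmetric W, x^T L x equals the sum over unordered edges of
# w_ij * (x_i - x_j)^2.
x_toy = np.array([1.0, 0.0, 0.5])
energy = x_toy @ L_toy @ x_toy
```

Every row of a Laplacian built this way sums to zero, which gives a quick sanity check on the construction.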
Consider a CBIR problem, where $I=\{I_i\}_{i=0}^{N}$ is a set of $N+1$
images, the first of which is the query image (i.e., $I_0$), and $f$ is a
‘‘low-level’’ similarity measure between two images. Each image
in $I$ can be seen as a vertex of an edge-weighted graph $G=(V,E,w)$,
where the edge set $E$ consists of pairs of images for which a
weight is defined and the edge weights reflect the similarities
among images, i.e., $w_{uv}=f(I_u,I_v)$. The vertex set $V$ thus corresponds
to the index set of $I$ and therefore each image $I_j \in I$ is related
to a vertex $j \in V$, vertex 0 representing the query image. In the
sequel, we may refer to the elements of $V$ as images. Besides the
graph $G$, which provides a static description of the problem, we
have to model the information deriving from the user interaction.
We formalize the user, who makes the query and is involved in
the feedback rounds, as a function $C : V \to \{0,1\}$, which labels
images, and thus vertices of $G$, as relevant (1) or non-relevant (0).
Note that $C(0)=1$ as the query image $I_0$ is considered relevant for
the user. Let moreover $V_L^{(r)} \subseteq V$, $r \ge 0$, be the subset of vertices that
have been labeled by the user within the first $r$ feedback rounds.
Note that this set is always non-empty since initially $V_L^{(0)}=\{0\}$,
i.e., it contains the query image. We will also make the mild
assumption that after the first feedback round, i.e., for $r \ge 1$, at
least one non-relevant image appears in $V_L^{(r)}$.
Consider now a generic feedback round $r>0$. In order to take a
decision about a new ranking of the images based on the user’s
feedback, our image retrieval engine requires as input the graph
$G$, the set of labeled vertices $V_L^{(r)}$ collected thus far and the user
function $C$, which provides the label information. A new ordering
is then produced by assigning a weight $x_i^{(r)}$ to each vertex $i \in V$ and
by sorting the corresponding images in descending weight order.
We compactly represent all the weights assigned at round $r$ by an
$(N+1)$-dimensional column vector $x^{(r)}$ called ranking vector.
Fig. 1. Examples of categories from three different benchmark datasets.
A property that the ranking vectors $x^{(r)}$ must satisfy at every
round is not to violate the user’s feedback. To this end, we
impose the following conditions:
(a) $0 \le x_i^{(r)} \le 1$, for all $i \in V$,
(b) $x_i^{(r)} = C(i)$, for all $i \in V_L^{(r)}$,
which guarantee that relevant images will always be top ranked,
while non-relevant ones will always be bottom ranked. Indeed,
$x_i=1$ in the former case, while $x_i=0$ in the latter. It is worth
noting that we are not interested in providing a relative ranking
of the images labeled as relevant, which are considered of equal
importance for the user, and therefore they all have the same weight.
Fig. 2. Query example using our random walker algorithm on the Oliva dataset with GLCM feature: green framed images are relevant images, while red framed are non-relevant ones. (a) The query image used. (b) Results obtained after the initial k-NN execution. (c–e) Results obtained after different feedback rounds. (For interpretation of
the references to color in this figure legend, the reader is referred to the web version of this article.)
Our approach to CBIR with relevance feedback is based on the
idea of interpreting similarities as an indicator that two images
should be close within the ranking. In terms of the ranking vector,
this means that similar images will have similar weights, while
dissimilar ones may have different weights. According to this
intuition, and keeping conditions (a) and (b) in mind, the solution
to our problem at feedback round $r$ can be found by solving the
following convex optimization problem:
$$x^{(r)} = \arg\min_{x} \sum_{(i,j) \in E} (x_i - x_j)^2 w_{ij}, \quad \text{subject to conditions (a) and (b)}.$$
Note that each term of the energy function encodes the cost of
putting two images apart in the ordering, and the higher the
similarity of the two images, the higher this cost will be. Hence,
similar images are forced to be close in the ranking. The
constraint set instead guarantees that the ranking vector will
not violate the user’s feedback. Note also that condition (a) can be
omitted: at the optimum each unlabeled weight is a similarity-weighted
average of the weights of its neighbors, so all weights stay in the
interval [0,1] as long as the labeled ones do. Therefore, by removing
condition (a) and rewriting the energy function in matrix form, our
optimization problem becomes simply
$$x^{(r)} = \arg\min_{x} x^\top L x, \quad \text{subject to } x_i = C(i) \text{ for all } i \in V_L^{(r)}, \tag{1}$$
where $L$ is the Laplacian matrix of $G$. Note that the constraint set
can be completely removed by substituting the fixed components
of the ranking vector in the energy function. This can be easily
seen if we suitably reorder the vertex set so as to have
$x^\top = [x_U^\top, x_M^\top]$, where $x_U$ is a vector with the unknown ranking
weights of the unlabeled images, while $x_M$ is the vector with the
fixed ranking weights of the images marked by the user. Similarly,
the Laplacian matrix $L$ can be block-structured as follows:
$$L = \begin{bmatrix} L_{UU} & L_{UM} \\ L_{MU} & L_{MM} \end{bmatrix}.$$
Then, the optimization problem in (1) becomes
$$x^{(r)} = \arg\min_{x_U} \begin{bmatrix} x_U^\top & x_M^\top \end{bmatrix} \begin{bmatrix} L_{UU} & L_{UM} \\ L_{MU} & L_{MM} \end{bmatrix} \begin{bmatrix} x_U \\ x_M \end{bmatrix}.$$
Differentiating with respect to $x_U$ and finding the critical
point yields the following system of linear equations in the
unknowns $x_U$:
$$L_{UU} x_U = -L_{UM} x_M, \tag{2}$$
which is nonsingular if the graph is connected or if every
connected component contains a labeled image [14]. The solution
of the ranking problem at each feedback round is thus obtained
by solving a simple system of linear equations. Moreover, if we
force the graph $G$ to be sparse, by considering for instance a
k-nearest neighbor (k-NN) approximation, the solution can be
computed very efficiently, thus allowing our method to scale to
large datasets.
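As an illustration of the linear system above, the following sketch solves for the unlabeled weights with a dense direct solver, using the sign convention $L_{UU} x_U = -L_{UM} x_M$ obtained by setting the gradient of the energy to zero. It assumes NumPy; the function name `random_walker_ranking` and the toy path graph are hypothetical and only meant to make the block structure concrete.

```python
import numpy as np

def random_walker_ranking(W, labels):
    """Solve L_UU x_U = -L_UM x_M and return the full ranking vector.

    W      : (n, n) symmetric nonnegative similarity matrix.
    labels : dict mapping a labeled vertex index to its fixed weight
             (1.0 for relevant, 0.0 for non-relevant).
    """
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    L = np.diag(W.sum(axis=1)) - W                 # graph Laplacian
    marked = np.array(sorted(labels), dtype=int)   # index set M
    unlabeled = np.setdiff1d(np.arange(n), marked) # index set U
    x = np.zeros(n)
    x[marked] = [labels[int(i)] for i in marked]
    if unlabeled.size:
        L_UU = L[np.ix_(unlabeled, unlabeled)]
        L_UM = L[np.ix_(unlabeled, marked)]
        x[unlabeled] = np.linalg.solve(L_UU, -L_UM @ x[marked])
    return x

# Path graph 0 - 1 - 2 - 3 with unit weights (toy data): vertex 0 is the
# relevant query and vertex 3 has been marked non-relevant.
W_path = np.zeros((4, 4))
for i in range(3):
    W_path[i, i + 1] = W_path[i + 1, i] = 1.0
x_path = random_walker_ranking(W_path, {0: 1.0, 3: 0.0})
```

On this path graph the unlabeled weights interpolate linearly between the two seeds, so vertex 1 is ranked above vertex 2.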
The formulation we obtain per feedback round is equivalent to
the random walker algorithm introduced by Grady for interactive
image segmentation [13], which is the problem of segmenting an
image into regions using seeds provided by the user. The focus,
however, is different: interactive segmentation is, in its
simplest form, a two-class classification problem, typically with a
one-shot user interaction and no ranking involved, while in our
case we aim at obtaining a ranking of the images using
multiple user interactions.
Fig. 3. Query example using the feature re-weighting algorithm on the Oliva dataset with GLCM feature: green framed images are relevant images, while red framed are
non-relevant ones. (a) The query image used. (b) Initial results obtained by the algorithm. (c–e) Results obtained after different feedback rounds. (For interpretation of the
references to color in this figure legend, the reader is referred to the web version of this article.)
The ranking vector x found as solution of (1) has an interesting
interpretation in terms of Markov random walks theory. Indeed,
each component xi is the probability that a random walker
starting from vertex i of G will reach a relevant image before
encountering a non-relevant one [15,16]. We refer to [13] for a
description of other connections to discrete potential theory and
the combinatorial Dirichlet problem.
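The probabilistic interpretation can be checked empirically on a toy graph. The sketch below is only an illustration under stated assumptions: it uses the Python standard library, a hypothetical helper `hit_relevant_probability`, and a four-vertex path graph with unit weights, on which the hitting probabilities are known to be 2/3 and 1/3 (the classical gambler's ruin values).

```python
import random

def hit_relevant_probability(neighbors, start, relevant, non_relevant,
                             n_walks=20000, seed=0):
    """Estimate P(a walk from `start` reaches a relevant seed first)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_walks):
        v = start
        # Walk until a labeled (seed) vertex is reached.
        while v not in relevant and v not in non_relevant:
            nbrs, weights = neighbors[v]
            # Move to a neighbor with probability proportional to w_ij.
            v = rng.choices(nbrs, weights=weights, k=1)[0]
        hits += v in relevant
    return hits / n_walks

# Path graph 0 - 1 - 2 - 3 with unit weights (toy data); only the
# unlabeled vertices need an adjacency entry.
neighbors = {
    1: ([0, 2], [1.0, 1.0]),
    2: ([1, 3], [1.0, 1.0]),
}
p1 = hit_relevant_probability(neighbors, 1, relevant={0}, non_relevant={3})
p2 = hit_relevant_probability(neighbors, 2, relevant={0}, non_relevant={3})
```

The Monte Carlo estimates should agree, up to sampling noise, with the weights obtained by solving the linear system on the same graph.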
Although the presented theory assumes a positive–negative
feedback model, it is straightforward to generalize it to other
models like the positive–neutral–negative model or the feedback
model with relevance degree, by simply replacing the user
function $C$. In the positive–neutral–negative case the range of
$C$ would be $\{0, 0.5, 1\}$, 0.5 being the score associated to a neutral
judgment, while we may have a continuous interval [0,1] (or a
quantization of it if discrete values are preferred) for models
where the user may specify a relevance degree. We may even
design a user function that simulates a ‘‘hesitating’’ user, who
may change their opinion about feedback previously provided,
by simply replacing $C$.
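A small sketch of the positive–neutral–negative case follows. It assumes NumPy and a hypothetical helper `solve_ranking` that fixes the labeled weights to arbitrary values in [0,1] and solves $L_{UU} x_U = -L_{UM} x_M$ for the rest; the five-vertex path graph is toy data chosen so that the result is easy to verify by hand.

```python
import numpy as np

def solve_ranking(W, fixed):
    """Ranking vector with x_i = fixed[i] on the labeled vertices."""
    W = np.asarray(W, dtype=float)
    n = len(W)
    L = np.diag(W.sum(axis=1)) - W
    M = np.array(sorted(fixed), dtype=int)      # labeled vertices
    U = np.setdiff1d(np.arange(n), M)           # unlabeled vertices
    x = np.zeros(n)
    x[M] = [fixed[int(i)] for i in M]
    if U.size:
        x[U] = np.linalg.solve(L[np.ix_(U, U)], -L[np.ix_(U, M)] @ x[M])
    return x

# Path graph 0 - 1 - 2 - 3 - 4 with unit weights (toy data).
W = np.zeros((5, 5))
for i in range(4):
    W[i, i + 1] = W[i + 1, i] = 1.0

# User function with range {0, 0.5, 1}:
# 1 = relevant, 0.5 = neutral, 0 = non-relevant.
x_neutral = solve_ranking(W, {0: 1.0, 2: 0.5, 4: 0.0})
```

Only the values assigned to the labeled vertices change with the feedback model; the solver itself is untouched.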
3. The algorithm
In this section, we summarize our CBIR engine with relevance
feedback. The pseudocode of our approach is presented in Algorithm
1. Our method requires as input the graph $G$ abstracting the CBIR
problem, where vertex $0 \in V$ is assumed to be the query image, the
user function $C$, which encodes the user’s feedback, and a scope
size $K$, which represents the number of images that should be
presented to the user at each feedback round.
Fig. 4. Query example using the relevance score algorithm on the Oliva dataset with GLCM feature: green framed images are relevant images, while red framed are non-relevant ones. (a) The query image used. (b) Initial results obtained by the algorithm. (c–e) Results obtained after different feedback rounds. (For interpretation of the
references to color in this figure legend, the reader is referred to the web version of this article.)
Algorithm 1. Random walker for CBIR with relevance feedback.
Require: graph $G=(V,E,w)$, user $C$, scope size $K$
1: {Initialization}
2: $r \leftarrow 0$
3: $V_L^{(0)} \leftarrow \{0\}$
4: $S \leftarrow$ get the first $K$ closest images to the query image
5: {Present images indexed by $S$ to the user}
6: while user is not satisfied with $S$ do
7:   $r \leftarrow r+1$
8:   {$r$th feedback round}
9:   $V_L^{(r)} \leftarrow V_L^{(r-1)} \cup S$
10:  $x^{(r)} \leftarrow$ compute the solution of (1) using $V_L^{(r)}$ and $C$
11:  $S \leftarrow$ get $K$ top ranked vertices according to ranking vector $x^{(r)}$
12:  {Present images indexed by $S$ to the user}
13: end while
Fig. 5. Query example using the relevance score stabilized algorithm on the Oliva dataset with GLCM feature: green framed images are relevant images, while red framed
are non-relevant ones. (a) The query image used. (b) Initial results obtained by the algorithm. (c–e) Results obtained after different feedback rounds. (For interpretation of
the references to color in this figure legend, the reader is referred to the web version of this article.)
At lines 1–3 we set up the system by setting the round
counter $r$ to zero, and by initializing the set of labeled images
$V_L^{(0)}$ to the singleton $\{0\}$ containing the query image. Since our method
requires at least one non-relevant image to be specified, we can
either force the image having the lowest similarity to the query
image to be non-relevant, or we can present to the user the $K$ images
that are the most similar to the query image. At line 4 we opt for
the latter solution, although the former one may work as well,
and we store in the scope $S$ the $K$ images that will then be
presented to the user for gathering feedback. At line 6 we
enter a loop of relevance feedback rounds, which is interrupted
as soon as the user is satisfied with the result. We assume
implicit user satisfaction if all the images in the scope are
considered relevant by the user, which formally happens when
$C(i)=1$ for all $i \in S$. Lines 7–12 carry out a new relevance
feedback round. We first increment the round counter and
update the set of labeled images with all those in the scope
$S$. Note that at any moment we can get the user feedback on each
image in $V_L$ through the user function $C$. At line 10 we compute
the ranking vector as the solution of (1), which involves solving
the system of linear equations (2). The $K$ vertices with the highest
scores in the ranking vector are then stored in the scope $S$ and presented
to the user for a new feedback round.
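The loop of Algorithm 1 might be sketched as follows. This is an illustrative reading, not the authors' implementation: it assumes NumPy, a dense direct solver, a simulated user backed by ground-truth labels, and two choices that the pseudocode leaves open, namely that the query itself is never shown in the scope and that a hypothetical `max_rounds` cap guards against endless loops. All names and the toy data are invented for the example.

```python
import numpy as np

def rank(W, labeled, user):
    """Solve L_UU x_U = -L_UM x_M for the current labeled set."""
    n = len(W)
    L = np.diag(W.sum(axis=1)) - W
    M = np.array(sorted(labeled), dtype=int)
    U = np.setdiff1d(np.arange(n), M)
    x = np.zeros(n)
    x[M] = [user(int(i)) for i in M]
    if U.size:
        x[U] = np.linalg.solve(L[np.ix_(U, U)], -L[np.ix_(U, M)] @ x[M])
    return x

def relevance_feedback(W, user, K, max_rounds=20):
    """Follow Algorithm 1; vertex 0 is the query image."""
    labeled = {0}                                   # line 3
    # Line 4: the K images most similar to the query (query excluded).
    order = np.argsort(-W[0], kind="stable")
    S = [int(i) for i in order if i != 0][:K]
    rounds = 0
    # Line 6: implicit satisfaction when every image in S is relevant.
    while not all(user(i) == 1 for i in S) and rounds < max_rounds:
        rounds += 1                                 # line 7
        labeled |= set(S)                           # line 9
        x = rank(W, labeled, user)                  # line 10
        # Line 11: K top-ranked vertices (query excluded, an assumption).
        order = np.argsort(-x, kind="stable")
        S = [int(i) for i in order if i != 0][:K]
    return S, rounds

# Toy data (hypothetical): 1-D features, vertex 0 is the query.
features = np.array([2.0, 0.0, 0.2, 0.4, 0.6, 2.5, 3.5, 3.7, 3.9])
relevant = {0, 1, 2, 3, 4}                          # simulated ground truth
W = np.exp(-0.5 * (features[:, None] - features[None, :]) ** 2)
np.fill_diagonal(W, 0.0)
user = lambda i: 1 if i in relevant else 0          # plays the role of C
S, rounds = relevance_feedback(W, user, K=3)
```

In this toy run the initial nearest-neighbor scope contains non-relevant images, and the ranking computed after the feedback round returns a scope made only of relevant ones.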
The proposed algorithm is simple and can be easily implemented. Moreover, there is no parameter that should be tuned.
Note also that, as previously pointed out, the per-round complexity of the algorithm is determined by line 10, which involves
solving a (possibly sparse) linear system of N equations. The
complexity of this task is in general $O(N^3)$ for a dense Laplacian
matrix and $O(N^2)$ for a sparse one, if we consider direct solvers.
However, we are not interested in finding an exact solution of (2);
we only need the relative ordering of the components
of the solution. Therefore, iterative methods may become more
appealing, because they smoothly approach a solution and can
be stopped before convergence. Additionally, the ranking vector
obtained in a round can be used to initialize the iterative solver in
the next one. This allows us to reduce the computational cost
by up to an order of magnitude. We note finally that the running
time of our approach can be further reduced by adopting eigenvector precomputation techniques as described in [17].
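The idea of an iterative solver with warm start can be sketched with plain Jacobi sweeps, in which every unlabeled weight is repeatedly replaced by the similarity-weighted average of its neighbors. The paper does not say which iterative method was used, so Jacobi is an assumption chosen for simplicity; the function name `jacobi_ranking`, the stopping rule and the toy path graph are likewise hypothetical. The sketch assumes NumPy, a zero diagonal in the weight matrix and strictly positive degrees.

```python
import numpy as np

def jacobi_ranking(W, fixed, x0=None, tol=1e-8, max_iter=10000):
    """Jacobi sweeps for L_UU x_U = -L_UM x_M.

    Labeled weights stay fixed; every unlabeled weight is replaced by the
    weighted average of its neighbors. Returns (x, number of sweeps).
    """
    W = np.asarray(W, dtype=float)
    n = len(W)
    degrees = W.sum(axis=1)
    is_fixed = np.zeros(n, dtype=bool)
    # Cold start from zeros, or warm start from a previous ranking vector.
    x = np.zeros(n) if x0 is None else np.array(x0, dtype=float)
    for i, value in fixed.items():
        is_fixed[i] = True
        x[i] = value
    free = ~is_fixed
    for sweep in range(1, max_iter + 1):
        x_new = x.copy()
        x_new[free] = (W @ x)[free] / degrees[free]
        change = np.max(np.abs(x_new - x))
        x = x_new
        if change < tol:          # could be stopped earlier in practice
            break
    return x, sweep

# Path graph 0 - 1 - ... - 5 with unit weights (toy data).
W = np.zeros((6, 6))
for i in range(5):
    W[i, i + 1] = W[i + 1, i] = 1.0

x_round1, _ = jacobi_ranking(W, {0: 1.0, 5: 0.0})
# Next round: vertex 3 is additionally marked non-relevant.
fixed = {0: 1.0, 3: 0.0, 5: 0.0}
x_cold, sweeps_cold = jacobi_ranking(W, fixed)
x_warm, sweeps_warm = jacobi_ranking(W, fixed, x0=x_round1)
```

Both runs reach the same ranking vector; on such a tiny graph the saving from the warm start is small, and how much is gained on real data depends on how far the solution moves between rounds.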
4. Related works
Approaching the CBIR problem from a graph-theoretic perspective, which involves directly or indirectly Markov random
walks, has already been done in the past, but in a different way.
The manifold-ranking algorithm (MRBIR) proposed by He et al.
[7] uses the idea of exploring the relationship among all images in
the database and measures the relevance between them and a
query image accordingly. This transductive approach represents
the images in the database as the vertices of a weighted graph.
The user’s relevance feedback is used to generate labeled
examples that help in propagating a ranking score for each image.
The MRBIR framework works with the only-positive as well as
positive–negative feedback models.
Fig. 6. Query example using the multiple random walk algorithm on the Oliva dataset with GLCM feature: green framed images are relevant images, while red framed are
non-relevant ones. (a) The query image used. (b) Initial results obtained by the algorithm. (c–e) Results obtained after different feedback rounds. (For interpretation of the
references to color in this figure legend, the reader is referred to the web version of this article.)
Fig. 7. Plots of the average precision of different CBIR approaches on the Wang, Oliva and Caltech datasets with different image features and scope size 20.
Fig. 8. Plots of the average precision of different CBIR approaches on the Wang, Oliva and Caltech datasets with different image features and scope size 30.
Fig. 9. Plots of the average precision of different CBIR approaches on the Wang, Oliva and Caltech datasets with different image features and scope size 40.
Fig. 10. Average running time per round for the Wang, Oliva and Caltech datasets with the color histogram feature. (Panels: Wang dataset, Oliva dataset, Caltech dataset; horizontal axis: feedback rounds, 0 to 8; vertical axis: average time per round, logarithmic scale.)
A further development of MRBIR by the same authors led to
the multiple random walks (MRW) approach [11], which is also
one of the methods we compared against in our experiments. The
authors’ idea is to use two Markov random walks to compute the
likelihoods that an image belongs to the relevant/non-relevant class.
These likelihoods are estimated from the stationary distribution
of two Markov chains built upon the original graph of images
with an enlarged set of vertices, which include two (positive and
negative) additional absorbing boundaries. These estimates are
then refined by adopting an EM-like procedure.
As opposed to our approach, which is parameter-free, MRW
depends on a parameter $\alpha$, which should be suitably tuned.
Moreover, the EM-like refinement process requires a number of
iterative steps that should be pre-estimated. Similar parameters
can also be found in the previous MRBIR algorithm.
Finally, in [12] an approach has been proposed which is based
on the graph Laplacian and allows learning the embedding of the
manifold enclosing the dataset via a diffusion map. The solution of
the ranking problem derives from an unconstrained minimization
problem, where the cost function is composed of a Laplacian
term governing the diffusion process and a regularization term
aimed at moving the solution towards the user’s preferences. In
contrast to this formulation, our method consists of a constrained
minimization problem, which can be seen as a limit case of the
one in [12]. Indeed, the regularizing term is replaced by constraints, which force the solution not to violate the user’s
feedback.
5. Experiments
We performed extensive experiments on real datasets with
different image similarity measures and compared against four
recent algorithms for CBIR with relevance feedback.
5.1. Experimental setting
In our experiments we used three different datasets. The first
dataset is the Wang dataset [18], which is a subset of the known
Corel dataset consisting of 1000 images grouped into 10 categories (100 images per category). The second dataset is the Oliva
dataset [19], which encompasses 2688 images divided into eight
categories. The last dataset is a subset of the Caltech-256 database
[20], including 4920 images divided into 43 categories. The
datasets are heterogeneous as they have different sizes and cover
different image domains as can be seen in Fig. 1.
For all datasets, we computed image similarities based on the
Corel Image Features.1 As for the Oliva dataset, we considered one
additional feature, which has been introduced by the
authors of this dataset. Summarizing, the following features have
been considered in our experiments:
Color histogram: the HSV color space is divided into 32
subspaces (32 colors: eight ranges of H and four of S).
The density of each color in the image provides the values
for a 32-dimensional feature vector;
Color histogram layout: each image is partitioned into four
sub-images and a 4 × 2 color histogram is computed for each
sub-image. This yields a 32-dimensional feature vector
(H × S × sub-images = 4 × 2 × 4);
Color moments: a nine-dimensional feature vector is computed
for each image by taking the mean, standard deviation and
skewness of each channel of the HSV color space over
the image;
Gray level co-occurrence matrix: a 20-dimensional feature
vector for each image is computed based on the gray level
co-occurrence matrix (GLCM) [21];
Global scene (GIST): a 60-dimensional feature vector is derived
from each image according to Oliva and Torralba’s holistic
model, which tries to represent real-world scenes using a new
set of spatial envelope properties [19].
We normalized the feature vectors so as to have each
component in the range [0,1] following [22] and we used the $\ell_1$-norm
to compute the dissimilarity between images. Similarities, where
needed, have been computed using a Gaussian kernel with $\sigma = 1$.
We compared our random walker (RW) based approach
against four different methods:
Feature re-weighting (FR): a method where the importance of
the feature components that best describe the relevant images
category is emphasized [23];
Relevance score (RS): a score is computed for each image based
on the distances between the nearest non-relevant image and
the nearest relevant one [24];
Relevance score stabilized (RS-S): a variant of the relevance
score algorithm, which integrates the Bayesian query shift
framework [25];
Multiple random walk (MRW): for details see Section 4.
1 http://kdd.ics.uci.edu/databases/CorelFeatures/CorelFeatures.data.html.
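The feature normalization and Gaussian-kernel similarity described in this subsection might be computed along the following lines. Several details are assumptions, since the text does not spell them out: per-component min-max scaling stands in for the normalization of [22], the kernel is taken to be $\exp(-d^2/(2\sigma^2))$ with $d$ the $\ell_1$ distance, and self-similarities are set to zero. The function name `gaussian_similarity` and the toy features are hypothetical; NumPy is assumed.

```python
import numpy as np

def gaussian_similarity(features, sigma=1.0):
    """Min-max normalize, take l1 distances, apply a Gaussian kernel."""
    X = np.asarray(features, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)    # avoid division by zero
    X = (X - lo) / span                       # each component in [0, 1]
    # Pairwise l1 distances d_uv = sum_k |X_uk - X_vk|.
    D = np.abs(X[:, None, :] - X[None, :, :]).sum(axis=2)
    W = np.exp(-D ** 2 / (2.0 * sigma ** 2))  # assumed kernel form
    np.fill_diagonal(W, 0.0)                  # no self-loops (assumption)
    return W

# Toy features (hypothetical): three images, two components each.
W_toy = gaussian_similarity([[0.0, 0.0], [1.0, 2.0], [2.0, 4.0]])
```

The pairwise distance array has size N × N × d, which is fine for a sketch but would need chunking for large collections.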
In the case of the Wang (1000 images) and Oliva (2688 images)
datasets, we evaluated the performance of the approaches (for
each combination of feature and dataset) over 500 simulated
queries, where the query images have been randomly sampled,
while for the Caltech dataset (4920 images) we reduced the
number of queries to 100 due to its large size. We measured the
quality of the results at each feedback round in terms of precision,
which is defined as
$$\text{precision} = \frac{\text{no. of relevant retrieved images}}{\text{scope size}}$$
and we computed the average precision obtained over all the
performed queries in all settings. In our experiments, we considered scope sizes of 20, 30 and 40.
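The precision measure defined above, and its average over several queries, can be written as a short sketch. The function names `precision` and `mean_precision` and the example indices are hypothetical; only the Python standard library is used.

```python
def precision(scope, relevant):
    """Fraction of the images in the scope that are relevant."""
    if not scope:
        raise ValueError("scope must contain at least one image")
    relevant = set(relevant)
    return sum(1 for i in scope if i in relevant) / len(scope)

def mean_precision(scopes, relevant_sets):
    """Average precision over several queries at one feedback round."""
    values = [precision(s, r) for s, r in zip(scopes, relevant_sets)]
    return sum(values) / len(values)

# Toy example (hypothetical indices): a scope of four images, two relevant.
p = precision([3, 7, 9, 12], relevant={3, 9, 40})
avg = mean_precision([[1, 2], [3, 4]], [{1}, {3, 4}])
```

With a simulated user, the relevant set of a query is simply the set of images sharing the query's ground-truth category.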
At each feedback round, all the unlabeled images within the
scope were automatically labeled using the ground truth in order
to simulate the user’s feedback.
5.2. Results
In Fig. 2, we provide an example of a query result obtained by
our algorithm on the Oliva dataset with the GLCM feature. We
show the results obtained at different feedback rounds. Green
framed images are relevant ones, while red ones are non-relevant.
Our approach performs well despite the very few relevant images
retrieved by the initial k-NN search. The precision goes up to 65%
at the 3rd round of relevance feedback and reaches 95% at the 6th
round. It is worth noting that although the GLCM feature provides
a poor description of the image, our method is able to improve the
performance within a few feedback rounds. In Figs. 3–6 we also
report the results obtained on the same query by the other
competing approaches.
In Figs. 7–9, we summarize the results obtained in terms of
precision on all datasets, with all the considered features and
methods with scope sizes 20, 30 and 40, respectively. It is evident
from an inspection of all the plots that for all combinations of
datasets, features and scope sizes our method outperforms the
competitors. On the other hand, the feature re-weighting
approach turns out to be the worst method in all tests. We also
notice that it exhibits a stationary behavior after the 3rd or 4th
round of relevance feedback.
The overall results are better, as one could expect, on datasets
with narrow image domains and fewer categories, like in the case of
the Wang dataset, as opposed to larger ones like the Caltech
dataset. Indeed, on the latter, our algorithm, which achieves the
best results, never exceeds 60% precision. Moreover, the performance is
definitely affected by the choice of the image features used to
describe the whole image. This can be noticed in particular in the
Oliva dataset, where the GIST features allow our approach to
obtain very high precision scores after a few feedback rounds.
From a global perspective, it becomes apparent that approaches
to CBIR with relevance feedback based on random walks are
promising. Indeed, the MRW approach is in many cases the second
best performing approach.
5.3. Running time
In the experiments presented in the previous subsection, we
worked with dense graphs. We ran the experiments with MATLAB
on a machine equipped with 8 Intel Xeon 2.33 GHz CPUs and 8 GB
RAM. In Fig. 10, we report the average running time per round
registered by each approach on the different datasets with the
color histogram feature. Our RW method remarkably outperforms
the other random-walk-based approach, MRW, on all the datasets.
On the largest datasets (Oliva and Caltech), however, our
algorithm has a higher running time than FR, RS and RS-S.
This speed difference is justified by a significantly
higher precision, as shown in the previous subsection. On the Wang
dataset, instead, our algorithm is competitive also in terms of
running time. Note that in Fig. 10 we do not report the results
obtained for each feature, since the influence of the adopted
feature on the running time is on average negligible.
A distinct feature of our approach, and in general of random-walk-based ones, is that it works even if we render the graph G sparse. This
allows us to considerably reduce the time needed to compute the
ranking vector. We performed preliminary experiments in order to
test the gains in terms of running time and the influence that the
graph approximation has on the precision of our RW approach.
Specifically, we ran experiments on the Oliva dataset, by considering
all features, using a k-NN graph approximation with k = 20. Fig. 11
reports the results. Surprisingly, by using the sparse graph we
registered an overall increase in the average precision of our
approach, and a considerable reduction of the average running time.
Fig. 11. Comparison of the RW performance on the Oliva dataset with all features in the cases when a dense and sparse graph G is used. The sparse graph is a k-NN
approximation of the original graph G, where k = 20.
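A k-NN approximation of the similarity graph could be built as in the sketch below. The paper does not describe how its sparse graph was constructed, so the details are assumptions: each vertex keeps its k strongest edges, and an edge survives if either endpoint selects it, which keeps the weight matrix symmetric. The function name `knn_sparsify` and the toy data are hypothetical, NumPy is assumed, and the result is stored densely here for readability.

```python
import numpy as np

def knn_sparsify(W, k):
    """Keep, for every vertex, only its k strongest edges (symmetrized)."""
    W = np.array(W, dtype=float)
    np.fill_diagonal(W, 0.0)
    n = len(W)
    keep = np.zeros((n, n), dtype=bool)
    # Column indices of the k largest similarities in every row.
    top = np.argsort(-W, axis=1, kind="stable")[:, :k]
    keep[np.arange(n)[:, None], top] = True
    keep |= keep.T      # assumption: keep an edge if either end selects it
    return np.where(keep, W, 0.0)

# Toy data (hypothetical): two well-separated groups of 1-D points.
points = np.array([0.0, 1.0, 2.0, 10.0, 11.0, 12.0])
W_dense = np.exp(-np.abs(points[:, None] - points[None, :]))
W_knn = knn_sparsify(W_dense, k=2)
```

In this toy case the sparsified graph splits into two components, so the linear system stays nonsingular only if each component contains a labeled image, as required by the condition stated for system (2).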
6. Conclusions
In this paper, we proposed a novel approach to CBIR with
relevance feedback, which is based on the random walker algorithm introduced in the context of interactive image segmentation. Relevant and non-relevant images labeled by the user at
every feedback round are used as ‘‘seed’’ nodes for the random
walker problem. Each unlabeled image is finally ranked according
to the probability that a random walker starting from that image
will reach a relevant seed before encountering a non-relevant one.
Our method is easy to implement, has no parameters to tune,
and scales well to large datasets. Extensive experiments on
different real datasets with several image similarity measures
have shown the superiority of the proposed method over different
recent approaches.
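The ranking rule summarized above can be illustrated with the following minimal Python sketch. It is not the implementation evaluated in this paper: instead of solving a linear system as in the random walker algorithm [13], it relaxes the harmonic condition iteratively (Gauss-Seidel style), which converges to the same probabilities provided every unlabeled image is connected to at least one seed. The function name, the argument names, the iteration cap and the tolerance are hypothetical.

```python
def random_walker_scores(weights, relevant, non_relevant,
                         iters=1000, tol=1e-10):
    """Probability of reaching a relevant seed before a non-relevant one.

    weights: symmetric n x n list of lists of non-negative similarities.
    relevant, non_relevant: disjoint sets of seed indices.
    """
    n = len(weights)
    # Seeds act as boundary conditions: 1 for relevant, 0 for non-relevant.
    x = [0.5] * n
    for i in relevant:
        x[i] = 1.0
    for i in non_relevant:
        x[i] = 0.0
    seeds = set(relevant) | set(non_relevant)
    unlabeled = [i for i in range(n) if i not in seeds]

    for _ in range(iters):
        max_change = 0.0
        for i in unlabeled:
            degree = sum(weights[i][j] for j in range(n) if j != i)
            if degree == 0.0:
                continue  # isolated node: keep the uninformative 0.5
            # Harmonic condition: each unlabeled value equals the
            # similarity-weighted mean of its neighbours' values.
            new = sum(weights[i][j] * x[j]
                      for j in range(n) if j != i) / degree
            max_change = max(max_change, abs(new - x[i]))
            x[i] = new
        if max_change < tol:
            break
    return x
```

For a chain of four images with unit similarities between consecutive images, a relevant seed at one end and a non-relevant seed at the other, the two unlabeled images receive scores close to 2/3 and 1/3, so the image nearer to the relevant seed is ranked first.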
References
[1] J.J. Rocchio, Document retrieval systems — optimization and evaluation, Ph.D.
Thesis, Harvard Computational Laboratory, Harvard University, Cambridge, 1966.
[2] X.S. Zhou, T.S. Huang, Relevance feedback for image retrieval: a comprehensive review, Multimedia Syst. 8 (6) (2003) 536–544.
[3] Y. Rui, T.S. Huang, M. Ortega, S. Mehrotra, Relevance feedback: a power tool
for interactive content-based image retrieval, IEEE Trans. Circuits Syst. Video
Technol. 8 (5) (1998) 644–655.
[4] A. Kushki, P. Androutsos, K.N. Plataniotis, A.N. Venetsanopoulos, Query
feedback for interactive image retrieval, IEEE Trans. Circuits Syst. Video
Technol. 14 (5) (2004) 644–655.
[5] J. Fournier, M. Cord, Long-term similarity learning in content-based image
retrieval, in: International Conference on Image Processing (ICIP), 2002, pp.
441–444.
[6] M. Cord, P.H. Gosselin, Image retrieval using long-term semantic learning, in:
International Conference on Image Processing (ICIP), 2006, pp. 2909–2912.
[7] J. He, M. Li, H. Zhang, H. Tong, C. Zhang, Manifold-ranking based image
retrieval, in: International Conference on Multimedia, 2004, pp. 9–16.
[8] L. Zhang, F. Lin, B. Zhang, Support vector machine learning for image retrieval,
in: International Conference on Image Processing (ICIP), 2001, pp. 721–724.
[9] K. Tieu, P. Viola, Boosting image retrieval, in: IEEE Conference on Computer
Vision and Pattern Recognition (CVPR), vol. 1, 2000, pp. 228–235.
[10] Y. Wu, Q. Tian, T. Huang, Discriminant-EM algorithm with application to
image retrieval, in: International Conference on Image Processing (ICIP), vol. 1,
2000, pp. 155–162.
[11] J. He, H. Tong, M. Li, W.Y. Ma, C. Zhang, Multiple random walk and its
application in content-based image retrieval, in: International Workshop on
Multimedia Information Retrieval, 2005, pp. 151–158.
[12] H. Sahbi, P. Etyngier, J.Y. Audibert, R. Keriven, Manifold learning using robust
graph Laplacian for interactive image search, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2008, pp. 1–8.
[13] L. Grady, Random walks for image segmentation, IEEE Trans. Pattern Anal.
Mach. Intell. 28 (11) (2006) 1768–1783.
[14] E. Mortensen, W. Barrett, Interactive segmentation with intelligent scissors,
Graph. Mod. Image Process. 60 (5) (1998) 349–384.
[15] S. Kakutani, Markov processes and the Dirichlet problem, Proc. Jpn. Acad. 21
(21) (1945) 227–233.
[16] P. Doyle, L. Snell, Random Walks and Electric Networks, No. 22 in Carus
Mathematical Monographs, Mathematical Association of America, 1984.
[17] L. Grady, A.K. Sinop, Fast approximate random walker segmentation using
eigenvector precomputation, in: IEEE Conference on Computer Vision and
Pattern Recognition (CVPR), 2008, pp. 1–8.
[18] J.Z. Wang, J. Li, G. Wiederhold, SIMPLIcity: semantics-sensitive integrated matching for
picture LIbraries, IEEE Trans. Pattern Anal. Mach. Intell. 23 (9) (2001)
947–963.
[19] A. Oliva, A. Torralba, Modeling the shape of the scene: a holistic representation of the spatial envelope, Int. J. Comput. Vision 42 (3) (2001) 145–175.
[20] G. Griffin, A. Holub, P. Perona, Caltech-256 object category dataset, Technical
Report 7694, California Institute of Technology, 2007.
[21] R.M. Haralick, K. Shanmugam, I. Dinstein, Textural features for image
classification, IEEE Trans. Syst. Man Cybern. 3 (6) (1973) 610–621.
[22] S. Aksoy, R.M. Haralick, Feature normalization and likelihood-based similarity measures for image retrieval, Pattern Recognition Lett. 22 (5) (2001)
563–582.
[23] G. Das, S. Ray, C. Wilson, Feature re-weighting in content-based image
retrieval, in: International Conference on Image and Video Retrieval, 2006,
pp. 193–200.
[24] G. Giacinto, F. Roli, Instance-based relevance feedback for image retrieval,
in: Advances in Neural Information Processing Systems (NIPS), vol. 17, 2005, pp.
489–496.
[25] G. Giacinto, A nearest-neighbor approach to relevance feedback in content
based image retrieval, in: International Conference on Image and Video
Retrieval, 2007, pp. 456–463.
Samuel Rota Bulo received the bachelor and master degrees (both summa cum laude) in Computer Science from the ‘‘Ca’ Foscari’’ University of Venice in 2003 and
2005, respectively, and the Ph.D. degree in Computer Science in 2009. Since 2009 he has been a postdoctoral researcher in the Computer Vision and Pattern Recognition group at
‘‘Ca’ Foscari’’ University of Venice. He held research visiting positions at the University of Vienna and at the IST Technical University of Lisbon. He worked as external
collaborator with the companies ‘‘System V S.r.l’’ in Mestre, Italy, and ‘‘Softcomet S.r.l.’’ in Treviso, Italy. He published technical papers in refereed journals and conference
proceedings in the areas of computer vision, pattern recognition, optimization, stochastic modeling and game theory.
Massimo Rabbi was born on September 24, 1981 in Padua, Italy. He received both the bachelor (summa cum laude) and master (110/110) degrees in Computer Science
from the ‘‘Ca’ Foscari’’ University of Venice, respectively, in 2004 and 2010. During his studies, he worked for an Italian IT Company ‘‘Lynx S.p.a.’’, mainly as J2EE developer
but also in projects involving Eclipse RCP and plug-in technologies. He is currently working at ‘‘Finantix S.r.l’’, as a Technical Specialist, developing an Eclipse-based IDE and
dealing with frameworks like EMF/GEF/GMF. His main research interests, besides Java technologies, are computer security and forensics, networking and computer vision.
Marcello Pelillo joined the faculty of the University of Bari, Italy, in 1991 as an assistant professor of computer science. Since 1995, he has been with the University of
Venice, Italy, where he is currently a professor of Computer Science and leads the Computer Vision and Pattern Recognition group. He held visiting research positions at
Yale University, University College London, McGill University, the University of Vienna, York University (UK), and the National ICT Australia (NICTA). Prof. Pelillo has
published more than a hundred technical papers in refereed journals, handbooks, and conference proceedings in the areas of computer vision, pattern recognition and
neural computation. He has been actively involved in the organization of several scientific meetings including the NIPS 99 Workshop on ‘‘Complexity and Neural
Computation: The Average and the Worst Case,’’ the 2008 International Workshop on Computer Vision and the ICML 2010 Workshop on ‘‘Learning in non-(geo)metric
spaces.’’ In 1997, he co-established a new series of international conferences devoted to energy minimization methods in computer vision and pattern recognition
(EMMCVPR), which has now reached the seventh edition. He was a guest coeditor of four journal special issues: two for IEEE Transactions on Pattern Analysis and Machine
Intelligence and two for Pattern Recognition, the last one, in 2006, being devoted to ‘‘similarity-based pattern recognition.’’ He serves on the editorial board for the journals
IEEE Transactions on Pattern Analysis and Machine Intelligence and Pattern Recognition, and is regularly on the program committees of the major international
conferences and workshops of his fields. He is (or has been) scientific coordinator of several research projects, including SIMBAD, an EU-FP7 project devoted to similarity-based pattern analysis and recognition. Prof. Pelillo is a Fellow of the IAPR and a Senior Member of the IEEE.