Content-based image retrieval with relevance feedback using random walks

Samuel Rota Bulò

Pattern Recognition 44 (2011) 2109–2122

Samuel Rota Bulò, Massimo Rabbi, Marcello Pelillo
DAIS, Università Ca' Foscari Venezia, via Torino 155, 30172 Mestre Venezia, Italy

Corresponding author. E-mail addresses: srotabul@dsi.unive.it (S. Rota Bulò), mrabbi@dsi.unive.it (M. Rabbi), pelillo@dsi.unive.it (M. Pelillo).
doi:10.1016/j.patcog.2011.03.016

Article history: Received 21 December 2010; received in revised form 9 March 2011; accepted 12 March 2011; available online 21 March 2011.

Abstract

In this paper, we propose a novel approach to content-based image retrieval with relevance feedback, which is based on the random walker algorithm introduced in the context of interactive image segmentation. The idea is to treat the relevant and non-relevant images labeled by the user at every feedback round as "seed" nodes for the random walker problem. The ranking score for each unlabeled image is computed as the probability that a random walker starting from that image will reach a relevant seed before encountering a non-relevant one. Our method is easy to implement, parameter-free and scales well to large datasets. Extensive experiments on different real datasets with several image similarity measures show the superiority of our method over different recent approaches.

© 2011 Elsevier Ltd. All rights reserved.

Keywords: Random walks; Content-based image retrieval; Relevance feedback

1. Introduction

The concept of relevance feedback, developed during the 1960s to improve document retrieval processes [1], consists of using user feedback to judge the relevance of search results and thereby improve their quality through iterative steps. This technique has attracted the content-based image retrieval (CBIR) community since the early 1990s and is still an active research topic because, in contrast to text/document retrieval, judging the relevance of an image is an almost instantaneous task for a user. Moreover, by gathering feedback from the user, a CBIR system can dramatically boost its performance by reducing the gap between the high-level semantics in the user's mind and the low-level image descriptors.

Different feedback models have been proposed in the literature (see, e.g., [2] for a review): positive feedback, which allows the user to select only relevant (positive) images; positive–negative feedback, where the user can specify both relevant and non-relevant (negative) images; positive–neutral–negative feedback, where a neutral class is also added among the user's choices; and feedback with (non-)relevance degree, where the user implicitly ranks the images by specifying a degree of (non-)relevance. The new information inferred from the user can then be used within a short-term-learning or long-term-learning process. The former uses the user feedback only within the user's query context [3,4], while the latter updates the image similarities in order to benefit from the feedback in future queries [5,6].

In order to take full advantage of the additional information deriving from the user interaction, an effective learning method should be adopted to identify relevant and non-relevant images. Moreover, since the user cannot inspect all the images that the system has classified as relevant, an implicit ranking of the relevant images is necessary. The approaches that the literature offers can be divided into inductive and transductive ones according to whether unlabeled data is used in the training stage or not [7]. The inductive approaches are principally based on support vector machines (SVMs) [8] and boosting [9].
They basically solve a two-class (relevant and non-relevant) classification problem and rank the images according to the classification results. The main disadvantage of these approaches is the low accuracy caused by the small sample size. Transductive approaches overcome this problem by also exploiting the information in the unlabeled data. Among them, we find manifold-ranking-based image retrieval (MRBIR) [7], which propagates a ranking score across the unlabeled data to improve the retrieval result; Discriminant-EM [10], which constructs a generative model by using the unlabeled data to measure the relevance between query and database images; and multiple random walk (MRW) [11], which creates two generative models for the two classes of relevant and non-relevant images by means of Markov random walks. Additionally, an approach based on the graph Laplacian is proposed in [12], which learns the embedding of the manifold enclosing the dataset via a diffusion map.

In this paper, we propose a novel approach to CBIR with relevance feedback, which is inspired by the random walker algorithm for image segmentation introduced by Grady in [13]. Our approach is close in spirit to MRBIR and MRW, as it casts the CBIR problem with relevance feedback into a graph-theoretic problem, where nodes are images and image similarities represent the graph edge weights. The relevant and non-relevant images labeled by the user at every feedback round are treated as "seed" nodes for the random walker problem, and the ranking score of each unlabeled image is computed as the probability that a random walker starting from that image will reach a relevant seed before encountering a non-relevant one along the graph.
Among the positive properties of this formulation, the algorithm is parameter-free (provided that image similarities are given), easy to implement, and scales well to large datasets, as it also works with sparse graph abstractions of the data. Moreover, although the presented approach is based on the positive–negative relevance feedback model, it can be easily adapted to the other models mentioned above. Extensive experiments on different real datasets with several image similarity measures show the superiority of our method over different recent approaches.

2. Random walks for CBIR with relevance feedback

The problem of CBIR with relevance feedback can be seen as the problem of ranking a set of images so that images visually consistent with a query image appear earlier in the ordering. The first $K$ images in the ranking are presented to the user, who has the opportunity to mark them as relevant or non-relevant if not satisfied with the result. The user's feedback can then be used to bridge the semantic gap between what the user perceives as similar and what the provided low-level similarities classify as similar.

Since we will model CBIR as a graph-theoretic problem, we start by introducing some basic notions. A graph is a pair $G = (V, E)$, where $V$ is the set of vertices (nodes) and $E \subseteq V \times V$ is the set of edges, each of which connects two vertices. A weighted graph $G = (V, E, w)$ is a graph with a weight function $w : E \to \mathbb{R}_+$, which assigns a nonnegative weight to each edge in the graph. We denote by $w_{ij}$ the weight associated with edge $(i,j) \in E$. The (weighted) adjacency matrix of $G$ is given by $W = (w_{ij})$, where we assume $w_{ij} = 0$ if $(i,j) \notin E$, while the (weighted) Laplacian matrix of $G$ is given by $L = D - W$, where $D = (d_{ij})$ is a diagonal matrix with $d_{ii} = \sum_{j \in V} w_{ij}$.

Consider a CBIR problem, where $I = \{I_i\}_{i=0}^{N}$ is a set of $N+1$ images, the first of which is the query image (i.e., $I_0$), and $f$ is a "low-level" similarity measure between two images.
Each image in $I$ can be seen as a vertex of an edge-weighted graph $G = (V, E, w)$, where the edge set $E$ consists of the pairs of images for which a weight is defined and the edge weights reflect the similarities among images, i.e., $w_{uv} = f(I_u, I_v)$. The vertex set $V$ thus corresponds to the index set of $I$, so each image $I_j \in I$ is related to a vertex $j \in V$, with vertex 0 representing the query image. In the sequel, we may refer to the elements of $V$ as images.

Besides the graph $G$, which provides a static description of the problem, we have to model the information deriving from the user interaction. We formalize the user, who makes the query and is involved in the feedback rounds, as a function $C : V \to \{0,1\}$, which labels images, and thus vertices of $G$, as relevant (1) or non-relevant (0). Note that $C(0) = 1$, as the query image $I_0$ is considered relevant for the user. Moreover, let $V_L^{(r)} \subseteq V$, $r \ge 0$, be the subset of vertices that have been labeled by the user within the first $r$ feedback rounds. Note that this set is always non-empty since initially $V_L^{(0)} = \{0\}$, i.e., it contains the query image. We also make the mild assumption that after the first feedback round, i.e., for $r \ge 1$, at least one non-relevant image appears in $V_L^{(r)}$.

Fig. 1. Examples of categories from three different benchmark datasets.

Consider now a generic feedback round $r > 0$. To decide on a new ranking of the images based on the user's feedback, our image retrieval engine takes as input the graph $G$, the set of labeled vertices $V_L^{(r)}$ collected thus far and the user function $C$, which provides the label information. A new ordering is then produced by assigning a weight $x_i^{(r)}$ to each vertex $i \in V$ and by sorting the corresponding images in descending weight order. We compactly represent all the weights assigned at round $r$ by an $(N+1)$-dimensional column vector $x^{(r)}$ called the ranking vector.
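The graph construction described above can be sketched in a few lines. The following is a minimal illustration rather than the authors' implementation: it assumes NumPy, the helper names `gaussian_similarity` and `laplacian` are hypothetical, and the similarity $f$ is taken to be a Gaussian kernel of the form $\exp(-d^2 / (2\sigma^2))$ on $\ell_1$ distances. Section 5 mentions a Gaussian kernel with $\sigma = 1$ and the $\ell_1$-norm, but the exact kernel expression used here is an assumption.

```python
import numpy as np

def gaussian_similarity(features, sigma=1.0):
    """Dense similarity matrix W from row-wise feature vectors.

    Assumption: w_ij = exp(-d_ij^2 / (2 * sigma^2)), with d_ij the L1
    distance between feature vectors i and j.
    """
    X = np.asarray(features, dtype=float)
    # Pairwise L1 distances via broadcasting: shape (n, n).
    d = np.abs(X[:, None, :] - X[None, :, :]).sum(axis=2)
    W = np.exp(-d ** 2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W

def laplacian(W):
    """Weighted graph Laplacian L = D - W, with d_ii = sum_j w_ij."""
    W = np.asarray(W, dtype=float)
    return np.diag(W.sum(axis=1)) - W

# Row 0 plays the role of the query image I_0.
features = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
W = gaussian_similarity(features)
L = laplacian(W)
```

Every row of `L` sums to zero by construction, which is a quick sanity check on the result. For large datasets one would keep `W` sparse instead of dense; that variation is not shown here.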
A property that the ranking vectors $x^{(r)}$ must satisfy at every round is not to violate the user's feedback. To this end, we impose the following conditions:

(a) $0 \le x_i^{(r)} \le 1$, for all $i \in V$,
(b) $x_i^{(r)} = C(i)$, for all $i \in V_L^{(r)}$,

which guarantee that labeled relevant images will always be top ranked, while labeled non-relevant ones will always be bottom ranked. Indeed, $x_i = 1$ in the former case, while $x_i = 0$ in the latter. It is worth noting that we are not interested in providing a relative ranking of the labeled relevant images, which are considered of equal importance for the user, and therefore they all have the same weight.

Our approach to CBIR with relevance feedback is based on the idea of interpreting similarity as an indicator that two images should be close within the ranking. In terms of the ranking vector, this means that similar images will have similar weights, while dissimilar ones may have different weights. According to this intuition, and keeping conditions (a) and (b) in mind, the solution to our problem at feedback round $r$ can be found by solving the following convex optimization problem:

$$x^{(r)} = \arg\min_{x} \sum_{(i,j) \in E} (x_i - x_j)^2 \, w_{ij} \quad \text{subject to conditions (a) and (b)}.$$

Note that each term of the energy function encodes the cost of putting two images apart in the ordering: the higher the similarity of the two images, the higher this cost. Hence, similar images are forced to be close in the ranking. The constraint set instead guarantees that the ranking vector will not violate the user's feedback. Note also that condition (a) can be omitted, because at the minimizer each unlabeled weight is a weighted average of its neighbors' weights, so all weights automatically lie in the interval $[0,1]$. Therefore, by removing condition (a) and rewriting the energy function in matrix form, our optimization problem becomes simply

$$x^{(r)} = \arg\min_{x} \; x^\top L x \quad \text{subject to } x_i = C(i) \text{ for all } i \in V_L^{(r)}, \tag{1}$$

where $L$ is the Laplacian matrix of $G$.

Fig. 2. Query example using our random walker algorithm on the Oliva dataset with GLCM feature: green framed images are relevant images, while red framed are non-relevant ones. (a) The query image used. (b) Results obtained after the initial k-NN execution. (c–e) Results obtained after different feedback rounds. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Note that the constraint set can be completely removed by substituting the fixed components of the ranking vector into the energy function. This can be easily seen if we reorder the vertex set so as to have $x^\top = [x_U^\top, x_M^\top]$, where $x_U$ is the vector with the unknown ranking weights of the unlabeled images, while $x_M$ is the vector with the fixed ranking weights of the images marked by the user. Similarly, the Laplacian matrix $L$ can be block-structured as follows:

$$L = \begin{bmatrix} L_{UU} & L_{UM} \\ L_{MU} & L_{MM} \end{bmatrix}.$$

Then, the optimization problem in (1) becomes

$$x_U^{(r)} = \arg\min_{x_U} \begin{bmatrix} x_U^\top & x_M^\top \end{bmatrix} \begin{bmatrix} L_{UU} & L_{UM} \\ L_{MU} & L_{MM} \end{bmatrix} \begin{bmatrix} x_U \\ x_M \end{bmatrix}.$$

Differentiating with respect to $x_U$ and finding the critical point yields the following system of linear equations in the unknowns $x_U$:

$$L_{UU} \, x_U = -L_{UM} \, x_M, \tag{2}$$

which is nonsingular if the graph is connected or if every connected component contains a labeled image [14]. The solution of the ranking problem at each feedback round is thus obtained by solving a simple system of linear equations. Moreover, if we force the graph $G$ to be sparse, by considering for instance a k-nearest neighbor (k-NN) approximation, the solution can be computed very efficiently, thus allowing our method to scale to large datasets.

The formulation we obtain per feedback round is equivalent to the random walker algorithm introduced by Grady for interactive image segmentation [13], which is the problem of segmenting an image into regions using seeds provided by the user. The focus, however, is different: interactive segmentation is, in its simplest form, a two-class classification problem, typically with a one-shot user interaction and no ranking involved, while in our case we aim at obtaining a ranking of the images using multiple user interactions.

Fig. 3. Query example using the feature re-weighting algorithm on the Oliva dataset with GLCM feature: green framed images are relevant images, while red framed are non-relevant ones. (a) The query image used. (b) Initial results obtained by the algorithm. (c–e) Results obtained after different feedback rounds. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

The ranking vector $x$ found as the solution of (1) has an interesting interpretation in terms of the theory of Markov random walks. Indeed, each component $x_i$ is the probability that a random walker starting from vertex $i$ of $G$ will reach a relevant image before encountering a non-relevant one [15,16]. We refer to [13] for a description of other connections to discrete potential theory and the combinatorial Dirichlet problem.

Although the presented theory assumes a positive–negative feedback model, it is straightforward to generalize it to other models, like the positive–neutral–negative model or the feedback model with relevance degree, by simply replacing the user function $C$. In the positive–neutral–negative case the range of $C$ would be $\{0, 0.5, 1\}$, with 0.5 being the score associated with a neutral judgment, while we may have a continuous interval $[0,1]$ (or a quantization of it if discrete values are preferred) for models where the user may specify a relevance degree. Again by simply replacing $C$, we may even design a user function that simulates a "hesitating" user, who may change opinion about previously provided feedback.

3. The algorithm

In this section, we summarize our CBIR engine with relevance feedback. The pseudocode of our approach is presented in Algorithm 1.
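As a concrete illustration of the per-round computation from Section 2, the sketch below solves system (2) with a dense direct solver. It assumes NumPy; the function name `random_walker_ranking` and the dictionary encoding of the user function $C$ are hypothetical choices made for this example, not part of the original method description.

```python
import numpy as np

def random_walker_ranking(W, labels):
    """Ranking vector x minimizing x^T L x with x_i fixed on labeled vertices.

    W      : (n, n) symmetric, nonnegative similarity matrix.
    labels : dict {vertex index: value in [0, 1]}, e.g. 1 for relevant and
             0 for non-relevant (hypothetical encoding of the user function).
    """
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    L = np.diag(W.sum(axis=1)) - W          # Laplacian L = D - W

    keys = sorted(labels)
    marked = np.array(keys, dtype=int)      # indices of x_M
    unlabeled = np.setdiff1d(np.arange(n), marked)  # indices of x_U

    x = np.zeros(n)
    x[marked] = [labels[k] for k in keys]   # fixed components x_M
    if unlabeled.size:
        L_UU = L[np.ix_(unlabeled, unlabeled)]
        L_UM = L[np.ix_(unlabeled, marked)]
        # System (2): L_UU x_U = -L_UM x_M
        x[unlabeled] = np.linalg.solve(L_UU, -L_UM @ x[marked])
    return x

# Path graph 0 - 1 - 2 - 3 with unit weights; vertex 0 is relevant,
# vertex 3 is non-relevant.
W = np.zeros((4, 4))
for i in range(3):
    W[i, i + 1] = W[i + 1, i] = 1.0
x = random_walker_ranking(W, {0: 1.0, 3: 0.0})
```

On this path graph the scores decrease linearly from the relevant seed to the non-relevant one (1, 2/3, 1/3, 0), matching the random-walk interpretation. Passing `{0: 1.0, 3: 0.5}` instead mimics a neutral judgment on vertex 3 in the positive–neutral–negative model.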
Our method takes as input the graph $G$ abstracting the CBIR problem, where vertex $0 \in V$ is assumed to be the query image, the user function $C$, which encodes the user's feedback, and a scope size $K$, which represents the number of images that should be presented to the user at each feedback round.

Fig. 4. Query example using the relevance score algorithm on the Oliva dataset with GLCM feature: green framed images are relevant images, while red framed are non-relevant ones. (a) The query image used. (b) Initial results obtained by the algorithm. (c–e) Results obtained after different feedback rounds. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Algorithm 1. Random walker for CBIR with relevance feedback.
Require: graph $G = (V, E, w)$, user $C$, scope size $K$
 1: {Initialization}
 2: $r \leftarrow 0$
 3: $V_L^{(0)} \leftarrow \{0\}$
 4: $S \leftarrow$ get the first $K$ closest images to the query image
 5: {Present images indexed by $S$ to the user}
 6: while user is not satisfied with $S$ do
 7:   $r \leftarrow r + 1$
 8:   {$r$th feedback round}
 9:   $V_L^{(r)} \leftarrow V_L^{(r-1)} \cup S$
10:   $x^{(r)} \leftarrow$ compute the solution of (1) using $V_L^{(r)}$ and $C$
11:   $S \leftarrow$ get $K$ top ranked vertices according to ranking vector $x^{(r)}$
12:   {Present images indexed by $S$ to the user}
13: end while

At lines 1–3 we set up the system by setting the round counter $r$ to zero and by initializing the set of labeled images $V_L^{(0)}$ to the singleton $\{0\}$ containing the query image. Since our method requires at least one non-relevant image to be specified, we can either force the image having the lowest similarity to the query image to be non-relevant, or we can present the user with the $K$ images that are most similar to the query image. At line 4 we opt for the latter solution, although the former may work as well, and we store in the scope $S$ the $K$ images that will then be presented to the user for feedback.
At line 6 we enter a loop of relevance feedback rounds, which is interrupted as soon as the user is satisfied with the result. We assume implicit user satisfaction if all the images in the scope are considered relevant by the user, which formally happens when $C(i) = 1$ for all $i \in S$. Lines 7–12 carry out a new relevance feedback round: we increment the round counter and update the set of labeled images with all those in the scope $S$. Note that at any moment we can get the user feedback on each image in $V_L$ through the user function $C$. At line 10 we compute the ranking vector as the solution of (1), which involves solving the system of linear equations (2). The $K$ vertices with the highest scores in the ranking vector are then stored in the scope $S$ and presented to the user for a new feedback round.

Fig. 5. Query example using the relevance score stabilized algorithm on the Oliva dataset with GLCM feature: green framed images are relevant images, while red framed are non-relevant ones. (a) The query image used. (b) Initial results obtained by the algorithm. (c–e) Results obtained after different feedback rounds. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

The proposed algorithm is simple and can be easily implemented. Moreover, there is no parameter to tune. Note also that, as previously pointed out, the per-round complexity of the algorithm is determined by line 10, which involves solving a (possibly sparse) linear system of $N$ equations. With direct solvers, the complexity of this task is in general $O(N^3)$ for a dense Laplacian matrix and $O(N^2)$ for a sparse one. However, we are not interested in finding an exact solution of (2); we only want to discover the relative ordering of the components of the solution.
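Algorithm 1 can be simulated end to end with a ground-truth oracle standing in for the user, in the spirit of the simulated queries of Section 5. The sketch below is only an illustration under several assumptions that are not in the original text: it uses NumPy and a dense direct solve, the names `solve_ranking`, `top_k` and `relevance_feedback` are hypothetical, the query vertex 0 is excluded from every scope, and a `max_rounds` cap is added so that the loop always terminates.

```python
import numpy as np

def solve_ranking(W, labels):
    """Solve L_UU x_U = -L_UM x_M (system (2)) for labels {vertex: value}."""
    n = W.shape[0]
    L = np.diag(W.sum(axis=1)) - W
    keys = sorted(labels)
    marked = np.array(keys, dtype=int)
    unlabeled = np.setdiff1d(np.arange(n), marked)
    x = np.zeros(n)
    x[marked] = [labels[k] for k in keys]
    if unlabeled.size:
        A = L[np.ix_(unlabeled, unlabeled)]
        b = -L[np.ix_(unlabeled, marked)] @ x[marked]
        x[unlabeled] = np.linalg.solve(A, b)
    return x

def top_k(scores, K):
    """Indices of the K highest scores, skipping query vertex 0 (assumption)."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    return [int(i) for i in order if i != 0][:K]

def relevance_feedback(W, truth, K, max_rounds=10):
    """Simulate Algorithm 1; truth[i] is 1 if image i is relevant, else 0.

    Returns the precision after the initial search and after each round,
    together with the final scope.
    """
    W = np.asarray(W, dtype=float)
    labeled = {0: 1.0}                        # lines 2-3: V_L^(0) = {0}
    scope = top_k(W[0], K)                    # line 4: K most similar images
    precisions = [sum(truth[i] for i in scope) / K]
    for _ in range(max_rounds):               # line 6, with an added cap
        if all(truth[i] == 1 for i in scope):
            break                             # implicit user satisfaction
        for i in scope:                       # line 9: label the scope
            labeled[i] = float(truth[i])
        x = solve_ranking(W, labeled)         # line 10
        scope = top_k(x, K)                   # line 11
        precisions.append(sum(truth[i] for i in scope) / K)
    return precisions, scope
```

A typical call is `relevance_feedback(W, truth, K=20)`, where `W` is a similarity matrix whose first row corresponds to the query. Precision here follows the definition used in the experiments: the number of relevant images in the scope divided by the scope size.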
Iterative methods therefore become appealing, because they smoothly approach a solution and can be stopped before convergence. Additionally, the ranking vector obtained in one round can be used to initialize the iterative solver in the next one. This can reduce the computational cost by up to an order of magnitude. We note finally that the running time of our approach can be further reduced by adopting eigenvector precomputation techniques, as described in [17].

Fig. 6. Query example using the multiple random walk algorithm on the Oliva dataset with GLCM feature: green framed images are relevant images, while red framed are non-relevant ones. (a) The query image used. (b) Initial results obtained by the algorithm. (c–e) Results obtained after different feedback rounds. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 7. Plots of the average precision of different CBIR approaches on the Wang, Oliva and Caltech datasets with different image features and scope size 20.

Fig. 8. Plots of the average precision of different CBIR approaches on the Wang, Oliva and Caltech datasets with different image features and scope size 30.

Fig. 9. Plots of the average precision of different CBIR approaches on the Wang, Oliva and Caltech datasets with different image features and scope size 40.

Fig. 10. Average running time per round for the Wang, Oliva and Caltech datasets with the color histogram feature. (Three panels, one per dataset; horizontal axis: feedback rounds, 0 to 8; vertical axis: average time per round, logarithmic scale.)

4. Related works

Approaching the CBIR problem from a graph-theoretic perspective that directly or indirectly involves Markov random walks has already been done in the past, but in a different way. The manifold-ranking algorithm (MRBIR) proposed by He et al. [7] explores the relationship among all images in the database and measures the relevance between them and a query image accordingly. This transductive approach represents the images in the database as the vertices of a weighted graph. The user's relevance feedback is used to generate labeled examples that help in propagating a ranking score to each image. The MRBIR framework works with the only-positive as well as the positive–negative feedback model.

A further development of MRBIR by the same authors led to the multiple random walk (MRW) approach [11], which is also one of the methods we compared against in our experiments. The authors' idea is to use two Markov random walks to compute the likelihoods for an image to belong to the relevant and non-relevant classes. These likelihoods are estimated from the stationary distributions of two Markov chains built upon the original graph of images with an enlarged set of vertices, which includes two additional (positive and negative) absorbing boundaries. These estimates are then refined by adopting an EM-like procedure. As opposed to our approach, which is parameter-free, MRW depends on a parameter $a$, which must be suitably tuned. Moreover, the EM-like refinement process requires a number of iterative steps that must be estimated in advance. Similar parameters can also be found in the previous MRBIR algorithm.
Finally, in [12] an approach has been proposed which is based on graph Laplacian and allows to learn the embedding of the manifold enclosing the dataset via diffusion map. The solution of the ranking problem derives from an unconstrained minimization problem, where the cost function is composed by a Laplacian term governing the diffusion process and a regularization term aimed at moving the solution towards the user’s preferences. In contrast to this formulation, our method consists of a constrained minimization problem, which can be seen as a limit case of the one in [12]. Indeed, the regularizing term is replaced by constraints, which force the solution not to violate the user’s feedback. 5. Experiments We performed extensive experiments on real datasets with different image similarity measures and compared against four recent algorithms for CBIR with relevance feedback. For all datasets, we computed image similarities based on the Corel Image Features.1 As for the Oliva dataset, we considered one additional feature, which has been introduced by the same authors of this dataset. Summarizing, the following features have been considered in our experiments: Color histogram: the HSV color space is divided into 32 We normalized the feature vectors in a way as to have each component in the range [0,1] following [22] and we used ‘1 -norm to compute the dissimilarity between images. Similarities, where needed, have been computed using a Gaussian kernel with s ¼ 1. We compared our random walker (RW) based approach against four different methods: Feature re-weighting (FR): a method where the importance of 5.1. Experimental setting In our experiments we used three different datasets. The ﬁrst dataset is the Wang dataset [18], which is a subset of the known Corel dataset consisting of 1000 images grouped into 10 categories (100 images per category). The second dataset is the Oliva dataset [19], which encompasses 2688 images divided into eight categories. 
The last dataset is a subset of the Caltech-256 database [20], including 4920 images divided into 43 categories. The datasets are heterogeneous as they have different sizes and cover different image domains as can be seen in Fig. 1. subspaces (32 colors: eight ranges of H and four of S). The density of each color in the image provides the values for a 32-dimensional feature vector; Color histogram layout: each image is partitioned into four sub-images and a color histogram 4 2 is computed for each sub-image. This yields a 32-dimensional feature vector (H S sub-images ¼ 4 2 4); Color moments: a nine-dimensional feature vector is computed for each image by taking the mean, standard deviation and skewness of each channel of the HSV color space over the image; Gray level co-occurrence matrix: a 20-dimensional feature vector for each image is computed based on the gray level co-occurrence matrix (GLCM) [21]; Global scene (GIST): a 60-dimensional feature vector is derived from each image according to Oliva and Torralba’s holistic model, which tries to represent real-world scenes using a new set of spatial envelope properties [19]. the feature components that best describe the relevant images category is emphasized [23]; Relevance score (RS): a score is computed for each image based on the distances between the nearest non-relevant image and the nearest relevant one [24]; Relevance score stabilized (RS-S): a variant of the relevance score algorithm, which integrates the Bayesian query shift framework [25]; Multiple random walk (MRW): for details see Section 4. 1 http://kdd.ics.uci.edu/databases/CorelFeatures/CorelFeatures.data.html. 2120 S. Rota Bulo et al. 
/ Pattern Recognition 44 (2011) 2109–2122 In the case of the Wang (1000 images) and Oliva (2688 images) datasets, we evaluated the performances of the approaches (for each combination of feature and dataset) over 500 simulated queries, where the query images have been randomly sampled, while for the Caltech dataset (4920 images) we reduced the number of queries to 100 due to its large size. We measured the quality of the results at each feedback round in terms of precision, which is deﬁned as precision ¼ no: of relevant retrieved images scope size and we computed the average precisions obtained over all the performed queries in all settings. In our experiments, we considered scope sizes of 20, 30 and 40. Fig. 11. Comparison of the RW performance on the Oliva dataset with all features in the cases when a dense and sparse graph G is used. The sparse graph is a k-NN approximation of the original graph G, where k¼ 20. S. Rota Bulo et al. / Pattern Recognition 44 (2011) 2109–2122 At each feedback round, all the unlabeled images within the scope were automatically labeled using the ground truth in order to simulate the user’s feedback. 2121 reports the results. Surprisingly, by using the sparse graph we registered an overall increase in the average precision of our approach, and a considerable reduction of the average running time. 5.2. Results In Fig. 2, we provide an example of a query result obtained by our algorithm on the Oliva dataset with the GLCM feature. We show the results obtained at different feedback rounds. Green framed images are relevant ones, while red ones are non-relevant. Our approach performs well despite the very few relevant images retrieved by the initial k-NN search. The precision goes up to 65% at the 3rd round of relevance feedback and reaches 95% at the 6th round. It is worth noting that although the GLCM feature provides a poor description of the image, our method is able to improve the performance within few feedback rounds. In Figs. 
3–6 we report also the results obtained on the same query by the other competing approaches. In Figs. 7–9, we summarize the results obtained in terms of precision on all datasets, with all the considered features and methods with scope sizes 20, 30 and 40, respectively. It is evident from an inspection of all the plots that for all combinations of datasets, features and scope sizes our method outperforms the competitors. On the other hand, the feature re-weighting approach turns out to be the worst method in all tests. We also notice that it exhibits a stationary behavior after the 3rd or 4th round of relevance feedback. The overall results are better, as one could expect, on datasets with narrow image domains and less categories, like in the case of the Wang dataset, as opposed to larger ones like the Caltech dataset. Indeed, our algorithm, which achieves the best results, never exceeds 60% of precision. Moreover, the performances are deﬁnitely affected by the choice of the image features used to describe the whole image. This can be noticed in particular in the Oliva dataset, where the GIST features allow our approach to obtain very high precision scores after few feedback rounds. From a global perspective, it becomes apparent that approaches to CBIR with relevance feedback based on random walks are promising. Indeed, the MRW approach is in many cases the second best performing approach. 5.3. Running time In the experiments presented in the previous subsection, we worked with dense graphs. We run the experiments with MatLab on a machine equipped with 8 Intel Xeon 2.33 GHz CPUs and 8 GB RAM. In Fig. 10, we report the average running time per round registered by each approach on the different datasets with the color histogram feature. Our RW method outperforms remarkably the other random-walk-based approach M-RW on all the datasets. Speciﬁcally, on the largest datasets (Oliva and Caltech) our algorithm yields higher running time compared to FR, RS, RS-S. 
However, this speed difference is justified by a significantly higher precision, as shown in the previous section. On the Wang dataset, instead, our algorithm is also competitive in terms of running time. Note that in Fig. 10 we do not report the results obtained for each feature, since the influence of the adopted feature on the running time is on average negligible.

A distinctive feature of our approach, and of random-walk-based ones in general, is that it works even if we render the graph G sparse. This allows us to considerably reduce the time needed to compute the ranking vector. We performed preliminary experiments to test the gains in running time and the influence that the graph approximation has on the precision of our RW approach. Specifically, we ran experiments on the Oliva dataset, considering all features, using a k-NN graph approximation with k = 20. Fig. 11

6. Conclusions

In this paper, we proposed a novel approach to CBIR with relevance feedback, which is based on the random walker algorithm introduced in the context of interactive image segmentation. Relevant and non-relevant images labeled by the user at every feedback round are used as “seed” nodes for the random walker problem. Each unlabeled image is finally ranked according to the probability that a random walker starting from that image will reach a relevant seed before encountering a non-relevant one. Our method is easy to implement, has no parameters to tune and scales well to large datasets. Extensive experiments on different real datasets with several image similarity measures have shown the superiority of the proposed method over different recent approaches.

References

[1] J.J. Rocchio, Document retrieval systems – optimization and evaluation, Ph.D. Thesis, Harvard Computational Laboratory, Harvard University, Cambridge, 1966.
[2] X.S. Zhou, T.S. Huang, Relevance feedback for image retrieval: a comprehensive review, Multimedia Syst. 8 (6) (2003) 536–544.
[3] Y. Rui, T.S. Huang, M. Ortega, S. Mehrotra, Relevance feedback: a power tool for interactive content-based image retrieval, IEEE Trans. Circuits Syst. Video Technol. 8 (5) (1998) 644–655.
[4] A. Kushki, P. Androutsos, K.N. Plataniotis, A.N. Venetsanopoulos, Query feedback for interactive image retrieval, IEEE Trans. Circuits Syst. Video Technol. 14 (5) (2004) 644–655.
[5] J. Fournier, M. Cord, Long-term similarity learning in content-based image retrieval, in: International Conference on Image Processing (ICIP), 2002, pp. 441–444.
[6] M. Cord, P.H. Gosselin, Image retrieval using long-term semantic learning, in: International Conference on Image Processing (ICIP), 2006, pp. 2909–2912.
[7] J. He, M. Li, H. Zhang, H. Tong, C. Zhang, Manifold-ranking based image retrieval, in: International Conference on Multimedia, 2004, pp. 9–16.
[8] L. Zhang, F. Lin, B. Zhang, Support vector machine learning for image retrieval, in: International Conference on Image Processing (ICIP), 2001, pp. 721–724.
[9] K. Tieu, P. Viola, Boosting image retrieval, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), vol. 1, 2000, pp. 228–235.
[10] Y. Wu, Q. Tian, T. Huang, Discriminant-EM algorithm with application to image retrieval, in: International Conference on Image Processing (ICIP), vol. 1, 2000, pp. 155–162.
[11] J. He, H. Tong, M. Li, W.Y. Ma, C. Zhang, Multiple random walk and its application in content-based image retrieval, in: International Workshop on Multimedia Information Retrieval, 2005, pp. 151–158.
[12] H. Sahbi, P. Etyngier, J.Y. Audibert, R. Keriven, Manifold learning using robust graph Laplacian for interactive image search, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2008, pp. 1–8.
[13] L. Grady, Random walks for image segmentation, IEEE Trans. Pattern Anal. Mach. Intell. 28 (11) (2006) 1768–1783.
[14] E. Mortensen, W. Barrett, Interactive segmentation with intelligent scissors, Graph. Mod. Image Process. 60 (5) (1998) 349–384.
[15] S. Kakutani, Markov processes and the Dirichlet problem, Proc. Jpn. Acad. 21 (21) (1945) 227–233.
[16] P. Doyle, L. Snell, Random Walks and Electric Networks, No. 22 in Carus Mathematical Monographs, Mathematical Association of America, 1984.
[17] L. Grady, A.K. Sinop, Fast approximate random walker segmentation using eigenvector precomputation, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2008, pp. 1–8.
[18] J.Z. Wang, J. Li, G. Wiederhold, SIMPLIcity: semantics-sensitive integrated matching for picture LIbraries, IEEE Trans. Pattern Anal. Mach. Intell. 23 (9) (2001) 947–963.
[19] A. Oliva, A. Torralba, Modeling the shape of the scene: a holistic representation of the spatial envelope, Int. J. Comput. Vision 42 (3) (2001) 145–175.
[20] G. Griffin, A. Holub, P. Perona, Caltech-256 object category dataset, Technical Report 7694, California Institute of Technology, 2007.
[21] R.M. Haralick, K. Shanmugam, I. Dinstein, Textural features for image classification, IEEE Trans. Syst. Man Cybern. 3 (6) (1973) 610–621.
[22] S. Aksoy, R.M. Haralick, Feature normalization and likelihood-based similarity measures for image retrieval, Pattern Recognition Lett. 22 (5) (2001) 563–582.
[23] G. Das, S. Ray, C. Wilson, Feature re-weighting in content-based image retrieval, in: International Conference on Image and Video Retrieval, 2006, pp. 193–200.
[24] G. Giacinto, F. Roli, Instance-based relevance feedback for image retrieval, in: Advances in Neural Information Processing Systems (NIPS), vol. 17, 2005, pp. 489–496.
[25] G. Giacinto, A nearest-neighbor approach to relevance feedback in content based image retrieval, in: International Conference on Image and Video Retrieval, 2007, pp. 456–463.

Samuel Rota Bulò received the bachelor and master degrees (both summa cum laude) in Computer Science from the “Ca’ Foscari” University of Venice in 2003 and 2005, respectively, and the Ph.D.
degree in Computer Science in 2009. Since 2009 he has been a postdoctoral researcher in the Computer Vision and Pattern Recognition group at “Ca’ Foscari” University of Venice. He held visiting research positions at the University of Vienna and at the IST Technical University of Lisbon. He worked as an external collaborator with the companies “System V S.r.l” in Mestre, Italy, and “Softcomet S.r.l.” in Treviso, Italy. He has published technical papers in refereed journals and conference proceedings in the areas of computer vision, pattern recognition, optimization, stochastic modeling and game theory.

Massimo Rabbi was born on September 24, 1981 in Padua, Italy. He received both the bachelor (summa cum laude) and master (110/110) degrees in Computer Science from the “Ca’ Foscari” University of Venice, in 2004 and 2010, respectively. During his studies, he worked for the Italian IT company “Lynx S.p.a.”, mainly as a J2EE developer but also on projects involving Eclipse RCP and plug-in technologies. He is currently working at “Finantix S.r.l” as a Technical Specialist, developing an Eclipse-based IDE and dealing with frameworks such as EMF/GEF/GMF. His main research interests, besides Java technologies, are computer security and forensics, networking and computer vision.

Marcello Pelillo joined the faculty of the University of Bari, Italy, in 1991 as an assistant professor of computer science. Since 1995, he has been with the University of Venice, Italy, where he is currently a professor of Computer Science and leads the Computer Vision and Pattern Recognition group. He held visiting research positions at Yale University, University College London, McGill University, the University of Vienna, York University (UK), and National ICT Australia (NICTA). Prof. Pelillo has published more than a hundred technical papers in refereed journals, handbooks, and conference proceedings in the areas of computer vision, pattern recognition and neural computation.
He has been actively involved in the organization of several scientific meetings, including the NIPS 99 Workshop on “Complexity and Neural Computation: The Average and the Worst Case,” the 2008 International Workshop on Computer Vision and the ICML 2010 Workshop on “Learning in non-(geo)metric spaces.” In 1997, he co-established a new series of international conferences devoted to energy minimization methods in computer vision and pattern recognition (EMMCVPR), which has now reached its seventh edition. He was a guest co-editor of four journal special issues: two for IEEE Transactions on Pattern Analysis and Machine Intelligence and two for Pattern Recognition, the last one, in 2006, being devoted to “similarity-based pattern recognition.” He serves on the editorial boards of the journals IEEE Transactions on Pattern Analysis and Machine Intelligence and Pattern Recognition, and is regularly on the program committees of the major international conferences and workshops in his fields. He is (or has been) scientific coordinator of several research projects, including SIMBAD, an EU-FP7 project devoted to similarity-based pattern analysis and recognition. Prof. Pelillo is a Fellow of the IAPR and a Senior Member of the IEEE.
