A Content-Based Image Retrieval Scheme Allowing for
Robust Automatic Personalization
Sotirios Chatzis, Anastasios Doulamis, Theodora Varvarigou
Electrical and Computer Engineering Department
National Technical University of Athens
15772, Zografos, Athens, Greece
stchat@telecom.ntua.gr, adoulam@cs.ntua.gr, dora@telecom.ntua.gr

ABSTRACT
The retrieval performance of content-based image retrieval (CBIR) systems is often disappointingly low, mainly due to the subjectivity of human perception. Relevance feedback (RF) has been widely considered a powerful tool for enhancing CBIR systems by incorporating human perception subjectivity into the retrieval procedure. However, the obtained feedback logs are usually scarce and contain many outliers, undermining the effectiveness of RF adaptation. In this paper, we tackle these shortcomings by exploiting the inherent outlier-downweighting capabilities that mixtures of Student's t distributions offer. Each semantic class is modeled by a mixture of t distributions fitted to data provided by the system operators. The semantic class models are then personalized by application of a novel, efficient RF algorithm allowing for the robust adaptation of the semantic class models to the accumulated feedback of each user. The efficacy of our approach is validated through a series of experiments using objective performance criteria.

Categories and Subject Descriptors
H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval—clustering, relevance feedback, retrieval models

General Terms
Algorithms

Keywords
t distributions, mixture models, personalization

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
CIVR'07, July 9–11, 2007, Amsterdam, The Netherlands
Copyright 2007 ACM 978-1-59593-733-9/07/0007 ...$5.00.

1. INTRODUCTION

The unprecedented upsurge in multimedia databases has established multimedia information retrieval as an important research topic for many computer science communities [17]. One of the key aspects of multimedia information retrieval is content-based image retrieval (CBIR). CBIR systems face two key challenges stemming from the high-level semantic nature and the subjectivity of the way humans perceive the content of images. The first is the semantic gap between low-level visual features and high-level human perception [17]. Humans perceive the content of images in terms of high-level semantic concepts. Despite extensive efforts, formulating techniques and mathematical models that effectively extract and represent this type of information from the visual attributes of image pixels is extremely laborious, and a general-scope approach has yet to be proposed [20]. The second major challenge concerns the subjectivity of human perception. The way humans determine the content of an image is a rather nebulous procedure. Characteristic of its ill-defined nature is the fact that the same individual might perceive the same semantic entities differently at different times, let alone the case where different individuals are considered [14]. Hence, personalization is one of the most important functions in designing successful CBIR systems, providing the mechanisms to make a system adaptable to the individual perception of its users [3].

Relevance feedback [13] provides a feasible means to mitigate the semantic gap between low-level image features and high-level semantic concepts, exploiting user-provided information to create successful mappings from low-level image features to high-level semantic concepts. Furthermore, relevance feedback allows for the effective resolution of the human perception subjectivity issue, enabling the personalization of CBIR systems. Such personalization can be attained by adapting the retrieval models and criteria individually to the feedback provided by each user. Relevance feedback techniques for CBIR systems have evolved [7] from early heuristic weighting techniques [14], to optimal learning [5], and to the more recent machine learning techniques (e.g., [11, 19]).

The majority of the proposed relevance feedback techniques for CBIR systems regard the problem as a strict two-class classification problem, treating positive and negative examples equally. Although it is reasonable to assume that positive examples of a semantic class follow a
common distribution, this is not the case with negative examples, which usually belong to multiple classes. Hence, it is almost impossible to estimate the real distribution of negative images based on the relevance feedback. Extensive efforts have been made towards the attenuation of this shortcoming (e.g., [21], [4]). One recently proposed, promising alternative is Gaussian mixture modeling of the distribution of relevant images. In [11], each semantic class is represented by a Gaussian mixture model (GMM) fitted using positive examples accumulated through user feedback. In [18], negative examples are also utilized to apply a similarity metric adaptation strategy.

GMMs have been widely used by the pattern recognition and machine learning communities. Their popularity stems from the fact that they provide a sound statistical framework for the approximation of unknown, non-Gaussian distributions, including distributions with multiple modes. However, GMMs suffer from a significant drawback: the estimation of their parameters can be severely affected by the presence of outliers in the data sample used. Providing protection against outliers in multivariate data is a very difficult problem, and its difficulty increases with the dimension of the data [12]. The replacement of Gaussian distributions with the longer-tailed Student's t distributions has recently been proposed as a way to overcome these hurdles [12, 16], providing a mathematically sound mechanism for effective outlier downweighting within a well-founded statistical context.

Relevance feedback algorithms assume that their users are consistent when performing their relevance judgements. However, user consistency is extremely difficult to achieve. In general, users tend to provide limited and conflicting judgements during relevance feedback iterations, resulting in scarce feedback logs containing many outliers, which might severely affect the efficacy of a GMM-based CBIR framework. To effectively address these shortcomings, we propose in this paper the use of mixtures of t densities to represent the distributions of the images corresponding to the considered semantic classes. Further, we apply a novel relevance feedback algorithm to adapt these models to the user-provided feedback. Performing this procedure separately for each user, we achieve the robust adaptation of the retrieval procedure of our CBIR system to the perception of each user, mitigating the semantic gap issue in conjunction with the effective and efficient personalization of our CBIR system. We evaluate the efficacy of our methodology using a real-life database on the basis of objective performance criteria. The remainder of this paper is organized as follows: In Section 2, we briefly provide the theoretical background of tMMs. In Section 3, we provide a comprehensive description of the proposed CBIR framework. In Section 4, the experimental evaluation of our system is conducted and discussed. Section 5 concludes this paper.

2. THEORETICAL BACKGROUND: MIXTURE MODELS OF t DISTRIBUTIONS

2.1 t-distributed Mixture Models (tMMs)

We let X_1, ..., X_n denote a random sample of size n on a p-dimensional random vector. We assume that each observation X_j of the random sample X_1, X_2, ..., X_n is generated by a set (mixture) of t distributions, consisting of g components, with (prior) probabilities c_i (i = 1, ..., g), means \mu_i, positive definite inner product matrices \Sigma_i, and \nu_i degrees of freedom, i.e.,

    X_j \sim t(\mu_i, \Sigma_i, \nu_i) \text{ with probability } c_i    (1)

and hence

    f(x_j; \Theta) = \sum_{i=1}^{g} c_i f(x_j; \mu_i, \Sigma_i, \nu_i)    (2)

where t(\mu, \Sigma, \nu) is a t distribution with mean \mu, inner product matrix \Sigma, and \nu degrees of freedom; the pdf of t(\mu, \Sigma, \nu) is given by

    f(x_j; \mu, \Sigma, \nu) = \frac{\Gamma(\frac{\nu+p}{2}) \, |\Sigma|^{-1/2}}{(\pi\nu)^{p/2} \, \Gamma(\nu/2) \, \{1 + d(x_j, \mu; \Sigma)/\nu\}^{(\nu+p)/2}}    (3)

where d(x_j, \mu; \Sigma) is the squared Mahalanobis distance

    d(x_j, \mu; \Sigma) = (x_j - \mu)^T \Sigma^{-1} (x_j - \mu)

and \Theta comprises the c_i, the \nu_i, and the elements of \mu_i and \Sigma_i. Alternatively, using the properties of t distributions, we obtain that

    X_j | u_j \sim N(\mu_i, \Sigma_i / u_j)    (4)

where the scalar U_j is a random variable such that U_j \sim \Gamma(\nu/2, \nu/2), \Gamma(\alpha, \beta) is the gamma distribution with pdf p(u; \alpha, \beta) = \beta^{\alpha} u^{\alpha-1} e^{-\beta u} / \Gamma(\alpha), and N(\mu, \Sigma) stands for a normal distribution with mean \mu and covariance matrix \Sigma. The value u_j of the random variable U_j is used as the weighting factor of the data during the estimation procedure of the model parameters and is, hence, the factor downweighting the outliers in the model parameter estimation.

2.2 ML Estimation of tMM Parameters

The maximum likelihood (ML) treatment of a tMM comprises the calculation of the ML estimator \hat{\Theta} of the model parameter vector \Theta given a random sample of fitting data. The Expectation-Maximization (EM) algorithm is a powerful iterative procedure for carrying out the ML treatment of statistical models. The ML treatment of tMMs has been conducted by Peel et al. in [12].

The EM algorithm comprises the maximization of an intermediate quantity, the conditional expected value of the complete-data log-likelihood, given the random sample x = (x_1, ..., x_n). Here, for each j = 1, ..., n, the datum x_j is viewed as a partial observation of the "complete" data, and we let the missing data be the scalars u_1, ..., u_n, and the component-indicator vectors z_1, ..., z_n, where z_j = (z_{ij}) and z_{ij} = 1 if X_j is viewed as deriving from the i-th component density of the model, z_{ij} = 0 otherwise. Then, the complete-data log-likelihood is given by

    \log L_c(\Theta) = \sum_{i=1}^{g} \sum_{j=1}^{n} z_{ij} \log\{c_i f(x_j; \mu_i, \Sigma_i, \nu_i)\}    (5)

2.2.1 E-step

The E-step on the (k+1)-th iteration requires the calculation of the conditional expectation Q(\Theta; \Theta^{(k)}) of the complete-data log-likelihood (5) given the random sample
x, using the current estimator \Theta^{(k)} for \Theta:

    Q(\Theta; \Theta^{(k)}) = E(\log L_c(\Theta) \,|\, x; \Theta^{(k)})    (6)

As has been shown in [12], the estimation of this quantity reduces to the computation of the component-distribution membership posterior probabilities of the data

    r_{ij}^{(k)} = E(Z_{ij} \,|\, x_j; \Theta^{(k)}) = \frac{c_i^{(k)} f(x_j; \mu_i^{(k)}, \Sigma_i^{(k)}, \nu_i^{(k)})}{\sum_{h=1}^{g} c_h^{(k)} f(x_j; \mu_h^{(k)}, \Sigma_h^{(k)}, \nu_h^{(k)})}    (7)

and of the posterior expected values of the scalars U_j given the component distributions they derive from

    u_{ij}^{(k)} = E(U_j \,|\, x_j, z_{ij} = 1; \Theta^{(k)}) = \frac{\nu_i^{(k)} + p}{\nu_i^{(k)} + d(x_j, \mu_i^{(k)}; \Sigma_i^{(k)})}    (8)

2.2.2 M-step

On the M-step of the (k+1)-th iteration, the expressions of c_i^{(k+1)}, \mu_i^{(k+1)}, \nu_i^{(k+1)} and \Sigma_i^{(k+1)} are computed by maximizing Q(\Theta; \Theta^{(k)}) over each one of them. As has been shown in [12], this yields

    c_i^{(k+1)} = \sum_{j=1}^{n} r_{ij}^{(k)} / n \quad (i = 1, ..., g)    (9)

    \mu_i^{(k+1)} = \frac{\sum_{j=1}^{n} r_{ij}^{(k)} u_{ij}^{(k)} x_j}{\sum_{j=1}^{n} r_{ij}^{(k)} u_{ij}^{(k)}}    (10)

and

    \Sigma_i^{(k+1)} = \frac{\sum_{j=1}^{n} r_{ij}^{(k)} u_{ij}^{(k)} (x_j - \mu_i^{(k+1)})(x_j - \mu_i^{(k+1)})^T}{\sum_{j=1}^{n} r_{ij}^{(k)}}    (11)

while the estimator of \nu_i^{(k+1)} is the solution of the equation

    -\psi\left(\frac{\nu_i}{2}\right) + \log\left(\frac{\nu_i}{2}\right) + 1 + \frac{1}{\sum_{j=1}^{n} r_{ij}^{(k)}} \sum_{j=1}^{n} r_{ij}^{(k)} \left(\log u_{ij}^{(k)} - u_{ij}^{(k)}\right) + \psi\left(\frac{\nu_i^{(k)} + p}{2}\right) - \log\left(\frac{\nu_i^{(k)} + p}{2}\right) = 0    (12)

where \psi(s) is the digamma function, \psi(s) = \partial \log\Gamma(s) / \partial s. The solution of this equation does not exist in closed form [16]. However, a good closed-form approximation of its solution suffices for the effective and efficient estimation of a tMM. In this work we adopt a successful approximation of (12) presented in [16], which, under the assumption \nu_\eta = \nu_\zeta = \nu \; \forall \zeta \neq \eta = 1, ..., g, gives \nu as

    \nu^{(k+1)} = \frac{2}{\tau + \log\tau - 1} + 0.0416 \left(1 + \mathrm{erf}\left(0.6594 \log\frac{2.1971}{\tau + \log\tau - 1}\right)\right)    (13)

where \tau is defined as

    \tau \triangleq -\frac{1}{n} \sum_{i=1}^{g} \sum_{j=1}^{n} r_{ij}^{(k)} \left[\psi\left(\frac{\nu^{(k)} + p}{2}\right) + \log\frac{2}{\nu^{(k)} + d(x_j, \mu_i^{(k)}; \Sigma_i^{(k)})} - u_{ij}^{(k)}\right]    (14)

3. THE PROPOSED FRAMEWORK

3.1 Semantic Class Modeling and Progressive Learning Process

The images residing in the database of a CBIR system can be viewed as belonging to different semantic classes. Such semantic classes might be, for example, building, sunset, elephant, and so forth. In the proposed framework, the system operators initially define a number of semantic classes into which the images residing in the database shall be classified. Further, for each semantic class of image content, an appropriate training dataset is selected by the system operators and used to fit a tMM model, given by (2). These datasets comprise the feature vectors of images considered characteristic of each semantic class and consist of common image descriptors, such as color, texture and shape. The tMM models fitted to these data shall be referred to as the global models of the corresponding semantic classes. The estimation of the tMM model representing each semantic class is conducted under an ML framework using the EM algorithm presented in Section 2.2.

The trained tMM models are used for the classification of the images residing in our system's database into the considered semantic classes. In detail, each image residing in the database of our system is first processed, under the same procedure as the one applied to the training datasets, to extract its feature vector. Then, using the derived feature vectors, the images are classified into the considered semantic classes represented by the fitted tMMs, on the basis of a maximum a posteriori probability (MAP) classification procedure.

To conduct content-based image retrieval, the user of our system is asked to enter a query image. The system processes the query image in the same way it processes the images residing in its database to extract its feature vector, and classifies it into a semantic image class using the MAP classification methodology, as described above. Finally, the system returns to the user the top M images residing in its database that have been classified into the same semantic class as the query image, ranked in descending order of their likelihood with respect to the model of the class they belong to.

A basic aspect of the proposed CBIR system is the notion of personalization. Personalization is effected in our system by adapting the global semantic class models to the relevance feedback provided by each user. The benefits our system yields by the application of this procedure are twofold. First, the system exploits user interaction to improve its mapping of low-level visual image features to high-level semantic concepts and hence mitigate the semantic gap issue. Second, the application of the relevance feedback procedure on a per-user basis provides the effective means for the robust adaptation of the system's retrieval procedure to the individual perception of each user (system personalization).

The relevance feedback adaptation of the global models, applied to acquire the target distributions of each user, is conducted on the basis of a novel relevance feedback algorithm for tMMs, which we describe in the following subsection. The proposed relevance feedback algorithm, exploiting the merits of t distributions, offers a model-updating mechanism robust against outliers, based on a well-founded statistical concept. This way, we offer a sound mathematical framework for the attenuation of the well-known issues faced
by relevance feedback algorithms, concerning the scarcity of user feedback logs and the inconsistency of the way users provide their feedback.

3.2 Relevance Feedback Algorithm

Given a query image q, our system assigns it to a semantic image class under a MAP classification notion, as described in Section 3.1. The tMM model representing this class is further adapted to the positive examples provided by the user during the relevance feedback iterations, to acquire the user's target distribution of this semantic class. In this paper, we propose a novel algorithm for the relevance feedback adaptation of tMM models. Let us consider a tMM model representing a semantic class, given by equation (2)

    f(x_j; \Theta) = \sum_{i=1}^{g} c_i f(x_j; \mu_i, \Sigma_i, \nu_i)

with priors c_i, \nu_i degrees of freedom, means \mu_i and positive definite inner product matrices \Sigma_i. The proposed algorithm comprises the EM fitting of the model

    f(x_j; \Psi) = \sum_{i=1}^{g} c_i f(x_j; A_i \mu_i + b_i, \Sigma_i, \nu_i)    (15)

where c_i, \mu_i, \Sigma_i and \nu_i, i = 1, ..., g, are the prior probabilities, means, inner product matrices and degrees of freedom of the considered tMM model of the form (2), respectively. Considering as the complete data the observations x_j, the scalars u_1, ..., u_n, and the component-indicator vectors z_1, ..., z_n, we obtain that the complete-data log-likelihood of the model (15) is given by

    \log L_c(\Psi) = \sum_{i=1}^{g} \sum_{j=1}^{n} z_{ij} \log\{c_i f(x_j; A_i \mu_i + b_i, \Sigma_i, \nu_i)\}    (16)

where the parameter vector \Psi comprises the c_i along with the elements of the A_i and b_i, where A_i is a diagonal p x p matrix and b_i is a p x 1 vector. The estimation of \Psi is conducted by fitting the model (15) to the relevance feedback data.

Hence, we introduce an affine probabilistic model for the adaptation of the means and the priors of each component distribution of the initial tMM corresponding to a semantic class, while considering the \Sigma_i and the \nu_i fixed to their initial values. Our approach of updating only the prior probabilities and the means of the component distributions aims to allow for the robust and efficient adaptation of the initial tMMs, given the very limited number of relevance feedback data, avoiding overfitting effects. It is motivated by results from the speech processing research literature, indicating that the major discriminative information of a mixture model is retained by the mean vectors rather than the covariance matrices [9], and also by the consideration that the determination of the exact value of the component-distribution degrees of freedom is not of vital importance for the effective estimation of a tMM, especially in cases of large values for the degrees of freedom [16].

Let us suppose that the relevance feedback data provided by some user regarding a semantic class of image content comprise the feature vectors X_1, ..., X_n. Then, the proposed relevance feedback algorithm for the adaptation of the corresponding tMM model to these data comprises the EM fitting of the model (15) to the provided relevance feedback (data X_1, ..., X_n) and the subsequent replacement of the initial model (2) means and prior probabilities (mixing proportions) by the newly estimated ones.

3.2.1 E-Step

The E-step on the (k+1)-th iteration of the relevance feedback adaptation of the tMM model (2) requires the calculation of the intermediate quantity Q(\Psi; \Psi^{(k)}), where

    Q(\Psi; \Psi^{(k)}) = E(\log L_c(\Psi) \,|\, x; \Psi^{(k)})    (17)

is the conditional expectation of the complete-data log-likelihood (16) given the random sample x, using \Psi^{(k)} for \Psi. From (17) and (16) we obtain (see Appendix A) that the estimation of (17) reduces to the computation of the component-distribution membership posterior probabilities of the relevant data

    r_{ij}^{(k)} = \frac{c_i^{(k)} f(x_j; A_i^{(k)} \mu_i + b_i^{(k)}, \Sigma_i, \nu_i)}{\sum_{h=1}^{g} c_h^{(k)} f(x_j; A_h^{(k)} \mu_h + b_h^{(k)}, \Sigma_h, \nu_h)}    (18)

and of the posterior expected values of the scalars U_j of these data given the component distributions they derive from

    u_{ij}^{(k)} = \frac{\nu_i + p}{\nu_i + d(x_j, A_i^{(k)} \mu_i + b_i^{(k)}; \Sigma_i)}    (19)

3.2.2 M-Step

On the M-step of the (k+1)-th iteration of the relevance feedback adaptation, the expressions of c_i^{(k+1)}, A_i^{(k+1)} and b_i^{(k+1)} are computed by maximizing Q(\Psi; \Psi^{(k)}) over each one of them. It can be shown that this procedure yields (see Appendix B)

    c_i^{(k+1)} = \sum_{j=1}^{n} r_{ij}^{(k)} / n \quad (i = 1, ..., g)    (20)

    b_i^{(k+1)} = \frac{\sum_{j=1}^{n} r_{ij}^{(k)} u_{ij}^{(k)} (x_j - A_i^{(k)} \mu_i)}{\sum_{j=1}^{n} r_{ij}^{(k)} u_{ij}^{(k)}}    (21)

    A_i^{(k+1)} = \mathrm{diag}\left\{\left[\sum_{j=1}^{n} r_{ij}^{(k)} u_{ij}^{(k)} (x_j - b_i^{(k+1)}) \mu_i^T\right] \left[\sum_{j=1}^{n} r_{ij}^{(k)} u_{ij}^{(k)} \mu_i \mu_i^T\right]^{-1}\right\}    (22)

3.2.3 Model Parameters Update

After the convergence of the EM fitting of model (15) to the logged feedback data, the parameters of the initial tMM of type (2), modeling the semantic class under consideration, are updated as follows:

1. The component-distribution priors are updated to the newly computed ones (eq. (20)).

2. The means \mu_i are updated to \hat{A}_i \mu_i + \hat{b}_i, where \hat{A}_i and \hat{b}_i are the estimators obtained by the RF model fitting (eqs. (21), (22)).

3. All the other parameters of the initial model remain fixed to their initial values.
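The adaptation procedure of Sections 3.2.1–3.2.3 can be sketched in code. The following is a minimal illustrative implementation, not the authors' code: the function names are hypothetical, the t density is implemented directly from eq. (3), and eq. (22) is solved per dimension under the diagonal-A_i constraint (which assumes every component of \mu_i is nonzero).

```python
import numpy as np
from math import lgamma

def t_pdf(x, m, Sigma, nu):
    """Multivariate Student's t density of eq. (3), evaluated row-wise on x (n x p)."""
    p = len(m)
    diff = x - m
    d2 = np.einsum('nj,jk,nk->n', diff, np.linalg.inv(Sigma), diff)  # Mahalanobis^2
    log_const = (lgamma((nu + p) / 2) - lgamma(nu / 2)
                 - (p / 2) * np.log(np.pi * nu)
                 - 0.5 * np.log(np.linalg.det(Sigma)))
    return np.exp(log_const) * (1 + d2 / nu) ** (-(nu + p) / 2)

def rf_adapt_tmm(x, c, mu, Sigma, nu, n_iter=100, tol=1e-5):
    """EM fitting of model (15) to feedback data x (n x p): only the priors c_i
    and the affine transform (A_i, b_i) of the means are re-estimated, while
    Sigma_i and nu_i stay fixed. Returns the updated priors and means."""
    n, p = x.shape
    g = len(c)
    c = np.asarray(c, dtype=float)
    A = [np.eye(p) for _ in range(g)]        # diagonal scaling matrices
    b = [np.zeros(p) for _ in range(g)]      # offset vectors
    Sinv = [np.linalg.inv(S) for S in Sigma]
    prev_ll = -np.inf
    for _ in range(n_iter):
        dens = np.empty((n, g))
        u = np.empty((n, g))
        for i in range(g):
            m = A[i] @ mu[i] + b[i]          # adapted component mean
            dens[:, i] = c[i] * t_pdf(x, m, Sigma[i], nu[i])
            diff = x - m
            d2 = np.einsum('nj,jk,nk->n', diff, Sinv[i], diff)
            u[:, i] = (nu[i] + p) / (nu[i] + d2)      # eq. (19)
        r = dens / dens.sum(axis=1, keepdims=True)    # eq. (18)
        c = r.sum(axis=0) / n                         # eq. (20)
        for i in range(g):
            w = r[:, i] * u[:, i]
            b[i] = w @ (x - A[i] @ mu[i]) / w.sum()   # eq. (21)
            # eq. (22), solved per dimension for the diagonal A_i
            s = w @ (x - b[i])
            A[i] = np.diag(s * mu[i] / (w.sum() * mu[i] ** 2))
        ll = np.log(dens.sum(axis=1)).sum()           # observed-data log-likelihood
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return c, [A[i] @ mu[i] + b[i] for i in range(g)]
```

Note how the weights u_ij enter every M-step sum: feedback outliers receive small u_ij and are automatically downweighted, which is the robustness mechanism the t mixture provides over a GMM.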
4. EXPERIMENTAL EVALUATION

4.1 Experimental Setup

A subset of the database of the National Technical University of Athens, comprising still images in JPEG format, is used for conducting our experiments. The overall data set consists of around 5,000 images covering a wide variety of content. All data have been annotated by domain professionals and classified into 12 categories according to their content, namely: spaceships, tigers and lions, cars, dogs, fishes, cats, airplanes, buildings, boats and ships, birds, flags and coins. Fig. 1 illustrates some randomly selected images from the categories "Birds", "Cars" and "Airplanes" of our database.

Two different types of descriptors are used for the representation of the visual content of the images: global-based descriptors, referring to global visual characteristics, and object-based descriptors, exploiting region-based properties obtained by applying a segmentation algorithm. The global-based image descriptors comprise global color and texture. The color features extracted in our experiments are color moments, preferred because they are close to natural human perception; their effectiveness in CBIR has been shown in many previous research studies [18]. Three different color moments are used: color mean, color variance, and color skewness in each color channel (H, S, and V), respectively. Texture information is extracted by employing the wavelet-based texture extraction technique of [10]. The considered object-based image descriptors are extracted by conducting color segmentation using a multiresolution implementation of the Recursive Shortest Spanning Tree (RSST) algorithm, preferred due to its efficiency and low computational complexity [2]. For each color segment, the average color, size and segment location are extracted as descriptors. Each of the considered features characterizes the type of image content in a unique, powerful way. These global-based and object-based features are eventually combined by our system into a feature vector, which is normalized to a standardized normal distribution.

Initially, we use 30% of the available human-annotated images, classified into the considered semantic classes, to fit a tMM per semantic class, obtaining the global models of the considered semantic classes. Further, we classify the rest of the images of our database into the considered semantic classes using the trained tMMs under a MAP classification notion, as explained in Section 3.1. Finally, we conduct a series of relevance feedback iterations regarding each semantic class to evaluate the efficacy of the proposed relevance feedback algorithm. The assessment of the retrieval performance of the proposed system is conducted using two objective evaluation criteria: the precision-recall curve and the Average Normalized Modified Retrieval Rank (ANMRR) measure.

4.2 Objective Evaluation Criteria

4.2.1 Precision-Recall Curve

The retrieval precision Pr(q) of a system with respect to a query q is defined as the ratio of the number of retrieved relevant images, N(q), over the total number of retrieved images, M(q) [15]. Given a set of Q queries, the average retrieval precision of the system is then given by

    \bar{Pr} = \frac{1}{Q} \sum_{q=1}^{Q} \frac{N(q)}{M(q)}    (23)

On the other hand, the retrieval recall Re(q) of a system with respect to a query q is the ratio of the number of retrieved relevant images, N(q), over the total number of relevant images in the database for the respective query, G(q) [15]. Given a set of Q queries, the average retrieval recall of the system is then given by

    \bar{Re} = \frac{1}{Q} \sum_{q=1}^{Q} \frac{N(q)}{G(q)}    (24)

It is a commonplace in the information retrieval literature that, in practical content-based retrieval systems, as the number of images returned to the user increases, precision decreases while recall increases. For this reason, instead of using average precision or recall as separate performance measures for CBIR systems, the precision-recall curve is usually adopted.

4.2.2 Average Normalized Modified Retrieval Rank (ANMRR) Criterion

Another popular quantitative criterion is the ANMRR measure, derived from the MPEG-7 core experiments [1]. ANMRR estimates both the number of relevant images retrieved and their ranking among the retrievals. To define the ANMRR measure, we first have to define the Average Retrieval Rank (ARR). Given a query q, the ARR measure is defined as

    ARR(q) = \sum_{i=1}^{G(q)} \frac{r(i)}{G(q)}    (25)

where r(i), i = 1, ..., G(q), is the rank of each relevant image returned in the top M retrievals, taking the value M + 1 for all missed relevant images. The measure M is defined as M = min{4 G(q), 2 GTM}, where GTM = max{G(q)} over all Q queries submitted to the system. Then, the Modified Retrieval Rank (MRR) is defined as

    MRR(q) = ARR(q) - \frac{1}{2}(G(q) + 1)    (26)

The MRR metric is further normalized to the range [0, 1], yielding the Normalized Modified Retrieval Rank (NMRR)

    NMRR(q) = \frac{MRR(q)}{M + 0.5 - 0.5 G(q)}    (27)

Finally, the Average Normalized Modified Retrieval Rank (ANMRR) is defined as the average NMRR over the set of all Q available queries, yielding an effective overall retrieval performance criterion

    ANMRR = \frac{1}{Q} \sum_{q=1}^{Q} NMRR(q)    (28)

Low values of ANMRR denote a high retrieval rate, with the relevant images ranked at the top. On the other hand, an ANMRR value equal to one represents the worst possible retrieval performance, with none of the relevant items in the database being present in the top retrievals.
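The evaluation measures of Section 4.2 can be computed directly from ranked result lists. The following is an illustrative Python sketch (the function names are ours, not from the paper):

```python
def precision_recall(retrieved, relevant):
    """Pr(q) = N(q)/M(q) and Re(q) = N(q)/G(q) for one query (cf. eqs. (23)-(24))."""
    n_rel_ret = len(set(retrieved) & set(relevant))
    return n_rel_ret / len(retrieved), n_rel_ret / len(relevant)

def anmrr(retrievals, ground_truths):
    """ANMRR over a query set (eqs. (25)-(28)).
    retrievals: ranked result-id lists, one per query;
    ground_truths: sets of relevant ids, one per query."""
    gtm = max(len(gt) for gt in ground_truths)      # GTM = max G(q) over all queries
    nmrrs = []
    for ret, gt in zip(retrievals, ground_truths):
        g_q = len(gt)
        m = min(4 * g_q, 2 * gtm)
        # rank r(i) of each relevant image within the top-M retrievals (1-based);
        # missed relevant images are assigned rank M + 1
        top = list(ret[:m])
        ranks = [top.index(i) + 1 if i in top else m + 1 for i in gt]
        arr = sum(ranks) / g_q                      # eq. (25)
        mrr = arr - 0.5 * (g_q + 1)                 # eq. (26)
        nmrrs.append(mrr / (m + 0.5 - 0.5 * g_q))   # eq. (27)
    return sum(nmrrs) / len(nmrrs)                  # eq. (28)
```

A perfect ranking yields NMRR(q) = 0, and a ranking that misses every relevant item yields NMRR(q) = 1, matching the discussion above.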
Figure 1: Representative images of three different categories of our image database. (a) "Birds" category. (b) "Cars" category. (c) "Airplanes" category.

Figure 2: (a) Precision-Recall curves and (b) Precision values versus the number of feedback iterations, for the methods presented in [6], [8] and the proposed novel method.
tation algorithm for tMMs.
Table 1: ANMRR Measure of the Proposed Scheme
Compared With Other Works for Relevance Feedback
Relevance Feedback Algorithms ANMRR
The proposed method
0.038%
The method of [6]
0.07%
The method of [8]
0.19%
4.3
The experimental evaluation of the proposed framework,
indicates the effectiveness of our method, yielding a notably
higher retrieval performance comparing to competing CBIR
techniques.
6. REFERENCES
[1] MPEG-7 Visual Part of eXperimentation Model
Version 2.0. ISO/MPEG MPEG-7 Output Document,
1999.
[2] Y. Avrithis, N. Doulamis, A. Doulamis, and S. Kollias.
Optimization methods for key-frames and scenes
extraction. Comput. Vis. Image Understanding,
75(1/2):3–24, Jul./Aug. 1999.
[3] N. Babaguchi, K. Ohara, and T. Ogura. Effect of
personalization on retrieval and summarization of
sports video. In Joint Conference of the Fourth
International Conference on Information,
Communications and Signal Processing, and the
Fourth Pacific Rim Conference on Multimedia,
volume 2, pages 940–944, 2003.
[4] Y. Chen, X. S. Zhou, and T. Huang. One-class SVM
for learning in image retrieval. In Proc. IEEE Int’l
Conf. Image Processing, 2001, volume 1, pages 34–37,
2001.
[5] I. Cox, M. L. Miller, S. M. Omohundro, and P. N.
Yianilos. Pichunter: Bayesian relevance feedback for
image retrieval. In Proc. Int. Conf. Pattern
Recognition, 1996, volume 3, pages 362–369, 1996.
[6] A. Doulamis and N. Doulamis. Generalized nonlinear
relevance feedback for interactive content-based
retrieval and organization. IEEE Trans. Circuits Syst.
Video Technol., 14(5):656–671, 2004.
[7] T. S. Huang and X. S. Zhou. Image retrieval by
relevance feedback: from heuristic weight adjustment
to optimal learning methods. In Proc. IEEE Int’l
Conf. Image Processing, 2001, volume 3, pages 2–5,
2001.
[8] Y. Ishikawa, R. Subramanya, and C. Faloutsos.
Mindreader: Query databases through multiple
examples. In Proc. 24th VLDB Conf, pages 218–227,
1998.
[9] C. J. Leggetter and P. C. Woodland. Maximum
likelihood linear regression for speaker adaptation of
continuous density hidden markov models. Computer
Speech & Language, 9(2):171–185, 1995.
[10] B. Manjunath, P. Wu, S. Newsam, and H. Shin. A
texture descriptor for browsing and similarity retrieval.
J. Signal Processing: Image Comm., 16:33–42, 2000.
[11] P. Muneesawang and L. Guan. Image retrieval with
embeded sub-class information using Gaussian
mixture models. In Proc. Int. Conference Multimedia
and Expo, 2003, volume 1, pages 769–772, 2003.
[12] D. Peel and G. J. McLachlan. Robust mixture
modeling using the t distribution. Statistics and
Computing, 10(4):339–348, 2000.
[13] Y. Rui, T. Huang, and S. Mehrotra. Content-based
image retrieval with relevance feedback in MARS. In
Proc. IEEE Int’l Conf. Image Processing, 1997,
volume 2, pages 815–818, 1997.
[14] Y. Rui, T. Huang, M. Ortega, and S. Mehrotra.
Relevance feedback: A power tool for interactive
Performance Assessment
After obtaining the global models of the semantic classes,
we conduct a series of relevance feedback adaptation rounds
to assess the performance of the proposed relevance feedbackenhanced CBIR system, as described in section 4.1. To obtain some comparative results, we conduct the same series
of relevance feedback iterations using the methods proposed
in [6] and [8]. In Fig. 2 we illustrate the obtained precisionrecall curves as well as the precision yielded by the different
methods as a function of the number of the completed relevance feedback iterations, for recall equal to 35%. Table 1
depicts the yielded ANMRR values for the examined methods.
As we we notice, the proposed tMM-based CBIR system offers superior retrieval performance comparing to its
competitors. We also mention that the proposed relevance
feedback adaptation algorithm for tMMs achieves the rapid
enhancement of the retrieval precision of our system while
requiring a minimal user interaction, both in terms of the
number of relevance feedback iterations and the number of
the provided feedback samples, completely outperforming
its competitors.
Finally, the average required number of EM-algorithm iterations, until convergence, per relevance feedback iteration,
using a convergence threshold equal to 10−5 , is 5.03 iterations, yielding a significantly low computational complexity
for the proposed relevance feedback model adaptation algorithm.
5. CONCLUSIONS
In this work we have proposed a novel probabilistic framework for content-based image retrieval. Initially, the considered semantic classes of images are modeled using mixture models of t distributions fitted to data provided by the system operators, deriving the so-called global models of the considered semantic classes. Subsequently, a novel, efficient, and robust relevance feedback algorithm is applied to adapt the global semantic class models to the feedback provided by each user. This way, the representation of the considered semantic classes is adapted to the individual perception of each user, allowing for the effective personalization of our system's retrieval criteria with minimal user interaction.
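To illustrate the robustness mechanism underlying this framework, the following minimal sketch (illustrative only; the data and parameter values are hypothetical) computes the E-step weights E(U_j | x_j) = (ν + d)/(ν + δ_j) of a single t component, where δ_j is the squared Mahalanobis distance of x_j from the component; outlying feedback samples receive small weights and thus barely perturb the adapted model:

```python
import numpy as np

def t_weights(X, mu, Sigma, nu):
    """E-step weights of a d-variate t component with nu degrees of freedom:
    w_j = (nu + d) / (nu + delta_j), delta_j the squared Mahalanobis distance."""
    d = X.shape[1]
    diff = X - mu
    delta = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(Sigma), diff)
    return (nu + d) / (nu + delta)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))        # hypothetical feedback samples
X[0] = [25.0, -25.0]               # a gross outlier
w = t_weights(X, np.zeros(2), np.eye(2), nu=3.0)
print(w)                           # the outlier's weight is far smaller
```

Because each sample's contribution to the parameter updates is scaled by its weight, heavy-tailed t components downweight outliers automatically, with no explicit outlier-rejection step.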
The major contributions of this work are:
1. the introduction of a robust relevance feedback framework for content-based image retrieval (CBIR) which, by exploiting the inherent outlier-downweighting capabilities of mixtures of t distributions, provides an effective means of resolving the outlier-vulnerability problems that usual relevance feedback algorithms suffer from;
2. the provision of an efficient relevance feedback adaptation algorithm for tMMs.
[14] Y. Rui, T. S. Huang, M. Ortega, and S. Mehrotra. Relevance feedback: A power tool for interactive content-based image retrieval. IEEE Trans. Circuits Syst. Video Technol., 8(5):644–655, 1998.
[15] G. Salton and M. J. McGill. Introduction to Modern Information Retrieval. McGraw-Hill, New York, 1982.
[16] S. Shoham. Robust clustering by deterministic agglomeration EM of mixtures of multivariate t distributions. Pattern Recognition, 35(5):1127–1142, 2002.
[17] A. W. M. Smeulders, M. Worring, S. Santini, A. Gupta, and R. Jain. Content-based image retrieval at the end of the early years. IEEE Trans. Pattern Analysis and Machine Intelligence, 22(12):1349–1380, 2000.
[18] Z. Su, H. Zhang, S. Li, and S. Ma. Relevance feedback in content-based image retrieval: Bayesian framework, feature subspaces, and progressive learning. IEEE Trans. Image Processing, 12(8):924–937, 2003.
[19] N. Vasconcelos and A. Lippman. Learning from user feedback in image retrieval systems. In Proceedings of Neural Information Processing Systems 12, volume 1, 1999.
[20] N. Vasconcelos and A. Lippman. Statistical models of video structure for content analysis and characterization. IEEE Trans. Image Processing, 9:3–19, 2000.
[21] X. S. Zhou and T. Huang. Small sample learning during multimedia retrieval using BiasMap. In Proc. IEEE Conf. Computer Vision and Pattern Recognition, volume 1, pages 11–17, 2001.
APPENDIX
A. E-STEP
The complete data log-likelihood of model (15) can be written as
\[ \log L_c(\Psi) = \log L_{1c}(c) + \log L_{2c}(\xi) \qquad (29) \]
where
\[ \log L_{1c}(c) = \sum_{i=1}^{g} \sum_{j=1}^{n} z_{ij} \log c_i \qquad (30) \]
and
\[ \log L_{2c}(\xi) = \sum_{i=1}^{g} \sum_{j=1}^{n} z_{ij} \left\{ -\frac{u_j}{2} (x_j - A_i \mu_i - b_i)^T \Sigma_i^{-1} (x_j - A_i \mu_i - b_i) \right\} \qquad (31) \]
with \( c = (c_1, \dots, c_g) \) and \( \xi = (\xi_1, \dots, \xi_g) \), where \( \xi_i \) contains the elements of \( A_i \) and \( b_i \). This result can be easily derived by adapting the expression of the complete data log-likelihood of a mixture of t distributions, given in [12], to the context of model (15).
Thus, the conditional expectation \( Q(\Psi; \Psi^{(k)}) \) in (17) can be written as
\[ Q(\Psi; \Psi^{(k)}) = Q_1(c; \Psi^{(k)}) + Q_2(\xi; \Psi^{(k)}) \qquad (32) \]
where
\[ Q_1(c; \Psi^{(k)}) = \sum_{i=1}^{g} \sum_{j=1}^{n} r_{ij}^{(k)} \log c_i \qquad (33) \]
and, using (16) and ignoring constant terms,
\[ Q_2(\xi; \Psi^{(k)}) = \sum_{i=1}^{g} \sum_{j=1}^{n} r_{ij}^{(k)} Q_{2j}(\xi_i; \Psi^{(k)}) \qquad (34) \]
and where
\[ Q_{2j}(\xi_i; \Psi^{(k)}) = -\frac{1}{2} E\left(U_j | x_j, z_{ij}=1; \Psi^{(k)}\right) (x_j - b_i)^T \Sigma_i^{-1} (x_j - b_i) + (x_j - b_i)^T \Sigma_i^{-1} A_i \, E\left(U_j | x_j, z_{ij}=1; \Psi^{(k)}\right) \mu_i - \frac{1}{2} \operatorname{trace}\left[ A_i^T \Sigma_i^{-1} A_i \, E\left(U_j | x_j, z_{ij}=1; \Psi^{(k)}\right) \mu_i \mu_i^T \right] \qquad (35) \]

B. M-STEP
First of all, let us consider the maximization of \( Q_1(c; \Psi^{(k)}) \) under the constraint \( \sum_{i=1}^{g} c_i = 1 \). Using a Lagrange multiplier \( \lambda \) to enforce the constraint, we have
\[ \frac{\partial}{\partial c_i} \left[ Q_1 - \lambda \left( \sum_{h=1}^{g} c_h - 1 \right) \right] = \sum_{j=1}^{n} \frac{r_{ij}^{(k)}}{c_i} - \lambda = 0 \]
which gives (20).
To derive the expression of the ML estimator of \( b_i \), we have to maximize \( Q_2(\xi; \Psi^{(k)}) \) w.r.t. \( b_i \). From (34) and (35), the expression of \( Q_2(\xi; \Psi^{(k)}) \), ignoring terms not containing \( b_i \), is given by
\[ Q_2^{*}(\xi; \Psi^{(k)}) = -\frac{1}{2} \sum_{i=1}^{g} \sum_{j=1}^{n} r_{ij}^{(k)} u_{ij}^{(k)} \left[ -2 x_j^T \Sigma_i^{-1} b_i + 2 b_i^T \Sigma_i^{-1} A_i \mu_i + b_i^T \Sigma_i^{-1} b_i \right] \]
Since
\[ \frac{\partial x_j^T \Sigma_i^{-1} b_i}{\partial b_i} = \Sigma_i^{-1} x_j, \qquad \frac{\partial b_i^T \Sigma_i^{-1} b_i}{\partial b_i} = 2 \Sigma_i^{-1} b_i, \qquad \frac{\partial b_i^T \Sigma_i^{-1} A_i \mu_i}{\partial b_i} = \Sigma_i^{-1} A_i \mu_i, \]
it is easy to show that the solution of \( \partial Q_2^{*}(\xi; \Psi^{(k)}) / \partial b_i = 0 \) yields the maximizer \( b_i^{(k+1)} \), which is given by (21).
Finally, let us consider the maximization of \( Q_2(\xi; \Psi^{(k)}) \) over \( A_i \). We hence need to compute the partial derivative of \( Q_2(\xi; \Psi^{(k)}) \) with respect to \( A_i \). From (34) and (35) it follows that we have to compute
\[ \frac{\partial \left[ (x_j - b_i)^T \Sigma_i^{-1} A_i \, E\left(U_j | x_j, z_{ij}=1; \Psi^{(k)}\right) \mu_i \right]}{\partial A_i} = \Sigma_i^{-1} (x_j - b_i) \, E\left(U_j | x_j, z_{ij}=1; \Psi^{(k)}\right) \mu_i^T \qquad (36) \]
and
\[ \frac{\partial \operatorname{trace}\left[ A_i^T \Sigma_i^{-1} A_i \, E\left(U_j | x_j, z_{ij}=1; \Psi^{(k)}\right) \mu_i \mu_i^T \right]}{\partial A_i} = 2 \Sigma_i^{-1} A_i \, E\left(U_j | x_j, z_{ij}=1; \Psi^{(k)}\right) \mu_i \mu_i^T \qquad (37) \]
Using (36) and (37) and constraining the matrix \( A_i \) to be diagonal, the equation \( \partial Q_2(\xi; \Psi^{(k)}) / \partial A_i = 0 \) yields the maximizer (22).
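As an illustration of the closed-form M-step updates derived above, the following sketch implements updates in the spirit of (20)-(22) under the simplifying assumption of diagonal \( \Sigma_i \), so the \( \Sigma_i^{-1} \) factors cancel coordinate-wise; the responsibilities r_ij, weights u_ij, data, and means are hypothetical E-step outputs, not values from our experiments:

```python
import numpy as np

rng = np.random.default_rng(1)
n, g, d = 8, 2, 3
X = rng.normal(size=(n, d))        # feature vectors x_j
r = rng.random(size=(n, g))
r /= r.sum(axis=1, keepdims=True)  # responsibilities r_ij (rows sum to 1)
u = rng.random(size=(n, g)) + 0.5  # weights u_ij = E(U_j | x_j, z_ij = 1)
mu = rng.normal(size=(g, d))       # global component means mu_i
A = np.ones((g, d))                # current diagonal of A_i
ru = r * u                         # combined sample weights r_ij * u_ij

# (20): the Lagrange multiplier works out to lambda = n, giving
# c_i = (1/n) sum_j r_ij
c = r.sum(axis=0) / n

# (21): setting dQ2*/db_i = 0 gives
# b_i = sum_j r_ij u_ij (x_j - A_i mu_i) / sum_j r_ij u_ij
b = np.stack([(ru[:, i, None] * (X - A[i] * mu[i])).sum(0) / ru[:, i].sum()
              for i in range(g)])

# (22): for diagonal A_i and diagonal Sigma_i, (36)-(37) give, per coordinate l,
# a_il = sum_j r_ij u_ij (x_jl - b_il) mu_il / (sum_j r_ij u_ij * mu_il^2)
A_new = np.stack([(ru[:, i, None] * (X - b[i]) * mu[i]).sum(0)
                  / (ru[:, i].sum() * mu[i] ** 2) for i in range(g)])

print(c, b.shape, A_new.shape)
```

Note that the updates are conditional: b_i is computed with the current A_i, and A_i is then refreshed using the new b_i, as in a single EM iteration of the adaptation algorithm.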