SEMCOG: An Object-based Image Retrieval System and Its Visual Query Interface
Wen-Syan Li, K. Selçuk Candan†, Kyoji Hirata, Yoshinori Hara
C&C Research Laboratories, NEC USA, Inc.
110 Rio Robles, San Jose, CA 95134
Email: {wen,candan,hirata,hara}@ccrl.sj.nec.com
1 Introduction
Image retrieval is a key issue in many image database applications. Major approaches include query by image example (the cognition-based approach) and query by semantics (the semantics-based approach).
The semantics-based approach is good at image retrieval based on image semantics. However, this approach has low visual expressive capability: since images are visual, it is hard to describe them in detail using text alone.
The cognition-based approach has the advantage of using visual examples. However, one disadvantage of using the cognition-based approach alone is its low precision. This is because users' drawings are usually not precise enough and current image recognition techniques still have their limits. For example, if a user wants to retrieve images containing a dog and provides a drawing of a dog, current cognition-based systems can only return many images containing animals that look like dogs, such as cats and tigers. Another weakness of this approach is that it cannot support queries on generalized concepts, such as transportation and appliances. For example, if a user wants to retrieve all images containing any type of appliance, the user must provide drawings of all possible appliances, such as TVs, radios, and computers. This is not practical.
As to query specification flexibility, SQL-based languages support the specification of queries using only text. However, it is important to support query specification by visual examples. Existing "visual" interfaces are either GUIs for posing textual queries or drawing windows for providing sketches or examples of whole images. These interfaces do not give users the flexibility to combine semantics and visual expressions when posing queries, and these GUIs do not really visualize the target images of users' query specifications.
We argue that image retrieval based on either approach alone is not sufficient in terms of modeling and query specification flexibility. We also argue that a visual query interface that is capable of visualizing target images is essential.
2 SEMCOG Approach
SEMCOG [1] (SEMantics and COGnition-based image retrieval) aims at integrating semantics- and cognition-based approaches to give users greater flexibility to pose queries. SEMCOG's image matching is based on objects in the images rather than on whole images. In SEMCOG, a query such as "Retrieve all images in which there is a man to the right of a car and the man looks like this image" can be posed using combinations of semantics and visual expressions. Queries are posed by specifying image objects and their layouts in a visual query interface, IFQ (In Frame Query), rather than in complicated multimedia database query languages.
The user's query can be simplified as the mental model shown at the top of Figure 1. She then specifies her mental model in IFQ using combinations of visual examples and semantics. The IFQ window shows the query "Retrieve all images in which there is a person who is to the right of an object that looks like this image (a computer)". An actual window dump of IFQ for this query is shown in the middle of Figure 1. This specification is then translated (by IFQ) into a query in CSQL (Cognition and Semantics-based Query Language), the SQL-like query language used in SEMCOG. In this example, three types of queries are involved: semantics-based, cognition-based, and scene-based. The results for this example contain two images. Please note that person has been relaxed to man and woman accordingly, and that an image containing a computer and a book is also included. Their ranking is calculated by matching objects and their spatial relationships (layout) with the user's query.
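The example query above (a person to the right of an object that looks like a given image) can be illustrated with a small sketch. It is not SEMCOG's actual matching or ranking method: the ImageObject fields, the RELAXATIONS table, the similarity threshold, and the scoring rule (best visual similarity among qualifying object pairs) are all assumptions made here for illustration.

```python
from dataclasses import dataclass

@dataclass
class ImageObject:
    label: str         # semantics assigned at registration, e.g. "man"
    x: float           # horizontal centre of the object within the image
    similarity: float  # assumed precomputed similarity to the example image, in [0, 1]

# Hypothetical relaxation table: a general term and the terms it may be relaxed to.
RELAXATIONS = {"person": {"person", "man", "woman"}}

def matches_term(obj, term):
    """Semantics-based condition, with the term relaxed if the table knows it."""
    return obj.label in RELAXATIONS.get(term, {term})

def score_image(objects, term="person", threshold=0.5):
    """Best similarity over pairs (X, Y) where X matches the term,
    Y looks like the example image, and X is to the right of Y."""
    best = 0.0
    for x_obj in objects:
        if not matches_term(x_obj, term):
            continue
        for y_obj in objects:
            if y_obj is x_obj or y_obj.similarity < threshold:
                continue
            if x_obj.x > y_obj.x:  # scene-based condition
                best = max(best, y_obj.similarity)
    return best

def rank(database, **kwargs):
    """Return (image name, score) pairs with a positive score, best first."""
    scored = [(name, score_image(objs, **kwargs)) for name, objs in database.items()]
    hits = [(name, s) for name, s in scored if s > 0.0]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)

# Toy data, invented for this sketch.
database = {
    "img1": [ImageObject("computer", 10, 0.9), ImageObject("man", 50, 0.1)],
    "img2": [ImageObject("computer", 20, 0.7), ImageObject("book", 30, 0.2),
             ImageObject("woman", 60, 0.0)],
    "img3": [ImageObject("man", 10, 0.1), ImageObject("computer", 50, 0.9)],
    "img4": [ImageObject("dog", 10, 0.1)],
}

print(rank(database))
```

In this toy data, img3 is rejected because the person is to the left of the computer, and img4 because it contains no person at all.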
Through the integration of semantics- and cognition-based approaches to image retrieval, SEMCOG gives users greater flexibility to pose queries, so that queries in CSQL can better represent users' candidate images. As a result, the query specifications are more precise to database systems. Another advantage is that, during query specification, users do not need to be aware of the schema design and implementation of the image database or of the query language syntax, since queries are generated by IFQ automatically.
Figure 1: Example Query in SEMCOG. The user's mental model (natural to users but not precise to computers) is specified in IFQ and translated into CSQL (precise to computers but not natural to users). The CSQL query selects image P under three conditions: (1) X is person (semantics-based query), (2) Y i_like <image> (cognition-based query), and (3) X to_the_right_of Y (scene-based query), and is evaluated against the image database.
† This work was performed when the author visited NEC, CCRL.
Permission to make digital/hard copy of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication and its date appear, and notice is given that copying is by permission of ACM, Inc. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee.
SIGMOD '97 AZ, USA
© 1997 ACM 0-89791-911-4/97/0005 ...$3.50
3 System Architecture
The SEMCOG architecture contains five components, as shown in Figure 2. Their functionalities are as follows:
The Facilitator coordinates the interactions between the components of SEMCOG. It forwards image-matching tasks to the Cognition-based Query Processor and non-image-matching tasks to the Semantics-based Query Processor. One advantage of assigning these tasks to the Facilitator is that it has more complete knowledge of the query execution statistics and can provide globally optimal query processing.
COIR (Content-Oriented Image Retrieval) [2] is an object-based image retrieval engine based on colors and shapes. We use it as the Cognition-based Query Processor in SEMCOG. The main task of COIR is to identify image regions based on pre-extracted image metadata, namely colors and shapes. Since an object may consist of multiple image regions, COIR consults the image component catalog to match image objects.
When an image is "registered" at SEMCOG, the Image Semantics Editor interacts with COIR to edit the semantics of the image and of the objects in the image. The Image Semantics Editor then stores the image, its semantics, and the image metadata in the database.
The Terminology Manager maintains a terminology base, which is used for query relaxation. For example, a user may submit the query "Retrieve all images containing an appliance." Since appliance is a generalized concept rather than an atomic term, the Facilitator consults the Terminology Manager to reformulate the query. Existing dictionaries, such as WordNet, can be employed to build a terminology base.
The Semantics-based Query Processor performs queries concerning image semantics. The image semantics required for query processing are generated during image registration. Semantics-based query processing is the same as traditional query processing on relational DBMSs.
Figure 2: System architecture of SEMCOG
4 Query Language
CSQL, a SQL-like language, is the underlying query language used in SEMCOG. SQL is augmented with predicates that are capable of handling multimedia data. These predicates extend the underlying database system to a multimedia database system. The predicates defined in CSQL include: (1) semantics-based: is (e.g. man vs. man), is_a (e.g. car vs. transportation, man vs. human), and s_like for "semantics like" (e.g. car vs. truck); (2) cognition-based: i_like for "image like", which compares the visual signatures of its two arguments, and contains; (3) spatial relationship-based: to_the_right_of, to_the_left_of, etc.
5 Query Interface
IFQ is an attempt to bridge the gap between precise computer languages and users' mental models. IFQ is a visual, rather than "graphical", query interface that allows users to input keywords, concepts, semantics, image examples, sketches, and spatial relationships. IFQ can visualize target images as the query specification process progresses.
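The translation from an IFQ specification (objects, descriptors, spatial relationships) to CSQL text can be sketched as follows. This is a minimal illustration, not SEMCOG's actual generator: the class names QueryObject and SpatialRelation, the function to_csql, the "select image ... where" syntax, the use of contains to tie objects to the image, and the file name example.jpg are all assumptions introduced here. Only the predicate names is, i_like, contains, and to_the_right_of come from the paper.

```python
from dataclasses import dataclass, field

@dataclass
class QueryObject:
    name: str                                        # query variable, e.g. "X"
    descriptors: list = field(default_factory=list)  # (predicate, argument) pairs

@dataclass
class SpatialRelation:
    subject: str    # variable name of the first object
    predicate: str  # e.g. "to_the_right_of"
    target: str     # variable name of the second object

def to_csql(objects, relations, image_var="P"):
    """Serialize a visual specification into a CSQL-like string (hypothetical syntax)."""
    conditions = []
    for obj in objects:
        # Assumed convention: every introduced object must occur in the image.
        conditions.append(f"{image_var} contains {obj.name}")
        for predicate, argument in obj.descriptors:
            conditions.append(f"{obj.name} {predicate} {argument}")
    for rel in relations:
        conditions.append(f"{rel.subject} {rel.predicate} {rel.target}")
    return f"select image {image_var} where " + " and ".join(conditions)

# "A man to the right of a car, and the man looks like this image."
objects = [
    QueryObject("X", [("is", "man"), ("i_like", "<example.jpg>")]),
    QueryObject("Y", [("is", "car")]),
]
relations = [SpatialRelation("X", "to_the_right_of", "Y")]

print(to_csql(objects, relations))
```

Keeping objects, descriptors, and relations as separate records mirrors the three specification steps, so the query text can be regenerated after each user action.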
The query specification process in IFQ consists of three steps: introducing image objects, describing them, and specifying their spatial relationships. In IFQ, objects are represented as bullets, and descriptors, represented as small bullets attached to these objects, describe their properties.
Figure 3 shows the query "Retrieve all images in which there is a man to the right of a car and he looks like this image" posed using IFQ. The IFQ query is posed as follows: The user introduces the first object in the image and then further describes the object by attaching "i_like <image>" and "is man" descriptors. After the user specifies an image path or provides a drawing, the interface automatically replaces the descriptor with a thumbnail-size version of the image the user specified. Then, the user introduces another object and describes it using the "is car" descriptor. Finally, the user describes the spatial relationship between these two objects by drawing a line, labeled to-the-right-of, from the man object to the car object.
While the user is specifying the query using IFQ, the corresponding CSQL query is automatically generated in the CSQL window. Users can pose queries simply by clicking buttons and by dragging and dropping icons representing entities and descriptors. Figure 4 shows the result for the query in Figure 3, including a thumbnail-size image and its ranking. Users can click on any thumbnail image to see the real image, as shown on the right side.
In many cases, users do not have specific target images in mind. Users want to extract and browse semantics and information about image objects. IFQ also supports such interactions through specifying unbound descriptors and relaxing conditions, such as IS-A and S-Like predicates. In Figure 5, the user relaxes the condition of being a car to "being a kind of transportation": IS-A transportation is specified instead. The user can further introduce an unbound descriptor outJ to check the actual semantics. The result given in Figure 5 shows two candidate images, including an image containing a bus. This image has a lower ranking because the human in the image cannot be identified as a man.
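Relaxing a condition from "is car" to "IS-A transportation" and reading back the actual semantics through an unbound descriptor can be sketched with a toy terminology base. The table contents and the function names is_a, relax, and bind_unbound are hypothetical and are not taken from SEMCOG's Terminology Manager; a real terminology base could be built from a dictionary such as WordNet.

```python
# Hypothetical terminology base: each term maps to its direct generalization.
IS_A_PARENT = {
    "car": "transportation",
    "bus": "transportation",
    "truck": "transportation",
    "man": "human",
    "woman": "human",
}

def is_a(term, concept):
    """True if term equals concept or concept is reachable by following parents."""
    while term is not None:
        if term == concept:
            return True
        term = IS_A_PARENT.get(term)
    return False

def relax(concept):
    """Atomic terms in the base that a relaxed 'IS-A concept' condition may match."""
    return sorted(t for t in IS_A_PARENT if is_a(t, concept))

def bind_unbound(labels, concept):
    """Actual semantics of the objects that satisfy the relaxed condition,
    as an unbound descriptor would report them."""
    return [label for label in labels if is_a(label, concept)]

print(relax("transportation"))
print(bind_unbound(["bus", "man"], "transportation"))
```

With this base, an image whose objects are labeled bus and man satisfies "IS-A transportation", and the unbound descriptor reports bus as the matched semantics.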
IFQ also provides an arrange function. IFQ checks whether the layout specifications provided by the user match the actual layout on the screen. If there is a mismatch, IFQ rearranges the query objects on the screen according to the query specifications. Another functionality for increasing the perceptual quality of IFQ is iconize. Iconize replaces the semantic terms in the IFQ window with the corresponding icons. As shown in Figure 5, IFQ replaces IS-A transportation and IS man with the corresponding icons.
6 Conclusions
SEMCOG is currently being implemented on top of a deductive database and a commercial object-relational DBMS. We have shown the design of SEMCOG and its query interface. The novelty of our work includes: (1) object-based image retrieval rather than whole-image retrieval; (2) queries using combinations of semantics and visual examples; and (3) a visual query interface and query generator.
References
[1] Wen-Syan Li, K. Selçuk Candan, and K. Hirata. SEMCOG: An Integration of SEMantics and COGnition-based Approaches for Image Retrieval. In Proceedings of 1997 ACM Symposium on Applied Computing Special Track on Database Technology, San Jose, CA, USA, February 1997.
[2] Kyoji Hirata, Yoshinori Hara, H. Takano, and S. Kawasaki. Content-Oriented Integration in Hypermedia Systems. In Proceedings of 1996 ACM Conference on Hypertext, March 1996.
Figure 3: IFQ Specification Window (top) and CSQL Generating Window (bottom)
Figure 4: Query Result and Image Retrieved for the Query in Figure 3
Figure 5: Relaxed Query for Extracting Semantics and its Result