

1997, ACM SIGMOD Record

SEMCOG: An Object-based Image Retrieval System and Its Visual Query Interface

Wen-Syan Li, K. Selçuk Candan†, Kyoji Hirata, Yoshinori Hara
C&C Research Laboratories, NEC USA, Inc.
110 Rio Robles, San Jose, CA 95134
Email: {wen,candan,hirata,hara}@ccrl.sj.nec.com

† This work was performed when the author visited NEC CCRL.

1 Introduction

Image retrieval is a key issue in many image database applications. The major approaches are query by image example (the cognition-based approach) and query by semantics (the semantics-based approach). The semantics-based approach is good at retrieval based on image semantics, but it has low visual expressive capability: since images are inherently visual, they are hard to describe in detail using text alone. The cognition-based approach has the advantage of using visual examples, but one disadvantage of using it alone is its low precision. Users' drawings are usually not precise enough, and current image recognition techniques still have their limits. For example, if a user wants to retrieve images containing a dog and provides a drawing of a dog, current cognition-based systems can only return many images containing animals that look like dogs, such as cats and tigers. Another weakness of this approach is that it cannot support queries on generalized concepts, such as transportation and appliances. For example, if a user wants to retrieve all images containing any type of appliance, the user must provide drawings for all possible appliances, such as TVs, radios, and computers. This is not practical.

As for query specification flexibility, SQL-based languages support specification of queries using text only. However, it is important to support query specification by visual examples as well. Existing "visual" interfaces are either GUIs for posing textual queries or drawing windows for providing sketches or examples of whole images. These interfaces do not give users the flexibility to pose queries using combinations of semantics and visual expressions, and they do not really visualize the target images of users' query specifications.

2 SEMCOG Approach

We argue that image retrieval based on either approach alone is not sufficient in terms of modeling and query specification flexibility. We also argue that a visual query interface capable of visualizing target images is essential. SEMCOG [1] (SEMantics and COGnition-based image retrieval) aims at integrating the semantics-based and cognition-based approaches to give users greater flexibility in posing queries. SEMCOG's image matching is based on objects in the images rather than on whole images. In SEMCOG, a query such as "Retrieve all images in which there is a man to the right of a car and the man looks like this image" can be posed using a combination of semantics and visual expressions. Queries are posed by specifying image objects and their layouts using a visual query interface, IFQ (In Frame Query), rather than a complicated multimedia database query language.

The user's query can be summarized as the mental model shown at the top of Figure 1. She then specifies her mental model in IFQ using combinations of visual examples and semantics. The IFQ window shows the query "Retrieve all images in which there is a person who is to the right of an object that looks like this image (a computer)". An actual window dump of IFQ for this query is shown in the middle of Figure 1. This specification is then translated by IFQ into a CSQL (Cognition and Semantics-based Query Language) query, the SQL-like query language used in SEMCOG. In this example, three types of queries are involved: semantics-based, cognition-based, and scene-based. The results for this example contain two images. Note that person has been relaxed to man and woman accordingly, and an image containing a computer and a book is also included. The ranking is calculated by matching the objects and their spatial relationships (layout) against the user's query.

[Figure 1: Example Query in SEMCOG: the user's mental model (natural to users, but not precise to computers), its IFQ specification, and the generated CSQL query (precise to computers, but not natural to users) evaluated against the image database.]

Through the integration of the semantics-based and cognition-based approaches to image retrieval, SEMCOG gives users greater flexibility in posing queries, so that CSQL queries can represent users' candidate images better. As a result, the query specifications are more precise to the database system. Another advantage is that, during query specification, users do not need to be aware of the schema design and implementation of the image database, or of the query language syntax, since queries are generated by IFQ automatically.
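For concreteness, the CSQL query that IFQ generates for the Figure 1 example takes roughly the following form. The three numbered predicates (is, i_like, to_the_right_of) and the "select image P" clause appear in Figure 1; the surrounding where/and structure and the contains predicates binding the objects X and Y to the image are a sketch of the generated query, not the system's verbatim output.

    select image P
    where P contains X and P contains Y
      and X is person                  -- (1) semantics-based condition
      and Y i_like <computer image>    -- (2) cognition-based condition (visual example)
      and X to_the_right_of Y          -- (3) scene-based (spatial) condition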
3 System Architecture

The SEMCOG architecture contains five components, as shown in Figure 2. Their functionalities are as follows.

The Facilitator coordinates the interactions between the components of SEMCOG. It forwards image-matching tasks to the Cognition-based Query Processor and non-image-matching tasks to the Semantics-based Query Processor. One advantage of assigning this role to the Facilitator is that it has the most complete knowledge of the query execution statistics and can therefore provide globally optimum query processing.

COIR (Content-Oriented Image Retrieval) [2] is an object-based image retrieval engine based on colors and shapes. We use it as the Cognition-based Query Processor in SEMCOG. The main task of COIR is to identify image regions based on pre-extracted image metadata, colors, and shapes. Since an object may consist of multiple image regions, COIR consults the image component catalog when matching image objects.

When an image is registered with SEMCOG, the Image Semantics Editor interacts with COIR to edit the semantics of the image and of the objects in it. The Image Semantics Editor then stores the image, its semantics, and the image metadata in the database.

The Terminology Manager maintains a terminology base which is used for query relaxation. For example, a user may submit the query "Retrieve all images containing an appliance." Since appliance is a generalized concept rather than an atomic term, the Facilitator consults the Terminology Manager to reformulate the query. Existing dictionaries, such as WordNet, can be employed to build the terminology base.

The Semantics-based Query Processor handles queries concerning image semantics. The image semantics required for query processing is generated during image registration. Semantics-based query processing is the same as traditional query processing on relational DBMSs.

[Figure 2: System architecture of SEMCOG. Users pose semantics- and cognition-based queries through IFQ, a Web browser, or other query interfaces; the components operate over the image database and the image component catalog.]
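To make the relaxation step concrete, the appliance query above could be written and reformulated roughly as follows. The paper does not show the rewritten query, so the expansion of "is_a appliance" into a disjunction of is predicates over terms from the terminology base (TV, radio, and computer are used here purely as illustrative entries) is an assumption about one plausible reformulation.

    -- user's original query, using a generalized concept
    select image P
    where P contains X and X is_a appliance

    -- after the Facilitator consults the Terminology Manager (illustrative rewriting)
    select image P
    where P contains X
      and (X is TV or X is radio or X is computer)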
4 Query Language

CSQL, a SQL-like language, is the underlying query language of SEMCOG. SQL is augmented with predicates capable of handling multimedia data; these predicates extend the underlying database system into a multimedia database system. The predicates defined in CSQL include: (1) semantics-based predicates: is (e.g., man vs. man), is_a (e.g., car vs. transportation, man vs. human), and s_like for "semantics like" (e.g., car vs. truck); (2) cognition-based predicates: i_like for "image like", which compares the visual signatures of its two arguments, and contains; and (3) spatial-relationship predicates: to_the_right_of, to_the_left_of, and so on.
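As an illustration of how these predicate families combine, the query from Section 2, in which a man stands to the right of a car and the man also looks like a user-supplied image, can be sketched in CSQL as shown below. As before, the predicate names follow the list above, while the exact clause structure is illustrative rather than the system's verbatim syntax.

    select image P
    where P contains X and P contains Y
      and X is man                    -- semantics-based
      and X i_like <example image>    -- cognition-based
      and Y is car                    -- semantics-based
      and X to_the_right_of Y         -- spatial-relationship-based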
5 Query Interface

IFQ is an attempt to bridge the gap between precise computer languages and users' mental models. IFQ is a visual, rather than merely graphical, query interface which allows users to input keywords, concepts, semantics, image examples, sketches, and spatial relationships. IFQ can visualize the target images as the query specification process progresses.

The query specification process in IFQ consists of three steps: introducing image objects, describing them, and specifying their spatial relationships. In IFQ, objects are represented as bullets, and descriptors, represented as small bullets attached to these objects, describe their properties. Figure 3 shows the query "Retrieve all images in which there is a man to the right of a car and he looks like this image" posed using IFQ. The query is posed as follows. The user introduces the first object in the image and then describes it by attaching the descriptors "i_like <image>" and "is man". After the user specifies an image path or provides a drawing, the interface automatically replaces the descriptor with a thumbnail of the specified image. Then the user introduces another object and describes it using the "is car" descriptor. Finally, the user describes the spatial relationship between the two objects by drawing a line, labeled to_the_right_of, from the man object to the car object. While the user is specifying the query in IFQ, the corresponding CSQL query is automatically generated in the CSQL window. Users can thus pose queries simply by clicking buttons and dragging and dropping icons representing entities and descriptors.

Figure 4 shows the result for the query in Figure 3, including a thumbnail image and its ranking. Users can click on any thumbnail to see the full-size image, shown on the right side.

IFQ also provides an arrange function: IFQ checks whether the layout specification provided by the user matches the actual layout on the screen and, if there is a mismatch, rearranges the query objects on the screen according to the query specification. Another function for increasing the perceptual quality of IFQ is iconize, which replaces the semantic terms in the IFQ window with corresponding icons. As shown in Figure 5, IFQ replaces IS_A transportation and IS man with the corresponding icons.

In many cases, users do not have specific target images in mind; they want to extract and browse semantic information about image objects. IFQ therefore also supports interaction through unbound descriptors and through relaxing conditions, such as IS_A and S_LIKE predicates. In Figure 5, the user relaxes the condition of being a car to being a kind of transportation, so IS_A transportation is specified instead. The user can further introduce an unbound descriptor to check the actual semantics retrieved. The result shown in Figure 5 contains two candidate images, including an image containing a bus. This image has a lower ranking because the human in it cannot be identified as a man.

[Figure 3: IFQ Specification Window (top) and CSQL Generating Window (bottom).]
[Figure 4: Query Result and Image Retrieved for the Query in Figure 3.]
[Figure 5: Relaxed Query for Extracting Semantics and its Result.]

6 Conclusions

SEMCOG is currently being implemented on top of a deductive database and a commercial object-relational DBMS. We have presented the design of SEMCOG and its query interface. The novelty of our work includes: (1) object-based image retrieval rather than whole-image retrieval; (2) queries posed using combinations of semantics and visual examples; and (3) a visual query interface and query generator.

References

[1] Wen-Syan Li, K. Selçuk Candan, and Kyoji Hirata. SEMCOG: An Integration of SEMantics and COGnition-based Approaches for Image Retrieval. In Proceedings of the 1997 ACM Symposium on Applied Computing, Special Track on Database Technology, San Jose, CA, USA, February 1997.

[2] Kyoji Hirata, Yoshinori Hara, H. Takano, and S. Kawasaki. Content-Oriented Integration in Hypermedia Systems. In Proceedings of the 1996 ACM Conference on Hypertext, March 1996.