Abstract.
Nonspecific images in a broad domain remain a challenge for content-based image retrieval. Consumer photos are a typical example: they exhibit highly varied content, diverse resolutions, and inconsistent quality. Objects are often poorly posed, occluded, and cluttered, with poor lighting, focus, and exposure. Traditional image retrieval approaches face many obstacles, including semantic description of images, robust semantic object segmentation, the small-sample problem, and the semantic gap between low-level features and high-level semantics.
To manage the high diversity of images in a broad domain, we propose a structured learning framework to systematically design domain-relevant visual semantics, known as semantic support regions, that support indexing and querying in a content-based image retrieval system. Semantic support regions are segmentation-free image regions that exhibit semantic meanings and that can be learned statistically to span a new indexing space. They are detected from image content, reconciled across multiple resolutions, and aggregated spatially to form local semantic histograms. The resulting compact and abstract representation efficiently supports both similarity-based queries and compositional visual queries. The query by spatial icons (QBSI) formulation is a unique visual query language that explicitly specifies visual icons and spatial extents in a Boolean expression.
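As an illustration only, the following Python sketch shows one way spatial aggregation and a Boolean query over icons and spatial extents could fit together. Everything in it is an assumption made for this example rather than the formulation used in the paper: the function names, the grid and block layout, the toy labels and scores, and in particular the fuzzy-style semantics (AND as minimum, OR as maximum, NOT as complement).

```python
# Hypothetical sketch: all names, the block layout, and the min/max query
# semantics are illustrative assumptions, not the paper's definitions.

def local_semantic_histograms(region_scores, blocks):
    """Average per-region class scores inside each spatial block.

    region_scores: 2D grid (list of rows); each cell maps a semantic
                   label to a detection score for that image region.
    blocks: {block_name: (row_start, row_end, col_start, col_end)},
            with end indices exclusive.
    Returns {block_name: {label: mean score over the block}}.
    """
    histograms = {}
    for name, (r0, r1, c0, c1) in blocks.items():
        totals, count = {}, 0
        for r in range(r0, r1):
            for c in range(c0, c1):
                count += 1
                for label, score in region_scores[r][c].items():
                    totals[label] = totals.get(label, 0.0) + score
        histograms[name] = {lab: tot / count for lab, tot in totals.items()}
    return histograms


def evaluate_query(query, histograms):
    """Score a Boolean query of (icon, spatial extent) terms for one image.

    query is a nested tuple: ("term", label, block), ("not", q),
    ("and", q1, q2, ...), or ("or", q1, q2, ...).
    Assumed semantics: AND -> min, OR -> max, NOT -> 1 - score.
    """
    op = query[0]
    if op == "term":
        _, label, block = query
        return histograms[block].get(label, 0.0)
    if op == "not":
        return 1.0 - evaluate_query(query[1], histograms)
    scores = [evaluate_query(q, histograms) for q in query[1:]]
    if op == "and":
        return min(scores)
    if op == "or":
        return max(scores)
    raise ValueError(f"unknown operator: {op}")


# Toy 2x2 grid of region scores for a single image (made-up values).
grid = [
    [{"sky": 0.9, "face": 0.0}, {"sky": 0.7, "face": 0.1}],
    [{"sky": 0.1, "face": 0.8}, {"sky": 0.0, "face": 0.2}],
]
blocks = {"top": (0, 1, 0, 2), "bottom": (1, 2, 0, 2)}
hists = local_semantic_histograms(grid, blocks)

# "sky in the top half AND face in the bottom half"
query = ("and", ("term", "sky", "top"), ("term", "face", "bottom"))
score = evaluate_query(query, hists)
```

Ranking a collection would then amount to computing this score for every image and sorting in descending order. A real system would replace the toy scores with classifier outputs and might use a different combination rule.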
For empirical evaluation, we use Support Vector Machines to learn and index 26 semantic support regions over 2400 heterogeneous consumer photos from a single family. We report a 27% improvement in average precision over a very high-dimensional feature-based approach on 24 semantic queries based on multiple examples and pooled ground truths. Finally, we demonstrate the usefulness of the visual query language with 15 QBSI queries that attain high precision among the top retrieved images on the same 2400 consumer photos.
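For readers unfamiliar with the metric, the sketch below computes the common non-interpolated form of average precision for a single ranked result list, along with a relative improvement between two scores. The abstract does not state which exact definition of average precision was used, nor whether the reported improvement is relative or absolute, so both choices here are assumptions. The function names and the toy inputs are hypothetical.

```python
# Hypothetical sketch of a standard metric; not taken from the paper.

def average_precision(ranked_relevance, num_relevant):
    """Non-interpolated average precision for one query.

    ranked_relevance: list of booleans in rank order, True where the
                      retrieved item at that rank is relevant.
    num_relevant: total number of relevant items in the ground truth.
    """
    if num_relevant == 0:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / num_relevant


def relative_improvement(baseline, proposed):
    """Relative gain of `proposed` over `baseline` (assumed reading)."""
    return (proposed - baseline) / baseline


# Made-up ranking: relevant items at ranks 1 and 3, two relevant in total.
ap = average_precision([True, False, True, False], num_relevant=2)
gain = relative_improvement(0.4, 0.5)
```

Averaging `average_precision` over all queries gives a single figure per method, which is the kind of quantity a comparison across 24 queries would rest on.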
Published online: 12 January 2005
Correspondence to: Joo-Hwee Lim
Lim, JH., Jin, J.S. A structured learning framework for content-based image indexing and visual query. Multimedia Systems 10, 317–331 (2005). https://doi.org/10.1007/s00530-004-0158-z
DOI: https://doi.org/10.1007/s00530-004-0158-z