A structured learning framework for content-based image indexing and visual query

Multimedia Systems

Abstract.

Nonspecific images in a broad domain remain a challenge for content-based image retrieval. Consumer photos are a typical example: they exhibit highly varied content, diverse resolutions, and inconsistent quality. The objects in them are usually ill-posed, occluded, and cluttered, with poor lighting, focus, and exposure. Traditional image retrieval approaches face many obstacles, including the semantic description of images, robust semantic object segmentation, the small-sample problem, and the semantic gap between low-level features and high-level semantics.

To manage the high diversity of images in a broad domain, we propose a structured learning framework to systematically design domain-relevant visual semantics, known as semantic support regions, to support indexing and querying in a content-based image retrieval system. Semantic support regions are segmentation-free image regions that exhibit semantic meanings and that can be learned statistically to span a new indexing space. They are detected from image content, reconciled across multiple resolutions, and aggregated spatially to form local semantic histograms. The resulting compact and abstract representation can efficiently support both similarity-based query and compositional visual query. The query by spatial icons (QBSI) formulation is a unique visual query language that explicitly specifies visual icons and their spatial extents in a Boolean expression.
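To make the last two steps concrete, here is a minimal sketch in Python of spatial aggregation into a local semantic histogram and of a Boolean query over spatial icons. It is an illustration under stated assumptions, not the authors' formulation: the grid-of-scores layout, the averaging rule, the names `local_histogram`, `icon`, `AND`, `OR`, and `NOT`, and the fuzzy min/max reading of the Boolean connectives are all hypothetical. Detection and multi-resolution reconciliation are assumed to have already produced the score grid.

```python
from typing import Callable, List, Tuple

# Assumed layout: a grid (rows x cols) of per-cell score vectors, with one
# score in [0, 1] per semantic support region (SSR) class.
ScoreMap = List[List[List[float]]]
Box = Tuple[int, int, int, int]  # (row0, col0, row1, col1), end-exclusive
Term = Callable[[ScoreMap], float]


def local_histogram(scores: ScoreMap, box: Box) -> List[float]:
    """Average the per-class scores over the grid cells inside `box`."""
    r0, c0, r1, c1 = box
    n_classes = len(scores[0][0])
    total = [0.0] * n_classes
    count = 0
    for r in range(r0, r1):
        for c in range(c0, c1):
            for k, s in enumerate(scores[r][c]):
                total[k] += s
            count += 1
    return [t / count for t in total] if count else total


def icon(class_idx: int, box: Box) -> Term:
    """Query term: strength of class `class_idx` within the extent `box`."""
    return lambda scores: local_histogram(scores, box)[class_idx]


# Boolean connectives with a fuzzy interpretation (an assumption of this
# sketch): AND is min, OR is max, NOT is the complement.
def AND(*terms: Term) -> Term:
    return lambda scores: min(t(scores) for t in terms)


def OR(*terms: Term) -> Term:
    return lambda scores: max(t(scores) for t in terms)


def NOT(term: Term) -> Term:
    return lambda scores: 1.0 - term(scores)


# Toy 2x2 image with two hypothetical classes: index 0 and index 1.
scores = [
    [[1.0, 0.0], [0.8, 0.2]],
    [[0.1, 0.9], [0.0, 1.0]],
]
top, bottom = (0, 0, 1, 2), (1, 0, 2, 2)

# "Class 0 in the top half AND class 1 in the bottom half."
query = AND(icon(0, top), icon(1, bottom))
match_score = query(scores)
```

Ranking a collection then amounts to evaluating `query` on each image's score grid and sorting by the result. The min/max connectives are one common choice; a probabilistic product/sum reading would fit the same interface.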

For empirical evaluation, we perform the learning and indexing processes of 26 semantic support regions over 2400 heterogeneous consumer photos from a single family using Support Vector Machines. We report a 27% improvement in average precision over a very high-dimensional feature-based approach on 24 semantic queries based on multiple examples and pooled ground truths. Finally, we demonstrate the usefulness of the visual query language with 15 QBSI queries that attain high precision among the top retrieved images on the 2400 consumer images.
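For readers unfamiliar with the metric, the following minimal sketch shows one standard way to compute average precision for a ranked list and a relative improvement between two systems. The abstract does not say which variant of average precision was used, nor whether the 27% figure is relative or absolute, so the non-interpolated definition and the relative reading are assumptions. The function names and toy data are hypothetical.

```python
from typing import List, Set


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Non-interpolated average precision for one query.

    Sums precision at each rank where a relevant item appears, then
    divides by the total number of relevant items.
    """
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def relative_improvement(new: float, baseline: float) -> float:
    """Fractional gain of `new` over `baseline` (0.25 means 25%)."""
    return (new - baseline) / baseline


# Toy data: relevant items sit at ranks 1 and 3 of the ranked list.
ap = average_precision(["a", "b", "c", "d"], {"a", "c"})
gain = relative_improvement(0.5, 0.4)
```

Averaging `average_precision` over all queries gives a mean score per system, which is the quantity a relative improvement would normally compare.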

Author information

Corresponding author

Correspondence to Joo-Hwee Lim.

Additional information

Published online: 12 January 2005

About this article

Cite this article

Lim, JH., Jin, J.S. A structured learning framework for content-based image indexing and visual query. Multimedia Systems 10, 317–331 (2005). https://doi.org/10.1007/s00530-004-0158-z

  • DOI: https://doi.org/10.1007/s00530-004-0158-z
