Abstract
As an important procedure in image retrieval, off-line indexing organizes relevant images together and largely determines the efficiency, accuracy, and memory cost of the retrieval system. Because an image contains multi-level visual and semantic clues, a desirable indexing strategy should reflect such multi-level relevance. However, most existing indexing strategies treat database images individually and consider only partial relevance, i.e., relevance reflected by either local or global features. To overcome these issues and design a better indexing strategy, we propose to package semantically relevant images into superimages and then index the superimages instead of single images. A superimage packages multiple images into one new unit and hence significantly decreases the number of units to be indexed, which reduces both memory cost and retrieval time. To make the final index file discriminative with respect to both visual and semantic relevance, we extract local descriptors from the superimages and index them with an inverted file. During online retrieval, we only need to extract local descriptors from queries, yet we obtain semantic-aware retrieval results. This is because, during the off-line indexing stage, both semantically and visually relevant images are organized together by indexing heterogeneous features in superimages. Our approach therefore has an advantage over many online retrieval fusion algorithms in retrieval efficiency and memory consumption. Moreover, extensive experiments on multiple retrieval tasks demonstrate the promising accuracy of our approach.
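The indexing idea summarized above can be illustrated with a minimal sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: images are grouped into superimages by a given semantic label (the abstract does not say how grouping is done), local descriptors are assumed to be already quantized into integer visual-word ids, and ranking uses plain vote counting. The names `build_superimages`, `build_inverted_file`, and `query` are hypothetical.

```python
from collections import Counter, defaultdict


def build_superimages(image_labels):
    """Group image ids that share a semantic label into one superimage.

    Assumption: a label per image is available; the real grouping
    criterion may differ.
    """
    groups = defaultdict(list)
    for image_id, label in image_labels.items():
        groups[label].append(image_id)
    # Assign superimage ids in sorted label order so the result is reproducible.
    return {
        sid: sorted(members)
        for sid, (_, members) in enumerate(sorted(groups.items()))
    }


def build_inverted_file(superimages, image_words):
    """Map each visual word to the superimages that contain it, with counts."""
    inverted = defaultdict(Counter)
    for sid, members in superimages.items():
        for image_id in members:
            for word in image_words[image_id]:
                inverted[word][sid] += 1
    return inverted


def query(inverted, query_words, top_k=3):
    """Rank superimages by how many distinct query words they share."""
    votes = Counter()
    for word in set(query_words):
        for sid in inverted.get(word, {}):
            votes[sid] += 1
    return [sid for sid, _ in votes.most_common(top_k)]


# Toy data (hypothetical): five images, three semantic labels.
image_labels = {"a": "dog", "b": "dog", "c": "car", "d": "car", "e": "tree"}
image_words = {"a": [1, 2, 3], "b": [3, 4], "c": [10, 11], "d": [11, 12], "e": [20]}

superimages = build_superimages(image_labels)
inverted = build_inverted_file(superimages, image_words)
ranked = query(inverted, [2, 4, 11])
```

In this toy setup the index holds three superimages instead of five images, so each posting list can only refer to three units. The query contains only visual words, yet it is matched against a unit that pools the words of all images sharing a label, which is one simple way to read the claim that visual-only queries return semantic-aware results.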
Acknowledgments
This work was supported in part by ARO grant W911NF-12-1-0057, a Faculty Research Award from NEC Laboratories of America, and a 2012 UTSA START-R Research Award, all to Dr. Qi Tian. This work was also supported in part by NSFC 61128007.
Cite this article
Luo, Q., Zhang, S., Huang, T. et al. Indexing heterogeneous features with superimages. Int J Multimed Info Retr 3, 245–257 (2014). https://doi.org/10.1007/s13735-014-0064-x
DOI: https://doi.org/10.1007/s13735-014-0064-x