Cross Indexing With Grouplets

Published: 01 November 2015

Abstract

Most current image indexing systems for retrieval treat a database as a set of individual images. This limits the flexibility of the retrieval framework to conduct sophisticated cross-image analysis, resulting in higher memory consumption and sub-optimal retrieval accuracy. To address this issue, we propose cross indexing with grouplets, where the core idea is to view the database images as a set of grouplets, each defined as a group of highly relevant images. Because a grouplet gathers similar images together, the number of grouplets is smaller than the number of images, which naturally lowers the memory cost. Moreover, the definition of a grouplet can be based on customized relations, allowing for the seamless integration of advanced image features and data mining techniques, such as the deep convolutional neural network (DCNN), into off-line indexing. To validate the proposed framework, we construct three types of grouplets, based respectively on local similarity, regional relations, and global semantic modeling. Extensive experiments on public benchmark datasets demonstrate the efficiency and superior performance of our approach.
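To make the indexing idea concrete, below is a minimal Python sketch of a grouplet-style index, written against the abstract alone rather than the paper's actual method. The grouping relation (k-means over global descriptors), the vocabulary size, and all names (build_grouplets, GroupletIndex) are illustrative assumptions; the paper instead defines grouplets through local similarity, regional relations, and global semantic modeling.

# Hypothetical sketch, not the authors' implementation: k-means stands in
# for the paper's grouplet definitions, and all names are made up here.
import numpy as np
from collections import defaultdict
from sklearn.cluster import KMeans

def build_grouplets(features, n_grouplets):
    """Cluster image descriptors so that each cluster is one grouplet."""
    labels = KMeans(n_clusters=n_grouplets, n_init=10).fit_predict(features)
    grouplets = defaultdict(list)
    for image_id, g in enumerate(labels):
        grouplets[g].append(image_id)
    return grouplets  # grouplet id -> member image ids

class GroupletIndex:
    """Inverted index whose posting lists store grouplet ids, not image ids.

    Because several similar images collapse into one grouplet, each visual
    word's posting list is shorter than in a per-image index, which is the
    source of the memory saving claimed in the abstract.
    """
    def __init__(self, grouplets):
        self.grouplets = grouplets
        self.postings = defaultdict(set)  # visual word -> grouplet ids

    def add(self, grouplet_id, visual_words):
        for w in visual_words:
            self.postings[w].add(grouplet_id)

    def query(self, visual_words):
        # Vote for grouplets sharing visual words with the query, then
        # expand each hit back to its member images for final ranking.
        votes = defaultdict(int)
        for w in visual_words:
            for g in self.postings.get(w, ()):
                votes[g] += 1
        ranked = sorted(votes, key=votes.get, reverse=True)
        return [img for g in ranked for img in self.grouplets[g]]

# Example: 1000 images with 128-D descriptors packed into 100 grouplets.
# A grouplet's visual words would be pooled from its members; here they
# are synthetic placeholders.
rng = np.random.default_rng(0)
grouplets = build_grouplets(rng.normal(size=(1000, 128)), n_grouplets=100)
index = GroupletIndex(grouplets)
for g in grouplets:
    index.add(g, visual_words=rng.integers(0, 5000, size=50))
hits = index.query(rng.integers(0, 5000, size=20))

The design point the sketch illustrates is structural: posting lists shrink because near-duplicate images share a single grouplet entry, and cross-image analysis happens once, off-line, when the grouplets are formed.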


Cited By

  • GLAD. In Proceedings of the 25th ACM International Conference on Multimedia, Oct. 2017, pp. 420–428. https://doi.org/10.1145/3123266.3123279
  • One-Shot Fine-Grained Instance Retrieval. In Proceedings of the 25th ACM International Conference on Multimedia, Oct. 2017, pp. 342–350. https://doi.org/10.1145/3123266.3123278


Published In

IEEE Transactions on Multimedia, Volume 17, Issue 11
Nov. 2015
235 pages

Publisher

IEEE Press


Qualifiers

  • Research-article
