Abstract
In recent years, the problem of Content-Based Image Retrieval (CBIR) has been addressed in many different ways, achieving excellent results on small-scale datasets. As the amount of data to evaluate grows, new issues arise and new techniques are needed to build systems that are efficient yet accurate. In particular, computational time and memory occupancy must be kept as low as possible, while retrieval accuracy has to be preserved as much as possible. For this reason, a brute-force approach is no longer feasible, and an Approximate Nearest Neighbor (ANN) search method is preferable. This paper describes state-of-the-art ANN methods, with a particular focus on indexing systems, and proposes a new ANN technique called Bag of Indexes (BoI). The new technique is compared with the state of the art on several public benchmarks, obtaining 86.09% accuracy on Holidays+Flickr1M, 99.20% on SIFT1M and 92.4% on GIST1M. Notably, these state-of-the-art accuracy results are obtained by the proposed approach with a very low retrieval time, making it excellent in the trade-off between accuracy and efficiency.
Notes
The C++ code is available at https://www.github.com/fmaglia/BoI
Acknowledgements
This research benefits from the HPC (High Performance Computing) facility of the University of Parma, Italy.
This work is partially funded by Regione Emilia Romagna under the “Piano triennale alte competenze per la ricerca, il trasferimento tecnologico e l’imprenditorialità”.
Appendices
Appendix A - LSH projection algorithm
As mentioned in Section 2.3, the hash function used for projecting the vectors is the following:

$$h_v(x) = \begin{cases} 1 & \text{if } v \cdot x > 0 \\ 0 & \text{otherwise} \end{cases}$$

where x represents the input feature vector and v represents the projection vector. Before starting the computation, the projection vectors v must be generated; their values are sampled from a Gaussian distribution \(\mathcal {N}(0,1)\). The number of projection vectors depends on the hash dimension (δ), the number of hash tables (L) and the dimension of the input vector. After that, the correct bucket can be computed using the LSH projections. Assuming δ = 8 and L = 10, 80 LSH projections are generated for each image descriptor, but each descriptor ends up in only 10 buckets (one per hash table). This is because, for each hash table, δ projections are computed, so in the end the vectors are projected into only L buckets. Applying more projections per hash table (i.e., increasing the hash dimension δ) improves the robustness of the LSH approach and reduces the chance of projecting different elements into close buckets, because increasing the number of bits used (δ) increases the number of possible buckets for the final projection. To summarize, once the projection vectors are available, the dot product between the input vector and each projection vector is computed; whenever it is greater than zero, the corresponding power of two is added to the bucket value (index).
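The bucket computation for a single hash table can be sketched as follows. This is an illustrative example, not the authors' released implementation; the names lsh_bucket and make_projections are assumptions made for this sketch.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Compute the bucket index of descriptor x in one hash table.
// Each of the delta projection vectors contributes one bit,
// so the bucket index lies in [0, 2^delta).
std::size_t lsh_bucket(const std::vector<float>& x,
                       const std::vector<std::vector<float>>& projections) {
    std::size_t bucket = 0;
    for (std::size_t i = 0; i < projections.size(); ++i) {
        float dot = 0.0f;
        for (std::size_t d = 0; d < x.size(); ++d)
            dot += x[d] * projections[i][d];
        if (dot > 0.0f)                       // bit i is set when v_i . x > 0
            bucket += std::size_t(1) << i;    // add the corresponding power of two
    }
    return bucket;
}

// Draw delta projection vectors of the given dimensionality,
// with components sampled from a Gaussian N(0,1).
std::vector<std::vector<float>> make_projections(std::size_t delta,
                                                 std::size_t dim,
                                                 unsigned seed = 42) {
    std::mt19937 gen(seed);
    std::normal_distribution<float> gauss(0.0f, 1.0f);
    std::vector<std::vector<float>> v(delta, std::vector<float>(dim));
    for (auto& row : v)
        for (auto& val : row)
            val = gauss(gen);
    return v;
}
```

With δ = 8 and L = 10, this routine would be invoked once per hash table with an independently drawn set of 8 projection vectors, yielding 80 projections but only 10 bucket indexes per descriptor.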
Appendix B - Memory requirements
The memory requirements of an ANN algorithm depend on the number of images considered. For example, if 1M images are represented by 1M descriptors of 128D (float = 4 bytes), the brute-force approach requires 0.5 GB (1M × 128 × 4 bytes). For the same task, LSH needs only 100 MB, because it stores 1M indexes for each of the L = 100 hash tables, with each index stored in a single byte (8 bits). In addition, the proposed BoI requires only 4 MB more to store 1M weights, allowing the proposed approach to scale to larger datasets better than the brute-force approach.
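The figures above follow from simple per-element costs, as in this back-of-the-envelope sketch (the function names are illustrative, not part of the BoI codebase):

```cpp
#include <cstddef>

// Brute force stores every descriptor in full: n * dim floats of 4 bytes each.
constexpr std::size_t brute_force_bytes(std::size_t n, std::size_t dim) {
    return n * dim * sizeof(float);
}

// LSH stores one 1-byte bucket index per image, per hash table.
constexpr std::size_t lsh_bytes(std::size_t n, std::size_t tables) {
    return n * tables * 1;
}

// BoI additionally stores one 4-byte float weight per image.
constexpr std::size_t boi_weight_bytes(std::size_t n) {
    return n * sizeof(float);
}
```

With n = 1M, dim = 128 and L = 100 these evaluate to 512,000,000 bytes (≈0.5 GB), 100,000,000 bytes (100 MB) and 4,000,000 bytes (4 MB), matching the figures quoted above.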
Magliani, F., Fontanini, T. & Prati, A. Bag of indexes: a multi-index scheme for efficient approximate nearest neighbor search. Multimed Tools Appl 80, 23135–23156 (2021). https://doi.org/10.1007/s11042-020-10262-4