Indexing Metric Spaces for Exact Similarity Search

Published: 07 December 2022

Abstract

With the continued digitization of societal processes, we are seeing an explosion in available data, a phenomenon referred to as big data. In a research setting, three aspects of the data are often viewed as the main sources of challenges when attempting to enable value creation from big data: volume, velocity, and variety. Many studies address volume or velocity, while fewer address variety. Metric spaces are ideal for addressing variety because they can accommodate any data as long as it can be equipped with a distance notion that satisfies the triangle inequality. To accelerate search in metric spaces, a collection of indexing techniques for metric data has been proposed. However, existing surveys offer limited coverage, and a comprehensive empirical study has yet to be reported. We offer a comprehensive survey of existing metric indexes that support exact similarity search: we summarize the partitioning, pruning, and validation techniques that these indexes use; we provide time and space complexity analyses of index construction; and we offer an empirical comparison of their query processing performance. Empirical studies are important when evaluating metric indexing performance because performance can depend heavily on the effectiveness of the available pruning and validation techniques as well as on the data distribution, which means that complexity analyses often offer limited insight. This article aims to reveal the strengths and weaknesses of different indexing techniques, to offer guidance on selecting an appropriate indexing technique for a given setting, and to provide directions for future research on metric indexing.
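To make the role of the triangle inequality concrete, the following is a minimal sketch of pivot-based pruning and validation for an exact range query. It is an illustration under stated assumptions, not one of the surveyed indexes: the function names (`edit_distance`, `build_pivot_table`, `range_query`), the sample words, and the choice of pivots are all hypothetical, and edit distance is used only because it is a familiar metric on non-vector data.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance; it is a metric, so the triangle inequality holds."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute
        prev = curr
    return prev[-1]


def build_pivot_table(data: Sequence[T], pivots: Sequence[T],
                      dist: Callable[[T, T], float]) -> List[List[float]]:
    """Precompute d(o, p) for every object o and pivot p (illustrative layout)."""
    return [[dist(obj, p) for p in pivots] for obj in data]


def range_query(query: T, radius: float, data: Sequence[T],
                pivots: Sequence[T], table: List[List[float]],
                dist: Callable[[T, T], float]) -> Tuple[List[T], int]:
    """Return all objects within `radius` of `query`, plus the number of
    object-to-query distances that still had to be computed."""
    q_to_pivots = [dist(query, p) for p in pivots]
    results: List[T] = []
    computed = 0
    for obj, row in zip(data, table):
        # Pruning: |d(q,p) - d(o,p)| <= d(q,o), so a gap above r rules obj out.
        if any(abs(qp - op) > radius for qp, op in zip(q_to_pivots, row)):
            continue
        # Validation: d(q,o) <= d(q,p) + d(o,p), so a sum within r confirms obj.
        if any(qp + op <= radius for qp, op in zip(q_to_pivots, row)):
            results.append(obj)
            continue
        # Neither bound decides: fall back to the actual distance computation.
        computed += 1
        if dist(query, obj) <= radius:
            results.append(obj)
    return results, computed


# Hypothetical sample data and pivots, chosen only for illustration.
words = ["metric", "matrix", "metrics", "space", "spade", "index"]
pivots = ["metric", "space"]
table = build_pivot_table(words, pivots, edit_distance)
query = "metrik"
hits, computed = range_query(query, 1, words, pivots, table, edit_distance)
```

Because both bounds follow directly from the triangle inequality, the answer is the same as that of a linear scan; only the number of distance computations changes. How many computations are saved depends on the pivots and on the data distribution, which is consistent with the abstract's point that complexity analyses alone offer limited insight.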

Supplementary Material

3534963.supp (3534963.supp.pdf)
Supplementary material

References

[1]
Charu C. Aggarwal and Philip S. Yu. 2001. Outlier detection for high dimensional data. In Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data. 37–46.
[2]
Jurandy Almeida, Ricardo da S. Torres, and Neucimar J. Leite. 2010. BP-tree: An efficient index for similarity search in high-dimensional metric spaces. In Proceedings of the 19th ACM International Conference on Information and Knowledge Management. 1365–1368.
[3]
Jurandy Almeida, Eduardo Valle, Ricardo da S. Torres, and Neucimar J. Leite. 2010. DAHC-tree: An effective index for approximate search in high-dimensional metric spaces. Journal of Information and Data Management 1, 3 (2010), 375–390.
[4]
Giuseppe Amato, Claudio Gennaro, and Pasquale Savino. 2014. MI-file: Using inverted files for scalable approximate similarity search. Multimedia Tools and Applications 71, 3 (2014), 1333–1362.
[5]
Laurent Amsaleg, Oussama Chelly, Michael E. Houle, Ken-Ichi Kawarabayashi, Miloš Radovanović, and Weeris Treeratanajaru. 2019. Intrinsic dimensionality estimation within tight localities. In Proceedings of the 2019 SIAM International Conference on Data Mining. 181–189.
[6]
Fabrizio Angiulli and Fabio Fassetti. 2012. Indexing uncertain data in general metric spaces. IEEE Transactions on Knowledge and Data Engineering 24, 9 (2012), 1640–1657.
[7]
Arvind Arasu, Venkatesh Ganti, and Raghav Kaushik. 2006. Efficient exact set-similarity joins. In Proceedings of the 32nd International Conference on Very Large Data Bases. 918–929.
[8]
Luis G. Ares, Nieves R. Brisaboa, María F. Esteller, Oscar Pedreira, and Angeles S. Places. 2009. Optimal pivots to minimize the index size for metric access methods. In Proceedings of the 2009 2nd International Workshop on Similarity Search and Applications. 74–80.
[9]
Luis G. Ares, Nieves R. Brisaboa, Alberto Ordóñez Pereira, and Oscar Pedreira. 2012. Efficient similarity search in metric spaces with cluster reduction. In Proceedings of the International Conference on Similarity Search and Applications. 70–84.
[10]
Lior Aronovich and Israel Spiegler. 2007. CM-tree: A dynamic clustered index for similarity search in metric databases. Data & Knowledge Engineering 63, 3 (2007), 919–946.
[11]
Vassilis Athitsos, Jonathan Alon, Stan Sclaroff, and George Kollios. 2007. Boostmap: An embedding method for efficient nearest neighbor retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence 30, 1 (2007), 89–104.
[12]
Vassilis Athitsos, Michalis Potamias, Panagiotis Papapetrou, and George Kollios. 2008. Nearest neighbor retrieval using distance-based hashing. In Proceedings of the 2008 IEEE 24th International Conference on Data Engineering. 327–336.
[13]
Ricardo Baeza-Yates. 1997. Searching: An algorithmic tour. Encyclopedia of Computer Science and Technology 37 (1997), 331–359.
[14]
Ricardo Baeza-Yates, Walter Cunto, Udi Manber, and Sun Wu. 1994. Proximity matching using fixed-queries trees. In Proceedings of the Annual Symposium on Combinatorial Pattern Matching. 198–212.
[15]
Marcelo Barroso, Nora Reyes, and Rodrigo Paredes. 2010. Enlarging nodes to improve dynamic spatial approximation trees. In Proceedings of the 3rd International Conference on Similarity Search and Applications. 41–48.
[16]
Norbert Beckmann, Hans-Peter Kriegel, Ralf Schneider, and Bernhard Seeger. 1990. The \(\rm R^*\) -tree: An efficient and robust access method for points and rectangles. In Proceedings of the SIGMOD Record, Vol. 19. 322–331.
[17]
Jon Louis Bentley. 1975. Multidimensional binary search trees used for associative searching. Communication of the ACM 18, 9 (1975), 509–517.
[18]
Kevin Beyer, Jonathan Goldstein, Raghu Ramakrishnan, and Uri Shaft. 1999. When is “nearest neighbor” meaningful?. In Proceedings of the International Conference on Database Theory. 217–235.
[19]
Alina Beygelzimer, Sham Kakade, and John Langford. 2006. Cover trees for nearest neighbor. In Proceedings of the 23rd International Conference on Machine Learning. 97–104.
[20]
Tolga Bozkaya and Meral Ozsoyoglu. 1997. Distance-based indexing for high-dimensional metric spaces. In Proceedings of the 1997 ACM SIGMOD International Conference on Management of Data. 357–368.
[21]
Tolga Bozkaya and Meral Ozsoyoglu. 1999. Indexing large metric spaces for similarity search queries. ACM Transactions on Database Systems 24, 3 (1999), 361–404.
[22]
Svein Erik Bratsberg and Magnus Lie Hetland. 2012. Dynamic optimization of queries in pivot-based indexing. Multimedia Tools and Applications 60, 2 (2012), 261–275.
[23]
Sergey Brin. 1995. Near neighbor search in large metric spaces. In Proceedings of the 21th International Conference on Very Large Data Bases (1995), 574–584.
[24]
Luis Britos, A. Marcela Printista, and Nora Reyes. 2012. DSACL \(^+\) -tree: A dynamic data structure for similarity search in secondary memory. In Proceedings of the International Conference on Similarity Search and Applications. 116–131.
[25]
Walter A. Burkhard and Robert M. Keller. 1973. Some approaches to best-match file searching. Communication of the ACM 16, 4 (1973), 230–236.
[26]
Benjamin Bustos, Gonzalo Navarro, and Edgar Chávez. 2003. Pivot selection techniques for proximity searching in metric spaces. Pattern Recognition Letters 24, 14 (2003), 2357–2366.
[27]
Benjamin Bustos and Tomáš Skopal. 2006. Dynamic similarity search in multi-metric spaces. In Proceedings of the 8th ACM International Workshop on Multimedia Information Retrieval. 137–146.
[28]
Domenico Cantone, Alfredo Ferro, Alfredo Pulvirenti, Diego Reforgiato Recupero, and Dennis Shasha. 2005. Antipole tree indexing to support range search and k-nearest neighbor search in metric spaces. IEEE Transactions on Knowledge and Data Engineering 17, 4 (2005), 535–550.
[29]
Caio César Mori Carélo, Ives Rene Venturini Pola, Ricardo Rodrigues Ciferri, Agma Juci Machado Traina, Caetano Traina Jr, and Cristina Dutra de Aguiar Ciferri. 2011. Slicing the metric space to provide quick indexing of complex data in the main memory. Information Systems 36, 1 (2011), 79–98.
[30]
Caio CÚsar Mori CarÚlo, Ives Renŕ Venturini Pola, Ricardo Rodrigues Ciferri, Agma Juci Machado Traina, Cristina Dutra de Aguiar Ciferri, et al. 2009. The onion-tree: Quick indexing of complex data in the main memory. In Proceedings of the East European Conference on Advances in Databases and Information Systems. 235–252.
[31]
Edgar Chávez, Karina Figueroa, and Gonzalo Navarro. 2008. Effective proximity retrieval by ordering permutations. IEEE Transactions on Pattern Analysis and Machine Intelligence 30, 9 (2008), 1647–1658.
[32]
Edgar Chávez, Verónica Luduena, Nora Reyes, and Patricia Roggero. 2016. Faster proximity searching with the distal SAT. Information Systems 59, July (2016), 15–47.
[33]
Edgar Chávez and Gonzalo Navarro. 2000. An effective clustering algorithm to index high dimensional metric spaces. In Proceedings of the 7th International Symposium on String Processing and Information Retrieval. 75–86.
[34]
Edgar Chávez and Gonzalo Navarro. 2005. A compact space decomposition for effective metric indexing. Pattern Recognition Letters 26, 9 (2005), 1363–1376.
[35]
Edgar Chávez, Gonzalo Navarro, Ricardo Baeza-Yates, and José Luis Marroquín. 2001. Searching in metric spaces. ACM Computing Survey 33, 3 (2001), 273–321.
[36]
Lu Chen, Yunjun Gao, Xinhan Li, Christian S. Jensen, and Gang Chen. 2015. Efficient metric indexing for similarity search. In Proceedings of the 2015 IEEE 31st International Conference on Data Engineering. 591–602.
[37]
Lu Chen, Yunjun Gao, Xinhan Li, Christian S. Jensen, and Gang Chen. 2015. Efficient metric indexing for similarity search and similarity joins. IEEE Transactions on Knowledge and Data Engineering 29, 3 (2015), 556–571.
[38]
Lu Chen, Yunjun Gao, Xinhan Li, Christian S. Jensen, Gang Chen, and Baihua Zheng. 2015. Indexing metric uncertain data for range queries. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. 951–965.
[39]
Lu Chen, Yunjun Gao, Baihua Zheng, Christian S. Jensen, Hanyu Yang, and Keyu Yang. 2017. Pivot-based metric indexing. PVLDB 10, 10 (2017), 1058–1069.
[40]
Lu Chen, Yunjun Gao, Aoxiao Zhong, Christian S. Jensen, Gang Chen, and Baihua Zheng. 2017. Indexing metric uncertain data for range queries and range joins. The VLDB Journal 26, 4 (2017), 585–610.
[41]
Kenneth Ward Church. 2017. Word2Vec. Natural Language Engineering 23, 1 (2017), 155–162.
[42]
Paolo Ciaccia and Marco Patella. 1998. Bulk loading the M-tree. In Proceedings of the 9th Australasian Database Conference. 15–26.
[43]
Paolo Ciaccia and Marco Patella. 2000. The \(\rm M^2\) -tree: Processing complex multi-feature queries with just one index. In Proceedings of the DELOS Workshop.
[44]
Paolo Ciaccia and Marco Patella. 2002. Searching in metric spaces with user-defined and approximate distances. ACM Transactions on Database Systems 27, 4 (2002), 398–437.
[45]
Paolo Ciaccia and Marco Patella. 2017. The power of distance distributions: Cost models and scheduling policies for quality controlled similarity queries. In Proceedings of the International Conference on Similarity Search and Applications. 3–16.
[46]
Paolo Ciaccia, Marco Patella, and Pavel Zezula. 1997. M-tree: An efficient access method for similarity search in metric spaces. In Proceedings of the VLDB. 426–435.
[47]
Kenneth L. Clarkson. 2006. Nearest-neighbor searching and metric space dimensions. In Nearest-neighbor Methods for Learning and Vision: Theory and Practice (2006), 15–59.
[48]
Richard Connor. 2016. Reference point hyperplane trees. In Proceedings of the International Conference on Similarity Search and Applications. 65–78.
[49]
Richard Connor, Franco Alberto Cardillo, Lucia Vadicamo, and Fausto Rabitti. 2016. Hilbert exclusion: Improved metric search through finite isometric embeddings. ACM Transactions on Information Systems 35, 3 (2016), 1–27.
[50]
Richard Connor and Alan Dearle. 2018. Querying metric spaces with bit operations. In Proceedings of the International Conference on Similarity Search and Applications. 33–46.
[51]
Richard Connor, Lucia Vadicamo, Franco Alberto Cardillo, and Fausto Rabitti. 2019. Supermetric search. Information Systems 80, February (2019), 108–123.
[52]
Jeffrey Dean and Sanjay Ghemawat. 2008. MapReduce: Simplified data processing on large clusters. Communication of the ACM 51, 1 (2008), 107–113.
[53]
FKHA Dehne and Hartmut Noltemeier. 1987. Voronoi trees and clustering problems. Information Systems 12, 2 (1987), 171–175.
[54]
Frank Dehne and Hartmut Noltemeier. 1988. Voronoi trees and clustering problems. In Proceedings of the Syntactic and Structural Pattern Recognition. 185–194.
[55]
Vlastislav Dohnal. 2004. An access structure for similarity search in metric spaces. In Proceedings of the International Conference on Extending Database Technology. 133–143.
[56]
Vlastislav Dohnal, Claudio Gennaro, Pasquale Savino, and Pavel Zezula. 2003. D-index: Distance searching index for metric data sets. Multimedia Tools and Applications 21, 1 (2003), 9–33.
[57]
Vlastislav Dohnal, Claudio Gennaro, and Pavel Zezula. 2003. Similarity join in metric spaces using eD-index. In Proceedings of the International Conference on Database and Expert Systems Applications. 484–493.
[58]
Karina Figueroa, Edgar Chávez, Gonzalo Navarro, and Rodrigo Paredes. 2006. On the least cost for proximity searching in metric spaces. In Proceedings of the International Workshop on Experimental and Efficient Algorithms. 279–290.
[59]
Karina Figueroa, Edgar Chávez, Gonzalo Navarro, and Rodrigo Paredes. 2010. Speeding up spatial approximation search in metric spaces. Journal of Experimental Algorithmics 14, 6 (2010), 3–6.
[60]
Karina Figueroa and Nora Reyes. 2019. Permutation’s signatures for proximity searching in metric spaces. In Proceedings of the International Conference on Similarity Search and Applications. 151–159.
[61]
Maximilian Franzke, Tobias Emrich, Andreas Züfle, and Matthias Renz. 2016. Indexing multi-metric data. In Proceedings of the 2016 IEEE 32nd International Conference on Data Engineering. 1122–1133.
[62]
Kimmo Fredriksson. 2005. Exploiting distance coherence to speed up range queries in metric indexes. Information Processing Letters 95, 1 (2005), 287–292.
[63]
Kimmo Fredriksson. 2007. Engineering efficient metric indexes. Pattern Recognition Letters 28, 1 (2007), 75–84.
[64]
Ada Wai-chee Fu, Polly Mei-shuen Chan, Yin-Ling Cheung, and Yiu Sang Moon. 2000. Dynamic vp-tree indexing for n-nearest neighbor search given pair-wise distances. The VLDB Journal 9, 2 (2000), 154–173.
[65]
Jonathan Goldstein and Raghu Ramakrishnan. 2000. Contrast plots and p-sphere trees: Space vs. time in nearest neighbour searches. In Proceedings of the 26th International Conference on Very Large Data Bases. 429–440.
[66]
Magnus Lie Hetland. 2009. The basic principles of metric indexing. In Proceedings of the Swarm Intelligence for Multi-objective Problems in Data Mining. 199–232.
[67]
Magnus Lie Hetland. 2015. Ptolemaic indexing. Journal of Computational Geometry 6, 1 (2015), 165–184.
[68]
Magnus Lie Hetland, Tomáš Skopal, Jakub Lokoč, and Christian Beecks. 2013. Ptolemaic access methods: Challenging the reign of the metric space model. Information Systems 38, 7 (2013), 989–1006.
[69]
Gisli R. Hjaltason and Hanan Samet. 2003. Index-driven similarity search in metric spaces. ACM Transactions on Database Systems 28, 4 (2003), 517–580.
[70]
Michael E. Houle. 2013. Dimensionality, discriminability, density and distance distributions. In Proceedings of the IEEE 13th International Conference on Data Mining Workshops. 468–473.
[71]
Michael E. Houle. 2017. Local intrinsic dimensionality I: An extreme-value-theoretic foundation for similarity applications. In Proceedings of the International Conference on Similarity Search and Applications. 64–79.
[72]
Michael E. Houle. 2017. Local intrinsic dimensionality II: Multivariate analysis and distributional support. In Proceedings of the International Conference on Similarity Search and Applications. 80–95.
[73]
Michael E. Houle. 2020. Local intrinsic dimensionality III: Density and Similarity. In Proceedings of the International Conference on Similarity Search and Applications. 248–260.
[74]
Michael E. Houle and Michael Nett. 2013. Rank cover trees for nearest neighbor search. In Proceedings of the International Conference on Similarity Search and Applications. 16–29.
[75]
Masahiro Ishikawa, Hanxiong Chen, Kazutaka Furuse, Jeffrey Xu Yu, and Nobuo Ohbo. 2000. MB+ tree: A dynamically updatable metric index for similarity search. In Proceedings of the International Conference on Web-Age Information Management. 356–374.
[76]
Hosagrahar V. Jagadish, Beng Chin Ooi, Kian-Lee Tan, Cui Yu, and Rui Zhang. 2005. iDistance: An adaptive B+-tree based indexing method for nearest neighbor search. ACM Transactions on Database Systems 30, 2 (2005), 364–397.
[77]
Shichao Jin, Okhee Kim, and Wenya Feng. 2013. M \(^X\) -tree: A double hierarchical metric index with overlap reduction. In Proceedings of the International Conference on Computational Science and Its Applications. 574–589.
[78]
Iraj Kalantari and Gerard McDonald. 1983. A data structure and an algorithm for the nearest point problem. IEEE Transactions on Software Engineering 9, 5 (1983), 631–634.
[79]
Jongik Kim and Hongrae Lee. 2012. Efficient exact similarity searches using multiple token orderings. In Proceedings of the 2012 IEEE 28th International Conference on Data Engineering. 822–833.
[80]
Zineddine Kouahla. 2011. Exploring intersection trees for indexing metric spaces. In Proceedings of the CIIA.
[81]
Elizaveta Levina and Peter Bickel. 2001. The earth mover’s distance is the mallows distance: Some insights from statistics. In Proceedings of the 8th IEEE International Conference on Computer Vision. 251–256.
[82]
King-Ip Lin, Hosagrahar V. Jagadish, and Christos Faloutsos. 1994. The TV-tree: An index structure for high-dimensional data. The VLDB Journal 3, 4 (1994), 517–542.
[83]
Bing Liu, Wei Wang, Heping Yan, Baile Shi, et al. 2006. A bottom-up distance-based index tree for metric space. In Proceedings of the 2006 2nd International Conference on Information & Communication Technologies, Vol. 2. 2929–2934.
[84]
Jakub Lokoč, Juraj Moško, Přemysl Čech, and Tomáš Skopal. 2014. On indexing metric spaces using cut-regions. Information Systems 43, July (2014), 1–19.
[85]
Yury Malkov, Alexander Ponomarenko, Andrey Logvinov, and Vladimir Krylov. 2014. Approximate nearest neighbor algorithm based on navigable small world graphs. Information Systems 45, September (2014), 61–68.
[86]
Rui Mao, Willard L. Miranker, and Daniel P. Miranker. 2012. Pivot selection: Dimension reduction for distance-based indexing. Journal of Discrete Algorithms 13, May (2012), 32–46.
[87]
Mauricio Marin, Roberto Uribe, and Ricardo Barrientos. 2007. Searching and updating metric space databases using the parallel EGNAT. In Proceedings of the International Conference on Computational Science. 229–236.
[88]
José Martinez and Zineddine Kouahla. 2012. Indexing metric spaces with nested forests. In Proceedings of the International Conference on Database and Expert Systems Applications. 458–465.
[89]
Vladimir Mic, David Novak, and Pavel Zezula. 2017. Sketches with unbalanced bits for similarity search. In Proceedings of the International Conference on Similarity Search and Applications. 53–63.
[90]
Luisa Micó, José Oncina, and Rafael C. Carrasco. 1996. A fast branch & bound nearest neighbour classifier in metric spaces. Pattern Recognition Letters 17, 7 (1996), 731–739.
[91]
María Luisa Micó, José Oncina, and Enrique Vidal. 1994. A new version of the nearest-neighbour approximating and eliminating search algorithm (AESA) with linear preprocessing time and memory requirements. Pattern Recognition Letters 15, 1 (1994), 9–17.
[92]
Hisham Mohamed and Stéphane Marchand-Maillet. 2015. Quantized ranking for permutation-based indexing. Information Systems 52, August - September (2015), 163–175.
[93]
Anirban Mondal, Ayaan Kakkar, Nilesh Padhariya, and Mukesh Mohania. 2021. Efficient indexing of top- \(k\) entities in systems of engagement with extensions for geo-tagged entities. Data Science and Engineering 6, 4 (2021), 411–433.
[94]
Juraj Moško, Jakub Lokoč, and Tomáš Skopal. 2011. Clustered pivot tables for I/O-optimized similarity search. In Proceedings of the 4th International Conference on SImilarity Search and APplications. 17–24.
[95]
Bilegsaikhan Naidan, Leonid Boytsov, and Eric Nyberg. 2015. Permutation search methods are efficient, yet faster search is possible. PVLDB 8, 12 (2015), 1618–1629.
[96]
Alexandros Nanopoulos, Yannis Theodoridis, and Yannis Manolopoulos. 2001. C2P: Clustering based on closest pairs. In Proceedings of the VLDB. 331–340.
[97]
Gonzalo Navarro. 1999. Searching in metric spaces by spatial approximation. In Proceedings of the SPIRE. 141–148.
[98]
Gonzalo Navarro. 2002. Searching in metric spaces by spatial approximation. VLDB Journal 11, 1 (2002), 28–46.
[99]
Gonzalo Navarro, Rodrigo Paredes, Nora Reyes, and Cristian Bustos. 2017. An empirical evaluation of intrinsic dimension estimators. Information Systems 64, March (2017), 206–218.
[100]
Gonzalo Navarro and Nora Reyes. 2001. Dynamic spatial approximation trees. In Proceedings of the 21st International Conference of the Chilean Computer Science Society. 213–222.
[101]
Gonzalo Navarro and Nora Reyes. 2002. Fully dynamic spatial approximation trees. In Proceedings of the International Symposium on String Processing and Information Retrieval. 254–270.
[102]
Gonzalo Navarro and Nora Reyes. 2003. Improved deletions in dynamic spatial approximation trees. In Proceedings of the 23rd International Conference of the Chilean Computer Science Society. 13–22.
[103]
Gonzalo Navarro and Nora Reyes. 2009. Dynamic spatial approximation trees for massive data. In Proceedings of the 2009 Second International Workshop on Similarity Search and Applications. 81–88.
[104]
Gonzalo Navarro and Nora Reyes. 2016. New dynamic metric indices for secondary memory. Information Systems 59, July (2016), 48–78.
[105]
Gonzalo Navarro and Roberto Uribe-Paredes. 2011. Fully dynamic metric access methods based on hyperplane partitioning. Information Systems 36, 4 (2011), 734–747.
[106]
Hartmut Noltemeier, Knut Verbarg, and Christian Zirkelbach. 1992. Monotonous bisector \(^*\) trees—a tool for efficient partitioning of complex scenes of geometric objects. In Proceedings of Data Structures and Efficient Algorithms, Final Report on DFG Special Joint Initiative. 186–203.
[107]
David Novak, Michal Batko, and Pavel Zezula. 2011. Metric index: An efficient and scalable solution for precise and approximate similarity search. Information Systems 36, 4 (2011), 721–733.
[108]
David Novak and Pavel Zezula. 2016. PPP-codes for large-scale similarity searching. In Proceedings of the Transactions on Large-Scale Data-and Knowledge-Centered Systems XXIV. 61–87.
[109]
Alexander Ocsa, Carlos Bedregal, and Ernesto Cuadros-Vargas. 2007. A new approach for similarity queries using neighborhood graphs. In Proceedings of the Brazilian Symposium on Databases. 131–142.
[110]
Rodrigo Paredes and Edgar Chávez. 2005. Using the k-nearest neighbor graph for proximity searching in metric spaces. In Proceedings of the International Symposium on String Processing and Information Retrieval. 127–138.
[111]
Oscar Pedreira and Nieves R. Brisaboa. 2007. Spatial selection of sparse pivots for similarity search in metric spaces. In Proceedings of the International Conference on Current Trends in Theory and Practice of Computer Science. 434–445.
[112]
Vladimir Pestov. 2012. Indexability, concentration, and VC theory. Journal of Discrete Algorithms 13 (2012), 2–18.
[113]
Ives Rene Venturini Pola, Caetano Traina, and Agma Juci Machado Traina. 2007. The MM-tree: A memory-based metric tree without overlap between nodes. In Proceedings of the East European Conference on Advances in Databases and Information Systems. 157–171.
[114]
Jianbin Qin, Wei Wang, Yifei Lu, Chuan Xiao, and Xuemin Lin. 2011. Efficient exact edit similarity query processing with the asymmetric signature scheme. In Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data. 1033–1044.
[115]
D. A. Rachkovskij. 2017. Distance-based index structures for fast similarity search. Cybernetics and Systems Analysis 53, 4 (2017), 636–658.
[116]
Humberto Razente and Maria Camila Nardini Barioni. 2019. Storing data once in M-tree and PM-tree. In Proceedings of the International Conference on Similarity Search and Applications. 18–31.
[117]
Humberto Razente, Régis Michel Santos Sousa, and Maria Camila Nardini Barioni. 2018. Metric indexing assisted by short-term memories. In Proceedings of the International Conference on Similarity Search and Applications. 107–121.
[118]
Yossi Rubner, Carlo Tomasi, and Leonidas J. Guibas. 2000. The earth mover’s distance as a metric for image retrieval. International Journal of Computer Vision 40, 2 (2000), 99–121.
[119]
Enrique Vidal Ruiz. 1986. An algorithm for finding nearest neighbours in (approximately) constant average time. Pattern Recognition Letters 4, 3 (1986), 145–157.
[120]
Guillermo Ruiz, Francisco Santoyo, Edgar Chávez, Karina Figueroa, and Eric Sadit Tellez. 2013. Extreme pivots for faster metric indexes. In Proceedings of the International Conference on Similarity Search and Applications. 115–126.
[121]
Khalil Al Ruqeishi and Michal Konečnỳ. 2015. Regrouping metric-space search index for search engine size adaptation. In Proceedings of the International Conference on Similarity Search and Applications. 271–282.
[122]
Uri Shaft and Raghu Ramakrishnan. 2006. Theory of nearest neighbors indexability. ACM Transactions on Database Systems 31, 3 (2006), 814–838.
[123]
Larissa Capobianco Shimomura and Daniel S. Kaster. 2019. HGraph: A connected-partition approach to proximity graphs for similarity search. In Proceedings of the International Conference on Database and Expert Systems Applications. 106–121.
[124]
Larissa Capobianco Shimomura, Marcos R. Vieira, and Daniel S. Kaster. 2018. Performance analysis of graph-based methods for exact and approximate similarity search in metric spaces. In Proceedings of the International Conference on Similarity Search and Applications. 18–32.
[125]
Eliezer Silva, Thiago Teixeira, George Teodoro, and Eduardo Valle. 2014. Large-scale distributed locality-sensitive hashing for general metric data. In Proceedings of the International Conference on Similarity Search and Applications. 82–93.
[126]
Tomáš Skopal and David Hoksza. 2007. Improving the performance of M-tree family by nearest neighbor graphs. In Proceedings of the East European Conference on Advances in Databases and Information Systems. 172–188.
[127]
Tomáš Skopal and Jakub Lokoč. 2008. NM-tree: Flexible approximate similarity search in metric and non-metric spaces. In Proceedings of the International Conference on Database and Expert Systems Applications. 312–325.
[128]
Tomáš Skopal and Jakub Lokoč. 2009. New dynamic construction techniques for M-tree. Journal of Discrete Algorithms 7, 1 (2009), 62–77.
[129]
Tomš Skopal, Jaroslav Pokornỳ, Michal Krátkỳ, and Václav Snášel. 2003. Revisiting M-tree building principles. In Proceedings of the East European Conference on Advances in Databases and Information Systems. 148–162.
[130]
Tomás Skopal, Jaroslav Pokornỳ, and Vaclav Snasel. 2004. PM-tree: Pivoting metric tree for similarity search in multimedia databases. In Proceedings of the East European Conference on Advances in Databases and Information Systems. 803–815.
[131]
Michael Stonebraker and Uĝur Çetintemel. 2018. “One size fits all”: An idea whose time has come and gone. In Proceedings of the Making Databases Work: the Pragmatic Wisdom of Michael Stonebraker. 441–462.
[132]
Michael Stonebraker, Samuel Madden, Daniel J. Abadi, Stavros Harizopoulos, Nabil Hachem, and Pat Helland. 2018. The end of an architectural era: It’s time for a complete rewrite. In Proceedings of the Making Databases Work: the Pragmatic Wisdom of Michael Stonebraker. 463–489.
[133]
Richard C. Tillquist and Manuel E. Lladser. 2019. Low-dimensional representation of genomic sequences. Journal of Mathematical Biology 79, 1 (2019), 1–29.
[134]
Ken Tokoro, Kazuaki Yamaguchi, and Sumio Masuda. 2006. Improvements of TLAESA nearest neighbour search algorithm and extension to approximation search. In Proceedings of the 29th Australasian Computer Science Conference. 77–83.
[135]
Caetano Traina, Agma Traina, Bernhard Seeger, and Christos Faloutsos. 2000. Slim-trees: High performance metric trees minimizing overlap between nodes. In Proceedings of the International Conference on Extending Database Technology. 51–65.
[136]
Caetano Traina, Agma J. M. Traina, Marcos R. Vieira, and Christos Faloutsos. 2007. The omni-family of all-purpose access methods: A simple and effective way to make similarity search more efficient. The VLDB Journal 16, 4 (2007), 483–505.
[137]
Caetano Traina Jr, Agma Traina, Roberto Santos Filho, and Christos Faloutsos. 2002. How to improve the pruning ability of dynamic metric access methods. In Proceedings of the 11th International Conference on Information and Knowledge Management. 219–226.
[138]
Jeffrey K. Uhlmann. 1991. Metric trees. Applied Mathematics Letters 4, 5 (1991), 61–62.
[139]
Jeffrey K. Uhlmann. 1991. Satisfying general proximity/similarity queries with metric trees. Information Processing Letters 40, 4 (1991), 175–179.
[140]
Lucia Vadicamo, Richard Connor, Fabrizio Falchi, Claudio Gennaro, and Fausto Rabitti. 2019. Splx-perm: A novel permutation-based representation for approximate metric search. In Proceedings of the International Conference on Similarity Search and Applications. 40–48.
[141]
Reinier H. Van Leuken and Remco C. Veltkamp. 2011. Selecting vantage objects for similarity indexing. ACM Transactions on Multimedia Computing, Communications, and Applications 7, 3 (2011), 1–18.
[142]
Marcos R. Vieira, Caetano Traina Jr, Fabio J. T. Chino, and Agma J. M. Traina. 2004. DBM-tree: A dynamic metric access method sensitive to local density data. In Proceedings of the SBBD. 163–177.
[143]
Marcos R. Vieira, Caetano Traina Jr, Fabio J. T. Chino, and Agma J. M. Traina. 2010. DBM-tree: A dynamic metric access method sensitive to local density data. Journal of Information and Data Management 1, 1 (2010), 111–111.
[144]
Juan Miguel Vilar. 1995. Reducing the overhead of the AESA metric-space nearest neighbour searching algorithm. Information Processing Letters 56, 5 (1995), 265–271.
[145]
Ilya Volnyansky and Vladimir Pestov. 2009. Curse of dimensionality in pivot based indexes. In Proceedings of the International Workshop on Similarity Search and Applications. 39–46.
[146]
Roger Weber, Hans-Jörg Schek, and Stephen Blott. 1998. A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. In Proceedings of the VLDB. 194–205.
[147]
Xiaojing Xie, Jihong Guan, and Shuigeng Zhou. 2015. Similarity evaluation of DNA sequences based on frequent patterns and entropy. In Proceedings of the BMC Genomics, Vol. 16. 1–10.
[148]
Yuki Yamagishi, Kazuo Aoyama, Kazumi Saito, and Tetsuo Ikeda. 2018. Pivot generation algorithm with a complete binary tree for efficient exact similarity search. IEICE Transactions on Information and Systems 101, 1 (2018), 142–151.
[149]
Peter N Yianilos. 1993. Data structures and algorithms for nearest neighbor. In Proceedings of the 4th Annual ACM-SIAM Symposium on Discrete Algorithms, Vol. 66. 311.
[150]
Peter N. Yianilos. 1999. Excluded middle vantage point forests for nearest neighbor search. In Proceedings of the ALENEX.
[151]
Pavel Zezula, Giuseppe Amato, Vlastislav Dohnal, and Michal Batko. 2006. Similarity Search: The Metric Space Approach. Springer Science & Business Media.
[152]
Pavel Zezula, Pasquale Savino, Giuseppe Amato, and Fausto Rabitti. 1998. Approximate similarity retrieval with M-trees. The VLDB Journal 7, 4 (1998), 275–293.
[153]
Ming Zhang and Reda Alhajj. 2010. Effectiveness of NAQ-tree as index structure for similarity search in high-dimensional metric space. Knowledge and Information Systems 22, 1 (2010), 1–26.
[154]
Zhenjie Zhang, Marios Hadjieleftheriou, Beng Chin Ooi, and Divesh Srivastava. 2010. Bed-tree: An all-purpose index structure for string similarity search based on edit distance. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data. 915–926.
[155]
Xiangmin Zhou, Guoren Wang, Jeffrey Xu Yu, and Ge Yu. 2003. M\(^+\)-tree: A new dynamical multidimensional index for metric spaces. In Proceedings of the 14th Australasian Database Conference. 161–168.
[156]
Xiangmin Zhou, Guoren Wang, Xiaofang Zhou, and Ge Yu. 2005. BM\(^+\)-tree: A hyperplane-based index method for high-dimensional metric spaces. In Proceedings of the International Conference on Database Systems for Advanced Applications. 398–409.

Published In

ACM Computing Surveys, Volume 55, Issue 6
June 2023
781 pages
ISSN:0360-0300
EISSN:1557-7341
DOI:10.1145/3567471

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 07 December 2022
Online AM: 23 May 2022
Accepted: 02 May 2022
Revised: 11 January 2022
Received: 21 November 2020
Published in CSUR Volume 55, Issue 6

Author Tags

  1. Metric spaces
  2. indexing and querying
  3. metric similarity search

Qualifiers

  • Survey
  • Refereed

Funding Sources

  • NSFC
  • Zhejiang Provincial Natural Science Foundation
  • DIREC
