Abstract
Similarity search operations require executing expensive algorithms and, although broadly useful in many new applications, rely on specific structures not yet supported by commercial DBMS. In this paper we discuss the new Omni-technique, which allows building a variety of dynamic Metric Access Methods based on a number of objects selected from the dataset and used as global reference objects. We call these methods the Omni-family of metric access methods. This technique enables building similarity search operations on top of existing structures, significantly improving their performance with regard to the number of disk accesses and distance calculations. Additionally, our methods scale up well, exhibiting sub-linear behavior with growing database size.
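The abstract does not spell out how the global reference objects reduce distance calculations, so the following is only a minimal sketch of one common way such objects can be used: generic triangle-inequality filtering, not necessarily the paper's own algorithm. The names `ReferenceIndex`, `range_query`, and `euclidean` are hypothetical and chosen for illustration. Each object's distances to the reference objects are precomputed, and a range query discards any object whose precomputed distance to some reference object differs from the query's distance to that same reference object by more than the query radius.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean distance; any metric satisfying the triangle inequality works."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class ReferenceIndex:
    """Hypothetical in-memory index built on global reference objects.

    Assumption: this is a generic pivot filter, shown only to illustrate
    how reference objects can save distance calculations.
    """

    def __init__(
        self,
        data: Sequence[Point],
        references: Sequence[Point],
        dist: Callable[[Point, Point], float] = euclidean,
    ) -> None:
        self.data = list(data)
        self.references = list(references)
        self.dist = dist
        # Precompute, once, the distance from every object to every reference.
        self.coords = [[dist(o, f) for f in self.references] for o in self.data]

    def range_query(self, query: Point, radius: float) -> Tuple[List[Point], int]:
        """Return (objects within radius of query, distance calculations used)."""
        q_coords = [self.dist(query, f) for f in self.references]
        calcs = len(q_coords)
        hits: List[Point] = []
        for obj, o_coords in zip(self.data, self.coords):
            # Triangle inequality: |d(q, f) - d(o, f)| <= d(q, o) for any f.
            # If the left side exceeds the radius, obj cannot qualify.
            if any(abs(qc - oc) > radius for qc, oc in zip(q_coords, o_coords)):
                continue
            calcs += 1  # only surviving candidates cost a real distance call
            if self.dist(query, obj) <= radius:
                hits.append(obj)
        return hits, calcs


if __name__ == "__main__":
    grid = [(float(x), float(y)) for x in range(10) for y in range(10)]
    index = ReferenceIndex(grid, references=[(0.0, 0.0), (9.0, 0.0)])
    found, used = index.range_query((2.0, 2.0), 1.5)
    print(len(found), "hits using", used, "distance calculations")
```

In this sketch the filter never discards a true answer, because the discarded objects are provably farther than the radius; the saving comes from skipping the real distance computation for them. A disk-based method would store the precomputed distances in an existing structure rather than in a Python list.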
About this article
Cite this article
Traina, C., Filho, R.F.S., Traina, A.J.M. et al. The Omni-family of all-purpose access methods: a simple and effective way to make similarity search more efficient. The VLDB Journal 16, 483–505 (2007). https://doi.org/10.1007/s00778-005-0178-0
DOI: https://doi.org/10.1007/s00778-005-0178-0