Proximity Searching in High Dimensional Spaces with a Proximity Preserving Order

Chávez, Edgar; Figueroa, Karina; Navarro, Gonzalo

doi:10.1007/11579427_41

Edgar Chávez, Karina Figueroa & Gonzalo Navarro

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3789)

Included in the following conference series:

Mexican International Conference on Artificial Intelligence

Abstract

Kernel-based methods for AI tasks (such as k-nearest neighbor classifiers) translate the classification problem into a proximity search problem, usually in a very high-dimensional space. Unfortunately, no proximity search algorithm performs well in high dimensions. One way to overcome this problem is to use approximate and probabilistic algorithms, which trade accuracy for speed.

In this paper we present a new probabilistic proximity search algorithm. Its main idea is that each element orders a set of samples by their distance to it. The closeness between the order produced by an element and the order produced by the query turns out to be an excellent predictor of how relevant that element is for answering the query.
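The ordering idea can be illustrated with a minimal sketch. It is not the authors' implementation and rests on our own assumptions: points are tuples of numbers compared with Euclidean distance (`math.dist`), the set of samples is given in advance, ties are broken by sample index, and Spearman's footrule (the sum of absolute differences between the positions each sample takes in two orders) is used as the measure of closeness between orders, since the abstract does not name one. The function names `permutation`, `footrule` and `candidate_order` are hypothetical.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def permutation(x: Point, samples: List[Point]) -> List[int]:
    """Sample indices ordered by increasing distance to x (ties broken by index)."""
    return sorted(range(len(samples)), key=lambda i: (math.dist(x, samples[i]), i))


def footrule(perm_a: List[int], perm_b: List[int]) -> int:
    """Spearman's footrule: sum over samples of |position in perm_a - position in perm_b|."""
    pos_a = {s: r for r, s in enumerate(perm_a)}
    pos_b = {s: r for r, s in enumerate(perm_b)}
    return sum(abs(pos_a[s] - pos_b[s]) for s in pos_a)


def candidate_order(query: Point, db: List[Point], samples: List[Point]) -> List[int]:
    """Database indices sorted by how closely each element's order matches the query's."""
    # Assumption: in a real index the element orders would be computed once, at build time.
    perms = [permutation(x, samples) for x in db]
    q_perm = permutation(query, samples)
    return sorted(range(len(db)), key=lambda i: (footrule(perms[i], q_perm), i))
```

Elements whose order most resembles the query's come first in the returned list; a search would then compute real distances only for a prefix of it, which is where the savings come from.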

The performance of our method is unparalleled. For example, on a full 128-dimensional dataset, reviewing 10% of the database suffices to obtain 90% of the answers, and reviewing less than 1% yields 80% of the correct answers. The result is even more impressive considering that a full 128-dimensional dataset may correspond to clustered data spanning thousands of dimensions. Furthermore, the concept of a proximity preserving order opens a totally new approach to both exact and approximate proximity searching.
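The figures above relate the fraction of the database reviewed to the fraction of correct answers obtained. One plausible way to measure that trade-off is sketched below; it is an illustration under stated assumptions, not the paper's evaluation code. It assumes some candidate ordering of the database is already available (for instance, elements sorted by how close their order is to the query's), uses Euclidean distance, and always reviews at least `k` elements. The names `knn_exact`, `knn_review_fraction` and `recall` are hypothetical.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def knn_exact(query: Point, db: List[Point], k: int) -> List[int]:
    """Brute-force answer: indices of the k database elements closest to the query."""
    return sorted(range(len(db)), key=lambda i: (math.dist(query, db[i]), i))[:k]


def knn_review_fraction(query: Point, db: List[Point], candidate_order: List[int],
                        k: int, fraction: float) -> List[int]:
    """Compute real distances only for the first `fraction` of candidate_order
    and return the k closest elements among those reviewed."""
    # Assumption: never review fewer than k elements.
    budget = max(k, math.ceil(fraction * len(db)))
    reviewed = candidate_order[:budget]
    return sorted(reviewed, key=lambda i: (math.dist(query, db[i]), i))[:k]


def recall(approx: List[int], exact: List[int]) -> float:
    """Fraction of the true k nearest neighbors present in the approximate answer."""
    return len(set(approx) & set(exact)) / len(exact)
```

Sweeping `fraction` and averaging `recall` over many queries gives a curve of the kind the abstract summarizes; the better the candidate ordering, the higher the recall at a given fraction.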

Supported by CYTED VII.19 RIBIDI Project (all authors), CONACyT grant 36911A (first and second authors), and Millennium Nucleus Center for Web Research, Grant P04-067-F, Mideplan, Chile (second and third authors).

Author information

Authors and Affiliations

Facultad de Ciencias Físico-Matemáticas, Universidad Michoacana, México
Edgar Chávez & Karina Figueroa
Center for Web Research, Dept. of Computer Science, University of Chile
Karina Figueroa & Gonzalo Navarro

Editor information

Editors and Affiliations

National Polytechnic Institute, Center for Computing Research, 07738, Mexico City, México
Alexander Gelbukh
Tecnológico de Monterrey (ITESM), Campus Ciudad de México (CCM), Calle del Puente 222, Col. Ejidos de Huipulco, 14360 DF, Tlalpan, Mexico
Álvaro de Albornoz
Center for Intelligent Systems, Tecnológico de Monterrey, Campus Monterrey, 64849, Monterrey, N.L., Mexico
Hugo Terashima-Marín

About this paper

Cite this paper

Chávez, E., Figueroa, K., Navarro, G. (2005). Proximity Searching in High Dimensional Spaces with a Proximity Preserving Order. In: Gelbukh, A., de Albornoz, Á., Terashima-Marín, H. (eds) MICAI 2005: Advances in Artificial Intelligence. MICAI 2005. Lecture Notes in Computer Science, vol 3789. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11579427_41

DOI: https://doi.org/10.1007/11579427_41
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29896-0
Online ISBN: 978-3-540-31653-4
eBook Packages: Computer Science, Computer Science (R0)
