DOI: 10.5555/3305890.3305896

Fast k-nearest neighbour search via prioritized DCI

Published: 06 August 2017

Abstract

Most exact methods for k-nearest neighbour search suffer from the curse of dimensionality; that is, their query times exhibit exponential dependence on either the ambient or the intrinsic dimensionality. Dynamic Continuous Indexing (DCI) (Li & Malik, 2016) offers a promising way of circumventing the curse and successfully reduces the dependence of query time on intrinsic dimensionality from exponential to sublinear. In this paper, we propose a variant of DCI, which we call Prioritized DCI, and show a remarkable improvement in the dependence of query time on intrinsic dimensionality. In particular, a linear increase in intrinsic dimensionality, or equivalently, an exponential increase in the number of points near a query, can be mostly counteracted with just a linear increase in space. We also demonstrate empirically that Prioritized DCI significantly outperforms prior methods. In particular, relative to Locality-Sensitive Hashing (LSH), Prioritized DCI reduces the number of distance evaluations by a factor of 14 to 116 and the memory consumption by a factor of 21.
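
For context on the distance-evaluation counts quoted above, the following is a minimal sketch of exact k-nearest neighbour search by linear scan, which evaluates the distance from the query to every point in the dataset. It is a baseline for illustration only and is not the Prioritized DCI algorithm; the function name brute_force_knn, the evaluation counter, and the synthetic data are assumptions introduced here.

```python
import heapq
import math
import random


def brute_force_knn(points, query, k):
    """Exact k-NN by linear scan (illustrative baseline, not Prioritized DCI).

    Returns the indices of the k nearest points, closest first, together
    with the number of distance evaluations performed.
    """
    evaluations = 0
    scored = []
    for index, point in enumerate(points):
        # One Euclidean distance evaluation per data point.
        dist = math.dist(point, query)
        evaluations += 1
        scored.append((dist, index))
    # Keep the k smallest distances; ties are broken by index.
    nearest = heapq.nsmallest(k, scored)
    return [index for _, index in nearest], evaluations


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical synthetic data: 1000 points in 16 ambient dimensions.
    data = [[random.gauss(0.0, 1.0) for _ in range(16)] for _ in range(1000)]
    query = [0.0] * 16
    neighbours, evaluations = brute_force_knn(data, query, k=5)
    print("neighbour indices:", neighbours)
    print("distance evaluations:", evaluations)
```

The scan always performs one distance evaluation per data point, so its cost grows linearly with the dataset size regardless of k. Index-based methods such as those compared in the abstract aim to return the same or nearly the same neighbours with far fewer evaluations.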

References

[1]
Anagnostopoulos, Evangelos, Emiris, Ioannis Z, and Psarros, Ioannis. Low-quality dimension reduction and high-dimensional approximate nearest neighbor. In 31st International Symposium on Computational Geometry (SoCG 2015), pp. 436-450, 2015.
[2]
Andoni, Alexandr and Indyk, Piotr. Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. In Foundations of Computer Science, 2006. FOCS'06. 47th Annual IEEE Symposium on, pp. 459-468. IEEE, 2006.
[3]
Andoni, Alexandr and Razenshteyn, Ilya. Optimal data-dependent hashing for approximate near neighbors. In Proceedings of the Forty-Seventh Annual ACM on Symposium on Theory of Computing, pp. 793-801. ACM, 2015.
[4]
Arya, Sunil and Mount, David M. Approximate nearest neighbor queries in fixed dimensions. In SODA, volume 93, pp. 271-280, 1993.
[5]
Arya, Sunil, Mount, David M, Netanyahu, Nathan S, Silverman, Ruth, and Wu, Angela Y. An optimal algorithm for approximate nearest neighbor searching fixed dimensions. Journal of the ACM (JACM), 45(6):891-923, 1998.
[6]
Bayer, Rudolf. Symmetric binary B-trees: Data structure and maintenance algorithms. Acta Informatica, 1(4):290-306, 1972.
[7]
Behnam, Ehsan, Waterman, Michael S, and Smith, Andrew D. A geometric interpretation for local alignment-free sequence comparison. Journal of Computational Biology, 20(7):471-485, 2013.
[8]
Bentley, Jon Louis. Multidimensional binary search trees used for associative searching. Communications of the ACM, 18(9):509-517, 1975.
[9]
Berchtold, Stefan, Keim, Daniel A, and Kriegel, Hans-Peter. The X-tree: An index structure for high-dimensional data. In Very Large Data Bases, pp. 28-39, 1996.
[10]
Berchtold, Stefan, Ertl, Bernhard, Keim, Daniel A, Kriegel, H-P, and Seidl, Thomas. Fast nearest neighbor search in high-dimensional space. In Data Engineering, 1998. Proceedings., 14th International Conference on, pp. 209-218. IEEE, 1998.
[11]
Beygelzimer, Alina, Kakade, Sham, and Langford, John. Cover trees for nearest neighbor. In Proceedings of the 23rd International Conference on Machine Learning, pp. 97-104. ACM, 2006.
[12]
Biau, Gérard, Chazal, Frédéric, Cohen-Steiner, David, Devroye, Luc, Rodriguez, Carlos, et al. A weighted k-nearest neighbor density estimate for geometric inference. Electronic Journal of Statistics, 5:204-237, 2011.
[13]
Clarkson, Kenneth L. Nearest neighbor queries in metric spaces. Discrete & Computational Geometry, 22(1):63-93, 1999.
[14]
Dasgupta, Sanjoy and Freund, Yoav. Random projection trees and low dimensional manifolds. In Proceedings of the Fortieth Annual ACM Symposium on Theory of Computing, pp. 537-546. ACM, 2008.
[15]
Dasgupta, Sanjoy and Sinha, Kaushik. Randomized partition trees for nearest neighbor search. Algorithmica, 72(1):237-263, 2015.
[16]
Datar, Mayur, Immorlica, Nicole, Indyk, Piotr, and Mirrokni, Vahab S. Locality-sensitive hashing scheme based on p-stable distributions. In Proceedings of the twentieth annual symposium on Computational geometry, pp. 253-262. ACM, 2004.
[17]
Eldawy, Ahmed and Mokbel, Mohamed F. SpatialHadoop: A MapReduce framework for spatial data. In Data Engineering (ICDE), 2015 IEEE 31st International Conference on, pp. 1352-1363. IEEE, 2015.
[18]
Guibas, Leo J and Sedgewick, Robert. A dichromatic framework for balanced trees. In Foundations of Computer Science, 1978., 19th Annual Symposium on, pp. 8-21. IEEE, 1978.
[19]
Guttman, Antonin. R-trees: a dynamic index structure for spatial searching. In Proceedings of the 1984 ACM SIGMOD International Conference on Management of Data, pp. 47-57, 1984.
[20]
Houle, Michael E and Nett, Michael. Rank-based similarity search: Reducing the dimensional dependence. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 37(1):136-150, 2015.
[21]
Indyk, Piotr and Motwani, Rajeev. Approximate nearest neighbors: towards removing the curse of dimensionality. In Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing, pp. 604-613. ACM, 1998.
[22]
Jégou, Hervé, Douze, Matthijs, and Schmid, Cordelia. Product quantization for nearest neighbor search. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 33(1):117-128, 2011.
[23]
Karger, David R and Ruhl, Matthias. Finding nearest neighbors in growth-restricted metrics. In Proceedings of the Thirty-fourth Annual ACM Symposium on Theory of Computing, pp. 741-750. ACM, 2002.
[24]
Krauthgamer, Robert and Lee, James R. Navigating nets: simple algorithms for proximity search. In Proceedings of the Fifteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 798-807. Society for Industrial and Applied Mathematics, 2004.
[25]
Krizhevsky, Alex and Hinton, Geoffrey. Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009.
[26]
LeCun, Yann, Bottou, Léon, Bengio, Yoshua, and Haffner, Patrick. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278-2324, 1998.
[27]
Li, Ke and Malik, Jitendra. Fast k-nearest neighbour search via Dynamic Continuous Indexing. In International Conference on Machine Learning, pp. 671-679, 2016.
[28]
Liu, Ting, Moore, Andrew W, Yang, Ke, and Gray, Alexander G. An investigation of practical approximate nearest neighbor algorithms. In Advances in Neural Information Processing Systems, pp. 825-832, 2004.
[29]
Meiser, Stefan. Point location in arrangements of hyper-planes. Information and Computation, 106(2):286-303, 1993.
[30]
Minsky, Marvin and Papert, Seymour. Perceptrons: an introduction to computational geometry. pp. 222, 1969.
[31]
Orchard, Michael T. A fast nearest-neighbor search algorithm. In Acoustics, Speech, and Signal Processing, 1991. ICASSP-91., 1991 International Conference on, pp. 2297-2300. IEEE, 1991.
[32]
Paulevé, Loïc, Jégou, Hervé, and Amsaleg, Laurent. Locality sensitive hashing: A comparison of hash function types and querying mechanisms. Pattern Recognition Letters, 31(11):1348-1358, 2010.
[33]
Pugh, William. Skip lists: a probabilistic alternative to balanced trees. Communications of the ACM, 33(6):668-676, 1990.
[34]
Weiss, Yair, Torralba, Antonio, and Fergus, Rob. Spectral hashing. In Advances in Neural Information Processing Systems, pp. 1753-1760, 2009.
Published In

ICML'17: Proceedings of the 34th International Conference on Machine Learning - Volume 70
August 2017
4208 pages

Publisher

JMLR.org
