DOI: 10.1145/3227609.3227643
research-article
Public Access

NN-Descent on High-Dimensional Data

Published: 25 June 2018

Abstract

K-nearest neighbor graphs (K-NNGs) are used in many data-mining and machine-learning algorithms. Naive construction of a K-NNG has complexity O(n^2), which can be prohibitive for large-scale data sets. To achieve higher efficiency, many exact and approximate algorithms have been developed, including the NN-Descent algorithm of Dong, Charikar and Li. Empirical evidence suggests that the practical complexity of this algorithm is in Õ(n^1.14), a significant improvement over brute-force construction. However, NN-Descent has a major drawback: it produces good results only on data of low intrinsic dimensionality. This paper presents an experimental analysis of this behavior and investigates possible solutions. We link the performance quality of NN-Descent to the phenomenon of hubness, defined as the tendency of intrinsically high-dimensional data to contain hubs, i.e., points with high in-degrees in the K-NNG. We propose two approaches to alleviate the observed negative influence of hubs on NN-Descent performance.
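To make the terms in the abstract concrete, here is a minimal Python sketch, assuming Euclidean distance and small in-memory data. It builds the exact K-NNG by brute force (the O(n^2) baseline), runs a simplified NN-Descent in the spirit of Dong, Charikar and Li [10] (a full local join over forward and reverse neighbors, without the sampling and new/old flags of the original), measures recall against the exact graph, and counts k-occurrences N_k (in-degrees in the K-NNG), whose skewness is the hubness indicator used by Radovanović et al. [16]. All function names and parameters (`max_iters`, `delta`, `seed`) are hypothetical; this is not the authors' implementation and does not include the two remedies the paper proposes.

```python
import math
import random


def dist(a, b):
    # Euclidean distance (an assumption; NN-Descent accepts generic measures).
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_knng(data, k):
    # Exact K-NNG: every pair is compared, i.e. O(n^2) distance evaluations.
    graph = []
    for i, p in enumerate(data):
        cands = sorted((dist(p, q), j) for j, q in enumerate(data) if j != i)
        graph.append([j for _, j in cands[:k]])
    return graph


def nn_descent(data, k, max_iters=20, delta=0.001, seed=0):
    # Simplified NN-Descent (hypothetical helper): full local join, no sampling.
    rng = random.Random(seed)
    n = len(data)
    # Start from random neighbor lists of (distance, id), sorted ascending.
    lists = []
    for i in range(n):
        ids = rng.sample([j for j in range(n) if j != i], k)
        lists.append(sorted((dist(data[i], data[j]), j) for j in ids))

    def try_insert(i, j):
        # Replace i's farthest neighbor with j if j is closer and not yet listed.
        if i == j or any(m == j for _, m in lists[i]):
            return 0
        d = dist(data[i], data[j])
        if d >= lists[i][-1][0]:
            return 0
        lists[i][-1] = (d, j)
        lists[i].sort()
        return 1

    for _ in range(max_iters):
        # Snapshot forward neighbors plus reverse neighbors of every point.
        general = [set() for _ in range(n)]
        for i, lst in enumerate(lists):
            for _, j in lst:
                general[i].add(j)
                general[j].add(i)
        # Local join: any two neighbors of v are candidate neighbors of each other.
        updates = 0
        for v in range(n):
            members = sorted(general[v])
            for a_idx, a in enumerate(members):
                for b in members[a_idx + 1:]:
                    updates += try_insert(a, b) + try_insert(b, a)
        # Early termination once few lists change (threshold delta * n * k).
        if updates <= delta * n * k:
            break
    return [[j for _, j in lst] for lst in lists]


def recall(approx, exact):
    # Fraction of true K nearest neighbors recovered by the approximate graph.
    hits = sum(len(set(a) & set(e)) for a, e in zip(approx, exact))
    return hits / sum(len(e) for e in exact)


def k_occurrence(graph, n):
    # N_k(x): in-degree of x in the K-NNG; hubs have unusually large N_k.
    counts = [0] * n
    for nbrs in graph:
        for j in nbrs:
            counts[j] += 1
    return counts


def skewness(values):
    # Standardized third moment; strongly positive skew of N_k signals hubness.
    m = sum(values) / len(values)
    var = sum((v - m) ** 2 for v in values) / len(values)
    if var == 0:
        return 0.0
    return sum((v - m) ** 3 for v in values) / len(values) / var ** 1.5


if __name__ == "__main__":
    rng = random.Random(42)
    n, k = 200, 5
    for dim in (2, 50):
        data = [[rng.random() for _ in range(dim)] for _ in range(n)]
        exact = brute_force_knng(data, k)
        approx = nn_descent(data, k)
        print(f"dim={dim} recall={recall(approx, exact):.3f} "
              f"S_Nk={skewness(k_occurrence(exact, n)):.2f}")
```

The demo prints, for each dimensionality, the recall of the sketch against the exact graph and the skewness of N_k. Following the abstract, one would expect recall to fall and N_k skewness to rise as intrinsic dimensionality grows, although exact numbers on small uniform random data will vary with the seed and sizes chosen.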

References

[1]
Mikhail Belkin and Partha Niyogi. 2003. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation 15, 6 (2003), 1373--1396.
[2]
Markus M. Breunig, Hans-Peter Kriegel, Raymond T. Ng, and Jörg Sander. 2000. LOF: Identifying density-based local outliers. In ACM SIGMOD Record, Vol. 29. ACM, 93--104.
[3]
M. R. Brito, E. L. Chavez, A. J. Quiroz, and J. E. Yukich. 1997. Connectivity of the mutual k-nearest-neighbor graph in clustering and outlier detection. Statistics & Probability Letters 35, 1 (1997), 33--42.
[4]
Jie Chen, Haw-ren Fang, and Yousef Saad. 2009. Fast approximate kNN graph construction for high dimensional data via recursive Lanczos bisection. Journal of Machine Learning Research 10 (2009), 1989--2012.
[5]
Howie M. Choset. 2005. Principles of Robot Motion: Theory, Algorithms, and Implementation. MIT Press.
[6]
Michael Connor and Piyush Kumar. 2010. Fast construction of k-nearest neighbor graphs for point clouds. IEEE Transactions on Visualization and Computer Graphics 16, 4 (2010), 599--608.
[7]
Thomas Cover and Peter Hart. 1967. Nearest neighbor pattern classification. IEEE Transactions on Information Theory 13, 1 (1967), 21--27.
[8]
Belur V. Dasarathy. 2002. Data mining tasks and methods: Classification: Nearest-neighbor approaches. In Handbook of Data Mining and Knowledge Discovery. Oxford University Press, 288--298.
[9]
Thibault Debatty, Pietro Michiardi, Olivier Thonnard, and Wim Mees. 2014. Building k-nn graphs from large text data. In Proc. 2014 IEEE Int. Conf. on Big Data. IEEE, 573--578.
[10]
Wei Dong, Moses Charikar, and Kai Li. 2011. Efficient k-nearest neighbor graph construction for generic similarity measures. In Proc. 20th Int. Conf. on the World Wide Web (WWW). ACM, 577--586.
[11]
Kiana Hajebi, Yasin Abbasi-Yadkori, Hossein Shahbazi, and Hong Zhang. 2011. Fast approximate nearest-neighbor search with k-nearest neighbor graph. In Proc. 22nd Int. Joint Conf. on Artificial Intelligence (IJCAI), Vol. 22. 1312.
[12]
Ville Hautamaki, Ismo Karkkainen, and Pasi Franti. 2004. Outlier detection using k-nearest neighbour graph. In Proc. 17th Int. Conf. on Pattern Recognition (ICPR), Vol. 3. IEEE, 430--433.
[13]
Michael E. Houle, Xiguo Ma, Vincent Oria, and Jichao Sun. 2014. Improving the quality of K-NN graphs for image databases through vector sparsification. In Proc. 4th ACM Int. Conf. on Multimedia Retrieval (ICMR). ACM, 89.
[14]
Rodrigo Paredes, Edgar Chávez, Karina Figueroa, and Gonzalo Navarro. 2006. Practical construction of k-nearest neighbor graphs in metric spaces. In International Workshop on Experimental and Efficient Algorithms. Springer, 85--97.
[15]
Youngki Park, Sungchan Park, Sang-goo Lee, and Woosung Jung. 2013. Scalable k-nearest neighbor graph construction based on greedy filtering. In Proc. 22nd Int. Conf. on the World Wide Web (WWW). ACM, 227--228.
[16]
Miloš Radovanović, Alexandros Nanopoulos, and Mirjana Ivanović. 2010. Hubs in space: Popular nearest neighbors in high-dimensional data. Journal of Machine Learning Research 11 (2010), 2487--2531.
[17]
Miloš Radovanović, Alexandros Nanopoulos, and Mirjana Ivanović. 2010. On the existence of obstinate results in vector space models. In Proc. 33rd Int. ACM SIGIR Conf. on Research and Development in Information Retrieval. ACM, 186--193.
[18]
Miloš Radovanović, Alexandros Nanopoulos, and Mirjana Ivanović. 2010. Time-series classification in many intrinsic dimensions. In Proc. 2010 SIAM Int. Conf. on Data Mining (SDM). SIAM, 677--688.
[19]
Miloš Radovanović, Alexandros Nanopoulos, and Mirjana Ivanović. 2015. Reverse nearest neighbors in unsupervised distance-based outlier detection. IEEE Transactions on Knowledge and Data Engineering 27, 5 (2015), 1369--1382.
[20]
Sam T. Roweis and Lawrence K. Saul. 2000. Nonlinear dimensionality reduction by locally linear embedding. Science 290, 5500 (2000), 2323--2326.
[21]
Jagan Sankaranarayanan, Hanan Samet, and Amitabh Varshney. 2007. A fast all nearest neighbor algorithm for applications involving large point-clouds. Computers & Graphics 31, 2 (2007), 157--174.
[22]
Lawrence K. Saul and Sam T. Roweis. 2003. Think globally, fit locally: Unsupervised learning of low dimensional manifolds. Journal of Machine Learning Research 4 (2003), 119--155.
[23]
Nenad Tomašev, Miloš Radovanović, Dunja Mladenić, and Mirjana Ivanović. 2011. The role of hubness in clustering high-dimensional data. In Proc. 15th Pacific-Asia Conf. on Knowledge Discovery and Data Mining (PAKDD). Springer, 183--195.

Information

Published In

ACM Other conferences
WIMS '18: Proceedings of the 8th International Conference on Web Intelligence, Mining and Semantics
June 2018
398 pages
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. NN-Descent
  2. hubness
  3. k-nearest neighbor graph

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

WIMS '18

Acceptance Rates

Overall Acceptance Rate 140 of 278 submissions, 50%

Article Metrics

  • Downloads (last 12 months): 387
  • Downloads (last 6 weeks): 55
Reflects downloads up to 16 Feb 2025.

Cited By

  • (2024) Top-Down Construction of Locally Monotonic Graphs for Similarity Search. Similarity Search and Applications, 291-300. DOI: 10.1007/978-3-031-75823-2_25. Online publication date: 25-Oct-2024.
  • (2023) Deep Clustering and Visualization for End-to-End High-Dimensional Data Analysis. IEEE Transactions on Neural Networks and Learning Systems 34:11, 8543-8554. DOI: 10.1109/TNNLS.2022.3151498. Online publication date: Nov-2023.
  • (2023) ESIREOS: Efficient, Scalable, Internal, Relative Evaluation of Outliers Solutions. 2023 IEEE 29th International Conference on Parallel and Distributed Systems (ICPADS), 555-562. DOI: 10.1109/ICPADS60453.2023.00088. Online publication date: 17-Dec-2023.
  • (2022) HVS. Proceedings of the VLDB Endowment 15:2, 246-258. DOI: 10.14778/3489496.3489506. Online publication date: 4-Feb-2022.
  • (2021) A comprehensive survey and experimental comparison of graph-based approximate nearest neighbor search. Proceedings of the VLDB Endowment 14:11, 1964-1978. DOI: 10.14778/3476249.3476255. Online publication date: 27-Oct-2021.
  • (2021) Warp-centric K-Nearest Neighbor Graphs construction on GPU. 50th International Conference on Parallel Processing Workshop, 1-10. DOI: 10.1145/3458744.3474053. Online publication date: 9-Aug-2021.
  • (2021) Cluster-and-Conquer: When Randomness Meets Graph Locality. 2021 IEEE 37th International Conference on Data Engineering (ICDE), 2027-2032. DOI: 10.1109/ICDE51399.2021.00195. Online publication date: Apr-2021.
  • (2020) Fast Distributed kNN Graph Construction Using Auto-tuned Locality-sensitive Hashing. ACM Transactions on Intelligent Systems and Technology 11:6, 1-18. DOI: 10.1145/3408889. Online publication date: 12-Oct-2020.
  • (2020) Benchmark on Indexing Algorithms for Accelerating Molecular Similarity Search. Journal of Chemical Information and Modeling 60:12, 6167-6184. DOI: 10.1021/acs.jcim.0c00393. Online publication date: 23-Oct-2020.
  • (2018) Hubs in Nearest-Neighbor Graphs. Proceedings of the 8th International Conference on Web Intelligence, Mining and Semantics, 1-4. DOI: 10.1145/3227609.3227691. Online publication date: 25-Jun-2018.