research-article

Efficient exact K-nearest neighbor graph construction for billion-scale datasets using GPUs with tensor cores

Authors:

Cho-Li Wang

ICS '22: Proceedings of the 36th ACM International Conference on Supercomputing

Article No.: 10, Pages 1 - 12

https://doi.org/10.1145/3524059.3532368

Published: 28 June 2022

Abstract

Approximate nearest neighbor search plays a fundamental role in many areas, and the k-nearest neighbor graph (KNNG) has become a promising solution, especially in high-dimensional spaces. The advantages of KNNG come at the expense of a high construction cost: building the graph takes time quadratic in the number of points. Many GPUs now include specialized hardware units for matrix multiplication, which provide even higher arithmetic throughput. This paper presents flyKNNG, a GPU KNNG construction algorithm for billion-scale datasets. It offloads the distance matrix calculation to the matrix multiplication units and performs top-k selection on the fly, which avoids transferring the exa-scale distance matrix to and from device memory. flyKNNG co-designs the two key algorithms to optimize overall performance: the distance matrix calculation accounts for data communication costs and for the pruning strategy of top-k selection, while the top-k selection is designed specifically for on-the-fly use so that it affects the data reuse and instruction-level parallelism of the distance matrix calculation as little as possible. Moreover, our top-k selection algorithm is optimized for the special data layout adopted by most matrix multiplication units. Experiments show that flyKNNG achieves a 4.67X speedup over CUML/FAISS, one of the state-of-the-art approaches.
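The two ideas the abstract combines, computing distances as a matrix product and selecting the top-k on the fly instead of storing the full distance matrix (for n = 10^9 points that matrix has 10^18 entries), can be illustrated with a small pure-Python sketch. It relies on the standard identity ||x - y||^2 = ||x||^2 + ||y||^2 - 2<x, y>, so each tile of squared distances comes from one tile of inner products, which is the part a GEMM on tensor cores would compute. This is not flyKNNG's implementation: the function names, the tile size, and the per-point binary max-heap used for top-k selection are illustrative assumptions, and the naive `dot_tile` loop only stands in for a tensor-core GEMM.

```python
import heapq
from typing import List, Sequence

Vector = Sequence[float]


def squared_norms(points: Sequence[Vector]) -> List[float]:
    """||x||^2 for every point; computed once and reused for every tile."""
    return [sum(v * v for v in p) for p in points]


def dot_tile(rows: Sequence[Vector], cols: Sequence[Vector]) -> List[List[float]]:
    """Inner products <rows[i], cols[j]>: a stand-in for a tensor-core GEMM tile."""
    return [[sum(a * b for a, b in zip(r, c)) for c in cols] for r in rows]


def knng_on_the_fly(points: Sequence[Vector], k: int, tile: int = 64) -> List[List[int]]:
    """Exact k-NN graph (hypothetical helper); distance tiles are consumed as produced."""
    n = len(points)
    if k <= 0:
        return [[] for _ in range(n)]
    norms = squared_norms(points)
    # One bounded max-heap per point, stored as (-distance, index) pairs,
    # so heap[0] always holds the current k-th (worst kept) neighbor.
    heaps: List[list] = [[] for _ in range(n)]

    for q0 in range(0, n, tile):
        queries = points[q0:q0 + tile]
        for c0 in range(0, n, tile):
            gram = dot_tile(queries, points[c0:c0 + tile])
            for i, row in enumerate(gram):
                qi = q0 + i
                heap = heaps[qi]
                for j, g in enumerate(row):
                    cj = c0 + j
                    if cj == qi:
                        continue  # a point is not its own neighbor
                    # ||x - y||^2 = ||x||^2 + ||y||^2 - 2<x, y>; clamp rounding noise
                    d = max(0.0, norms[qi] + norms[cj] - 2.0 * g)
                    if len(heap) < k:
                        heapq.heappush(heap, (-d, cj))
                    elif d < -heap[0][0]:
                        # Pruning: only candidates closer than the current k-th survive.
                        heapq.heapreplace(heap, (-d, cj))

    # Emit each neighbor list in ascending order of distance.
    return [[j for _, j in sorted(heap, key=lambda t: (-t[0], t[1]))] for heap in heaps]
```

For example, `knng_on_the_fly([[0.0], [1.0], [3.0], [7.0]], k=2, tile=2)` returns `[[1, 2], [0, 2], [1, 0], [2, 1]]`. Only one tile of inner products is alive at a time, so memory grows with tile^2 + n*k rather than n^2. The `d < -heap[0][0]` check mirrors the pruning idea in the abstract: once a point's heap is full, any candidate that is not closer than its current k-th neighbor is discarded without touching the heap.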

Index Terms

Efficient exact K-nearest neighbor graph construction for billion-scale datasets using GPUs with tensor cores
1. Computing methodologies
  1. Parallel computing methodologies
    1. Parallel algorithms
      1. Massively parallel algorithms
2. Information systems
  1. Information retrieval
    1. Retrieval models and ranking
      1. Top-k retrieval in databases

Information

Published In

ICS '22: Proceedings of the 36th ACM International Conference on Supercomputing

June 2022

514 pages

ISBN:9781450392815

DOI:10.1145/3524059

General Chairs:
Lawrence Rauchwerger, University of Illinois at Urbana-Champaign
Kirk Cameron, Virginia Tech

Program Chairs:
Dimitrios S. Nikolopoulos, Virginia Tech
Dionisios Pnevmatikatos, National Technical University of Athens

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 28 June 2022

Qualifiers

Research-article

Funding Sources

University Grants Committee

Conference

ICS '22

Sponsor:

SIGARCH

ICS '22: 2022 International Conference on Supercomputing

June 28 - 30, 2022

Virtual Event

Acceptance Rates

Overall Acceptance Rate 629 of 2,180 submissions, 29%

Bibliometrics & Citations

Article Metrics

Total Citations: 6
Total Downloads: 469
Downloads (Last 12 months): 150
Downloads (Last 6 weeks): 7

Reflects downloads up to 19 Feb 2025

Citations

Cited By

Cui C (2024). Acceleration of Tensor-Product Operations with Tensor Cores. ACM Transactions on Parallel Computing 11:4, 1-24. DOI: 10.1145/3695466. Online publication date: 9-Sep-2024.
https://dl.acm.org/doi/10.1145/3695466

Schieffer G, De Medeiros D, Faj J, Marathe A, Peng I (2024). On the Rise of AMD Matrix Cores: Performance, Power Efficiency, and Programmability. 2024 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 132-143. DOI: 10.1109/ISPASS61541.2024.00022. Online publication date: 5-May-2024.
https://doi.org/10.1109/ISPASS61541.2024.00022

Valero-Lara P, Jorquera I, Lui F, Vetter J (2023). Mixed-Precision S/DGEMM Using the TF32 and TF64 Frameworks on Low-Precision AI Tensor Cores. Proceedings of the SC '23 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis, 179-186. DOI: 10.1145/3624062.3624084. Online publication date: 12-Nov-2023.
https://dl.acm.org/doi/10.1145/3624062.3624084

Lu Y, Liu W, Mohror K, Arnold D, Badia R (2023). DASP: Specific Dense Matrix Multiply-Accumulate Units Accelerated General Sparse Matrix-Vector Multiplication. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 1-14. DOI: 10.1145/3581784.3607051. Online publication date: 12-Nov-2023.
https://dl.acm.org/doi/10.1145/3581784.3607051

Tangprasert S, Sonthana R, Nilsiam Y, Nuchitprasitchai S, Bhumpenpein N (2023). Data-Driven Heart Disease Risk Prediction with Machine Learning on Healthcare Datasets. 2023 Research, Invention, and Innovation Congress: Innovative Electricals and Electronics (RI2C), 220-223. DOI: 10.1109/RI2C60382.2023.10355977. Online publication date: 24-Aug-2023.
https://doi.org/10.1109/RI2C60382.2023.10355977

Kindelan R, Frías J, Cerda M, Hitschfeld N (2023). A topological data analysis based classifier. Advances in Data Analysis and Classification 18:2, 493-538. DOI: 10.1007/s11634-023-00548-4. Online publication date: 1-Jul-2023.
https://dl.acm.org/doi/10.1007/s11634-023-00548-4
