Research article
DOI: 10.1145/3524059.3532368

Efficient exact K-nearest neighbor graph construction for billion-scale datasets using GPUs with tensor cores

Published: 28 June 2022

Abstract

Approximate nearest neighbor search plays a fundamental role in many areas, and the k-nearest neighbor graph (KNNG) has become a promising solution, especially in high-dimensional spaces. The advantages of KNNG come at the expense of a high construction time, which is quadratic in the number of points. Many GPUs have adopted specialized hardware units for matrix multiplication, which provide even higher arithmetic throughput. This paper presents flyKNNG, a GPU KNNG construction algorithm for billion-scale datasets. It maps the distance matrix calculation onto the matrix multiplication units and adopts on-the-fly top-k selection to avoid transferring the exascale distance matrix to and from device memory. flyKNNG co-designs the two key algorithms to optimize overall performance. The distance matrix calculation algorithm takes into account the data communication costs and the pruning strategy of top-k selection. The top-k selection algorithm is designed specifically for on-the-fly use, so that it affects the data reuse and instruction-level parallelism of the distance matrix calculation as little as possible. Moreover, the top-k selection algorithm is optimized for the special data layout adopted by most matrix multiplication units. Experiments show that flyKNNG achieves a 4.67× speedup over cuML/FAISS, one of the state-of-the-art approaches.
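
The two ideas the abstract combines, computing distances through inner products (the work a matrix multiplication unit does) and selecting the top k while each tile of the distance matrix is produced, can be illustrated with a small CPU sketch. This is not the flyKNNG implementation: the function names `dot` and `knn_graph_tiled`, the `tile` parameter, and the use of a bounded heap for selection are assumptions made purely for illustration, and plain Python stands in for the GPU kernels. The sketch relies on the identity ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x·y, so the only pairwise work is an inner product, and it never stores more than k candidates per point.

```python
import heapq
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product; on a GPU this is the part a matrix unit would compute."""
    return sum(x * y for x, y in zip(a, b))


def knn_graph_tiled(points: List[Vector], k: int, tile: int = 2) -> List[List[Tuple[float, int]]]:
    """Exact k-NN graph by brute force, visiting the distance matrix tile by tile.

    Illustrative sketch only (hypothetical helper, not the paper's algorithm).
    Each row keeps a heap of at most k candidates, so the full n-by-n
    distance matrix is never materialised.
    """
    n = len(points)
    norms = [dot(p, p) for p in points]  # squared norms, computed once
    # heaps[i] stores (-distance, j); heap[0] is the current worst neighbor.
    heaps: List[List[Tuple[float, int]]] = [[] for _ in range(n)]

    for r0 in range(0, n, tile):          # row tile
        for c0 in range(0, n, tile):      # column tile
            for i in range(r0, min(r0 + tile, n)):
                heap = heaps[i]
                for j in range(c0, min(c0 + tile, n)):
                    if i == j:
                        continue  # a point is not its own neighbor
                    # Squared Euclidean distance from the inner product.
                    d = norms[i] + norms[j] - 2.0 * dot(points[i], points[j])
                    if len(heap) < k:
                        heapq.heappush(heap, (-d, j))
                    elif d < -heap[0][0]:
                        # Pruning: only candidates closer than the current
                        # k-th best ever touch the heap.
                        heapq.heapreplace(heap, (-d, j))

    # Return (squared distance, neighbor index) pairs, nearest first.
    return [sorted((-neg_d, j) for neg_d, j in heap) for heap in heaps]


if __name__ == "__main__":
    pts = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
    for i, row in enumerate(knn_graph_tiled(pts, k=2)):
        print(i, row)
```

The threshold test against the current k-th best distance is the simplest form of the pruning the abstract mentions: most entries of a tile are discarded with a single comparison, which is what makes it plausible to fuse selection into the distance computation instead of writing the tile out first.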

Published In

ICS '22: Proceedings of the 36th ACM International Conference on Supercomputing
June 2022
514 pages
ISBN: 9781450392815
DOI: 10.1145/3524059

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. KNNG construction
  2. parallel computing
  3. top-k selection

Qualifiers

  • Research-article

Conference

ICS '22

Acceptance Rates

Overall Acceptance Rate 629 of 2,180 submissions, 29%

Cited By

  • (2024) Acceleration of Tensor-Product Operations with Tensor Cores. ACM Transactions on Parallel Computing 11:4 (1-24). DOI: 10.1145/3695466. Online publication date: 9-Sep-2024
  • (2024) On the Rise of AMD Matrix Cores: Performance, Power Efficiency, and Programmability. 2024 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) (132-143). DOI: 10.1109/ISPASS61541.2024.00022. Online publication date: 5-May-2024
  • (2023) Mixed-Precision S/DGEMM Using the TF32 and TF64 Frameworks on Low-Precision AI Tensor Cores. Proceedings of the SC '23 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis (179-186). DOI: 10.1145/3624062.3624084. Online publication date: 12-Nov-2023
  • (2023) DASP: Specific Dense Matrix Multiply-Accumulate Units Accelerated General Sparse Matrix-Vector Multiplication. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (1-14). DOI: 10.1145/3581784.3607051. Online publication date: 12-Nov-2023
  • (2023) Data-Driven Heart Disease Risk Prediction with Machine Learning on Healthcare Datasets. 2023 Research, Invention, and Innovation Congress: Innovative Electricals and Electronics (RI2C) (220-223). DOI: 10.1109/RI2C60382.2023.10355977. Online publication date: 24-Aug-2023
  • (2023) A topological data analysis based classifier. Advances in Data Analysis and Classification 18:2 (493-538). DOI: 10.1007/s11634-023-00548-4. Online publication date: 1-Jul-2023
