Bridging Software-Hardware for CXL Memory Disaggregation in Billion-Scale Nearest Neighbor Search
Abstract
1 Introduction
2 Background
2.1 Approximate Nearest Neighbor Search
2.2 Towards Billion-scale ANNS
2.3 Compute Express Link for Memory Pool
3 A High-level Viewpoint of CXL-ANNS
3.1 Challenge Analysis of Billion-scale ANNS
3.2 Design Consideration and Motivation
3.3 Collaborative Approach Overview
4 Software Stack Design and Implementation
4.1 Local Caching for Graph
4.2 Data Placement on the CXL Memory Pool
5 Collaborative Query Service Acceleration
5.1 Accelerating Distance Calculation
5.2 Prefetching for CXL Memory Pool
5.3 Fine-Granular Query Scheduling
6 Evaluation
6.1 Evaluation Setup
Component | Configuration |
---|---|
CPU | 40 O3 cores, ARM v8, 3.6 GHz; L1/L2 cache: 64 KiB/2 MiB per core |
Local memory | 128 GiB, DDR4-3200 |
CXL memory pool | 1 CXL switch; 256 GiB/device, DDR4-3200 |
Storage | 4× Intel Optane 900P, 480 GB |
CXL-ANNS | 1 GHz; 5 ANNS PEs/device; 2 distance calculation units/PE |
Dataset | Dist. | Num. vecs. | Emb. dim. | Avg. num. neighbors | Candidate arr. size (k = 1) | Candidate arr. size (k = 5) | Candidate arr. size (k = 10) | Num. devices |
---|---|---|---|---|---|---|---|---|
BigANN | L2 | 1B | 128 | 31.6 | 30 | 75 | 150 | 4 |
Yandex-T | Ang. | 1B | 200 | 29.0 | 440 | 900 | 2500 | 4 |
Yandex-D | L2 | 1B | 96 | 66.9 | 300 | 700 | 1700 | 4 |
Meta-S | L2 | 1B | 256 | 190 | 1200 | 2800 | 5600 | 8 |
MS-T | L2 | 1B | 100 | 43.1 | 60 | 130 | 250 | 4 |
MS-S | L2 | 1B | 100 | 87.4 | 580 | 1000 | 2000 | 4 |
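The Dist. and Candidate arr. size columns parameterize the per-query traversal of a graph-based ANNS index: the distance function ranks vertices, and the candidate array bounds how many of the closest vertices seen so far are kept while walking the graph, which is why it grows with k. The Python sketch below illustrates that idea with a plain best-first (beam) search; it is not the paper's implementation. The function names, the adjacency-dict graph layout, and the use of the angle in radians (arccos of cosine similarity) for "Ang." are assumptions made for illustration.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def l2_sq(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance; gives the same ranking as true L2."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def angular(a: Vector, b: Vector) -> float:
    """Angle between non-zero vectors a and b in radians (assumed convention)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    cos = dot / (na * nb)
    return math.acos(max(-1.0, min(1.0, cos)))  # clamp floating-point error


def best_first_search(
    graph: Dict[int, List[int]],     # hypothetical layout: vertex id -> neighbor ids
    vectors: Dict[int, Vector],      # vertex id -> embedding
    query: Vector,
    entry: int,                      # entry vertex of the traversal
    k: int,
    cand_size: int,                  # bound on the candidate array (>= k)
    dist: Callable[[Vector, Vector], float] = l2_sq,
) -> List[int]:
    """Return the ids of the k closest vertices found by beam search."""
    if cand_size < k:
        raise ValueError("cand_size must be at least k")
    # Candidate array of (distance, id) pairs, kept sorted by distance.
    candidates: List[Tuple[float, int]] = [(dist(query, vectors[entry]), entry)]
    visited = {entry}    # vertices whose distance to the query was computed
    expanded = set()     # vertices whose neighbor lists were read
    while True:
        # Expand the closest candidate that has not been expanded yet.
        nxt = next((v for _, v in candidates if v not in expanded), None)
        if nxt is None:
            break
        expanded.add(nxt)
        for nb in graph[nxt]:
            if nb not in visited:
                visited.add(nb)
                candidates.append((dist(query, vectors[nb]), nb))
        candidates.sort()
        del candidates[cand_size:]   # keep only the cand_size closest
    return [v for _, v in candidates[:k]]
```

For example, on a five-vertex chain with vectors `[0.0]` to `[4.0]`, a query `[3.2]`, entry vertex 0, `k=2` and `cand_size=3`, the search returns `[3, 4]`. A larger `cand_size` explores more vertices (and issues more distance calculations and memory reads) in exchange for a better chance of finding the true neighbors.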
6.2 Overall Performance
Dataset | BigANN | Yandex-T | Yandex-D | Meta-S | MS-T | MS-S |
---|---|---|---|---|---|---|
Base | 3.0 | 66.0 | 55.7 | 1,121.2 | 6.0 | 107.2 |
CXL-ANNS | 0.3 | 7.4 | 5.3 | 34.2 | 0.6 | 8.6 |
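The table does not state a unit, but the per-dataset improvement factor Base / CXL-ANNS is unitless and can be computed directly from it. The Python sketch below does that arithmetic, assuming (as the values suggest, though the table does not say so) that lower is better, e.g. a latency-style metric. The dictionary and function names are illustrative.

```python
# Values transcribed from the Overall Performance table (unit not stated there).
BASE = {"BigANN": 3.0, "Yandex-T": 66.0, "Yandex-D": 55.7,
        "Meta-S": 1121.2, "MS-T": 6.0, "MS-S": 107.2}
CXL_ANNS = {"BigANN": 0.3, "Yandex-T": 7.4, "Yandex-D": 5.3,
            "Meta-S": 34.2, "MS-T": 0.6, "MS-S": 8.6}


def improvement(base: dict, new: dict) -> dict:
    """Per-dataset ratio base/new; > 1 means `new` is lower (assumed better)."""
    return {name: base[name] / new[name] for name in base}


if __name__ == "__main__":
    for name, ratio in improvement(BASE, CXL_ANNS).items():
        print(f"{name:9s} {ratio:5.1f}x")
```

Under this reading the factor ranges from about 8.9x (Yandex-T) to about 32.8x (Meta-S), with the largest gap on Meta-S, the dataset with the most neighbors per vertex and the largest candidate arrays in the dataset table.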
6.3 Collaborative Query Service Analysis
6.4 Scalability Test
6.5 Sensitivity Analysis
7 Discussion
8 Conclusion
Footnotes
References
Publisher
Association for Computing Machinery
New York, NY, United States
Qualifiers
- Research-article