research-article
Open access

FedKNN: Secure Federated k-Nearest Neighbor Search

Published: 26 March 2024

Abstract

Nearest neighbor search is a fundamental task in various domains, such as federated learning, data mining, information retrieval, and biomedicine. With the increasing need to utilize data from different organizations while respecting privacy regulations, private data federation has emerged as a promising solution. However, it is costly to directly apply existing approaches to federated k-nearest neighbor (kNN) search with difficult-to-compute distance functions, like graph or sequence similarity. To address this challenge, we propose FedKNN, a system that supports secure federated kNN search queries with a wide range of similarity measurements. Our system is equipped with a new Distribution-Aware kNN (DANN) algorithm to minimize unnecessary local computations while protecting data privacy. We further develop DANN*, a secure version of DANN that satisfies differential obliviousness. Extensive evaluations show that FedKNN outperforms state-of-the-art solutions, achieving up to 4.8× improvement on federated graph kNN search and up to 2.7× improvement on federated sequence kNN search. Additionally, our approach offers a trade-off between privacy and efficiency, providing strong privacy guarantees with minimal overhead.
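
To make the cost problem concrete, the following is a minimal sketch of a naive federated kNN baseline, not the DANN or DANN* algorithm, whose details are not given in this abstract. It assumes edit distance as the sequence similarity measure and an in-the-clear coordinator with no security protections; the names `edit_distance`, `local_top_k`, and `federated_knn` are hypothetical. Every party computes the distance from the query to all of its local records, which is the kind of local computation the abstract describes as costly for hard-to-compute distance functions.

```python
import heapq
from typing import Callable, List, Sequence, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def local_top_k(query: str, data: Sequence[str], k: int,
                dist: Callable[[str, str], int]) -> List[Tuple[int, str]]:
    """Each party scores ALL of its records and keeps its k closest."""
    return heapq.nsmallest(k, ((dist(query, x), x) for x in data))


def federated_knn(query: str, parties: Sequence[Sequence[str]], k: int,
                  dist: Callable[[str, str], int]) -> List[Tuple[int, str]]:
    """Hypothetical coordinator: merge per-party top-k lists into a global top-k.

    Assumption: results are exchanged in plaintext; a real system would
    protect them (e.g. inside a TEE or with secure computation).
    """
    candidates: List[Tuple[int, str]] = []
    for data in parties:
        candidates.extend(local_top_k(query, data, k, dist))
    return heapq.nsmallest(k, candidates)


if __name__ == "__main__":
    parties = [["kitten", "sitting"], ["mitten", "bitten", "written"]]
    print(federated_knn("kitten", parties, k=2, dist=edit_distance))
```

Merging per-party top-k lists is exact because any record in the global top-k must also be in its owner's local top-k. The baseline performs one distance computation per record in the federation, regardless of how the data is distributed across parties.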

Published In

Proceedings of the ACM on Management of Data  Volume 2, Issue 1
SIGMOD
February 2024
1874 pages
EISSN:2836-6573
DOI:10.1145/3654807
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. differential obliviousness
  2. federated analytics
  3. kNN search
  4. trusted execution environment (TEE)

Qualifiers

  • Research-article

Funding Sources

  • NSF of Guangdong Province
  • NSF of China
  • Hong Kong Research Grants Council
