Secure and Effective Data Appraisal for Machine Learning

Ouyang, Xu; Yang, Changhong; Lin, Felix Xiaozhu; Ji, Yangfeng

arXiv:2310.02373 (cs)

[Submitted on 3 Oct 2023 (v1), last revised 24 Jan 2024 (this version, v3)]

Abstract: Essential for an unfettered data market is the ability to discreetly select and evaluate training data before finalizing a transaction between the data owner and model owner. To safeguard the privacy of both data and model, this process involves scrutinizing the target model through Multi-Party Computation (MPC). While prior research has posited that the MPC-based evaluation of Transformer models is excessively resource-intensive, this paper introduces an innovative approach that renders data selection practical. The contributions of this study encompass three pivotal elements: (1) a groundbreaking pipeline for confidential data selection using MPC, (2) replicating intricate high-dimensional operations with simplified low-dimensional MLPs trained on a limited subset of pertinent data, and (3) implementing MPC in a concurrent, multi-phase manner. The proposed method is assessed across an array of Transformer models and NLP/CV benchmarks. In comparison to the direct MPC-based evaluation of the target model, our approach substantially reduces the time required, from thousands of hours to mere tens of hours, with only a nominal 0.20% dip in accuracy when training with the selected data.
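
For readers unfamiliar with MPC, the following is a minimal sketch of two-party additive secret sharing, one textbook building block of many MPC protocols. It is not taken from the paper and is not the authors' pipeline: the abstract does not name a specific protocol, and the ring size `MODULUS` and the helper names `share`, `reconstruct`, `add_shared`, and `scale_shared` are assumptions chosen for illustration. The sketch shows that additions and multiplications by a public constant can be computed locally on shares, which is part of why linear operations tend to be cheap under MPC while nonlinear ones generally need extra interaction.

```python
import secrets
from typing import Tuple

# Illustrative ring size (an assumption; real frameworks pick their own).
MODULUS = 2**64

Shares = Tuple[int, int]  # (share held by party A, share held by party B)


def share(value: int) -> Shares:
    """Split `value` into two additive shares that sum to it mod MODULUS."""
    share_a = secrets.randbelow(MODULUS)   # uniformly random mask
    share_b = (value - share_a) % MODULUS  # the remainder completes the sum
    return share_a, share_b


def reconstruct(share_a: int, share_b: int) -> int:
    """Recombine both shares; neither share alone reveals the value."""
    return (share_a + share_b) % MODULUS


def add_shared(x: Shares, y: Shares) -> Shares:
    """Add two shared values: each party adds its own shares locally."""
    return (x[0] + y[0]) % MODULUS, (x[1] + y[1]) % MODULUS


def scale_shared(x: Shares, public_k: int) -> Shares:
    """Multiply a shared value by a public constant, again locally."""
    return (x[0] * public_k) % MODULUS, (x[1] * public_k) % MODULUS


if __name__ == "__main__":
    x = share(20)
    y = share(22)
    print(reconstruct(*add_shared(x, y)))    # 42
    print(reconstruct(*scale_shared(x, 3)))  # 60
```

Multiplying two shared values, or evaluating a nonlinearity on a shared value, cannot be done this way and typically requires communication between the parties.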

Subjects:	Machine Learning (cs.LG); Cryptography and Security (cs.CR)
Cite as:	arXiv:2310.02373 [cs.LG]
	(or arXiv:2310.02373v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2310.02373

Submission history

From: Xu Ouyang
[v1] Tue, 3 Oct 2023 18:52:57 UTC (23,860 KB)
[v2] Thu, 5 Oct 2023 23:00:16 UTC (23,859 KB)
[v3] Wed, 24 Jan 2024 22:02:53 UTC (23,328 KB)
