
Privacy-preserving SVM on Outsourced Genomic Data via Secure Multi-party Computation

Authors: Huajie Chen, Ali Burak Ünal, Mete Akgün, Nico Pfeifer

IWSPA '20: Proceedings of the Sixth International Workshop on Security and Privacy Analytics

Pages 61 - 69

https://doi.org/10.1145/3375708.3380316

Published: 16 March 2020

Abstract

Machine learning methods are used in many areas, such as medical data research, for their efficient and powerful data mining capabilities. However, submitting unprotected data to a third party that trains a machine learning model risks data leakage and privacy violations if the third party is compromised by an adversary. A protocol that performs the computation in encrypted form is therefore necessary. To address this problem, we propose protocols based on secure multi-party computation to train a support vector machine model privately. Relying on the semi-honest adversary model and oblivious transfer, the proposed protocols enable the training of a non-linear support vector machine on the combined data from various sources without sacrificing the privacy of individuals. We apply the protocols to train a support vector machine with the radial basis function kernel on HIV sequence data to predict the efficacy of a certain antiviral drug, which works only if the viruses can use only the human CCR5 coreceptor for cell entry. Benchmarked on synthesized data from 10 data sources, each containing 100 labeled samples of randomly generated integers, the protocol took 2991.386 ms and 166.912 ms of online time on average using arithmetic and boolean circuits, respectively. Cross-validation on the training data reached an average F1-score of 0.5819 with the optimized parameters, which subsequently reached an F1-score of 0.7058 on the test data set, which consists of protein sequences of CCR5 and its subtypes. The complete training and testing process on the real data, which contains 766 samples in total with 924 features after encoding, took 43.75 and 15.84 seconds on average using arithmetic and boolean circuits, respectively, which shows the effectiveness and efficiency of our protocols compared to some of the existing studies in the literature.
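The following minimal sketch illustrates two ideas the abstract relies on; it is not the authors' protocol. It assumes additive secret sharing over a ring of size 2^64 (the modulus, the function names, and the two-party setting are all illustrative choices), and it assumes the dot products are already available in shared form, whereas a real protocol would need a secure multiplication step, for example one built on oblivious transfer. It shows that shared values can be added locally without revealing them, and that the radial basis function kernel can be computed from dot products via ||x - y||^2 = x.x + y.y - 2 x.y.

```python
import math
import secrets

MODULUS = 2 ** 64  # ring size for the shares; an assumption of this sketch


def share(value, n_parties=2):
    """Split an integer into additive shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def reconstruct(shares):
    """Recombine the shares and map the result back to a signed integer."""
    total = sum(shares) % MODULUS
    return total - MODULUS if total >= MODULUS // 2 else total


def add_shared(a_shares, b_shares):
    """Each party adds its own two shares locally; no communication needed."""
    return [(a + b) % MODULUS for a, b in zip(a_shares, b_shares)]


def rbf_from_dot_products(xx, yy, xy, gamma):
    """RBF kernel value computed from x.x, y.y and x.y."""
    return math.exp(-gamma * (xx + yy - 2 * xy))


# Hypothetical integer feature vectors held by two different data sources.
x = [1, 2, 3]
y = [2, 0, 1]
xx = sum(a * a for a in x)
yy = sum(b * b for b in y)
xy = sum(a * b for a, b in zip(x, y))

# The squared distance is assembled from shared terms using only local additions.
sq_dist_shares = add_shared(add_shared(share(xx), share(yy)), share(-2 * xy))
sq_dist = reconstruct(sq_dist_shares)
print(sq_dist, math.exp(-0.1 * sq_dist))
```

Only the final squared distance is reconstructed here; each individual share is a uniformly random ring element and reveals nothing about the underlying value on its own.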

