DOI: 10.1145/3292500.3330920
research-article

PrivPy: General and Scalable Privacy-Preserving Data Mining

Published: 25 July 2019

Abstract

Privacy is a big hurdle for collaborative data mining across multiple parties. We present PrivPy, a multi-party computation (MPC) framework designed for large-scale data mining tasks. PrivPy combines an easy-to-use and highly flexible Python programming interface with a state-of-the-art secret-sharing-based MPC backend. With essential data types and operations (such as NumPy arrays and broadcasting), as well as automatic code rewriting, programmers can write modern data mining algorithms conveniently in familiar Python. We demonstrate that PrivPy supports many real-world machine learning algorithms (e.g., logistic regression and convolutional neural networks) and large datasets (e.g., a 5000-by-1-million matrix) with minimal algorithm porting effort.
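
To make the phrase "secret-sharing-based MPC" concrete, the following is a minimal sketch of generic additive secret sharing over fixed-point numbers. It is not PrivPy's API or its actual protocol: the ring size `MODULUS`, the scaling factor `SCALE`, the three-party default, and every function name here are assumptions chosen only for illustration.

```python
import secrets
from typing import List

MODULUS = 2 ** 64   # ring size (assumed for illustration)
SCALE = 2 ** 16     # fixed-point scaling factor (assumed)


def encode(x: float) -> int:
    """Map a real number to a fixed-point element of the ring."""
    return round(x * SCALE) % MODULUS


def decode(v: int) -> float:
    """Map a ring element back to a real number (upper half is negative)."""
    if v >= MODULUS // 2:
        v -= MODULUS
    return v / SCALE


def share(x: float, n_parties: int = 3) -> List[int]:
    """Split x into n additive shares that sum to encode(x) mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (encode(x) - sum(shares)) % MODULUS
    return shares + [last]


def reconstruct(shares: List[int]) -> float:
    """Recombine all shares; any strict subset looks uniformly random."""
    return decode(sum(shares) % MODULUS)


def add_shared(a: List[int], b: List[int]) -> List[int]:
    """Each party adds its own two shares locally, with no communication."""
    return [(x + y) % MODULUS for x, y in zip(a, b)]


if __name__ == "__main__":
    a = share(1.5)
    b = share(-2.25)
    print(reconstruct(add_shared(a, b)))  # prints -0.75
```

Addition is the easy case because it needs no interaction between parties. Multiplication and comparison on shares require communication, and that cost is what an MPC backend has to optimize.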

Supplementary Material

MP4 File (rt1736o.mp4)
Supplemental video

Published In

KDD '19: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
July 2019
3305 pages
ISBN:9781450362016
DOI:10.1145/3292500
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. data mining
  2. numpy
  3. privacy-preserving
  4. python

Qualifiers

  • Research-article

Conference

KDD '19

Acceptance Rates

KDD '19 Paper Acceptance Rate 110 of 1,200 submissions, 9%;
Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Article Metrics

  • Downloads (last 12 months): 55
  • Downloads (last 6 weeks): 1
Reflects downloads up to 15 Oct 2024

Cited By

  • (2024) Privacy-Preserving Queries Using Multisource Private Data Counting on Real Numbers in IoT. IEEE Internet of Things Journal 11:7 (11353-11367). DOI: 10.1109/JIOT.2023.3329660. Online publication date: 1-Apr-2024
  • (2024) Privacy-preserving eigenvector computation with applications in spectral clustering. International Journal of Information Technology. DOI: 10.1007/s41870-024-01815-z. Online publication date: 5-Apr-2024
  • (2023) FlexBNN: Fast Private Binary Neural Network Inference With Flexible Bit-Width. IEEE Transactions on Information Forensics and Security 18 (2382-2397). DOI: 10.1109/TIFS.2023.3265342. Online publication date: 2023
  • (2023) Peer-to-peer privacy-preserving vertical federated learning without trusted third-party coordinator. Peer-to-Peer Networking and Applications 16:5 (2242-2255). DOI: 10.1007/s12083-023-01512-x. Online publication date: 12-Jul-2023
  • (2023) Privacy-preserving multi-party PCA computation on horizontally and vertically partitioned data based on outsourced QR decomposition. The Journal of Supercomputing 79:13 (14358-14387). DOI: 10.1007/s11227-023-05206-2. Online publication date: 6-Apr-2023
  • (2023) Force: Highly Efficient Four-Party Privacy-Preserving Machine Learning on GPU. Secure IT Systems (330-349). DOI: 10.1007/978-3-031-47748-5_18. Online publication date: 8-Nov-2023
  • (2023) Replicated Additive Secret Sharing with the Optimized Number of Shares. Security and Privacy in Communication Networks (371-389). DOI: 10.1007/978-3-031-25538-0_20. Online publication date: 4-Feb-2023
  • (2022) Big Data Mining and Analytics With MapReduce. Encyclopedia of Data Science and Machine Learning (156-172). DOI: 10.4018/978-1-7998-9220-5.ch010. Online publication date: 14-Oct-2022
  • (2022) Big Data Analytics and Mining for Knowledge Discovery. Research Anthology on Big Data Analytics, Architectures, and Applications (708-721). DOI: 10.4018/978-1-6684-3662-2.ch033. Online publication date: 2022
  • (2022) NFGen. Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security (995-1008). DOI: 10.1145/3548606.3560565. Online publication date: 7-Nov-2022
