Article

Privacy Preserving Data Mining

Authors:

Yehuda Lindell and Benny Pinkas

CRYPTO '00: Proceedings of the 20th Annual International Cryptology Conference on Advances in Cryptology

August 2000

Pages 36 - 54

Published: 20 August 2000

Abstract

In this paper we introduce the concept of privacy preserving data mining. In our model, two parties owning confidential databases wish to run a data mining algorithm on the union of their databases, without revealing any unnecessary information. This problem has many practical and important applications, such as in medical research with confidential patient records.

Data mining algorithms are usually complex, especially as the size of the input is measured in megabytes, if not gigabytes. A generic secure multi-party computation solution, based on evaluation of a circuit computing the algorithm on the entire input, is therefore of no practical use. We focus on the problem of decision tree learning and use ID3, a popular and widely used algorithm for this problem. We present a solution that is considerably more efficient than generic solutions. It demands very few rounds of communication and reasonable bandwidth. In our solution, each party performs by itself a computation of the same order as computing the ID3 algorithm for its own database. The results are then combined using efficient cryptographic protocols, whose overhead is only logarithmic in the number of transactions in the databases. We feel that our result is a substantial contribution, demonstrating that secure multi-party computation can be made practical, even for complex problems and large inputs.
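To make the setting concrete, the sketch below shows the quantity ID3 needs at each node, the information gain of every candidate attribute, computed over the union of two horizontally partitioned databases. It is a plain, non-private illustration and not the protocol from the paper: here the per-party counts are simply added in the clear, whereas the abstract describes combining local results with cryptographic protocols so that the counts themselves are not revealed. The function names, the dictionary-per-row data layout, and the toy attributes are assumptions made for this example.

```python
from collections import Counter
from math import log2


def local_counts(rows, attr, label):
    """Count (attribute value, class) pairs in one party's own database."""
    return Counter((row[attr], row[label]) for row in rows)


def entropy(class_counts):
    """Shannon entropy (in bits) of a class-count table."""
    total = sum(class_counts.values())
    return -sum(c / total * log2(c / total) for c in class_counts.values() if c)


def information_gain(joint):
    """Information gain of one attribute.

    joint maps (attribute value, class) -> count over the union of the databases.
    """
    class_totals = Counter()
    by_value = {}
    for (value, cls), n in joint.items():
        class_totals[cls] += n
        by_value.setdefault(value, Counter())[cls] += n
    total = sum(class_totals.values())
    # Expected entropy after splitting on the attribute.
    remainder = sum(sum(c.values()) / total * entropy(c) for c in by_value.values())
    return entropy(class_totals) - remainder


def best_attribute(db1, db2, attrs, label):
    """Pick the attribute with the highest gain on the union of db1 and db2.

    NOTE: the counts are added in the clear here. This is the step a
    privacy-preserving protocol would replace with a secure computation.
    """
    gains = {}
    for attr in attrs:
        joint = local_counts(db1, attr, label) + local_counts(db2, attr, label)
        gains[attr] = information_gain(joint)
    return max(gains, key=gains.get), gains


# Toy data (hypothetical): each party holds two records.
db1 = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "rain", "windy": "no", "play": "yes"},
]
db2 = [
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rain", "windy": "yes", "play": "yes"},
]

best, gains = best_attribute(db1, db2, ["outlook", "windy"], "play")
print(best, gains)
```

The point of the sketch is that each party's work is a single pass over its own records, which matches the abstract's claim that the local computation is of the same order as running ID3 on one's own database. Only the combination step involves both parties.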

Index Terms

Privacy Preserving Data Mining
1. Information systems
  1. Information systems applications
    1. Data mining

Index terms have been assigned to the content through auto-classification.

Published In

CRYPTO '00: Proceedings of the 20th Annual International Cryptology Conference on Advances in Cryptology

August 2000, 544 pages

ISBN: 3540679073

Editor: Mihir Bellare

Publisher: Springer-Verlag, Berlin, Heidelberg

