Share a pie?: Privacy-Preserving Knowledge Base Export through Count-min Sketches

Authors:

Leonardo Aniello,

Roberto Baldoni

CODASPY '17: Proceedings of the Seventh ACM on Conference on Data and Application Security and Privacy

Pages 95 - 106

https://doi.org/10.1145/3029806.3029817

Published: 22 March 2017

Abstract

Knowledge base (KB) sharing among parties has proven beneficial in several scenarios. However, such sharing can raise considerable privacy concerns, depending on the sensitivity of the information stored in each party's KB. In this paper, we focus on the problem of exporting (part of) a party's KB to a receiving party. We introduce a novel solution that enables parties to export data in a privacy-preserving fashion, based on a probabilistic data structure, namely the count-min sketch. With this data structure, KBs can be exported in the form of key-value stores and inserted into a set of count-min sketches, where keys can be sensitive and values are counters. Count-min sketches can be tuned to achieve a given key collision probability, which enables a party to deny having certain keys in its own KB, and thus to preserve its privacy. We also introduce a metric, γ-deniability (novel for count-min sketches), to measure the privacy level obtainable with a count-min sketch. Furthermore, since the value associated with a key can expose a party to linkage attacks, noise can be added to a count-min sketch to ensure a controlled error on retrieved values. Key collisions and noise alter the values contained in the exported KB and can negatively affect the accuracy of a computation performed on the exported KB. We explore the tradeoff between privacy preservation and computation accuracy through experimental evaluations in two scenarios related to malware detection.
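
The data structure named in the abstract can be illustrated with a minimal sketch of a textbook count-min sketch holding key-counter pairs. This is not the authors' implementation: the sizing rule (width = ⌈e/ε⌉, depth = ⌈ln(1/δ)⌉) is the standard one for count-min sketches, while the salted BLAKE2b hashing, the Laplace noise model, the `noisy_copy` helper, and the sample keys are all assumptions made for illustration. The γ-deniability metric is not reproduced here because the abstract does not define it.

```python
import copy
import hashlib
import math
import random


class CountMinSketch:
    """Textbook count-min sketch: `depth` rows of `width` counters each."""

    def __init__(self, epsilon: float, delta: float, seed: int = 0):
        # Standard sizing: a smaller epsilon gives a wider table (fewer collisions),
        # a smaller delta gives more rows (lower failure probability).
        self.width = math.ceil(math.e / epsilon)
        self.depth = math.ceil(math.log(1.0 / delta))
        self.seed = seed
        self.table = [[0] * self.width for _ in range(self.depth)]

    def _index(self, row: int, key: str) -> int:
        # Illustrative hashing only (an assumption): one salted digest per row.
        data = f"{self.seed}:{row}:{key}".encode()
        digest = hashlib.blake2b(data, digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.width

    def add(self, key: str, count: int = 1) -> None:
        # Increment one counter per row; colliding keys share counters.
        for row in range(self.depth):
            self.table[row][self._index(row, key)] += count

    def query(self, key: str):
        # The minimum over rows never underestimates the true count
        # as long as all inserted counts are non-negative.
        return min(self.table[row][self._index(row, key)]
                   for row in range(self.depth))


def noisy_copy(sketch: CountMinSketch, scale: float, rng: random.Random) -> CountMinSketch:
    """Return a copy with Laplace(0, scale) noise on every counter (assumed noise model)."""
    noisy = copy.deepcopy(sketch)
    for row in noisy.table:
        for i in range(len(row)):
            # The difference of two i.i.d. exponentials is Laplace-distributed.
            row[i] += rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return noisy


# Hypothetical KB export: sensitive keys mapped to counters.
kb = {"evil.example": 7, "benign.example": 2}
sketch = CountMinSketch(epsilon=0.01, delta=0.01)
for key, count in kb.items():
    sketch.add(key, count)
exported = noisy_copy(sketch, scale=1.0, rng=random.Random(0))
```

A narrower table (larger `epsilon`) makes collisions more likely, which is the lever the abstract describes for letting a party deny holding a given key, at the cost of less accurate counters for the receiver.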

Index Terms

1. Theory of computation
  1. Theory and algorithms for application domains
    1. Database theory
      1. Theory of database privacy and security

Published In

CODASPY '17: Proceedings of the Seventh ACM on Conference on Data and Application Security and Privacy

March 2017

382 pages

ISBN:9781450345231

DOI:10.1145/3029806

General Chair:
Gail-Joon Ahn (Arizona State University, USA)

Program Chairs:
Alexander Pretschner (Technische Universität München, Germany)
Gabriel Ghinita (University of Massachusetts Boston, USA)

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGSAC: ACM Special Interest Group on Security, Audit, and Control

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

Italian Presidency of Ministry Council
Filierasicura
Cybersecurity National Laboratory of CINI (Consorzio Interuniversitario Nazionale Informatica)

Conference

CODASPY '17

Sponsor:

SIGSAC

CODASPY '17: Seventh ACM Conference on Data and Application Security and Privacy

March 22 - 24, 2017

Scottsdale, Arizona, USA

Acceptance Rates

CODASPY '17 Paper Acceptance Rate 21 of 134 submissions, 16%

Overall Acceptance Rate 149 of 789 submissions, 19%
