research-article

Privacy preservation by disassociation

Published: 01 June 2012

Abstract

In this work, we focus on protection against identity disclosure in the publication of sparse multidimensional data. Existing multidimensional anonymization techniques (a) protect the privacy of users either by altering the set of quasi-identifiers of the original data (e.g., by generalization or suppression) or by adding noise (e.g., using differential privacy) and/or (b) assume a clear distinction between sensitive and non-sensitive information and sever the possible linkage. In many real-world applications, these techniques are not applicable. For instance, consider web search query logs. Anonymization methods based on suppression or generalization would remove the most valuable information in the dataset: the original query terms. Additionally, web search query logs contain millions of query terms that cannot be categorized as sensitive or non-sensitive, since a term may be sensitive for one user and non-sensitive for another. Motivated by this observation, we propose an anonymization technique termed disassociation that preserves the original terms but hides the fact that two or more different terms appear in the same record. We protect the users' privacy by disassociating record terms that participate in identifying combinations. This way, the adversary cannot, with high probability, associate a record with a rare combination of terms. To the best of our knowledge, our proposal is the first to employ such a technique to provide protection against identity disclosure. We propose an anonymization algorithm based on our approach and evaluate its performance on real and synthetic datasets, comparing it against other state-of-the-art methods based on generalization and differential privacy.
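The idea of keeping every original term while hiding rare co-occurrences can be illustrated with a small toy sketch. This is not the algorithm proposed in the paper, which the abstract does not specify. It assumes, purely for illustration, that an "identifying combination" is a pair of terms that co-occurs in fewer than `k` records, and it uses a simple greedy split of terms into chunks. The names `pair_support`, `rare_pairs`, and `disassociate`, the parameter `k`, and the sample data are all hypothetical.

```python
from collections import Counter
from itertools import combinations
import random


def pair_support(records):
    """Count in how many records each unordered pair of terms co-occurs."""
    counts = Counter()
    for rec in records:
        for pair in combinations(sorted(set(rec)), 2):
            counts[pair] += 1
    return counts


def rare_pairs(records, k):
    """Pairs that co-occur in fewer than k records (assumed notion of 'identifying')."""
    return {pair for pair, count in pair_support(records).items() if count < k}


def disassociate(records, k, seed=0):
    """Toy vertical split (an assumption, not the paper's method).

    Terms are greedily assigned to chunks so that no chunk holds a rare pair.
    Each chunk is then published as the shuffled projections of the records,
    so links between terms in different chunks are no longer visible.
    """
    rare = rare_pairs(records, k)
    freq = Counter(term for rec in records for term in set(rec))
    chunks = []
    # Place frequent terms first; ties are broken alphabetically.
    for term, _ in sorted(freq.items(), key=lambda kv: (-kv[1], kv[0])):
        for chunk in chunks:
            if all(tuple(sorted((term, other))) not in rare for other in chunk):
                chunk.add(term)
                break
        else:
            chunks.append({term})
    rng = random.Random(seed)
    published = []
    for chunk in chunks:
        # Project every record onto the chunk and drop empty projections.
        projections = [sorted(set(rec) & chunk) for rec in records]
        projections = [p for p in projections if p]
        rng.shuffle(projections)  # break positional linkage across chunks
        published.append(projections)
    return chunks, published


# Hypothetical query-log records, one set of terms per user.
sample_records = [
    {"madonna", "flu"},
    {"madonna", "flu"},
    {"madonna", "viagra"},
    {"madonna", "flu", "audi"},
    {"audi", "viagra"},
]
sample_chunks, sample_published = disassociate(sample_records, k=2)
for chunk_terms, projections in zip(sample_chunks, sample_published):
    print(sorted(chunk_terms), projections)
```

In this sketch every term survives unchanged and keeps its original frequency, but a pair such as "madonna" and "viagra", which occurs in only one record, ends up in different chunks and can no longer be tied to a single record. The sketch only considers pairs; it does not handle rare single terms or larger combinations, which a real method would also need to address.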


Published In

Proceedings of the VLDB Endowment, Volume 5, Issue 10
June 2012
180 pages

Publisher

VLDB Endowment
