Abstract
Modern technology has made the publication of people's private information commonplace. The implications for individual privacy and security remain poorly understood by the general public, but the risks are undeniable, as evidenced by the growing number of reported identity theft cases. Two definitions of privacy have recently been developed to help characterize this exposure and protect individuals from privacy violations: anonymized privacy and personalized privacy. This paper develops a methodology to validate whether a privacy violation exists in a published dataset. Determining this is a non-trivial task: multiple privacy definitions and large datasets make exhaustive searches ineffective and computationally costly. We develop a compact tree structure, the Privacy FP-Tree, to reduce these costs. This data structure stores the published dataset in a format that allows simple, efficient traversal, so the Privacy FP-Tree can effectively determine the anonymity level of the dataset and identify any personalized privacy violations. The algorithm runs in O(n log n) time, which is acceptable for this application. Finally, experiments demonstrate that the approach is scalable and practical.
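To make the idea concrete, the sketch below shows one way a prefix-tree over quasi-identifier values can expose a dataset's anonymity level: records sharing the same quasi-identifier tuple end up at the same leaf, and the dataset is k-anonymous for k equal to the smallest leaf count. This is only an illustrative sketch in the spirit of the abstract; the class and function names are hypothetical and do not reproduce the paper's actual Privacy FP-Tree construction.

```python
# Hypothetical sketch (not the paper's API): group records on their
# quasi-identifier values with a prefix tree and read off the
# k-anonymity level as the minimum equivalence-class size.

class Node:
    def __init__(self):
        self.children = {}  # quasi-identifier value -> child Node
        self.count = 0      # number of records ending at this node

def insert(root, qi_values):
    """Walk/extend the path for one record's quasi-identifier tuple."""
    node = root
    for v in qi_values:
        node = node.children.setdefault(v, Node())
    node.count += 1

def anonymity_level(root):
    """Smallest equivalence-class size, i.e. the k in k-anonymity."""
    def leaves(node):
        if not node.children:
            yield node.count
        for child in node.children.values():
            yield from leaves(child)
    return min(leaves(root))

# Toy table with quasi-identifiers (sex, ZIP code):
records = [("M", "47909"), ("M", "47909"), ("F", "47906")]
root = Node()
for rec in records:
    insert(root, rec)
print(anonymity_level(root))  # prints 1: one record is unique on its QI values
```

A record that is alone in its equivalence class (here the single female record) drives k down to 1, signalling a potential re-identification risk; the paper's structure additionally supports checks against personalized privacy constraints.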
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
Cite this paper
Pun, S., Barker, K. (2009). Privacy FP-Tree. In: Chen, L., Liu, C., Liu, Q., Deng, K. (eds) Database Systems for Advanced Applications. DASFAA 2009. Lecture Notes in Computer Science, vol 5667. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04205-8_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04204-1
Online ISBN: 978-3-642-04205-8