Advances in information technology have enabled various organizations (e.g., census agencies, hospitals) to collect large volumes of sensitive personal data (e.g., census data, medical records). Because such data has great research value, it is often released for public benefit, which poses a risk to individual privacy. A typical solution to this problem is to anonymize the data before releasing it to the public. The anonymization must be conducted carefully, so that the published data not only prevents an adversary from inferring sensitive information, but also remains useful for data analysis. This thesis presents an extensive study of anonymization techniques for privacy-preserving data publishing. We explore various aspects of the problem (e.g., definitions of privacy, modeling of the adversary, methodologies of anonymization), and devise novel solutions that address several important issues overlooked by previous work. Experiments with real-world data confirm the effectiveness and efficiency of our techniques.