Data Science Ethics - Lecture 5 - Privacy in Data Preprocessing and Modeling
Differential Privacy
▪ What’s the issue again?
▪ Reporting on data from the students
• Study 1: “In March 2024, there were 86 students taking the DSE course,
10 of whom were on financial aid”
➔ No personal information revealed
• Study 2: “In April 2024, there were 85 students taking the DSE course,
9 of whom were on financial aid”
➔ No personal information revealed
[Diagram: dataset D → Algorithm → Result, with the Curator and an Outside Observer (marked ×)]
Differential Privacy
▪ Some examples of analysis with sensitive data
▪ Participating in a survey about our class, where you would answer negatively
Prob(your exam will be more difficult after the survey without your data) = 1%
e^ε ≈ 1 + ε for small ε; with ε = 0.01:
Prob(your exam will be more difficult after the survey with your data) ≤ 1.01%
▪ This does not mean the exam will actually be more difficult: just that the increase in
risk from including your data is limited.
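A minimal sketch of the arithmetic behind this guarantee, assuming the ε and the baseline probability from the example above:

```python
import math

# Privacy loss parameter and baseline probability (values taken from the example above).
eps = 0.01
p_without = 0.01   # Prob(exam more difficult | survey run without your data) = 1%

# epsilon-differential privacy bounds the probability when your data is included:
#   P(with your data) <= exp(eps) * P(without your data)
p_with_bound = math.exp(eps) * p_without
print(f"exp(eps) = {math.exp(eps):.4f}  (≈ 1 + eps for small eps)")
print(f"Bound with your data: {p_with_bound:.4%}")  # ≈ 1.01%
```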
Privacy loss parameter ε
▪ Privacy loss parameter ε
▪ Smaller means more privacy (but less accuracy)
▪ If ε = 0
• P(M(D)) = P(M(D’))
• Total privacy, only noise
▪ Rule of thumb
• ε between 0.001 and 1
• Proper ε depends on dataset size and Prob(M(D)). Intuitively: the larger the dataset,
the less the impact of a single data instance.
• “almost no utility is expected from datasets containing 1/ε or fewer records.”
(Nissim et al., 2018)
➢ ε = 0.001 ➔ requires a dataset of at least 1000 records
➢ ε = 0.01 ➔ requires a dataset of at least 100 records
In the context of differential privacy, Laplace noise is added to the results of a function
to obscure the contributions of individual data points, helping to protect the privacy of
sensitive information while still allowing for meaningful analysis of the dataset. The
amount of noise added is typically calibrated based on the sensitivity of the function
(i.e., how much a single individual's data can change the output) and a privacy
parameter, ensuring that the results remain statistically valid while safeguarding
individual privacy.
[Diagram: query “How many students are on financial aid in the course DSE?”; dataset D
→ Algorithm → Result]
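A minimal sketch of the Laplace mechanism for such a counting query; the function name, the ε value and the example count are illustrative assumptions, not taken from the slides:

```python
import numpy as np

def laplace_count(true_count: int, epsilon: float, sensitivity: float = 1.0) -> float:
    """Release a count with Laplace noise scaled to sensitivity / epsilon."""
    # A counting query has sensitivity 1: adding or removing one person
    # changes the count by at most 1.
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

# Hypothetical example: 10 students on financial aid, epsilon = 0.5
np.random.seed(0)
print(round(laplace_count(10, epsilon=0.5), 1))  # a noisy value near 10
```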
Differential Privacy
▪ Counting
• How many students disliked the course?
• Add Lap(1/ε) noise ➔ ε-differential privacy
• Running the analysis twice gives two different answers, but accuracy bounds exist (given ε)
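A short sketch of such an accuracy bound for Lap(1/ε) noise; the ε and confidence level below are assumed for illustration:

```python
import math

eps, delta = 0.5, 0.05   # assumed privacy parameter and failure probability
# For Laplace noise with scale b = 1/eps: P(|noise| > t) = exp(-t / b),
# so with probability 1 - delta the error is at most (1/eps) * ln(1/delta).
error_bound = (1 / eps) * math.log(1 / delta)
print(f"With probability {1 - delta:.0%}, the noisy count is within ±{error_bound:.1f} of the true count")
```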
Differential Privacy
▪ Reporting on data from the students
• Study 1: “In March 2024, there were 86 students taking the DSE course, 10 of
whom were on financial aid”
➔ No personal information revealed
• Study 2: “In April 2024, there were 85 students taking the DSE course, 9 of
whom were on financial aid”
➔ No personal information revealed
• Combined with background knowledge, this becomes troublesome!
➢ Student Sam knows that student Tim dropped out of the class between the two studies,
so Sam now knows that Tim was on financial aid.
▪ Differential privacy
• Study 1: “In March 2024, there were 86 students taking the DSE course,
approximately 11 of whom were on financial aid”
• Study 2: “In April 2024, there were 85 students taking the DSE course,
approximately 8 of whom were on financial aid”
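A small simulation sketch of why the noisy releases blunt this differencing attack; the ε and random seed are assumed, the counts are those from the example:

```python
import numpy as np

np.random.seed(1)
eps = 0.5
march_aid, april_aid = 10, 9   # true counts of students on financial aid

noisy_march = march_aid + np.random.laplace(scale=1 / eps)
noisy_april = april_aid + np.random.laplace(scale=1 / eps)

# Exact counts: the difference of 1 reveals that the student who left was on aid.
# Noisy counts: the difference is dominated by noise and no longer identifies Tim.
print(f"Exact difference: {march_aid - april_aid}")
print(f"Noisy difference: {noisy_march - noisy_april:.1f}")
```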
Differential Privacy
▪ Properties
• Quantification of privacy loss
• Compositional: allows complex differentially private techniques to be built
• Immune to post-processing: the guarantee holds no matter the additional data,
technology or computational power used
• Transparent: you can reveal which procedure and parameters you used (otherwise
there is uncertainty about, for example, the accuracy of the results)
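A minimal sketch of what compositionality means in practice; the per-query budgets are assumed for illustration:

```python
# Sequential composition: running several eps_i-differentially-private analyses
# on the same data is (sum of eps_i)-differentially private, so privacy loss adds up.
query_epsilons = [0.1, 0.1, 0.3]   # budgets spent by three hypothetical queries
total_epsilon = sum(query_epsilons)
print(f"Total privacy loss after composition: eps = {total_epsilon}")  # 0.5
```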
Differential Privacy
▪ Assumption 2: trusted data curator
▪ Needed?
• Local (decentralized) differential privacy
➢ Don’t trust the curator ➔ noise added before recording the answer
➢ Don’t trust the outside observers
• Centralized differential privacy
➢ Trust the data curator ➔ noise added after recording the answer
➢ Don’t trust the outside observers
[Diagram: dataset D → Algorithm → Result, with untrusted-party markers (×) on the Curator
and the Outside Observer]
Differential Privacy
▪ Two types of differential privacy
▪ Randomized Response: local
• For example, I want to know how many students liked the course,
so I ask: “Did you like the class?”
➢ Flip a coin:
▪ if heads: write the real answer
▪ if tails: flip again; if heads: write the real answer, if tails: write the inverse
▪ 75% of answers are correct, but every respondent has deniability
• If we know 1/3 did not like the class
➢ What fraction of negative answers shows up in the randomized response result? 5/12
Why? If p is the fraction of true positive answers, we expect to observe a positive answer
▪ 1 out of 4 times: when a negative answer has been flipped
▪ 3 out of 4 times: when it is a real positive answer
➔ ¼ × (1 − p) + ¾ × p; with p = 2/3 this gives 7/12 positive, hence 5/12 negative answers
➢ Averaged over large numbers the estimate will be quite accurate, while each respondent
keeps plausible deniability
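A small simulation sketch of this randomized response scheme; the sample size, seed and true proportion are assumed for illustration:

```python
import numpy as np

np.random.seed(42)
n = 10_000
true_like = np.random.rand(n) < 2 / 3      # 2/3 truly liked the class, 1/3 did not

first = np.random.rand(n) < 0.5            # first coin: heads -> report the truth
second = np.random.rand(n) < 0.5           # second coin: heads -> truth, tails -> inverse
reported = np.where(first, true_like, np.where(second, true_like, ~true_like))

# Each answer is truthful with probability 3/4, so observed_yes = 1/4 + p/2.
observed_yes = reported.mean()             # expected ≈ 7/12, i.e. 5/12 negative answers
estimated_p = 2 * (observed_yes - 0.25)    # unbiased estimate of the true "yes" rate
print(f"Observed 'yes' fraction:   {observed_yes:.3f} (expected ≈ {7/12:.3f})")
print(f"Estimated true 'yes' rate: {estimated_p:.3f} (true ≈ {2/3:.3f})")
```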
Differential Privacy
▪ Two types of differential privacy
▪ Centralized vs decentralized?
• Choice based on risk of hacking, leaks and subpoenas.
Differential Privacy
▪ Differential Privacy (Cynthia Dwork et al. 2006)
• Advantages
➢ Privacy is preserved even if the results are linked with additional data
➢ The learnt patterns do generalize
• Challenges
➢ Properties can still be inferred, with or without an individual's data instance
(e.g. Facebook traits or salary)
➢ Secrets about a large group in the dataset can still be learnt (e.g. Strava)
Data Preprocessing for privacy
▪ How to measure level of anonymity?
• K-anonymity of dataset: issues
• Differential privacy of algorithm: gold standard
▪ Several preprocessing methods to obtain privacy (see the sketch after this list)
• Grouping instances: Aggregation
• Grouping variable values: Discretisation and Generalisation
• Suppressing: replace certain values by *
• Adding noise
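A minimal sketch of generalisation and suppression on a toy table; the column names and values are illustrative assumptions:

```python
import pandas as pd

df = pd.DataFrame({
    "age":     [23, 27, 31, 45, 52],
    "zipcode": ["3001", "3010", "3000", "2600", "2610"],
    "aid":     [True, False, True, False, True],
})

# Generalisation / discretisation: replace exact ages with coarse age bands.
df["age"] = pd.cut(df["age"], bins=[0, 30, 50, 120], labels=["<30", "30-50", "50+"])

# Suppression: replace the last digits of the zipcode by *.
df["zipcode"] = df["zipcode"].str[:2] + "**"

print(df)
```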
▪ Continuum!
[Continuum from most privacy (left) to most utility (right): decentralized differential
privacy → centralized differential privacy (low ε → high ε) → t-closeness (low t → high t)
→ l-diversity (high l → low l) → k-anonymity (high k → low k) → removing personal
identifiers]
Conclusion
▪ Differential privacy
▪ Dream of analysing data while enabling privacy
▪ Privacy as a matter of accumulated risk, not binary
▪ With a clear parameter ε that quantifies privacy loss: the
additional risk to an individual resulting from the use of their data.
▪ Always bounded, mathematically provable.
Differential Privacy
▪ Cynthia Dwork, Frank McSherry, Kobbi Nissim and Adam Smith (2006)
Presentation and Paper Ideas
▪ Differential privacy in action
(use by Apple, Google, LinkedIn, etc.)
▪ The trade-offs in value
▪ Practical implementation and examples of
k-anonymity/l-diversity/t-closeness