Data Science Ethics - Lecture 1
▪ FAT Flow
1
Data Science & Ethics
▪ Understand the ethical aspects of data science
2
Why Care?
1. Expected from society
▪ Generation Z
• Born 1995 – 2010
• 90 million in the US
• Cares about social justice and ethics
https://www.statista.com/statistics/797321/us-population-by-generation/
3
https://www.mckinsey.com/industries/consumer-packaged-goods/our-insights/true-gen-generation-z-and-its-implications-for-companies
Why Care?
2. Huge potential risks
▪ Be aware of the risks and countermeasures
▪ Risks for humans
• Physical and mental well-being (e.g. self-driving cars)
• Privacy
• Discrimination
4
Why Care?
3. Potential benefits
▪ Understanding ethical concerns and applying techniques to
address them can improve the data and the model, and can
serve as a marketing instrument
• Remove bias in data: improve the accuracy and fairness of the
model
• Explain predictions: improve trust in the model
• Ensure proper data gathering: better data quality
• Part of a company’s brand (cf. 1. Expected from society)
5
Why Care?
Summary:
▪ Life goal in itself (philosophical goal)
▪ Societal and business reasons:
1. Expected from society
2. Huge potential risks
3. Data science ethics can bring value
6
Why Care?
▪ Future
• Increased digitalization
• Increased automation
• Increased use of AI
➔ EU AI Act
7
Goal of the course
8
AI Ethics in the News (past days)
Course and Evaluation
▪ Weekly classes
▪ Discussions in class
▪ Presentation (4 points, 20% of your final grade)
• Each week one presentation
• On topic of the class before
➢ Present additional techniques, cases or regulations/framework
➢ 15 min. (10 min. ppt + 5 min. Q&A)
➢ Place in context of content seen before
• Groups of 4: make group on Google Doc (avoid overlap)
https://docs.google.com/document/d/1dXqCIwDGAh2vZpa5mbvs0IEWD-jV-MnyBNNqR_gonUk/edit?usp=sharing
• For the second assessment period, one can choose to keep the grade of the
first assessment period, or to write a paper on the topic.
▪ Closed book exam (16 points, 80% of your final grade)
10
Course and Evaluation
▪ Presentation (4 points, 20% of your final grade)
• Evaluation:
➢ Cohesion and structure of the presentation
➢ Relevance of content and link with course
➢ Presentation and ability to answer questions
➢ Proper referencing
• Strict timing!
• Check the book to ensure you don’t take a topic that is covered later on. It
might be related to a future topic, but not exactly the same.
• When in doubt: mail us.
11
Resources
▪ Slides
▪ Book
• Data Science Ethics: Concepts, Techniques and Cautionary
Tales, Oxford University Press, 2022, 272 p.
• Around 35 Euros
• Available at Acco, bol.com, Amazon, Standaard Boekhandel
https://www.amazon.com/Data-Science-Ethics-Techniques-Cautionary/dp/0192847279/ref=sr_1_1
12
Data science ethics
13
Data science ethics
14
Data science ethics
▪ Data Science Ethics: the domain of what is right and wrong when doing
data science
▪ Responsible AI: the development and application of AI that is aligned with
moral values in society
19
Data Science Ethics Equilibrium
▪ Data Science Ethics Equilibrium: A state of data science
practices determined by the ethical concerns and utility of
data science.
E.g. churn prediction
E.g. CV sorting
20
Trolley Problem
▪ Well-known thought experiment in ethics:
utilitarian vs deontological ethics
https://en.wikipedia.org/wiki/Trolley_problem
21
Trolley problem (The Good Place)
Trolley Problem
▪ Variants
• What if single person is your child or partner?
• What if you’re on a bridge, and can only stop the trolley by
pushing a man standing next to you off the bridge?
https://en.wikipedia.org/wiki/Trolley_problem
23
Ethics of self-driving cars
E. Awad, S. Dsouza, R. Kim, J. Schulz, J. Henrich, A. Shariff, J.-F. Bonnefon, I. Rahwan (2018). The Moral Machine experiment. Nature.
25
Ethics of self-driving cars
▪ MIT Moral Machine - Global moral preferences
E. Awad, S. Dsouza, R. Kim, J. Schulz, J. Henrich, A. Shariff, J.-F. Bonnefon, I. Rahwan (2018). The Moral Machine experiment. Nature.
26
Data, Algorithms and Models
▪ Definitions
• Data: facts or information, especially when examined and used
to find out things or to make decisions
• Algorithm: a set of rules that must be followed when solving a
particular problem
• Prediction or AI Model: the decision-making formula, which
has been learnt from data by a prediction/AI algorithm
▪ Sensitive data: personal data revealing racial or ethnic origin, political opinions,
religious or philosophical beliefs, or trade union membership, and the processing of
genetic data, biometric data for the purpose of uniquely identifying a natural person, data
concerning health or data concerning a natural person's sex life or sexual orientation shall
be prohibited. (GDPR, Article 9)
28
Different Roles
Humans enter the data science process in different roles
30
FAT
▪ Fair: “Treating people equally without favouritism or
discrimination.”
32
FAT
▪ Fair: “Treating people equally without favouritism or
discrimination.”
2. Discrimination: not discriminating against sensitive groups
➢ Sensitive groups? Often race, gender, sexual preference.
➢ Cf. GDPR Sensitive data
33
FAT
▪ Transparent: “Easy to perceive or detect”
1. Process
➢ Depending on the role
➢ Crucial for Fairness and Accountability
➢ Does not imply revealing company secrets
2. Explainable AI
➢ Explain prediction (model)
34
FAT
▪ Accountable: “Required or expected to justify actions or
decisions; responsible”
• From theory to practice: Obligation to
1. implement appropriate and effective measures to ensure that
principles are complied with,
2. demonstrate compliance of the measures upon request, and
3. recognize potential negative consequences.
35
FAT Flow: a Data Science Ethics Framework
▪ Three dimensions
• Stage in the data science process
• Evaluation criterion
• Role of the human
FAT Flow: a Data Science Ethics Framework
• Fair: privacy, discrimination
• Transparent: process, explainable
41
FAT Flow: Cautionary Tales
42
Subjectivity of ethics
▪ Who decides what is ethical?
• Companies
• You
43
Subjectivity of ethics
▪ Application
• Fair to use gender and race data?
• Credit scoring vs medical diagnosis
▪ Time
• Women: allowed to vote in US in 1920, in Belgium in 1948, in
Moldova in 1978
• Black people: slavery; allowed to vote in the US from 1870 (in
practice much later)
• Victims of our time:
➢ Those we currently do not consider it wrong to discriminate
against, although they have the same rights as all humans: the
elderly, people on low incomes, etc.
➢ Those we consider to have fewer rights than humans: animals, robots
▪ Location
• Respect for elders, disrespect for criminals, etc.
• Respect for the rights of individuals vs. the state
44
Subjectivity of ethics
45
Discussion Case 1
▪ How important is it to be ethical in data science?
• Pro: absolutely, see “Why Care?”
• Con: no, importance is exaggerated.
46
Fair Data Gathering
▪ Concepts: Privacy, Sample Bias, Surveillance
▪ Techniques: Encryption, Hashing
▪ Cautionary Tales: Government backdoors
47
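The hashing technique named above can be sketched in a few lines: a salted SHA-256 digest replaces a direct identifier while still allowing records to be linked. The salt value and e-mail addresses below are invented for illustration.

```python
import hashlib

# Sketch of pseudonymization via salted hashing. The salt must be kept
# secret and separate from the data; without it, hashes over a small
# identifier space can be reversed by brute force.
SALT = b"per-dataset-secret"  # invented; generate and protect per dataset

def pseudonymize(identifier: str) -> str:
    """Return a salted SHA-256 digest usable as a join key."""
    return hashlib.sha256(SALT + identifier.encode("utf-8")).hexdigest()

records = ["alice@example.com", "bob@example.com", "alice@example.com"]
tokens = [pseudonymize(r) for r in records]

# Equal inputs map to equal tokens, so records can still be linked,
# but the raw identifier is no longer stored.
assert tokens[0] == tokens[2] and tokens[0] != tokens[1]
```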
Transparent Data Gathering
▪ Concepts: Privacy
▪ Techniques: A/B Testing
▪ Cautionary Tales: OK Cupid
48
Discussion Case 2
▪ The SID-IN is a fair (expo) where universities and colleges provide
information on their programs to students and their parents. Since 2020
each participant gets a badge with a QR code. Everyone at the fair asks
to scan the badge, so as to know who came by and to send them
additional information later on. The data gathered is:
• Information available to person providing information at the fair:
name, preference for programs
• Information available to university: Name, school, email, address,
preference for programs
▪ Ethics:
• What are potential uses for data science, what additional information
would be useful (think out of the box and for big impact)?
• How could this be misused?
• Spectrum, balance is to be discussed
49
Fair Data Preparation
▪ Concepts: K-Anonymity, Proxies
▪ Techniques: Input Selection, Defining target variable
▪ Cautionary Tales: Netflix re-identification
https://www.wired.com/2009/12/netflix-privacy-lawsuit/
50
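The k-anonymity concept above can be illustrated with a minimal sketch (the table values are invented): a dataset is k-anonymous for a set of quasi-identifiers when every combination of their values is shared by at least k records.

```python
from collections import Counter

# Compute the k of a table for the given quasi-identifiers: the size of
# the smallest group of records sharing the same quasi-identifier values.
def k_anonymity(rows, quasi_identifiers):
    counts = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(counts.values())

# Toy table with already-generalized zip codes and age ranges
table = [
    {"zip": "200*", "age": "20-29", "diagnosis": "flu"},
    {"zip": "200*", "age": "20-29", "diagnosis": "cold"},
    {"zip": "201*", "age": "30-39", "diagnosis": "flu"},
    {"zip": "201*", "age": "30-39", "diagnosis": "flu"},
]

# Each (zip, age) combination covers two records
print(k_anonymity(table, ["zip", "age"]))  # -> 2
```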
Transparent Data Preparation
▪ Concepts: Proxies
▪ Techniques: Input selection
▪ Cautionary Tales: Red lining
http://powerreporting.com/color/
51
Fair Data Modeling
▪ Concepts: PPDM, biased models
▪ Techniques: Homomorphic encr., ZK Proofs, removing bias
▪ Cautionary Tales: Self-driving cars
E. Awad, S. Dsouza, R. Kim, J. Schulz, J. Henrich, A. Shariff, J.-F. Bonnefon, I. Rahwan (2018). The Moral Machine experiment. Nature.
52
Transparent Data Modeling
▪ Concepts: Black box models
▪ Techniques: Global and instance based explanations
▪ Cautionary Tales: Credit scoring
53
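For a simple linear scoring model, an instance-level explanation can be sketched by decomposing one prediction into per-feature contributions; the weights and applicant values below are invented for illustration.

```python
# Sketch of an instance-based explanation for a linear scoring model:
# each feature's contribution to this applicant's score is w_i * x_i.
weights = {"income": 0.5, "debt": -0.8, "age": 0.1}
applicant = {"income": 4.0, "debt": 3.0, "age": 3.0}

contributions = {f: weights[f] * applicant[f] for f in weights}
score = sum(contributions.values())

# Rank features by absolute impact on this one prediction
for feature, c in sorted(contributions.items(), key=lambda kv: -abs(kv[1])):
    print(f"{feature}: {c:+.2f}")
print(f"score: {score:.2f}")
```

Here the explanation shows that the (invented) debt value pulls the score down the most, which is the kind of per-prediction insight a black-box model does not offer out of the box.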
Fair Model Evaluation
▪ Concepts: Privacy, discrimination
▪ Techniques: K-anonymity, detect bias
▪ Cautionary Tales: Predicting Recidivism
https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing
54
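Detecting bias can start with something as simple as comparing favourable-outcome rates across groups (demographic parity). The decision lists below are invented toy data; the 0.8 threshold follows the common "four-fifths" rule of thumb.

```python
# Minimal sketch of a demographic-parity bias check on model decisions
# (1 = favourable decision, e.g. loan granted).
def positive_rate(decisions):
    return sum(decisions) / len(decisions)

group_a = [1, 1, 0, 1, 0]  # reference group: 60% favourable
group_b = [1, 0, 0, 0, 0]  # sensitive group: 20% favourable

# The "four-fifths rule" flags ratios below 0.8 as potentially unfair
ratio = positive_rate(group_b) / positive_rate(group_a)
print(round(ratio, 2))  # -> 0.33, far below 0.8: a warning sign
```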
Transparent Model Evaluation
▪ Concepts: KPIs, Reporting, Explaining
▪ Techniques: Cherrypicking, Backtesting, Explaining models
▪ Cautionary Tales: Apple Card
https://towardsdatascience.com/is-the-medias-reluctance-to-admit-ai-s-weaknesses-putting-us-at-risk-c355728e9028
55
Fair Model Deployment
▪ Concepts: Access to system
▪ Techniques: overruling
▪ Cautionary Tales: Target
https://www.nytimes.com/2012/02/19/magazine/shopping-habits.html
56
Transparent Model Deployment
▪ Concepts: Unintended consequences, misleading
▪ Techniques: Deep Fake
▪ Cautionary Tales: Uber’s God View
57
https://www.forbes.com/sites/kashmirhill/2014/10/03/god-view-uber-allegedly-stalked-users-for-party-goers-viewing-pleasure/
Beyond Data Science Ethics
▪ Singularity
▪ Skynet
▪ Robot Rights and Duties
58
Ethical AI Frameworks
▪ IEEE Global Initiative on Ethics of Autonomous and
Intelligent Systems (2018)
1. Human Rights: A/IS shall be created and operated to respect, promote,
and protect internationally recognized human rights.
2. Well-being: A/IS creators shall adopt increased human well-being as a
primary success criterion for development.
3. Data Agency: A/IS creators shall empower individuals with the ability
to access and securely share their data, to maintain people’s capacity
to have control over their identity.
4. Effectiveness: A/IS creators and operators shall provide evidence of
the effectiveness and fitness for purpose of A/IS.
5. Transparency: The basis of a particular A/IS decision should always be
discoverable.
6. Accountability: A/IS shall be created and operated to provide an
unambiguous rationale for all decisions made.
7. Awareness of Misuse: A/IS creators shall guard against all potential
misuses and risks of A/IS in operation.
8. Competence: A/IS creators shall specify and operators shall adhere to
the knowledge and skill required for safe and effective operation.
59
Ethical AI Frameworks
▪ Ethics guidelines for trustworthy AI (2019)
▪ “The aim of the Guidelines is to promote Trustworthy AI. Trustworthy AI has
three components, which should be met throughout the system's entire life
cycle:
• (1) it should be lawful, complying with all applicable laws and regulations
• (2) it should be ethical, ensuring adherence to ethical principles and values
and
• (3) it should be robust, both from a technical and a social perspective,
since even with good intentions AI systems can cause unintentional harm.”
https://ec.europa.eu/digital-single-market/en/news/ethics-guidelines-trustworthy-ai
60
AI Act
▪ European regulation
61
Ethical AI Frameworks
▪ White House Executive Order on Maintaining American
Leadership in Artificial Intelligence, Feb. 2019
• Mentioning of privacy and civil liberties
• No mention of ethics, explainable or transparent
• https://www.whitehouse.gov/presidential-actions/executive-order-maintaining-american-leadership-artificial-intelligence/
▪ ISO
• ISO/IEC AWI TR 24368
• Information technology — Artificial intelligence — Overview of
ethical and societal concerns
• Status: “Under development”
• https://www.iso.org/standard/78507.html
62
63
Hagendorff 2019 - The Ethics of AI Ethics - An Evaluation of Guidelines
Discussion Case 3
▪ Should Data Science Ethics be mandatory training for all
data science and business students?
▪ For whom is it more important: business or data science
students?
▪ Should Data Science Ethics be regulated?
64
Presentation Ideas
▪ Review the recently proposed AI Act, and summarize the
main critiques and open issues
▪ Compare US, Chinese and EU view on AI Ethics
66
Next week
67