ML Lecture 2 Supervised Learning Setup
Faizad Ullah
Traditional Computer Science
Tasks like:
Play an audio/video file
Display a text file on screen
Perform a mathematical operation on two numbers
Sort an array of numbers using Insertion Sort
Search for a string in a text file
…
Data + Program → Output
Problems that Traditional CS Can’t Handle
Data + Output → Program?
Machine Learning
Regression
Classification
Traditional CS: Data + Program → Output
Machine Learning: Data + Output → Program
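The two paradigms can be sketched in code; a toy illustration (the function and the data are made up for this sketch, not from the lecture):

```python
# Traditional CS: the programmer writes the program (rule) explicitly.
def double(x):
    return 2 * x

# Machine Learning: we are given data and outputs, and recover the
# program (here, the slope w of y = w*x) from examples, via least squares.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # outputs produced by the unknown rule

w = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
print(w)        # learned "program": multiply by 2.0
print(w * 5.0)  # applying the learned rule to new data: 10.0
```

The learned `w` plays the role of the "program" that traditional CS would have required a human to write.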
What is Machine Learning?
Formally:
A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E. (Tom Mitchell, 1997)
Informally:
Algorithms that improve on some task with experience.
Machine Learning Pipeline
Data – Big, Big,… data!
How do we obtain these massive datasets to train our Machine Learning models?
From real interactions, e.g., call centers
Expert annotators, e.g., hired teams of annotators
Crowdsourcing
reCAPTCHA tagging
Task-Label Relationship
Labels are dictated by the task to be performed.
Example: Speech Technologies
What was said? → Speech Recognition
Challenges of ML – Fairness
AI tends to reflect the biases of society:
Human taggers who mark a recording as misinformation based on accent or gender
Court decisions in countries where a rich person's acquittal is more likely
Automated standardized testing in the US could yield unfavorable results for certain demographic groups
AI plays a decisive role in hiring decisions, with up to 72% of resumes in the US never being viewed by a human (automation bias)
Decisions on immigration, bank loans, credit history checks, criminal profiling
ML in Low-resource settings
Problems where large datasets and tools are not available
Natural Language Processing and Speech
Pakistan has 71 languages
We barely have speech recognition capabilities for Urdu!
Types of Learning
Supervised
Unsupervised
Supervised Learning
What does a classifier see?
• Features
Day vs. Night Classifier: the slide exercise asks for five features each that distinguish day images from night images.
Unsupervised Learning
Supervised Learning Setup
Feature Space: Tabular Data
Features/Dimensions → Label/Class/Category
(But we said a record should be 1D!)
Feature Space: Image Data
A color image is a 3D array (Width × Height × Channels).
A color image has 3 channels, while a grayscale image has 1 channel.
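To use an image as a single record in the tabular feature space above, the 3D array is typically flattened into a 1D feature vector; a minimal sketch (the 4×4 image and its random values are hypothetical):

```python
import numpy as np

# Hypothetical 4x4 RGB image: a 3D array of shape (Width, Height, Channels).
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(4, 4, 3), dtype=np.uint8)

# Flatten into a 1D feature vector of length W * H * C = 4 * 4 * 3 = 48.
x = image.reshape(-1)
print(image.shape)  # (4, 4, 3)
print(x.shape)      # (48,)
```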
Feature Space: Text Data
Suppose you are given labeled textual data in an Excel sheet:

Document#   Text                             Class
Training:
1           The Best movie best              Pos
2           The Best best ever               Pos
3           The Best film                    Pos
4           The Worst cast ever              Neg
Testing:
5           The Best best best worst ever    ?
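One common way to turn such text into a numeric feature space is a bag-of-words representation; a minimal sketch over the toy corpus in the table above (this particular featurization is an illustration, not the lecture's prescribed method):

```python
from collections import Counter

# The four training documents and labels from the table above.
docs = [
    "The Best movie best",
    "The Best best ever",
    "The Best film",
    "The Worst cast ever",
]
labels = ["Pos", "Pos", "Pos", "Neg"]

# Build the vocabulary from the training documents (lowercased tokens).
vocab = sorted({w.lower() for d in docs for w in d.split()})

def featurize(text):
    # Count how often each vocabulary word appears in the text.
    counts = Counter(w.lower() for w in text.split())
    return [counts[w] for w in vocab]

X = [featurize(d) for d in docs]
print(vocab)
print(featurize("The Best best best worst ever"))  # test document 5
```

Each document becomes a fixed-length count vector, so the text data now lives in the same kind of tabular feature space as before.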
Learn the mapping between an email and its label using past labelled data
Can be retrained on new emails
Not easy to reverse-engineer and circumvent in all cases
Easier to plug the leaks
Formalizing the Setup
D = {(x1, y1), (x2, y2), …, (xn, yn)} ⊆ X × Y
Each xi is a feature vector; any categorical attribute can be converted to a numerical representation.
Where:
D is the dataset
xi is the input vector of the i-th sample/record/instance
X is the d-dimensional feature space (ℝ^d)
Y is the label space
The data points are drawn from an unknown distribution P:
(xi, yi) ~ P(x, y)
If we don't know the distribution, let's approximate it using the samples we gathered!
We want to learn a function h ∈ H such that, for a new instance (x, y) ~ P,
h(x) = y with high probability, or at least h(x) ≈ y.
The new x also has to come from the same distribution as the xi. In plain words: don't train on dogs and ask for predictions on cats.
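In code, the dataset D is typically stored as an n × d feature matrix plus a length-n label vector; a minimal sketch (the numbers are made up for illustration):

```python
import numpy as np

# Toy dataset D = {(x_i, y_i)} with n = 3 instances and d = 2 features.
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])   # each row is one feature vector x_i in R^d
y = np.array([0, 1, 1])      # one label y_i per instance

n, d = X.shape
print(n, d)  # 3 instances in a 2-dimensional feature space
```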
Training and Testing: Formally
Training: the learning algorithm is given training data x1, x2, …, xn with labels y1, y2, …, yn and produces a hypothesis h.
Testing: on new data x ~ P, the learned h is applied like a traditional program to produce h(x).
h(x) = y (ideal)
h(x) ≈ y (plausible)
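The training/testing separation can be sketched as a simple random split; the 10-example dataset and the 80/20 ratio below are assumptions for illustration, not from the slides:

```python
import random

# Hypothetical dataset of (x, y) pairs.
data = [(i, i % 2) for i in range(10)]

# Shuffle, then hold out 20% of the data for testing, so the learned h
# is evaluated on instances it never saw during training.
random.seed(0)
random.shuffle(data)
split = int(0.8 * len(data))
train, test = data[:split], data[split:]

print(len(train), len(test))  # 8 2
```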
Label Space
Binary (binary classification):
Sentiment: positive / negative
Email: spam / ham
Online transaction fraud: yes / no
Tumor: malignant / benign
y ∈ {0, 1} or y ∈ {−1, 1}
Real-valued (regression):
Temperature, height, age, length, weight, duration, price, …
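The two binary label conventions, y ∈ {0, 1} and y ∈ {−1, 1}, encode the same information; a quick sketch of the affine map y' = 2y − 1 between them:

```python
# Convert {0, 1} labels to {-1, +1} via y' = 2*y - 1, and back.
ys_01 = [0, 1, 1, 0]
ys_pm = [2 * y - 1 for y in ys_01]
print(ys_pm)  # [-1, 1, 1, -1]

# Inverse map: y = (y' + 1) // 2 recovers the original labels.
assert [(yp + 1) // 2 for yp in ys_pm] == ys_01
```

Which convention is used is a matter of algorithmic convenience (e.g., some loss functions are written more naturally with {−1, 1}).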
Hypothesis Space
The hypothesis h is chosen from a hypothesis space H:
h ∈ H, where H ∈ {H_D, H_R, H_SVM, H_DL, …}
So, how do we choose our ℎ?
Randomly?
Exhaustively?
How do we evaluate 𝒉?
How to choose ℎ?
Randomly
May not work well
Like using a random program to solve your sorting problem!
May work if 𝐻 is constrained enough
Exhaustively
Would be very slow!
The space 𝐻 is usually very large (if not infinite)
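The point that random choice "may work if H is constrained enough" can be sketched with a tiny hypothesis space of 1D threshold classifiers (the data below is a toy set, not from the lecture):

```python
import random

# Hypothesis space H: threshold classifiers h_t(x) = 1 if x >= t else 0,
# for t in [0, 1]. A very constrained H, so random sampling can work.
X = [0.1, 0.4, 0.35, 0.8, 0.9, 0.7]
y = [0, 0, 0, 1, 1, 1]

def accuracy(t):
    # Fraction of training points the threshold classifier gets right.
    preds = [1 if x >= t else 0 for x in X]
    return sum(p == yi for p, yi in zip(preds, y)) / len(y)

# "Choose h randomly": sample 100 thresholds and keep the best one.
random.seed(0)
best_t = max((random.uniform(0, 1) for _ in range(100)), key=accuracy)
print(best_t, accuracy(best_t))
```

With a large or infinite H (e.g., all deep networks), neither random nor exhaustive search like this is feasible, which motivates the principled selection methods covered later.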
Book Reading
Murphy – Chapter 1
References
Murphy, Chapter 1
Alpaydin, Chapter 1
TM (Tom Mitchell), Chapter 1
Lectures of Andrew Ng, Dr. Ali Raza, and Kilian Weinberger's "Machine Learning for Intelligent Systems (CS4780/CS5780)".