
CSCS 460 – Machine Learning
Faizad Ullah

1
Traditional Computer Science
 Tasks like:
 Play an audio/video file
 Display a text file on screen
 Perform a mathematical operation on two numbers
 Sort an array of numbers using Insertion Sort
 Search for a string in a text file
 …

Data → Program → Output

2
Problems that Traditional CS Can’t Handle

 Tumor? Y/N
 Price?
 What was said?
 Summarize text

Data → Program? → Output

3
Machine Learning
Regression
Classification

4
Traditional CS:    Data + Program → Output

Machine Learning:  Data + Output → Program

5
What is Machine Learning?
 Formally:
 A computer program is said to learn from experience E with respect to some class of tasks T and
performance measure P, if its performance at tasks in T, as measured by P, improves with experience
E. (Tom Mitchell, 1997)

 Informally:
 Algorithms that improve on some task with experience.

To train a classifier, we need labelled data (called a dataset).

6
Machine Learning Pipeline

7
Data – Big, Big,… data!
 How do we obtain these massive datasets to train our Machine Learning models?
 From real interactions, e.g., call centers
 Expert annotators, e.g., hired teams of annotators
 Crowdsourcing

(Figure: reCAPTCHA and image-tagging examples of crowdsourced labelling)

8
Task-Label Relationship
 Labels are dictated by the task to be performed.
 Example: Speech Technologies
 What was said? → Speech Recognition
 Who said it? → Speaker Recognition
 Was it John Doe? → Speaker Verification
 Did it mention "hey Google"? → Keyword Detection
 What's the language? → Language Identification
 Is the language native for the speaker?
 What is their height?
 What is the age of the speaker?
 What is the emotional state?
 What was the sentiment?
 Is the voice fake?
9
Task-Label Relationship
 Example: Text Technologies

 Who wrote it?
 Summary of what was written?
 Was it plagiarized?
 What was the intent?
 What language is this?
 Is the language native for the speaker?
 What is the author's literacy level?
 What is the topic of this document?
 What is the emotional state?
 What was the sentiment?
 Can we fake this writing style?
10
Challenges of ML - Explainability
 A classifier can learn to separate classes based on features that are undesirable from a human standpoint
 If every dog in the training data wears a collar and no cat does, the model simply learns to detect collars
 If every horse image carries a copyright notice, the model learns to recognize horses by the copyright notice

 Explainable ML: the results should be understandable by humans, as opposed to a black-box system

11
Challenges of ML – Fairness
 AI tends to reflect the biases of society
 Human taggers who mark a recording as misinformation based on accent or gender
 Court decisions in countries that make a rich person's acquittal more likely
 Automated standardized testing in the US could yield unfavorable results for certain demographic groups
 AI plays a deciding role in hiring, with up to 72% of resumes in the US never being viewed by a human (Automation Bias)
 Decisions on immigration, bank loans, credit history checks, criminal profiling

12
ML in Low-resource settings
 Problems where large datasets and tools are not available
 Natural Language Processing and Speech
 Pakistan has 71 languages
 We barely have speech recognition capabilities for Urdu!

13
Types of Learning
Supervised

The outcome is provided along with the data.

Unsupervised

The outcome is NOT provided along with the data.

14
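A minimal sketch of the difference in code, assuming scikit-learn (the data is made up for illustration): a supervised learner is given the outcomes y along with the data X, while an unsupervised learner sees only X.

```python
# Supervised vs. unsupervised learning: a minimal sketch
# (assumes scikit-learn; the numbers are made up for illustration).
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.cluster import KMeans

X = np.array([[62, 70], [72, 90], [65, 120], [69, 150]])  # data
y = np.array([0, 0, 1, 1])                                # outcomes (labels)

# Supervised: the outcome y is provided along with the data X.
clf = DecisionTreeClassifier().fit(X, y)

# Unsupervised: only X is provided; the algorithm finds structure itself.
clusters = KMeans(n_clusters=2, n_init=10).fit_predict(X)
print(clf.predict(X), clusters)
```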
Supervised Learning

15
What does a classifier see?
• Features

[Figure slides: example day and night images, with candidate features listed (1–5) for each, leading up to a Day vs. Night classifier; images not recoverable]
Unsupervised Learning

20
Supervised Learning Setup

22
Feature Space: Tabular Data
Features/Dimensions: Height, Weight, B.P. Sys, B.P. Dia
Label/Class/Category: Heart disease
Each record is a 4-dimensional feature vector.

Height (inches)  Weight (kgs)  B.P. Sys  B.P. Dia  Heart disease  Split
62               70            120       80        No             Training
72               90            110       70        No             Training
74               80            130       70        No             Training
65               120           150       90        Yes            Training
67               100           140       85        Yes            Training
64               110           130       90        No             Training
69               150           170       100       Yes            Training
66               125           145       90        ?              Testing
74               67            110       60        ?              Testing

As the labels are discrete, this is a classification task.


23
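To make the table concrete, here is a minimal sketch of training a classifier on the seven training rows and predicting the two test rows, assuming scikit-learn (the decision tree is an illustrative model choice, not prescribed by the slides):

```python
# Heart-disease table from the slide as numpy arrays.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

X_train = np.array([[62, 70, 120, 80], [72, 90, 110, 70], [74, 80, 130, 70],
                    [65, 120, 150, 90], [67, 100, 140, 85], [64, 110, 130, 90],
                    [69, 150, 170, 100]])
y_train = np.array(["No", "No", "No", "Yes", "Yes", "No", "Yes"])

X_test = np.array([[66, 125, 145, 90], [74, 67, 110, 60]])

clf = DecisionTreeClassifier().fit(X_train, y_train)  # learn h from D
print(clf.predict(X_test))  # fills in the "?" labels of the testing split
```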
Feature Space: Tabular Data
Features/Dimensions: Height, Weight, B.P. Sys, B.P. Dia
Label: Cholesterol Level
Each record is a 4-dimensional feature vector.

Height (inches)  Weight (kgs)  B.P. Sys  B.P. Dia  Cholesterol Level  Split
62               70            120       80        150                Training
72               90            110       70        165                Training
74               80            130       70        135                Training
65               120           150       90        210                Training
67               100           140       85        195                Training
64               110           130       90        125                Training
69               150           170       100       250                Training
66               125           145       90        ?                  Testing
74               67            110       60        ?                  Testing

As the labels are continuous, this is a regression task.


24
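A matching sketch for the regression case, again assuming scikit-learn (the linear model is an illustrative choice):

```python
# Same features, continuous label: a regression sketch (assumes scikit-learn).
import numpy as np
from sklearn.linear_model import LinearRegression

X_train = np.array([[62, 70, 120, 80], [72, 90, 110, 70], [74, 80, 130, 70],
                    [65, 120, 150, 90], [67, 100, 140, 85], [64, 110, 130, 90],
                    [69, 150, 170, 100]])
y_train = np.array([150, 165, 135, 210, 195, 125, 250])  # cholesterol levels

X_test = np.array([[66, 125, 145, 90], [74, 67, 110, 60]])

reg = LinearRegression().fit(X_train, y_train)
print(reg.predict(X_test))  # continuous predictions for the "?" rows
```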
Feature Space: Image Data
 Images are nothing but 2D/3D arrays of color-intensity values, typically in the range 0–255

But we said a record should be 1D!

25
Feature Space: Image Data
 A color image is a 3D array (Width × Height × Channels)
 A color image has 3 channels, while a grayscale image has 1 channel

26
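To reconcile this with the earlier point that a record should be 1D, the pixel grid is typically flattened into a single feature vector. A minimal sketch, assuming numpy (the 32×32 size is made up for illustration):

```python
# Flattening an image into a 1D feature vector (assumes numpy).
import numpy as np

color_img = np.random.randint(0, 256, size=(32, 32, 3))  # Width x Height x Channels
gray_img = np.random.randint(0, 256, size=(32, 32))      # single channel

x_color = color_img.reshape(-1)  # 32*32*3 = 3072-dimensional feature vector
x_gray = gray_img.reshape(-1)    # 32*32   = 1024-dimensional feature vector
print(x_color.shape, x_gray.shape)
```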
Feature Space: Text Data
 Suppose you are given labeled textual data in an Excel sheet:

Split     Document#  Text                           Class
Training  1          The Best movie best            Pos
Training  2          The Best best ever             Pos
Training  3          The Best film                  Pos
Training  4          The Worst cast ever            Neg
Testing   5          The Best best best worst ever  ?

the  best  movie  ever  film  worst  cast  label
1    1     1      0     0     0      0     1
1    1     0      1     0     0      0     1
1    1     0      0     1     0      0     1
1    0     0      1     0     1      1     0
1    1     0      1     0     1      0     ?

These are called "Binary Occurrence" features.
27
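These features can be reproduced with a standard text-vectorization step. A sketch, assuming scikit-learn's CountVectorizer (binary=True gives occurrence rather than count features):

```python
# Binary-occurrence features for the slide's documents (assumes scikit-learn).
from sklearn.feature_extraction.text import CountVectorizer

train_docs = ["The Best movie best", "The Best best ever",
              "The Best film", "The Worst cast ever"]
y_train = ["Pos", "Pos", "Pos", "Neg"]
test_docs = ["The Best best best worst ever"]

vec = CountVectorizer(binary=True, lowercase=True)  # 1 if word occurs, else 0
X_train = vec.fit_transform(train_docs)
X_test = vec.transform(test_docs)

print(vec.get_feature_names_out())  # the vocabulary columns
print(X_train.toarray())
print(X_test.toarray())
```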
Rules vs. Learning
 Suppose we are working on classification of emails into “spam” and “ham”
(not spam)
 We can write a complicated set of rules
 Works well for a while
 Cannot adapt well to new emails
 Program could be reverse-engineered and circumvented

 Learn the mapping between an email and its label using past labelled data (see the sketch after this list)
 Can be retrained on new emails
 Not easy to reverse-engineer and circumvent in all cases
 Easier to plug the leaks
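A toy contrast between the two approaches, assuming scikit-learn; the rule, keywords, and emails below are made up for illustration:

```python
# Rule-based vs. learned spam detection: a toy sketch (assumes scikit-learn;
# the rule, keywords, and emails are made up for illustration).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

def rule_based_is_spam(email: str) -> bool:
    # Hand-written rule: brittle, and easy to circumvent once known.
    return "free money" in email.lower() or "winner" in email.lower()

emails = ["Free money inside!!!", "Meeting moved to 3pm",
          "You are a winner, claim now", "Lunch tomorrow?"]
labels = ["spam", "ham", "spam", "ham"]

# Learned mapping: retrainable as spammers adapt their emails.
model = make_pipeline(CountVectorizer(), MultinomialNB()).fit(emails, labels)
print(model.predict(["Claim your free money"]))
```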
References
 Murphy, Chapter 1
 Alpaydin, Chapter 1
 Tom Mitchell (TM), Chapter 1

 Lectures of Andrew Ng, Dr. Ali Raza, and "Machine Learning for Intelligent Systems (CS4780/CS5780)" by Kilian Weinberger

 This disclaimer should serve as adequate citation.

29
Formalizing the Setup
D = {(x_1, y_1), (x_2, y_2), …, (x_n, y_n)} ⊆ X × Y

 Where,
 D is the dataset
 x_i is the input (feature) vector of the i-th sample/record/instance
 X is the d-dimensional feature space (ℝ^d)
 Y is the label space
 Any categorical attribute can be converted to a numerical representation.

 The data points are drawn from an unknown distribution P:
(x_i, y_i) ~ P(x, y)
If we don't know the distribution, let's approximate it using the samples we gathered!

 We want to learn a function h ∈ H such that, for a new instance (x, y) ~ P,
h(x) = y with high probability, or at least h(x) ≈ y.
The new instance also has to come from the same distribution as the x_i.
In plain words: don't train on dogs and ask for predictions on cats.
31
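A minimal sketch of this setup as code, assuming numpy (the rows reuse the earlier heart-disease table, and mapping {No, Yes} to {0, 1} is one illustrative encoding):

```python
# D = {(x_i, y_i)} as arrays: X holds feature vectors, y holds labels.
import numpy as np

X = np.array([[62, 70, 120, 80],    # x_1 in R^4 (d = 4)
              [72, 90, 110, 70],    # x_2
              [65, 120, 150, 90]])  # x_3
y_text = np.array(["No", "No", "Yes"])

# Categorical labels converted to a numerical representation, e.g. {0, 1}.
y = (y_text == "Yes").astype(int)
print(X.shape, y)  # (3, 4) [0 0 1]
```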
Training and Testing: Formally

Training: the learner (Machine Learning) is given training data x_1, x_2, …, x_n together with
their labels/ground truth y_1, y_2, …, y_n, and produces a model h.

Testing: the model h is then used like a traditional CS program: given new test data x ~ P,
it outputs a prediction h(x).

h(x) = y  (Ideal)
h(x) ≈ y  (Plausible)
32
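A sketch of this loop end to end, assuming scikit-learn and made-up data drawn from a single distribution P:

```python
# Training and testing, formally: fit h on (X_train, y_train),
# then check h(x) ≈ y on held-out data (assumes scikit-learn).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))            # made-up feature vectors
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # made-up ground truth

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)

h = DecisionTreeClassifier().fit(X_train, y_train)  # training
y_pred = h.predict(X_test)                          # testing: h(x)
print(accuracy_score(y_test, y_pred))               # how often h(x) = y
```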
Label Space
 Binary (binary classification)
 Sentiment: positive / negative
 Email: spam / ham
 Online transaction fraud: Yes / No
 Tumor: Malignant / Benign
 y ∈ {0, 1}
 y ∈ {−1, 1}

 Multi-class (multi-class classification)
 Sentiment: Positive / Negative / Neutral
 Emotion: Happy / Sad / Surprised / Angry / …
 Part-of-Speech tag: Noun / Verb / Adjective / Adverb / …
 y ∈ {0, 1, 2, …}

 Real-valued (regression)
 Temperature, height, age, length, weight, duration, price, …

33
Hypothesis Space
 The hypothesis h is sampled from a hypothesis space H:
h ∈ H,   H ∈ {H_D, H_R, H_SVM, H_DL, …}

 H can be thought of as containing families of hypotheses, which share sets of assumptions, like:
 Support Vector Machines: H_SVM ∈ {H_1, H_2, …}
 Decision Trees: H_D ∈ {H_1, H_2, …}
 Perceptrons: H_P ∈ {H_1, H_2, …}
 Neural Networks: H_NN ∈ {H_1, H_2, …}
 …

 Selecting the family H is done manually; selecting a particular h ∈ H_D is done automatically.

 For example, h ∈ H for H = decision trees:
 would be an instance of decision trees of a particular height, arity, thresholds, etc.
34
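As a concrete illustration (assuming scikit-learn; the hyperparameter values are made up), here are several members of the decision-tree family H_D that differ only in their structural assumptions:

```python
# Hypotheses within one family H_D (decision trees): same shared assumptions
# (axis-aligned splits), different heights and thresholds (assumes scikit-learn).
from sklearn.tree import DecisionTreeClassifier

h1 = DecisionTreeClassifier(max_depth=2)         # a shallow tree
h2 = DecisionTreeClassifier(max_depth=10)        # a deeper tree
h3 = DecisionTreeClassifier(min_samples_leaf=5)  # a different leaf threshold
```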
So, how do we choose our ℎ?
 Randomly?
 Exhaustively?

How do we evaluate 𝒉?

35
How to choose ℎ?
 Randomly?
 May not work well
 Like using a random program to solve your sorting problem!
 May work if H is constrained enough

 Exhaustively?
 Would be very slow!
 The space H is usually very large (if not infinite)

 H is usually chosen by ML Engineers (You!) based on their experience

 h ∈ H is estimated efficiently using various optimization techniques (math alert!)

Before moving on to finding h, let's first evaluate the labels.

36
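In practice, a constrained slice of H is searched efficiently rather than randomly or exhaustively. A toy sketch, assuming scikit-learn, where candidate decision trees of different depths are compared on held-out data:

```python
# Choosing h from H by evaluating candidates on held-out data
# (a toy sketch; assumes scikit-learn and made-up data).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = (X[:, 0] * X[:, 1] > 0).astype(int)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3)

best_h, best_score = None, -1.0
for depth in [1, 2, 4, 8, 16]:  # a small, constrained slice of H
    h = DecisionTreeClassifier(max_depth=depth).fit(X_tr, y_tr)
    score = h.score(X_val, y_val)  # accuracy on held-out data
    if score > best_score:
        best_h, best_score = h, score
print(best_score)
```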
Book Reading
 Murphy – Chapter 1

37
References
 Murphy, Chapter 1
 Alpaydin, Chapter 1
 Tom Mitchell (TM), Chapter 1

 Lectures of Andrew Ng, Dr. Ali Raza, and "Machine Learning for Intelligent Systems (CS4780/CS5780)" by Kilian Weinberger

 This disclaimer should serve as adequate citation.

38
