
Machine Learning – COMS3007

Introduction to Machine Learning
Benjamin Rosman
Benjamin.Rosman1@wits.ac.za

"Write a rap about purple ducks robbing a bank."
[Title-slide images generated with ChatGPT / Midjourney]
Course Details
• Lecturer:
• Prof. Benjamin Rosman
• Contact details:
• The course will be run through the Moodle page
• Email: Benjamin.Rosman1@wits.ac.za
• Lecture Venues and Times:
• Lectures will be every Friday from 10h15-12h00 in FNB35.
• Tutorials and Labs:
• Tuts/labs are on Tuesday at 14h15-16h00 in the ground
floor MSL labs. There will be weekly hand-ins, and
occasional quizzes.
Course Outline
1. Introduction to Machine Learning
2. Naïve Bayes and Probability
3. Decision Trees
4. Linear Regression
5. Logistic Regression
6. Neural Networks
7. K-means Clustering
8. Practical Application of ML Methods
9. Principal Components Analysis
10. Reinforcement Learning
Assessments
• Labs: 10%

• Tests/quizzes: 20%

• Assignments: 30%

• Exam: 40%
Textbooks
There is no prescribed textbook. We will loosely be following these
textbooks:
• Machine Learning, Tom M. Mitchell, McGraw-Hill, 1997
• Pattern Recognition and Machine Learning, Christopher M. Bishop,
Springer, 2006.
• Reinforcement Learning: An Introduction, Richard S. Sutton and Andrew
G. Barto, The MIT Press, 1998.
There are many texts on Machine Learning in the library and on the web
that have information on the above topics.
You can also look up the Coursera course “Machine Learning” by Andrew
Ng (Stanford University) which follows similar content.
What is expected of you?
• You can collaborate unless otherwise stated, but
make sure you acknowledge!
• Labs and the assignment will be done in groups
• Programming in Python will be expected but not
taught
• All labs and assignments require programming
• I advise working in groups a lot: you will learn from
each other
• If you use a tool such as ChatGPT, please
acknowledge when you did, and list the prompts
that you used.
Mathematics
This is a maths-heavy course!

You should have familiarity with linear algebra,
calculus, and statistics.

I will explain what is needed as we go along, but you
may need to brush up on some of your old notes, or
look it up online.
Now, onto the fun stuff…
Example: Hand-Written Digits
• Write a program to automatically classify these
Example: Faces
• Write a program to automatically find faces
What is machine learning?
(patterns)
• "the automatic discovery of regularities in data
through the use of computer algorithms and with the
use of these regularities to take actions such as
classifying the data into different categories"
[Bishop, 2006]

• "Machine learning is the study of computer algorithms
that improve automatically through experience"
• "A computer program is said to learn from experience E
with respect to some task T and some performance
measure P, if its performance on T, as measured by P,
improves with experience E."
[Mitchell, 1997]
What is machine learning?
• Automating the process that has driven science for
centuries
• Collect data, hypothesise relationships (models),
experiment to test hypotheses
• E.g.
• Kepler’s laws of planetary motion
• Newton’s laws of motion
• …
What is machine learning?
• In many of today’s problems it is
• Very hard to write a correct program
• Heuristics, rules, special cases
• E.g. based on character strokes, topology, etc.
• But very easy to collect examples
• Text, emails, images, sales, …
• Idea behind machine learning:
• From the examples, generate the program (function)
• This is easier with many examples
• "Many" is defined relative to the size of the data and the
complexity of the model
Terms
• Artificial Intelligence (AI) is the field which studies
how to create computers and computer software that
are capable of intelligent behaviour.
• Machine Learning explores the study and
construction of algorithms that can learn from and
make predictions on data.
• Data Science is an interdisciplinary field about
processes and systems to extract knowledge or
insights from large volumes of data.
• Big Data is a broad term for data sets so large or
complex that traditional data processing applications
are inadequate. Challenges include analysis,
capture, data curation, search, sharing, storage,
transfer, visualization, and information privacy.
ML, Stats and Data Science
Many parts of machine learning are similar to ideas from statistics
Emphasis is typically different
• Focus on prediction in machine learning vs interpretation of
the model in statistics
ML often refers to tasks associated with artificial intelligence (AI)
• e.g. recognition, diagnosis, planning, robot control,
prediction, text and image generation, etc.
Goals can be autonomous machine performance, or enabling
humans to learn from data (data mining)
Data science: typically refers to entire pipeline
• From data acquisition and storage to presenting results
• ML is typically the modelling and analysis phase
Typical (supervised) learning problem

Training examples (known labels) → Machine learning algorithm → Prediction rule
New example (unknown label) → Prediction rule → Prediction
Hand-Written Digits

What is the ML problem here?

Learn a function y = f(x):
• Mapping from data (x = image) to a class (y = number label)
• Think of this as a program
• E.g. f(image of an "8") = "8"
How is this done?
• The function f is some model of the digits
• We provide the general form
• We present the model with a set of N images {x1, x2, …, xN} where the
true solutions {y1, y2, …, yN} are known
• This training tweaks the parameters of the model
• During testing, provide a new x' to determine y' = f(x')
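
As a concrete (not course-prescribed) sketch, the whole train-then-test loop might look like this in Python, using scikit-learn's bundled digits dataset and logistic regression as one possible general form for f:

```python
# Minimal sketch: learn y = f(x) for hand-written digits.
# Assumes scikit-learn is available; the course does not prescribe a library.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

digits = load_digits()                      # x: 8x8 pixel images, y: labels 0-9
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=0)

model = LogisticRegression(max_iter=5000)   # one possible general form for f
model.fit(X_train, y_train)                 # training tweaks the parameters
print(model.predict(X_test[:1]))            # y' = f(x') for a new image x'
print("test accuracy:", model.score(X_test, y_test))
```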
Formally
Given some data (input-output pairs):
• x = training input, y = training output
• D = {(x1, y1), (x2, y2), …, (xN, yN)}

We need a model:
• Modelling assumptions (or knowledge about the problem) go here!
• y = f(x; θ)
• θ = parameters of model

Now, the learning task is to find the "best" model parameters θ:
• Usually measure some error between y and f(x; θ)
• Treated as an optimisation problem
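
A minimal sketch of this optimisation view, assuming a linear model f(x; θ) and squared error, fitted by gradient descent (the data, model form, and learning rate here are all illustrative assumptions):

```python
import numpy as np

# Toy data D = {(x_i, y_i)} generated from y = 2x + 1 plus noise (assumed).
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 50)
y = 2 * x + 1 + 0.1 * rng.normal(size=50)

theta = np.zeros(2)                         # θ = (intercept, slope)
for _ in range(2000):                       # gradient descent on squared error
    pred = theta[0] + theta[1] * x          # f(x; θ)
    error = pred - y
    grad = np.array([error.mean(), (error * x).mean()])
    theta -= 0.5 * grad                     # learning rate 0.5 (an assumption)
print("learned θ:", theta)                  # should be close to (1, 2)
```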
An example
• Given (features):
• Animal size
• Level of domestication
• Predict (label):
• Cat or dog?
An example
Training
y = label = {"dog", "cat"}
x1, x2 = features of the data (2D)

[Figure: training points plotted in the (x1, x2) feature space. The model
here is a straight line separating the two classes: a linear discriminant.
The parameters of the model define this line.]

Querying
We want the model to generalise to any point in this
space that we haven't seen yet.

[Figure: a new, unlabelled query point "?" plotted in the same space.]
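
A small sketch of such a linear discriminant, with made-up size/domestication features (the data and the choice of logistic regression are assumptions for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up training set (an assumption): columns are [size, domestication].
X = np.array([[0.2, 0.9], [0.3, 0.8], [0.25, 0.95],   # cats: small, tame
              [0.7, 0.6], [0.8, 0.5], [0.9, 0.7]])    # dogs: larger
y = np.array(["cat", "cat", "cat", "dog", "dog", "dog"])

clf = LogisticRegression().fit(X, y)    # learns a linear discriminant
print(clf.coef_, clf.intercept_)        # these parameters define the line
print(clf.predict([[0.5, 0.7]]))        # query a point never seen in training
```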
Categories of ML
• Supervised learning (Make predictions!)
• Predict output y when given input x
• Learn from labelled data: {(xi, yi)}
• Classification: y categorical
• Regression: y real-valued
• Unsupervised learning (Understand data!)
• Learn from unlabelled data: {xi}
• Clustering
• Learning some structure in the data
• Semi-supervised learning (Combine the above!)
• Only some labels provided
• Reinforcement learning (Learn to act/make decisions!)
• Learn from rewards (typically delayed)
• Generate own data (experience) through interacting with an environment
Common ML Problems

Regression:
• Predict a continuous output y, given some input x
• Training: examples of (x, y) pairs
• Supervised learning

Classification:
• Predict a discrete class output y, given some input x
• Training: examples of (x, y) pairs
• Supervised learning

Clustering:
• Given input data x, return discrete cluster membership
• No training clusters are given, so the returned clusters may not mean anything
• Unsupervised learning

NB: the labels (S, M, L) added to this t-shirt example were the result of
someone looking at the clustering results and interpreting them!

Note: inputs are usually multidimensional
Reinforcement Learning (Learn to act!)

Given an unknown dynamic process (environment),
and an unknown reward process (goal), learn to take
a sequence of actions to maximise reward.
Learn a behaviour by the rewards received
• Many attempts
• Trial and error

[Figure: environment with rewards, e.g. +10 at the goal and -100 at a trap]

Examples:
• Learn to fly a helicopter
• Learn to make coffee
• Learn to play chess
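
A minimal tabular Q-learning sketch on a toy chain environment; the environment, rewards, and hyperparameters are all assumptions for illustration:

```python
import numpy as np

# Toy chain environment (an assumption): states 0..4, actions 0=left, 1=right.
# Reaching state 4 gives reward +10; every other step costs -1.
N_STATES, GOAL = 5, 4
rng = np.random.default_rng(0)
Q = np.zeros((N_STATES, 2))                 # value of each (state, action)

for episode in range(500):                  # many attempts: trial and error
    s = 0
    while s != GOAL:
        # Mostly act greedily, sometimes explore a random action.
        a = rng.integers(2) if rng.random() < 0.1 else int(Q[s].argmax())
        s_next = max(s - 1, 0) if a == 0 else min(s + 1, N_STATES - 1)
        r = 10 if s_next == GOAL else -1    # reward is delayed until the goal
        # Q-learning update: learn from reward plus estimated future value.
        Q[s, a] += 0.1 * (r + 0.9 * Q[s_next].max() - Q[s, a])
        s = s_next

print(Q.argmax(axis=1))   # learned behaviour: move right in every non-goal state
```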
Generalising
During training, only a small fraction of possible data will
be provided.

Need the resulting f to generalise to (ideally) all cases
• This is what we really care about: we are unlikely to only ever
see the training data!

Beware of over-/under-fitting!
• An overfitted model performs well on training data
• Poor generalisation!

The more data the better!
Generalising
Want a model that is expressive, but not too general
for the amount of data you have

[Figure note: the ground truth (green line) is unknown
to the algorithm]
Bias-variance trade-off

• Some models are said to have a high bias: there is
error from erroneous assumptions in the model. This
causes underfitting.
• Other models are said to have a high variance: there is
error from small fluctuations in the data. This causes
overfitting.

The bias-variance trade-off involves balancing these two
factors: both are different sources of error that affect
generalisation.
Representations
How the data is represented is fundamental!
• Determines if problem can be solved with chosen model
• Or can be solved at all!

• E.g. identify an animal as elephant vs dog with:
1. Features: mass, height
2. Features: number of legs, number of ears
• The first feature set separates the classes; the second
(four legs, two ears for both) does not!
Representations
Represent data using features
• Low level: pixels, characters, …
• High level: objects, words, regions, …

Trade-off between
• Expressive: accurately capture distinctions in data
• Sparse: not need prohibitive amounts of data

One of the hardest and most important parts of ML!


Curse of Dimensionality
• As the dimensionality of the model or feature space grows,
we may need exponentially more data

[Figure: the same 20 data points plotted in spaces of increasing
dimension. How sparse are these 20 data points in each case?]
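
A quick way to see this sparsity, under assumed settings (20 random points, 5 bins per axis): count how many cells of the space contain at least one point as the dimension grows.

```python
import numpy as np

# 20 random points in d dimensions; divide each axis into 5 bins and
# count occupied cells out of 5**d. Coverage collapses as d grows.
rng = np.random.default_rng(0)
for d in (1, 2, 3):
    points = rng.uniform(0, 1, (20, d))
    cells = {tuple((p * 5).astype(int)) for p in points}
    print(f"d={d}: {len(cells)} of {5**d} cells occupied")
```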


Handling representations
• Feature extraction
• Manual pre-processing

• Feature selection
• Autonomously identify important dimensions

• Feature learning
• Combine simpler features into more complex ones
• E.g. deep learning (when we talk about neural networks)
Data
For any ML algorithm to work, we need data, and
more is always better. In ML, we “let the data do the
talking”.
Much work goes into collecting data sets. For large
models (many parameters), we may need many
millions of examples to learn a good model.
But, how do we know how well
the model will generalise?
Protip: Never trust people
that mess this up!
Splitting the Data
Typically divide the full data set into three (a sketch of the split follows below):
• Training data: learn the model parameters
• This is the core learning part, and so it needs the most data
• +/- 60% of the data
• Validation data: learn the model hyperparameters
• Hyperparameters are values set before training begins, e.g. the
degree of the polynomial, the complexity of the neural network
• +/- 20% of the data
• Testing data: report quality of model
• This is used to report an unbiased evaluation of the final model
• +/- 20% of the data
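
A minimal sketch of such a split (the 60/20/20 percentages follow the slide; the dataset size is a stand-in):

```python
import numpy as np

# Shuffle indices, then carve out 60/20/20 train/validation/test splits.
rng = np.random.default_rng(0)
N = 1000                                     # stand-in dataset size
idx = rng.permutation(N)
train_idx = idx[:int(0.6 * N)]               # learn model parameters
val_idx = idx[int(0.6 * N):int(0.8 * N)]     # choose hyperparameters
test_idx = idx[int(0.8 * N):]                # report final, unbiased quality
print(len(train_idx), len(val_idx), len(test_idx))  # 600 200 200
```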
Why split the data?
This red model has a perfect fit to the blue
training points: so they will not give a reliable
estimate of how well the model will generalise.

Instead, we want to test it on new data points
that it has never seen during training. This
gives a better idea of its performance.

Similarly, we may be learning the hyperparameter of the degree of the model (M), by
training a straight line model (M=1), quadratic model (M=2), … up to M=9 and then
seeing which is best. We can train them all on the same training data, but we need to
use different validation data to choose the best one. Again, we can’t just report its
performance on that data, as it is already biased. So, we then need a different testing
set to report final scores.

The test data must not be touched until the very end! It is the “blind/surprise test”.
Example: Polynomial Curve Fitting
Simple regression (supervised learning) problem

[Figure: training data (noisy, i.e. a small amount of randomness is
added), plotted as target label t (= y) against a 1D feature x; the
true unknown function is sin(2πx)]

Goal: given a new x, predict t (target)
A Polynomial Function
Assume the function is polynomial:

y(x, w) = w0 + w1·x + w2·x² + … + wM·x^M

Note: this is a linear model
• Linear function of the coefficients w (the parameters)

Evaluate:
• Use an error function:

E(w) = ½ Σn (y(xn, w) − tn)²

• y(xn, w) is the predicted value at xn
• tn is the true value, so y(xn, w) − tn is the error between
predicted and true value
• Squared so it is symmetrical
• Summed over every data point
• ½ to make the maths simpler after differentiating

Learning:
• Find the weight vector w to minimise the error E(w)
• Unique solution w*: E(w) is quadratic in w, so E′(w) is linear in w

More on this example in the linear regression lecture.
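
A short sketch of this fit, assuming noisy samples of sin(2πx) as in the running example, using numpy's built-in least-squares polynomial fit to minimise E(w):

```python
import numpy as np

# Noisy samples of sin(2πx), as in the slide's running example (assumed).
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 10)
t = np.sin(2 * np.pi * x) + 0.2 * rng.normal(size=10)

M = 3                                   # polynomial order (a hyperparameter)
w_star = np.polyfit(x, t, M)            # least squares: minimises E(w)
E = 0.5 * np.sum((np.polyval(w_star, x) - t) ** 2)
print("w* =", w_star, " E(w*) =", E)
```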
Model Selection
Choosing M (polynomial order)
For M = 9, E(w*) = 0 (with 10 data points, a 9th-order polynomial can
pass through every point)! But the goal is to generalise!
Training vs Testing Error
Define the root-mean-square error to compare across different N:

E_RMS = sqrt(2 E(w*) / N)

Overfitting: high error on test data, low error on training data.

Training error is always better than test error. Why?
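
Extending the earlier sketch, one can compare training and test E_RMS across polynomial orders M (data generation assumed as before):

```python
import numpy as np

rng = np.random.default_rng(1)
def sample(n):                          # noisy sin(2πx) data, as assumed above
    x = rng.uniform(0, 1, n)
    return x, np.sin(2 * np.pi * x) + 0.2 * rng.normal(size=n)

def e_rms(w, x, t):                     # E_RMS = sqrt(2 E(w) / N)
    return np.sqrt(np.mean((np.polyval(w, x) - t) ** 2))

x_train, t_train = sample(10)
x_test, t_test = sample(100)
for M in (0, 1, 3, 9):                  # train error falls with M;
    w = np.polyfit(x_train, t_train, M) # test error rises again once M overfits
    print(f"M={M}: train {e_rms(w, x_train, t_train):.3f}, "
          f"test {e_rms(w, x_test, t_test):.3f}")
```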
Adding More Training Data

More data means:
• Less severe over-fitting
• A more complex model can be fit

There are other strategies to solve this problem (see
regularisation later).
Recap
• What is ML and why do we need it
• Example problems
• Supervised, unsupervised, reinforcement learning
• Generalisation, bias-variance trade-off
• Representations
• Curse of dimensionality
• Training, validation, testing data
• Curve-fitting example

Make sure you are comfortable with this all by next
week!
