Machine Learning and Pattern Recognition Notation
Textbooks, papers, code, and other courses will all use different names and notations for the
things covered in this course. While learning a subject, these differences can be confusing.
However, dealing with different notations is a necessary research skill. There will always be
differences in presentation in different sources, due to different trade-offs and priorities.
We try to make the notation fairly consistent within the course. This note lists some of the
conventions that we’ve chosen to follow that might be unfamiliar.
You can probably skip over this note at the start of the class. Most notation is introduced
in the notes as we use it. However, everything mentioned here is something that has been
queried by previous students of this class. So please refer back to this note if you find any
unfamiliar notation later on.
1 Vectors and matrices
Vectors are column vectors of numbers, written in bold lower-case, like x and w. If we need to show the contents of a column vector, we often create it from a row vector with the transpose ⊤, for example x = [x_1 x_2 x_3]⊤, to save vertical space in the notes. We use subscripts to index into a vector (or matrix).
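For concreteness, here is how these conventions map onto NumPy arrays (a sketch; the values are arbitrary, and code indexes from 0 rather than 1):

    import numpy as np

    # The column vector x = [x_1 x_2 x_3]^T written via a transpose:
    x_col = np.array([[1.0, 2.0, 3.0]]).T      # shape (3, 1)

    # In most code we just keep a 1-D array and remember that it stands
    # for a column vector:
    x = np.array([1.0, 2.0, 3.0])              # shape (3,)

    # Subscripts index into the vector; the maths x_1 is x[0] in code,
    # because NumPy counts from zero.
    x_1 = x[0]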
Depending on your background, you might be more familiar with writing vectors with an arrow over the letter than in bold.
In handwriting, we underline vectors (write x with a line underneath) because it’s difficult to handwrite in bold! While not everyone underlines vectors, and some authors use arrows, we recommend you write with the same notation in this class to help communication. But as long as it’s clear from context what you are doing, it’s not critical. We often forget to underline vectors when writing, so we understand.
Matrices (for us, rectangular arrays of numbers) are written as upper-case letters, like A and Σ. We’ve
chosen not to bold them, even though there are sometimes numbers represented by upper-
case letters floating around (such as D for number of dimensions). It should usually be clear
from context which quantities are matrices, and what size they are. See the maths cribsheet
for details on indexing matrices, and how sizes match in matrix-vector multiplication.
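As a sketch of the size rules in NumPy (the particular sizes are arbitrary):

    import numpy as np

    K, D = 2, 3
    A = np.ones((K, D))     # a K x D matrix
    x = np.ones(D)          # a D-dimensional vector

    y = A @ x               # (K x D) times (D) gives a K-dimensional vector
    assert y.shape == (K,)

    # Element A_{k,d} of the matrix is A[k-1, d-1] in zero-based code.
    a_12 = A[0, 1]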
Addition: sizes of vectors or matrices should match when adding quantities: a + b, or A + B.
As an exception, to add a scalar c to every element, we’ll just write a + c or A + c.
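In NumPy this shorthand corresponds to broadcasting a scalar against an array; a minimal sketch (array contents chosen arbitrarily):

    import numpy as np

    A = np.arange(6.0).reshape(2, 3)
    c = 10.0

    B = A + c    # the scalar c is added to every element of A
    assert B.shape == A.shape and B[1, 2] == A[1, 2] + c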
Indexing items: Sometimes we use superscripts to identify items, such as x^(n) for the nth D-dimensional input vector in a training set with N examples. We can (and often do) stack these vectors into an N × D matrix X, so we could use a notation such as X_{n,:} to access the nth row. In this case we chose to introduce the extra superscript notation instead.
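In NumPy, with the usual zero-based indexing, accessing the nth row looks like this sketch (shapes and values are illustrative):

    import numpy as np

    N, D = 5, 3
    rng = np.random.default_rng(0)
    X = rng.standard_normal((N, D))   # N x D matrix of stacked input vectors

    n = 2
    x_n = X[n, :]                     # the maths x^(n) (row X_{n,:}), zero-based here
    assert x_n.shape == (D,)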
2 Probabilities
The probability mass of a discrete outcome x is written P(x).
When it doesn’t seem necessary (nearly always) we don’t introduce notation for a corresponding random variable X, or write more explicit expressions like P_X(x) or P(X = x).
Notation is a trade-off, and more explicit notation can be more of a burden to work with.
Joint probabilities: P(x, y). Conditional probabilities: P(x | y). Conditional probabilities are sometimes written in the literature as P(x; y) — especially in frequentist statistics rather than Bayesian statistics. The ‘|’ symbol, introduced by Jeffreys, is historically associated with Bayesian reasoning. Hence for arbitrary functions, like f(x; w), where we want to emphasize that it’s primarily a function of x controlled by parameters w, we’ve chosen not to use a ‘|’.
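To make the joint and conditional notation concrete, here is a small NumPy sketch with a made-up 2×2 table of joint probabilities (the numbers are arbitrary):

    import numpy as np

    # Joint probabilities P(x, y) for x in {0, 1} (rows), y in {0, 1} (columns).
    P_xy = np.array([[0.1, 0.2],
                     [0.3, 0.4]])
    assert np.isclose(P_xy.sum(), 1.0)

    # Marginal P(y), and conditionals P(x | y) = P(x, y) / P(y).
    P_y = P_xy.sum(axis=0)
    P_x_given_y = P_xy / P_y                          # column j holds P(x | y = j)
    assert np.allclose(P_x_given_y.sum(axis=0), 1.0)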
The probability density of a real-valued outcome x is written with a lower-case p(x), such that P(a < X < b) = ∫_a^b p(x) dx. We tend not to introduce new symbols for density functions over different variables, but again overload the notation: we call them all “p” and infer which density we are talking about from the argument.
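As a numerical illustration (the standard normal density is chosen purely as an example):

    import numpy as np

    def p(x):
        # standard normal density, as an example p(x)
        return np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)

    a, b, dx = -1.0, 1.0, 1e-3
    xx = np.arange(a, b, dx) + dx / 2      # midpoints of small intervals
    prob = np.sum(p(xx)) * dx              # P(a < X < b), about 0.683 here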
Gaussian distributions are reviewed later in the notes. We will write that an outcome x was sampled from a Gaussian or normal distribution using x ∼ N(µ, Σ). We write the probability density associated with that outcome as N(x; µ, Σ). We could also have chosen to write N(x | µ, Σ), as Bishop and Murphy do. The ‘;’ was force of habit, because the Gaussian (outside of this course) is used in many contexts, and not just Bayesian reasoning.
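In code, sampling and density evaluation might look like the following sketch (using NumPy and SciPy; µ and Σ are made-up values):

    import numpy as np
    from scipy.stats import multivariate_normal

    mu = np.array([0.0, 1.0])
    Sigma = np.array([[1.0, 0.3],
                      [0.3, 2.0]])

    rng = np.random.default_rng(0)
    x = rng.multivariate_normal(mu, Sigma)                     # x ~ N(mu, Sigma)
    density = multivariate_normal.pdf(x, mean=mu, cov=Sigma)   # N(x; mu, Sigma)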
3 Expectations
The expectation (or probabilistic average) of a function f under a distribution p is written as an integral:

    E_{p(x)}[f(x)] = ∫ f(x) p(x) dx.

This is a definite integral over the whole range of the variable x. We might have written ∫_{−∞}^{∞} . . . or ∫_X . . ., but because our integrals are always over the whole range of the variable, we don’t bother to specify the limits.
The expectation notation is often quicker to work with than writing out the integral. As
above, we sometimes don’t specify the distribution (especially when handwriting), if it can
be inferred from context.
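For instance, with p(x) a standard Gaussian and f(x) = x² (choices made purely for illustration), the expectation can be estimated by Monte Carlo sampling or by numerical integration:

    import numpy as np

    rng = np.random.default_rng(0)

    def f(x):
        return x**2

    # Monte Carlo: average f over samples drawn from p(x).
    samples = rng.standard_normal(100_000)
    mc_estimate = np.mean(f(samples))            # close to 1.0

    # Direct numerical integration of f(x) p(x) over the real line.
    dx = 1e-3
    xx = np.arange(-10.0, 10.0, dx) + dx / 2
    p = np.exp(-0.5 * xx**2) / np.sqrt(2.0 * np.pi)
    integral = np.sum(f(xx) * p) * dx            # also close to 1.0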
Please do review the background note on expectations and sums of random variables.
Throughout the course we will see generalizations of those results to real-valued variables
(as above) and expressions with matrices and vectors. You need to have worked through the
basics.
You may have seen multiple dimensional integrals written with multiple integral signs, for
example for a 3-dimensional vector:
∫∫∫ f(x) dx_1 dx_2 dx_3.    (3)
Our integrals are often over high-dimensional vectors, so rather than writing potentially
hundreds of integral signs, we simply write:
∫ f(x) dx.    (4)
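The same Monte Carlo idea works whatever the dimension, which is one reason the compact notation is convenient. A small sketch, taking p(x) to be a 3-dimensional standard Gaussian and f(x) = ∥x∥² purely for illustration:

    import numpy as np

    rng = np.random.default_rng(0)
    D = 3

    def f(x):
        return np.sum(x**2, axis=-1)       # squared length of each vector

    X = rng.standard_normal((100_000, D))  # samples from a D-dimensional standard Gaussian
    estimate = np.mean(f(X))               # estimates the D-dimensional integral; about 3.0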
4 Derivatives
Partial derivative of a scalar with respect to another scalar: ∂f/∂x. Example: ∂ sin(yx)/∂x = y cos(yx).
Column vector of partial derivatives: ∇_w f = [∂f/∂w_1  ∂f/∂w_2  · · ·  ∂f/∂w_D]⊤.
These notes avoid writing derivatives involving vectors as ∂y/∂z. Usually this expression would be a matrix with (∂y/∂z)_{ij} = ∂y_i/∂z_j. Under this common convention, the derivative of a scalar f with respect to a column vector w, written ∂f/∂w, would be a 1 × D row vector, whereas ∇_w f above is a column vector.
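As a sanity check on this notation (an illustrative sketch, with the function f below chosen arbitrarily), both the scalar example and a gradient vector can be verified numerically with finite differences:

    import numpy as np

    # d sin(yx) / dx = y cos(yx): check at one point with a central difference.
    x, y, eps = 0.7, 2.0, 1e-6
    numeric = (np.sin(y * (x + eps)) - np.sin(y * (x - eps))) / (2 * eps)
    assert np.isclose(numeric, y * np.cos(y * x))

    # grad_w f: one partial derivative per element of w.
    def f(w):
        return np.sum(np.sin(w))          # an arbitrary scalar function of a vector

    w = np.array([0.1, 0.5, -0.3])
    grad = np.empty_like(w)
    for d in range(w.size):
        w_plus, w_minus = w.copy(), w.copy()
        w_plus[d] += eps
        w_minus[d] -= eps
        grad[d] = (f(w_plus) - f(w_minus)) / (2 * eps)
    assert np.allclose(grad, np.cos(w))   # analytic gradient of sum(sin(w))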
Later in the course we will review a notation more suitable for computing derivatives, where
derivative quantities are stored in arrays of the same size and shape as the original variables.
All will be explained in the note on Backpropagation of Derivatives.