CS 725: Foundations of Machine Learning: Lecture 2. Overview of Probability Theory for ML
June 2019
Probability is Quantification of Uncertainty
• We are trying to build systems that understand and (possibly) interact with the real world
• We often cannot prove that something is true, but we can still ask how likely different outcomes are, or ask for the most likely explanation
Random Variable and Sample Space
• A random variable X represents the outcome or the state of the world and
takes values from a sample space or domain
• Sample space: the space of all possible outcomes
1. Can be continuous (Example: how much it will rain tomorrow)
2. Or discrete (Example: the outcome of tossing a pair of coins; S = {HH, HT, TH, TT})
• Pr(x) is the probability mass (density) function
• Assigns a number to each point in sample space
• Non-negative, sums (integrates) to 1
• Intuitively: how often x occurs, or how much we believe in x
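As a minimal sketch (assuming a fair coin, so all four outcomes are equally likely), the pmf for the pair-of-coins example can be written out and checked against these requirements:

```python
# Sketch: pmf over the sample space of tossing a pair of fair coins.
S = ["HH", "HT", "TH", "TT"]
pmf = {outcome: 0.25 for outcome in S}  # assumes fair, independent coins

# Non-negative and sums to 1, as required of a pmf.
assert all(p >= 0 for p in pmf.values())
assert abs(sum(pmf.values()) - 1.0) < 1e-12
```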
Events
• An event E is a subset of the sample space S; its probability is Pr(E) = Σ_{x∈E} Pr(x)
A review of probability theory
• Note:
• Pr(S) = 1 and Pr(∅) = 0
• Pr(Eᶜ) = 1 − Pr(E), where Eᶜ = S \ E
• Pr(E₁ ∪ E₂) = Pr(E₁) + Pr(E₂) − Pr(E₁ ∩ E₂)
• If E₁, E₂, …, Eₙ are pairwise disjoint events, then

Pr(⋃_{i=1}^{n} Eᵢ) = Σ_{i=1}^{n} Pr(Eᵢ)
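A quick numeric check of these identities on the coin-pair sample space (a sketch; the events chosen are illustrative):

```python
# Events as subsets of the coin-pair sample space; uniform pmf, so Pr(E) = |E|/4.
S = {"HH", "HT", "TH", "TT"}

def pr(event):
    return len(event) / len(S)

E1 = {"HH", "HT"}  # first toss is heads
E2 = {"HH", "TH"}  # second toss is heads

# Inclusion-exclusion: Pr(E1 ∪ E2) = Pr(E1) + Pr(E2) − Pr(E1 ∩ E2)
assert pr(E1 | E2) == pr(E1) + pr(E2) - pr(E1 & E2)
# Complement rule: Pr(S \ E1) = 1 − Pr(E1)
assert pr(S - E1) == 1 - pr(E1)
# Additivity for disjoint events: {HH} and {TT} are disjoint
assert pr({"HH"} | {"TT"}) == pr({"HH"}) + pr({"TT"})
```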
Distribution Functions for discrete data
Continuous Distributions
Cumulative Distribution Function
Suppose X is a continuous random variable which takes values from the sample
space R, and has a pdf f . Its cdf is defined as F : R → [0, 1]:
F(a) = Pr(X ≤ a) = ∫_{−∞}^{a} f(x) dx

Note: the pdf of a continuous random variable can be obtained by differentiating its cdf:

f(a) = dF(x)/dx |_{x=a}
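These two relationships can be checked numerically; a sketch using scipy's standard normal as the example distribution (scipy assumed available):

```python
# Check F(a) = ∫_{-∞}^{a} f(x) dx and f(a) = dF/dx at x = a
# for the standard normal distribution.
from scipy.integrate import quad
from scipy.stats import norm

a, h = 0.7, 1e-5

# Integrating the pdf up to a recovers the cdf value F(a).
cdf_by_integration, _ = quad(norm.pdf, -float("inf"), a)
assert abs(cdf_by_integration - norm.cdf(a)) < 1e-7

# A central finite difference of the cdf recovers the pdf value f(a).
pdf_by_differentiation = (norm.cdf(a + h) - norm.cdf(a - h)) / (2 * h)
assert abs(pdf_by_differentiation - norm.pdf(a)) < 1e-6
```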
Multiple Random Variables
• For discrete X and Y with sample spaces S₁ and S₂, the joint distribution assigns a probability Pr((X, Y) = (x, y)) to every pair (x, y) ∈ S₁ × S₂
Multiple Random Variables (cont)
• For continuous:
If f(x, y) is a joint pdf, then

F(a, b) = Pr(X ≤ a, Y ≤ b) = ∫_{−∞}^{b} ∫_{−∞}^{a} f(x, y) dx dy

f(a, b) = ∂²F(x, y)/∂x∂y |_{(a,b)}
• Marginal distribution: for x ∈ S₁, Pr(X = x) = Σ_{y∈S₂} Pr((X, Y) = (x, y)). The marginal distribution of Y is defined similarly.
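A minimal sketch of marginalization for a discrete joint distribution stored as a table (the joint probabilities here are made up for illustration):

```python
import numpy as np

# Joint pmf Pr(X = x, Y = y): rows index x ∈ S1, columns index y ∈ S2.
joint = np.array([[0.10, 0.20],
                  [0.30, 0.40]])
assert np.isclose(joint.sum(), 1.0)

marginal_x = joint.sum(axis=1)  # Pr(X = x) = Σ_y Pr(X = x, Y = y) -> [0.3, 0.7]
marginal_y = joint.sum(axis=0)  # Pr(Y = y) = Σ_x Pr(X = x, Y = y) -> [0.4, 0.6]
print(marginal_x, marginal_y)
```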
Conditional Probability
The conditional probability of Y given X is Pr(Y | X) = Pr(X, Y)/Pr(X) (for Pr(X) > 0). Applying this identity twice yields Bayes' theorem:

Pr(Y | X) = Pr(X | Y) Pr(Y) / Pr(X)
Using Bayes’ Theorem
A lab test has a probability 0.95 of detecting a disease when applied to a person
suffering from said disease, and a probability 0.10 of giving a false positive when
applied to a non-sufferer. If 0.5% of the population are sufferers, what is the probability that a person who tests positive actually suffers from the disease?
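A sketch of the computation (variable names are illustrative): the overall positive-test probability comes from the law of total probability, and Bayes' theorem then gives the posterior.

```python
# Bayes' theorem applied to the lab-test example above.
p_pos_given_disease = 0.95   # sensitivity: Pr(+ | disease)
p_pos_given_healthy = 0.10   # false-positive rate: Pr(+ | no disease)
p_disease = 0.005            # prevalence: 0.5% of the population

# Law of total probability: Pr(+) = Pr(+|D) Pr(D) + Pr(+|not D) Pr(not D)
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)

# Bayes: Pr(D | +) = Pr(+ | D) Pr(D) / Pr(+)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(p_disease_given_pos)  # ≈ 0.0456: low, because the disease is rare
```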
Independence of Random Variables
• Random variables X and Y are independent if Pr(X = x, Y = y) = Pr(X = x) Pr(Y = y) for all x, y; equivalently, Pr(Y | X) = Pr(Y)
Expectation
If X is a random variable taking (say) real values (i.e., S ⊆ R), we can define an
“expected value” for X as:
E(X) = Σ_{x∈S} x · Pr(X = x)

For continuous:

E[X] = ∫_{−∞}^{∞} x f(x) dx
• Var[X] = E[X²] − (E[X])²
• Var[X + β] = Var[X] and Var[αX] = α² Var[X]
• If X₁, …, Xₙ are pairwise independent, then Var[Σᵢ Xᵢ] = Σᵢ Var[Xᵢ] (Proof: HW)
• If X₁, …, Xₙ are pairwise independent, each with variance σ², then Var[(1/n) Σᵢ Xᵢ] = σ²/n (see the simulation sketch below)
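A simulation sketch of the last property (the distribution, seed, and sizes are illustrative): the variance of the mean of n independent draws comes out close to σ²/n.

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials, sigma2 = 25, 200_000, 4.0

# Each row holds n i.i.d. draws with variance sigma2;
# each row mean is one sample of (1/n) Σ_i X_i.
draws = rng.normal(loc=0.0, scale=np.sqrt(sigma2), size=(trials, n))
sample_means = draws.mean(axis=1)

print(sample_means.var())  # ≈ sigma2 / n = 0.16
```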
Covariance
• Cov(X, Y) = E[(X − E[X])(Y − E[Y])] = E[XY] − E[X] E[Y]
• If X and Y are independent, then E[XY] = E[X] E[Y], so Cov(X, Y) = 0
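A sketch checking the independence case empirically: for independently drawn X and Y, E[XY] ≈ E[X] E[Y], so the sample covariance is near zero.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(size=1_000_000)  # X and Y drawn independently
y = rng.uniform(size=1_000_000)

cov = (x * y).mean() - x.mean() * y.mean()  # E[XY] − E[X] E[Y]
print(cov)  # ≈ 0, up to sampling noise
```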
Important Discrete Random Variables
Bernoulli(q): X ∈ {0, 1} with Pr(X = 1) = q
• E[X] = (1 − q) · 0 + q · 1 = q
• Var[X] = E[X²] − (E[X])² = q − q² = q(1 − q)

Binomial(n, q): X is the number of successes in n independent Bernoulli(q) trials
1. Pr[X = k] = C(n, k) q^k (1 − q)^{n−k}
2. E[X] = Σᵢ E[Yᵢ], where each Yᵢ is a Bernoulli(q) random variable ⇒ E[X] = nq
3. Var[X] = Σᵢ Var[Yᵢ] (since the Yᵢ are independent) ⇒ Var[X] = nq(1 − q) (see the sketch below)
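A sketch verifying the Binomial(n, q) mean and variance, both via scipy's closed forms and by summing simulated Bernoulli trials (n, q, and the seed are illustrative):

```python
import numpy as np
from scipy.stats import binom

n, q = 20, 0.3

# Closed-form mean and variance agree with nq and nq(1 − q).
mean, var = binom.stats(n, q, moments="mv")
assert np.isclose(mean, n * q) and np.isclose(var, n * q * (1 - q))

# Simulation: X = Σ_i Y_i with Y_i ~ Bernoulli(q).
rng = np.random.default_rng(2)
x = rng.binomial(1, q, size=(500_000, n)).sum(axis=1)
print(x.mean(), x.var())  # ≈ 6.0 and ≈ 4.2
```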
Normal (Gaussian) Distribution
X ∼ N(μ, σ²) with mean μ and variance σ² has pdf

φ_{μ,σ²}(x) = (1/√(2πσ²)) exp(−(x − μ)²/(2σ²))
1-D Gaussian distribution
[Figure: 1-D Gaussian pdfs φ_{μ,σ²}(x) for (μ = 0, σ² = 0.2), (μ = 0, σ² = 1.0), (μ = 0, σ² = 5.0), and (μ = −2, σ² = 0.5), plotted over x ∈ [−5, 5]. Source: https://upload.wikimedia.org/wikipedia/commons/7/74/Normal_Distribution_PDF.svg]
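A sketch that reproduces the curves in the figure by evaluating the Gaussian pdf at the four (μ, σ²) settings (matplotlib and scipy assumed available):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

x = np.linspace(-5, 5, 400)
for mu, sigma2 in [(0, 0.2), (0, 1.0), (0, 5.0), (-2, 0.5)]:
    # scipy parameterizes by standard deviation, hence the sqrt.
    plt.plot(x, norm.pdf(x, loc=mu, scale=np.sqrt(sigma2)),
             label=f"mu={mu}, var={sigma2}")
plt.xlabel("x")
plt.legend()
plt.show()
```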
2-D Gaussian distribution
[Figure: surface plot of a 2-D Gaussian density over x, y ∈ [−3, 3].]
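A sketch of sampling from a 2-D Gaussian like the one pictured (the mean and covariance are illustrative) and checking the sample statistics:

```python
import numpy as np

rng = np.random.default_rng(3)
mean = np.array([0.0, 0.0])
cov = np.array([[1.0, 0.5],
                [0.5, 1.0]])  # symmetric positive-definite covariance

samples = rng.multivariate_normal(mean, cov, size=200_000)
print(samples.mean(axis=0))           # ≈ mean
print(np.cov(samples, rowvar=False))  # ≈ cov
```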
Properties of Normal Distribution