SML_Lecture2
Additional reading
Probably approximately correct learning
Probably Approximately Correct (PAC) learning framework
Pr (R(hS) ≤ ε) ≥ 1 − δ
for any distribution D, for arbitrary ε, δ > 0, and sample size m = |S|
that grows polynomially in 1/ε and 1/δ
for any concept c ∈ C
In addition, if A runs in time polynomial in m, 1/ε, and 1/δ, the class
is called efficiently PAC-learnable
Learned hypothesis is Probably (1 − δ) Approximately (ε) Correct.
Interpretation
Pr (R(hS) ≤ ε) ≥ 1 − δ
Example: learning axis-aligned rectangles
Learning setup
Example: learning axis-aligned rectangles
Example: learning axis-aligned rectangles
Questions:
Is the class of axis-aligned rectangles PAC-learnable?
How much training data will be needed to learn?
Need to show that outputting a bad hypothesis, one with R(R0) > ε,
happens with probability at most δ; equivalently, R(R0) ≤ ε holds with
high probability 1 − δ
We can assume PrD (R) > ε (otherwise R(R0) ≤ ε trivially, since R0 ⊆ R
and all errors fall inside R)
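The questions above can be explored empirically. Below is a minimal simulation sketch (an assumed setup, not code from the lecture): the learner returns the tightest axis-aligned rectangle R0 around the positive training points, and R(R0) is estimated by Monte Carlo on fresh samples from a uniform distribution D. The target rectangle and all constants are illustrative.

```python
import random

# Assumed target rectangle R: (x_min, x_max, y_min, y_max) on the unit square.
TARGET = (0.2, 0.8, 0.3, 0.9)

def label(point, rect=TARGET):
    """True iff the point lies inside the given rectangle."""
    x, y = point
    return rect[0] <= x <= rect[1] and rect[2] <= y <= rect[3]

def tightest_rectangle(points):
    """Learner: tightest axis-aligned rectangle around the positive points."""
    pos = [p for p in points if label(p)]
    if not pos:
        return None  # no positives seen: predict all-negative
    xs = [x for x, _ in pos]
    ys = [y for _, y in pos]
    return (min(xs), max(xs), min(ys), max(ys))

def estimated_error(rect, n_test=50_000, seed=1):
    """Monte Carlo estimate of R(R0) under the uniform distribution."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(n_test):
        p = (rng.random(), rng.random())
        predicted = rect is not None and label(p, rect)
        errors += predicted != label(p)
    return errors / n_test

rng = random.Random(0)
train = [(rng.random(), rng.random()) for _ in range(500)]
R0 = tightest_rectangle(train)
print(estimated_error(R0))  # small for a sample of this size
```

Since all positive training points lie inside R, the learned R0 is always contained in R, and its estimated error shrinks as the training sample grows.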
Example: learning axis-aligned rectangles
If R0 intersects all four regions r1 , . . . , r4 , then R(R0) ≤ ε
Thus, if R(R0) > ε then R0 must miss at least one of the four regions
Example: learning axis-aligned rectangles
Events A = {R0 intersects all four rectangles r1 , . . . , r4 } and
B = {R(R0) ≤ ε} satisfy A ⊆ B
Complement events AC = {R0 misses at least one rectangle} and
BC = {R(R0) > ε} satisfy BC ⊆ AC
BC is the bad event (high generalization error); we want it to have
low probability
In the probability space, we have Pr (BC) ≤ Pr (AC)
Our task is to upper bound Pr (AC)
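The quantitative step behind this plan can be sketched as follows, assuming (as in the standard version of this argument) that the regions r1, . . . , r4 are chosen so that each carries probability mass ε/4 under D. A union bound then gives

```latex
\Pr(A^C) \;\le\; \sum_{i=1}^{4} \Pr(R_0 \text{ misses } r_i)
        \;\le\; 4\left(1 - \tfrac{\epsilon}{4}\right)^{m}
        \;\le\; 4\exp(-m\epsilon/4),
```

since each of the m i.i.d. training points misses a fixed region of mass ε/4 with probability 1 − ε/4, and 1 − x ≤ e^{−x}.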
Example: learning axis-aligned rectangles
4 exp(−mε/4) ≤ δ
⇔ m ≥ (4/ε) log(4/δ)
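To get a feel for the numbers, the bound can be evaluated directly. A small sketch, assuming the natural logarithm (matching the exponential form it was derived from):

```python
import math

# Smallest integer m with 4*exp(-m*eps/4) <= delta,
# i.e. m >= (4/eps) * log(4/delta) with the natural log.
def rectangle_sample_size(eps, delta):
    return math.ceil((4.0 / eps) * math.log(4.0 / delta))

print(rectangle_sample_size(0.1, 0.05))   # eps = 0.1, delta = 0.05
print(rectangle_sample_size(0.01, 0.05))  # tightening eps by 10x scales m by 10x
```

Note that m grows only logarithmically in 1/δ but linearly in 1/ε.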
Plotting the behaviour of the bound
Plotting the behaviour of the bound
Generalization error bound vs. expected test error
Half-time poll: Personalized email spam filtering system
A company is developing a personalized email spam filtering system. The
system is tuned for each customer using that customer's data on top of
the existing training data. The company has a choice of three machine
learning algorithms with different performance characteristics. So far,
the company has tested the three algorithms on a small set of test users.
Which algorithm should the company choose?
1. Algorithm 1, which guarantees error rate of less than 10% for 99%
of the future customer base
2. Algorithm 2, which guarantees error rate of less than 5% for 90% of
the future customer base
3. Algorithm 3, which empirically has error rate of 1% on the current
user base
Example: Boolean conjunctions
Finite hypothesis class - consistent case
m ≥ (1/ε) (log(|H|) + log(1/δ))
An equivalent generalization error bound:
R(h) ≤ (1/m) (log(|H|) + log(1/δ))
Holds for any finite hypothesis class, assuming there is a consistent
hypothesis, i.e., one with zero empirical risk
The extra term compared to the rectangle learning example is
(1/ε) log(|H|)
The more hypotheses there are in H, the more training examples are
needed
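This dependence on |H| can be made concrete. A hedged sketch: plug a class size into the consistent-case bound m ≥ (1/ε)(log|H| + log(1/δ)); for conjunctions over n Boolean variables, one standard count is |H| = 3^n (each variable appears positive, negated, or not at all).

```python
import math

# Consistent-case sample complexity for a finite class, given log|H|.
def sample_size(log_H, eps, delta):
    return math.ceil((1.0 / eps) * (log_H + math.log(1.0 / delta)))

# Boolean conjunctions over n variables, assuming |H| = 3**n.
def conjunction_sample_size(n, eps, delta):
    return sample_size(n * math.log(3.0), eps, delta)

print(conjunction_sample_size(10, 0.1, 0.05))
```

Since log(3^n) = n log 3, the required sample size grows only linearly in the number of variables, even though |H| grows exponentially.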
Example: Boolean conjunctions
Plotting the bound for Aldo’s problem using boolean conjunctions
Arbitrary boolean formulae
Proof outline of the PAC bound for finite hypothesis classes
Proof outline (Mohri et al., 2018)
Proof outline
Proof outline
We have established that the probability of the bad event is at most
|H|(1 − ε)^m
Set the right-hand side equal to δ and solve for m to obtain the
bound:
m ≥ (1/ε) (log(|H|) + log(1/δ))
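Spelled out (a standard manipulation using 1 − ε ≤ e^{−ε}), the last step reads:

```latex
|\mathcal{H}|\,(1-\epsilon)^{m}
  \;\le\; |\mathcal{H}|\,e^{-m\epsilon}
  \;\le\; \delta
\quad\Longleftrightarrow\quad
m \;\ge\; \frac{1}{\epsilon}\left(\log|\mathcal{H}| + \log\frac{1}{\delta}\right),
```

with log denoting the natural logarithm, matching the bound stated earlier.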
Finite hypothesis class - inconsistent case
Summary