CSC411 Tutorial #3 Cross-Validation and Decision Trees: February 3, 2016 Boris Ivanovic csc411ta@cs.toronto.edu
*Based on the tutorial given by Erin Grant, Ziyu Zhang, and Ali Punjani in previous years.
Outline for Today
• Cross-Validation
• Decision Trees
• Questions
Cross-Validation
Cross-Validation: Why Validate?
So far:
Learning as Optimization
Goal: Optimize model complexity (for the task)
while minimizing under/overfitting
Validation
Problems:
• Wastes data: the held-out examples are never used for training
• The estimated error rate can be misleading if the held-out set happens to be unrepresentative
Types of Validation
• Cross-Validation: Random subsampling
Figure from Bishop, C.M. (2006). Pattern Recognition and Machine Learning. Springer.
Problem:
• More computationally expensive than hold-out validation.
Variants of Cross-Validation
Leave-p-out: Use p examples as the validation set and the rest as training; repeat for every way of choosing the p held-out examples.
Problem:
• Exhaustive: we have to train and test C(N, p) = "N choose p" times, where N is the number of training examples.
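A minimal sketch of leave-p-out splitting (my own illustration, not code from the tutorial), enumerating every way of holding out p examples:

```python
from itertools import combinations

def leave_p_out_splits(n_examples, p):
    """Yield (train_indices, val_indices) for every way of holding
    out p of the n_examples as the validation set."""
    all_indices = set(range(n_examples))
    for val in combinations(range(n_examples), p):
        yield sorted(all_indices - set(val)), list(val)

# N = 5, p = 2 gives C(5, 2) = 10 distinct train/validation splits,
# which is exactly why the method is exhaustive for large N.
splits = list(leave_p_out_splits(5, 2))
```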
Variants of Cross-Validation
K-fold: Partition the training data into K equally sized subsamples. For each fold, use the other K−1 subsamples as training data and the remaining subsample as validation.
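The partitioning step can be sketched as follows (an illustration under the definition above, not the tutorial's own code); each fold serves as the validation set exactly once:

```python
def k_fold_splits(n_examples, k):
    """Partition the indices 0..n_examples-1 into k roughly equal
    folds and yield (train_indices, val_indices), one pair per fold."""
    indices = list(range(n_examples))
    fold_size, remainder = divmod(n_examples, k)
    folds, start = [], 0
    for i in range(k):
        # Spread any remainder over the first few folds.
        end = start + fold_size + (1 if i < remainder else 0)
        folds.append(indices[start:end])
        start = end
    for i in range(k):
        # The other K-1 folds form the training set.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]

splits = list(k_fold_splits(10, 5))
```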
K-fold Cross-Validation
• Think of it like leave-p-out, but without a combinatorial number of training/testing runs.
Advantages:
• All observations are used for both training and
validation. Each observation is used for
validation exactly once.
• Non-exhaustive: more tractable than leave-p-out
K-fold Cross-Validation
Problems:
• Expensive for large N, K (since we train/test K
models on N examples).
– But there are some efficient hacks to save time…
• Can still overfit if we validate too many models!
– Solution: Hold out an additional test set before doing any model selection, and check that the best model performs well on this additional set (nested cross-validation). => Cross-Validception
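The "hold out a test set before model selection" idea can be sketched end to end. The two candidate "models" below (predict the training mean vs. the training median) are hypothetical toys of my own, chosen only so the example runs; the point is that the test set is split off first and consulted exactly once, after K-fold selection:

```python
import random
from statistics import mean, median

def mse(predict, data):
    """Mean squared error of a predictor on (x, y) pairs."""
    return mean((predict(x) - y) ** 2 for x, y in data)

def kfold_error(fit, data, k):
    """Average validation error of model-fitting function `fit`
    across k folds of `data`."""
    folds = [data[i::k] for i in range(k)]
    errors = []
    for i in range(k):
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        errors.append(mse(fit(train), folds[i]))
    return mean(errors)

# Two toy candidate models: always predict the training mean / median of y.
def fit_mean(train):
    c = mean(y for _, y in train)
    return lambda x: c

def fit_median(train):
    c = median(y for _, y in train)
    return lambda x: c

rng = random.Random(0)
data = [(x, x + rng.gauss(0, 1)) for x in range(50)]
rng.shuffle(data)
test, rest = data[:10], data[10:]        # test set held out BEFORE any selection

best_fit = min([fit_mean, fit_median], key=lambda f: kfold_error(f, rest, 10))
final_error = mse(best_fit(rest), test)  # the single look at the test set
```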
Practical Tips for Using K-fold Cross-Val
Q: How many folds do we need?
A: With larger K, …
• Error estimation tends to be more accurate
• But, computation time will be greater
In practice:
• Usually use K ≈ 10
• BUT, larger dataset => choose smaller K
Questions about Validation
Decision Trees
Decision Trees: Definition
Goal: Approximate a discrete-valued target function
Representation: A tree, of which
• Each internal (non-leaf) node tests an attribute
• Each branch corresponds to an attribute value
• Each leaf node assigns a class
[Figure: the entropy H(PlayTennis) and the conditional entropy H(PlayTennis | Y) for a candidate attribute split Y]
Information Gain Computation #1
The High branch holds 3 of the 5 examples and the Normal branch holds 2; both branches are pure, so their entropies are 0:
• IG( PlayTennis | Humidity ) = 0.970 − (3/5)·0.0 − (2/5)·0.0 = 0.970
Information Gain Computation #2
There are 3 subtracted terms because Temp takes on 3 values!
• IG( PlayTennis | Temp ) = 0.970 − (2/5)·0.0 − (2/5)·1.0 − (1/5)·0.0 = 0.570
Information Gain Computation #3
• IG( PlayTennis | Wind ) = 0.970 − (2/5)·1.0 − (3/5)·0.918 = 0.019
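The three computations above can be checked with a short script. I assume the five Outlook = Sunny rows of the standard PlayTennis table (the example these slides follow); note the slides round the shared entropy 0.9710 down to 0.970:

```python
from math import log2

def entropy(labels):
    """Entropy (in bits) of a list of class labels."""
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return -sum(p * log2(p) for p in probs)

def information_gain(examples, attribute, target="PlayTennis"):
    """IG(target | attribute): parent entropy minus the size-weighted
    entropies of the subsets induced by each attribute value."""
    labels = [ex[target] for ex in examples]
    gain = entropy(labels)
    for value in set(ex[attribute] for ex in examples):
        subset = [ex[target] for ex in examples if ex[attribute] == value]
        gain -= len(subset) / len(examples) * entropy(subset)
    return gain

# The five Outlook = Sunny examples of the classic PlayTennis data.
sunny = [
    {"Temp": "Hot",  "Humidity": "High",   "Wind": "Weak",   "PlayTennis": "No"},
    {"Temp": "Hot",  "Humidity": "High",   "Wind": "Strong", "PlayTennis": "No"},
    {"Temp": "Mild", "Humidity": "High",   "Wind": "Weak",   "PlayTennis": "No"},
    {"Temp": "Cool", "Humidity": "Normal", "Wind": "Weak",   "PlayTennis": "Yes"},
    {"Temp": "Mild", "Humidity": "Normal", "Wind": "Strong", "PlayTennis": "Yes"},
]
```

Humidity wins (gain 0.970 vs. 0.570 vs. 0.019), so it becomes the test at this node.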
The Decision Tree for PlayTennis
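The resulting tree matches the definition slide (internal nodes test attributes, branches are attribute values, leaves assign a class). A minimal sketch, using a nested-dict encoding of my own choosing rather than anything from the slides:

```python
# The PlayTennis tree: internal nodes map an attribute name to
# {value: subtree}; leaves are plain class labels (strings).
tree = {
    "Outlook": {
        "Sunny":    {"Humidity": {"High": "No", "Normal": "Yes"}},
        "Overcast": "Yes",
        "Rain":     {"Wind": {"Strong": "No", "Weak": "Yes"}},
    }
}

def classify(tree, example):
    """Walk from the root: test the node's attribute, follow the branch
    for the example's value, and return the label at the leaf."""
    while isinstance(tree, dict):
        attribute = next(iter(tree))
        tree = tree[attribute][example[attribute]]
    return tree
```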
Questions about Decision Trees
Feedback (Please!)
boris.ivanovic@mail.utoronto.ca
• So… This was my first ever tutorial!
• I would really appreciate some feedback
about my teaching style, pacing, material
descriptions, etc…
• Let me know any way you can: tell me in person, tell Prof. Fidler, email me, etc…
• Good luck with A1!