CSC411 Tutorial #3

Cross-Validation and Decision Trees


February 3, 2016
Boris Ivanovic*
csc411ta@cs.toronto.edu

*Based on the tutorial given by Erin Grant, Ziyu Zhang, and Ali Punjani in previous years.
Outline for Today
• Cross-Validation
• Decision Trees
• Questions
Cross-Validation
Cross-Validation: Why Validate?
So far: Learning as Optimization.
Goal: Optimize model complexity (for the task) while minimizing under/overfitting.

We want our model to generalize well without overfitting.
We can ensure this by validating the model.
Types of Validation
Hold-Out Validation: Split the data into training and validation sets.
• Usually 30% as the hold-out set (a short code sketch follows).

[Figure: the original training set, split into a training portion and a held-out validation portion.]

Problems:
• Wastes part of the dataset (the held-out examples are never trained on)
• The error estimate can be misleading, since it depends on a single random split
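As a rough sketch of what hold-out validation looks like in code (the 70/30 split, toy data, and scikit-learn usage here are illustrative assumptions, not from the tutorial):

    # Hold-out validation sketch: split once, train on 70%, validate on 30%.
    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))              # toy features
    y = (X[:, 0] + X[:, 1] > 0).astype(int)    # toy labels

    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.3, random_state=0)   # 30% hold-out set

    model = DecisionTreeClassifier(max_depth=3).fit(X_train, y_train)
    print("hold-out accuracy:", model.score(X_val, y_val))

Note that the reported accuracy depends entirely on this one random split, which is exactly the misleading-estimate problem listed above.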
Types of Validation
• Cross-Validation: Random subsampling

[Figure: random subsampling into repeated train/validation splits, from Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer.]

Problem:
• More computationally expensive than hold-out validation.
Variants of Cross-Validation
Leave-p-out: Use p examples as the validation set, and the rest as training; repeat for all configurations of examples.

Problem:
• Exhaustive: we have to train and test (N choose p) times, where N is the # of training examples.
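A quick back-of-the-envelope check of how fast (N choose p) grows (the numbers are illustrative, not from the slides):

    # Number of train/test rounds required by leave-p-out.
    from math import comb

    for N, p in [(100, 1), (100, 2), (100, 5)]:
        print(f"N={N}, p={p}: {comb(N, p):,} rounds")
    # N=100, p=1: 100 rounds          (this special case is leave-one-out)
    # N=100, p=2: 4,950 rounds
    # N=100, p=5: 75,287,520 rounds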
Variants of Cross-Validation
K-fold: Partition the training data into K equally sized subsamples. For each fold, use the other K-1 subsamples as training data, with the remaining subsample as validation.
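A minimal sketch of K-fold cross-validation (the scikit-learn usage and toy data are assumptions for illustration):

    # 10-fold cross-validation of a decision tree.
    import numpy as np
    from sklearn.model_selection import KFold
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    y = (X[:, 0] > 0).astype(int)

    scores = []
    for train_idx, val_idx in KFold(n_splits=10, shuffle=True,
                                    random_state=0).split(X):
        model = DecisionTreeClassifier(max_depth=3)
        model.fit(X[train_idx], y[train_idx])          # train on K-1 folds
        scores.append(model.score(X[val_idx], y[val_idx]))  # validate on 1

    print("mean validation accuracy:", np.mean(scores))

Every example lands in the validation set exactly once, and in the training set K-1 times.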
K-fold Cross-Validation
• Think of it like leave-p-out, but without combinatoric amounts of training/testing.

Advantages:
• All observations are used for both training and validation. Each observation is used for validation exactly once.
• Non-exhaustive: more tractable than leave-p-out.
K-fold Cross-Validation
Problems:
• Expensive for large N, K (since we train/test K models on N examples).
  – But there are some efficient hacks to save time…
• Can still overfit if we validate too many models!
  – Solution: Hold out an additional test set before doing any model selection, and check that the best model performs well on this additional set (nested cross-validation). => Cross-Validception
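One way to realize this in code is the standard nested cross-validation pattern in scikit-learn; the estimator and parameter grid below are illustrative assumptions:

    # Nested CV: the inner loop (inside GridSearchCV) selects the model;
    # the outer loop estimates how well the selection procedure generalizes.
    import numpy as np
    from sklearn.model_selection import GridSearchCV, cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    y = (X[:, 0] > 0).astype(int)

    inner = GridSearchCV(DecisionTreeClassifier(),
                         param_grid={"max_depth": [1, 3, 5, 7]}, cv=5)
    outer_scores = cross_val_score(inner, X, y, cv=5)
    print("nested CV accuracy:", outer_scores.mean())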
Practical Tips for Using K-fold Cross-Val
Q: How many folds do we need?
A: With larger K, …
• Error estimation tends to be more accurate
• But, computation time will be greater

In practice:
• Usually use K ≈ 10
• BUT, for a larger dataset, choose a smaller K (each fold already contains plenty of data, and training K models gets expensive)
Questions about Validation
Decision Trees
Decision Trees: Definition
Goal: Approximate a discrete-valued target function
Representation: A tree, of which
• Each internal (non-leaf) node tests an attribute
• Each branch corresponds to an attribute value
• Each leaf node assigns a class

[Figure: an example decision tree, from Mitchell, T. (1997). Machine Learning. McGraw Hill.]
Decision Trees: Induction
The ID3 Algorithm (a runnable sketch follows below):

while ( training examples are not perfectly classified ) {
    choose the “most informative” attribute θ (that has not
    already been used) as the decision attribute for the
    next node N (greedy selection).

    foreach ( value (discrete θ) / range (continuous θ) )
        create a new descendant of N.

    sort the training examples to the descendants of N.
}
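Below is a minimal runnable sketch of ID3 for discrete attributes, written from the pseudocode above; the data representation (dicts of attribute values) and the helper names are assumptions for illustration, not the course's code.

    # ID3 sketch for discrete attributes.
    from collections import Counter
    from math import log2

    def entropy(labels):
        # H(S) = -sum_c p_c * log2(p_c) over the class proportions p_c.
        n = len(labels)
        return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

    def info_gain(examples, labels, attr):
        # IG = H(S) - sum_v (|S_v|/|S|) * H(S_v) when splitting on attr.
        n = len(labels)
        remainder = 0.0
        for v in set(ex[attr] for ex in examples):
            subset = [lab for ex, lab in zip(examples, labels)
                      if ex[attr] == v]
            remainder += (len(subset) / n) * entropy(subset)
        return entropy(labels) - remainder

    def id3(examples, labels, attrs):
        # Nodes are dicts {attribute: {value: subtree}}; leaves are labels.
        if len(set(labels)) == 1:      # perfectly classified: stop
            return labels[0]
        if not attrs:                  # no attributes left: majority vote
            return Counter(labels).most_common(1)[0][0]
        best = max(attrs, key=lambda a: info_gain(examples, labels, a))
        node = {best: {}}
        for v in set(ex[best] for ex in examples):  # one child per value
            sub = [(ex, lab) for ex, lab in zip(examples, labels)
                   if ex[best] == v]
            sub_ex, sub_lab = zip(*sub)
            node[best][v] = id3(list(sub_ex), list(sub_lab),
                                [a for a in attrs if a != best])
        return node

The greedy selection happens in the max(...) call: at each node the single attribute with the highest information gain is chosen, with no backtracking.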
Decision Trees: Example PlayTennis
After first splitting the training examples on Outlook…

• What should we choose as the next attribute under the branch Outlook = Sunny?
Choosing the “Most Informative” Attribute
Formulation: Maximize information gain over attributes Y:

IG(PlayTennis | Y) = H(PlayTennis) − H(PlayTennis | Y)
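To make the next three slides concrete, here is a self-contained numerical check using the five Outlook = Sunny examples from Mitchell's PlayTennis data (the code itself is an illustrative sketch):

    # Outlook = Sunny subset of the PlayTennis data (Mitchell, 1997).
    from collections import Counter
    from math import log2

    def H(labels):
        n = len(labels)
        return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

    sunny = [  # (Temp, Humidity, Wind, PlayTennis)
        ("Hot",  "High",   "Weak",   "No"),
        ("Hot",  "High",   "Strong", "No"),
        ("Mild", "High",   "Weak",   "No"),
        ("Cool", "Normal", "Weak",   "Yes"),
        ("Mild", "Normal", "Strong", "Yes"),
    ]
    labels = [row[3] for row in sunny]   # H(labels) ≈ 0.970

    for i, attr in [(1, "Humidity"), (0, "Temp"), (2, "Wind")]:
        remainder = sum(
            len([r[3] for r in sunny if r[i] == v]) / len(sunny)
            * H([r[3] for r in sunny if r[i] == v])
            for v in {row[i] for row in sunny}
        )
        print(attr, round(H(labels) - remainder, 3))
    # Prints ≈ 0.971, 0.571, 0.020; the slides truncate intermediate
    # values, giving 0.970, 0.570, and 0.019.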
Information Gain Computation #1

[Figure: the Sunny examples split by Humidity into the branches High and Normal.]

• IG( PlayTennis | Humidity ) = 0.970 − (3/5)(0.0) − (2/5)(0.0) = 0.970
Information Gain Computation #2

[Figure: the Sunny examples split by Temp into 3 branches, because Temp takes on 3 values!]

• IG( PlayTennis | Temp ) = 0.970 − (2/5)(0.0) − (2/5)(1.0) − (1/5)(0.0) = 0.570
Information Gain Computation #3

[Figure: the Sunny examples split by Wind into the branches Weak and Strong.]

• IG( PlayTennis | Wind ) = 0.970 − (2/5)(1.0) − (3/5)(0.918) = 0.019
The Decision Tree for PlayTennis
Questions about Decision Trees
Feedback (Please!)
boris.ivanovic@mail.utoronto.ca
• So… this was my first ever tutorial!
• I would really appreciate some feedback about my teaching style, pacing, material descriptions, etc.
• Let me know any way you can: tell me in person, tell Prof. Fidler, email me, etc.
• Good luck with A1!
