CSC411 Tutorial #3 Cross-Validation and Decision Trees: February 3, 2016 Boris Ivanovic csc411ta@cs.toronto.edu
*Based on the tutorial given by Erin Grant, Ziyu Zhang, and Ali Punjani in previous years.
Outline for Today
• Cross-Validation
• Decision Trees
• Questions
Cross-Validation
Cross-Validation: Why Validate?
So far:
Learning as Optimization
Goal: Optimize model complexity (for the task)
while minimizing under/overfitting
Validation
Problems:
• Wastes data: the held-out examples are never used for training
• The estimated error rate can be misleading if the held-out set happens to be unrepresentative
Types of Validation
• Cross-Validation: Random subsampling
Figure from Bishop, C.M. (2006). Pattern Recognition and Machine Learning. Springer.
Problem:
• More computationally expensive than hold-out validation.
Variants of Cross-Validation
Leave-p-out: Use p examples as the validation set and the rest as training; repeat for every way of choosing the p held-out examples.
Problem:
• Exhaustive: we have to train and test C(N, p) = "N choose p" times, where N is the number of training examples.
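A minimal sketch of leave-p-out splitting (my own illustration, not code from the tutorial), enumerating every way of holding out p examples:

```python
from itertools import combinations

def leave_p_out_splits(n_examples, p):
    """Yield (train_indices, val_indices) for every way of holding
    out p of the n_examples as the validation set."""
    all_indices = set(range(n_examples))
    for val in combinations(range(n_examples), p):
        yield sorted(all_indices - set(val)), list(val)

# N = 5, p = 2 gives C(5, 2) = 10 distinct train/validation splits,
# which is exactly why the method is exhaustive for large N.
splits = list(leave_p_out_splits(5, 2))
```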
Variants of Cross-Validation
K-fold: Partition the training data into K equally sized subsamples. For each fold, use the other K−1 subsamples as training data and the remaining subsample as validation.
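The partitioning step can be sketched as follows (an illustration under the definition above, not the tutorial's own code); each fold serves as the validation set exactly once:

```python
def k_fold_splits(n_examples, k):
    """Partition the indices 0..n_examples-1 into k roughly equal
    folds and yield (train_indices, val_indices), one pair per fold."""
    indices = list(range(n_examples))
    fold_size, remainder = divmod(n_examples, k)
    folds, start = [], 0
    for i in range(k):
        # Spread any remainder over the first few folds.
        end = start + fold_size + (1 if i < remainder else 0)
        folds.append(indices[start:end])
        start = end
    for i in range(k):
        # The other K-1 folds form the training set.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]

splits = list(k_fold_splits(10, 5))
```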
K-fold Cross-Validation
• Think of it like leave-p-out, but without a combinatorial number of training/testing runs.
Advantages:
• All observations are used for both training and
validation. Each observation is used for
validation exactly once.
• Non-exhaustive: more tractable than leave-p-out
K-fold Cross-Validation
Problems:
• Expensive for large N, K (since we train/test K
models on N examples).
– But there are some efficient hacks to save time…
• Can still overfit if we validate too many models!
– Solution: Hold out an additional test set before doing any model selection, and check that the best model performs well on this additional set (nested cross-validation). => Cross-Validception
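The "hold out a test set before model selection" idea can be sketched end to end. The two candidate "models" below (predict the training mean vs. the training median) are hypothetical toys of my own, chosen only so the example runs; the point is that the test set is split off first and consulted exactly once, after K-fold selection:

```python
import random
from statistics import mean, median

def mse(predict, data):
    """Mean squared error of a predictor on (x, y) pairs."""
    return mean((predict(x) - y) ** 2 for x, y in data)

def kfold_error(fit, data, k):
    """Average validation error of model-fitting function `fit`
    across k folds of `data`."""
    folds = [data[i::k] for i in range(k)]
    errors = []
    for i in range(k):
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        errors.append(mse(fit(train), folds[i]))
    return mean(errors)

# Two toy candidate models: always predict the training mean / median of y.
def fit_mean(train):
    c = mean(y for _, y in train)
    return lambda x: c

def fit_median(train):
    c = median(y for _, y in train)
    return lambda x: c

rng = random.Random(0)
data = [(x, x + rng.gauss(0, 1)) for x in range(50)]
rng.shuffle(data)
test, rest = data[:10], data[10:]        # test set held out BEFORE any selection

best_fit = min([fit_mean, fit_median], key=lambda f: kfold_error(f, rest, 10))
final_error = mse(best_fit(rest), test)  # the single look at the test set
```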
Practical Tips for Using K-fold Cross-Val
Q: How many folds do we need?
A: With larger K, …
• Error estimation tends to be more accurate
• But, computation time will be greater
In practice:
• Usually use K ≈ 10
• BUT, larger dataset => choose smaller K
Questions about Validation
Decision Trees
Decision Trees: Definition
Goal: Approximate a discrete-valued target function
Representation: A tree, of which
• Each internal (non-leaf) node tests an attribute
• Each branch corresponds to an attribute value
• Each leaf node assigns a class
[Figure: the entropy H(PlayTennis) and the conditional entropy H(PlayTennis | Y) for a candidate attribute split Y]
Information Gain Computation #1
The High branch holds 3 of the 5 examples and the Normal branch holds 2; both branches are pure, so their entropies are 0:
• IG( PlayTennis | Humidity ) = 0.970 − (3/5)·0.0 − (2/5)·0.0 = 0.970
Information Gain Computation #2
There are 3 subtracted terms because Temp takes on 3 values!
• IG( PlayTennis | Temp ) = 0.970 − (2/5)·0.0 − (2/5)·1.0 − (1/5)·0.0 = 0.570
Information Gain Computation #3
• IG( PlayTennis | Wind ) = 0.970 − (2/5)·1.0 − (3/5)·0.918 = 0.019
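The three computations above can be checked with a short script. I assume the five Outlook = Sunny rows of the standard PlayTennis table (the example these slides follow); note the slides round the shared entropy 0.9710 down to 0.970:

```python
from math import log2

def entropy(labels):
    """Entropy (in bits) of a list of class labels."""
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return -sum(p * log2(p) for p in probs)

def information_gain(examples, attribute, target="PlayTennis"):
    """IG(target | attribute): parent entropy minus the size-weighted
    entropies of the subsets induced by each attribute value."""
    labels = [ex[target] for ex in examples]
    gain = entropy(labels)
    for value in set(ex[attribute] for ex in examples):
        subset = [ex[target] for ex in examples if ex[attribute] == value]
        gain -= len(subset) / len(examples) * entropy(subset)
    return gain

# The five Outlook = Sunny examples of the classic PlayTennis data.
sunny = [
    {"Temp": "Hot",  "Humidity": "High",   "Wind": "Weak",   "PlayTennis": "No"},
    {"Temp": "Hot",  "Humidity": "High",   "Wind": "Strong", "PlayTennis": "No"},
    {"Temp": "Mild", "Humidity": "High",   "Wind": "Weak",   "PlayTennis": "No"},
    {"Temp": "Cool", "Humidity": "Normal", "Wind": "Weak",   "PlayTennis": "Yes"},
    {"Temp": "Mild", "Humidity": "Normal", "Wind": "Strong", "PlayTennis": "Yes"},
]
```

Humidity wins (gain 0.970 vs. 0.570 vs. 0.019), so it becomes the test at this node.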
The Decision Tree for PlayTennis
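The resulting tree matches the definition slide (internal nodes test attributes, branches are attribute values, leaves assign a class). A minimal sketch, using a nested-dict encoding of my own choosing rather than anything from the slides:

```python
# The PlayTennis tree: internal nodes map an attribute name to
# {value: subtree}; leaves are plain class labels (strings).
tree = {
    "Outlook": {
        "Sunny":    {"Humidity": {"High": "No", "Normal": "Yes"}},
        "Overcast": "Yes",
        "Rain":     {"Wind": {"Strong": "No", "Weak": "Yes"}},
    }
}

def classify(tree, example):
    """Walk from the root: test the node's attribute, follow the branch
    for the example's value, and return the label at the leaf."""
    while isinstance(tree, dict):
        attribute = next(iter(tree))
        tree = tree[attribute][example[attribute]]
    return tree
```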
Questions about Decision Trees
Feedback (Please!)
boris.ivanovic@mail.utoronto.ca
• So… This was my first ever tutorial!
• I would really appreciate some feedback
about my teaching style, pacing, material
descriptions, etc…
• Let me know any way you can: tell me in person, tell Prof. Fidler, email me, etc…
• Good luck with A1!