Validation Slides

The document discusses techniques for estimating the performance of machine learning models, including validation, resampling methods like cross validation and bootstrapping, and estimating bias and variance. Cross validation techniques like k-fold and leave-one-out are described, as well as how to use resampling to estimate a model's bias and variance. The text also recommends splitting data into separate training, validation, and test sets.


Machine Learning and Network Analysis (MA4207)
Performance Estimation
Pattern recognition techniques usually have one or more free parameters
◼ The number of neighbors in a kNN classifier
◼ The feature dimension in feature extraction or selection process
Model Selection: How do we select the “optimal” parameter(s) for a given
classification problem?
Validation: After model selection, how do we estimate its true error rate?
◼ The true error rate is the classifier’s error rate when tested on the ENTIRE POPULATION
◼ Choose the model that provides the lowest error rate on the entire population
Only a finite set of examples is available
◼ The training set is often smaller than required (e.g., biomedical data)
◼ Data collection and storage are expensive
Using the entire training data to both train and select the best classifier gives rise to two problems
◼ Overfitting
◼ The computed error rate is typically optimistic (apparent accuracy approaching 100% in some cases)
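As a concrete illustration of the second problem, the error rate measured on the data used for training is far too optimistic. A minimal sketch (assuming NumPy and scikit-learn are available; the synthetic dataset and the 1-NN classifier are illustrative stand-ins, not part of the slides):

```python
# Resubstitution error: train and evaluate on the same data.
# A 1-NN classifier memorizes the training set, so its apparent error is ~0%.
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
clf = KNeighborsClassifier(n_neighbors=1).fit(X, y)
print("error rate on the training data:", 1 - clf.score(X, y))  # ~0.0, wildly optimistic
```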
Performance Estimation

◼ Training
◼ Model selection
◼ Resampling Techniques
◼ Cross Validation
◼ Bootstrap
◼ Jackknife
The holdout method
Split dataset into two groups
◼ Training set: used to train the classifier
◼ Test set: used to estimate the error rate of the trained classifier
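A minimal sketch of the holdout split (scikit-learn assumed; the synthetic data and kNN classifier are placeholders, not part of the slides):

```python
# Holdout method: one split into a training set and a test set;
# the error rate on the held-out test set estimates the true error rate.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = KNeighborsClassifier(n_neighbors=3).fit(X_tr, y_tr)
holdout_error = 1 - clf.score(X_te, y_te)   # single train-and-test estimate
```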
The holdout method has two basic drawbacks
◼ For a sparse dataset, setting aside a portion of the data for testing may not be feasible
◼ With a single train-and-test experiment, the holdout estimate of the error rate depends on how the data happen to be split into training and test sets
These limitations of the holdout can be overcome with a family of resampling
methods at the expense of higher computational cost
◼ Cross validation
◼ Random subsampling
◼ K-fold cross-validation
◼ Leave-one-out cross-validation
◼ Bootstrap
◼ Jackknife
Random subsampling

Random subsampling performs K data splits of the entire dataset


◼ Each data split randomly selects a (fixed) number of examples without
replacement
◼ For each data split we retrain the classifier from scratch with the training
examples and then estimate 𝐸𝑖 with the test examples
◼ The true error estimate is obtained as the average of the separate estimates 𝐸𝑖
◼ This estimate is significantly better than the holdout estimate
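A sketch of random subsampling under the same illustrative setup (synthetic data and kNN are stand-ins, not from the slides):

```python
# Random subsampling: K independent random splits; within each split the test
# examples are drawn without replacement, and the error estimates E_i are averaged.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
rng = np.random.default_rng(0)
K, n_test = 20, 60                                   # number of splits, test size per split
errors = []
for _ in range(K):
    perm = rng.permutation(len(y))                   # one random split (no replacement)
    test_idx, train_idx = perm[:n_test], perm[n_test:]
    clf = KNeighborsClassifier(n_neighbors=3).fit(X[train_idx], y[train_idx])
    errors.append(1 - clf.score(X[test_idx], y[test_idx]))   # E_i
true_error_estimate = np.mean(errors)                # average of the separate E_i
```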
K-fold Cross Validation

Create a K-fold partition of the dataset


◼ For each of 𝐾 experiments, use 𝐾−1 folds for training and a different fold for
testing
◼ K-Fold cross validation is similar to random subsampling
◼ The advantage of K-fold CV is that all the examples in the dataset are eventually used for both training and testing
◼ As before, the true error is estimated as the average error rate on test examples
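The same idea sketched with scikit-learn's KFold (the dataset and classifier are again illustrative choices):

```python
# K-fold cross-validation: K experiments, each using K-1 folds for training
# and the remaining fold for testing; every example is tested exactly once.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
errors = []
for train_idx, test_idx in KFold(n_splits=10, shuffle=True, random_state=0).split(X):
    clf = KNeighborsClassifier(n_neighbors=3).fit(X[train_idx], y[train_idx])
    errors.append(1 - clf.score(X[test_idx], y[test_idx]))
kfold_error = np.mean(errors)                        # average error over the K folds
```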
Leave-one-out cross validation
◼ LOO is the degenerate case of K-fold CV, where K is chosen as the total number of examples
◼ For a dataset with 𝑁 examples, perform 𝑁 experiments
◼ For each experiment use 𝑁−1 examples for training and the remaining example
for testing
◼ As usual, the true error is estimated as the average error rate on test examples
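Leave-one-out is just the K = N special case; with scikit-learn it can be written as follows (illustrative setup as above):

```python
# Leave-one-out CV: N experiments, each testing on a single held-out example.
from sklearn.datasets import make_classification
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=100, n_features=10, random_state=0)
scores = cross_val_score(KNeighborsClassifier(n_neighbors=3), X, y, cv=LeaveOneOut())
loo_error = 1 - scores.mean()                        # average error over the N experiments
```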
Selecting the number of folds (K)
Large K
◼ The bias of the true error rate estimator will be small (the estimator will be very
accurate)
◼ The variance of the true error rate estimator will be large
◼ The computational time will be very large as well (many experiments)

Small K
◼ The number of experiments and, therefore, computation time are reduced
◼ The variance of the estimator will be small
◼ The bias of the estimator will be large (conservative or larger than the true error rate)

Choice for K depends on the size of the dataset


◼ For large datasets, even 3-fold cross validation will be quite accurate
◼ For very sparse datasets, we may have to use leave-one-out in order to train on as many
examples as possible

A commonly used value is K = 10


Bootstrap
Resampling technique with replacement
◼ Randomly select (with replacement) 𝑁 examples from the dataset of size 𝑁 and use this set for training
◼ The remaining examples that were not selected for training are used for testing
◼ This test set differs from fold to fold
◼ Repeat this process for a specified number of folds (𝐾)
◼ The true error is estimated as the average error rate on test data
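A sketch of bootstrap error estimation (synthetic data and kNN are illustrative; the examples never drawn into a bootstrap sample form the test set):

```python
# Bootstrap error estimation: train on N examples drawn with replacement,
# test on the examples that were not selected ("out-of-bag"), repeat K times.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
rng = np.random.default_rng(0)
N, K = len(y), 100
errors = []
for _ in range(K):
    boot = rng.integers(0, N, size=N)                # N indices drawn with replacement
    oob = np.setdiff1d(np.arange(N), boot)           # examples not selected -> test set
    clf = KNeighborsClassifier(n_neighbors=3).fit(X[boot], y[boot])
    errors.append(1 - clf.score(X[oob], y[oob]))
bootstrap_error = np.mean(errors)                    # average error over the K repetitions
```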
CV vs Bootstrap
Compared to basic CV, the bootstrap increases the variance
that can occur in each fold [Efron and Tibshirani, 1993]
◼ This is a desirable property, since it is a more realistic simulation of the real-life experiment from which our dataset was obtained
Consider a classification problem with 𝐶 classes, a total of 𝑁
examples and 𝑁𝑖 examples for each class 𝝎𝑖
◼ The a priori probability of choosing an example from class 𝜔i is 𝑁𝑖/𝑁
◼ Once we choose an example from class 𝜔i, if we do not replace it for the next selection, then the a priori probabilities will have changed, since the probability of choosing an example from class 𝜔i will now be (𝑁𝑖−1)/(𝑁−1)
◼ Thus, sampling with replacement preserves the a priori probabilities of the
classes throughout the random selection process *
◼ An additional benefit is that the bootstrap can provide accurate measures of
BOTH the bias and variance of the true error estimate

* Stratified CV is used to overcome the problem of maintaining similar class priors across folds
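A quick illustration of the footnote: scikit-learn's StratifiedKFold keeps the class priors roughly equal in every fold (the imbalanced synthetic dataset here is only an example):

```python
# Stratified K-fold preserves the class proportions N_i/N (approximately) in each fold.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold

X, y = make_classification(n_samples=300, weights=[0.8, 0.2], random_state=0)
for _, test_idx in StratifiedKFold(n_splits=5).split(X, y):
    print("fraction of class 1 in this fold:", np.mean(y[test_idx]))  # ~0.2 every time
```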


Jackknife
◼ One of the earliest resampling methods
◼ Leave one sample out and compute model
parameters on the remaining data
◼ Computationally simple
◼ Mean Estimate: 𝛼(·) = (1/𝑁) Σᵢ 𝛼(𝑖), where 𝛼(𝑖) is the statistic computed with the 𝑖-th example left out
◼ Variance Estimate: Var(𝛼) = ((𝑁−1)/𝑁) Σᵢ (𝛼(𝑖) − 𝛼(·))²
The jackknife is a linear approximation of the bootstrap and only works well for linear statistics (e.g., the mean). It fails to give accurate estimates for non-smooth (e.g., the median) and nonlinear (e.g., the correlation coefficient) statistics.
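A sketch of the jackknife estimates for the sample mean, using the same toy numbers as the bootstrap example later in the slides (NumPy assumed):

```python
# Jackknife: leave one example out at a time and recompute the statistic.
import numpy as np

x = np.array([3, 5, 2, 1, 7], dtype=float)
N = len(x)
theta_i = np.array([np.delete(x, i).mean() for i in range(N)])   # leave-one-out replicates
theta_dot = theta_i.mean()                                       # jackknife mean estimate
var_jack = (N - 1) / N * np.sum((theta_i - theta_dot) ** 2)      # jackknife variance estimate
# For the sample mean this reduces to the usual s^2 / N.
```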
Bias and Variance of a Statistical Estimate
Problem: estimate parameter 𝛼 of unknown distribution 𝐺
◼ To emphasize the fact that 𝛼 concerns 𝐺, we write 𝛼(𝐺)
Solution
◼ We collect 𝑁 examples 𝑋={𝑥1 ,𝑥2 …𝑥𝑁} from 𝐺
◼ 𝑋 defines a discrete distribution 𝐺’ with mass 1/𝑁 at each example
◼ We compute the statistic 𝛼’=𝛼(𝐺’) as an estimator of 𝛼(𝐺)
◼ e.g., 𝛼(𝐺’) may be the estimate of the true error rate for a classifier
How good is this estimator?
◼ Bias: how much it deviates from the true value: Bias(𝛼’) = E[𝛼’] − 𝛼
◼ Variance: how much variability it shows for different samples: Var(𝛼’) = E[(𝛼’ − E[𝛼’])²]

Bias and variance of the sample mean

Bias: the sample mean is known to be an unbiased estimator of the population mean, i.e., E[𝑥̄] = 𝜇

Variance?
The standard deviation of the sample mean is equal to 𝜎/√𝑁, where 𝜎 is the standard deviation of the underlying distribution

This term is also known in statistics as the STANDARD ERROR
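A short Monte Carlo check of the standard-error formula (NumPy assumed; the normal distribution and sample size are arbitrary choices):

```python
# Empirical check: the std of the sample mean matches sigma / sqrt(N).
import numpy as np

rng = np.random.default_rng(0)
sigma, N = 2.0, 50
means = rng.normal(0.0, sigma, size=(10_000, N)).mean(axis=1)    # 10,000 sample means
print("empirical std of the sample mean:", means.std())
print("sigma / sqrt(N):                 ", sigma / np.sqrt(N))   # ~0.283
```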


Unfortunately, there is no such neat algebraic formula for almost any estimate other than the sample mean
Bootstrap Estimates
The bootstrap allows us to estimate bias and variance for practically any statistical
estimate, be it a scalar or vector (matrix)
◼ Here we will only describe the estimation procedure
◼ For more details refer to “Advanced algorithms for neural networks” [Masters,
1995], which provides an excellent introduction
Approach
◼ Consider a dataset of 𝑁 examples 𝑋 ={𝑥1 ,𝑥2 …𝑥𝑁} from distribution 𝐺
◼ This dataset defines a discrete distribution 𝐺’
◼ Compute 𝛼’=𝛼(𝐺’) as our initial estimate of 𝛼(𝐺)
◼ Let {𝑥1∗,𝑥2∗,…,𝑥𝑁∗} be a bootstrap dataset drawn from 𝑋 ={𝑥1 ,𝑥2 …𝑥𝑁}
◼ Estimate the parameter 𝛼 using this bootstrap dataset, obtaining 𝛼∗(𝐺∗)
◼ Generate 𝐾 bootstrap datasets and obtain 𝐾 estimates
{𝛼∗1(𝐺∗),𝛼∗2(𝐺∗)…,𝛼∗𝐾(𝐺∗)}
The bias and variance estimates of 𝛼′ are
Bias(𝛼′) = 𝛼∗(·) − 𝛼′  and  Var(𝛼′) = (1/(𝐾−1)) Σₖ (𝛼∗ₖ(𝐺∗) − 𝛼∗(·))², where 𝛼∗(·) = (1/𝐾) Σₖ 𝛼∗ₖ(𝐺∗)
◼ The effect of generating a bootstrap dataset from the distribution 𝐺’ is similar to


the effect of obtaining the dataset 𝑋={𝑥1,𝑥2 …,𝑥𝑁} from the original distribution 𝐺
◼ In other words, the distribution {𝛼∗1(𝐺∗),𝛼∗2(𝐺∗)…,𝛼∗𝐾(𝐺∗)} is related to the initial
estimate 𝛼′ in the same fashion that multiple estimates 𝛼′ are related to the true
value 𝛼
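The procedure above can be sketched for an arbitrary statistic as follows (NumPy assumed; the helper name bootstrap_bias_variance is purely illustrative):

```python
# Bootstrap estimates of the bias and variance of a statistic alpha.
import numpy as np

def bootstrap_bias_variance(x, alpha, K=500, seed=0):
    """Bias and variance of alpha(x) estimated from K bootstrap datasets."""
    rng = np.random.default_rng(seed)
    alpha_prime = alpha(x)                                        # initial estimate alpha(G')
    alpha_star = np.array([alpha(rng.choice(x, size=len(x), replace=True))
                           for _ in range(K)])                    # K bootstrap estimates
    bias = alpha_star.mean() - alpha_prime                        # alpha*(.) - alpha'
    var = np.sum((alpha_star - alpha_star.mean()) ** 2) / (K - 1)
    return bias, var

x = np.array([3, 5, 2, 1, 7], dtype=float)
print(bootstrap_bias_variance(x, np.mean))                        # bias and variance of the mean
```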
Example
◼ Assume a small dataset 𝑥={3,5,2,1,7}, and we want to compute the bias and
variance of the sample mean 𝛼′=3.6
We generate a number of bootstrap samples (three in this case)
◼ Assume that the first bootstrap yields the dataset {7,3,2,3,1}
◼ the sample mean 𝛼∗1=3.2
◼ The second bootstrap sample yields the dataset {5,1,1,3,7}
◼ the sample mean 𝛼∗2=3.4
◼ The third bootstrap sample yields the dataset {2,2,7,1,3}
◼ the sample mean 𝛼∗3=3.0
◼ Averaging these estimates gives 𝛼∗(·) = 3.2
What are the bias and variance of the sample mean 𝛼’?
Bias(𝛼’) = 3.2 − 3.6 = −0.4
◼ Resampling introduced a downward bias on the mean, so we would be inclined to use 3.6 + 0.4 = 4.0 as an unbiased estimate of 𝛼
Var(𝛼’) = 1/2 × [(3.2 − 3.2)² + (3.4 − 3.2)² + (3.0 − 3.2)²] = 0.04
◼ NOTES
Example given for the sample mean, but 𝛼 could be any other statistical operator!
How many bootstrap samples should we use?
As a rule of thumb, several hundred resamples will suffice for most problems
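The arithmetic of the worked example can be checked directly by hard-coding the three bootstrap samples given above:

```python
# Reproducing the slide's numbers: bias = -0.4, variance = 0.04.
import numpy as np

alpha_prime = np.mean([3, 5, 2, 1, 7])                            # 3.6
boot_means = np.array([np.mean([7, 3, 2, 3, 1]),                  # 3.2
                       np.mean([5, 1, 1, 3, 7]),                  # 3.4
                       np.mean([2, 2, 7, 1, 3])])                 # 3.0
bias = boot_means.mean() - alpha_prime                            # 3.2 - 3.6 = -0.4
var = np.sum((boot_means - boot_means.mean()) ** 2) / (len(boot_means) - 1)   # 0.04
```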
Three Way Data Split
If model selection and true error rate estimation are to be carried out simultaneously, the data should be divided into three disjoint sets [Ripley, 1996]
◼ Training set: used for learning, e.g., to fit the parameters of the classifier
◼ Validation set: used to select among several trained classifiers
◼ Test set: used only to assess the performance of a fully-trained classifier
Why separate test and validation sets?
◼ The error rate of the final model on validation data will be biased (smaller than the true error rate)
since the validation set is used to select the final model
◼ After assessing the final model on the test set, YOU MUST NOT tune the model any further!
Procedure outline
1. Divide the available data into training, validation and test sets
2. Select an architecture and training parameters
3. Train the model using the training set
4. Evaluate the model using the validation set
5. Repeat steps 2 through 4 using different architectures and training parameters
6. Select the best model and train it using data from the training and validation sets
7. Assess this final model using the test set

This outline assumes a holdout method; if CV or bootstrap are used, steps 3 and 4 have to be repeated for each of the K folds
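A sketch of the three-way procedure for a holdout setting (scikit-learn assumed; the split proportions, the synthetic data and the kNN candidate models are illustrative choices, not prescribed by the slides):

```python
# Three-way split: fit on the training set, select the model on the validation set,
# and assess the chosen model once on the untouched test set.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_trval, X_test, y_trval, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X_trval, y_trval, test_size=0.25, random_state=0)

# Model selection: pick the free parameter (here k of kNN) on the validation set.
val_scores = {k: KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr).score(X_val, y_val)
              for k in (1, 3, 5, 7)}
best_k = max(val_scores, key=val_scores.get)

# Retrain on training + validation data, then assess once on the test set.
final = KNeighborsClassifier(n_neighbors=best_k).fit(X_trval, y_trval)
test_error = 1 - final.score(X_test, y_test)    # no further tuning after this point
```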
