
Evaluating Machine Learning Methods

These slides are adapted from materials by David Page: http://pages.cs.wisc.edu/~dpage/cs760/


Test Sets

• How can we get an unbiased estimate of the accuracy of a learned model?


• When learning a model, you should pretend that you don’t have the
test data yet. (In some applications, it is reasonable to assume that
you have access to the feature vector x but not the y part of each test
instance).
• If the test set labels influence the learned model in any way, accuracy
estimates will be biased.
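As a minimal sketch of this discipline (not part of the original slides; the data set and model are placeholders), scikit-learn's train_test_split can set aside a test set whose labels are consulted only once, after learning is finished:

```python
# Sketch: hold out a test set and score on it exactly once, at the end.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)  # placeholder data

# y_test is set aside and never influences model fitting.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("unbiased accuracy estimate:", model.score(X_test, y_test))
```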
Learning Curves

• How does the accuracy of a learning method change as a function of the training-set size?
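One way to draw such a curve (a sketch, not taken from the slides) is scikit-learn's learning_curve helper, which re-trains on increasingly large subsets of the training data and records the accuracy at each size:

```python
# Sketch: accuracy as a function of training-set size.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import learning_curve
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=0)  # placeholder data

sizes, train_scores, test_scores = learning_curve(
    DecisionTreeClassifier(random_state=0), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5), cv=5, scoring="accuracy")

for n, acc in zip(sizes, test_scores.mean(axis=1)):
    print(f"{n} training instances -> mean held-out accuracy {acc:.3f}")
```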
Validation (Tuning) Sets
• Suppose we want unbiased estimates of accuracy during the learning process. Partition the training data into separate training and validation sets.
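A minimal sketch of that partitioning (details assumed, not from the slides): split the training data a second time, tune on the validation part, and leave the test set untouched.

```python
# Sketch: carve a validation (tuning) set out of the training data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=600, random_state=0)  # placeholder data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)
# Second split: the validation set is used for tuning, never the test set.
X_tr, X_val, y_tr, y_val = train_test_split(
    X_train, y_train, test_size=0.25, random_state=0)

best_k, best_acc = None, -1.0
for k in (1, 3, 5, 7):
    acc = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr).score(X_val, y_val)
    if acc > best_acc:
        best_k, best_acc = k, acc
print("k chosen on the validation set:", best_k)
```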
Limitations of using a single training/test partition
Random Resampling
• A single partition gives an estimate that depends on which instances happened to land in the test set; we can address this by repeatedly and randomly re-partitioning the available data into training and test sets.
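A sketch of that idea (implementation details assumed): scikit-learn's ShuffleSplit draws several random train/test partitions, and the resulting accuracies are averaged.

```python
# Sketch: average accuracy over repeated random train/test partitions.
from sklearn.datasets import make_classification
from sklearn.model_selection import ShuffleSplit, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)  # placeholder data

splitter = ShuffleSplit(n_splits=10, test_size=0.25, random_state=0)
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=splitter)
print("mean accuracy over 10 random partitions:", scores.mean())
```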
Stratified Sampling
• When randomly selecting training or validation sets, we may want to
ensure that class proportions are maintained in each selected set.
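For example (a sketch with assumed details), passing stratify=y to scikit-learn's train_test_split keeps the class proportions of an imbalanced data set roughly equal in the training and test sets:

```python
# Sketch: preserve class proportions when splitting an imbalanced data set.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Placeholder data: roughly 90% negatives, 10% positives.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

print("positive fraction overall:    ", y.mean())
print("positive fraction in test set:", y_test.mean())
```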
Cross Validation
Cross Validation Example
• Suppose we have 100 instances, and we want to estimate accuracy with
cross validation.
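A sketch of that example (data and model are placeholders): with 10-fold cross-validation, each of the 100 instances is used exactly once for testing and nine times for training, and the fold accuracies are averaged.

```python
# Sketch: 10-fold cross-validation on 100 instances.
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=100, random_state=0)  # placeholder data

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)  # 10 folds of 10
scores = cross_val_score(KNeighborsClassifier(n_neighbors=3), X, y, cv=cv)
print("per-fold accuracies:", scores)
print("cross-validated accuracy estimate:", scores.mean())
```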
Internal Cross Validation
• Instead of a single validation set, we can use cross-validation within a
training set to select a model.
Example: Using Internal Cross Validation to Select k in k-NN
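A sketch of what that example could look like in code (assumed details, not the slides' exact setup): an inner cross-validation over the training data chooses k, and the held-out test set is scored only once with the chosen model.

```python
# Sketch: internal (inner) cross-validation to select k for k-NN.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=400, random_state=0)  # placeholder data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# 5-fold cross-validation *within* the training set selects k.
search = GridSearchCV(KNeighborsClassifier(),
                      param_grid={"n_neighbors": [1, 3, 5, 7, 9]}, cv=5)
search.fit(X_train, y_train)

print("k selected by internal cross-validation:", search.best_params_["n_neighbors"])
print("accuracy on the held-out test set:", search.score(X_test, y_test))
```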
Confusion Matrices
• How can we understand what types of mistakes a learned model makes?

Figure from vision.jhu.edu


Confusion Matrix for 2-class Problems
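A minimal sketch (labels and predictions invented for illustration) of reading the four cells of a 2-class confusion matrix:

```python
# Sketch: confusion matrix for a 2-class problem.
from sklearn.metrics import confusion_matrix

y_true = [1, 1, 1, 0, 0, 0, 0, 0]   # placeholder labels (1 = positive)
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]   # placeholder predictions

# For binary labels, ravel() yields TN, FP, FN, TP in that order.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp}  FP={fp}  FN={fn}  TN={tn}")
```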
Is Accuracy an adequate measure of predictive performance?
Other Accuracy Metrics
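As a sketch (the counts are placeholders and the formulas are the standard definitions, not quoted from the slides), the metrics most often reported alongside accuracy come straight from the confusion-matrix counts:

```python
# Sketch: metrics beyond plain accuracy, from confusion-matrix counts.
tp, fp, fn, tn = 2, 1, 1, 4   # placeholder counts

accuracy  = (tp + tn) / (tp + fp + fn + tn)
tp_rate   = tp / (tp + fn)    # also called recall or sensitivity
fp_rate   = fp / (fp + tn)
precision = tp / (tp + fp)
print(accuracy, tp_rate, fp_rate, precision)
```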
ROC Curves
• A Receiver Operating Characteristic (ROC) curve plots the TP-rate vs. the
FP-rate as a threshold on the confidence of an instance being positive is
varied.
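A sketch of tracing out such a curve (scores and labels invented for illustration): give each instance a confidence of being positive and let scikit-learn's roc_curve sweep the threshold.

```python
# Sketch: ROC curve obtained by sweeping a confidence threshold.
from sklearn.metrics import roc_curve, auc

y_true   = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]   # placeholder labels
y_scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.5,
            0.4, 0.3, 0.2, 0.1]              # placeholder confidences

fpr, tpr, thresholds = roc_curve(y_true, y_scores)
for f, t, th in zip(fpr, tpr, thresholds):
    print(f"threshold {th:.2f}: TP-rate {t:.2f}, FP-rate {f:.2f}")
print("area under the ROC curve:", auc(fpr, tpr))
```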
ROC Curve Example
ROC Curves and Misclassification Costs
Algorithm for Creating an ROC Curve
Plotting an ROC Curve
• Does a low false-positive rate indicate that most positive predictions (i.e.
predictions with confidence > some threshold) are correct?
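The intended answer is presumably no when positives are rare; a small hypothetical calculation (all numbers invented for illustration) shows how a low FP-rate can coexist with mostly-wrong positive predictions:

```python
# Sketch: low FP-rate, yet low precision, when the positive class is rare.
n_pos, n_neg = 10, 1000        # hypothetical class counts
tp_rate, fp_rate = 0.8, 0.02   # hypothetical operating point

tp = tp_rate * n_pos           # 8 true positives
fp = fp_rate * n_neg           # 20 false positives
precision = tp / (tp + fp)
print(f"precision = {precision:.2f}")   # about 0.29 despite a 2% FP-rate
```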
Other Accuracy Metrics
Precision/Recall Curves
• A precision/recall curve plots the precision vs. recall (TP-rate) as a
threshold on the confidence of an instance being positive is varied.
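A sketch mirroring the ROC example above (same invented scores): scikit-learn's precision_recall_curve sweeps the confidence threshold but reports precision and recall instead.

```python
# Sketch: precision/recall curve from confidence scores.
from sklearn.metrics import precision_recall_curve

y_true   = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]   # placeholder labels
y_scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.5,
            0.4, 0.3, 0.2, 0.1]              # placeholder confidences

precision, recall, thresholds = precision_recall_curve(y_true, y_scores)
for p, r in zip(precision, recall):
    print(f"recall {r:.2f} -> precision {p:.2f}")
```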
How Do We Get One ROC/PR Curve When We Do Cross Validation?
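One common answer (an assumption here, not necessarily the approach the slides adopt) is to pool the out-of-fold confidence scores from all folds and draw a single curve from the pooled predictions:

```python
# Sketch: pool out-of-fold confidence scores, then draw one ROC curve.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, auc
from sklearn.model_selection import cross_val_predict

X, y = make_classification(n_samples=300, random_state=0)  # placeholder data

# Each instance is scored by the model from the fold in which it was held out.
scores = cross_val_predict(LogisticRegression(max_iter=1000), X, y,
                           cv=10, method="predict_proba")[:, 1]
fpr, tpr, _ = roc_curve(y, scores)
print("pooled cross-validated AUC:", auc(fpr, tpr))
```

An alternative is to compute one curve per fold and average the curves; the two choices can give somewhat different pictures.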
Comments on ROC and PR Curves
To Avoid Cross-Validation Pitfalls, Ask:
• Is my held-aside test data really representative of going out to collect new data?
• Did I repeat my entire data processing procedure on every fold of
cross-validation, using only the training data for that fold?
• On each fold of cross-validation, did I ever access in any way the label of a test
case?
• Any preprocessing done over the entire data set (feature selection, parameter tuning, threshold selection) must not use the labels (see the pipeline sketch after this list)

• Have I modified my algorithm so many times, or tried so many approaches, on this same data set that I (the human) am overfitting it?
• Have I continually modified my preprocessing or learning algorithm until I got
some improvement on this data set?
• If so, I really need to get some additional data now to at least test on.
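One way to honor the "repeat the whole procedure on every fold" rule (a sketch with assumed preprocessing steps, not something prescribed by the slides) is to wrap preprocessing and learning in a single pipeline, so scaling and feature selection are re-fit from scratch inside each training fold:

```python
# Sketch: keep preprocessing inside each cross-validation fold to avoid leakage.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=200, n_features=50, random_state=0)  # placeholder

pipe = Pipeline([
    ("scale", StandardScaler()),               # fit on the training fold only
    ("select", SelectKBest(f_classif, k=10)),  # feature selection re-done per fold
    ("clf", LogisticRegression(max_iter=1000)),
])
# cross_val_score re-fits the whole pipeline on each training fold, so the
# test fold's labels never influence preprocessing or model selection.
print(cross_val_score(pipe, X, y, cv=10).mean())
```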
