This document covers methods for evaluating machine learning models, including test sets, validation sets, learning curves, cross-validation, and accuracy metrics such as confusion matrices, ROC curves, and precision/recall curves.
Evaluating Machine Learning Methods
These slides are adapted from materials by David Page: http://pages.cs.wisc.edu/~dpage/cs760/
Test Sets
• How can we get an unbiased estimate of the accuracy of a learned model?
• When learning a model, you should pretend that you don't have the test data yet. (In some applications, it is reasonable to assume that you have access to the feature vector x, but not the label y, of each test instance.)
• If the test-set labels influence the learned model in any way, accuracy estimates will be biased.
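As an illustration, here is a minimal sketch of this workflow in scikit-learn; the breast-cancer data set, decision tree, and 25% split are assumptions for the example, not choices from the lecture.

```python
# A sketch of holding out a test set: the test labels are set aside before any
# learning and are touched only once, to estimate accuracy of the final model.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Split off a held-aside test set before training starts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Unbiased estimate of accuracy on data the model never saw during learning.
print("held-out accuracy:", model.score(X_test, y_test))
```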
Learning Curves
• How does the accuracy of a learning method change as a function of the training-set size?
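One way to answer this empirically is sketched below with scikit-learn's learning_curve helper; the classifier and the grid of training sizes are illustrative assumptions.

```python
# A sketch of measuring accuracy as a function of training-set size.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import learning_curve
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Train on 10%..100% of the available training data, evaluating each size
# with 5-fold cross-validation.
sizes, train_scores, test_scores = learning_curve(
    DecisionTreeClassifier(random_state=0), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5), cv=5, scoring="accuracy")

for n, score in zip(sizes, test_scores.mean(axis=1)):
    print(f"{n:4d} training instances -> mean held-out accuracy {score:.3f}")
```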
Validation (Tuning) Sets
• Suppose we want unbiased estimates of accuracy during the learning process. Partition the training data into separate training and validation sets.

Limitations of Using a Single Training/Test Partition

Random Resampling
• One limitation of a single partition can be addressed by repeatedly randomly partitioning the available data into training and test sets.

Stratified Sampling
• When randomly selecting training or validation sets, we may want to ensure that class proportions are maintained in each selected set.

Cross Validation

Cross Validation Example
• Suppose we have 100 instances, and we want to estimate accuracy with cross validation.

Internal Cross Validation
• Instead of a single validation set, we can use cross-validation within a training set to select a model.

Example: Using Internal Cross Validation to Select k in k-NN
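A minimal sketch of that example, assuming scikit-learn, a candidate grid of k in {1, 3, 5, 7, 9}, and the breast-cancer data set (all illustrative choices, not taken from the lecture): stratified folds preserve class proportions, and GridSearchCV runs the internal cross-validation on each training fold to choose k.

```python
# Stratified cross-validation with internal cross-validation used to select
# k for k-NN on each training fold.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import StratifiedKFold, GridSearchCV, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)

# Internal (inner) cross-validation: pick k using only that fold's training data.
inner_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
knn_search = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": [1, 3, 5, 7, 9]},
    cv=inner_cv)

# Outer stratified cross-validation: estimate accuracy of the whole procedure.
outer_cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)
scores = cross_val_score(knn_search, X, y, cv=outer_cv)

print("per-fold accuracy:", scores.round(3))
print("mean accuracy:", round(scores.mean(), 3))
```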
Confusion Matrices
• How can we understand what types of mistakes a learned model makes?
[Figure: example confusion matrix, from vision.jhu.edu]
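As a concrete illustration, a minimal sketch using scikit-learn's confusion_matrix; the true and predicted labels here are made-up values.

```python
# Counting the types of mistakes a classifier makes on a 2-class problem.
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]   # actual labels (illustrative)
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]   # model's predictions (illustrative)

# Rows are actual classes, columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]
print(confusion_matrix(y_true, y_pred))
```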
Confusion Matrix for 2-class Problems

Is Accuracy an Adequate Measure of Predictive Performance?

Other Accuracy Metrics

ROC Curves
• A Receiver Operating Characteristic (ROC) curve plots the TP-rate vs. the FP-rate as a threshold on the confidence of an instance being positive is varied.

ROC Curve Example

ROC Curves and Misclassification Costs

Algorithm for Creating an ROC Curve

Plotting an ROC Curve
• Does a low false-positive rate indicate that most positive predictions (i.e. predictions with confidence > some threshold) are correct?

Other Accuracy Metrics

Precision/Recall Curves
• A precision/recall curve plots the precision vs. recall (TP-rate) as a threshold on the confidence of an instance being positive is varied.

How Do We Get One ROC/PR Curve When We Do Cross Validation?

Comments on ROC and PR Curves
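A hedged sketch of producing both curves from a model's positive-class confidences; the data set and logistic-regression classifier are stand-ins, not choices from the lecture.

```python
# Each point on an ROC or precision/recall curve corresponds to one threshold
# on the model's confidence that an instance is positive.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, precision_recall_curve, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=5000).fit(X_train, y_train)
confidences = model.predict_proba(X_test)[:, 1]   # confidence of being positive

# ROC curve: FP-rate vs. TP-rate as the threshold varies.
fpr, tpr, roc_thresholds = roc_curve(y_test, confidences)

# Precision/recall curve: precision vs. recall (TP-rate) as the threshold varies.
precision, recall, pr_thresholds = precision_recall_curve(y_test, confidences)

print("area under ROC curve:", round(roc_auc_score(y_test, confidences), 3))
```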
To Avoid Cross-Validation Pitfalls, Ask:
• Is my held-aside test data really representative of going out to collect new data?
• Did I repeat my entire data-processing procedure on every fold of cross-validation, using only the training data for that fold?
• On each fold of cross-validation, did I ever access in any way the label of a test case?
• Any preprocessing done over the entire data set (feature selection, parameter tuning, threshold selection) must not use labels (see the sketch after this list).
• Have I modified my algorithm so many times, or tried so many approaches, on this same data set that I (the human) am overfitting it?
• Have I continually modified my preprocessing or learning algorithm until I got some improvement on this data set?
• If so, I really need to get some additional data now, to at least test on.
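One way to keep preprocessing inside each fold is to wrap it and the learner in a single pipeline, so every step is re-fit on that fold's training data only. The sketch below assumes scikit-learn and illustrative preprocessing steps (scaling and univariate feature selection); the specific steps and classifier are not from the lecture.

```python
# Preprocessing placed inside a Pipeline is re-fit on the training portion of
# every cross-validation fold, so no step is ever computed over the full data
# set or sees the held-out labels.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

pipeline = Pipeline([
    ("scale", StandardScaler()),               # fit per training fold
    ("select", SelectKBest(f_classif, k=10)),  # label-using step, also per fold
    ("clf", LogisticRegression(max_iter=5000)),
])

# cross_val_score fits the whole pipeline on each training fold and evaluates
# it on the corresponding held-out fold.
scores = cross_val_score(pipeline, X, y, cv=10)
print("mean cross-validated accuracy:", round(scores.mean(), 3))
```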