Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
SlideShare a Scribd company logo
2nd edition
#MLSEV 2
Evaluations
All models are wrong, but some are useful
Charles Parker
VP Algorithms, BigML, Inc
#MLSEV 3
My Model Is Wonderful
• I trained a model on my data and it
seems really marvelous!
• How do you know for sure?
• To quantify your model’s
performance, you must evaluate it
• This is not optional. If you don’t
do this and do it right, you’ll have
problems
#MLSEV 4
Proper Evaluation
• Choosing the right metric
• Testing on the right data (which might be harder than you think)
• Replicating your tests
#MLSEV 5
Metric Choice
#MLSEV 6
Proper Evaluation
• The most basic workflow for model evaluation is:
• Split your data into two sets, training and testing
• Train a model on the training data
• Measure the “performance” of the model on the testing data
• If your training data is representative of what you will see in the future, that’s
the performance you should get out of your model
• What do we mean by “performance”? This is where you come in.
#MLSEV 7
Medical Testing Example
• Let’s say we develop an ML model that can
diagnose a disease
• About 1 in 1000 people who are tested by
the model turn out to have the disease
• Call the people who have the disease
“sick” and people who don’t have it “well”.
• How well do we do on a test set?
#MLSEV 8
Some Terminology
We’ll define the sick people as “positive” and the well people as “negative"
• “True Positive”: You’re sick and the model diagnosed you as sick
• “False Positive”: You’re well, but the model diagnosed you as sick
• “True Negative”: You’re well, and the model diagnosed you as well
• “False Negative”: You’re sick, but the model diagnosed you as well
The model is correct in the “true” cases, and incorrect in the “false” cases
#MLSEV 9
Accuracy
TP + TN
Total
• “Percentage correct” - like an exam
• If Accuracy = 1 then no mistakes
• If Accuracy = 0 then all mistakes
• Intuitive but not always useful
• Watch out for unbalanced classes!
• Remember, only 1 in 1000 have the disease
• A silly model which always predicts “well” is 99.9% accurate
#MLSEV 10
Precision
Predicted “Well”
Predicted “Sick”
• How well did we do when we predicted
someone was sick?
• A test with high precision has few false
positives
• Precision of 1.0 indicates that everyone who
we predict is sick is actually sick
• What about people who we predict are well?
TP
TP + FP
= 0.6
Sick Person
Well Person
#MLSEV 11
Recall
Predicted “Well”
Predicted “Sick”
• How well did we do when someone was
actually sick?
• A test with high recall indicates few false
negatives
• Recall of 1.0 indicates that everyone who was
actually sick was correctly diagnosed
• But this doesn’t say anything about false
positives!
TP
TP + FN
= 0.75
Sick Person
Well Person
#MLSEV 12
Trade Offs
• We can “trivially maximize” both measures
• If you pick the sickest person and only label them sick and no one
else, you can probably get perfect precision
• If you label everyone sick, you are guaranteed perfect recall
• The unfortunate catch is that if you make one perfect, the
other is terrible, so you want a model that has both high
precision and recall
• This is what quantities like the F1 score and Phi
Coefficient try to do
#MLSEV 13
Cost Matrix
• In many cases, the consequences of a true
positive and a false positive are very different
• You can define “costs” for each type of mistake
• Total Cost = TP * TP_Cost + FP * FP_Cost
• Here, we are willing to accept lots of false
positives in exchange for high recall
• What if a positive diagnosis resulted in
expensive or painful treatment?
Classified
Sick
Classified
Well
Actually
Sick
0 100
Actually
Well
1 0
Cost matrix for medical
diagnosis problem
#MLSEV 14
Operating Thresholds
• Most classifiers don’t output a prediction. Instead they give a “score” for each
class
• The prediction you assign to an instance is usually a function of a threshold on
this score (e.g., if the score is over 0.5, predict true)
• You can experiment with an ROC curve to see how your metrics will change if
you change the threshold
• Lowering the threshold means you are more likely to predict the positive class, which improves
recall but introduces false positives
• Increasing the threshold means you predict the positive class less often (you are more “picky”),
which will probably increase precision but lower recall.
#MLSEV 15
ROC Curve Example
#MLSEV 16
Holding Out Data
#MLSEV 17
Why Hold Out Data?
• Why do we split the dataset into training and testing sets? Why do we always
(always, always) test on data that the model training process did not see?
• Because machine learning algorithms are good at memorizing data
• We don’t care how well the model does on data it has already seen because it
probably won’t see that data again
• Holding out some of the test data is simulating the data the model will see in
the future
#MLSEV 18
Memorization
plasma
glucose
bmi
diabetes
pedigree
age diabetes
148 33,6 0,627 50 TRUE
85 26,6 0,351 31 FALSE
183 23,3 0,672 32 TRUE
89 28,1 0,167 21 FALSE
137 43,1 2,288 33 TRUE
116 25,6 0,201 30 FALSE
78 31 0,248 26 TRUE
115 35,3 0,134 29 FALSE
197 30,5 0,158 53 TRUE
Training Evaluating
plasma
glucose
bmi
diabetes
pedigree
age diabetes
148 33,6 0,627 50 ?
85 26,6 0,351 31 ?
• You don’t even need meaningful features;
the person’s name would be enough
• “Oh right, Bob. I know him. Yes, he
certainly has diabetes”
• As long as there are no duplicate names
in the dataset, it's a 100% accurate
model
#MLSEV 19
Well, That Was Easy
• Okay, so I’m not testing on the training
data, so I’m good, right? NO NO NO
• You also have to worry about information
leakage between training and test data.
• What is this? Let’s try to predict the daily
closing price of the stock market
• What happens if you hold out 10 random
days from your dataset?
• What if you hold out the last 10 days?
#MLSEV 20
Traps Everywhere!
• This is common when you have time-distributed
data, but can also happen in other instances:
• Let’s say we have a dataset of 10,000 pictures
from 20 people, each labeled with the year it which
it was taken
• We want to predict the year from the image
• What happens if we hold out random data?
• Solution: Hold out users instead
#MLSEV 21
How Do We Avoid This?
• It’s a terrible problem, because if you make the mistake you will get results
that are too good, and be inclined to believe them
• So be careful? Do you have:
• Data where points can be grouped in time (by week or by month)?
• Data where points can be grouped by user (each point is an action a user took)
• Data where points can be grouped by location (each point is a day of sales at a particular store)
• Even if you’re suspicious that points from the group might leak information to
one another, try a test where you hold out a few groups (months, users,
locations) and train on the rest
#MLSEV 22
Do It Again!
#MLSEV 23
One Test is Not Enough
• Even if you have a correct holdout, you still need to test more than once.
• Every result you get from any test is a result of randomness
• Randomness from the Data:
• The dataset you have is a finite number of points drawn from an infinite distribution
• The split you make between training and test data is done at random
• Randomness of the algorithm
• The ordering of the data might give different results
• The best performing algorithms (random forests, deepnets) have randomness built-in
• With just one result, you might get lucky
#MLSEV 24
One Test is Not Enough
Performance
Really nice result!
#MLSEV 25
One Test is Not Enough
Performance
Really nice result!
Likelihood
But really just a lucky one
#MLSEV 26
Comparing Models is Even Worse
#MLSEV 27
Comparing Models is Even Worse
#MLSEV 28
Comparing Models is Even Worse
First digit of 

random seed
#MLSEV 29
Please, Sir, Can I Have Some More?
• Always do more than one test!
• For each test, try to vary all sources of
randomness that you can (change the seeds of all
random processes) to try to “experience” as much
variance as you can
• Cross-validation (stratifying is great, monte-carlo
can be a useful simplification)
• Don’t just average the results! The variance is
important!
#MLSEV 30
Summing Up
• Choose the metric that makes sense for
your problem
• Use held out data for testing and watch out
for information leakage
• Always do more than one test, varying all
sources of randomness that you have
control over!
MLSEV Virtual. Evaluations

More Related Content

What's hot

Statistics in the age of data science, issues you can not ignore
Statistics in the age of data science, issues you can not ignoreStatistics in the age of data science, issues you can not ignore
Statistics in the age of data science, issues you can not ignore
Turi, Inc.
 
Machine Learning Algorithms (Part 1)
Machine Learning Algorithms (Part 1)Machine Learning Algorithms (Part 1)
Machine Learning Algorithms (Part 1)
Laguna State Polytechnic University
 
Introduction to machine learning
Introduction to machine learningIntroduction to machine learning
Introduction to machine learning
Koundinya Desiraju
 
Actionable Machine Learning
Actionable Machine LearningActionable Machine Learning
Actionable Machine Learning
Meir Maor
 
ML Drift - How to find issues before they become problems
ML Drift - How to find issues before they become problemsML Drift - How to find issues before they become problems
ML Drift - How to find issues before they become problems
Amy Hodler
 
Data Science Methodology for Analytics and Solution Implementation
Data Science Methodology for Analytics and Solution ImplementationData Science Methodology for Analytics and Solution Implementation
Data Science Methodology for Analytics and Solution Implementation
Rupak Roy
 
Introduction to MaxDiff Scaling of Importance - Parametric Marketing Slides
Introduction to MaxDiff Scaling of Importance - Parametric Marketing SlidesIntroduction to MaxDiff Scaling of Importance - Parametric Marketing Slides
Introduction to MaxDiff Scaling of Importance - Parametric Marketing Slides
QuestionPro
 
Testing a movingtarget_quest_dynatrace
Testing a movingtarget_quest_dynatraceTesting a movingtarget_quest_dynatrace
Testing a movingtarget_quest_dynatrace
Peter Varhol
 
LKNA 2014 Risk and Impediment Analysis and Analytics - Troy Magennis
LKNA 2014 Risk and Impediment Analysis and Analytics - Troy MagennisLKNA 2014 Risk and Impediment Analysis and Analytics - Troy Magennis
LKNA 2014 Risk and Impediment Analysis and Analytics - Troy Magennis
Troy Magennis
 
Ml masterclass
Ml masterclassMl masterclass
Ml masterclass
Maxwell Rebo
 
Understanding randomness
Understanding randomnessUnderstanding randomness
Understanding randomness
suncil0071
 
Carma internet research module sample size considerations
Carma internet research module   sample size considerationsCarma internet research module   sample size considerations
Carma internet research module sample size considerations
Syracuse University
 
Solutions Manual for Discrete Event System Simulation 5th Edition by Banks
Solutions Manual for Discrete Event System Simulation 5th Edition by BanksSolutions Manual for Discrete Event System Simulation 5th Edition by Banks
Solutions Manual for Discrete Event System Simulation 5th Edition by Banks
LanaMcdaniel
 
VSSML18. Evaluations
VSSML18. EvaluationsVSSML18. Evaluations
VSSML18. Evaluations
BigML, Inc
 
Lecture 5
Lecture 5Lecture 5
Lecture 5
M. Raihan
 
A Pocket Guide in Machine Learning for Beginners
A Pocket Guide in Machine Learning for BeginnersA Pocket Guide in Machine Learning for Beginners
A Pocket Guide in Machine Learning for Beginners
Rajat Gupta
 
Testing for cognitive bias in ai systems
Testing for cognitive bias in ai systemsTesting for cognitive bias in ai systems
Testing for cognitive bias in ai systems
Peter Varhol
 
Module 9: Natural Language Processing Part 2
Module 9:  Natural Language Processing Part 2Module 9:  Natural Language Processing Part 2
Module 9: Natural Language Processing Part 2
Sara Hooker
 
IPT Tools 2
IPT Tools 2IPT Tools 2
IPT Tools 2
MR Z
 
GIAF UK Winter 2015 - Analytical techniques: A practical guide to answering b...
GIAF UK Winter 2015 - Analytical techniques: A practical guide to answering b...GIAF UK Winter 2015 - Analytical techniques: A practical guide to answering b...
GIAF UK Winter 2015 - Analytical techniques: A practical guide to answering b...
Lauren Cormack
 

What's hot (20)

Statistics in the age of data science, issues you can not ignore
Statistics in the age of data science, issues you can not ignoreStatistics in the age of data science, issues you can not ignore
Statistics in the age of data science, issues you can not ignore
 
Machine Learning Algorithms (Part 1)
Machine Learning Algorithms (Part 1)Machine Learning Algorithms (Part 1)
Machine Learning Algorithms (Part 1)
 
Introduction to machine learning
Introduction to machine learningIntroduction to machine learning
Introduction to machine learning
 
Actionable Machine Learning
Actionable Machine LearningActionable Machine Learning
Actionable Machine Learning
 
ML Drift - How to find issues before they become problems
ML Drift - How to find issues before they become problemsML Drift - How to find issues before they become problems
ML Drift - How to find issues before they become problems
 
Data Science Methodology for Analytics and Solution Implementation
Data Science Methodology for Analytics and Solution ImplementationData Science Methodology for Analytics and Solution Implementation
Data Science Methodology for Analytics and Solution Implementation
 
Introduction to MaxDiff Scaling of Importance - Parametric Marketing Slides
Introduction to MaxDiff Scaling of Importance - Parametric Marketing SlidesIntroduction to MaxDiff Scaling of Importance - Parametric Marketing Slides
Introduction to MaxDiff Scaling of Importance - Parametric Marketing Slides
 
Testing a movingtarget_quest_dynatrace
Testing a movingtarget_quest_dynatraceTesting a movingtarget_quest_dynatrace
Testing a movingtarget_quest_dynatrace
 
LKNA 2014 Risk and Impediment Analysis and Analytics - Troy Magennis
LKNA 2014 Risk and Impediment Analysis and Analytics - Troy MagennisLKNA 2014 Risk and Impediment Analysis and Analytics - Troy Magennis
LKNA 2014 Risk and Impediment Analysis and Analytics - Troy Magennis
 
Ml masterclass
Ml masterclassMl masterclass
Ml masterclass
 
Understanding randomness
Understanding randomnessUnderstanding randomness
Understanding randomness
 
Carma internet research module sample size considerations
Carma internet research module   sample size considerationsCarma internet research module   sample size considerations
Carma internet research module sample size considerations
 
Solutions Manual for Discrete Event System Simulation 5th Edition by Banks
Solutions Manual for Discrete Event System Simulation 5th Edition by BanksSolutions Manual for Discrete Event System Simulation 5th Edition by Banks
Solutions Manual for Discrete Event System Simulation 5th Edition by Banks
 
VSSML18. Evaluations
VSSML18. EvaluationsVSSML18. Evaluations
VSSML18. Evaluations
 
Lecture 5
Lecture 5Lecture 5
Lecture 5
 
A Pocket Guide in Machine Learning for Beginners
A Pocket Guide in Machine Learning for BeginnersA Pocket Guide in Machine Learning for Beginners
A Pocket Guide in Machine Learning for Beginners
 
Testing for cognitive bias in ai systems
Testing for cognitive bias in ai systemsTesting for cognitive bias in ai systems
Testing for cognitive bias in ai systems
 
Module 9: Natural Language Processing Part 2
Module 9:  Natural Language Processing Part 2Module 9:  Natural Language Processing Part 2
Module 9: Natural Language Processing Part 2
 
IPT Tools 2
IPT Tools 2IPT Tools 2
IPT Tools 2
 
GIAF UK Winter 2015 - Analytical techniques: A practical guide to answering b...
GIAF UK Winter 2015 - Analytical techniques: A practical guide to answering b...GIAF UK Winter 2015 - Analytical techniques: A practical guide to answering b...
GIAF UK Winter 2015 - Analytical techniques: A practical guide to answering b...
 

Similar to MLSEV Virtual. Evaluations

Model validation
Model validationModel validation
Model validation
Utkarsh Sharma
 
Performance Measurement for Machine Leaning.pptx
Performance Measurement for Machine Leaning.pptxPerformance Measurement for Machine Leaning.pptx
Performance Measurement for Machine Leaning.pptx
toneve4907
 
Statistics for UX Professionals
Statistics for UX ProfessionalsStatistics for UX Professionals
Statistics for UX Professionals
Jessica Cameron
 
Statistics for UX Professionals - Jessica Cameron
Statistics for UX Professionals - Jessica CameronStatistics for UX Professionals - Jessica Cameron
Statistics for UX Professionals - Jessica Cameron
User Vision
 
R - what do the numbers mean? #RStats
R - what do the numbers mean? #RStatsR - what do the numbers mean? #RStats
R - what do the numbers mean? #RStats
Jen Stirrup
 
03-Data-Analysis-Final.pdf
03-Data-Analysis-Final.pdf03-Data-Analysis-Final.pdf
03-Data-Analysis-Final.pdf
SugumarSarDurai
 
Mir 2012 13 session #4
Mir 2012 13 session #4Mir 2012 13 session #4
Mir 2012 13 session #4
RichardGroom
 
5. testing differences
5. testing differences5. testing differences
5. testing differences
Steve Saffhill
 
Top 10 Data Science Practioner Pitfalls - Mark Landry
Top 10 Data Science Practioner Pitfalls - Mark LandryTop 10 Data Science Practioner Pitfalls - Mark Landry
Top 10 Data Science Practioner Pitfalls - Mark Landry
Sri Ambati
 
Statistical Learning and Model Selection module 2.pptx
Statistical Learning and Model Selection module 2.pptxStatistical Learning and Model Selection module 2.pptx
Statistical Learning and Model Selection module 2.pptx
nagarajan740445
 
Statistical tests
Statistical testsStatistical tests
Statistical tests
martyynyyte
 
Crash Course in A/B testing
Crash Course in A/B testingCrash Course in A/B testing
Crash Course in A/B testing
Wayne Lee
 
Top 10 Data Science Practitioner Pitfalls
Top 10 Data Science Practitioner PitfallsTop 10 Data Science Practitioner Pitfalls
Top 10 Data Science Practitioner Pitfalls
Sri Ambati
 
Data science toolkit for product managers
Data science toolkit for product managers Data science toolkit for product managers
Data science toolkit for product managers
ProductFolks
 
Data Science Toolkit for Product Managers
Data Science Toolkit for Product ManagersData Science Toolkit for Product Managers
Data Science Toolkit for Product Managers
Mahmoud Jalajel
 
Performance Metrics, Baseline Model, and Hyper Parameter
Performance Metrics, Baseline Model, and Hyper ParameterPerformance Metrics, Baseline Model, and Hyper Parameter
Performance Metrics, Baseline Model, and Hyper Parameter
IndraFransiskusAlam1
 
신뢰할 수 있는 A/B 테스트를 위해 알아야 할 것들
신뢰할 수 있는 A/B 테스트를 위해 알아야 할 것들신뢰할 수 있는 A/B 테스트를 위해 알아야 할 것들
신뢰할 수 있는 A/B 테스트를 위해 알아야 할 것들
Minho Lee
 
Bad metric, bad! - Joseph Ours
Bad metric, bad! - Joseph OursBad metric, bad! - Joseph Ours
Bad metric, bad! - Joseph Ours
QA or the Highway
 
Bad metric, bad!
Bad metric, bad!Bad metric, bad!
Bad metric, bad!
Centric Consulting
 
Lecture 12 binary classifier confusion matrix
Lecture 12 binary classifier confusion matrixLecture 12 binary classifier confusion matrix
Lecture 12 binary classifier confusion matrix
Mostafa El-Hosseini
 

Similar to MLSEV Virtual. Evaluations (20)

Model validation
Model validationModel validation
Model validation
 
Performance Measurement for Machine Leaning.pptx
Performance Measurement for Machine Leaning.pptxPerformance Measurement for Machine Leaning.pptx
Performance Measurement for Machine Leaning.pptx
 
Statistics for UX Professionals
Statistics for UX ProfessionalsStatistics for UX Professionals
Statistics for UX Professionals
 
Statistics for UX Professionals - Jessica Cameron
Statistics for UX Professionals - Jessica CameronStatistics for UX Professionals - Jessica Cameron
Statistics for UX Professionals - Jessica Cameron
 
R - what do the numbers mean? #RStats
R - what do the numbers mean? #RStatsR - what do the numbers mean? #RStats
R - what do the numbers mean? #RStats
 
03-Data-Analysis-Final.pdf
03-Data-Analysis-Final.pdf03-Data-Analysis-Final.pdf
03-Data-Analysis-Final.pdf
 
Mir 2012 13 session #4
Mir 2012 13 session #4Mir 2012 13 session #4
Mir 2012 13 session #4
 
5. testing differences
5. testing differences5. testing differences
5. testing differences
 
Top 10 Data Science Practioner Pitfalls - Mark Landry
Top 10 Data Science Practioner Pitfalls - Mark LandryTop 10 Data Science Practioner Pitfalls - Mark Landry
Top 10 Data Science Practioner Pitfalls - Mark Landry
 
Statistical Learning and Model Selection module 2.pptx
Statistical Learning and Model Selection module 2.pptxStatistical Learning and Model Selection module 2.pptx
Statistical Learning and Model Selection module 2.pptx
 
Statistical tests
Statistical testsStatistical tests
Statistical tests
 
Crash Course in A/B testing
Crash Course in A/B testingCrash Course in A/B testing
Crash Course in A/B testing
 
Top 10 Data Science Practitioner Pitfalls
Top 10 Data Science Practitioner PitfallsTop 10 Data Science Practitioner Pitfalls
Top 10 Data Science Practitioner Pitfalls
 
Data science toolkit for product managers
Data science toolkit for product managers Data science toolkit for product managers
Data science toolkit for product managers
 
Data Science Toolkit for Product Managers
Data Science Toolkit for Product ManagersData Science Toolkit for Product Managers
Data Science Toolkit for Product Managers
 
Performance Metrics, Baseline Model, and Hyper Parameter
Performance Metrics, Baseline Model, and Hyper ParameterPerformance Metrics, Baseline Model, and Hyper Parameter
Performance Metrics, Baseline Model, and Hyper Parameter
 
신뢰할 수 있는 A/B 테스트를 위해 알아야 할 것들
신뢰할 수 있는 A/B 테스트를 위해 알아야 할 것들신뢰할 수 있는 A/B 테스트를 위해 알아야 할 것들
신뢰할 수 있는 A/B 테스트를 위해 알아야 할 것들
 
Bad metric, bad! - Joseph Ours
Bad metric, bad! - Joseph OursBad metric, bad! - Joseph Ours
Bad metric, bad! - Joseph Ours
 
Bad metric, bad!
Bad metric, bad!Bad metric, bad!
Bad metric, bad!
 
Lecture 12 binary classifier confusion matrix
Lecture 12 binary classifier confusion matrixLecture 12 binary classifier confusion matrix
Lecture 12 binary classifier confusion matrix
 

More from BigML, Inc

Digital Transformation and Process Optimization in Manufacturing
Digital Transformation and Process Optimization in ManufacturingDigital Transformation and Process Optimization in Manufacturing
Digital Transformation and Process Optimization in Manufacturing
BigML, Inc
 
DutchMLSchool 2022 - Automation
DutchMLSchool 2022 - AutomationDutchMLSchool 2022 - Automation
DutchMLSchool 2022 - Automation
BigML, Inc
 
DutchMLSchool 2022 - ML for AML Compliance
DutchMLSchool 2022 - ML for AML ComplianceDutchMLSchool 2022 - ML for AML Compliance
DutchMLSchool 2022 - ML for AML Compliance
BigML, Inc
 
DutchMLSchool 2022 - Multi Perspective Anomalies
DutchMLSchool 2022 - Multi Perspective AnomaliesDutchMLSchool 2022 - Multi Perspective Anomalies
DutchMLSchool 2022 - Multi Perspective Anomalies
BigML, Inc
 
DutchMLSchool 2022 - My First Anomaly Detector
DutchMLSchool 2022 - My First Anomaly Detector DutchMLSchool 2022 - My First Anomaly Detector
DutchMLSchool 2022 - My First Anomaly Detector
BigML, Inc
 
DutchMLSchool 2022 - Anomaly Detection
DutchMLSchool 2022 - Anomaly DetectionDutchMLSchool 2022 - Anomaly Detection
DutchMLSchool 2022 - Anomaly Detection
BigML, Inc
 
DutchMLSchool 2022 - History and Developments in ML
DutchMLSchool 2022 - History and Developments in MLDutchMLSchool 2022 - History and Developments in ML
DutchMLSchool 2022 - History and Developments in ML
BigML, Inc
 
DutchMLSchool 2022 - End-to-End ML
DutchMLSchool 2022 - End-to-End MLDutchMLSchool 2022 - End-to-End ML
DutchMLSchool 2022 - End-to-End ML
BigML, Inc
 
DutchMLSchool 2022 - A Data-Driven Company
DutchMLSchool 2022 - A Data-Driven CompanyDutchMLSchool 2022 - A Data-Driven Company
DutchMLSchool 2022 - A Data-Driven Company
BigML, Inc
 
DutchMLSchool 2022 - ML in the Legal Sector
DutchMLSchool 2022 - ML in the Legal SectorDutchMLSchool 2022 - ML in the Legal Sector
DutchMLSchool 2022 - ML in the Legal Sector
BigML, Inc
 
DutchMLSchool 2022 - Smart Safe Stadiums
DutchMLSchool 2022 - Smart Safe StadiumsDutchMLSchool 2022 - Smart Safe Stadiums
DutchMLSchool 2022 - Smart Safe Stadiums
BigML, Inc
 
DutchMLSchool 2022 - Process Optimization in Manufacturing Plants
DutchMLSchool 2022 - Process Optimization in Manufacturing PlantsDutchMLSchool 2022 - Process Optimization in Manufacturing Plants
DutchMLSchool 2022 - Process Optimization in Manufacturing Plants
BigML, Inc
 
DutchMLSchool 2022 - Anomaly Detection at Scale
DutchMLSchool 2022 - Anomaly Detection at ScaleDutchMLSchool 2022 - Anomaly Detection at Scale
DutchMLSchool 2022 - Anomaly Detection at Scale
BigML, Inc
 
DutchMLSchool 2022 - Citizen Development in AI
DutchMLSchool 2022 - Citizen Development in AIDutchMLSchool 2022 - Citizen Development in AI
DutchMLSchool 2022 - Citizen Development in AI
BigML, Inc
 
Democratizing Object Detection
Democratizing Object DetectionDemocratizing Object Detection
Democratizing Object Detection
BigML, Inc
 
BigML Release: Image Processing
BigML Release: Image ProcessingBigML Release: Image Processing
BigML Release: Image Processing
BigML, Inc
 
Machine Learning in Retail: Know Your Customers' Customer. See Your Future
Machine Learning in Retail: Know Your Customers' Customer. See Your FutureMachine Learning in Retail: Know Your Customers' Customer. See Your Future
Machine Learning in Retail: Know Your Customers' Customer. See Your Future
BigML, Inc
 
Machine Learning in Retail: ML in the Retail Sector
Machine Learning in Retail: ML in the Retail SectorMachine Learning in Retail: ML in the Retail Sector
Machine Learning in Retail: ML in the Retail Sector
BigML, Inc
 
ML in GRC: Machine Learning in Legal Automation, How to Trust a Lawyerbot
ML in GRC: Machine Learning in Legal Automation, How to Trust a LawyerbotML in GRC: Machine Learning in Legal Automation, How to Trust a Lawyerbot
ML in GRC: Machine Learning in Legal Automation, How to Trust a Lawyerbot
BigML, Inc
 
ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac...
ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac...ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac...
ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac...
BigML, Inc
 

More from BigML, Inc (20)

Digital Transformation and Process Optimization in Manufacturing
Digital Transformation and Process Optimization in ManufacturingDigital Transformation and Process Optimization in Manufacturing
Digital Transformation and Process Optimization in Manufacturing
 
DutchMLSchool 2022 - Automation
DutchMLSchool 2022 - AutomationDutchMLSchool 2022 - Automation
DutchMLSchool 2022 - Automation
 
DutchMLSchool 2022 - ML for AML Compliance
DutchMLSchool 2022 - ML for AML ComplianceDutchMLSchool 2022 - ML for AML Compliance
DutchMLSchool 2022 - ML for AML Compliance
 
DutchMLSchool 2022 - Multi Perspective Anomalies
DutchMLSchool 2022 - Multi Perspective AnomaliesDutchMLSchool 2022 - Multi Perspective Anomalies
DutchMLSchool 2022 - Multi Perspective Anomalies
 
DutchMLSchool 2022 - My First Anomaly Detector
DutchMLSchool 2022 - My First Anomaly Detector DutchMLSchool 2022 - My First Anomaly Detector
DutchMLSchool 2022 - My First Anomaly Detector
 
DutchMLSchool 2022 - Anomaly Detection
DutchMLSchool 2022 - Anomaly DetectionDutchMLSchool 2022 - Anomaly Detection
DutchMLSchool 2022 - Anomaly Detection
 
DutchMLSchool 2022 - History and Developments in ML
DutchMLSchool 2022 - History and Developments in MLDutchMLSchool 2022 - History and Developments in ML
DutchMLSchool 2022 - History and Developments in ML
 
DutchMLSchool 2022 - End-to-End ML
DutchMLSchool 2022 - End-to-End MLDutchMLSchool 2022 - End-to-End ML
DutchMLSchool 2022 - End-to-End ML
 
DutchMLSchool 2022 - A Data-Driven Company
DutchMLSchool 2022 - A Data-Driven CompanyDutchMLSchool 2022 - A Data-Driven Company
DutchMLSchool 2022 - A Data-Driven Company
 
DutchMLSchool 2022 - ML in the Legal Sector
DutchMLSchool 2022 - ML in the Legal SectorDutchMLSchool 2022 - ML in the Legal Sector
DutchMLSchool 2022 - ML in the Legal Sector
 
DutchMLSchool 2022 - Smart Safe Stadiums
DutchMLSchool 2022 - Smart Safe StadiumsDutchMLSchool 2022 - Smart Safe Stadiums
DutchMLSchool 2022 - Smart Safe Stadiums
 
DutchMLSchool 2022 - Process Optimization in Manufacturing Plants
DutchMLSchool 2022 - Process Optimization in Manufacturing PlantsDutchMLSchool 2022 - Process Optimization in Manufacturing Plants
DutchMLSchool 2022 - Process Optimization in Manufacturing Plants
 
DutchMLSchool 2022 - Anomaly Detection at Scale
DutchMLSchool 2022 - Anomaly Detection at ScaleDutchMLSchool 2022 - Anomaly Detection at Scale
DutchMLSchool 2022 - Anomaly Detection at Scale
 
DutchMLSchool 2022 - Citizen Development in AI
DutchMLSchool 2022 - Citizen Development in AIDutchMLSchool 2022 - Citizen Development in AI
DutchMLSchool 2022 - Citizen Development in AI
 
Democratizing Object Detection
Democratizing Object DetectionDemocratizing Object Detection
Democratizing Object Detection
 
BigML Release: Image Processing
BigML Release: Image ProcessingBigML Release: Image Processing
BigML Release: Image Processing
 
Machine Learning in Retail: Know Your Customers' Customer. See Your Future
Machine Learning in Retail: Know Your Customers' Customer. See Your FutureMachine Learning in Retail: Know Your Customers' Customer. See Your Future
Machine Learning in Retail: Know Your Customers' Customer. See Your Future
 
Machine Learning in Retail: ML in the Retail Sector
Machine Learning in Retail: ML in the Retail SectorMachine Learning in Retail: ML in the Retail Sector
Machine Learning in Retail: ML in the Retail Sector
 
ML in GRC: Machine Learning in Legal Automation, How to Trust a Lawyerbot
ML in GRC: Machine Learning in Legal Automation, How to Trust a LawyerbotML in GRC: Machine Learning in Legal Automation, How to Trust a Lawyerbot
ML in GRC: Machine Learning in Legal Automation, How to Trust a Lawyerbot
 
ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac...
ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac...ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac...
ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac...
 

Recently uploaded

[D3T1S04] Aurora PostgreSQL performance monitoring and troubleshooting by use...
[D3T1S04] Aurora PostgreSQL performance monitoring and troubleshooting by use...[D3T1S04] Aurora PostgreSQL performance monitoring and troubleshooting by use...
[D3T1S04] Aurora PostgreSQL performance monitoring and troubleshooting by use...
Amazon Web Services Korea
 
一比一原版(爱大文凭证书)美国爱荷华大学毕业证如何办理
一比一原版(爱大文凭证书)美国爱荷华大学毕业证如何办理一比一原版(爱大文凭证书)美国爱荷华大学毕业证如何办理
一比一原版(爱大文凭证书)美国爱荷华大学毕业证如何办理
ekehyz
 
SAP ANalytics Cloud -SAP SAC planning 22
SAP ANalytics Cloud -SAP SAC planning 22SAP ANalytics Cloud -SAP SAC planning 22
SAP ANalytics Cloud -SAP SAC planning 22
ramana4bw
 
AWS Cloud Technology and Services by Miguel Ángel Rodríguez Anticona.pdf
AWS Cloud Technology and Services by Miguel Ángel Rodríguez Anticona.pdfAWS Cloud Technology and Services by Miguel Ángel Rodríguez Anticona.pdf
AWS Cloud Technology and Services by Miguel Ángel Rodríguez Anticona.pdf
Miguel Ángel Rodríguez Anticona
 
AIRLINE_SATISFACTION_Data Science Solution on Azure
AIRLINE_SATISFACTION_Data Science Solution on AzureAIRLINE_SATISFACTION_Data Science Solution on Azure
AIRLINE_SATISFACTION_Data Science Solution on Azure
SanelaNikodinoska1
 
NEW THYROID DISEASES CLASSIFICATION USING ML.docx
NEW THYROID DISEASES CLASSIFICATION USING ML.docxNEW THYROID DISEASES CLASSIFICATION USING ML.docx
NEW THYROID DISEASES CLASSIFICATION USING ML.docx
dharugayu13475
 
[D3T1S02] Aurora Limitless Database Introduction
[D3T1S02] Aurora Limitless Database Introduction[D3T1S02] Aurora Limitless Database Introduction
[D3T1S02] Aurora Limitless Database Introduction
Amazon Web Services Korea
 
Niagara College degree offer diploma Transcript
Niagara College  degree offer diploma TranscriptNiagara College  degree offer diploma Transcript
Niagara College degree offer diploma Transcript
taqyea
 
Madurai @Call @Girls Whatsapp 0000000000 With High Profile Offer 25%
Madurai @Call @Girls Whatsapp 0000000000 With High Profile Offer 25%Madurai @Call @Girls Whatsapp 0000000000 With High Profile Offer 25%
Madurai @Call @Girls Whatsapp 0000000000 With High Profile Offer 25%
punebabes1
 
[D3T1S03] Amazon DynamoDB design puzzlers
[D3T1S03] Amazon DynamoDB design puzzlers[D3T1S03] Amazon DynamoDB design puzzlers
[D3T1S03] Amazon DynamoDB design puzzlers
Amazon Web Services Korea
 
Simon Fraser University degree offer diploma Transcript
Simon Fraser University  degree offer diploma TranscriptSimon Fraser University  degree offer diploma Transcript
Simon Fraser University degree offer diploma Transcript
taqyea
 
@Call @Girls in Kolkata 💋😂 XXXXXXXX 👄👄 Hello My name Is Kamli I am Here meet you
@Call @Girls in Kolkata 💋😂 XXXXXXXX 👄👄 Hello My name Is Kamli I am Here meet you@Call @Girls in Kolkata 💋😂 XXXXXXXX 👄👄 Hello My name Is Kamli I am Here meet you
@Call @Girls in Kolkata 💋😂 XXXXXXXX 👄👄 Hello My name Is Kamli I am Here meet you
Delhi Call Girls
 
[D3T2S03] Data&AI Roadshow 2024 - Amazon DocumentDB 실습
[D3T2S03] Data&AI Roadshow 2024 - Amazon DocumentDB 실습[D3T2S03] Data&AI Roadshow 2024 - Amazon DocumentDB 실습
[D3T2S03] Data&AI Roadshow 2024 - Amazon DocumentDB 실습
Amazon Web Services Korea
 
❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN FAST RESULTS CHART KALYAN MATKA MATKA RE...
❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN FAST RESULTS CHART KALYAN MATKA MATKA RE...❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN FAST RESULTS CHART KALYAN MATKA MATKA RE...
❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN FAST RESULTS CHART KALYAN MATKA MATKA RE...
#kalyanmatkaresult #dpboss #kalyanmatka #satta #matka #sattamatka
 
bcme welcome and ground rule required for bcme course (1).pptx
bcme welcome and ground rule required for bcme course (1).pptxbcme welcome and ground rule required for bcme course (1).pptx
bcme welcome and ground rule required for bcme course (1).pptx
BINITADASH3
 
一比一原版英国埃塞克斯大学毕业证(essex毕业证书)如何办理
一比一原版英国埃塞克斯大学毕业证(essex毕业证书)如何办理一比一原版英国埃塞克斯大学毕业证(essex毕业证书)如何办理
一比一原版英国埃塞克斯大学毕业证(essex毕业证书)如何办理
qemnpg
 
@Call @Girls Bandra phone 9920874524 You Are Serach A Beautyfull Dolle come here
@Call @Girls Bandra phone 9920874524 You Are Serach A Beautyfull Dolle come here@Call @Girls Bandra phone 9920874524 You Are Serach A Beautyfull Dolle come here
@Call @Girls Bandra phone 9920874524 You Are Serach A Beautyfull Dolle come here
SARITA PANDEY
 
Cómo hemos implementado semántica de "Exactly Once" en nuestra base de datos ...
Cómo hemos implementado semántica de "Exactly Once" en nuestra base de datos ...Cómo hemos implementado semántica de "Exactly Once" en nuestra base de datos ...
Cómo hemos implementado semántica de "Exactly Once" en nuestra base de datos ...
javier ramirez
 
@Call @Girls in Bangalore 🚒 0000000000 🚒 Tanu Sharma Best High Class Bangalor...
@Call @Girls in Bangalore 🚒 0000000000 🚒 Tanu Sharma Best High Class Bangalor...@Call @Girls in Bangalore 🚒 0000000000 🚒 Tanu Sharma Best High Class Bangalor...
@Call @Girls in Bangalore 🚒 0000000000 🚒 Tanu Sharma Best High Class Bangalor...
ritu36392
 
Daryaganj @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Yogita Mehra Top Model Safe
Daryaganj @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Yogita Mehra Top Model SafeDaryaganj @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Yogita Mehra Top Model Safe
Daryaganj @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Yogita Mehra Top Model Safe
nehadubay1
 

Recently uploaded (20)

[D3T1S04] Aurora PostgreSQL performance monitoring and troubleshooting by use...
[D3T1S04] Aurora PostgreSQL performance monitoring and troubleshooting by use...[D3T1S04] Aurora PostgreSQL performance monitoring and troubleshooting by use...
[D3T1S04] Aurora PostgreSQL performance monitoring and troubleshooting by use...
 
一比一原版(爱大文凭证书)美国爱荷华大学毕业证如何办理
一比一原版(爱大文凭证书)美国爱荷华大学毕业证如何办理一比一原版(爱大文凭证书)美国爱荷华大学毕业证如何办理
一比一原版(爱大文凭证书)美国爱荷华大学毕业证如何办理
 
SAP ANalytics Cloud -SAP SAC planning 22
SAP ANalytics Cloud -SAP SAC planning 22SAP ANalytics Cloud -SAP SAC planning 22
SAP ANalytics Cloud -SAP SAC planning 22
 
AWS Cloud Technology and Services by Miguel Ángel Rodríguez Anticona.pdf
AWS Cloud Technology and Services by Miguel Ángel Rodríguez Anticona.pdfAWS Cloud Technology and Services by Miguel Ángel Rodríguez Anticona.pdf
AWS Cloud Technology and Services by Miguel Ángel Rodríguez Anticona.pdf
 
AIRLINE_SATISFACTION_Data Science Solution on Azure
AIRLINE_SATISFACTION_Data Science Solution on AzureAIRLINE_SATISFACTION_Data Science Solution on Azure
AIRLINE_SATISFACTION_Data Science Solution on Azure
 
NEW THYROID DISEASES CLASSIFICATION USING ML.docx
NEW THYROID DISEASES CLASSIFICATION USING ML.docxNEW THYROID DISEASES CLASSIFICATION USING ML.docx
NEW THYROID DISEASES CLASSIFICATION USING ML.docx
 
[D3T1S02] Aurora Limitless Database Introduction
[D3T1S02] Aurora Limitless Database Introduction[D3T1S02] Aurora Limitless Database Introduction
[D3T1S02] Aurora Limitless Database Introduction
 
Niagara College degree offer diploma Transcript
Niagara College  degree offer diploma TranscriptNiagara College  degree offer diploma Transcript
Niagara College degree offer diploma Transcript
 
Madurai @Call @Girls Whatsapp 0000000000 With High Profile Offer 25%
Madurai @Call @Girls Whatsapp 0000000000 With High Profile Offer 25%Madurai @Call @Girls Whatsapp 0000000000 With High Profile Offer 25%
Madurai @Call @Girls Whatsapp 0000000000 With High Profile Offer 25%
 
[D3T1S03] Amazon DynamoDB design puzzlers
[D3T1S03] Amazon DynamoDB design puzzlers[D3T1S03] Amazon DynamoDB design puzzlers
[D3T1S03] Amazon DynamoDB design puzzlers
 
Simon Fraser University degree offer diploma Transcript
Simon Fraser University  degree offer diploma TranscriptSimon Fraser University  degree offer diploma Transcript
Simon Fraser University degree offer diploma Transcript
 
@Call @Girls in Kolkata 💋😂 XXXXXXXX 👄👄 Hello My name Is Kamli I am Here meet you
@Call @Girls in Kolkata 💋😂 XXXXXXXX 👄👄 Hello My name Is Kamli I am Here meet you@Call @Girls in Kolkata 💋😂 XXXXXXXX 👄👄 Hello My name Is Kamli I am Here meet you
@Call @Girls in Kolkata 💋😂 XXXXXXXX 👄👄 Hello My name Is Kamli I am Here meet you
 
[D3T2S03] Data&AI Roadshow 2024 - Amazon DocumentDB 실습
[D3T2S03] Data&AI Roadshow 2024 - Amazon DocumentDB 실습[D3T2S03] Data&AI Roadshow 2024 - Amazon DocumentDB 실습
[D3T2S03] Data&AI Roadshow 2024 - Amazon DocumentDB 실습
 
❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN FAST RESULTS CHART KALYAN MATKA MATKA RE...
❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN FAST RESULTS CHART KALYAN MATKA MATKA RE...❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN FAST RESULTS CHART KALYAN MATKA MATKA RE...
❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN FAST RESULTS CHART KALYAN MATKA MATKA RE...
 
bcme welcome and ground rule required for bcme course (1).pptx
bcme welcome and ground rule required for bcme course (1).pptxbcme welcome and ground rule required for bcme course (1).pptx
bcme welcome and ground rule required for bcme course (1).pptx
 
一比一原版英国埃塞克斯大学毕业证(essex毕业证书)如何办理
一比一原版英国埃塞克斯大学毕业证(essex毕业证书)如何办理一比一原版英国埃塞克斯大学毕业证(essex毕业证书)如何办理
一比一原版英国埃塞克斯大学毕业证(essex毕业证书)如何办理
 
@Call @Girls Bandra phone 9920874524 You Are Serach A Beautyfull Dolle come here
@Call @Girls Bandra phone 9920874524 You Are Serach A Beautyfull Dolle come here@Call @Girls Bandra phone 9920874524 You Are Serach A Beautyfull Dolle come here
@Call @Girls Bandra phone 9920874524 You Are Serach A Beautyfull Dolle come here
 
Cómo hemos implementado semántica de "Exactly Once" en nuestra base de datos ...
Cómo hemos implementado semántica de "Exactly Once" en nuestra base de datos ...Cómo hemos implementado semántica de "Exactly Once" en nuestra base de datos ...
Cómo hemos implementado semántica de "Exactly Once" en nuestra base de datos ...
 
@Call @Girls in Bangalore 🚒 0000000000 🚒 Tanu Sharma Best High Class Bangalor...
@Call @Girls in Bangalore 🚒 0000000000 🚒 Tanu Sharma Best High Class Bangalor...@Call @Girls in Bangalore 🚒 0000000000 🚒 Tanu Sharma Best High Class Bangalor...
@Call @Girls in Bangalore 🚒 0000000000 🚒 Tanu Sharma Best High Class Bangalor...
 
Daryaganj @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Yogita Mehra Top Model Safe
Daryaganj @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Yogita Mehra Top Model SafeDaryaganj @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Yogita Mehra Top Model Safe
Daryaganj @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Yogita Mehra Top Model Safe
 

MLSEV Virtual. Evaluations

  • 2. #MLSEV 2 Evaluations All models are wrong, but some are useful Charles Parker VP Algorithms, BigML, Inc
  • 3. #MLSEV 3 My Model Is Wonderful • I trained a model on my data and it seems really marvelous! • How do you know for sure? • To quantify your model’s performance, you must evaluate it • This is not optional. If you don’t do this and do it right, you’ll have problems
  • 4. #MLSEV 4 Proper Evaluation • Choosing the right metric • Testing on the right data (which might be harder than you think) • Replicating your tests
  • 6. #MLSEV 6 Proper Evaluation • The most basic workflow for model evaluation is: • Split your data into two sets, training and testing • Train a model on the training data • Measure the “performance” of the model on the testing data • If your training data is representative of what you will see in the future, that’s the performance you should get out of your model • What do we mean by “performance”? This is where you come in.
  • 7. #MLSEV 7 Medical Testing Example • Let’s say we develop an ML model that can diagnose a disease • About 1 in 1000 people who are tested by the model turn out to have the disease • Call the people who have the disease “sick” and people who don’t have it “well”. • How well do we do on a test set?
  • 8. #MLSEV 8 Some Terminology We’ll define the sick people as “positive” and the well people as “negative" • “True Positive”: You’re sick and the model diagnosed you as sick • “False Positive”: You’re well, but the model diagnosed you as sick • “True Negative”: You’re well, and the model diagnosed you as well • “False Negative”: You’re sick, but the model diagnosed you as well The model is correct in the “true” cases, and incorrect in the “false” cases
  • 9. #MLSEV 9 Accuracy TP + TN Total • “Percentage correct” - like an exam • If Accuracy = 1 then no mistakes • If Accuracy = 0 then all mistakes • Intuitive but not always useful • Watch out for unbalanced classes! • Remember, only 1 in 1000 have the disease • A silly model which always predicts “well” is 99.9% accurate
  • 10. #MLSEV 10 Precision Predicted “Well” Predicted “Sick” • How well did we do when we predicted someone was sick? • A test with high precision has few false positives • Precision of 1.0 indicates that everyone who we predict is sick is actually sick • What about people who we predict are well? TP TP + FP = 0.6 Sick Person Well Person
  • 11. #MLSEV 11 Recall Predicted “Well” Predicted “Sick” • How well did we do when someone was actually sick? • A test with high recall indicates few false negatives • Recall of 1.0 indicates that everyone who was actually sick was correctly diagnosed • But this doesn’t say anything about false positives! TP TP + FN = 0.75 Sick Person Well Person
  • 12. #MLSEV 12 Trade Offs • We can “trivially maximize” both measures • If you pick the sickest person and only label them sick and no one else, you can probably get perfect precision • If you label everyone sick, you are guaranteed perfect recall • The unfortunate catch is that if you make one perfect, the other is terrible, so you want a model that has both high precision and recall • This is what quantities like the F1 score and Phi Coefficient try to do
  • 13. #MLSEV 13 Cost Matrix • In many cases, the consequences of a true positive and a false positive are very different • You can define “costs” for each type of mistake • Total Cost = TP * TP_Cost + FP * FP_Cost • Here, we are willing to accept lots of false positives in exchange for high recall • What if a positive diagnosis resulted in expensive or painful treatment? Classified Sick Classified Well Actually Sick 0 100 Actually Well 1 0 Cost matrix for medical diagnosis problem
  • 14. #MLSEV 14 Operating Thresholds • Most classifiers don’t output a prediction. Instead they give a “score” for each class • The prediction you assign to an instance is usually a function of a threshold on this score (e.g., if the score is over 0.5, predict true) • You can experiment with an ROC curve to see how your metrics will change if you change the threshold • Lowering the threshold means you are more likely to predict the positive class, which improves recall but introduces false positives • Increasing the threshold means you predict the positive class less often (you are more “picky”), which will probably increase precision but lower recall.
  • 17. #MLSEV 17 Why Hold Out Data? • Why do we split the dataset into training and testing sets? Why do we always (always, always) test on data that the model training process did not see? • Because machine learning algorithms are good at memorizing data • We don’t care how well the model does on data it has already seen because it probably won’t see that data again • Holding out some of the test data is simulating the data the model will see in the future
  • 18. #MLSEV 18 Memorization plasma glucose bmi diabetes pedigree age diabetes 148 33,6 0,627 50 TRUE 85 26,6 0,351 31 FALSE 183 23,3 0,672 32 TRUE 89 28,1 0,167 21 FALSE 137 43,1 2,288 33 TRUE 116 25,6 0,201 30 FALSE 78 31 0,248 26 TRUE 115 35,3 0,134 29 FALSE 197 30,5 0,158 53 TRUE Training Evaluating plasma glucose bmi diabetes pedigree age diabetes 148 33,6 0,627 50 ? 85 26,6 0,351 31 ? • You don’t even need meaningful features; the person’s name would be enough • “Oh right, Bob. I know him. Yes, he certainly has diabetes” • As long as there are no duplicate names in the dataset, it's a 100% accurate model
  • 19. #MLSEV 19 Well, That Was Easy • Okay, so I’m not testing on the training data, so I’m good, right? NO NO NO • You also have to worry about information leakage between training and test data. • What is this? Let’s try to predict the daily closing price of the stock market • What happens if you hold out 10 random days from your dataset? • What if you hold out the last 10 days?
  • 20. #MLSEV 20 Traps Everywhere! • This is common when you have time-distributed data, but can also happen in other instances: • Let’s say we have a dataset of 10,000 pictures from 20 people, each labeled with the year it which it was taken • We want to predict the year from the image • What happens if we hold out random data? • Solution: Hold out users instead
  • 21. #MLSEV 21 How Do We Avoid This? • It’s a terrible problem, because if you make the mistake you will get results that are too good, and be inclined to believe them • So be careful? Do you have: • Data where points can be grouped in time (by week or by month)? • Data where points can be grouped by user (each point is an action a user took) • Data where points can be grouped by location (each point is a day of sales at a particular store) • Even if you’re suspicious that points from the group might leak information to one another, try a test where you hold out a few groups (months, users, locations) and train on the rest
  • 22. #MLSEV 22 Do It Again!
  • 23. #MLSEV 23 One Test is Not Enough • Even if you have a correct holdout, you still need to test more than once. • Every result you get from any test is a result of randomness • Randomness from the Data: • The dataset you have is a finite number of points drawn from an infinite distribution • The split you make between training and test data is done at random • Randomness of the algorithm • The ordering of the data might give different results • The best performing algorithms (random forests, deepnets) have randomness built-in • With just one result, you might get lucky
  • 24. #MLSEV 24 One Test is Not Enough Performance Really nice result!
  • 25. #MLSEV 25 One Test is Not Enough Performance Really nice result! Likelihood But really just a lucky one
  • 26. #MLSEV 26 Comparing Models is Even Worse
  • 27. #MLSEV 27 Comparing Models is Even Worse
  • 28. #MLSEV 28 Comparing Models is Even Worse First digit of random seed
  • 29. #MLSEV 29 Please, Sir, Can I Have Some More? • Always do more than one test! • For each test, try to vary all sources of randomness that you can (change the seeds of all random processes) to try to “experience” as much variance as you can • Cross-validation (stratifying is great, monte-carlo can be a useful simplification) • Don’t just average the results! The variance is important!
  • 30. #MLSEV 30 Summing Up • Choose the metric that makes sense for your problem • Use held out data for testing and watch out for information leakage • Always do more than one test, varying all sources of randomness that you have control over!