MLSEV Virtual. Automating Model Selection, by BigML, Inc
1) Bayesian parameter optimization uses machine learning to predict the performance of untrained models based on parameters from previous models to efficiently search the parameter space.
2) However, there are still important issues like choosing the right evaluation metric, ensuring no information leakage between training and test data, and selecting the appropriate model for the problem and available data.
3) Automated model selection requires sufficient data to make accurate predictions; with insufficient data, the process can fail.
MLSEV Virtual. Supervised vs Unsupervised, by BigML, Inc
Supervised vs Unsupervised Learning Techniques, by Charles Parker, Vice President of Machine Learning Algorithms at BigML.
*MLSEV 2020: Virtual Conference.
State of the Art in Machine Learning, by Thomas Dietterich, Distinguished Professor Emeritus in the School of EECS at Oregon State University and Chief Scientist of BigML.
*MLSEV 2020: Virtual Conference.
Searching for Anomalies, by Thomas Dietterich, Distinguished Professor Emeritus in the School of EECS at Oregon State University and Chief Scientist of BigML.
*MLSEV 2020: Virtual Conference.
This document discusses max-diff (maximum difference) analysis, which is a method for collecting preference data. It covers when to use max-diff, experimental design considerations, problems with simple "counting" analysis, using latent class analysis instead, and computing preference shares from max-diff data. Latent class analysis addresses issues with counting analysis by accounting for experimental design, inconsistencies in preferences, and differences between individuals.
Statistics in the age of data science, issues you cannot ignore, by Turi, Inc.
This document discusses issues in statistics that data scientists can and cannot ignore when working with large datasets. It begins by outlining the talk and defining key terms in data science. It then explains that model assessment, such as estimating model performance on new data, becomes easier with more data as statistical adjustments are not needed. However, more data and variables are not always better, as noise, collinearity, and overfitting can still occur. Several examples are given where common machine learning algorithms can be fooled into achieving high accuracy on training data even when the target variable is random. The conclusion emphasizes that data science, statistics, and domain expertise each provide unique perspectives, and effective teams need to understand all views.
This document discusses machine learning, including differentiating it from artificial intelligence and deep learning. It covers the need for machine learning due to increasing data volumes and how machine learning processes work through experiences to build rules and logic from data. The types of machine learning are described as supervised learning, unsupervised learning, and reinforcement learning. Examples of machine learning applications like recommendation engines and spam filters are also provided.
Introduction to machine learning. Basics of machine learning. Overview of machine learning. Linear regression. Logistic regression. Cost function. Gradient descent. Sensitivity, specificity. Model selection.
This document discusses using machine learning to optimize future outcomes rather than just predict them. It explains that randomized studies are needed to accurately predict the effects of actions, but sometimes this is not possible with observational data alone. The document proposes techniques like transfer learning, common support analysis, and generative adversarial networks to help evaluate strategies without randomized trials by expanding the available data.
ML Drift - How to find issues before they become problems, by Amy Hodler
Over time, our AI predictions degrade. Full Stop.
Whether it's concept drift, where the relationship between our data and what we're trying to predict has changed, or data drift, where our production data no longer resembles the historical training data, identifying meaningful ML drift versus spurious or acceptable drift is tedious. Not to mention the difficulty of uncovering which ML features are the source of poorer accuracy.
This session looked at the key types of machine learning drift and how to catch them before they become a problem.
Data Science Methodology for Analytics and Solution Implementation, by Rupak Roy
Answers what analytics is, why it is so important, and how we can conduct a successful analysis with solution implementation, and much more. Let me know if anything is required. Happy to help. Ping me on Google: #bobrupakroy. Talk soon! Enjoy Data Science.
Introduction to MaxDiff Scaling of Importance - Parametric Marketing Slides, by QuestionPro
This document provides an overview of MaxDiff, a technique for evaluating preferences that asks respondents to choose best and worst options from sets. It notes limitations of traditional rating scales like scale bias and lack of constraints. MaxDiff forces trade-offs and provides richer data than ratings. Questions present lists and ask for most/least important. Results can be analyzed simply by counting choices or more advanced techniques can provide respondent-level utilities. The document provides examples and tips for effective MaxDiff surveys.
Testing a movingtarget_quest_dynatrace, by Peter Varhol
This document discusses challenges in testing machine learning and adaptive systems. It begins by explaining that these systems are non-deterministic and do not always produce the same outputs for a given input. Traditional testing approaches cannot be used because outputs are not predefined. The document then explores challenges like defining requirements, determining what constitutes a bug, and validating results without a single correct answer. It argues that testing objectives, scenarios, and acceptable outcomes need to be clearly defined. Accuracy alone may not be a useful metric, and non-deterministic results are expected. Overall, the document advocates understanding how these systems work and setting measurable criteria to assess quality.
LKNA 2014 Risk and Impediment Analysis and Analytics, by Troy Magennis
Software risk impact is more predictable than you might think. This session discusses similarities of uncertainty in various industries and relates this back to how we can measure and analyze impediments and risk for agile software teams.
This document provides an overview of machine learning concepts and example algorithms. It discusses how machine learning systems can learn from experience without explicit programming. It then covers classification and regression problems and provides examples of random forests and Gaussian processes algorithms. The document also discusses feature learning with examples of autoencoders and PCA. Finally, it discusses practical considerations for applying machine learning, including the importance of data quality, data pipelines, managing error risk, and institutionalizing machine learning applications.
While computers are often used to generate random numbers, they are not truly random and instead produce pseudorandom values following a program. There are better ways to generate random data that is equally likely and truly random. Simulations are used to model and investigate random processes, events, and questions when collecting real data is difficult. A successful simulation involves identifying components, modeling outcomes, defining response variables, running multiple trials, analyzing results, and drawing conclusions. Care must be taken to avoid overstating findings or confusing simulation results with reality.
Carma internet research module sample size considerations, by Syracuse University
This document discusses key considerations for determining sample size in research studies, including response rate, attrition, statistical power, and margin of error. It recommends hoping to achieve a 50% response rate but planning for 30%, and using power analysis tools to estimate sample size needed based on the expected effect size. Margin of error calculators can also help determine the needed sample size for projecting results to the larger population. An overall sampling plan should account for all these factors.
Solutions Manual for Discrete Event System Simulation 5th Edition by Banks, uploaded by LanaMcdaniel
Full download : https://downloadlink.org/p/solutions-manual-for-discrete-event-system-simulation-5th-edition-by-banks/
Solutions Manual for Discrete Event System Simulation 5th Edition by Banks
This document summarizes a presentation on model evaluation given at the 4th annual Valencian Summer School in Machine Learning. It discusses the importance of evaluating models to understand how well they will perform on new data and identify mistakes. Various evaluation metrics are introduced like accuracy, precision, recall, F1 score, and Phi coefficient. The dangers of evaluating on training data are explained, and techniques like train-test splits and cross-validation are recommended to get less optimistic evaluations. Regression metrics like MAE, MSE, and R-squared error are also covered. Different evaluation techniques for specific problem types like imbalanced classification, time series forecasting, and model selection are discussed.
The document discusses several key concepts in machine learning including reinforcement learning, evolutionary learning, features, training/test/validation sets, overfitting, underfitting, and clustering. Reinforcement learning involves training an agent through rewards/punishments without being directly told what to do. Evolutionary learning follows biological evolution principles of inheritance, variation, and selection. Features represent attributes of an object encoded in a vector. Training/test/validation sets are used to develop and evaluate models. Overfitting and underfitting refer to models fitting the training data too closely or not closely enough, respectively. Clustering groups similar objects together.
A Pocket Guide in Machine Learning for Beginners, by Rajat Gupta
Visual aids to get started with machine learning. This guide presents steps from collecting data to deploying your model with basic machine learning algorithms and key points to remember.
Testing for cognitive bias in AI systems, by Peter Varhol
The document discusses how machine learning systems can produce biased results based on issues with the training data used, and provides examples of how biases have emerged in commercial AI systems. It then outlines approaches for testing machine learning systems to identify potential biases, including understanding the training data, defining objective success criteria, and testing with diverse edge cases. The challenges of addressing biases that emerge from limitations in the data or human decisions are also examined.
Module 9: Natural Language Processing Part 2, by Sara Hooker
This document provides an overview of natural language processing techniques for gathering and analyzing text data, including web scraping, topic modeling, and clustering. It discusses gathering text data through APIs or web scraping using tools like Beautiful Soup. It also covers representing text numerically using bag-of-words and TF-IDF, visualizing documents in multi-dimensional spaces based on word frequencies, and using k-means clustering to group similar documents together based on cosine or Euclidean distances between their vectors. The document uses examples of Netflix movie descriptions to illustrate these NLP techniques.
The document discusses software tools used for analyzing data. It describes how hardware supports software analysis by providing storage, processing power, and specialized components depending on the task. Key software analysis methods covered include searching, sorting, modeling and simulation, creating "what if" scenarios, and generating charts and graphs. The document also notes that specialized software exists for analyzing non-numeric data like images, video and audio. It concludes by mentioning some social and ethical issues that can arise from extensive data analysis, such as loss of privacy and profiling.
GIAF UK Winter 2015 - Analytical techniques: A practical guide to answering b..., by Lauren Cormack
The document provides an overview of analytical techniques for answering business questions. It discusses the four pillars of analytics: data munging, reporting and visualization, analysis and insights, and applied analytics. Specific topics covered include A/B testing best practices, reporting and visualization tools like Tableau, using multiple data sources for analysis, and best practices for data analysis and communication. The document is intended as a practical guide for those working in analytics to help tackle business issues.
This document discusses various machine learning model validation techniques and ensemble methods such as bagging and boosting. It defines key concepts like overfitting, underfitting, bias-variance tradeoff, and different validation metrics. Cross validation techniques like k-fold and bootstrap are explained as ways to estimate model performance on unseen data. Bagging creates multiple models on resampled data and averages their predictions to reduce variance. Boosting iteratively adjusts weights of misclassified observations to build strong models, but risks overfitting. Gradient boosting and XGBoost are powerful ensemble methods.
Statistics for UX Professionals - Jessica Cameron, by User Vision
Are you looking to expand your research toolkit to include some quantitative methods, such as survey research or A/B testing? Have you been asked to collect some usability metrics, but aren’t sure how best to go about that? Or do you just want to be more aware of all of the UX research possibilities? If your answer to any of those questions is yes, then this session is for you.
You may know that without statistics, you won’t know if A is really better than B, if users are truly more satisfied with your new site than with your old one, or which changes to your site have actually impacted conversion rates. However, statistics can also help you figure out how to report satisfaction and other metrics you collect during usability tests. And they’re essential for making sense of the results of quantitative usability tests.
This session will focus on the statistical concepts that are most useful for UX researchers. It won’t make you a quant, but it will give you a good grounding in quantitative methods and reporting. (For example, you will learn what a margin of error is, how to report quantitative data collected during a usability test - and how not to - and how many people you really need to fill out a survey.)
R - what do the numbers mean? #RStats This is the presentation for my Demo at Orlando Live60 AILIve. We go through statistics interpretation with examples
Introduction, Terminology and concepts, Introduction to statistics, Central tendencies and distributions, Variance, Distribution properties and arithmetic, Samples/CLT, Basic machine learning algorithms, Linear regression, SVM, Naive Bayes
The document discusses marketing research methods, including sampling techniques, statistical significance, and customer databases. It covers probability and non-probability sampling methods, how to determine sample size, and factors that affect statistical significance. It also addresses advantages and limitations of customer databases, data analysis techniques, and principles of profiling and segmenting customers.
This document introduces difference testing and parametric and non-parametric tests. It discusses the assumptions of parametric tests including random sampling, normally distributed interval/ratio data, and equal variances. Non-parametric tests like Wilcoxon and Mann-Whitney U are introduced as alternatives. Key principles of difference testing like independent vs dependent variables are explained. Steps for t-tests, paired t-tests, and non-parametric equivalents are outlined along with interpreting SPSS outputs and dealing with issues of significance. Factors like meaningful vs statistical significance and one-tailed vs two-tailed tests are also briefly covered.
Top 10 Data Science Practitioner Pitfalls - Mark Landry, by Sri Ambati
Over-fitting, misread data, NAs, collinear column elimination, and other common issues play havoc in the day of a practicing data scientist. In this talk, we review the top 10 common pitfalls and steps to avoid them. #h2ony
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
Statistical Learning and Model Selection module 2.pptx, by nagarajan740445
Statistical learning theory was introduced in the 1960s as a problem of function estimation from data. In the 1990s, new learning algorithms like support vector machines were proposed based on the developed theory, making statistical learning theory a tool for both theoretical analysis and creating practical algorithms. Cross-validation techniques like k-fold and leave-one-out cross-validation help estimate a model's predictive performance and avoid overfitting by splitting data into training and test sets. The goal is to find the right balance between bias and variance to minimize prediction error on new data.
This document describes how to perform a chi-square test to determine if two genes are independently assorting or linked. It explains that for a two-point testcross of a heterozygote individual, you expect a 25% ratio for each of the four possible offspring genotypes if the genes are independent. The chi-square test compares observed vs. expected offspring ratios. It notes that the standard test assumes equal segregation of alleles, which may not always be true.
- A/B testing involves randomized controlled experiments comparing a treatment group to a control group. However, there are various sources of variability beyond just the treatment that must be accounted for.
- Good experiment design aims to minimize bias and convert it to random noise through randomization. The role of statistics is to quantify the magnitude of the treatment effect compared to the noise.
- Classical hypothesis testing approaches the problem as "assuming no difference and seeing if the data contradicts that". However, concerns with this approach include overreliance on p-values and not addressing multiple testing.
- Bayesian approaches consider the probability of there being a difference given the data, but require specifying a prior probability which is challenging. Alternatives like multi-
Top 10 Data Science Practitioner Pitfalls, by Sri Ambati
Over-fitting, misread data, NAs, collinear column elimination, and other common issues play havoc in the day of a practicing data scientist. In this talk, Mark Landry, one of the world's leading Kagglers, will review the top 10 common pitfalls and steps to avoid them.
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
This talk addresses product managers and discusses basics of statistics and analytics and ways to use them effectively in their products.
Video: https://youtu.be/Rsrp040DYKg (orientation is fixed after a few minutes)
April 22, 2017 - Product Folks! Meetup Amman, Jordan
This document discusses various performance metrics used to evaluate machine learning models, with a focus on classification metrics. It defines key metrics like accuracy, precision, recall, and specificity using a cancer detection example. Accuracy is only useful when classes are balanced, while precision captures true positives and recall focuses on minimizing false negatives. The document emphasizes that the appropriate metric depends on the problem and whether minimizing false positives or false negatives is more important. Confusion matrices are also introduced as a way to visualize model performance.
What You Need to Know for Trustworthy A/B Tests, by Minho Lee
Slides for the 프롬 special lecture (2021-09-04).
---
Many people say that A/B testing is important.
But what makes us trust A/B tests enough to hand our decisions over to them?
An A/B test is not a magic tool that produces results just by being run.
We will look at what further thought is needed to get experimental results you can trust.
Metrics have always been used in corporate sectors, primarily as a way to gain insight into what is an otherwise invisible world. Not only that, “standards bodies”, such as CMMi, require metrics to achieve a certain maturity level. These two factors tend to drive organizations to blindly adopt a set of metrics as a way of satisfying some process transparency requirement. Rarely do any organizations apply any statistical or scientific thought behind the measures and metrics they establish and interpret. In this talk, we’ll look at some common metrics and why they fail to represent what most believe they do. We’ll discuss the real purpose of metrics, issues with metric programs, how to leverage metrics effectively, and finally specific measure and metric pitfalls organizations encounter.
About Joseph Ours' Presentation – “Bad Metric – Bad!”
Metrics have always been used in corporate sectors, primarily as a way to gain insight into what is an otherwise invisible world. Organizations blindly adopt a set of metrics as a way of satisfying some process transparency requirement, rarely applying any statistical or scientific thought to the measures and metrics they establish and interpret. Many metrics do not represent what people believe they do and as a result can lead to erroneous decisions. Joseph looks at some of the common and some of the humorous testing metrics and determines why they are failures. He further discusses the real purpose of metrics and metrics programs, and finishes with pitfalls into which you may fall.
The document discusses classifying handwritten digits from the MNIST dataset using various machine learning classifiers and evaluation metrics. It begins with binary classification of the digit 5 using SGDClassifier, evaluating accuracy which is misleading due to class imbalance. The document then introduces confusion matrices and precision/recall metrics to better evaluate performance. It demonstrates how precision and recall can be traded off by varying the decision threshold, and introduces ROC curves to visualize this tradeoff. Finally, it compares SGDClassifier and RandomForestClassifier on this binary classification task.
Digital Transformation and Process Optimization in Manufacturing, by BigML, Inc
Keyanoush Razavidinani, Digital Services Consultant at A1 Digital, a BigML Partner, highlights why it is important to identify and reduce human bottlenecks so that processes are optimized and you can focus on important activities. Additionally, Guillem Vidal, Machine Learning Engineer at BigML, completes the session by showcasing how Machine Learning is put to use in the manufacturing industry with a use case to detect factory failures.
The Road to Production: Automating your Anomaly Detectors - by jao (Jose A. Ortega), Co-Founder and Chief Technology Officer at BigML.
*Machine Learning School in The Netherlands 2022.
DutchMLSchool 2022 - ML for AML Compliance, by BigML, Inc
Machine Learning for Anti Money Laundering Compliance, by Kevin Nagel, Consultant and Data Scientist at INFORM.
*Machine Learning School in The Netherlands 2022.
DutchMLSchool 2022 - Multi Perspective Anomalies, by BigML, Inc
Multi Perspective Anomalies, by Jan W Veldsink, Master in the art of AI at Nyenrode, Rabobank, and Grio.
*Machine Learning School in The Netherlands 2022.
DutchMLSchool 2022 - My First Anomaly Detector, by BigML, Inc
The document discusses building an anomaly detector model to identify unusual transactions in a dataset. It describes loading transaction data with 31 features into the BigML platform and creating an anomaly detector model. The model scores new data and identifies the most anomalous fields to help detect fraud. Creating the anomaly detector involves interpreting the data, exploring the dataset distribution, and setting a threshold score to define what is considered anomalous.
DutchMLSchool 2022 - History and Developments in ML, by BigML, Inc
History and Present Developments in Machine Learning, by Tom Dietterich, Emeritus Professor of computer science at Oregon State University and Chief Scientist at BigML.
*Machine Learning School in The Netherlands 2022.
Introduction to End-to-End Machine Learning: Classification and Regression - Mercè Martín, VP of Bindings and Applications at BigML.
*Machine Learning School in The Netherlands 2022.
DutchMLSchool 2022 - A Data-Driven Company, by BigML, Inc
A Data-Driven Company: 21 Lessons for Large Organizations to Create Value from AI, by Richard Benjamins, Chief AI and Data Strategist at Telefónica.
*Machine Learning School in The Netherlands 2022.
DutchMLSchool 2022 - ML in the Legal Sector, by BigML, Inc
How Machine Learning Transforms and Automates Legal Services, by Arnoud Engelfriet, Co-Founder at Lynn Legal.
*Machine Learning School in The Netherlands 2022.
This document describes a proposed solution using machine learning and artificial intelligence to help create a safer stadium experience. The solution involves two parts: 1) linking access to stadiums to a verified identity through a fan app for preregistration, and 2) using AI/ML to help detect unwanted behaviors or events early. The rest of the document provides more details on the proposed smart video review framework, including using computer vision and audio analysis techniques to help identify issues like flares, flags, banners, chants including monkey chants. The goal is to help reviewers more efficiently identify potential problems but with privacy, ethics and human oversight.
DutchMLSchool 2022 - Process Optimization in Manufacturing Plants, by BigML, Inc
Process Optimization in Manufacturing Plants, by Keyanoush Razavidinani, Digital Business Consultant at A1 Digital.
*Machine Learning School in The Netherlands 2022.
DutchMLSchool 2022 - Anomaly Detection at Scale, by BigML, Inc
Lessons Learned Applying Anomaly Detection at Scale, by Álvaro Clemente, Machine Learning Engineer at BigML.
*Machine Learning School in The Netherlands 2022.
DutchMLSchool 2022 - Citizen Development in AI, by BigML, Inc
The document discusses the need for citizen developers and humans in the AI/ML process. It notes that while technology and talent are important, company culture must also support broad data analytics and AI/ML adoption. It then provides examples of how involving domain experts can help attribute meaning to correlations and build better causal models to improve AI systems. The document advocates for a systems thinking approach and having humans in the loop to help AI/ML systems consider the wider context and avoid issues like bias.
This new feature is a continuation of and improvement on our previous Image Processing release. Now, Object Detection lets you go a step further with your image data and allows you to locate objects and annotate regions in your images. Once your image regions are defined, you can train and evaluate Object Detection models, make predictions with them, and automate end-to-end Machine Learning workflows on a single platform. To make that possible, BigML enables Object Detection by introducing the regions optype.
As with any other BigML feature, Object Detection is available from the BigML Dashboard, API, and WhizzML for automation. Object Detection is extremely helpful to tackle a wide range of computer vision use cases such as medical image analysis, quality control in manufacturing, license plate recognition in transportation, people detection in security surveillance, among many others.
This new release brings Image Processing to the BigML platform, a feature that enhances our offering to solve image data-driven business problems with remarkable ease of use. Because BigML treats images as any other data type, this unique implementation allows you to easily use image data alongside text, categorical, numeric, date-time, and items data types as input to create any Machine Learning model available in our platform, both supervised and unsupervised.
Now, it is easier than ever to solve a wide variety of computer vision and image classification use cases in a single platform: label your image data, train and evaluate your models, make predictions, and automate your end-to-end Machine Learning workflows. As with any other BigML feature, Image Processing is available from the BigML Dashboard, API, and WhizzML, and it can be applied to solve use cases such as medical image analysis, visual product search, security surveillance, and vehicle damage detection, among others.
Machine Learning in Retail: Know Your Customers' Customer. See Your Future, by BigML, Inc
This session presents a quite common situation for those working in food and beverage retail (FnB) and highlights interesting insights to help reduce waste.
Speaker: Stephen Kinns, CEO and Co-Founder at catsAi.
*ML in Retail 2021: Webinar.
Machine Learning in Retail: ML in the Retail Sector, by BigML, Inc
This is an introductory session about the role that Machine Learning is playing in the retail sector and how it is being deployed across the different areas of this industry.
Speaker: Atakan Cetinsoy, VP of Predictive Applications at BigML.
*ML in Retail 2021: Webinar.
ML in GRC: Machine Learning in Legal Automation, How to Trust a Lawyerbot, by BigML, Inc
This presentation analyzes the role that Machine Learning plays in legal automation with a real-world Machine Learning application.
Speaker: Arnoud Engelfriet, Co-Founder at Lynn Legal.
*ML in GRC 2021: Virtual Conference.
ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac..., by BigML, Inc
This is a real-life Machine Learning use case about integrated risk.
Speakers: Thomas Rengersen, Product Owner of the Governance Risk and Compliance Tool for Rabobank, and Thomas Alderse Baas, Co-Founder and Director of The Bowmen Group.
*ML in GRC 2021: Virtual Conference.
Airline Satisfaction Project using Azure
This presentation is created as a foundation of understanding and comparing data science/machine learning solutions made in Python notebooks locally and on Azure cloud, as a part of Course DP-100 - Designing and Implementing a Data Science Solution on Azure.
Introducing Amazon Aurora Limitless Database, which can scale an Amazon Aurora cluster to millions of write transactions per second and manage petabytes of data, letting you scale relational database workloads in Aurora beyond the limits of a single Aurora writer instance without creating custom application logic or managing multiple databases.
Amazon DocumentDB (with MongoDB compatibility) is a fast, reliable, fully managed database service. With Amazon DocumentDB, you can easily set up, operate, and scale MongoDB-compatible databases in the cloud. In this hands-on session, you run the same application code used with MongoDB and use the same drivers and tools.
How we implemented "Exactly Once" semantics in our database ..., by javier ramirez
Distributed systems are hard. High-performance distributed systems, even more so. Network latencies, messages with no delivery acknowledgment, server restarts, hardware failures, software bugs, problematic releases, timeouts... there are plenty of reasons why it is very hard to know whether a message you sent was received and processed correctly at its destination. So, to be safe, you send the message again... and again... and cross your fingers hoping the system on the other side tolerates duplicates.
QuestDB is an open source database designed for high performance. We wanted to make sure we could offer "exactly once" guarantees, deduplicating messages at ingestion time. In this talk, I explain how we designed and implemented the DEDUP keyword in QuestDB, enabling deduplication and also allowing upserts on real-time data, while adding only 8% of processing time, even in streams with millions of inserts per second.
I will also explain our parallel, multithreaded write-ahead log (WAL) architecture. Of course, I will show all of this with demos, so you can see how it works in practice.
3. #MLSEV 3
My Model Is Wonderful
• I trained a model on my data and it seems really marvelous!
• How do you know for sure?
• To quantify your model's performance, you must evaluate it
• This is not optional. If you don't do this and do it right, you'll have problems
4. #MLSEV 4
Proper Evaluation
• Choosing the right metric
• Testing on the right data (which might be harder than you think)
• Replicating your tests
6. #MLSEV 6
Proper Evaluation
• The most basic workflow for model evaluation is:
• Split your data into two sets, training and testing
• Train a model on the training data
• Measure the “performance” of the model on the testing data
• If your training data is representative of what you will see in the future, that's the performance you should get out of your model
• What do we mean by “performance”? This is where you come in.
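A minimal sketch of this basic workflow in Python with scikit-learn (synthetic data stands in for a real dataset; all names below are illustrative, not from the slides):

```python
# Basic evaluation workflow: split, train, measure on the held-out data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)

# Hold out 20% of the data; the model never sees it during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = RandomForestClassifier(random_state=42).fit(X_train, y_train)
print("held-out accuracy:", accuracy_score(y_test, model.predict(X_test)))
```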
7. #MLSEV 7
Medical Testing Example
• Let's say we develop an ML model that can diagnose a disease
• About 1 in 1000 people who are tested by the model turn out to have the disease
• Call the people who have the disease "sick" and people who don't have it "well".
• How well do we do on a test set?
8. #MLSEV 8
Some Terminology
We'll define the sick people as "positive" and the well people as "negative"
• “True Positive”: You’re sick and the model diagnosed you as sick
• “False Positive”: You’re well, but the model diagnosed you as sick
• “True Negative”: You’re well, and the model diagnosed you as well
• “False Negative”: You’re sick, but the model diagnosed you as well
The model is correct in the “true” cases, and incorrect in the “false” cases
9. #MLSEV 9
Accuracy
• Accuracy = (TP + TN) / Total
• “Percentage correct” - like an exam
• If Accuracy = 1 then no mistakes
• If Accuracy = 0 then all mistakes
• Intuitive but not always useful
• Watch out for unbalanced classes!
• Remember, only 1 in 1000 have the disease
• A silly model which always predicts “well” is 99.9% accurate
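To see the unbalanced-class trap concretely, here is a minimal sketch (numbers chosen to mirror the slide's 1-in-1000 example; not a real diagnostic model):

```python
# The "silly model" from the slide: always predict "well".
import numpy as np
from sklearn.metrics import accuracy_score

y_true = np.array([1] + [0] * 999)   # 1 sick person per 1000 tested
y_pred = np.zeros_like(y_true)       # always predict well (0)

print(accuracy_score(y_true, y_pred))  # 0.999 -- high accuracy, useless model
```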
10. #MLSEV 10
Precision
• Precision = TP / (TP + FP) = 0.6 in the pictured example
• How well did we do when we predicted someone was sick?
• A test with high precision has few false positives
• Precision of 1.0 indicates that everyone who we predict is sick is actually sick
• What about people who we predict are well?
[Figure: sick and well people sorted into "Predicted Sick" and "Predicted Well" groups]
11. #MLSEV 11
Recall
• Recall = TP / (TP + FN) = 0.75 in the pictured example
• How well did we do when someone was actually sick?
• A test with high recall indicates few false negatives
• Recall of 1.0 indicates that everyone who was actually sick was correctly diagnosed
• But this doesn't say anything about false positives!
[Figure: sick and well people sorted into "Predicted Sick" and "Predicted Well" groups]
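A small sketch computing both metrics (counts invented to match the 0.6 precision and 0.75 recall shown on the slides):

```python
# TP=3, FP=2, FN=1 by construction, so precision=0.6 and recall=0.75.
from sklearn.metrics import precision_score, recall_score

y_true = [1, 1, 1, 1] + [0] * 996            # 4 sick, 996 well
y_pred = [1, 1, 1, 0] + [1, 1] + [0] * 994   # 3 sick caught, 2 false alarms

print(precision_score(y_true, y_pred))  # TP / (TP + FP) = 3 / 5 = 0.6
print(recall_score(y_true, y_pred))     # TP / (TP + FN) = 3 / 4 = 0.75
```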
12. #MLSEV 12
Trade Offs
• We can "trivially maximize" both measures
• If you pick the sickest person and only label them sick and no one else, you can probably get perfect precision
• If you label everyone sick, you are guaranteed perfect recall
• The unfortunate catch is that if you make one perfect, the other is terrible, so you want a model that has both high precision and recall
• This is what quantities like the F1 score and Phi Coefficient try to do
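Both combined measures are one call away in scikit-learn (same invented predictions as above; the Phi coefficient is exposed as the Matthews correlation coefficient):

```python
# F1 is the harmonic mean of precision and recall.
from sklearn.metrics import f1_score, matthews_corrcoef

y_true = [1, 1, 1, 1] + [0] * 996
y_pred = [1, 1, 1, 0] + [1, 1] + [0] * 994

print(f1_score(y_true, y_pred))          # 2 * 0.6 * 0.75 / (0.6 + 0.75) = 0.667
print(matthews_corrcoef(y_true, y_pred))
```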
13. #MLSEV 13
Cost Matrix
• In many cases, the consequences of a true positive and a false positive are very different
• You can define "costs" for each type of mistake
• Total Cost = TP * TP_Cost + FP * FP_Cost + FN * FN_Cost + TN * TN_Cost (each count times the cost of its cell)
• Here, we are willing to accept lots of false positives in exchange for high recall
• What if a positive diagnosis resulted in expensive or painful treatment?

Cost matrix for the medical diagnosis problem:

                  Classified Sick    Classified Well
  Actually Sick         0                 100
  Actually Well         1                   0
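A sketch of scoring a model against this cost matrix (the confusion counts are invented for illustration):

```python
# Total cost = sum over cells of (count in cell) * (cost of cell).
import numpy as np

#                 classified sick, classified well
costs = np.array([[0, 100],    # actually sick: a miss (FN) is very expensive
                  [1, 0]])     # actually well: a false alarm (FP) is cheap

counts = np.array([[3, 1],     # TP=3, FN=1 (illustrative)
                   [2, 994]])  # FP=2, TN=994

print((counts * costs).sum())  # 1*100 + 2*1 = 102
```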
14. #MLSEV 14
Operating Thresholds
• Most classifiers don't output a prediction. Instead they give a "score" for each class
• The prediction you assign to an instance is usually a function of a threshold on this score (e.g., if the score is over 0.5, predict true)
• You can experiment with an ROC curve to see how your metrics will change if you change the threshold
• Lowering the threshold means you are more likely to predict the positive class, which improves recall but introduces false positives
• Increasing the threshold means you predict the positive class less often (you are more "picky"), which will probably increase precision but lower recall.
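A sketch of such a threshold sweep (synthetic, imbalanced data; the thresholds and names are illustrative):

```python
# Sweep the decision threshold over a classifier's scores and watch
# precision and recall trade off against each other.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, weights=[0.9], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
scores = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]

for t in (0.2, 0.5, 0.8):
    pred = (scores >= t).astype(int)
    print(f"threshold={t}: "
          f"precision={precision_score(y_te, pred, zero_division=0):.2f}, "
          f"recall={recall_score(y_te, pred):.2f}")
```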
17. #MLSEV 17
Why Hold Out Data?
• Why do we split the dataset into training and testing sets? Why do we always (always, always) test on data that the model training process did not see?
• Because machine learning algorithms are good at memorizing data
• We don't care how well the model does on data it has already seen because it probably won't see that data again
• Holding out some of the data simulates the data the model will see in the future
18. #MLSEV 18
Memorization
Training:

  plasma glucose   bmi    diabetes pedigree   age   diabetes
  148              33.6   0.627               50    TRUE
  85               26.6   0.351               31    FALSE
  183              23.3   0.672               32    TRUE
  89               28.1   0.167               21    FALSE
  137              43.1   2.288               33    TRUE
  116              25.6   0.201               30    FALSE
  78               31     0.248               26    TRUE
  115              35.3   0.134               29    FALSE
  197              30.5   0.158               53    TRUE

Evaluating:

  plasma glucose   bmi    diabetes pedigree   age   diabetes
  148              33.6   0.627               50    ?
  85               26.6   0.351               31    ?

• You don't even need meaningful features; the person's name would be enough
• "Oh right, Bob. I know him. Yes, he certainly has diabetes"
• As long as there are no duplicate names in the dataset, it's a 100% accurate model
19. #MLSEV 19
Well, That Was Easy
• Okay, so I'm not testing on the training data, so I'm good, right? NO NO NO
• You also have to worry about information leakage between training and test data.
• What is this? Let's try to predict the daily closing price of the stock market
• What happens if you hold out 10 random days from your dataset?
• What if you hold out the last 10 days?
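A sketch of the two holdout choices for time-ordered data (a synthetic index of days; illustrative only):

```python
# Random vs. chronological holdout for time-ordered data.
import numpy as np
from sklearn.model_selection import train_test_split

days = np.arange(365)

# Random holdout: each held-out day has neighboring days in the training
# set, so the model can "see around" it -- information leaks across the split.
train_random, test_random = train_test_split(days, test_size=10, random_state=0)

# Chronological holdout: train on the past, test on the last 10 days --
# the situation the deployed model will actually face.
train_chrono, test_chrono = days[:-10], days[-10:]
```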
20. #MLSEV 20
Traps Everywhere!
• This is common when you have time-distributed data, but can also happen in other instances:
• Let's say we have a dataset of 10,000 pictures from 20 people, each labeled with the year in which it was taken
• We want to predict the year from the image
• What happens if we hold out random data?
• Solution: Hold out users instead
21. #MLSEV 21
How Do We Avoid This?
• It's a terrible problem, because if you make the mistake you will get results that are too good, and be inclined to believe them
• So be careful? Do you have:
• Data where points can be grouped in time (by week or by month)?
• Data where points can be grouped by user (each point is an action a user took)?
• Data where points can be grouped by location (each point is a day of sales at a particular store)?
• Even if you're suspicious that points from the group might leak information to one another, try a test where you hold out a few groups (months, users, locations) and train on the rest, as in the sketch below
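One way to do that group-wise holdout, sketched with scikit-learn's GroupShuffleSplit (synthetic arrays; the photos-per-user setup mirrors the previous slide):

```python
# Hold out whole users: all of a person's pictures land on one side of the split.
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(0)
n = 10_000
X = rng.normal(size=(n, 16))            # stand-in image features
y = rng.integers(2000, 2020, size=n)    # year each picture was taken
user = rng.integers(0, 20, size=n)      # which of the 20 people it came from

splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups=user))

# No user appears on both sides, so the model cannot just recognize the person.
assert set(user[train_idx]).isdisjoint(user[test_idx])
```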
23. #MLSEV 23
One Test is Not Enough
• Even if you have a correct holdout, you still need to test more than once.
• Every result you get from any test is a result of randomness
• Randomness from the Data:
• The dataset you have is a finite number of points drawn from an infinite distribution
• The split you make between training and test data is done at random
• Randomness of the algorithm:
• The ordering of the data might give different results
• The best performing algorithms (random forests, deepnets) have randomness built-in
• With just one result, you might get lucky
29. #MLSEV 29
Please, Sir, Can I Have Some More?
• Always do more than one test!
• For each test, try to vary all sources of randomness that you can (change the seeds of all random processes) to try to "experience" as much variance as you can
• Cross-validation (stratifying is great, Monte Carlo can be a useful simplification)
• Don't just average the results! The variance is important!
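A sketch of repeated, seed-varied cross-validation that reports the spread as well as the mean (synthetic data; illustrative):

```python
# Run stratified k-fold several times with different seeds, varying both
# the splits and the model's internal randomness.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=1000, weights=[0.9], random_state=0)

scores = []
for seed in range(5):
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    model = RandomForestClassifier(random_state=seed)
    scores.extend(cross_val_score(model, X, y, cv=cv, scoring="f1"))

print(f"F1 over 25 runs: mean={np.mean(scores):.3f}, std={np.std(scores):.3f}")
```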
30. #MLSEV 30
Summing Up
• Choose the metric that makes sense for your problem
• Use held out data for testing and watch out for information leakage
• Always do more than one test, varying all sources of randomness that you have control over!