
Lecture - 4

Model Evaluation and Improvement


Outline
Cross-validation
✓ Benefits of cross-validation
✓ Stratified k-fold cross-validation
Grid search
✓ Simple grid search
✓ Grid search with cross-validation
Evaluation metrics & scoring
✓ Metrics for classification (binary & multi-class)
✓ Regression metrics
✓ Using evaluation metrics in model selection
Introduction to Model Evaluation & Improvement

An ML model is an algorithm trained on a dataset to perform a specific
predictive task.

Model evaluation aims to determine how well the model performs its task.

Common model evaluation metrics are:

accuracy, precision, recall, F1 score, area under the curve (AUC), confusion
matrix, and mean squared error.

ML model evaluation ensures that production models' performance is
optimal and reliable.
Introduction to Model Evaluation & Improvement

Model evaluation in ML is the process of determining a model's performance
via a metrics-driven analysis.

It can be performed in two ways:

Offline: The model is evaluated after training, during experimentation or
continuous retraining.

Online: The model is evaluated in production as part of model monitoring.

Model evaluation is performed both during experimentation and in production.


Cross-validation
▪ It is a statistical method used to estimate the skill of machine learning models.

▪ It is a technique used to evaluate the performance of a model on unseen data.

▪ The main purpose of cross-validation is to prevent overfitting.

▪ It provides a more realistic estimate of the model's generalization
performance, i.e., its ability to perform well on new, unseen data.

▪ It involves dividing the available data into multiple folds or subsets, using one
of these folds as a validation set, and training the model on the remaining folds,
as sketched below.
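A minimal sketch of this procedure using scikit-learn's cross_val_score; the
logistic regression model and the iris dataset are illustrative assumptions, not
part of the lecture.

```python
# Minimal cross-validation sketch (illustrative model and dataset).
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# Evaluate on 5 folds; each fold serves as the validation set exactly once.
scores = cross_val_score(model, X, y, cv=5)
print("Per-fold accuracy:", scores)
print("Mean accuracy:", scores.mean())
```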
Types of Cross-validation
Types of cross-validation are:
✓ k-fold cross-validation
✓ Leave-one-out cross-validation
✓ Holdout validation
✓ Stratified cross-validation

The choice of technique depends on the size and nature of the data, as
well as the specific requirements of the modeling problem.
Types of Cross-validation
1. K-Fold Cross-Validation

k-fold cross-validation is a procedure used to estimate the skill of the
model on new data.

The whole dataset is partitioned into k parts of equal size, and each
partition is called a fold, where k can be any integer: 3, 5, 10, etc.

There are commonly used variations on cross-validation, such as
stratified and repeated, that are available in scikit-learn.
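A short sketch of k-fold splitting with scikit-learn's KFold; the toy data below
is an illustrative assumption.

```python
# k-fold splitting sketch with scikit-learn's KFold (toy data).
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(20).reshape(10, 2)  # 10 samples, 2 features
kf = KFold(n_splits=5, shuffle=True, random_state=0)

# Each of the 5 iterations holds out one fold (2 samples here) as the test set.
for fold, (train_idx, test_idx) in enumerate(kf.split(X)):
    print(f"Fold {fold}: train={train_idx}, test={test_idx}")
```

The stratified and repeated variations mentioned above are available as
StratifiedKFold and RepeatedKFold with the same split interface.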
Types of Cross-validation
2. LOOCV (Leave-One-Out Cross-Validation)

The model is trained on the whole dataset except for a single held-out data
point, iterating over each data point in the available dataset.

That is, the model is trained on n-1 samples and tested on the one omitted
sample, repeating this process for each data point in the dataset.

Its advantage is low bias.

The major drawbacks:
✓ it leads to higher variance in the testing model
✓ it takes a lot of execution time
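A minimal LOOCV sketch with scikit-learn's LeaveOneOut; the model and
dataset are illustrative assumptions.

```python
# Leave-one-out CV sketch (illustrative model and dataset).
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

X, y = load_iris(return_X_y=True)
loo = LeaveOneOut()  # one split per sample: each point is tested exactly once

# Note the execution-time drawback: this fits the model n times (150 here).
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=loo)
print("Number of fits:", len(scores), "| Mean accuracy:", scores.mean())
```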
Types of Cross-validation
3. Holdout Validation
✓ The training and testing datasets are of equal size.
✓ It's a simple and quick way to evaluate a model.
✓ Its major drawback is dividing the dataset into 50% for training and
50% for testing, i.e., higher bias.
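A holdout split is a single train/test division; a minimal sketch with
scikit-learn's train_test_split, using an illustrative dataset.

```python
# Holdout validation sketch: one 50/50 train/test split (illustrative data).
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)  # 50% held out, as described above

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("Holdout accuracy:", model.score(X_test, y_test))
```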
Types of Cross-validation (CV)
4. Stratified Cross-Validation

It ensures that the CV process maintains the same class distribution as the
entire dataset.

This is particularly important when dealing with imbalanced datasets, where
certain classes may be underrepresented. In this method:
✓ The dataset is divided into k folds.
✓ During each iteration, one fold is used for testing and the others for training.
✓ The process is repeated k times, with each fold serving as the test set exactly once.

It is essential when dealing with classification problems; a sketch follows below.
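A StratifiedKFold sketch on an imbalanced toy dataset (the labels are
illustrative assumptions); each test fold preserves the overall class ratio.

```python
# Stratified k-fold sketch: folds preserve class proportions (toy data).
import numpy as np
from sklearn.model_selection import StratifiedKFold

X = np.arange(20).reshape(10, 2)
y = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])  # imbalanced: 8 vs. 2

skf = StratifiedKFold(n_splits=2)
for fold, (train_idx, test_idx) in enumerate(skf.split(X, y)):
    # Each test fold keeps roughly the same 8:2 class ratio (4 and 1 here).
    print(f"Fold {fold}: test labels = {y[test_idx]}")
```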


Benefits of cross-validation
Overcoming overfitting: helps to prevent overfitting by providing a
more robust estimate of the model's performance on unseen data.

Model selection: used to compare different models and select the one
that performs the best on average.

Hyperparameter tuning: used to optimize the hyperparameters of a
model, such as the regularization parameter, by selecting the values that
result in the best performance on the validation set.

Data efficiency: allows the use of all the available data for both training
and validation.
Disadvantages of cross-validation
Computationally expensive: especially when the number of folds is
large or when the model is complex and requires a long time to train.

Time-consuming: cross-validation can be time-consuming, especially
when there are many hyperparameters to tune or when multiple models
need to be compared.

Bias-variance tradeoff: the choice of the number of folds impacts the
bias-variance tradeoff, i.e., too few folds may result in high bias (smaller
training sets), while too many folds may result in high variance, as with LOOCV.
Grid search
▪ It is a model hyperparameter optimization technique.

▪ Machine learning models have hyperparameters.

▪ There are often general rules of thumb for configuring hyperparameters.

▪ A better approach is to objectively search different values for model
hyperparameters and choose the model that achieves the best performance
(a simple version is sketched after this list).

▪ This is called hyperparameter optimization or hyperparameter tuning.

▪ Common examples: train-test split ratio, choice of optimizer, learning rate
for gradient descent, choice of activation function, number of epochs, and
batch size, …
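A minimal "simple grid search" sketch, as named in the outline: nested loops
over candidate values scored on a single validation split. The SVC model and
its C/gamma grid are illustrative assumptions.

```python
# Simple grid search: try every combination on one validation split.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

best_score, best_params = 0.0, {}
for C in [0.01, 0.1, 1, 10, 100]:
    for gamma in [0.001, 0.01, 0.1, 1]:
        # Train with one hyperparameter combination and score it.
        score = SVC(C=C, gamma=gamma).fit(X_train, y_train).score(X_val, y_val)
        if score > best_score:
            best_score, best_params = score, {"C": C, "gamma": gamma}
print("Best score:", best_score, "| Best parameters:", best_params)
```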
Grid search: Model Hyperparameter Optimization

▪ Hyperparameters are points of choice or configuration.

▪ Hyperparameter: a model configuration argument specified by the
developer to guide the learning process for a specific dataset.

▪ ML models also have parameters, which are the internal coefficients set
by training or optimizing the model on a training dataset.

▪ Parameters are learned automatically; hyperparameters are set
manually to help guide the learning process.
What is a Model Parameter?

A model parameter is a configuration variable that is internal to the
model and whose value can be estimated from data.
▪ They are required by the model when making predictions.
▪ Their values define the skill of the model on your problem.
▪ They are learned from data.
▪ They are often not set manually by the practitioner.
▪ They are often saved as part of the learned model.
Grid search: Model Hyperparameter Optimization
A range of different optimization algorithms may be used.

Random search: define a search space as a bounded domain of hyperparameter
values and randomly sample points in that domain.

Grid search: define a search space as a grid of hyperparameter values and evaluate
every position in the grid.

Grid search is great for spot-checking combinations that are known to perform well
generally.

Random search is great for discovering hyperparameter combinations that you
would not have guessed intuitively, although it often requires more time to execute.
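A sketch of grid search combined with cross-validation, matching the outline's
"grid search with cross-validation", using scikit-learn's GridSearchCV; the SVC
parameter grid is an illustrative assumption.

```python
# Grid search with cross-validation: each grid point is scored by 5-fold CV.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

param_grid = {"C": [0.01, 0.1, 1, 10, 100], "gamma": [0.001, 0.01, 0.1, 1]}
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X_train, y_train)  # refits on the whole training set when done

print("Best parameters:", search.best_params_)
print("Best CV score:", search.best_score_)
print("Test set score:", search.score(X_test, y_test))
```

For the random-search alternative described above, scikit-learn provides
RandomizedSearchCV with the same fit/score interface.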
Evaluation metrics & scoring

▪ Metrics for classification (binary & multi-class)

▪ Regression metrics

▪ Using evaluation metrics in model selection
Metrics for classification (binary & multi-class)

There are a lot of different types of metrics in the real world.

The most common metrics in classification problems are:

1. Accuracy: it measures how accurate your model is overall, i.e., the
fraction of all predictions that are correct.

2. Precision (P): it is the ratio of correctly predicted positive observations to
the total predicted positive observations.
✓ True Positives (TP) – the value of the actual class is yes and the value of the predicted class is yes.

✓ True Negatives (TN) – the value of the actual class is no and the value of the predicted class is also no.
Metrics for classification (binary & multi-class)
✓ False positives and false negatives occur when the actual class
contradicts the predicted class.
✓ False Positives (FP) – when the actual class is no and the predicted class is yes.
✓ False Negatives (FN) – when the actual class is yes but the predicted class is no.
Precision = TP / (TP + FP)
Accuracy Score = (TP + TN) / (TP + TN + FP + FN)
Approaches to computing precision for a multi-class classification problem
(precision depends on true positives and false positives):
Macro-averaged precision: calculate precision for each class individually and then average them.
Micro-averaged precision: calculate class-wise true positives and false positives and then use those to
compute an overall precision.
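A sketch of macro vs. micro averaged precision with scikit-learn; the labels
below are illustrative assumptions.

```python
# Macro vs. micro averaged precision on a small multi-class example.
from sklearn.metrics import precision_score

y_true = [0, 0, 0, 1, 1, 2, 2, 2, 2, 2]
y_pred = [0, 0, 1, 1, 2, 2, 2, 2, 0, 1]

# Macro: compute precision per class, then take the unweighted mean.
print("Macro precision:", precision_score(y_true, y_pred, average="macro"))
# Micro: pool TPs and FPs across all classes, then compute one precision.
print("Micro precision:", precision_score(y_true, y_pred, average="micro"))
```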
Metrics for classification (binary & multi-class)
3. Recall (Sensitivity): recall is the ratio of correctly predicted positive
observations to all observations in the actual class "yes".
Recall = TP / (TP + FN)
Approaches to computing recall for a multi-class classification problem
(recall depends on true positives and false negatives):

Macro-averaged recall: calculate recall for each class individually and then
average them.
Micro-averaged recall: calculate class-wise true positives and false negatives
and then use those to compute an overall recall.
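Macro-averaged recall can also be computed by hand from the confusion matrix,
which makes the per-class TP/FN counting explicit; the labels are the same
illustrative ones used above.

```python
# Macro recall by hand from the confusion matrix, checked against sklearn.
import numpy as np
from sklearn.metrics import confusion_matrix, recall_score

y_true = [0, 0, 0, 1, 1, 2, 2, 2, 2, 2]
y_pred = [0, 0, 1, 1, 2, 2, 2, 2, 0, 1]

cm = confusion_matrix(y_true, y_pred)  # rows: actual class, columns: predicted
# Per-class recall = TP (diagonal) / (TP + FN) (row sum).
per_class = np.diag(cm) / cm.sum(axis=1)
print("Per-class recall:", per_class)
print("Macro recall (manual): ", per_class.mean())
print("Macro recall (sklearn):", recall_score(y_true, y_pred, average="macro"))
```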
Metrics for classification (binary & multi-class)
4. F1 score (F1): the F1 score is the weighted (harmonic) mean of precision and recall.

5. Area under the ROC (Receiver Operating Characteristic) curve, or simply
AUC: it is a performance measurement for classification problems at
various threshold settings.

The ROC is a probability curve, and the AUC represents the degree or measure
of separability.

It tells how capable the model is of distinguishing between classes.

The higher the AUC, the better the model is at predicting 0s as 0s and 1s as 1s.
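A short sketch computing F1 and AUC on a binary problem; the synthetic
dataset and logistic regression model are illustrative assumptions.

```python
# F1 score and ROC AUC on a binary classification problem (synthetic data).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("F1:", f1_score(y_test, model.predict(X_test)))
# AUC is threshold-free: it needs scores/probabilities, not hard labels.
print("AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
```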
Metrics for classification (binary & multi-class)
Approach to computing the AUC score for a multi-class classification problem:
✓ One-vs-All and the confusion matrix (see the sketch below)
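A sketch of the one-vs-all (one-vs-rest) AUC using scikit-learn's
multi_class="ovr" option; the dataset and model are illustrative assumptions.

```python
# One-vs-rest AUC for a multi-class problem (illustrative data and model).
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
proba = model.predict_proba(X_test)  # one probability column per class
# Each class is scored against the rest, then the results are averaged.
print("OvR AUC:", roc_auc_score(y_test, proba, multi_class="ovr"))
```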
Metrics for classification (binary & multi-class)

Reading Assignment

1. Binary class classification

2. Approaches to computing metrics for a binary classification problem


Regression metrics
Regression refers to predictive modeling problems that involve predicting
a numeric value.

Mean Squared Error (MSE): it is a popular error metric for regression
problems.

MSE = 1/N * sum for i = 1 to N of (y_i - yhat_i)^2

where y_i is the i-th expected value in the dataset and yhat_i is the i-th
predicted value.

Root Mean Squared Error (RMSE): it is an extension of the mean
squared error.
Regression metrics
RMSE = sqrt(1 / N * sum for i to N (y_i – yhat_i)^2)

Where y_i is the i’th expected value in the dataset, yhat_i is the i’th predicted
value, and sqrt() is the square root function.

Mean Absolute Error (MAE): it is a popular metric because, like RMSE, the
units of the error score match the units of the target value that is being
predicted.

MAE = 1 / N * sum for i to N abs(y_i – yhat_i)

Where y_i is the i’th expected value in the dataset, yhat_i is the i’th predicted
value and abs() is the absolute function.
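The three formulas above map directly onto scikit-learn helpers; a short
sketch with illustrative values:

```python
# MSE, RMSE, and MAE computed with scikit-learn (illustrative values).
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = np.array([3.0, -0.5, 2.0, 7.0])  # expected values y_i
y_pred = np.array([2.5, 0.0, 2.0, 8.0])   # predicted values yhat_i

mse = mean_squared_error(y_true, y_pred)
print("MSE: ", mse)
print("RMSE:", np.sqrt(mse))  # same units as the target, per the note above
print("MAE: ", mean_absolute_error(y_true, y_pred))
```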
Using evaluation metrics in model selection
▪ Model selection is the process of selecting the best model for a problem in
machine learning.

▪ It involves evaluating the model's performance against various parameters
and is crucial for improving model performance.

▪ Techniques like data pre-processing, feature engineering, and hyperparameter
tuning are used.

▪ Choosing the model complexity is a crucial step in the development lifecycle,
ensuring the model is accurate and generalizable to out-of-sample data.
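One common way to use an evaluation metric in model selection is to pass it
as the scoring argument to cross-validation and compare models on the
averaged score; the two models, the imbalanced synthetic dataset, and the
choice of ROC AUC below are illustrative assumptions.

```python
# Comparing two models with a chosen metric via cross-validation.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Imbalanced data (roughly 90%/10%), where accuracy can be misleading.
X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)

for model in (LogisticRegression(max_iter=1000),
              DecisionTreeClassifier(random_state=0)):
    scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    print(type(model).__name__, "mean AUC:", scores.mean())
```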
Why is Model Selection important in Machine Learning?

▪ Model selection has a significant impact on the performance and
accuracy of the model:
✓ accurate and generalizable to new data
✓ prevents overfitting
✓ improves the final model's interpretability
✓ saves computing resources and time

[Figure: Different Types of Machine Learning Models]


The revolution in robotics

• Cheap robots!!!
• Cheap sensors
• Moore’s law
Robotics and ML

Areas where robots are used:
✓ Industrial robots
✓ Military, government and space robots
✓ Service robots for home, healthcare, laboratory

Why are robots used?
✓ Dangerous tasks or hazardous environments
✓ Repetitive tasks
✓ High-precision tasks or those requiring high quality
✓ Labor savings

Control technologies:
✓ Autonomous (self-controlled), tele-operated (remote control)
Industrial Robots
▪ Uses for robots in manufacturing:
• Welding
• Painting
• Cutting
• Dispensing
• Assembly
• Polishing/Finishing
• Material Handling
✓Packaging, Palletizing
✓Machine loading
Industrial Robots

❑ Uses for robots in industry/manufacturing

• Automotive:
✓ Video - welding and handling of fuel tanks, from the TV
show "How It's Made" on Discovery Channel.
• Packaging:
✓ Video - robots in food manufacturing.
Industrial Robots - Automotive
Industrial Robots - Computer
Military/Government Robots

• iRobot PackBot
• Remotec Andros
Military/Government Robots

Soldiers in Afghanistan being trained to defuse a landmine using a PackBot.


Military Robots

• Aerial drones (UAV)
• Military suit


Space Robots

• Mars Rovers – Spirit and Opportunity


• Autonomous navigation features with human remote
control and oversight
Service Robots

• Many uses…
• Cleaning & Housekeeping
• Humanitarian Demining
• Rehabilitation
• Inspection
• Agriculture & Harvesting
• Lawn Mowers
• Surveillance
• Mining Applications
• Construction
• Automatic Refilling
• Fire Fighters
• Search & Rescue

iRobot Roomba vacuum cleaner robot


Medical/Healthcare Applications

• DaVinci surgical robot by Intuitive Surgical. St. Elizabeth Hospital is one of
the local hospitals using this robot. You can see this robot in person during
an open house (website).
• Japanese health care assistant suit (HAL - Hybrid Assistive Limb)
• Also… a mind-controlled wheelchair using NI LabVIEW
Laboratory Applications

• Drug discovery
• Test tube sorting


Beyond Code: Open Data, Papers

https://PapersWithCode.com

https://arXiv.org

https://www.anaconda.com/download/success
(choose based on your OS)

https://www.codeconvert.ai/free-converter
(used to convert code from one language to another)


Questions

1. How does ML affect information science?
2. Natural vs. artificial learning – which is better?
3. Is ML needed in all problems?
4. What are the future directions of ML?
5. How to speed up hyperparameter optimization?
6. How to configure random and grid search hyperparameter optimization
for classification tasks?
7. How to configure random and grid search hyperparameter optimization
for regression tasks?
8. What is the difference between a parameter and a hyperparameter?
Questions

1. How to use the best-performing hyperparameters?
2. How to calculate precision, recall, and F-measure for imbalanced
classification?
3. What distinguishes a binary from a multi-class classification model?
