
Lecture - 4

Model Evaluation and Improvement


Outline
Cross-validation
✓ Benefits of cross-validation
✓ Stratified k-fold cross-validation
Grid search
✓ Simple grid search
✓ Grid search with cross-validation
Evaluation metrics & scoring
✓ Metrics for classification (binary & multi-class)
✓ Regression metrics
✓ Using evaluation metrics in model selection
Introduction to Model Evaluation & Improvement

An ML model is an algorithm trained on a dataset to perform a specific
predictive task.

Model evaluation aims to determine how well the model performs its task.

Common model evaluation metrics are:

accuracy, precision, recall, F1 score, area under the curve (AUC), confusion
matrix, and mean squared error.

ML model evaluation ensures that production models' performance is
optimal and reliable.
Introduction to Model Evaluation & Improvement

Model evaluation in ML is the process of determining a model's performance
via a metrics-driven analysis.

It can be performed in two ways:

Offline: The model is evaluated after training, during experimentation or
continuous retraining.

Online: The model is evaluated in production as part of model monitoring.

Model evaluation is performed both during experimentation and in production.


Cross-validation
▪ It is a statistical method used to estimate the skill of machine learning models.

▪ It is a technique used to evaluate the performance of a model on unseen data.

▪ The main purpose of cross-validation is to prevent overfitting.

▪ It provides a more realistic estimate of the model's generalization
performance, i.e., its ability to perform well on new, unseen data.

▪ It involves dividing the available data into multiple folds or subsets, using one
of these folds as a validation set, and training the model on the remaining folds,
as sketched below.
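A minimal sketch of this procedure using scikit-learn's cross_val_score; the
logistic regression model and the iris dataset are illustrative assumptions, not
part of the lecture.

```python
# Minimal cross-validation sketch (illustrative model and dataset).
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# Evaluate on 5 folds; each fold serves as the validation set exactly once.
scores = cross_val_score(model, X, y, cv=5)
print("Per-fold accuracy:", scores)
print("Mean accuracy:", scores.mean())
```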
Types of Cross-validation
Types of cross-validation are:
✓ k-fold cross-validation
✓ Leave-one-out cross-validation
✓ Holdout validation
✓ Stratified cross-validation

The choice of technique depends on the size and nature of the data, as
well as the specific requirements of the modeling problem.
Types of Cross-validation
1. K-Fold Cross-Validation

k-fold cross-validation is a procedure used to estimate the skill of the
model on new data.

The whole dataset is partitioned into k parts of equal size, and each
partition is called a fold, where k can be any integer: 3, 5, 10, etc.

There are commonly used variations on cross-validation, such as
stratified and repeated, that are available in scikit-learn.
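A short sketch of k-fold splitting with scikit-learn's KFold; the toy data below
is an illustrative assumption.

```python
# k-fold splitting sketch with scikit-learn's KFold (toy data).
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(20).reshape(10, 2)  # 10 samples, 2 features
kf = KFold(n_splits=5, shuffle=True, random_state=0)

# Each of the 5 iterations holds out one fold (2 samples here) as the test set.
for fold, (train_idx, test_idx) in enumerate(kf.split(X)):
    print(f"Fold {fold}: train={train_idx}, test={test_idx}")
```

The stratified and repeated variations mentioned above are available as
StratifiedKFold and RepeatedKFold with the same split interface.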
Types of Cross-validation
2. LOOCV (Leave-One-Out Cross-Validation)

The model is trained on the whole dataset except for a single held-out data
point, iterating over each data point in the available dataset.

That is, the model is trained on n-1 samples and tested on the one omitted
sample, repeating this process for each data point in the dataset.

Its advantage is low bias.

The major drawbacks:
✓ it leads to higher variance in the testing model
✓ it takes a lot of execution time
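A minimal LOOCV sketch with scikit-learn's LeaveOneOut; the model and
dataset are illustrative assumptions.

```python
# Leave-one-out CV sketch (illustrative model and dataset).
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

X, y = load_iris(return_X_y=True)
loo = LeaveOneOut()  # one split per sample: each point is tested exactly once

# Note the execution-time drawback: this fits the model n times (150 here).
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=loo)
print("Number of fits:", len(scores), "| Mean accuracy:", scores.mean())
```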
Types of Cross-validation
3. Holdout Validation
✓ The training and testing datasets are of equal size.
✓ It's a simple and quick way to evaluate a model.
✓ Its major drawback is dividing the dataset into 50% for training and
50% for testing, i.e., higher bias.
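A holdout split is a single train/test division; a minimal sketch with
scikit-learn's train_test_split, using an illustrative dataset.

```python
# Holdout validation sketch: one 50/50 train/test split (illustrative data).
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)  # 50% held out, as described above

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("Holdout accuracy:", model.score(X_test, y_test))
```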
Types of Cross-validation (CV)
4. Stratified Cross-Validation

It ensures that the CV process maintains the same class distribution as the
entire dataset.

This is particularly important when dealing with imbalanced datasets, where
certain classes may be underrepresented. In this method:
✓ The dataset is divided into k folds.
✓ During each iteration, one fold is used for testing and the others for training.
✓ The process is repeated k times, with each fold serving as the test set exactly once.

It is essential when dealing with classification problems; a sketch follows below.
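A StratifiedKFold sketch on an imbalanced toy dataset (the labels are
illustrative assumptions); each test fold preserves the overall class ratio.

```python
# Stratified k-fold sketch: folds preserve class proportions (toy data).
import numpy as np
from sklearn.model_selection import StratifiedKFold

X = np.arange(20).reshape(10, 2)
y = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])  # imbalanced: 8 vs. 2

skf = StratifiedKFold(n_splits=2)
for fold, (train_idx, test_idx) in enumerate(skf.split(X, y)):
    # Each test fold keeps roughly the same 8:2 class ratio (4 and 1 here).
    print(f"Fold {fold}: test labels = {y[test_idx]}")
```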


Benefits of cross-validation
Overcoming overfitting: helps to prevent overfitting by providing a
more robust estimate of the model's performance on unseen data.

Model selection: used to compare different models and select the one
that performs the best on average.

Hyperparameter tuning: used to optimize the hyperparameters of a
model, such as the regularization parameter, by selecting the values that
result in the best performance on the validation set.

Data efficiency: allows the use of all the available data for both training
and validation.
Disadvantages of cross-validation
Computationally expensive: especially when the number of folds is
large or when the model is complex and requires a long time to train.

Time-consuming: cross-validation can be time-consuming, especially
when there are many hyperparameters to tune or when multiple models
need to be compared.

Bias-variance tradeoff: the choice of the number of folds impacts the
bias-variance tradeoff, i.e., too few folds may result in high bias (smaller
training sets), while too many folds may result in high variance, as with LOOCV.
Grid search
▪ It is a model hyperparameter optimization technique.

▪ Machine learning models have hyperparameters.

▪ There are often general rules of thumb for configuring hyperparameters.

▪ A better approach is to objectively search different values for model
hyperparameters and choose the model that achieves the best performance
(a simple version is sketched after this list).

▪ This is called hyperparameter optimization or hyperparameter tuning.

▪ Common examples: train-test split ratio, choice of optimizer, learning rate
for gradient descent, choice of activation function, number of epochs, and
batch size, …
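A minimal "simple grid search" sketch, as named in the outline: nested loops
over candidate values scored on a single validation split. The SVC model and
its C/gamma grid are illustrative assumptions.

```python
# Simple grid search: try every combination on one validation split.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

best_score, best_params = 0.0, {}
for C in [0.01, 0.1, 1, 10, 100]:
    for gamma in [0.001, 0.01, 0.1, 1]:
        # Train with one hyperparameter combination and score it.
        score = SVC(C=C, gamma=gamma).fit(X_train, y_train).score(X_val, y_val)
        if score > best_score:
            best_score, best_params = score, {"C": C, "gamma": gamma}
print("Best score:", best_score, "| Best parameters:", best_params)
```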
Grid search: Model Hyperparameter Optimization

▪ Hyperparameters are points of choice or configuration.

▪ Hyperparameter: a model configuration argument specified by the
developer to guide the learning process for a specific dataset.

▪ ML models also have parameters, which are the internal coefficients set
by training or optimizing the model on a training dataset.

▪ Parameters are learned automatically; hyperparameters are set
manually to help guide the learning process.
What is a Model Parameter?

A model parameter is a configuration variable that is internal to the
model and whose value can be estimated from data.
▪ They are required by the model when making predictions.
▪ Their values define the skill of the model on your problem.
▪ They are learned from data.
▪ They are often not set manually by the practitioner.
▪ They are often saved as part of the learned model.
Grid search: Model Hyperparameter Optimization
A range of different optimization algorithms may be used.

Random search: define a search space as a bounded domain of hyperparameter
values and randomly sample points in that domain.

Grid search: define a search space as a grid of hyperparameter values and evaluate
every position in the grid.

Grid search is great for spot-checking combinations that are known to perform well
generally.

Random search is great for discovering hyperparameter combinations that you
would not have guessed intuitively, although it often requires more time to execute.
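A sketch of grid search combined with cross-validation, matching the outline's
"grid search with cross-validation", using scikit-learn's GridSearchCV; the SVC
parameter grid is an illustrative assumption.

```python
# Grid search with cross-validation: each grid point is scored by 5-fold CV.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

param_grid = {"C": [0.01, 0.1, 1, 10, 100], "gamma": [0.001, 0.01, 0.1, 1]}
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X_train, y_train)  # refits on the whole training set when done

print("Best parameters:", search.best_params_)
print("Best CV score:", search.best_score_)
print("Test set score:", search.score(X_test, y_test))
```

For the random-search alternative described above, scikit-learn provides
RandomizedSearchCV with the same fit/score interface.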
Evaluation metrics & scoring

▪ Metrics for classification (binary & multi-class)

▪ Regression metrics

▪ Using evaluation metrics in model selection
Metrics for classification (binary & multi-class)

There are a lot of different types of metrics in the real world.

The most common metrics in classification problems are:

1. Accuracy: it measures how accurate your model is overall, i.e., the
fraction of all predictions that are correct.

2. Precision (P): it is the ratio of correctly predicted positive observations to
the total predicted positive observations.
✓ True Positives (TP) – the value of the actual class is yes and the value of the predicted class is yes.

✓ True Negatives (TN) – the value of the actual class is no and the value of the predicted class is also no.
Metrics for classification (binary & multi-class)
✓ False positives and false negatives occur when the actual class
contradicts the predicted class.
✓ False Positives (FP) – when the actual class is no and the predicted class is yes.
✓ False Negatives (FN) – when the actual class is yes but the predicted class is no.
Precision = TP / (TP + FP)
Accuracy Score = (TP + TN) / (TP + TN + FP + FN)
Approaches to computing precision for a multi-class classification problem
(precision depends on true positives and false positives):
Macro-averaged precision: calculate precision for each class individually and then average them.
Micro-averaged precision: calculate class-wise true positives and false positives and then use those to
compute an overall precision.
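A sketch of macro vs. micro averaged precision with scikit-learn; the labels
below are illustrative assumptions.

```python
# Macro vs. micro averaged precision on a small multi-class example.
from sklearn.metrics import precision_score

y_true = [0, 0, 0, 1, 1, 2, 2, 2, 2, 2]
y_pred = [0, 0, 1, 1, 2, 2, 2, 2, 0, 1]

# Macro: compute precision per class, then take the unweighted mean.
print("Macro precision:", precision_score(y_true, y_pred, average="macro"))
# Micro: pool TPs and FPs across all classes, then compute one precision.
print("Micro precision:", precision_score(y_true, y_pred, average="micro"))
```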
Metrics for classification (binary & multi-class)
3. Recall (Sensitivity): recall is the ratio of correctly predicted positive
observations to all observations in the actual class "yes".
Recall = TP / (TP + FN)
Approaches to computing recall for a multi-class classification problem
(recall depends on true positives and false negatives):

Macro-averaged recall: calculate recall for each class individually and then
average them.
Micro-averaged recall: calculate class-wise true positives and false negatives
and then use those to compute an overall recall.
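Macro-averaged recall can also be computed by hand from the confusion matrix,
which makes the per-class TP/FN counting explicit; the labels are the same
illustrative ones used above.

```python
# Macro recall by hand from the confusion matrix, checked against sklearn.
import numpy as np
from sklearn.metrics import confusion_matrix, recall_score

y_true = [0, 0, 0, 1, 1, 2, 2, 2, 2, 2]
y_pred = [0, 0, 1, 1, 2, 2, 2, 2, 0, 1]

cm = confusion_matrix(y_true, y_pred)  # rows: actual class, columns: predicted
# Per-class recall = TP (diagonal) / (TP + FN) (row sum).
per_class = np.diag(cm) / cm.sum(axis=1)
print("Per-class recall:", per_class)
print("Macro recall (manual): ", per_class.mean())
print("Macro recall (sklearn):", recall_score(y_true, y_pred, average="macro"))
```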
Metrics for classification (binary & multi-class)
4. F1 score (F1): the F1 score is the weighted (harmonic) mean of precision and recall.

5. Area under the ROC (Receiver Operating Characteristic) curve, or simply
AUC: it is a performance measurement for classification problems at
various threshold settings.

The ROC is a probability curve, and the AUC represents the degree or measure
of separability.

It tells how capable the model is of distinguishing between classes.

The higher the AUC, the better the model is at predicting 0s as 0s and 1s as 1s.
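A short sketch computing F1 and AUC on a binary problem; the synthetic
dataset and logistic regression model are illustrative assumptions.

```python
# F1 score and ROC AUC on a binary classification problem (synthetic data).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("F1:", f1_score(y_test, model.predict(X_test)))
# AUC is threshold-free: it needs scores/probabilities, not hard labels.
print("AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
```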
Metrics for classification (binary & multi-class)
Approach to computing the AUC score for a multi-class classification problem:
✓ One-vs-All and the confusion matrix (see the sketch below)
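A sketch of the one-vs-all (one-vs-rest) AUC using scikit-learn's
multi_class="ovr" option; the dataset and model are illustrative assumptions.

```python
# One-vs-rest AUC for a multi-class problem (illustrative data and model).
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
proba = model.predict_proba(X_test)  # one probability column per class
# Each class is scored against the rest, then the results are averaged.
print("OvR AUC:", roc_auc_score(y_test, proba, multi_class="ovr"))
```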
Metrics for classification (binary & multi-class)

Reading Assignment

1. Binary class classification

2. Approaches to computing metrics for a binary classification problem


Regression metrics
Regression refers to predictive modeling problems that involve predicting
a numeric value.

Mean Squared Error (MSE): it is a popular error metric for regression
problems.

MSE = 1/N * sum for i = 1 to N of (y_i - yhat_i)^2

where y_i is the i-th expected value in the dataset and yhat_i is the i-th
predicted value.

Root Mean Squared Error (RMSE): it is an extension of the mean
squared error.
Regression metrics
RMSE = sqrt(1 / N * sum for i to N (y_i – yhat_i)^2)

Where y_i is the i’th expected value in the dataset, yhat_i is the i’th predicted
value, and sqrt() is the square root function.

Mean Absolute Error (MAE): it is a popular metric because, like RMSE, the
units of the error score match the units of the target value that is being
predicted.

MAE = 1 / N * sum for i to N abs(y_i – yhat_i)

Where y_i is the i’th expected value in the dataset, yhat_i is the i’th predicted
value and abs() is the absolute function.
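The three formulas above map directly onto scikit-learn helpers; a short
sketch with illustrative values:

```python
# MSE, RMSE, and MAE computed with scikit-learn (illustrative values).
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = np.array([3.0, -0.5, 2.0, 7.0])  # expected values y_i
y_pred = np.array([2.5, 0.0, 2.0, 8.0])   # predicted values yhat_i

mse = mean_squared_error(y_true, y_pred)
print("MSE: ", mse)
print("RMSE:", np.sqrt(mse))  # same units as the target, per the note above
print("MAE: ", mean_absolute_error(y_true, y_pred))
```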
Using evaluation metrics in model selection
▪ Model selection is the process of selecting the best model for a problem in
machine learning.

▪ It involves evaluating the model's performance against various parameters
and is crucial for improving model performance.

▪ Techniques like data pre-processing, feature engineering, and hyperparameter
tuning are used.

▪ Choosing the model complexity is a crucial step in the development lifecycle,
ensuring the model is accurate and generalizable to out-of-sample data.
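One common way to use an evaluation metric in model selection is to pass it
as the scoring argument to cross-validation and compare models on the
averaged score; the two models, the imbalanced synthetic dataset, and the
choice of ROC AUC below are illustrative assumptions.

```python
# Comparing two models with a chosen metric via cross-validation.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Imbalanced data (roughly 90%/10%), where accuracy can be misleading.
X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)

for model in (LogisticRegression(max_iter=1000),
              DecisionTreeClassifier(random_state=0)):
    scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    print(type(model).__name__, "mean AUC:", scores.mean())
```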
Why is Model Selection important in Machine Learning?

▪ Model selection has a significant impact on the performance and
accuracy of the model:
✓ accurate and generalizable to new data
✓ prevents overfitting
✓ improves the final model's interpretability
✓ saves computing resources and time

[Figure: Different Types of Machine Learning Models]


The revolution in robotics

• Cheap robots!!!
• Cheap sensors
• Moore’s law
Robotics and ML

Areas where robots are used:
✓ Industrial robots
✓ Military, government and space robots
✓ Service robots for home, healthcare, laboratory

Why are robots used?
✓ Dangerous tasks or hazardous environments
✓ Repetitive tasks
✓ High-precision tasks or those requiring high quality
✓ Labor savings

Control technologies:
✓ Autonomous (self-controlled), tele-operated (remote control)
Industrial Robots
▪ Uses for robots in manufacturing:
• Welding
• Painting
• Cutting
• Dispensing
• Assembly
• Polishing/Finishing
• Material Handling
✓Packaging, Palletizing
✓Machine loading
Industrial Robots

❑ Uses for robots in industry/manufacturing

• Automotive:
✓ Video - welding and handling of fuel tanks, from the TV
show "How It's Made" on Discovery Channel.
• Packaging:
✓ Video - robots in food manufacturing.
Industrial Robots - Automotive
Industrial Robots - Computer
Military/Government Robots

• iRobot PackBot
• Remotec Andros
Military/Government Robots

Soldiers in Afghanistan being trained to defuse a landmine using a PackBot.


Military Robots

• Aerial drones (UAV)
• Military suit


Space Robots

• Mars Rovers – Spirit and Opportunity


• Autonomous navigation features with human remote
control and oversight
Service Robots

• Many uses…
• Cleaning & Housekeeping
• Humanitarian Demining
• Rehabilitation
• Inspection
• Agriculture & Harvesting
• Lawn Mowers
• Surveillance
• Mining Applications
• Construction
• Automatic Refilling
• Fire Fighters
• Search & Rescue

iRobot Roomba vacuum cleaner robot


Medical/Healthcare Applications

• DaVinci surgical robot by Intuitive Surgical. St. Elizabeth Hospital is one of
the local hospitals using this robot. You can see this robot in person during
an open house (website).
• Japanese health care assistant suit (HAL - Hybrid Assistive Limb)
• Also… a mind-controlled wheelchair using NI LabVIEW
Laboratory Applications

• Drug discovery
• Test tube sorting


Beyond Code: Open Data, Papers

https://PapersWithCode.com

https://arXiv.org

https://www.anaconda.com/download/success
(choose based on your OS)

https://www.codeconvert.ai/free-converter
(used to convert code from one language to another)


Questions

1. How does ML affect information science?
2. Natural vs. artificial learning – which is better?
3. Is ML needed in all problems?
4. What are the future directions of ML?
5. How to speed up hyperparameter optimization?
6. How to configure random and grid search hyperparameter optimization
for classification tasks?
7. How to configure random and grid search hyperparameter optimization
for regression tasks?
8. What is the difference between a parameter and a hyperparameter?
Questions

1. How to use the best-performing hyperparameters?
2. How to calculate precision, recall, and F-measure for imbalanced
classification?
3. What distinguishes a binary from a multi-class classification model?
