Machine Learning Operations - MLOps
Getting from Good to Great
Michal Maciejewski, PhD
Acknowledgements: Dejan Golubovic, Ricardo Rocha, Christoph Obermair, Marek Grzenkowicz
2
ML Model
X Y = f(X)
Let’s share our model with users aka let’s put it into production!
Alice:
Bob: Y = f(X)
What Has to Go Right?
3
What is needed for an ML model to perform well in production?
What Can Go Wrong?
Concept and data drift are among the main challenges of production ML systems!
4
5
MLOps is about maintaining the trained model's performance* in production.
The performance may degrade due to factors outside of our control,
so we ought to monitor it and, if needed, roll out a new model to users.
*model performance = accuracy, latency, jitter, etc.
6
ML Model = Data + Code
+ Algorithm
+ Weights
+ Hyperparameters
+ Scripts
+ Libraries
+ Infrastructure
+ DevOps
MLOps = ML Model + Software
7
D. Sculley et al., Hidden Technical Debt in Machine Learning Systems, NIPS 2015
MLOps = ML Model + Software
[Figure, after D. Sculley et al.: the ML model (your code plus the ML framework) is only a small box surrounded by the supporting components: Configuration, Data Collection, Data Verification, Feature Extraction, Machine Resource Management, Analysis Tools, Process Management Tools, Serving Infrastructure, and Monitoring.]
Good news: most of these components come as ready-to-use frameworks.
MLOps Pipeline
8
MLOps is a multi-stage, iterative process.
Data Engineering Modelling Deployment Monitoring
Data Engineering
9
Reproducibility
Traceability
Data-driven ML
Data Engineering Modelling Deployment Monitoring
10
Exploratory Data Analysis
For structured data:
- schema: the required tables, columns, and datatypes
For unstructured data:
- resolution, image extension
- frequency, duration, audio codec
11
Initial exploration allows identifying the requirements for input data in production.
Data Processing Pipeline
12
Data Ingestion
• Load from file
• Load from db
Data Validation
• Schema check
• Audio/video file check
Data Cleaning
• Filling NaNs
• Filtering
• Normalization
• Standardization
Feature Engineering
• Feature selection
• Feature crossover
We need to reproduce some of these steps (e.g. subtracting the mean) in production! A minimal sketch follows below.
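To make the "reproduce the same steps in production" point concrete, here is a minimal sketch, assuming scikit-learn and joblib (neither is prescribed by the slides): the cleaning and normalization statistics are fitted on the training data only, persisted, and re-applied unchanged at serving time.

```python
# A sketch, assuming scikit-learn/joblib: fit the preprocessing on training data,
# persist it, and reuse the exact same transformation in production.
import joblib
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X_train = np.array([[1.0, 10.0], [2.0, np.nan], [3.0, 30.0]])  # toy training data

preprocessing = Pipeline([
    ("fill_nans", SimpleImputer(strategy="mean")),  # filling NaNs
    ("standardize", StandardScaler()),              # subtract mean, divide by std
])
preprocessing.fit(X_train)
joblib.dump(preprocessing, "preprocessing.joblib")  # version this artifact with the model

# In production: load the fitted pipeline and transform incoming data identically.
serving_preprocessing = joblib.load("preprocessing.joblib")
print(serving_preprocessing.transform(np.array([[2.5, 20.0]])))
```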
https://sites.google.com/princeton.edu/rep-workshop/
Reproducibility
13
Dataset + notebooks + various scripts + Excel spreadsheets → curated dataset
Keeping Track of Data Processing
• Version Input Data – DVC framework
• Version Processing Script - GitLab
• Version Computing Environment - Docker
14
Data Provenance – where does the data come from?
Data Lineage – how is the data manipulated? (a minimal data-versioning sketch follows below)
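As an illustration of versioned input data, a minimal sketch using the DVC Python API; the repository URL, file path, and tag are hypothetical placeholders.

```python
# A sketch of reading a DVC-versioned dataset; the repo URL, path, and tag are hypothetical.
import dvc.api

data = dvc.api.read(
    path="data/curated_dataset.csv",  # file tracked by DVC in that repository
    repo="https://gitlab.example.com/group/ml-project.git",
    rev="v1.2.0",                     # Git tag/commit pins the exact data version
)
print(data[:200])  # the first characters of a reproducible data snapshot
```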
Notebook Good Practices
• Linear flow of execution
• Keep the amount of code small
• Extract reusable code into a package
• Use pre-commit hooks to clean the notebook before committing it to a repository
• Set parameters at the top so that the notebook can be treated as a function (papermill and scrapbook packages; see the sketch below)
15
It is OK to do quick & dirty exploratory model development.
Once we start communicating the model outside, we need to clean it up!
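A minimal sketch of the "notebook as a function" idea with papermill; the notebook names and parameters are hypothetical.

```python
# A sketch: execute a parameterized notebook like a function; names are hypothetical.
import papermill as pm

pm.execute_notebook(
    "train_model.ipynb",               # input notebook with a "parameters"-tagged cell
    "runs/train_model_lr0.001.ipynb",  # executed copy, kept as a record of the run
    parameters={"learning_rate": 1e-3, "epochs": 20},
)
```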
From Model-driven to Data-driven ML
16
                    Model-driven ML      Data-driven ML
Fixed component     Dataset              Model Architecture
Variable component  Model Architecture   Dataset
Objective           High accuracy        Fairness, low bias
Explainability      Limited              Possible
https://datacentricai.org
https://spectrum.ieee.org/andrew-ng-data-centric-ai
Modelling
17
Training challenges
Rare events
Analyzing results
Data Engineering Modelling Deployment Monitoring
Selecting Data for Training
18
Dataset split: Training 80%, Validation 20%
train → validate → hyperparameter tuning
With this approach, the model eventually sees the entire dataset.
Selecting Data for Training
19
Dataset split: Training 75%, Validation 15%, Test 10%
train → validate → hyperparameter tuning; test → final check
Splitting the dataset in three allows us to perform a final check with unseen data.
Balancing Datasets
Consider a binary classification problem with a dataset composed of 200 entries.
There are 160 negative examples (no failure) and 40 positive ones (failure).
20
Expected (counts given as negative + positive):
Training 75% (120 + 30), Validation 15% (24 + 6), Test 10% (16 + 4)
Random:
Training 75% (131 + 19), Validation 15% (19 + 11), Test 10% (10 + 10)
For continuous values it is important to preserve the statistical distribution across the splits.
Although for big datasets this is rarely an issue, it is still a low-hanging fruit; a stratified-split sketch follows below.
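A minimal stratified-split sketch for the 200-sample example above, assuming scikit-learn; the random seed is arbitrary.

```python
# A sketch of a stratified 75/15/10 split for the 200-sample example (160 negative, 40 positive).
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(200).reshape(-1, 1)
y = np.array([0] * 160 + [1] * 40)

# Peel off the 10% test set first, then split the rest into 75%/15% of the total.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.10, stratify=y, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=15 / 90, stratify=y_rest, random_state=42)

print(y_train.sum(), y_val.sum(), y_test.sum())  # 30, 6, 4 positives, as expected
```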
Rare Events
21
C. Obermair, Extension of Signal Monitoring Applications with Machine Learning, Master Thesis, TU Graz
M. Brice, LHC tunnel Pictures during LS2, https://cds.cern.ch/images/CERN-PHOTO-201904-108-15
There were 3130 healthy signals (Y=False) and 112 faulty ones (Y=True)
22
A naive model that always predicts Y = False is guaranteed to achieve 97% average dataset accuracy?!
Rare Events
Rare Events
23
It is a valuable conversation to decide if precision or recall (or both) is more important.
Ground truth vs. model prediction (the model always predicts Y = False):

                  Y = True (ground truth)    Y = False (ground truth)
Model Y = True    0 (true positive)          0 (false positive)
Model Y = False   112 (false negative)       3130 (true negative)

Avg accuracy = (TP + TN) / (TP + TN + FP + FN) = 3130 / 3242 ≈ 97%
Precision = TP / (TP + FP) = 0 / 0 (undefined)
Recall = TP / (TP + FN) = 0 / (0 + 112) = 0
F1 score = 2 / (1/Precision + 1/Recall)
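The same numbers can be reproduced with a few lines of scikit-learn (a sketch, not the code behind the slide), which makes the contrast between the misleading accuracy and the zero precision/recall explicit.

```python
# A sketch, assuming scikit-learn: metrics for a model that always predicts False,
# with the counts from the slide (3130 healthy, 112 faulty signals).
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

y_true = np.array([1] * 112 + [0] * 3130)
y_pred = np.zeros_like(y_true)  # the naive "always False" model

print(f"accuracy : {accuracy_score(y_true, y_pred):.3f}")                     # ~0.97, misleading
print(f"precision: {precision_score(y_true, y_pred, zero_division=0):.3f}")  # 0.0, no positives predicted
print(f"recall   : {recall_score(y_true, y_pred):.3f}")                      # 0.0
print(f"f1       : {f1_score(y_true, y_pred):.3f}")                          # 0.0
```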
Data Augmentation
24
JH. Kim et al. Hybrid Integration of Solid-State Quantum Emitters on a Silicon Photonic Chip, Nano Letters 2017
New examples obtained by
shifting the region left and right
New examples obtained by
rotating/shifting/hiding
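A minimal sketch of augmenting a 1-D signal by shifting, assuming NumPy; the signal and shift values are synthetic, and np.roll wraps around, so with real recordings one would rather crop a longer window.

```python
# A sketch, assuming NumPy: create extra training examples by shifting a 1-D signal.
import numpy as np

rng = np.random.default_rng(0)
signal = np.sin(np.linspace(0, 6 * np.pi, 500)) + 0.05 * rng.standard_normal(500)

def shifted_copies(x, shifts=(-20, -10, 10, 20)):
    """New examples obtained by shifting the region left and right.

    np.roll wraps around; with real recordings one would crop a longer window instead.
    """
    return [np.roll(x, s) for s in shifts]

augmented = shifted_copies(signal)
print(len(augmented), augmented[0].shape)  # 4 extra examples of the same length
```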
What else can we do?
When one of the values of Y is rare in the population, considerable
resources in data collection can be saved by randomly selecting within
categories of Y. […]
The strategy is to select on Y by collecting observations (randomly or all
those available) for which Y = 1 (the "cases") and a random selection of
observations for which Y = 0 (the "controls").
25
G. King and L. Zeng, “Logistic Regression in Rare Events Data,” Political Analysis, p. 28, 2001.
https://en.wikipedia.org/wiki/Cross-validation_(statistics)
We can also collect more data for a particular class (if at all possible).
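A minimal sketch of the "select on Y" strategy from the quote above: keep every rare case and a random subset of the controls. The counts come from the earlier slide; the 1:4 ratio is an illustrative choice.

```python
# A sketch of case-control selection: keep all rare cases (Y = 1) and a random
# subset of controls (Y = 0); counts from the earlier slide, 1:4 ratio chosen arbitrarily.
import numpy as np

rng = np.random.default_rng(42)
y = np.array([1] * 112 + [0] * 3130)

cases = np.where(y == 1)[0]                   # all faulty signals
controls = rng.choice(np.where(y == 0)[0],    # random healthy signals
                      size=4 * len(cases), replace=False)
selected = np.concatenate([cases, controls])
print(len(selected))  # 112 + 448 = 560 examples with a 1:4 class ratio
```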
Training Tracking
1. Pen & Paper
2. Spreadsheet
3. Dedicated framework
- Weights and Biases
- Neptune.ai
- Tensorflow
- …
26
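As an illustration of option 3 (a dedicated framework), a minimal tracking sketch with the Weights & Biases client; it assumes a configured wandb account, and the project name, config, and logged values are hypothetical.

```python
# A sketch with the Weights & Biases client; requires a configured wandb account,
# and the project name, config, and loss values are placeholders.
import wandb

run = wandb.init(project="mlops-demo", config={"learning_rate": 1e-3, "epochs": 5})
for epoch in range(run.config.epochs):
    train_loss = 1.0 / (epoch + 1)               # stand-in for a real training loop
    wandb.log({"epoch": epoch, "train_loss": train_loss})
run.finish()
```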
Error Analysis
Such analysis may reveal issues with labelling or rare classes in the data.
For unstructured data, a cockpit (dashboard) could help in the analysis.
It is also useful for monitoring certain classes of inputs.
27
[Table from the slide: each signal (Magnet 1, Magnet 2, Magnet 3) is marked against error categories: Noise, Gap in signal, Bias, Wrong sampling.]
28
Deployment
29
Degrees of automation
Modes of deployment
Reproducible environments
Data Engineering Modelling Deployment Monitoring
Degrees of Automation
30
C. Obermair, Extension of Signal Monitoring Applications with Machine Learning, Master Thesis, TU Graz
Human inspection Shadow mode
Human in
the loop
Full Automation
Starting from Shadow mode we can collect more training data in production!
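A minimal sketch of shadow mode: the old model keeps serving users while the new model's predictions are only logged for offline comparison; the stand-in models are placeholders.

```python
# A sketch of shadow-mode serving: users get the old model's answer, the new
# model runs silently and its output is only logged for later comparison.
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("shadow")

def predict_with_shadow(x, old_model, new_model):
    served = old_model(x)              # this is what the user actually receives
    try:
        shadow = new_model(x)          # evaluated silently, never returned
        log.info("input=%s served=%s shadow=%s", x, served, shadow)
    except Exception:                  # the shadow model must never break serving
        log.exception("shadow model failed")
    return served

# Toy usage with stand-in models:
print(predict_with_shadow(3.0, old_model=lambda x: 2 * x, new_model=lambda x: 2.1 * x))
```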
Modes of Deployment
31
https://hbr.org/2017/09/the-surprising-power-of-online-experiments
https://en.wikipedia.org/wiki/Blue-winged_parrot
[Diagram: a router splits incoming traffic, sending X% of requests to the new version and the remaining 100-X% to the old version.]
- In canary deployment there is a gradual switch between versions.
- In blue/green deployment there is an on/off switch between versions.
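A minimal sketch of canary routing: a configurable fraction of requests is sent to the new version; the 5% fraction and stand-in models are illustrative.

```python
# A sketch of canary routing: X% of requests go to the new version, 100-X% to the old one.
import random

def route(request, old_version, new_version, canary_fraction=0.05):
    """Send roughly canary_fraction of the traffic to the new version."""
    if random.random() < canary_fraction:
        return new_version(request)
    return old_version(request)

# Toy usage: about 5% of 1000 calls hit the new model.
answers = [route(x, old_version=lambda r: "old", new_version=lambda r: "new")
           for x in range(1000)]
print(answers.count("new"), "of 1000 requests served by the new version")
```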
Reproducible Environments
32
Computing environment (OS, Python, packages)
Option 1: Docker containers on our own computing infrastructure, packaging the HTTP server, REST API, data pipeline, and ML model (request/response).
Option 2: serverless compute with KServe, serving a pool of models described by a config file (request/response).
We will play with those during the exercise sessions!
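A minimal sketch of the containerized option (HTTP server + REST API + data pipeline + model), assuming FastAPI and uvicorn, which the slides do not prescribe; the pipeline and model here are placeholders.

```python
# A sketch, assuming FastAPI: one container exposing an HTTP server with a REST API
# that runs the data pipeline and the ML model; both are placeholders here.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class PredictionRequest(BaseModel):
    features: list[float]

def data_pipeline(features: list[float]) -> list[float]:
    mean = sum(features) / len(features)
    return [f - mean for f in features]   # same preprocessing as during training

def model(features: list[float]) -> float:
    return sum(features)                  # stand-in for the real trained model

@app.post("/predict")
def predict(request: PredictionRequest) -> dict:
    return {"prediction": model(data_pipeline(request.features))}

# Inside the Docker container, run e.g.:  uvicorn serve:app --host 0.0.0.0 --port 8080
```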
Monitoring
33
Useful metrics
Relevant frameworks
Data Engineering Modelling Deployment Monitoring
34
Relevant Metrics
• Model metrics
• Distribution of input features – data/concept drift
• Missing/malformed values in the input
• Average output accuracy/classification distribution – concept drift
• Infrastructure metrics
• Logging errors
• Memory and CPU resource utilization
• Latency and jitter
35
For each of the relevant metrics one should define warning/error thresholds.
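A minimal sketch of monitoring one input feature for data drift with a two-sample Kolmogorov-Smirnov test (SciPy) and warning/error thresholds; the distributions and thresholds are illustrative.

```python
# A sketch of drift monitoring for one input feature with a two-sample KS test (SciPy);
# the reference/production data and the thresholds are illustrative.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
training_feature = rng.normal(loc=0.0, scale=1.0, size=5_000)    # reference distribution
production_feature = rng.normal(loc=0.3, scale=1.0, size=1_000)  # recent serving data

stat, p_value = ks_2samp(training_feature, production_feature)
WARN, ERROR = 0.05, 0.001  # warning/error thresholds on the p-value, to be tuned per metric
if p_value < ERROR:
    print(f"ERROR: input distribution drifted (KS={stat:.3f}, p={p_value:.1e})")
elif p_value < WARN:
    print(f"WARNING: possible drift (KS={stat:.3f}, p={p_value:.1e})")
else:
    print("input feature distribution looks stable")
```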
Monitoring Matters
36
C. Obermair, Extension of Signal Monitoring Applications with Machine Learning, Master Thesis, TU Graz
37
Data Engineering Modelling Deployment Monitoring
MLOps Pipeline with Tensorflow
38
https://www.tensorflow.org/tfx/guide
Pipeline represented as a DAG (directed acyclic graph)
Data Engineering
Modelling
Deployment
39
MLOps Pipeline with Kubeflow
https://ml.cern.ch
https://www.kubeflow.org/docs/started/
Data Engineering
Modelling
Deployment
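A minimal Kubeflow Pipelines sketch (kfp v2 SDK) of a two-step DAG; the component bodies are placeholders and this is not the pipeline used at ml.cern.ch.

```python
# A sketch with the kfp v2 SDK: a two-step DAG (data engineering -> modelling);
# component bodies are placeholders, not the actual ml.cern.ch pipelines.
from kfp import compiler, dsl

@dsl.component(base_image="python:3.11")
def prepare_data(n_samples: int) -> int:
    return n_samples                      # stand-in for the data engineering step

@dsl.component(base_image="python:3.11")
def train_model(n_samples: int) -> str:
    return f"model trained on {n_samples} samples"   # stand-in for the modelling step

@dsl.pipeline(name="mlops-demo")
def mlops_pipeline(n_samples: int = 1000):
    data = prepare_data(n_samples=n_samples)
    train_model(n_samples=data.output)

# Compile to a package that can be uploaded to a Kubeflow Pipelines instance.
compiler.Compiler().compile(mlops_pipeline, package_path="mlops_pipeline.yaml")
```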
I do hope the presented MLOps concepts will allow your models to transition
from Good to Great.
40
                  Development ML         Production ML
Objective         High-accuracy model    Efficiency of the overall system
Dataset           Fixed                  Evolving
Code quality      Secondary importance   Critical
Model training    Optimal tuning         Fast turn-arounds
Reproducibility   Secondary importance   Critical
Traceability      Secondary importance   Critical
Conclusion
Resources
41
Machine Learning Engineering for Production (MLOps) Specialization