1) MLOps is the process of maintaining machine learning models in production environments. It involves monitoring model performance over time and retraining models if needed due to data or concept drift.
2) The MLOps pipeline includes stages for data engineering, modelling, deployment, and monitoring. Key aspects are ensuring reproducibility, managing data processing pipelines, and defining deployment and monitoring strategies.
3) Successful MLOps requires automating model deployment, monitoring model and data metrics over time, and retraining models when performance degrades, so that models keep performing well as production data evolves.
1. Machine Learning Operations - MLOps
Getting from Good to Great
Michal Maciejewski, PhD
Acknowledgements: Dejan Golubovic, Ricardo Rocha, Christoph Obermair, Marek Grzenkowicz
2. ML Model
An ML model is a function mapping inputs to outputs: Y = f(X).
Let's share our model with users, i.e., let's put it into production!
(Diagram: Alice, who trained the model, shares Y = f(X) with Bob.)
3. What Has to Go Right?
What is needed for an ML model to perform well in production?
4. What Can Go Wrong?
Concept and data drift are among the main challenges of production ML systems!
5. MLOps is about maintaining the trained model performance* in production.
The performance may degrade due to factors outside of our control, so we ought to monitor it and, if needed, roll out a new model to users.
*model performance = accuracy, latency, jitter, etc.
6. ML Model = Data + Code
+ Algorithm
+ Weights
+ Hyperparameters
+ Scripts
+ Libraries
+ Infrastructure
+ DevOps
MLOps = ML Model + Software
7. MLOps = ML Model + Software
(Diagram after D. Sculley et al., "Hidden Technical Debt in Machine Learning Systems," NIPS 2015: your code – the ML model – is a small box surrounded by much larger supporting components: configuration, data collection, data verification, feature extraction, machine resource management, analysis tools, process management tools, serving infrastructure, and monitoring.)
Good news: most of these components come as ready-to-use frameworks (your code + an ML framework).
8. MLOps Pipeline
MLOps is a multi-stage, iterative process:
Data Engineering → Modelling → Deployment → Monitoring
11. Exploratory Data Analysis
Initial exploration allows identifying requirements for input data in production.
For structured data:
- schema: required tables, columns, and datatypes (see the schema-check sketch after this list)
For unstructured data:
- resolution, image file extension
- frequency, duration, audio codec
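For structured data, such requirements can be captured as a simple schema check. A minimal sketch with pandas, where the column names and dtypes are illustrative assumptions:

    import pandas as pd

    # Expected schema derived from exploratory data analysis
    # (column names and dtypes here are illustrative assumptions).
    EXPECTED_SCHEMA = {
        "timestamp": "datetime64[ns]",
        "sensor_id": "int64",
        "voltage": "float64",
    }

    def validate_schema(df: pd.DataFrame) -> list:
        """Return a list of schema violations (empty list = valid input)."""
        errors = []
        for column, dtype in EXPECTED_SCHEMA.items():
            if column not in df.columns:
                errors.append(f"missing column: {column}")
            elif str(df[column].dtype) != dtype:
                errors.append(f"{column}: expected {dtype}, got {df[column].dtype}")
        return errors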
14. Keeping Track of Data Processing
• Version input data – DVC framework
• Version processing scripts – GitLab
• Version computing environment – Docker
Data Provenance – where does the data come from?
Data Lineage – how is the data manipulated? (A DVC sketch follows.)
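As a hedged illustration of data versioning, the DVC Python API can read a specific version of a tracked file; the repository URL, file path, and revision below are hypothetical placeholders:

    import dvc.api

    # Open a specific revision (Git tag/commit) of a DVC-tracked file.
    with dvc.api.open(
        "data/raw/signals.csv",                              # hypothetical path
        repo="https://gitlab.example.com/ml/signal-monitoring",  # hypothetical repo
        rev="v1.0",                                          # hypothetical tag
    ) as f:
        first_line = f.readline()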
15. Notebook Good Practices
• Linear flow of execution
• Small amount of code
• Extract reusable code into a package
• Use pre-commit to clean notebooks before committing to a repository
• Set parameters at the top so that the notebook can be treated as a function (papermill and scrapbook packages; see the sketch after this list)
It is OK to do exploratory, quick & dirty model development.
Once we start communicating the model outside, we need to clean it up!
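A minimal sketch of running a parameterized notebook with papermill; the notebook names and parameters are illustrative:

    import papermill as pm

    # Execute a notebook as a function: papermill injects the parameters
    # into the cell tagged "parameters" and saves the executed copy.
    pm.execute_notebook(
        "train.ipynb",             # input notebook (illustrative name)
        "train_lr_0.01.ipynb",     # output notebook with results
        parameters={"learning_rate": 0.01, "epochs": 20},
    )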
16. From Model-driven to Data-driven ML

                     Model-driven ML      Data-driven ML
Fixed component      Dataset              Model architecture
Variable component   Model architecture   Dataset
Objective            High accuracy        Fairness, low bias
Explainability       Limited              Possible

https://datacentricai.org
https://spectrum.ieee.org/andrew-ng-data-centric-ai
18. Selecting Data for Training
The dataset is split into Training (80%) and Validation (20%) sets: train on the training set, validate on the validation set, and loop back for hyperparameter tuning.
With this approach, the model eventually sees the entire dataset.
19. Selecting Data for Training
Splitting the dataset in three – Training (75%), Validation (15%), Test (10%) – allows performing a final check with unseen data: train, validate (with hyperparameter tuning), and only then run the final check on the test set. A split sketch follows.
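A minimal sketch of a 75/15/10 split with scikit-learn; two successive calls are needed, since train_test_split only splits in two (data and variable names are illustrative):

    import numpy as np
    from sklearn.model_selection import train_test_split

    X = np.random.rand(200, 3)              # toy feature matrix
    y = np.random.randint(0, 2, size=200)   # toy binary labels

    # First carve out the 10% test set, then split the remaining 90%
    # into 75%/15% of the original dataset (15/90 ≈ 0.1667).
    X_rest, X_test, y_rest, y_test = train_test_split(
        X, y, test_size=0.10, random_state=42
    )
    X_train, X_val, y_train, y_val = train_test_split(
        X_rest, y_rest, test_size=15 / 90, random_state=42
    )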
20. Balancing Datasets
Consider a binary classification problem with a dataset composed of 200 entries.
There are 160 negative examples (no failure) and 40 positive ones (failure).

Expected (stratified) split, negatives + positives per subset:
  Training 75% (120 + 30), Validation 15% (24 + 6), Test 10% (16 + 4)
A random split may instead give:
  Training 75% (131 + 19), Validation 15% (19 + 11), Test 10% (10 + 10)

For continuous values it is important to preserve the statistical distribution.
Although for big datasets this is not an issue, it is still a low-hanging fruit. (See the stratified-split sketch below.)
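A minimal sketch of a stratified split with scikit-learn, continuing the 160/40 example:

    import numpy as np
    from sklearn.model_selection import train_test_split

    y = np.array([0] * 160 + [1] * 40)    # 160 negatives, 40 positives
    X = np.arange(200).reshape(-1, 1)     # toy features

    # stratify preserves the 80/20 class ratio in every split.
    X_rest, X_test, y_rest, y_test = train_test_split(
        X, y, test_size=0.10, stratify=y, random_state=42
    )
    X_train, X_val, y_train, y_val = train_test_split(
        X_rest, y_rest, test_size=15 / 90, stratify=y_rest, random_state=42
    )
    print(y_test.sum())  # 4 positives in the test set, as expected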
21. Rare Events
There were 3130 healthy signals (Y = False) and 112 faulty ones (Y = True).
C. Obermair, "Extension of Signal Monitoring Applications with Machine Learning," Master Thesis, TU Graz.
M. Brice, "LHC tunnel Pictures during LS2," https://cds.cern.ch/images/CERN-PHOTO-201904-108-15
22. Rare Events
A naive model that always predicts Y = False is guaranteed to achieve 97% average dataset accuracy?!
23. Rare Events
It is a valuable conversation to decide if precision or recall (or both) is more important.

Confusion matrix of the naive model:

                          Ground truth
                          Y = True              Y = False
Model   Y = True          0 (true positive)     0 (false positive)
        Y = False         112 (false negative)  3130 (true negative)

Avg accuracy = (TP + TN) / (TP + TN + FP + FN) = 3130 / 3242 ≈ 97%
Precision = TP / (TP + FP) = 0 / 0 (undefined)
Recall = TP / (TP + FN) = 0 / (0 + 112) = 0
F1 score = 2 / (1/Precision + 1/Recall), ill-defined here since Precision is 0/0
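The same numbers can be reproduced with scikit-learn; a minimal sketch using the naive always-False predictor:

    import numpy as np
    from sklearn.metrics import accuracy_score, precision_score, recall_score

    y_true = np.array([1] * 112 + [0] * 3130)  # 112 faulty, 3130 healthy
    y_pred = np.zeros_like(y_true)             # naive model: always "healthy"

    print(accuracy_score(y_true, y_pred))      # ~0.965, i.e. ≈ 97%
    # zero_division=0 avoids a warning for the undefined 0/0 precision.
    print(precision_score(y_true, y_pred, zero_division=0))  # 0.0
    print(recall_score(y_true, y_pred))        # 0.0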
24. Data Augmentation
For time series: new examples obtained by shifting the region left and right.
For images: new examples obtained by rotating/shifting/hiding parts of the image.
JH. Kim et al., "Hybrid Integration of Solid-State Quantum Emitters on a Silicon Photonic Chip," Nano Letters 2017.
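A minimal sketch of shift-based augmentation for a 1-D signal with NumPy; the shift sizes are illustrative:

    import numpy as np

    def shift_augment(signal, shifts=(-5, 5)):
        """Create new training examples by shifting the signal left/right.

        Note: np.roll wraps values around the edges; for real signals,
        padding may be preferable to wrap-around.
        """
        return [np.roll(signal, s) for s in shifts]

    signal = np.sin(np.linspace(0, 2 * np.pi, 100))  # toy signal
    augmented = shift_augment(signal)                # two shifted copies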
25. What else can we do?
"When one of the values of Y is rare in the population, considerable resources in data collection can be saved by randomly selecting within categories of Y. [...] The strategy is to select on Y by collecting observations (randomly or all those available) for which Y = 1 (the 'cases') and a random selection of observations for which Y = 0 (the 'controls')."
G. King and L. Zeng, "Logistic Regression in Rare Events Data," Political Analysis, p. 28, 2001.
https://en.wikipedia.org/wiki/Cross-validation_(statistics)
We can also collect more data of a particular class (if at all possible). A sampling sketch follows.
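A minimal sketch of such case-control sampling with NumPy, keeping all cases and a random subset of controls; the 4-controls-per-case ratio is an illustrative choice:

    import numpy as np

    rng = np.random.default_rng(42)
    y = np.array([1] * 112 + [0] * 3130)  # cases (1) and controls (0)
    X = np.arange(y.size).reshape(-1, 1)  # toy features

    cases = np.flatnonzero(y == 1)        # keep all 112 cases
    controls = rng.choice(
        np.flatnonzero(y == 0), size=4 * cases.size, replace=False
    )                                     # random subset of controls
    idx = rng.permutation(np.concatenate([cases, controls]))
    X_bal, y_bal = X[idx], y[idx]         # rebalanced dataset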
26. Training Tracking
1. Pen & paper
2. Spreadsheet
3. Dedicated framework (see the logging sketch after this list)
   - Weights and Biases
   - Neptune.ai
   - TensorBoard (TensorFlow)
   - ...
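A minimal sketch of experiment tracking with Weights and Biases; the project name, hyperparameters, and logged metric values are illustrative:

    import wandb

    # Start a tracked run; config captures the hyperparameters.
    run = wandb.init(
        project="signal-monitoring",  # hypothetical project name
        config={"learning_rate": 0.01, "epochs": 20},
    )

    for epoch in range(run.config.epochs):
        # ... train one epoch, then log metrics for this step ...
        wandb.log({"epoch": epoch, "val_accuracy": 0.9})  # placeholder value

    run.finish()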
27. Error Analysis
Such analysis may reveal issues with labelling or rare classes in data.
For unstructured data, a cockpit could help in the analysis.
Useful in monitoring of certain classes of inputs.

#   Signal     Noise   Gap in signal   Bias   Wrong sampling
1   Magnet 1   x       x
2   Magnet 2   x       x
3   Magnet 3   x       x
30. Degrees of Automation
Human inspection → Shadow mode → Human in the loop → Full automation
Starting from shadow mode, we can collect more training data in production! (A shadow-mode sketch follows.)
C. Obermair, "Extension of Signal Monitoring Applications with Machine Learning," Master Thesis, TU Graz.
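A minimal sketch of shadow-mode serving in plain Python (the models and their predict method are assumptions): the production model answers the request, while the candidate model's prediction is only logged for later comparison.

    import logging

    logger = logging.getLogger("shadow")

    def serve(request, production_model, shadow_model):
        """Answer with the production model; log the shadow prediction."""
        prediction = production_model.predict(request)
        try:
            shadow_prediction = shadow_model.predict(request)
            # Logged pairs become evaluation/training data for the candidate.
            logger.info("shadow=%s production=%s", shadow_prediction, prediction)
        except Exception:
            logger.exception("shadow model failed")  # must never affect the user
        return prediction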
32. Reproducible Environments
The computing environment (OS, Python, packages) is captured in Docker containers; serverless compute on the computing infrastructure is provided by KServe.
(Diagram: an HTTP server receives a request, passes it through the data pipeline and the ML model behind a REST API, and returns a response; KServe routes requests to a pool of models described by a config file.)
We will play with those during the exercise sessions! A REST-call sketch follows.
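A minimal sketch of calling a served model over REST using the KServe v1 inference protocol; the host, model name, and input vector are hypothetical:

    import requests

    # KServe v1 protocol: JSON body with an "instances" list.
    url = "https://models.example.cern.ch/v1/models/signal-classifier:predict"
    payload = {"instances": [[0.1, 0.5, 0.9]]}  # one input vector (illustrative)

    response = requests.post(url, json=payload, timeout=10)
    response.raise_for_status()
    print(response.json())  # {"predictions": [...]}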
35. Relevant Metrics
• Model metrics
  • Distribution of input features – data/concept drift
  • Missing/malformed values in the input
  • Average output accuracy / classification distribution – concept drift
• Infrastructure metrics
  • Logging errors
  • Memory and CPU resource utilization
  • Latency and jitter
For each of the relevant metrics one should define warning/error thresholds. (A drift-check sketch follows.)
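A minimal sketch of monitoring an input feature's distribution with a two-sample Kolmogorov-Smirnov test from SciPy; the data and the warning threshold are illustrative:

    import numpy as np
    from scipy.stats import ks_2samp

    rng = np.random.default_rng(0)
    reference = rng.normal(0.0, 1.0, size=5_000)   # feature at training time
    production = rng.normal(0.3, 1.0, size=5_000)  # recent production inputs

    statistic, p_value = ks_2samp(reference, production)
    if p_value < 0.01:  # warning threshold (illustrative choice)
        print(f"possible data drift: KS={statistic:.3f}, p={p_value:.2e}")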
38. MLOps Pipeline with TensorFlow
The pipeline is represented as a DAG (directed acyclic graph) covering data engineering, modelling, and deployment.
https://www.tensorflow.org/tfx/guide
39. MLOps Pipeline with Kubeflow
Again, the pipeline is represented as a DAG covering data engineering, modelling, and deployment.
https://ml.cern.ch
https://www.kubeflow.org/docs/started/
40. From Good to Great
I do hope the presented MLOps concepts will allow your models to transition from Good to Great.

                  Development ML         Production ML
Objective         High-accuracy model    Efficiency of the overall system
Dataset           Fixed                  Evolving
Code quality      Secondary importance   Critical
Model training    Optimal tuning         Fast turn-arounds
Reproducibility   Secondary importance   Critical
Traceability      Secondary importance   Critical