Using house price prediction as a running example, this tutorial covers the reasons for and the advantages of versioning machine learning models with the open-source platform MLflow. MLflow lets you manage the ML lifecycle, including experimentation, reproducibility, deployment, and a central model registry. Participants will gain insights into the initial setup, the tracking of ML experiments, and the final registration of a production-ready model.
GitHub: https://github.com/pdanninger/dsc_dach.git
[DSC DACH 23] Go with the flow – Track your machine learning lifecycle using MLflow - Philipp Danninger
1. DSC DACH 23
Go with the flow
Track your machine learning lifecycle using MLflow
2. Agenda
1. Introduction to Parkside Interactive
2. Why track your ML lifecycle?
3. What is MLflow?
4. Tutorial
5. Recap
3. We work globally and have offices in Graz (HQ), Linz, Vienna, Porto & San Francisco.
4. Parkside offers customized services across the entire digital product lifecycle.
● Software Development: Frontend Development / Mobile Development / Backend & API Development / Platform Development
● Research & Testing: User Research & Testing / Usability Testing / Expert Reviews / UX Strategy & Measurement / Market & Product Research
● Design & Tech Consulting: Product Thinking / Proof of Concept / Product & Business Analysis / Requirements Engineering / Discovery Workshops
● Agile Methodologies: Scrum / Kanban / SAFe / LeSS / Agile Contracting
● Product Development: Product Management & Ownership / Custom Software Development / UX Design / Proof of Concept & Prototyping / MVP
● Software- & System-Architecture: Microservices / Cloud Native Technologies / Scalability / Database Schema / Containerization / Client Architecture
● Support, Maintenance & Optimization: A/B Testing / Performance Analysis / Conversion Optimization / Release Management / SLA
● User Experience & Design: Accessibility & Inclusive Design / UX Writing / Information Architecture / User Interface Design / Design Systems / Digital Branding
● Quality Assurance & Testing: Strategy / Management / Automation / Execution
● DevOps: Cloud Native / Continuous Integration / Continuous Deployment / Continuous Delivery
● Data Science & AI: Use-Case Identification / Data Analysis and POC phase / ML model development and validation / Bringing models into production
● Cyber Security: White Box Audits / Secure Software Development Lifecycle / Threat Modelling / SAST / SCA
5. Philipp Danninger, Data Scientist
Based in Graz, Austria
MSc in Economics
MSc in Data & Information Science
danninger@parkside-interactive.com
www.linkedin.com/in/philipp-danninger
6. Why track the lifecycle of ML applications?
● Data ingress: What are the ingress channels? How often do you receive new data?
● Data cleaning: What parts of the data are usable? How should the data be stored?
● Training set creation: Data imputations? Sample sizes? Split size? Data leakage?
● Model training & evaluation: Choice of algorithm? Hyperparameters? Cross-validation? Performance metrics?
● Model deployment & maintenance: Deployment method & workflow? Dependencies? What is the final model’s version?
Can you answer all of these questions after some time has passed?
Store all components for reproducibility in one place → MLflow
7. What is MLflow?
● Open-source platform for managing the machine learning lifecycle
● Four components:
○ MLflow Tracking
○ MLflow Projects
○ MLflow Models
○ MLflow Model Registry
8. Some MLflow Tracking basics
● MLflow Tracking is organized around the concept of runs, which are executions of
some piece of data science code
● You can optionally organize runs into experiments, which group together runs for a
specific task
● Each run records its code version (Git commit), start & end time, source, parameters, metrics, and artifacts
● Runs can be stored locally, in a SQLAlchemy-compatible database, or on a remote tracking server
● Once your runs have been recorded, you can query them using the Tracking UI or the
MLflow API
9. Tutorial - What are we going to cover?
ML problem: Prediction of California house prices
Creation of ML pipeline
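One possible shape for such a pipeline, assuming scikit-learn (the slides do not name the library): imputation and scaling are chained with a regressor, so the whole preprocessing chain is fitted, versioned and later deployed as a single object. The helpers `build_pipeline` and `train_and_evaluate`, the median imputation (kept as a safeguard for new data with gaps) and the random-forest default are illustrative assumptions.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def build_pipeline(regressor=None):
    """Chain preprocessing and a regressor into one estimator (hypothetical helper)."""
    if regressor is None:
        regressor = RandomForestRegressor(n_estimators=100, random_state=42)
    return Pipeline([
        ("impute", SimpleImputer(strategy="median")),  # fill missing feature values with the median
        ("scale", StandardScaler()),                    # standardize each feature
        ("model", regressor),
    ])


def train_and_evaluate(X, y, regressor=None, test_size=0.2, seed=42):
    """Hold out a test set, fit the pipeline on the rest and return it with its test MAE."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed
    )
    pipeline = build_pipeline(regressor).fit(X_train, y_train)
    return pipeline, mean_absolute_error(y_test, pipeline.predict(X_test))
```

For the California data, `X, y = fetch_california_housing(return_X_y=True, as_frame=True)` (which downloads the data set on first use) followed by `pipeline, mae = train_and_evaluate(X, y)` would report the MAE in the target's units, i.e. hundreds of thousands of dollars.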
MLflow:
● Experiments - different algorithms
● MLflow runs for experiments
● Tracking of artifacts, performance metrics &
hyperparameters
● Registration of models
● Prediction using registered models
Repo: https://github.com/pdanninger/dsc_dach.git
10. Recap
● Machine learning use-cases pose individual challenges
● Transparent documentation of your approach saves a lot of time
● MLflow provides simple but powerful capabilities to document and structure
your approach for reproducibility
● MLflow allows you to structure the deployment of your machine learning
models