DSC DACH 23 ╳
Go with the flow
Track your machine learning lifecycle using MLflow
Agenda
1. Introduction to Parkside Interactive
2. Why track your ML lifecycle?
3. What is MLflow?
4. Tutorial
5. Recap
We work globally and have offices in Graz (HQ), Linz, Vienna, Porto & San Francisco.
Parkside offers customized services across the entire digital product lifecycle.
● Software Development: Frontend Development / Mobile Development / Backend & API Development / Platform Development
● Research & Testing: User Research & Testing / Usability Testing / Expert Reviews / UX Strategy & Measurement / Market & Product Research
● Design & Tech Consulting: Product Thinking / Proof of Concept / Product & Business Analysis / Requirements Engineering / Discovery Workshops
● Agile Methodologies: Scrum / Kanban / SAFe / LeSS / Agile Contracting
● Product Development: Product Management & Ownership / Custom Software Development / UX Design / Proof of Concept & Prototyping / MVP
● Software & System Architecture: Microservices / Cloud Native Technologies / Scalability / Database Schema / Containerization / Client Architecture
● Support, Maintenance & Optimization: A/B Testing / Performance Analysis / Conversion Optimization / Release Management / SLA
● User Experience & Design: Accessibility & Inclusive Design / UX Writing / Information Architecture / User Interface Design / Design Systems / Digital Branding
● Quality Assurance & Testing: Strategy / Management / Automation / Execution
● DevOps: Cloud Native / Continuous Integration / Continuous Deployment / Continuous Delivery
● Data Science & AI: Use-Case Identification / Data Analysis and POC Phase / ML Model Development and Validation / Bringing Models into Production
● Cyber Security: White Box Audits / Secure Software Development Lifecycle / Threat Modelling / SAST / SCA
DATA SCIENTIST
Philipp Danninger
Based in Graz, Austria
MSc in Economics
MSc in Data & Information Science
danninger@parkside-interactive.com
www.linkedin.com/in/philipp-danninger
Why track the lifecycle of ML applications?
● Data ingress: What are ingress channels? How often do you receive new data?
● Data cleaning: What parts of the data are usable? How to store the data?
● Training set creation: Data imputations? Sample sizes? Split size? Data leakage?
● Model training & evaluation: Choice of algorithm? Hyperparameters? Cross-validation? Performance metrics?
● Model deployment & maintenance: Deployment method & workflow? What is the final model’s version? Dependencies?
Can you answer all the questions after some time has passed?
Store all components for reproducibility in one place → MLflow
What is MLflow?
● Open-source platform for the machine learning lifecycle
● 4 components
○ MLflow Tracking
○ MLflow Projects
○ MLflow Models
○ MLflow Model Registry
Some MLflow Tracking basics
● MLflow Tracking is organized around the concept of runs, which are executions of some piece of data science code
● You can optionally organize runs into experiments, which group together runs for a specific task
● Each run records: code version (git commit), start & end time, source, parameters, metrics, artifacts
● Runs can be stored in local files, in a SQLAlchemy-compatible database, or on a remote tracking server
● Once your runs have been recorded, you can query them using the Tracking UI or the MLflow API
Tutorial - What are we going to cover?
ML problem: Prediction of California house prices
Creation of ML pipeline
MLflow:
● Experiments - different algorithms
● MLflow runs for experiments
● Tracking of artifacts, performance metrics &
hyperparameters
● Registration of models
● Prediction using registered models
Repo: https://github.com/pdanninger/dsc_dach.git
Recap
● Machine learning use cases pose individual challenges
● Transparent documentation of your approach saves a lot of time
● MLflow provides simple but powerful capabilities to document and structure your approach for reproducibility
● MLflow allows you to structure the deployment of your machine learning models
parkside-interactive.com


[DSC DACH 23] Go with the flow – Track your machine learning lifecycle using MLflow - Philipp Danninger
