DevOps & MLOps -
The Same But Different?
@mattreyuk
Agenda
2
● What is machine learning
● What’s the same as regular service development
● What’s different (and can go wrong)
● Building ML teams and their place in the company
● How we do ML at Ibotta
● The future
Background
3
● I'm Matt Reynolds, a principal platform engineer on the machine learning team at Ibotta
● Ibotta is a rewarded shopping company with mobile, web, and white-label platform components
● Based here in Denver but now fully "remote-friendly"
● We're hiring - https://home.ibotta.com/work-with-us/careers/
What Is ML?
4
Machine learning (ML): A program or system that builds (trains) a predictive model from input data. The system uses the learned model to make useful predictions from new (never-before-seen) data drawn from the same distribution as the one used to train the model.
https://developers.google.com/machine-learning/glossary#machine-learning
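To make the definition concrete, here is a minimal sketch in plain Python (no ML library assumed): "training" fits a straight line to example data by ordinary least squares, and the fitted model then predicts outputs for inputs it has not seen, drawn from the same range as the training data. The synthetic data and the `train`/`predict` functions are illustrative only.

```python
import random

def train(xs, ys):
    """Fit y ≈ slope * x + intercept with ordinary least squares (one feature)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x

def predict(model, x):
    """Use the learned parameters to predict y for a new x."""
    slope, intercept = model
    return slope * x + intercept

# Synthetic training data from y = 3x + 2 plus Gaussian noise (illustrative only).
rng = random.Random(42)
train_x = [rng.uniform(0, 10) for _ in range(200)]
train_y = [3 * x + 2 + rng.gauss(0, 0.5) for x in train_x]
model = train(train_x, train_y)

# Never-before-seen inputs drawn from the same range as the training data.
for x in (1.5, 7.25):
    print(f"x={x}: predicted {predict(model, x):.2f}, true mean {3 * x + 2:.2f}")
```

If new inputs came from a different range (say x around 100), nothing in the model would warn you; that is the "same distribution" caveat in the definition.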
Types Of ML
5
● “Analytical” ML
One-off and exploratory; findings are used in reports to management
● “Engineering” ML
Models deployed to production, called by services
What’s The Same For ML?
6
● Frameworks/libraries/tools
● Git/PRs for code
● CI/CD - process automation for repeatability
● Provide a service in production
● Service monitoring*
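One way CI/CD carries over to ML is a quality gate: alongside the usual unit tests, the pipeline scores a candidate model on a held-out set and fails the build if a metric drops below a threshold. The sketch below assumes a classification model, accuracy as the metric, and a hypothetical 0.85 threshold; a real gate would load predictions produced by the candidate model rather than hard-coded lists.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the ground-truth labels."""
    if not labels or len(predictions) != len(labels):
        raise ValueError("need equal-length, non-empty predictions and labels")
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

def quality_gate(predictions, labels, min_accuracy=0.85):
    """Return (passed, score); CI should fail the build when passed is False."""
    score = accuracy(predictions, labels)
    return score >= min_accuracy, score

# Hypothetical holdout predictions from a candidate model, plus true labels.
preds = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
labels = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]

passed, score = quality_gate(preds, labels)
print(f"holdout accuracy = {score:.2f}, gate {'passed' if passed else 'FAILED'}")
if not passed:
    raise SystemExit(1)  # a non-zero exit code fails the CI job
```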
Some Companies Struggle…
7
55% of companies surveyed did not have a model in production
https://info.algorithmia.com/hubfs/2019/Whitepapers/The-State-of-Enterprise-ML-2020/Algorithmia_2020_State_of_Enterprise_ML.pdf
87% of data science projects don’t make it to production
https://venturebeat.com/2019/07/19/why-do-87-of-data-science-projects-never-make-it-into-production/
What’s Different?
8
● DATA
Data
9
https://medium.com/hackernoon/the-ai-hierarchy-of-needs-18f111fcc007
Exploratory Data Analysis (EDA)
10
New models require:
● Finding data sources that may be suitable
● Checking data quality and distribution
● Figuring out label generation
● Building initial features
● Testing with algorithm(s)
● Validating results and tuning
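Several of these EDA steps reduce to small, repeatable checks. The sketch below assumes hypothetical shopper records with a `basket_total` feature and a `redeemed_offer` field used to generate labels; it reports missing-value rates, summarizes a numeric distribution, and drops rows that cannot be labeled. In practice this is usually done in a notebook with pandas or PySpark, but the logic is the same.

```python
from statistics import mean, median

# Hypothetical raw records; None marks a missing value.
rows = [
    {"user_id": 1, "basket_total": 42.5, "redeemed_offer": True},
    {"user_id": 2, "basket_total": None, "redeemed_offer": False},
    {"user_id": 3, "basket_total": 18.0, "redeemed_offer": False},
    {"user_id": 4, "basket_total": 97.2, "redeemed_offer": True},
    {"user_id": 5, "basket_total": 5.4, "redeemed_offer": None},
]

def missing_rate(rows, column):
    """Share of rows where `column` is missing."""
    return sum(r[column] is None for r in rows) / len(rows)

def summarize(rows, column):
    """Basic distribution summary of the non-missing values of a numeric column."""
    values = [r[column] for r in rows if r[column] is not None]
    return {"count": len(values), "min": min(values), "max": max(values),
            "mean": mean(values), "median": median(values)}

def make_label(row):
    """Hypothetical label rule: 1 if the offer was redeemed, 0 if not, None if unknown."""
    if row["redeemed_offer"] is None:
        return None
    return int(row["redeemed_offer"])

for column in ("basket_total", "redeemed_offer"):
    print(f"{column}: {missing_rate(rows, column):.0%} missing")
print("basket_total:", summarize(rows, "basket_total"))

# Keep only rows we can label instead of guessing a label.
labels = [make_label(r) for r in rows]
labeled = [(r, y) for r, y in zip(rows, labels) if y is not None]
print(f"{len(labeled)} labeled rows, positive rate {mean(y for _, y in labeled):.2f}")
```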
What’s Different?
11
● DATA
● People
People
12
“Data Scientists” have different skill sets:
● Have their own jargon
● May not be used to writing “production ready” code
● May not be used to being on-call, production support
● Mostly work in Python
What’s Different?
13
● DATA
● People
● Different tools
Different Tools
14
As well as the tooling to run a “regular” service, you also need:
● Data pipeline
● Feature engineering
● Feature store
● Training & hyperparameter tuning infrastructure
● Maybe specialized inference hardware (GPU)
● Inference monitoring (data drift)
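Inference monitoring for data drift often compares the distribution of a feature at training time with what the model sees in production. One common metric is the Population Stability Index (PSI), sketched here with bins cut at the training data's deciles. The sample data is synthetic, and the usual rule of thumb (below roughly 0.1 is stable, above roughly 0.25 is a significant shift) is a convention rather than a standard; thresholds should be tuned per feature.

```python
import math
import random
from bisect import bisect_right
from statistics import quantiles

def bin_fractions(values, edges):
    """Fraction of values in each bin; `edges` are sorted interior cut points."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    return [c / len(values) for c in counts]

def psi(expected, actual, edges, eps=1e-6):
    """Population Stability Index of `actual` (live) against `expected` (training)."""
    total = 0.0
    for e, a in zip(bin_fractions(expected, edges), bin_fractions(actual, edges)):
        e, a = max(e, eps), max(a, eps)  # guard against log(0) for empty bins
        total += (a - e) * math.log(a / e)
    return total

rng = random.Random(0)
train = [rng.gauss(0.0, 1.0) for _ in range(5000)]
live_same = [rng.gauss(0.0, 1.0) for _ in range(5000)]
live_shifted = [rng.gauss(0.5, 1.0) for _ in range(5000)]  # mean moved by half a std dev

edges = quantiles(train, n=10)  # 9 cut points -> 10 bins of roughly equal training mass
print(f"PSI, same distribution:    {psi(train, live_same, edges):.3f}")
print(f"PSI, shifted distribution: {psi(train, live_shifted, edges):.3f}")
```

Note that a service can be perfectly "healthy" by latency and error-rate monitoring while the shifted feature quietly degrades its predictions, which is why this check sits alongside regular service monitoring rather than replacing it.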
Jupyter Notebooks
15
https://jupyter.org/try-jupyter/retro/notebooks/?path=notebooks/Intro.ipynb
Jupyter Notebooks
16
https://jupyter.org/try-jupyter/retro/notebooks/?path=notebooks/Intro.ipynb
{
  "cell_type": "code",
  "source": "from matplotlib import pyplot as plt\nimport numpy as np\n\n# Generate 100 random data points along 3 dimensions\nx, y, scale = np.random.randn(3, 100)\nfig, ax = plt.subplots()\n\n# Map each onto a scatterplot we'll create with Matplotlib\nax.scatter(x=x, y=y, c=scale, s=np.abs(scale)*500)\nax.set(title=\"Some random data, created with JupyterLab!\")\nplt.show()",
  "metadata": {
    "trusted": true
  },
  "execution_count": 1,
  "outputs": [
    {
      "output_type": "display_data",
      "data": {
        "image/png": "iVBORw0KGgoAAAANSUhEUgAAAoAAAAHgCAYAAAA10dzkAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMywgaHR…
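A notebook file is just JSON: a list of cells, each with `cell_type` and `source`, plus `outputs` for code cells, such as the base64-encoded PNG above. That is convenient for tooling but noisy for Git/PR review. Below is a minimal standard-library sketch that extracts code cells and clears outputs before committing (the same idea tools like nbstripout automate). The example notebook is hypothetical and its image data is a placeholder string, not a real image.

```python
import json

def code_cells(notebook_text):
    """Yield (execution_count, source) for every code cell in a notebook's JSON text."""
    nb = json.loads(notebook_text)
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # The notebook format allows `source` to be one string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        yield cell.get("execution_count"), source

def strip_outputs(notebook_text):
    """Return the notebook JSON with code-cell outputs and execution counts cleared."""
    nb = json.loads(notebook_text)
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return json.dumps(nb, indent=1)

# Hypothetical two-cell notebook; "iVBORw0KGgo..." is a placeholder, not a real image.
example = json.dumps({
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Intro\n"]},
        {"cell_type": "code", "execution_count": 1, "metadata": {"trusted": True},
         "source": ["import numpy as np\n", "x = np.random.randn(3, 100)"],
         "outputs": [{"output_type": "display_data", "metadata": {},
                      "data": {"image/png": "iVBORw0KGgo..."}}]},
    ],
    "metadata": {}, "nbformat": 4, "nbformat_minor": 5,
})

for count, source in code_cells(example):
    print(f"[{count}]", source)
print(json.loads(strip_outputs(example))["cells"][1]["outputs"])
```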
Ideal Team Composition
17
● Fighter (Software Engineering)
● Cleric (Data Engineering)
● Wizard (Data Science)
● Rogue (Ops/Infrastructure)
How Can You Help?
18
● Take some time to learn the lay of the land
● Look for pain points - local dev, process automation
● Make suggestions, listen to feedback
● Jump in and learn the ropes
● Work from the more “engineering” side toward the more “ML” side
● Teach what you do and learn what they do
● Encourage collaboration, standardization
● Explain why
ML In The Larger Organization
19
● Need to work with Data/Analytics & Engineering orgs
● Involve product
● Advocate for big-picture concerns like:
  ○ Data catalog, more metadata
  ○ Data quality
  ○ More (timely) data - events from engineering services
Our Process - Data & Training
20
● Airflow for job orchestration
● PySpark for data transformation
● SageMaker for managing training and hyperparameter tuning jobs
● Local dev with Docker for Airflow DAGs
● Jupyter notebooks for EDA and troubleshooting
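Airflow models a pipeline as a DAG of tasks and runs each task once its upstream dependencies have finished. The sketch below is not Airflow code; it uses Python's standard-library `graphlib` (3.9+) to show the same dependency ordering for a hypothetical pipeline whose task names are invented for illustration. Tasks returned together by `get_ready()` do not depend on each other, which is where a real scheduler would run them in parallel.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
pipeline = {
    "extract_events": set(),
    "build_features": {"extract_events"},                # e.g. a PySpark transformation
    "build_labels": {"extract_events"},
    "train_model": {"build_features", "build_labels"},   # e.g. a managed training job
    "tune_hyperparameters": {"train_model"},
    "evaluate": {"tune_hyperparameters"},
}

def run(pipeline, run_task):
    """Run tasks in dependency order, one batch of independent tasks at a time."""
    sorter = TopologicalSorter(pipeline)
    sorter.prepare()
    order = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every dependency of these tasks is done
        for task in ready:
            run_task(task)
            order.append(task)
            sorter.done(task)
    return order

order = run(pipeline, lambda task: print("running", task))
```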
Our Process - Inference
21
● SageMaker endpoints using Docker images built on top of AWS-supplied base images
● Postgres DB for storing real-time features
● All behind API Gateway for a consistent API
● Lambda for A/B testing and model aggregation
● Local dev with Docker for inference testing, along with Jupyter notebooks; integration testing in a staging environment
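The Lambda layer in front of the endpoints does two small jobs: deterministically splitting traffic for A/B tests and combining scores from more than one model. Here is a minimal sketch, assuming hypothetical model names, a hypothetical 70/30 blend, and local stub functions standing in for the endpoint calls (no AWS API is used). Hashing the user ID, rather than picking randomly per request, keeps each user in the same arm across requests.

```python
import hashlib

def assign_variant(user_id, variants, salt="hypothetical-ab-test"):
    """Deterministically map a user to one variant by hashing a salted user ID."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    return variants[int(digest, 16) % len(variants)]

def aggregate(scores, weights):
    """Weighted average of several models' scores for the same request."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)

# Hypothetical stand-ins for calls to two deployed model endpoints.
MODELS = {
    "model_a": lambda features: 0.30,
    "model_b": lambda features: 0.60,
}

def handle(request):
    """Route one request: control uses model_a alone, treatment blends both models."""
    variant = assign_variant(request["user_id"], ["control", "treatment"])
    features = request["features"]
    if variant == "control":
        score = MODELS["model_a"](features)
    else:
        scores = [MODELS["model_a"](features), MODELS["model_b"](features)]
        score = aggregate(scores, weights=[0.7, 0.3])
    return {"variant": variant, "score": score}

print(handle({"user_id": "user-123", "features": {"basket_total": 42.5}}))
```

Changing the salt reshuffles users into fresh buckets, which is one simple way to start a new experiment without carrying over the previous split.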
The Future
22
● How to scale
● Quality monitoring
● More real-time feature generation
● Serverless inference
23
Thank you!
@mattreyuk


DevOps Days Rockies MLOps