This document discusses DevOps and MLOps practices for machine learning models. It outlines that while ML development shares some similarities with traditional software development, such as using version control and CI/CD pipelines, there are also key differences related to data, tools, and people. Specifically, ML requires additional focus on exploratory data analysis, feature engineering, and specialized infrastructure for training and deploying models. The document provides an overview of how one company structures its ML team and processes.
2. Agenda
● What is machine learning
● What’s the same as regular service development
● What’s different (and can go wrong)
● Building ML teams and their place in the company
● How we do ML at Ibotta
● The future
3. Background
● I'm Matt Reynolds, a principal platform engineer on the
machine learning team at Ibotta
● Ibotta is a rewarded-shopping company with mobile, web,
and white-label platform components
● Based here in Denver but now fully "remote-friendly"
● We're hiring - https://home.ibotta.com/work-with-us/careers/
4. What Is ML?
Machine learning (ML) A program or system that builds
(trains) a predictive model from input data. The system uses
the learned model to make useful predictions from new
(never-before-seen) data drawn from the same distribution
as the one used to train the model.
https://developers.google.com/machine-learning/glossary#machine-learning
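The definition above can be illustrated with a minimal sketch (plain Python, no ML framework): learn a model from training data, then use it to predict on new data from the same distribution.

```python
# Minimal illustration of "train a model, then predict on new data":
# a one-variable least-squares linear fit, no ML framework required.

def train(xs, ys):
    """Learn slope and intercept minimizing squared error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(model, x):
    """Apply the learned model to never-before-seen input."""
    slope, intercept = model
    return slope * x + intercept

# Training data generated by y = 2x + 1
model = train([1, 2, 3, 4], [3, 5, 7, 9])
print(predict(model, 10))  # → 21.0
```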
5. Types Of ML
● “Analytical” ML
One-off, exploratory; findings used in reports to
management
● “Engineering” ML
Models deployed to production, called by services
6. What’s The Same For ML?
● Frameworks/libraries/tools
● Git/PRs for code
● CI/CD - process automation for repeatability
● Provide a service in production
● Service monitoring*
7. Some Companies Struggle…
55% of companies surveyed did not have a model in production
https://info.algorithmia.com/hubfs/2019/Whitepapers/The-State-of-Enterprise-ML-2020/Algorithmia_2020_State_of_Enterprise_ML.pdf
87% of data science projects don’t make it to production
https://venturebeat.com/2019/07/19/why-do-87-of-data-science-projects-never-make-it-into-production/
10. Exploratory Data Analysis (EDA)
New models require:
● Finding data sources that may be suitable
● Checking data quality and distribution
● Figuring out label generation
● Building initial features
● Testing with algorithm(s)
● Validating results and tuning
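A hypothetical first pass at a few of these EDA steps in plain Python (the records and field names are invented for illustration): check for missing values, summarize a distribution, derive a label, and build an initial feature.

```python
# Hypothetical EDA first pass on raw records (field names invented):
# data quality check, distribution summary, label generation,
# and an initial feature.
from statistics import mean, stdev

records = [
    {"user_id": 1, "purchases": 12, "last_active_days": 3},
    {"user_id": 2, "purchases": 0,  "last_active_days": 45},
    {"user_id": 3, "purchases": None, "last_active_days": 10},
]

# Data quality: how many records are missing the field we care about?
missing = sum(1 for r in records if r["purchases"] is None)
print(f"missing purchases: {missing}/{len(records)}")

# Distribution check on the clean subset
clean = [r for r in records if r["purchases"] is not None]
vals = [r["purchases"] for r in clean]
print(f"mean={mean(vals):.1f} stdev={stdev(vals):.1f}")

# Label generation: define "churned" as inactive for 30+ days
for r in clean:
    r["label_churned"] = r["last_active_days"] >= 30

# Initial feature: purchases per active day (guard against div-by-zero)
for r in clean:
    r["purchases_per_day"] = r["purchases"] / max(r["last_active_days"], 1)
```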
12. People
“Data Scientists” have different skill sets:
● Have their own jargon
● May not be used to writing “production ready” code
● May not be used to being on-call, production support
● Mostly work in Python
14. Different Tools
As well as the tooling to run a “regular” service, you also need:
● Data pipeline
● Feature engineering
● Feature store
● Training & hyperparameter tuning infrastructure
● Maybe specialized inference hardware (GPU)
● Inference monitoring (data drift)
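Inference monitoring for data drift can be sketched simply (the threshold and statistics here are illustrative, not a specific tool's method): compare a live window of a feature against its training-time statistics and flag large shifts.

```python
# Sketch of inference-time drift monitoring (threshold is illustrative):
# compare a live window of a feature against the statistics captured
# at training time and flag when the live mean has shifted too far.
from statistics import mean

def drifted(train_mean, train_std, live_values, k=3.0):
    """Flag drift when the live mean is > k training stdevs away."""
    if not live_values:
        return False
    return abs(mean(live_values) - train_mean) > k * train_std

# Training-time stats captured when the model was built
TRAIN_MEAN, TRAIN_STD = 10.0, 2.0

print(drifted(TRAIN_MEAN, TRAIN_STD, [9.5, 10.2, 10.8]))   # → False
print(drifted(TRAIN_MEAN, TRAIN_STD, [19.0, 21.5, 20.3]))  # → True
```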
18. How Can You Help?
● Take some time to learn the lay of the land
● Look for pain points - local dev, process automation
● Make suggestions, listen to feedback
● Jump in and learn the ropes
● Work from the more “engineering” tasks toward the more “ML” ones
● Teach what you do and learn what they do
● Encourage collaboration, standardization
● Explain why
19. ML In The Larger Organization
● Need to work with Data/Analytics & Engineering orgs
● Involve product
● Advocate for big-picture concerns like:
● Data catalog, more metadata
● Data quality
● More (timely) data - events from engineering services
20. Our Process - Data & Training
● Airflow for job orchestration
● PySpark for data transformation
● SageMaker for managing training and hyperparameter tuning jobs
● Local dev with Docker for Airflow DAGs
● Jupyter notebooks for EDA and troubleshooting
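In an Airflow DAG this pipeline would be declared with operators and the `>>` dependency chain; as a framework-free sketch, the same extract → transform → train ordering can be expressed as a task graph run in dependency order (task names and bodies are invented for illustration).

```python
# Framework-free sketch of the extract -> transform -> train ordering
# that an Airflow DAG would declare (task names are illustrative).

def extract():
    return [1, 2, 3]

def transform(rows):
    return [r * 2 for r in rows]

def train(rows):
    return {"model": "v1", "n_samples": len(rows)}

# Dependencies: each task lists what must finish before it runs,
# analogous to chaining Airflow tasks with the `>>` operator.
deps = {"extract": [], "transform": ["extract"], "train": ["transform"]}
tasks = {"extract": extract, "transform": transform, "train": train}

def run(deps, tasks):
    """Run tasks in dependency order, piping each output forward."""
    done, result = set(), None
    while len(done) < len(deps):
        for name, upstream in deps.items():
            if name not in done and all(u in done for u in upstream):
                result = tasks[name]() if result is None else tasks[name](result)
                done.add(name)
    return result

print(run(deps, tasks))  # → {'model': 'v1', 'n_samples': 3}
```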
21. Our Process - Inference
● SageMaker endpoints using Docker images built on top of
AWS-supplied base images
● Postgres DB for storing real-time features
● All behind API Gateway for a consistent API
● Lambda for A/B tests and model aggregation
● Local dev with Docker for inference testing, along with Jupyter
notebooks; integration testing in the staging environment
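The A/B-test routing that a Lambda in front of two model endpoints might do can be sketched like this (the endpoint names and the 90/10 split are invented): hash the user id so each user is routed to the same model variant consistently.

```python
# Sketch of Lambda-style A/B routing between two model endpoints
# (endpoint names and split percentage are invented for illustration).
import hashlib

ENDPOINTS = {"control": "model-v1-endpoint", "treatment": "model-v2-endpoint"}

def route(user_id, treatment_pct=10):
    """Deterministically assign a user to the control or treatment model."""
    digest = hashlib.sha256(str(user_id).encode()).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in [0, 100)
    arm = "treatment" if bucket < treatment_pct else "control"
    return ENDPOINTS[arm]

# The same user always hits the same endpoint
assert route("user-42") == route("user-42")
print(route("user-42"))
```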
22. The Future
● How to scale
● Quality monitoring
● More real-time feature generation
● Serverless inference