Deploying machine learning models
in the enterprise
Strata Data Conference NYC
Diego Oppenheimer, CEO
diego@algorithmia.com
About Me
Diego Oppenheimer - Founder and CEO - Algorithmia
● Product developer, entrepreneur, extensive background in all things data.
● Microsoft: PowerPivot, PowerBI, Excel and SQL Server.
● Founder of an algorithmic trading startup
● BS/MS Carnegie Mellon University
Make state-of-the-art algorithms
discoverable and accessible
to everyone.
Algorithmia.com
AI/ML scalable infrastructure on demand + marketplace
● Function-as-a-service for Machine & Deep Learning
● Discoverable, live inventory of AI
● Monetizable
● Composable
● Every developer on earth can make their app intelligent
“There’s an algorithm for that!”
77K developers | 6.4K algorithms, models, and functions
What does production mean for us?
● ~6,400 algorithms, models, and functions (~50K counting different versions)
● Each model: 1 to 1,000 calls a second, fluctuating, with no DevOps required
● ~15 ms overhead latency
● Accessible in any of 14 languages (through SDKs)
● Any runtime, any architecture
ALGORITHMIA ENTERPRISE
Algorithmia Enterprise is an organization’s internal inventory of intelligence and an algorithm-as-a-service platform.
Deploy: Write your function or model in any programming language, framework, or infrastructure.
Scale: Expose your model as a highly reliable, versioned REST API that automatically scales from one to hundreds of requests per second.
Discover: Name and describe your model, making it available in a central catalog where your peers can easily discover and reuse it.
Monitor: House thousands of models under one roof with a uniform REST interface and a single cluster-monitoring dashboard.
MACHINE LEARNING
!=
PRODUCTION MACHINE LEARNING
What we will cover
● Challenges of deploying models in the enterprise
● Characteristics of AI and technologies
● Varying languages
● Standardize versioning
● Continuous model deployment
● Managing your portfolio of ML models
● Analytics on how and where models are being used
● Maintaining auditability
● Best practices for your organization
Challenges of deploying models in the enterprise
● Machine learning
○ CPU/GPU/Specialized hardware
○ Multiple frameworks, languages,
dependencies
○ Called from different
devices/architectures
● “Snowflake” environments
○ Unique cloud hardware and services
● Security and Audit
○ Stringent security and access controls
○ “Who called what when” for audit and
compliance
● Uncharted territory
○ Not a lot of literature
○ Deployment for data science teams is a
new problem
○ Many teams have not bought software
or dealt with their own infrastructure
teams.
○ Chargebacks and billing
Characteristics of AI/ML
• Two distinct phases: training and inference
• Lots of processing power
• Heterogeneous hardware (CPUs, GPUs, TPUs, etc.)
• Limited by compute rather than bandwidth
• “TensorFlow is open source, scaling it is not.” - Kenny Daniel
Technologies
Infrastructure options: bare metal or VMs, containers, Kubernetes.

INFERENCE
Short compute bursts
Elastic
Stateless
Multiple users
OWNER: DevOps

TRAINING
Long compute cycles
Fixed load (inelastic)
Stateful
Single user
OWNER: Data scientists
Two system design paradigms that work well

MICROSERVICES: the design of a system as independently deployable, loosely coupled services.
ADVANTAGES
• Maintainability
• Scalability
• Rolling deployments
• Elastic
• Software/hardware agnostic

SERVERLESS: the encapsulation, starting, and stopping of singular functions per request, with a just-in-time compute model (a minimal handler sketch follows below).
ADVANTAGES
• Cost / efficiency
• Built-in concurrency
• Speed of development
• Improved latency
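To make the serverless pattern concrete, here is a minimal sketch of a per-request handler in Python. The `load_model` and `apply` names and the event shape are illustrative assumptions rather than any specific platform's API; most function-as-a-service runtimes follow this shape: load once at import time, then handle one request per invocation.

```python
import json

def load_model():
    # Hypothetical loader; a real handler might use joblib.load(...) or a
    # framework-specific deserializer. Runs once at import time, so warm
    # invocations skip this cost (only cold starts pay it).
    return lambda features: sum(features)  # stand-in for a real model

MODEL = load_model()

def apply(event):
    """Entry point the serverless runtime invokes once per request.

    `event` is assumed to be a JSON string or dict with a "features" key;
    the exact shape depends on the platform.
    """
    payload = json.loads(event) if isinstance(event, str) else event
    prediction = MODEL(payload["features"])
    # Stateless by design: nothing persists between invocations.
    return {"prediction": prediction}
```

Loading the model at module scope rather than inside `apply` is what keeps warm invocations fast and is the main latency lever in this pattern.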
Think of your technology stack as an OS for running AI in your enterprise:

Kernel
● Runtime abstraction: support any programming language or framework, including interoperability between mixed stacks.
● Elastic scale: prioritize and automatically optimize execution of concurrent short-lived jobs.
● Cloud abstraction: provide portability to algorithms, across public or private clouds.

Shell & Services
● Discoverability, authentication, instrumentation, etc.
Multiple frameworks, languages, dependencies
● Models are rarely developed in the language they will be consumed in.
● In large enterprises, there is rarely a standard language for software development.
● The goal should be to make models consumable by any part of the organization, on any platform:
○ to ensure the maximum value is extracted from the model
○ to ensure model re-use
○ to ensure the fastest time from lab to production
● APIs and well-developed SDKs are your best friends (see the consumption sketch below).
○ APIs allow for easy testing of whether a model is adequate.
○ APIs allow for easy consumption.
○ Well-developed APIs have built-in versioning, which takes us to the next step.
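As a sketch of what API-based consumption can look like, the snippet below calls a hypothetical model endpoint over REST from Python; the host, route, payload shape, and response fields are assumptions for illustration. Any language with an HTTP client can make the same call, which is exactly the point.

```python
import requests

# Hypothetical endpoint and key; a real deployment publishes its own scheme.
ENDPOINT = "https://models.example.com/api/fraud-detector"
API_KEY = "YOUR_API_KEY"

response = requests.post(
    ENDPOINT,
    json={"transaction_amount": 129.99, "country": "US"},  # example features
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=5,
)
response.raise_for_status()
print(response.json())  # e.g. {"fraud_probability": 0.02}
```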
Standardize versioning
● Versioning is an extremely important part of deploying models in the enterprise.
● Start thinking about models as you would any other piece of modular software:
○ Models must be versioned.
○ Versions must be tracked.
○ Older versions should be accessible (for rollbacks, acceptance testing, etc.).
For the data scientist:
● The ability to compare two different versions of a model is key:
○ not only at training and verification time, but also to understand performance and SLA changes.
● Detecting model drift.
For the application developer:
● The ability to match acceptance testing to predetermined development cycles.
● The ability to stay behind a version.
● Avoiding performance regressions even if the newer model’s accuracy is better.
A sketch of version pinning from both perspectives follows below.
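The sketch below assumes a URL scheme that encodes the semantic version (the scheme itself is an assumption; the principle is what matters): the application developer stays behind a known-good version, while the data scientist compares two versions' outputs and latency side by side.

```python
import time
import requests

BASE = "https://models.example.com/api/churn-model"  # hypothetical
FEATURES = {"tenure_months": 18, "monthly_spend": 42.0}

def call(version: str):
    """Call a pinned version and measure wall-clock latency."""
    start = time.perf_counter()
    response = requests.post(f"{BASE}/{version}", json=FEATURES, timeout=5)
    response.raise_for_status()
    return response.json(), time.perf_counter() - start

# Application developer: stay behind the version that passed acceptance tests.
stable_result, _ = call("1.2.0")

# Data scientist: compare output and SLA characteristics of old vs. new.
for version in ("1.2.0", "1.3.0"):
    result, latency = call(version)
    print(version, result, f"{latency * 1000:.1f} ms")
```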
Standardize versioning
● Even better, make your system auto-version.
● Borrow the best practices from the Software
Development world.
● Rolling, non-interruptive deployments
Standardize documentation
● Just like any API, models are only as good as their documentation.
● Make the documentation travel with the model and be updated alongside it, so it’s always up to date:
○ directly as Markdown inside the Git repository that contains the model, or
○ as an artifact that travels with the model (see the sketch below).
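One lightweight way to make documentation travel with the model, sketched below under assumed names and fields: write a Markdown model card into the same directory (and therefore the same Git commit or artifact) as the serialized model, so the two cannot drift apart.

```python
from pathlib import Path
import pickle

def save_model_with_docs(model, version: str, out_dir: str = "model_artifact"):
    """Persist a model and its Markdown model card side by side."""
    path = Path(out_dir)
    path.mkdir(exist_ok=True)
    with open(path / "model.pkl", "wb") as f:
        pickle.dump(model, f)
    # The card lives next to the weights, so any git push or artifact copy
    # moves both together, keeping the docs up to date with the model.
    (path / "README.md").write_text(
        f"# churn-model v{version}\n\n"
        "## Input\nJSON object: `tenure_months` (int), `monthly_spend` (float)\n\n"
        "## Output\nJSON object: `churn_probability` (float)\n"
    )
```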
Continuous deployment
● Just like with standardizing versioning, Continuous
Integration and Continuous Deployment for AI/ML
should borrow from best practices of software
development.
● The fastest path is usually the best (git push -> deploy).
○ Git + Docker + API generation makes this really easy (a minimal service sketch follows after this list).
● Don’t forget dependency management!
● Some interesting use cases:
○ Continuous training and deployment
○ Human in the loop training and deployment
○ Bespoke training and deployment to central
platform
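As one illustration of the git push -> deploy path: a small service file like the sketch below, together with a pinned requirements.txt and a Dockerfile, is often all a push-triggered pipeline needs to build an image and roll it out. FastAPI and the model logic here are stand-ins chosen for the sketch, not any particular platform's mechanism.

```python
# app.py - built into a Docker image and redeployed on every `git push`.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Features(BaseModel):
    tenure_months: int
    monthly_spend: float

def predict(features: Features) -> float:
    # Placeholder for a real model loaded at startup (e.g. via pickle/joblib).
    return min(1.0, features.monthly_spend / 100.0)

@app.post("/predict")
def predict_endpoint(features: Features):
    return {"churn_probability": predict(features)}

# Dependency management: pin everything the image needs in requirements.txt
# (e.g. fastapi==0.103.0, uvicorn==0.23.0) so builds are reproducible.
```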
Continuous deployment - Human in the Loop
Continuous deployment - Single train platform to Deployment
Continuous deployment - Multiple training frameworks to deployment
Managing a model portfolio
“We have 14 versions of ImageMagick running as services for image resizing before feeding into a number of different models.”
- Dev manager, analytics platform, F500 media company
● Similar to an API strategy: as new models become available, you need to start caring about how to find them, who can use them, and how to bill for them.
● Borrowing from the concepts of an API gateway and API registry, the same paradigms work for model management and distribution.
● A common, centralized registry will offer the ability to find what has already been created and potentially re-use it (a minimal sketch follows below).
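A minimal sketch of the registry idea, with assumed field names: each entry records who owns a model, where to call it, and tags that make it findable, so a team can discover existing work (like that image-resizing service) instead of building version fifteen.

```python
from dataclasses import dataclass, field

@dataclass
class ModelEntry:
    name: str
    version: str
    owner_team: str
    endpoint: str               # where the deployed model can be called
    tags: set[str] = field(default_factory=set)

class ModelRegistry:
    """In-memory stand-in for a central, organization-wide registry."""

    def __init__(self):
        self._entries: list[ModelEntry] = []

    def register(self, entry: ModelEntry) -> None:
        self._entries.append(entry)

    def find(self, tag: str) -> list[ModelEntry]:
        # Discover what already exists before building something new.
        return [e for e in self._entries if tag in e.tags]

registry = ModelRegistry()
registry.register(ModelEntry(
    name="image-resize", version="2.1.0", owner_team="platform",
    endpoint="https://models.example.com/api/image-resize/2.1.0",
    tags={"preprocessing", "images"},
))
print(registry.find("preprocessing"))  # re-use it instead of rebuilding
```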
Managing a model portfolio
● A centralized repository/registry of models that can be accessed across the organization.
● Encourage re-use (many preprocessing and post-processing functions should only be built once).
● Finding existing models is key for experimenting with different pipelines.
Managing a model portfolio
● Centralizing models allows for understanding business impact and usage across the organization.
● C-level understanding of whether AI/ML investments are working.
● Finding existing models is key for experimenting with different pipelines and for rapid application development by disparate teams or external developers.
● Security and access controls ensure that only the right people in the organization can access the models:
○ who can view how the model works vs. who can call it.
Model Analytics
During the training phase, analytics such as accuracy, drift, and error rates are very important. When deploying models inside an enterprise, a different slice of analytics is required.
What is important during deployment and production:
● Latency
● Resources used (CPU/GPU, I/O)
● System capacity
● Scale-up and scale-down behavior
● Authentication
● API timing metrics and call counts
● Error rates
But also…
● Which teams are using the models
● Which applications are using them
(A sketch of this serving-side instrumentation follows below.)
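The sketch assumes a decorator-based approach and an in-memory metrics sink: wrap the inference call to record latency, errors, and which team and application made the request, then forward those records to whatever metrics backend you actually use.

```python
import functools
import time

METRICS: list[dict] = []  # stand-in for a real metrics backend

def instrumented(model_name: str):
    """Record latency, errors, and caller identity for every inference."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(payload, *, team: str, application: str):
            start = time.perf_counter()
            ok = True
            try:
                return fn(payload)
            except Exception:
                ok = False
                raise
            finally:
                METRICS.append({
                    "model": model_name,
                    "latency_ms": (time.perf_counter() - start) * 1000,
                    "ok": ok,
                    "team": team,                # which team is calling
                    "application": application,  # which app is calling
                })
        return wrapper
    return decorator

@instrumented("churn-model")
def predict(payload):
    return {"churn_probability": 0.1}  # placeholder model

predict({"tenure_months": 18}, team="growth", application="crm-dashboard")
```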
Model Auditability and Compliance
Your enterprise deployment system should be able to answer:
“Who called what model, when, and with what data?”
Why this is important:
● Compliance:
○ Regulated industries need to provide this information to government regulators:
■ Financial services
■ Life sciences
■ Federal government
● C-level understanding of whether AI/ML investments are working.
● Debugging production systems.
● Billing:
○ A complete understanding of who is using which models allows for chargebacks (a sketch of an audit record follows below).
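A sketch of an append-only audit record answering “who called what model, when, and with what data”, with assumed field names. Hashing the payload rather than storing it raw is one way to prove which data was sent without retaining sensitive inputs; regulated industries may instead need the full payload kept under access controls.

```python
import hashlib
import json
from datetime import datetime, timezone

AUDIT_LOG = "audit.jsonl"  # append-only; JSON Lines is easy to query later

def audit(caller: str, model: str, version: str, payload: dict) -> None:
    """Append one immutable record per model invocation."""
    record = {
        "who": caller,                                   # API key / user id
        "what": f"{model}:{version}",                    # model and version
        "when": datetime.now(timezone.utc).isoformat(),  # UTC timestamp
        "data_sha256": hashlib.sha256(                   # fingerprint of input
            json.dumps(payload, sort_keys=True).encode()
        ).hexdigest(),
    }
    with open(AUDIT_LOG, "a") as f:
        f.write(json.dumps(record) + "\n")

audit("team-growth/api-key-17", "churn-model", "1.2.0",
      {"tenure_months": 18, "monthly_spend": 42.0})
```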
Best practices and conclusion
"Expecting your engineering and DevOps teams to deploy ML models well is like showing up to
Seaworld with a giraffe since they are already handling large mammals.”
● Technology:
○ Borrow from the best practices of software development and deployment of
applications and code at scale.
■ CI/CD, Versioning, API design, etc.
○ The most advanced AI/ML companies in the world are centralizing their deployment
and serving platforms under one roof. This is because the influence of data science
and ML teams across your organization will only grow.
○ Understand that training and serving have very different profiles and different
technology choices will need to be made.
○ Seriously consider using microservices and serverless as design patterns for re-use,
scale and modularity.
Best practices and conclusion
● Organization:
○ Production and serving will usually be owned by DevOps or Enterprise Architecture. Success in deploying in the enterprise will be dictated by understanding the roles and responsibilities of these teams.
○ Data science teams tend to be new, with limited experience in:
■ Purchasing enterprise software
■ Requirements and considerations for production environments
■ IT requirements around Information Security, Compliance and Support.
It’s crucial for success to educate and guide these teams through the enterprise
requirements.
Best practices and conclusion
● Future proofing:
○ Think re-use: many models will be interesting and usable to multiple parts of the
enterprise. Discoverability and accessibility become key.
○ It is safe to assume that your number of models will only grow over time. How you will manage them in the future requires having a conversation early in the process.
○ Your AI/ML model portfolio is an extremely important asset: understanding the value you are getting from it is important, and its usage and influence should be measured and tracked.
○ When deciding whether to build or buy, consider your team’s capacity to adapt and keep moving at the pace of the AI/ML industry over time.
MACHINE LEARNING
!=
PRODUCTION MACHINE LEARNING
Diego Oppenheimer
CEO
Thank you!
diego@algorithmia.com
@doppenhe