Deploying machine learning models
in the enterprise
Strata Data Conference NYC
Diego Oppenheimer, CEO
diego@algorithmia.com
About Me
Diego Oppenheimer - Founder and CEO - Algorithmia
● Product developer, entrepreneur, extensive background in all things data.
● Microsoft: PowerPivot, PowerBI, Excel and SQL Server.
● Founder of an algorithmic trading startup
● BS/MS Carnegie Mellon University
Make state-of-the-art algorithms
discoverable and accessible
to everyone.
Algorithmia.com
AI/ML scalable infrastructure on demand + marketplace
● Function-as-a-service for Machine & Deep Learning
● Discoverable, live inventory of AI
● Monetizable
● Composable
● Every developer on earth can make their app intelligent
“There’s an algorithm for that!”
77K developers | 6.4K algorithms, models, and functions
What does production mean for us?
● ~6,400 algorithms, models, and functions (~50K counting different versions)
● Each model: 1 to 1,000 calls a second, fluctuating, with no DevOps required
● ~15 ms overhead latency
● Accessible in any of 14 languages (through SDKs)
● Any runtime, any architecture
ALGORITHMIA ENTERPRISE
Algorithmia Enterprise is an organization’s internal inventory of intelligence and an algorithm-as-a-service platform.
Deploy: Write your function or model in any programming language, framework, or infrastructure.
Scale: Expose your model as a highly reliable, versioned REST API that automatically scales from one to hundreds of requests per second.
Discover: Name and describe your model, making it available in a central catalog where your peers can easily discover and reuse it.
Monitor: House thousands of models under one roof with a uniform REST interface and a single cluster-monitoring dashboard.
MACHINE LEARNING
!=
PRODUCTION MACHINE LEARNING
What we will cover
● Challenges of deploying models in the enterprise
● Characteristics of AI and technologies
● Varying languages
● Standardize versioning
● Continuous model deployment
● Managing your portfolio of ML models
● Analytics on how and where models are being used
● Maintaining auditability
● Best practices for your organization
Challenges of deploying models in the enterprise
● Machine learning
○ CPU/GPU/Specialized hardware
○ Multiple frameworks, languages,
dependencies
○ Called from different
devices/architectures
● “Snowflake” environments
○ Unique cloud hardware and services
● Security and Audit
○ Stringent security and access controls
○ “Who called what when” for audit and
compliance
● Uncharted territory
○ Not a lot of literature
○ Deployment for data science teams is a
new problem
○ Many teams have not bought software
or dealt with their own infrastructure
teams.
○ Chargebacks and billing
Characteristics of AI/ML
• Two distinct phases: training and inference
• Lots of processing power
• Heterogeneous hardware (CPUs, GPUs, TPUs, etc.)
• Limited by compute rather than bandwidth
• “TensorFlow is open source, scaling it is not.” - Kenny Daniel
Technologies
Infrastructure options: bare metal or VMs, containers, Kubernetes.

INFERENCE
Short compute bursts
Elastic
Stateless
Multiple users
OWNER: DevOps

TRAINING
Long compute cycles
Fixed load (inelastic)
Stateful
Single user
OWNER: Data scientists
Two system design paradigms that work well

MICROSERVICES: the design of a system as independently deployable, loosely coupled services.
ADVANTAGES
• Maintainability
• Scalability
• Rolling deployments
• Elastic
• Software/hardware agnostic

SERVERLESS: the encapsulation, starting, and stopping of singular functions per request, with a just-in-time compute model (a minimal handler sketch follows below).
ADVANTAGES
• Cost / efficiency
• Built-in concurrency
• Speed of development
• Improved latency
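To make the serverless pattern concrete, here is a minimal sketch of a per-request handler in Python. The `load_model` and `apply` names and the event shape are illustrative assumptions rather than any specific platform's API; most function-as-a-service runtimes follow this shape: load once at import time, then handle one request per invocation.

```python
import json

def load_model():
    # Hypothetical loader; a real handler might use joblib.load(...) or a
    # framework-specific deserializer. Runs once at import time, so warm
    # invocations skip this cost (only cold starts pay it).
    return lambda features: sum(features)  # stand-in for a real model

MODEL = load_model()

def apply(event):
    """Entry point the serverless runtime invokes once per request.

    `event` is assumed to be a JSON string or dict with a "features" key;
    the exact shape depends on the platform.
    """
    payload = json.loads(event) if isinstance(event, str) else event
    prediction = MODEL(payload["features"])
    # Stateless by design: nothing persists between invocations.
    return {"prediction": prediction}
```

Loading the model at module scope rather than inside `apply` is what keeps warm invocations fast and is the main latency lever in this pattern.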
Think of your technology stack as an OS for running AI in your enterprise:

Kernel
● Runtime abstraction: support any programming language or framework, including interoperability between mixed stacks.
● Elastic scale: prioritize and automatically optimize execution of concurrent short-lived jobs.
● Cloud abstraction: provide portability to algorithms, across public or private clouds.

Shell & Services
● Discoverability, authentication, instrumentation, etc.
Multiple frameworks, languages, dependencies
● Models are rarely developed in the language they will be consumed in.
● In large enterprises, there is rarely a standard language for software development.
● The goal should be to make models consumable by any part of the organization, on any platform:
○ to ensure the maximum value is extracted from the model
○ to ensure model re-use
○ to ensure the fastest time from lab to production
● APIs and well-developed SDKs are your best friends (see the consumption sketch below).
○ APIs allow for easy testing of whether a model is adequate.
○ APIs allow for easy consumption.
○ Well-developed APIs have built-in versioning, which takes us to the next step.
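As a sketch of what API-based consumption can look like, the snippet below calls a hypothetical model endpoint over REST from Python; the host, route, payload shape, and response fields are assumptions for illustration. Any language with an HTTP client can make the same call, which is exactly the point.

```python
import requests

# Hypothetical endpoint and key; a real deployment publishes its own scheme.
ENDPOINT = "https://models.example.com/api/fraud-detector"
API_KEY = "YOUR_API_KEY"

response = requests.post(
    ENDPOINT,
    json={"transaction_amount": 129.99, "country": "US"},  # example features
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=5,
)
response.raise_for_status()
print(response.json())  # e.g. {"fraud_probability": 0.02}
```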
Standardize versioning
● Versioning is an extremely important part of deploying models in the enterprise.
● Start thinking about models as you would any other piece of modular software:
○ Models must be versioned.
○ Versions must be tracked.
○ Older versions should be accessible (for rollbacks, acceptance testing, etc.).
For the data scientist:
● The ability to compare two different versions of a model is key:
○ not only at training and verification time, but also to understand performance and SLA changes.
● Detecting model drift.
For the application developer:
● The ability to match acceptance testing to predetermined development cycles.
● The ability to stay behind a version.
● Avoiding performance regressions even if the newer model’s accuracy is better.
A sketch of version pinning from both perspectives follows below.
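The sketch below assumes a URL scheme that encodes the semantic version (the scheme itself is an assumption; the principle is what matters): the application developer stays behind a known-good version, while the data scientist compares two versions' outputs and latency side by side.

```python
import time
import requests

BASE = "https://models.example.com/api/churn-model"  # hypothetical
FEATURES = {"tenure_months": 18, "monthly_spend": 42.0}

def call(version: str):
    """Call a pinned version and measure wall-clock latency."""
    start = time.perf_counter()
    response = requests.post(f"{BASE}/{version}", json=FEATURES, timeout=5)
    response.raise_for_status()
    return response.json(), time.perf_counter() - start

# Application developer: stay behind the version that passed acceptance tests.
stable_result, _ = call("1.2.0")

# Data scientist: compare output and SLA characteristics of old vs. new.
for version in ("1.2.0", "1.3.0"):
    result, latency = call(version)
    print(version, result, f"{latency * 1000:.1f} ms")
```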
Standardize versioning
● Even better, make your system auto-version.
● Borrow the best practices from the Software
Development world.
● Rolling, non-interruptive deployments
Standardize documentation
● Just like any API, models are only as good as their documentation.
● Make the documentation travel with the model and be updated alongside it, so it’s always up to date:
○ directly as Markdown inside the Git repository that contains the model, or
○ as an artifact that travels with the model (see the sketch below).
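One lightweight way to make documentation travel with the model, sketched below under assumed names and fields: write a Markdown model card into the same directory (and therefore the same Git commit or artifact) as the serialized model, so the two cannot drift apart.

```python
from pathlib import Path
import pickle

def save_model_with_docs(model, version: str, out_dir: str = "model_artifact"):
    """Persist a model and its Markdown model card side by side."""
    path = Path(out_dir)
    path.mkdir(exist_ok=True)
    with open(path / "model.pkl", "wb") as f:
        pickle.dump(model, f)
    # The card lives next to the weights, so any git push or artifact copy
    # moves both together, keeping the docs up to date with the model.
    (path / "README.md").write_text(
        f"# churn-model v{version}\n\n"
        "## Input\nJSON object: `tenure_months` (int), `monthly_spend` (float)\n\n"
        "## Output\nJSON object: `churn_probability` (float)\n"
    )
```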
Continuous deployment
● Just like with standardizing versioning, Continuous
Integration and Continuous Deployment for AI/ML
should borrow from best practices of software
development.
● The fastest path is usually the best (git push -> deploy).
○ Git + Docker + API generation makes this really easy (a minimal service sketch follows after this list).
● Don’t forget dependency management!
● Some interesting use cases:
○ Continuous training and deployment
○ Human in the loop training and deployment
○ Bespoke training and deployment to central
platform
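As one illustration of the git push -> deploy path: a small service file like the sketch below, together with a pinned requirements.txt and a Dockerfile, is often all a push-triggered pipeline needs to build an image and roll it out. FastAPI and the model logic here are stand-ins chosen for the sketch, not any particular platform's mechanism.

```python
# app.py - built into a Docker image and redeployed on every `git push`.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Features(BaseModel):
    tenure_months: int
    monthly_spend: float

def predict(features: Features) -> float:
    # Placeholder for a real model loaded at startup (e.g. via pickle/joblib).
    return min(1.0, features.monthly_spend / 100.0)

@app.post("/predict")
def predict_endpoint(features: Features):
    return {"churn_probability": predict(features)}

# Dependency management: pin everything the image needs in requirements.txt
# (e.g. fastapi==0.103.0, uvicorn==0.23.0) so builds are reproducible.
```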
Continuous deployment - Human in the Loop
Continuous deployment - Single train platform to Deployment
Continuous deployment - Multiple training frameworks to deployment
Managing a model portfolio
“We have 14 versions of ImageMagick running as services for image resizing before feeding into a number of different models.”
- Dev manager, analytics platform, F500 media company
● Similar to an API strategy: as new models become available, you need to start caring about how to find them, who can use them, and how to bill for them.
● Borrowing from the concepts of an API gateway and API registry, the same paradigms work for model management and distribution.
● A common, centralized registry will offer the ability to find what has already been created and potentially re-use it (a minimal sketch follows below).
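A minimal sketch of the registry idea, with assumed field names: each entry records who owns a model, where to call it, and tags that make it findable, so a team can discover existing work (like that image-resizing service) instead of building version fifteen.

```python
from dataclasses import dataclass, field

@dataclass
class ModelEntry:
    name: str
    version: str
    owner_team: str
    endpoint: str               # where the deployed model can be called
    tags: set[str] = field(default_factory=set)

class ModelRegistry:
    """In-memory stand-in for a central, organization-wide registry."""

    def __init__(self):
        self._entries: list[ModelEntry] = []

    def register(self, entry: ModelEntry) -> None:
        self._entries.append(entry)

    def find(self, tag: str) -> list[ModelEntry]:
        # Discover what already exists before building something new.
        return [e for e in self._entries if tag in e.tags]

registry = ModelRegistry()
registry.register(ModelEntry(
    name="image-resize", version="2.1.0", owner_team="platform",
    endpoint="https://models.example.com/api/image-resize/2.1.0",
    tags={"preprocessing", "images"},
))
print(registry.find("preprocessing"))  # re-use it instead of rebuilding
```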
Managing a model portfolio
● A centralized repository/registry of models that can be accessed across the organization.
● Encourage re-use (many preprocessing and post-processing functions should only be built once).
● Finding existing models is key for experimenting with different pipelines.
Managing a model portfolio
● Centralizing models allows for understanding business impact and usage across the organization.
● C-level understanding of whether AI/ML investments are working.
● Finding existing models is key for experimenting with different pipelines and for rapid application development by disparate teams or external developers.
● Security and access controls ensure that only the right people in the organization can access the models:
○ who can view how the model works vs. who can call it.
Model Analytics
During the training phase, analytics such as accuracy, drift, and error rates are very important. When deploying models inside an enterprise, a different slice of analytics is required.
What is important during deployment and production:
● Latency
● Resources used (CPU/GPU, I/O)
● System capacity
● Scale-up and scale-down behavior
● Authentication
● API timing metrics and call counts
● Error rates
But also…
● Which teams are using the models
● Which applications are using them
(A sketch of this serving-side instrumentation follows below.)
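The sketch assumes a decorator-based approach and an in-memory metrics sink: wrap the inference call to record latency, errors, and which team and application made the request, then forward those records to whatever metrics backend you actually use.

```python
import functools
import time

METRICS: list[dict] = []  # stand-in for a real metrics backend

def instrumented(model_name: str):
    """Record latency, errors, and caller identity for every inference."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(payload, *, team: str, application: str):
            start = time.perf_counter()
            ok = True
            try:
                return fn(payload)
            except Exception:
                ok = False
                raise
            finally:
                METRICS.append({
                    "model": model_name,
                    "latency_ms": (time.perf_counter() - start) * 1000,
                    "ok": ok,
                    "team": team,                # which team is calling
                    "application": application,  # which app is calling
                })
        return wrapper
    return decorator

@instrumented("churn-model")
def predict(payload):
    return {"churn_probability": 0.1}  # placeholder model

predict({"tenure_months": 18}, team="growth", application="crm-dashboard")
```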
Model Auditability and Compliance
Your enterprise deployment system should be able to answer:
“Who called what model, when, and with what data?”
Why this is important:
● Compliance:
○ Regulated industries need to provide this information to government regulators:
■ Financial services
■ Life sciences
■ Federal government
● C-level understanding of whether AI/ML investments are working.
● Debugging production systems.
● Billing:
○ A complete understanding of who is using which models allows for chargebacks (a sketch of an audit record follows below).
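A sketch of an append-only audit record answering “who called what model, when, and with what data”, with assumed field names. Hashing the payload rather than storing it raw is one way to prove which data was sent without retaining sensitive inputs; regulated industries may instead need the full payload kept under access controls.

```python
import hashlib
import json
from datetime import datetime, timezone

AUDIT_LOG = "audit.jsonl"  # append-only; JSON Lines is easy to query later

def audit(caller: str, model: str, version: str, payload: dict) -> None:
    """Append one immutable record per model invocation."""
    record = {
        "who": caller,                                   # API key / user id
        "what": f"{model}:{version}",                    # model and version
        "when": datetime.now(timezone.utc).isoformat(),  # UTC timestamp
        "data_sha256": hashlib.sha256(                   # fingerprint of input
            json.dumps(payload, sort_keys=True).encode()
        ).hexdigest(),
    }
    with open(AUDIT_LOG, "a") as f:
        f.write(json.dumps(record) + "\n")

audit("team-growth/api-key-17", "churn-model", "1.2.0",
      {"tenure_months": 18, "monthly_spend": 42.0})
```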
Best practices and conclusion
"Expecting your engineering and DevOps teams to deploy ML models well is like showing up to
Seaworld with a giraffe since they are already handling large mammals.”
● Technology:
○ Borrow from the best practices of software development and deployment of
applications and code at scale.
■ CI/CD, Versioning, API design, etc.
○ The most advanced AI/ML companies in the world are centralizing their deployment
and serving platforms under one roof. This is because the influence of data science
and ML teams across your organization will only grow.
○ Understand that training and serving have very different profiles and different
technology choices will need to be made.
○ Seriously consider using microservices and serverless as design patterns for re-use,
scale and modularity.
Best practices and conclusion
● Organization:
○ Production and serving will usually be owned by DevOps or Enterprise Architecture. Success in deploying in the enterprise will be dictated by understanding the roles and responsibilities of these teams.
○ Data science teams tend to be new, with limited experience in:
■ Purchasing enterprise software
■ Requirements and considerations for production environments
■ IT requirements around Information Security, Compliance and Support.
It’s crucial for success to educate and guide these teams through the enterprise
requirements.
Best practices and conclusion
● Future proofing:
○ Think re-use: many models will be interesting and usable to multiple parts of the
enterprise. Discoverability and accessibility become key.
○ It is safe to assume that your number of models will only grow over time. How you will manage them in the future requires having a conversation early in the process.
○ Your AI/ML model portfolio is an extremely important asset: understanding the value you are getting from it is important, and its usage and influence should be measured and tracked.
○ When deciding whether to build or buy, consider your team’s capacity to adapt and keep moving at the pace of the AI/ML industry over time.
MACHINE LEARNING
!=
PRODUCTION MACHINE LEARNING
Diego Oppenheimer
CEO
Thank you!
diego@algorithmia.com
@doppenhe