MLOps is a practice for collaboration between data science and operations to manage the production machine learning (ML) lifecycle. As an amalgamation of “machine learning” and “operations,” MLOps applies DevOps principles to ML delivery, enabling ML-based innovation to be delivered at scale and resulting in:
Faster time to market of ML-based solutions
More rapid rate of experimentation, driving innovation
Assurance of quality, trustworthiness, and ethical AI
MLOps is essential for scaling ML. Without it, enterprises risk struggling with costly overhead and stalled progress. Several vendors have emerged with offerings to support MLOps: the major offerings are Microsoft Azure ML and Google Vertex AI. We looked at these offerings from the perspective of enterprise features and time-to-value.
MLOps – Applying DevOps to Competitive Advantage
1. MLOps: Applying DevOps to
Competitive Advantage
Presented by: William McKnight
President, McKnight Consulting Group
linkedin.com/in/wmcknight
www.mcknightcg.com
(214) 514-1444
2. 8th December, 2022
Put AI Into Action And Boost
Productivity with MLOps
Abhilash Mula
Senior Manager, Product Management
11. William McKnight
President, McKnight Consulting Group
• Frequent keynote speaker and trainer internationally
• Consulted to Pfizer, Scotiabank, Fidelity, TD Ameritrade, Teva
Pharmaceuticals, Verizon, and many other Global 1000
companies
• Hundreds of articles, blogs and white papers in publication
• Focused on delivering business value and solving business
problems utilizing proven, streamlined approaches to
information management
• Former Database Engineer, Fortune 50 Information
Technology executive and Ernst & Young Entrepreneur of the Year
Finalist
• Owner/consultant: Research, Data Strategy and
Implementation consulting firm
12. McKnight Consulting Group Offerings
• Strategy
– Trusted Advisor
– Action Plans
– Roadmaps
– Tool Selections
– Program Management
• Training
– Classes
– Workshops
• Implementation
– Data/Data Warehousing/Business Intelligence/Analytics
– Big Data
– Master Data Management
– Governance/Quality
15. Use Cases for ML
Sector | Flow optimization | Modeling and analytics | Predictive insights | Threat and risk analysis
Public Sector | Traffic flow management | Smart city planning | Autonomous routing | Situational awareness
Oil and Gas | Pipeline modelling | Drilling patterns and asset utilization | Intelligent planning | Safety assurance
Manufacturing | Supply chain optimization | Production optimization | Predictive maintenance | Fault identification
Retail | Supply chain optimization | Customer experience | Segmentation analysis and forecasting | Fraud and theft identification
Healthcare | Patient care pathway optimization | Disease research and drug creation | Early diagnosis of conditions | Patient safety
Technology | Operational efficiency | Log analysis | Capacity planning | Cybersecurity and zero-day detection
16. Drivers to MLOps
• Senior management does not always see ML as strategic, and it can be
difficult to measure and manage the value of ML projects.
• ML initiatives can work in isolation from each other, resulting in
difficulties aligning workflows between ML and other teams.
• To be effective, ML training requires large quantities of high-quality data,
which creates significant overheads across data access, preparation, and
ongoing management.
• ML/data science work requires a large amount of trial and error, making
it hard to plan the time required to complete a project.
17. What is MLOps?
• MLOps is a practice for collaboration between data science and operations
to manage the production machine learning (ML) lifecycle.
• As an amalgamation of “machine learning” and “operations,” MLOps applies
DevOps principles to ML delivery, enabling ML-based innovation to be
delivered at scale and resulting in:
– Faster time to market of ML-based solutions
– More rapid rate of experimentation, driving innovation
– Assurance of quality, trustworthiness, and ethical AI
18. From ML to MLOps
• Many companies have built strong ML capabilities
• Few businesses have been successful in putting the majority of their
ML models into production, leaving a sizable amount of value
untapped.
• Machine learning operations, also known as MLOps, is a set of
standards, tools, and frameworks used to scale ML so that it reaches
its full potential.
• MLOps concentrates on the entire life cycle of ML model design,
implementation, testing, monitoring, and management. Its three main
objectives are:
– Create a highly repeatable procedure for the entire life cycle of a model, from
feature exploration to model deployment in production.
– Shield data scientists and analysts from the complexity of the
infrastructure so they can concentrate on their models and plans.
– Scale MLOps with the number of models and modeling complexity
without requiring a horde of engineers.
19. MLOps Operations
• For modern enterprises, use of ML goes to the heart of
digital transformation, enabling organizations to harness
the power of their data and deliver new and
differentiated services to their customers. Achieving this
goal is predicated on three pillars:
– Development of such models requires an iterative
approach so the domain can be better understood,
and the models improved over time, as new
learnings are achieved from data and inference.
– Automated tools and repositories need to store
and keep track of models, code, data lineage, and a
target environment for deployment of ML-enabled
applications at speed without undermining
governance.
– Developers and data scientists need to work
collaboratively to ensure ML initiatives are aligned
with broader software delivery and, more broadly
still, IT-business alignment.
20. Why not DevOps?
• Connect data and services. DevOps success depends
on how well platforms of data and existing/new
services can be integrated, adapting to changing
circumstances.
• Automate deployment. Automation needs to be
considered in the context of the above, to ensure
constant, consistent delivery of business value.
• Operate and orchestrate resources. A commoditized,
flexible platform is table stakes: as platform efficiency
increases, so does DevOps effectiveness.
21. The goal is to assure the delivery of value to the business,
its customers and other stakeholders.
22. Terminology
• Pipeline. Each development iteration of an ML-based application will
follow a planned and automated series of steps. The pipeline itself
can be put under configuration control, such that the steps can be
repeated.
• Datasets store/Datasets. MLOps relies on an easily accessible and
scalable source of data, both during training and inference. While
data may come from several places, it will be prepared, cleaned and
accessed as a single resource.
• Repository. A common, version-controlled storage resource (e.g. Git,
Artifactory, Azure Artifacts) for data, model and configuration
schemas, managing dependencies between models, libraries and
other resources.
• Registry. A logical picture of all elements required to support a given
ML model, across its development and operational pipeline.
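As a rough illustration of a pipeline that can be put under configuration control, the sketch below keeps the step order as plain data and runs the steps in sequence. The step names, the toy "model" (a mean) and the `PIPELINE` structure are assumptions made for this example, not part of any specific MLOps product.

```python
# Hypothetical pipeline definition kept as plain data, so it could be
# committed to a repository and replayed step by step.
PIPELINE = {
    "name": "example-pipeline",
    "steps": ["prepare", "train", "evaluate"],
}

def prepare(state):
    # Toy preparation: drop missing values from the raw input.
    state["data"] = [x for x in state["raw"] if x is not None]
    return state

def train(state):
    # Toy "model": the mean of the prepared data.
    state["model"] = sum(state["data"]) / len(state["data"])
    return state

def evaluate(state):
    # Toy metric: mean absolute error of the constant model.
    errors = [abs(x - state["model"]) for x in state["data"]]
    state["mae"] = sum(errors) / len(errors)
    return state

# Maps the step names used in the configuration to their implementations.
STEP_FUNCTIONS = {"prepare": prepare, "train": train, "evaluate": evaluate}

def run_pipeline(pipeline, raw):
    """Run every configured step in order and return the final state."""
    state = {"raw": list(raw)}
    for step_name in pipeline["steps"]:
        state = STEP_FUNCTIONS[step_name](state)
    return state

result = run_pipeline(PIPELINE, [1.0, None, 3.0])
print(result["model"], result["mae"])
```

Because the step order lives in data rather than in code, changing or re-running an iteration means changing or replaying a versioned definition.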
23. Terminology
• Workspace. Model and application developers conduct their activities
within individual workspaces, accessible graphically or via code (e.g.
written in Python), with access control over data sets, models and
insights
• Target. A deployment environment for ML models and code,
packaged for example as containers/microservices. Targets are often
cloud-based, but can include on-premises and edge-based environments.
• Experiment. Outputs of a given iteration or run need to be stored so
they can be assessed, compared and monitored for audit purposes.
• Model. Packaged output of an experiment which can be used to
predict values or built on top of (via transfer learning).
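• Endpoint. In networking terms, an Internet-capable hardware device on a
TCP/IP network. In an MLOps context the term usually means the network
address (typically an HTTPS URL) at which a deployed model accepts
scoring requests.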
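The "Experiment" entry above can be pictured with a small in-memory tracker: each run's parameters and metrics are stored so that runs can be compared later. This is only a sketch; the class names, the experiment name and the metric values are invented for illustration and do not correspond to any vendor's SDK.

```python
from dataclasses import dataclass, field

@dataclass
class Run:
    run_id: int
    params: dict
    metrics: dict

@dataclass
class Experiment:
    name: str
    runs: list = field(default_factory=list)

    def log_run(self, params, metrics):
        """Store a copy of the run's parameters and metrics for later audit."""
        run = Run(run_id=len(self.runs) + 1, params=dict(params), metrics=dict(metrics))
        self.runs.append(run)
        return run

    def best_run(self, metric, higher_is_better=True):
        """Return the stored run with the best value of the given metric."""
        chooser = max if higher_is_better else min
        return chooser(self.runs, key=lambda r: r.metrics[metric])

# Illustrative values only; they are not results from the presentation.
experiment = Experiment("churn-baseline")
experiment.log_run({"learning_rate": 0.1}, {"accuracy": 0.81})
experiment.log_run({"learning_rate": 0.01}, {"accuracy": 0.86})
best = experiment.best_run("accuracy")
print(best.run_id, best.params)
```

A real platform would persist these records and link them to the registered model; the point here is only that every run leaves a comparable record.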
26. Applying MLOps in Practice
• Configure Target – Set up the compute targets on which models will be trained.
• Prepare data – Set up how data is ingested, prepared and used
• Train Model – Develop ML training scripts and submit them to the compute target
• Containerize the Service – After a satisfactory run is found, register the persisted model in a
model registry.
• Validate Results – Application integration test of the service deployed on dev/test target.
• Deploy Model – If the model is satisfactory, deploy it into the target environment
• Monitor Model – Monitor the deployed model to evaluate its inferencing performance and
accuracy
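The final "Monitor Model" step can be sketched as a rolling accuracy check over the most recent predictions whose true outcome is known. The window size, the threshold and the `AccuracyMonitor` class are assumptions chosen for this example.

```python
from collections import deque

class AccuracyMonitor:
    """Tracks accuracy over the most recent labelled predictions."""

    def __init__(self, window_size, min_accuracy):
        # deque with maxlen silently discards the oldest outcome when full.
        self.outcomes = deque(maxlen=window_size)
        self.min_accuracy = min_accuracy

    def record(self, predicted, actual):
        self.outcomes.append(predicted == actual)

    def accuracy(self):
        if not self.outcomes:
            return None  # nothing observed yet
        return sum(self.outcomes) / len(self.outcomes)

    def needs_attention(self):
        current = self.accuracy()
        return current is not None and current < self.min_accuracy

# Hypothetical stream of (predicted, actual) labels from a deployed model.
monitor = AccuracyMonitor(window_size=4, min_accuracy=0.75)
for predicted, actual in [(1, 1), (0, 0), (1, 0), (0, 1)]:
    monitor.record(predicted, actual)

print(monitor.accuracy(), monitor.needs_attention())
```

When `needs_attention()` returns true, a pipeline might trigger retraining or an alert; which action is appropriate depends on the organization's governance.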
27. For iterative pipelines to continue to deliver
results, we need
• Reproducibility – as with software configuration management and continuous
integration, ML pipelines and steps, together with their data sources and models,
libraries and SDKs, need to be stored and maintained such that they can be repeated
exactly as previously.
• Reusability – to fit with principles of continuous delivery, the pipeline needs to be
able to package and deliver models and code into production, to both training and
target environments.
• Manageability – the ability to apply governance, linking changes to models and code
to development activities (for example through sprints) and enabling managers to
measure and oversee both progress and value delivery.
• Automation – as with DevOps, continuous integration and delivery require
automation to assure rapid and repeatable pipelines, particularly when these are
augmented by governance and testing (which can otherwise create a bottleneck).
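One way to picture the reproducibility requirement is to fingerprint everything a run depends on and to fix the random seed, so that a repeated run can be checked against the earlier one. The sketch below is a toy under stated assumptions: the "training" is a seeded resampling average, and the `fingerprint` and `train` helpers are hypothetical names.

```python
import hashlib
import json
import random

def fingerprint(data, params):
    """Hash the run inputs; identical inputs give an identical fingerprint."""
    payload = json.dumps({"data": data, "params": params}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def train(data, params):
    # Toy training: average of a seeded random resample of the data.
    rng = random.Random(params["seed"])
    sample = [rng.choice(data) for _ in range(params["n_samples"])]
    return sum(sample) / len(sample)

data = [2.0, 4.0, 6.0, 8.0]
params = {"seed": 42, "n_samples": 10}

first = {"fingerprint": fingerprint(data, params), "model": train(data, params)}
second = {"fingerprint": fingerprint(data, params), "model": train(data, params)}

# The same inputs and seed reproduce both the fingerprint and the result.
print(first == second)
```

In practice the fingerprint would also need to cover library and SDK versions, as the Reproducibility bullet notes; they are left out here to keep the sketch short.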
28. MLOps scenario: Customer Churn
• Prepare Environment: Create and configure data stores, in this
case CRM data
• Normalize, transform and otherwise prepare datasets for
training and inference
• Point algorithms and code to the data
• Enforce transparency (e.g. through audit trails) to build
confidence in results
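The normalization and audit-trail steps of the churn scenario could look roughly like the following. The CRM field names and values are invented for illustration, and min-max scaling is just one possible normalization, chosen here as an assumption.

```python
def min_max_normalize(values):
    """Scale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: avoid division by zero
    return [(v - lo) / (hi - lo) for v in values]

def prepare(records, numeric_fields, audit_log):
    """Return normalized copies of the records and log each transformation."""
    prepared = [dict(r) for r in records]  # leave the source records untouched
    for field_name in numeric_fields:
        column = [r[field_name] for r in records]
        for row, scaled in zip(prepared, min_max_normalize(column)):
            row[field_name] = scaled
        # Audit entry: what was done, to which field, with which bounds.
        audit_log.append({
            "step": "min_max_normalize",
            "field": field_name,
            "min": min(column),
            "max": max(column),
            "rows": len(column),
        })
    return prepared

# Hypothetical CRM extract; field names and values are illustrative only.
crm = [
    {"customer_id": "a", "tenure_months": 2, "monthly_spend": 20.0},
    {"customer_id": "b", "tenure_months": 12, "monthly_spend": 60.0},
    {"customer_id": "c", "tenure_months": 22, "monthly_spend": 100.0},
]
audit = []
prepared = prepare(crm, ["tenure_months", "monthly_spend"], audit)
print(prepared[1], len(audit))
```

Recording the bounds used for scaling means the same transformation can be applied at inference time and explained later, which is the kind of transparency the last bullet asks for.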
32. Azure Solution Architecture (example)
• With security controls in place, a user can provision a workspace with
private link, customer-managed keys, and role-based access control
(RBAC) using the AML Python SDK, CLI, or UX. ARM templates can be
used for automation.
• A compute instance is used by data scientists as a managed
workstation for building models. An IT admin can create a compute
instance behind a VNet if restrictions prohibit the use of a
public IP.
• A compute cluster is used as training compute to train ML models. An IT
admin (not shown) can create a compute cluster behind a VNet or
enable a private link if restrictions prohibit the use of a public
IP.
• Once a model is created it can be deployed on an AKS cluster. A private
AKS cluster with no public IP can be attached to the AML workspace
and an internal load balancer can be used so that the deployed
scoring endpoint is not visible outside of the virtual network. All
scoring requests to the deployed model are made over TLS/SSL.
33. MLOps Features
• Ease of Setup and Use
– Create ML Managed Endpoints
– Create Compute Resources
– Manage Compute Resources
• MLOps Workflow
– Model Orchestration
– Data Orchestration
34. MLOps Features
• Security
– Network
– User
– Data
• Governance
– Monitoring
– Control
• Automation
– Experiments
– Workflow
– Code and App Orchestration
– Event-Driven
36. MLOps Features
• Model Explainability
• A/B Model Testing
• Granular Data Preparation
37. Midsize Organization MLOps Costs
Platform | Category | Type | Price Per Time | Time Units Per Year | Subtotal | Units | Amount
ML1 | Compute | E8 v3 | $0.504 | 8,760 | $4,415 | 16 | $70,641
ML1 | Service | included | $0.000 | 8,760 | $0 | 16 | $0
ML2 | Model Training | Per node per hour | $19.32 | 8,760 | $203,092 | 0.2 | $33,849
ML2 | Batch prediction | Per node per hour | $1.160 | 8,760 | $10,162 | 16 | $162,586
ML3 | Compute | ml.r5.2xlarge | $0.504 | 8,760 | $4,415 | 16 | $70,641
ML3 | Service | ml.r5.2xlarge | $0.101 | 8,760 | $885 | 16 | $14,156
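The arithmetic behind the cost table appears to be subtotal = price per hour × 8,760 hours per year and amount = subtotal × units. The sketch below recomputes each row under that assumption and compares it with the printed figures. Under this reading every printed amount is reproduced, but the printed Model Training subtotal ($203,092) differs from $19.32 × 8,760 (about $169,243); the slide does not explain the difference, so the figures are left as published.

```python
# Rows copied from the cost table:
# (platform, category, price_per_hour, hours_per_year, units,
#  printed_subtotal, printed_amount)
ROWS = [
    ("ML1", "Compute", 0.504, 8760, 16, 4415, 70641),
    ("ML1", "Service", 0.0, 8760, 16, 0, 0),
    ("ML2", "Model Training", 19.32, 8760, 0.2, 203092, 33849),
    ("ML2", "Batch prediction", 1.160, 8760, 16, 10162, 162586),
    ("ML3", "Compute", 0.504, 8760, 16, 4415, 70641),
    ("ML3", "Service", 0.101, 8760, 16, 885, 14156),
]

def check(row):
    """Recompute one row, assuming subtotal = price * hours and amount = subtotal * units."""
    platform, category, price, hours, units, printed_subtotal, printed_amount = row
    subtotal = price * hours
    amount = subtotal * units
    return {
        "row": f"{platform} {category}",
        "subtotal": round(subtotal),
        "amount": round(amount),
        "subtotal_matches": round(subtotal) == printed_subtotal,
        "amount_matches": round(amount) == printed_amount,
    }

results = [check(row) for row in ROWS]
for r in results:
    print(r)
```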
38. Maturity Levels
1 Just gaining an understanding of using machine learning. No data scientists hired. Early data models built
without much success. There is a belief that whatever DevOps processes are in place will handle ML.
2 The data architecture serves most data that would be necessary for ML. A cloud commitment and direction is
present, providing scale for ML. A first data scientist is hired and prototyping is done. A full ML lifecycle is
accomplished with manual processes. MLOps is still an afterthought.
3 This company is actively looking to deliver the benefits of ML across the company. There is recognition of ML at
the executive level. However, early processes in use resemble DevOps and will not scale. The company begins
forking its DevOps processes for ML.
4 ML is embraced company-wide. Benefits have been produced and realized. There are ample data scientists,
and the data architecture has matured so that more ML benefits can be realized.
Although there still isn’t full consistency in processes, the company has embraced MLOps and is rapidly
adapting it.
5 The business has fundamentally changed due to ML and it could not have done so without MLOps. ML is
applied to initiatives wherever possible. MLOps is nurtured as much as ML and includes model sharing,
reusability and reproducibility, model diagnostics and a strong path to production. Governance has become
central to ML strategy, ensuring outcomes that are explainable and transparent.
39. In Conclusion
• ML uptake is strong
• An MLOps workspace is a cloud-based
development environment that enables you to
collaboratively develop, test and deploy
machine learning models
• Develop iterative pipelines to continue to
deliver results
• Automation is a key differentiator in MLOps
platforms
• Embrace transparency and predictability