Emmanuel Raj
BIRMINGHAM—MUMBAI
Engineering MLOps
Copyright © 2021 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or
transmitted in any form or by any means, without the prior written permission of the publisher,
except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the
information presented. However, the information contained in this book is sold without warranty,
either express or implied. Neither the author, nor Packt Publishing, nor its dealers and distributors,
will be held liable for any damages caused or alleged to have been caused directly or indirectly by
this book.
Packt Publishing has endeavored to provide trademark information about all of the companies
and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing
cannot guarantee the accuracy of this information.
ISBN 978-1-80056-288-2
www.packt.com
Contributors
About the author
Emmanuel Raj is a Finland-based Senior Machine Learning Engineer with 6+ years of
industry experience. He is also a Machine Learning Engineer at TietoEvry and a Member
of the European AI Alliance at the European Commission. He is passionate about
democratizing AI and bringing research and academia to industry. He holds a Master of
Engineering degree in Big Data Analytics from Arcada University of Applied Sciences.
He has a keen interest in R&D in technologies such as Edge AI, Blockchain, NLP, MLOps,
and Robotics. He believes the best way to learn is to teach, and he is passionate about
sharing and learning new technologies with others.
About the reviewers
Magnus Westerlund (DSc) is a principal lecturer in information technology and
director of the master's degree programme in big data analytics at Arcada University of
Applied Sciences in Helsinki, Finland. He has a background in telecoms and information
management and earned his doctoral degree in information systems at Åbo Akademi
University, Finland. Magnus has published research in the fields of analytics, IT security,
cyber regulation, and distributed ledger technology. His current research topics are
smart contract-based distributed security for IoT edge applications and the assessment
of intelligent systems. He participates as a technical expert in the Z-inspection® network,
which works for a Mindful Use of AI (#MUAI).
Emerson Bertolo is a data scientist and software developer who has created mission-
critical software and dealt with big data applications for more than 12 years. In 2016,
Bertolo deep-dived into machine learning and deep learning projects by creating AI
models using TensorFlow, PyTorch, MXNet, Keras, and Python libraries to bring those
models into reality for tech companies from LawTech to security and defense. By merging
Agile concepts into data science, Bertolo has been seeking the best blend between
Agile software engineering and machine learning research to build time-to-market AI
applications. His approach has been to build to learn, validate results, research and identify
uncertainties, rebuild, and learn again!
Table of Contents

Preface

Chapter 2: Characterizing Your Machine Learning Problem
    The ML solution development process
    Types of ML models
        Learning models
        Hybrid models
        Statistical models
        HITL models
    Structuring your MLOps
        Small data ops
        Big data ops
        Hybrid MLOps
        Large-scale MLOps
    An implementation roadmap for your solution

Chapter 3: Code Meets Data
    Business problem analysis and categorizing the problem
    Setting up the resources and tools
        Installing MLflow
        Azure Machine Learning
        Azure DevOps
        JupyterHub
    10 principles of source code management for ML
    What is good data for ML?
    Data preprocessing
        Data quality assessment
        Calibrating missing data
        Label encoding
        New feature – Future_weather_condition
        Data correlations and filtering
        Time series analysis
    Data registration and versioning
    Toward the ML Pipeline
        Feature Store
    Summary

Chapter 4: Machine Learning Pipelines
    Going through the basics of ML pipelines
    Data ingestion and feature engineering
        Data ingestion (training dataset)
    Machine learning training and hyperparameter optimization
        Support Vector Machine
        Random Forest classifier
    ML metrics
        Testing the SVM classifier
        Testing the Random Forest classifier
    Model packaging
    Registering models and production artifacts
        Registering production artifacts
    Summary

Chapter 5: Model Evaluation and Packaging
    Model evaluation and interpretability metrics
        Learning models' metrics
        Hybrid models' metrics
        Statistical models' metrics
        HITL model metrics
    Production testing methods
        Batch testing
        A/B testing
        Stage test or shadow test
        Testing in CI/CD
    Why package ML models?
        Portability
        Inference
        Interoperability
        Deployment agnosticity
    How to package ML models
        Serialized files
        Packetizing or containerizing
        Microservice generation and deployment
    Inference ready models
        Connecting to the workspace and importing model artifacts
        Loading model artifacts for inference
    Summary

Chapter 7: Building Robust CI-CD Pipelines
    Continuous integration, delivery, and deployment in MLOps
        Continuous integration
        Continuous delivery
        Continuous deployment
    Setting up a CI-CD pipeline and the test environment (using Azure DevOps)
        Creating a service principal
        Installing the extension to connect to the Azure ML workspace
        Setting up a continuous integration and deployment pipeline for the test environment
        Connecting artifacts to the pipeline
        Setting up a test environment
    Pipeline execution and testing
    Pipeline execution triggers
    Summary

Chapter 8: APIs and Microservice Management
    Introduction to APIs and microservices
        What is an Application Programming Interface (API)?
        Microservices
    Old is gold – REST API-based microservices
    Hands-on implementation of serving an ML model as an API
        API design and development

Chapter 9: Testing and Securing Your ML Solution
    Understanding the need for testing and securing your ML application
    Testing your ML solution by design
        Data testing
        Model testing
        Pre-training tests

Chapter 10: Essentials of Production Release
    Setting up the production infrastructure
        Azure Machine Learning workspace
        Azure Machine Learning SDK
    Setting up our production environment in the CI/CD pipeline
    Testing our production-ready pipeline
    Configuring pipeline triggers for automation
        Setting up a Git trigger
        Setting up an Artifactory trigger
        Setting up a Schedule trigger
    Pipeline release management
    Toward continuous monitoring
    Summary

Chapter 12: Model Serving and Monitoring
    Serving, monitoring, and maintaining models in production
    Exploring different modes of serving ML models
        Serving the model as a batch service
        Serving the model to a human user
        Serving the model to a machine
    Implementing the Explainable Monitoring framework
        Monitoring your ML system
        Analyzing your ML system
        Governing your ML system
    Summary

Chapter 13: Governing the ML System for Continual Learning
    Understanding the need for continual learning
        Continual learning
        The need for continual learning
    Explainable monitoring – governance
        Alerts and actions
        Model QA and control
        Model auditing and reports
    Enabling model retraining
        Manual model retraining
        Automated model retraining
    Maintaining the CI/CD pipeline
    Summary

Why subscribe?
Chapter 7, Building Robust CI and CD Pipelines, covers different CI/CD pipeline components
such as triggers, releases, jobs, and so on. It will also equip you with knowledge on curating
your own custom CI/CD pipelines for ML solutions. We will build a CI/CD pipeline for an
ML solution for a business use case. The pipelines we build will be traceable end to end as
they will serve as middleware for model deployment and monitoring.
Chapter 8, APIs and Microservice Management, goes into the principles of API and
microservice design for ML inference. A learn-by-doing approach will be encouraged.
We will go through a hands-on implementation of designing and developing an API and
microservice for an ML model using tools such as FastAPI and Docker. You will learn key
principles, challenges, and tips to designing a robust and scalable microservice and API
for test and production environments.
Chapter 9, Testing and Securing Your ML Solution, introduces the core principles of
performing tests in the test environment to test the robustness and scalability of the
microservice or API we have previously developed. We will perform hands-on load testing
for a deployed ML solution. This chapter provides a checklist of tests to be done before
taking the microservice to production release.
Chapter 10, Essentials of Production Release, explains how to deploy ML services to
production with a robust and scalable approach using the CI/CD pipelines designed
earlier. We will focus on deploying, monitoring, and managing the service in production.
Key learnings will be deployment in serverless and server environments using tools such
as Python, Docker, and Kubernetes.
Chapter 11, Key Principles for Monitoring Your ML System, looks at key principles
and aspects of monitoring ML systems in production for robust, secure, and scalable
performance. As a key takeaway, readers will get a concrete explainable monitoring
framework and checklist to set up and configure a monitoring framework for their ML
solution in production.
Chapter 12, Model Serving and Monitoring, explains serving models to users and defining
metrics for an ML solution, especially in the aspects of algorithm efficiency, accuracy, and
production performance. We will deep dive into hands-on implementation and real-life
examples on monitoring data drift, model drift, and application performance.
Chapter 13, Governing the ML System for Continual Learning, reflects on the need for
continual learning in machine learning solutions. We will look into what is needed to
successfully govern an ML system for business efficacy. Using the Explainable Monitoring
framework, we will devise a strategy to govern and we will delve into the hands-on
implementation for error handling and configuring alerts and actions. This chapter will
equip you with critical skills to automate and govern your MLOps.
If you are using the digital version of this book, we advise you to type the code yourself
or access the code via the GitHub repository (link available in the next section). Doing
so will help you avoid any potential errors related to the copying and pasting of code.
Conventions used
There are a number of text conventions used throughout this book.
Code in text: Indicates code words in text, database table names, folder names,
filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles.
Here is an example: "The preprocessed dataset is imported using the .get_by_name()
function."
A block of code is set as follows:
uri = workspace.get_mlflow_tracking_uri()
mlflow.set_tracking_uri(uri)
When we wish to draw your attention to a particular part of a code block, the relevant
lines or items are set in bold:
python3 test_inference.py
Bold: Indicates a new term, an important word, or words that you see onscreen. For
example, words in menus or dialog boxes appear in the text like this. Here is an example:
"Go to the Compute option and click the Create button to explore compute options
available on the cloud."
Get in touch
Feedback from our readers is always welcome.
General feedback: If you have questions about any aspect of this book, mention the book
title in the subject of your message and email us at customercare@packtpub.com.
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes
do happen. If you have found a mistake in this book, we would be grateful if you would
report this to us. Please visit www.packtpub.com/support/errata, select your
book, click on the Errata Submission Form link, and enter the details.
Piracy: If you come across any illegal copies of our works in any form on the Internet,
we would be grateful if you would provide us with the location address or website name.
Please contact us at copyright@packt.com with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in
and you are interested in either writing or contributing to a book, please visit
authors.packtpub.com.
Reviews
Please leave a review. Once you have read and used this book, why not leave a review on
the site that you purchased it from? Potential readers can then see and use your unbiased
opinion to make purchase decisions, we at Packt can understand what you think about
our products, and our authors can see your feedback on their book. Thank you!
For more information about Packt, please visit packt.com.
Section 1: Framework for Building Machine Learning Models
This part will equip readers with the foundation of MLOps and workflows to characterize
their ML problems to provide a clear roadmap for building robust and scalable ML
pipelines. This will be done in a learn-by-doing approach via practical implementation
using proposed methods and tools (Azure Machine Learning services or MLflow).
This section comprises the following chapters:
Cloud computing became popular in the industry from 2006 onward, when Sun
Microsystems launched Sun Grid, a hardware and data resource sharing service, in
March 2006. The service was later acquired by Oracle and renamed Sun Cloud. In
parallel, in the same year, Amazon launched another cloud computing service called
Elastic Compute Cloud (EC2). This opened up new possibilities for businesses
to provision computation, storage, and scaling capabilities on demand. Since then,
industries have organically transformed toward adopting cloud computing.
In the last decade, many companies on a global and regional scale have catalyzed this
cloud transformation, with Google, IBM, Microsoft, UpCloud, Alibaba, and
others heavily investing in the research and development of cloud services. As a result, a
shift from localized computing (companies having their own servers and data centers) to
on-demand computing has taken place due to the availability of robust and scalable cloud
services. Now businesses and organizations are able to provision resources on-demand on
the cloud to satisfy their data processing needs.
With these developments, we have witnessed Moore's law in operation: the observation
that the number of transistors on a microchip doubles roughly every two years while the
cost of computing halves, and it has held true so far. Against this backdrop, the following
trends are developing.
Figure 1.1 – Demand for deep learning over time supported by computation
These breakthroughs in deep learning are enabled by exponential growth in computing,
with the compute used for training increasing around 35 times every 18 months. Looking
ahead, such demands may hit roadblocks in scaling up centralized computing
with CPUs, GPUs, or TPUs. This has pushed us toward alternatives such as distributed
learning, where computation for data processing is spread across multiple
computation nodes. We have seen breakthroughs in distributed learning, such
as federated learning and edge computing approaches, and distributed learning has shown
promise in serving the growing demands of deep learning.
AI-centric applications
Applications are becoming AI-centric – we see that across multiple industries. Virtually
every application is starting to use AI, and these applications are running separately on
distributed workloads such as HPC, microservices, and big data, as shown in Figure 1.2:
By combining HPC and AI, we can enable the benefits of computation needed to
train deep learning and ML models. With the overlapping of big data and AI, we can
leverage extracting required data at scale for AI model training, and with the overlap
of microservices and AI we can serve the AI models for inference to enhance business
operations and impact. This way, distributed applications have become the new norm.
Developing AI-centric applications at scale requires a synergy of distributed applications
(HPC, microservices, and big data) and for this, a new way of developing software is
required.
In the traditional Waterfall method:
• The entire set of requirements has to be given before starting the development;
modifying them during or after the project development is not possible.
In the Agile method:
• Requirements are defined before starting the development, but they can be modified
at any time.
• It is possible to create or implement reusable components.
• The solution or project can be modular by segregating the project into different
modules that are delivered periodically.
• The users or customers can co-create by testing and evaluating developed solution
modules periodically to ensure the business needs are satisfied. Such a user-centric
process ensures quality outcomes focused on meeting customer and business needs.
The following diagram shows the difference between Waterfall and Agile methodologies:
While code is meticulously crafted in the development environment, data comes from
multiple sources for training, testing, and inference. Data is dynamic, changing over time
in terms of volume, velocity, veracity, and variety. To keep up with evolving data, code
evolves over time. From this perspective, code and data can be seen as living in separate
planes that share the time dimension but are independent in all other aspects. The
challenge of an ML development process is to create a bridge between these two planes
in a controlled way:
Information
These points have been sourced from Policy and investment recommendations
for trustworthy AI – European Commission (https://ec.europa.eu/digital-single-market/en/news/policy-and-investment-recommendations-trustworthy-artificial-intelligence)
and the AI Index 2019 (https://hai.stanford.edu/research/ai-index-2019).
All these developments indicate a strong push toward the industrialization of AI, and
this is possible by bridging industry and research. MLOps will play a key role in the
industrialization of AI. If you invest in learning this method, it will give you a headstart
in your company or team and you could be a catalyst for operationalizing ML and
industrializing AI.
So far, we have learned about some challenges and developments in IT, software
development, and AI. Next, we will delve into understanding MLOps conceptually and
learn in detail about a generic MLOps workflow that can be used commonly for any use
case. These fundamentals will help you get a firm grasp of MLOps.
Understanding MLOps
Software development is interdisciplinary and is evolving to facilitate ML. MLOps is an
emerging method that fuses ML with software development by integrating multiple
domains: it combines ML, DevOps, and data engineering, with the aim of building,
deploying, and maintaining ML systems in production reliably and efficiently. Thus,
MLOps can be understood as the intersection of these three disciplines.
The upper layer is the MLOps pipeline (build, deploy, and monitor), which is enabled
by drivers such as data, code, artifacts, middleware, and infrastructure. Powered by this
array of services, drivers, middleware, and infrastructure, the MLOps pipeline crafts
ML-driven solutions. By using this pipeline, a business or individual(s) can quickly
prototype, test, and validate models and deploy them to production at scale, frugally
and efficiently.
To understand the workings and implementation of the MLOps workflow, we will look at
the implementation of each layer and step using a figurative business use case.
• Data: The pet park has given you access to their data lake containing 100,000
labeled images of cats and dogs, which we will use for training the model.
• Infrastructure: Public cloud (IaaS).
This use case resembles a real-life use case for operationalizing ML and is used to explain
the workings and implementation of the MLOps workflow. Remember to look for an
explanation for the implementation of this use case at every segment and step of the
MLOps workflow. Now, let's look at the workings of every layer and step in detail.
Build
The build module has the core ML pipeline, and this is purely for training, packaging,
and versioning the ML models. It is powered by the required compute (for example, the
CPU or GPU on the cloud or distributed computing) resources to run the ML training
and pipeline:
The pipeline works from left to right. Let's look at the functionality of each step in detail:
• Data ingestion: This step is a trigger step for the ML pipeline. It deals with the
volume, velocity, veracity, and variety of data by extracting data from various data
sources (for example, databases, data warehouses, or data lakes) and ingesting
the required data for the model training step. Robust data pipelines connected to
multiple data sources enable it to perform extract, transform, and load (ETL)
operations to provide the necessary data for ML training purposes. In this step, we
can split and version data for model training in the required format (for example,
the training or test set). As a result of this step, any experiment (that is, model
training) can be audited and is back-traceable.
For a better understanding of the data ingestion step, here is the previously
described use case implementation:
Use case implementation
As you have access to the pet park's data lake, you can now procure data to get
started. Using data pipelines (part of the data ingestion step), you do the following:
1. Extract, transform, and load 100,000 images of cats and dogs.
2. Split and version this data into a train and test split (with an 80% and 20% split).
Versioning this data will enable end-to-end traceability for trained models.
Congrats – now you are ready to start training and testing the ML model using
this data.
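The split-and-version step above can be sketched as follows. This is a minimal illustration, not the book's implementation: scikit-learn performs the 80/20 split, and a content hash of the sample list stands in as a version tag. The function name, the toy file list, and the hashing scheme are all assumptions; a real setup would use a data versioning tool (for example, Azure ML datasets or DVC).

```python
import hashlib
from sklearn.model_selection import train_test_split

def split_and_version(samples, labels, test_size=0.2, seed=42):
    """Split data 80/20 and derive a version tag for traceability."""
    X_train, X_test, y_train, y_test = train_test_split(
        samples, labels, test_size=test_size, random_state=seed, stratify=labels
    )
    # A simple content-based version tag: hash of the sorted sample identifiers,
    # so the same dataset always yields the same version string.
    version = hashlib.sha256(repr(sorted(map(str, samples))).encode()).hexdigest()[:12]
    return (X_train, y_train), (X_test, y_test), version

# Toy stand-in for the 100,000 image paths and cat/dog labels
paths = [f"img_{i}.jpg" for i in range(100)]
labels = ["cat" if i % 2 == 0 else "dog" for i in range(100)]
train, test, version = split_and_version(paths, labels)
print(len(train[0]), len(test[0]))  # 80 20
```

Recording the version tag alongside each training run is what makes an experiment back-traceable to the exact data it used.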
• Model training: After procuring the required data for ML model training in the
previous step, this step will enable model training; it has modular scripts or code
that perform all the traditional steps in ML, such as data preprocessing, feature
engineering, and feature scaling before training or retraining any model. Following
this, the ML model is trained while performing hyperparameter tuning to fit the
model to the dataset (training set). Hyperparameter tuning can be done manually, but
efficient and automatic solutions such as Grid Search or Random Search exist. As a
result, all important steps of ML model training are executed, with an ML model as
the output of this step.
Use case implementation
In this step, we implement all the important steps to train the image classification
model. The goal is to train an ML model to classify cats and dogs. For this case, we
train a convolutional neural network (CNN – https://towardsdatascience.com/wtf-is-image-classification-8e78a8235acb)
for the image classification service. The following steps are implemented: data
preprocessing, feature engineering, and feature scaling before training, followed by
training the model with hyperparameter tuning. As a result, we have a CNN model
to classify cats and dogs with 97% accuracy.
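The training step (preprocessing, feature scaling, then fitting with hyperparameter tuning) follows a common pattern regardless of model type. The use case itself trains a CNN; as a lightweight sketch of the same pattern under that caveat, here is a scikit-learn pipeline tuned with Grid Search on synthetic stand-in data:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in data; in the use case this would be preprocessed image features.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Preprocessing (feature scaling) and the model live in one pipeline, so
# hyperparameter tuning sees the exact transformation used at inference time.
pipe = Pipeline([("scale", StandardScaler()), ("clf", SVC())])
grid = GridSearchCV(pipe, param_grid={"clf__C": [0.1, 1, 10]}, cv=3)
grid.fit(X, y)
print(grid.best_params_["clf__C"], round(grid.best_score_, 3))
```

Bundling preprocessing into the pipeline is the design choice that keeps the trained artifact reproducible: the same object can be serialized in the packaging step later.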
• Model testing: In this step, we evaluate the trained model's performance on a
separate set of data points, named test data (which was split and versioned in the
data ingestion step). The inference of the trained model is evaluated according to
selected metrics as per the use case. The output of this step is a report on the trained
model's performance.
Use case implementation
We test the trained model on the test data (split earlier in the data ingestion step)
to evaluate its performance. In this case, we look at the precision and recall scores
to validate the model's performance in classifying cats and dogs, assessing false
positives and true positives to get a realistic understanding of the model's
performance. If and when we are satisfied with the results, we can proceed to the
next step; otherwise, we reiterate the previous steps to get a decently performing
model for the pet park image classification service.
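The precision and recall checks described above can be computed as follows; the label vectors are hypothetical stand-ins for the cats-versus-dogs test results, and scikit-learn's metric functions are assumed:

```python
from sklearn.metrics import precision_score, recall_score

# Hypothetical test results: 1 = dog, 0 = cat
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

precision = precision_score(y_true, y_pred)  # TP / (TP + FP)
recall = recall_score(y_true, y_pred)        # TP / (TP + FN)
print(precision, recall)  # 0.75 0.75
```

The report these metrics feed into is the output of the model testing step, and it is what the quality gate in the next steps is based on.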
• Model packaging: After the trained model has been tested in the previous step, the
model can be serialized into a file or containerized (using Docker) to be exported to
the production environment.
Use case implementation
The model we trained and tested in the previous steps is serialized to an ONNX file
and is ready to be deployed in the production environment.
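The use case exports the model to ONNX, typically via a converter such as skl2onnx (for scikit-learn models) or tf2onnx (for TensorFlow models). As a dependency-light sketch of the same serialize-then-reload contract, here is the packaging step with Python's built-in pickle; the toy LogisticRegression model is an assumption for illustration only:

```python
import os
import pickle
import tempfile
from sklearn.linear_model import LogisticRegression

# A toy trained model standing in for the CNN from the previous step.
model = LogisticRegression().fit([[0], [1], [2], [3]], [0, 0, 1, 1])

# Serialize the trained model to a file that the deploy stage can pick up.
path = os.path.join(tempfile.mkdtemp(), "model.pkl")
with open(path, "wb") as f:
    pickle.dump(model, f)

# Round-trip check: the deserialized model must predict identically.
with open(path, "rb") as f:
    restored = pickle.load(f)
print(restored.predict([[0], [3]]))  # [0 1]
```

Whatever the format (pickle, ONNX, or a Docker image), the contract is the same: the artifact written here is exactly what gets registered and later loaded for inference, so a round-trip check before registering is cheap insurance.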
• Model registering: In this step, the model that was serialized or containerized in
the previous step is registered and stored in the model registry. A registered model
is a logical collection or package of one or more files that assemble, represent, and
execute your ML model. For instance, a classification model can be composed of a
vectorizer, model weights, and serialized model files; all of these files can be
registered as one single model. After registering, the model (all files or a single file)
can be downloaded and deployed as needed.
Use case implementation
The serialized model in the previous step is registered on the model registry and is
available for quick deployment into the pet park production environment.
By implementing the preceding steps, we successfully execute the ML pipeline
designed for our use case. As a result, we have trained models on the model registry
ready to be deployed in the production setup. Next, we will look into the workings
of the deployment pipeline.
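To make the register-and-retrieve contract concrete, here is a toy file-based model registry. It is a sketch only: real MLOps setups use a managed registry such as the MLflow Model Registry or the Azure ML model store, and every name below is hypothetical. The point it illustrates is that a registry stores versioned model files together with metadata, so any deployed version can be traced and re-fetched.

```python
import json
import os
import pickle
import tempfile
import time

class ModelRegistry:
    """Toy file-based model registry: versioned model files plus metadata."""

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def register(self, name, model, metadata=None):
        """Serialize the model, assign the next version number, update the index."""
        versions = self._load_index(name)
        version = len(versions) + 1
        path = os.path.join(self.root, f"{name}-v{version}.pkl")
        with open(path, "wb") as f:
            pickle.dump(model, f)
        versions.append({"version": version, "path": path,
                         "metadata": metadata or {}, "registered_at": time.time()})
        self._save_index(name, versions)
        return version

    def load(self, name, version=None):
        """Load a specific version, or the latest if none is given."""
        versions = self._load_index(name)
        entry = versions[-1] if version is None else versions[version - 1]
        with open(entry["path"], "rb") as f:
            return pickle.load(f)

    def _index_path(self, name):
        return os.path.join(self.root, f"{name}.json")

    def _load_index(self, name):
        if not os.path.exists(self._index_path(name)):
            return []
        with open(self._index_path(name)) as f:
            return json.load(f)

    def _save_index(self, name, versions):
        with open(self._index_path(name), "w") as f:
            json.dump(versions, f)

registry = ModelRegistry(tempfile.mkdtemp())
v = registry.register("cats-vs-dogs", {"weights": [0.1, 0.2]},
                      metadata={"format": "toy", "accuracy": 0.97})
print(v, registry.load("cats-vs-dogs"))  # 1 {'weights': [0.1, 0.2]}
```

Keeping the metadata (for example, test accuracy and data version) alongside the file is what allows a later audit to link a deployed model back to the exact experiment that produced it.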
Deploy
The deploy module enables operationalizing the ML models we developed in the previous
module (build). In this module, we test our model performance and behavior in a
production or production-like (test) environment to ensure the robustness and scalability
of the ML model for production use. Figure 1.12 depicts the deploy pipeline, which has
two components – production testing and production release – and the deployment
pipeline is enabled by streamlined CI/CD pipelines connecting the development to
production environments:
It works from left to right. Let's look at the functionality of each step in detail:
• Production release: Previously tested and approved models are deployed in the
production environment for model inference to generate business or operational
value. This production release is deployed to the production environment enabled
by CI/CD pipelines.
Use case implementation
We deploy a previously tested and approved model (by a quality assurance expert)
as an API service on a computer connected to CCTV in the pet park (production
setup). This deployed model performs ML inference on the incoming video data
from the CCTV camera in the pet park to classify cats or dogs in real time.
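Serving the model as an API, as in the production release above, boils down to wrapping inference in an HTTP endpoint. Chapter 8 builds this properly with FastAPI and Docker; the sketch below uses only Python's standard library, and the classify stand-in (keyed on a made-up brightness field) is entirely hypothetical, standing in for the CNN inference call:

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

def classify(frame_summary):
    """Hypothetical stand-in for the CNN: 'predicts' from a brightness value."""
    return "dog" if frame_summary.get("brightness", 0) > 0.5 else "cat"

class InferenceHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON payload, run inference, return the label as JSON.
        body = self.rfile.read(int(self.headers["Content-Length"]))
        prediction = classify(json.loads(body))
        payload = json.dumps({"label": prediction}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):  # keep the sketch quiet
        pass

def serve(port=0):
    """Start the inference service on a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), InferenceHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

A caller (here, the computer processing CCTV frames) would POST a frame summary and read back the predicted label, which is the same request/response contract a production FastAPI service would expose.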
Monitor
The monitor module works in sync with the deploy module. Using explainable
monitoring (discussed later in detail, in Chapter 11, Key Principles for Monitoring Your
ML System), we can monitor, analyze, and govern the deployed ML application (ML
model and application). Firstly, we can monitor the performance of the ML model (using
pre-defined metrics) and the deployed application (using telemetry data). Secondly, model
performance can be analyzed using a pre-defined explainability framework, and lastly,
the ML application can be governed using alerts and actions based on the model's quality
assurance and control. This ensures a robust monitoring mechanism for the production
system:
In real time, we will monitor three things for the deployed API service on the park's
computer: data integrity, model drift, and application performance. Metrics such as
accuracy, F1 score, precision, and recall are tracked to assess data integrity and
model drift. We monitor application performance by tracking the telemetry data of
the production system (the on-premises computer in the park) running the deployed
ML model to ensure its proper functioning. Telemetry data is monitored to foresee
any anomalies or potential failures and fix them in advance; it is logged and can be
used to assess production system performance over time to check the system's health
and longevity.
• Analyze: It is critical to analyze the model performance of ML models deployed in
production systems to ensure optimal performance and governance in correlation
to business decisions or impact. We use model explainability techniques to measure
the model performance in real time. Using this, we evaluate important aspects such
as model fairness, trust, bias, transparency, and error analysis with the intention of
improving the model in correlation to business.
Over time, the statistical properties of the target variable we are trying to predict
can change in unforeseen ways. This change is called "model drift," for example, in
a case where we have deployed a recommender system model to suggest suitable
items for users. User behavior may change due to unforeseeable trends that could
not be observed in historical data that was used for training the model. It is essential
to consider such unforeseen factors to ensure deployed models provide the best
and most relevant business value. When model drift is observed, then any of these
actions should be performed:
a) The product owner or the quality assurance expert needs to be alerted.
b) The model needs to be switched or updated.
c) Re-training the pipeline should be triggered to re-train and update the model as
per the latest data or needs.
Use case implementation
We monitor the deployed model's performance in the production system (a
computer connected to the CCTV in the pet park). We will analyze the accuracy,
precision, and recall scores for the model periodically (once a day) to ensure the
model's performance does not deteriorate below the threshold. When the model
performance deteriorates below the threshold, we initiate system governing
mechanisms (for example, a trigger to retrain the model).
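The threshold-based governing mechanism described above can be sketched as a small monitor. This is an illustrative assumption, not the book's implementation: it keeps a rolling accuracy window and fires a governance action (an alert plus a retraining trigger) once the window fills and accuracy drops below the threshold:

```python
from collections import deque

class PerformanceMonitor:
    """Tracks rolling accuracy and fires a governance action (alert plus
    retraining trigger) when it drops below a threshold."""

    def __init__(self, threshold=0.9, window=100):
        self.threshold = threshold
        self.results = deque(maxlen=window)  # rolling window of correct/incorrect
        self.alerts = []

    def record(self, prediction, ground_truth):
        self.results.append(prediction == ground_truth)
        if len(self.results) == self.results.maxlen and self.accuracy() < self.threshold:
            self.alerts.append(f"accuracy {self.accuracy():.2f} below {self.threshold}")
            return "trigger_retraining"
        return "ok"

    def accuracy(self):
        return sum(self.results) / len(self.results)

monitor = PerformanceMonitor(threshold=0.9, window=10)
# Simulate drift: first correct predictions, then mistakes creep in.
for label in ["cat"] * 8 + ["dog"] * 4:
    action = monitor.record("cat", label)
print(action, len(monitor.alerts))  # trigger_retraining 3
```

In a real system, the returned action would be wired to the CI/CD pipeline (to kick off retraining) and the alerts to the product owner or quality assurance expert, matching actions a), b), and c) above.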
Drivers
These are the key drivers for the MLOps pipeline: data, code, artifacts, middleware, and
infrastructure. Let's look into each of the drivers to get an overview of how they enable the
MLOps pipeline:
• Data: Data can be in multiple forms, such as text, audio, video, and images. In
traditional software applications, data quite often tends to be structured, whereas,
for ML applications, it can be structured or unstructured. To manage data in ML
applications, data is handled in these steps: data acquisition, data annotation,
data cataloging, data preparation, data quality checking, data sampling, and data
augmentation. Each step involves its own life cycle. This makes a whole new set of
processes and tools necessary for ML applications. For efficient functioning of the
ML pipeline, data is segmented and versioned into training data, testing data, and
monitoring data (collected in production, for example, model inputs, outputs, and
telemetry data). These data operations are part of the MLOps pipeline.
• Code: There are three essential modules of code that drive the MLOps pipeline:
training code, testing code, and application code. These scripts or code are executed
using the CI/CD and data pipelines to ensure the robust working of the MLOps
pipeline. The source code management system (for example, using Git or Mercurial)
will enable orchestration and play a vital role in managing and integrating
seamlessly with CI, CD, and data pipelines. All of the code is staged and versioned
in the source code management setup (for example, Git).
• Artifacts: The MLOps pipeline generates artifacts such as data, serialized models, code snippets, system logs, and ML model training and testing metrics information. All of these artifacts are useful for the successful working of the MLOps pipeline, ensuring its traceability and sustainability. These artifacts are managed using middleware services such as the model registry, workspaces, logging services, source code management services, databases, and so on.
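The data and artifact versioning mentioned above can be sketched with a content hash. This is a minimal illustration under made-up data; a real setup would use a model registry or a dedicated data-versioning tool:

```python
# Minimal sketch of content-based dataset versioning: derive a reproducible
# version tag from the raw bytes of a dataset. The CSV contents are made up.
import hashlib

def dataset_version(raw_bytes: bytes) -> str:
    """Return a short, deterministic version tag for the given dataset contents."""
    return hashlib.sha256(raw_bytes).hexdigest()[:12]

train = b"feature,label\n1.0,cat\n2.0,dog\n"
v1 = dataset_version(train)
v2 = dataset_version(train + b"3.0,cat\n")   # appending data changes the version

print(v1 != v2)  # True: any change to the data yields a new version tag
```

Because the tag is derived purely from the contents, the same data always maps to the same version, which is what makes training runs traceable back to the exact data they used.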
Summary
In this chapter, we have learned about the evolution of software development and
infrastructure to facilitate ML. We delved into the concepts of MLOps, followed by getting
acquainted with a generic MLOps workflow that can be implemented in a wide range of
ML solutions across multiple industries.
In the next chapter, you will learn how to characterize any ML problem into an MLOps-
driven solution and start developing it using an MLOps workflow.
2
Characterizing Your Machine Learning Problem
In this chapter, you will get a fundamental understanding of the various types of Machine
Learning (ML) solutions that can be built for production, and will learn to categorize the
relevant operations in line with the business and technological needs of your organization.
You will learn how to curate an implementation roadmap for operationalizing ML solutions,
followed by procuring the necessary tools and infrastructure for any given problem. By the end
of this chapter, you will have a solid understanding of how to architect robust and scalable ML
solutions and procure the required data and tools for implementing these solutions.
ML Operations (MLOps) aims to bridge academia and industry using state-of-the-art
engineering principles, and we will explore different elements from both industry and
academia to get a holistic understanding and awareness of the possibilities. Before beginning
to craft your MLOps solution, it is important to understand the various possibilities, setups,
problems, solutions, and methodologies on offer for solving business-oriented problems. To
achieve this understanding, we're going to cover the following main topics in this chapter:
Without further ado, let's jump in and explore the possibilities ML can enable by taking an
in-depth look into the ML solution development process and examining different types of
ML models to solve business problems.
Types of ML models
As there is a selection of ML and deep learning models that address the same business
problem, it is essential to understand the landscape of ML models in order to make an
efficient algorithm selection. There are around 15 types of ML techniques, which can be grouped into four categories: learning models, hybrid models, statistical models, and Human-In-The-Loop (HITL) models, as shown in the matrix in Figure 2.2 (where each grid square reflects one of these categories). It is worth noting that there are other possible ways of categorizing ML models and none of them is exhaustive; a given categorization will suit some scenarios and not others. Here is our recommended categorization with which to look at ML models:
Learning models
First, we'll take a look at two types of standard learning models, supervised learning and
unsupervised learning:
Supervised learning
Supervised learning models or algorithms are trained on labeled data: in the training data, the correct output for each input is marked or known. The model learns from these labeled examples to predict the outcome for a given input, because you have told the system which output corresponds to which input.
Supervised learning models are very effective on narrow AI cases and well-defined tasks
but can only be harnessed where there is sufficient and comprehensive labeled data. We
can see in Figure 2.3, in the case of supervised learning, that the model has learned to
predict and classify an input.
Consider the example of an image classification model used to classify images of cats and
dogs. A supervised learning model is trained on labeled data consisting of thousands of
correctly labeled images of cats and dogs. The trained model then learns to classify a given
input image as containing a dog or a cat.
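The cats-and-dogs example can be sketched with scikit-learn. Here, synthetic numeric features stand in for image data, and the label encoding 0 = cat, 1 = dog is an assumption for illustration:

```python
# Toy supervised learning sketch: numeric features stand in for image pixels,
# and the labels (0 = cat, 1 = dog) are synthetic, not a real dataset.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_cats = rng.normal(loc=0.0, scale=0.5, size=(50, 2))   # labeled "cat" samples
X_dogs = rng.normal(loc=2.0, scale=0.5, size=(50, 2))   # labeled "dog" samples
X = np.vstack([X_cats, X_dogs])
y = np.array([0] * 50 + [1] * 50)                        # known (labeled) outputs

model = LogisticRegression().fit(X, y)                   # learn input -> label
prediction = model.predict([[2.1, 1.9]])                 # classify a new input
print("dog" if prediction[0] == 1 else "cat")
```

The essential property of supervised learning is visible here: every training sample carries its known output, and the model generalizes that mapping to unseen inputs.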
Unsupervised learning
Unsupervised learning has nothing to do with a machine running around and doing things
without human supervision. Unsupervised learning models or algorithms learn from
unlabeled data. Unsupervised learning can be used to mine insights and identify patterns
from unlabeled data. Unsupervised algorithms are widely used for clustering or anomaly
detection without relying on any labels. These algorithms can be pattern-finding algorithms;
when data is fed to such an algorithm, it will identify patterns and turn those into a recipe
for taking a new data input without a label and applying the correct label to it.
Unsupervised learning is used mainly for analytics, though it can also drive automation. It is generally not recommended to use these algorithms directly in production, as their dynamic nature can change outputs on every training cycle. However, they can be useful for automating certain processes, such as segmenting incoming data or identifying anomalies in real time.
Let's discuss an example of clustering news articles into relevant groups. Assume you have thousands of news articles without any labels and you would like to identify the types or categories of articles. To perform unsupervised learning on these articles, we can feed the articles into the algorithm and let it converge on grouping similar items together (that is, clustering) into four groups. Then, we look at the clusters and discover that similar
articles have been grouped together in categories such as politics, sports, science, and
health. This is a way of mining patterns in the data.
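The clustering walkthrough above can be sketched as follows; the 2-D points stand in for article embeddings (an assumption for illustration), and the cluster names such as politics or sports would be assigned by a human after inspecting each cluster:

```python
# Sketch of the news-article clustering example: no labels are given to the
# algorithm. The 2-D points stand in for article embeddings (synthetic data).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# Four synthetic groups of "articles" around four centers, with no labels.
centers = np.array([[0, 0], [5, 0], [0, 5], [5, 5]])
X = np.vstack([c + rng.normal(scale=0.3, size=(25, 2)) for c in centers])

kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
print(len(set(kmeans.labels_)))  # 4 clusters found; a human then names them
```

Note that the algorithm only discovers the groups; mapping them to meaningful category names (politics, sports, science, health) is a human interpretation step.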
Hybrid models
There have been rapid developments in ML by combining conventional methods to
develop hybrid models to solve diverse business and research problems. Let's look into
some hybrid models and how they work. Figure 2.4 shows various hybrid models:
Semi-supervised learning
Semi-supervised learning is a hybrid of supervised and unsupervised learning, used in cases where only a few samples are labeled and a large number of samples are not labeled. Semi-supervised learning enables efficient use of all the available data, including the unlabeled portion. For example, a text document classifier is a typical example
of a semi-supervised learning program. It will be very difficult to locate a large number
of labeled text documents in this case, so semi-supervised learning is ideal. This is due to
the fact that making someone read through entire text documents just to assign a basic
classification is inefficient. As a result, semi-supervised learning enables the algorithm to
learn from a limited number of labeled text documents while classifying the large number
of unlabeled text documents present in the training data.
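A minimal sketch of this idea, using scikit-learn's SelfTrainingClassifier on synthetic numeric data (the value -1 marks unlabeled samples, as that estimator expects; the data and class meanings are made up):

```python
# Semi-supervised learning sketch: only a handful of samples are labeled,
# the rest (marked -1) are unlabeled, and the model learns from both.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 0.4, (50, 2)), rng.normal(3, 0.4, (50, 2))])
y = np.full(100, -1)          # most samples are unlabeled (-1)
y[:3] = 0                     # three labeled samples of class 0
y[50:53] = 1                  # three labeled samples of class 1

model = SelfTrainingClassifier(LogisticRegression()).fit(X, y)
print(model.predict([[3.1, 2.9]])[0])  # learned from labeled + unlabeled data
```

Self-training iteratively labels the most confidently predicted unlabeled samples and retrains, which is exactly the "learn from a limited number of labeled documents while classifying the rest" behavior described above.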
Self-supervised learning
Self-supervised learning problems are unsupervised learning problems where data is not
labeled; these problems are translated into supervised learning problems in order to apply
algorithms for supervised learning to solve them sustainably. Usually, self-supervised
algorithms are used to solve an alternate task in which they supervise themselves to solve
the problem or generate an output. One example of self-supervised learning is Generative
Adversarial Networks (GANs); these are commonly used to generate synthetic data
by training on labeled and/or unlabeled data. With proper training, GAN models can
generate a relevant output in a self-supervised manner. For example, a GAN could
generate a human face based on a text description input, such as gender: male, age: 30,
color: brown, and so on.
Multi-instance learning
Multi-instance learning is a supervised learning problem in which labels are attached not to individual data samples but to bags (collections) of samples. Compared to typical supervised learning, where labeling is done for each data sample (such as news articles individually labeled as politics, science, or sports), in multi-instance learning a whole collection of samples is labeled together. Supervised learning algorithms can then be applied to make predictions for such collectively labeled data.
Multitask learning
Multitask learning is an incarnation of supervised learning that involves training a model
on one dataset and using that model to solve multiple tasks or problems. For example,
for natural language processing, we use word embeddings or Bidirectional Encoder
Representations from Transformers (BERT) embeddings models, which are trained on
one large corpus of data. (BERT is a pre-trained model, trained on a large text corpus.
The model has a deep understanding of how a given human language works.) And these
models can be used to solve many supervised learning tasks such as text classification,
keyword extraction, sentiment analysis, and more.
Reinforcement learning
Reinforcement learning is a type of learning in which an agent, such as a robot system,
learns to operate in a defined environment to perform sequential decision-making tasks or
achieve a pre-defined goal. Simultaneously, the agent learns based on continuously evaluated
feedback and rewards from the environment. Both feedback and rewards are used to shape
the learning of the agent, as shown in Figure 2.5. An example is Google's AlphaGo: after 40 days of self-training using feedback and rewards, it was able to beat the world's best human Go player:
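The feedback-and-reward loop can be sketched with tabular Q-learning on a made-up toy environment (a five-position corridor with a goal at one end — far simpler than Go, but the same loop of action, reward, and value update):

```python
# Tabular Q-learning sketch: the agent starts at position 0 and is rewarded
# only for reaching position 4. Environment and parameters are illustrative.
import random

random.seed(0)
n_states = 5                                 # positions 0..4; 4 is the goal
Q = [[0.0, 0.0] for _ in range(n_states)]    # Q[state][action]; 0=left, 1=right
alpha, gamma, epsilon = 0.5, 0.9, 0.2

for _ in range(500):                         # episodes of interaction
    state = 0
    while state != 4:
        if random.random() < epsilon:        # explore
            action = random.randrange(2)
        else:                                # exploit current knowledge
            action = 0 if Q[state][0] > Q[state][1] else 1
        next_state = max(0, state - 1) if action == 0 else min(4, state + 1)
        reward = 1.0 if next_state == 4 else 0.0   # feedback from environment
        Q[state][action] += alpha * (
            reward + gamma * max(Q[next_state]) - Q[state][action]
        )
        state = next_state

policy = [0 if q[0] > q[1] else 1 for q in Q[:4]]
print(policy)  # the agent learns to always move right toward the goal
```

The agent is never told the correct action; it discovers the policy purely from the rewards the environment returns, which is the defining trait of reinforcement learning.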
Ensemble learning
Ensemble learning is a hybrid approach in which two or more models are trained on the same data. Each model makes its own prediction, and the outputs are combined (for example, by averaging or voting) to determine the final outcome or prediction. An example of this is the random forest algorithm, an ensemble learning method for classification or regression tasks: it builds several decision trees during training and outputs a prediction by averaging the predictions of all the decision trees.
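The random forest example can be sketched as follows on synthetic data; the number of trees is an arbitrary choice for illustration:

```python
# Ensemble learning sketch: a random forest trains many decision trees on the
# same data and combines their votes into one prediction. Data is synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 0.5, (60, 2)), rng.normal(3, 0.5, (60, 2))])
y = np.array([0] * 60 + [1] * 60)

forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print(len(forest.estimators_))          # 50 individual decision trees
print(forest.predict([[2.9, 3.2]])[0])  # combined (majority-vote) prediction
```

Each tree in `forest.estimators_` is a full model in its own right; the ensemble's prediction aggregates them, which typically reduces variance compared to any single tree.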
Transfer learning
We humans have an innate ability to transfer knowledge to and from one another. This
same principle is translated to ML, where a model is trained to perform a task and it is
transferred to another model as a starting point for training or fine-tuning for performing
another task. This type of learning is popular in deep learning, where pre-trained models
are used to solve computer vision or natural language processing problems by fine-tuning or training from a pre-trained model. Learning from pre-trained models gives a huge jumpstart, as models don't need to be trained from scratch, which saves large amounts of training data and compute. For example, we can train a sentiment classifier model using training data containing only a few labeled samples. This is possible with transfer learning using a pre-trained BERT model (which is trained on a large text corpus), transferring the learning from one model to another.
Federated learning
Federated learning is a way of performing ML in a collaborative fashion (a synergy between cloud and edge). The training process is distributed across multiple devices, each of which stores only its own local sample of the data. To maintain data privacy and security, data is neither exchanged nor transferred between devices or the cloud. Instead of sharing data, locally
trained models are shared to learn from each other to train global models. Let's discuss
an example of federated learning in hospitals (as shown in Figure 2.6) where patient data
is confidential and cannot be shared with third parties. In this case, ML training is done
locally in the hospitals (at the edge) and global models are trained centrally (on the cloud)
without sharing the data. Models trained locally are fine-tuned to produce global models.
Instead of ingesting data in the central ML pipeline, locally trained models are ingested.
Global models learn by tuning their parameters based on the local models, aggregating the learning of the local models to converge on optimal performance:
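The hospital example can be sketched with federated averaging (FedAvg) in plain NumPy; a linear model stands in for a real ML model, and the site count, data, and hyperparameters are all made up for illustration:

```python
# Federated averaging (FedAvg) sketch: each "hospital" trains on its own
# private data, and only model weights travel; the data never leaves the site.
import numpy as np

def local_train(weights, X, y, lr=0.1, steps=100):
    """Gradient-descent steps on local data only (runs at the edge)."""
    w = weights.copy()
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

rng = np.random.default_rng(4)
true_w = np.array([2.0, -1.0])
hospitals = []                              # each site keeps its own data
for _ in range(3):
    X = rng.normal(size=(50, 2))
    hospitals.append((X, X @ true_w))

global_w = np.zeros(2)
for _ in range(5):                          # federated rounds
    local_ws = [local_train(global_w, X, y) for X, y in hospitals]
    global_w = np.mean(local_ws, axis=0)    # aggregate models, not data

print(np.round(global_w, 2))                # close to the true weights [2, -1]
```

The key point is in the aggregation line: the central server only ever sees weight vectors, never the patients' records, yet the global model still converges.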
Statistical models
In some cases, statistical models are efficient at making decisions, so it is vital to know where they can be applied to get the best value. There are three types of statistical models: inductive learning, deductive learning, and transductive learning.
Figure 2.7 shows the relationship between these types of statistical models:
HITL models
There are two types of HITL models: human-centered reinforcement learning models
and active learning models. In these models, human-machine collaboration enables
the algorithm to mimic human-like behaviors and outcomes. A key driver for these ML
solutions is the human in the loop (hence HITL). Humans validate, label, and retrain the
models to maintain the accuracy of the model:
Based on the feedback received from the task environment and the human expert, the agent adjusts its behavior and actions. Human-centered reinforcement learning is highly efficient in environments where the agent has to learn or mimic human behavior. To learn more, read
the paper Human-Centered Reinforcement Learning: A Survey (https://ieeexplore.
ieee.org/abstract/document/8708686).
Active learning is a method whereby the trained model can query the HITL (the human user) during the inference process to resolve uncertainty in the learning process. For example, this could be a question-answering chatbot asking the human user for validation via yes or no questions.
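Uncertainty sampling, a common form of the querying described above, can be sketched as follows on synthetic data; in practice the human in the loop, not the `y_true` array used here, would supply the requested label:

```python
# Active learning sketch (uncertainty sampling): the model asks the human to
# label only the sample it is least certain about. All data is synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(0, 1.0, (100, 2)), rng.normal(2, 1.0, (100, 2))])
y_true = np.array([0] * 100 + [1] * 100)   # stands in for the human oracle

labeled = list(range(5)) + list(range(100, 105))   # small initial labeled set
model = LogisticRegression().fit(X[labeled], y_true[labeled])

proba = model.predict_proba(X)[:, 1]
uncertainty = np.abs(proba - 0.5)          # 0 means "most uncertain"
query = int(np.argmin(uncertainty))        # sample to send to the human
print(f"ask the human to label sample {query}")

labeled.append(query)                      # human answers; retrain with it
model.fit(X[labeled], y_true[labeled])
```

By spending human labeling effort only on the most ambiguous samples, the model improves with far fewer labels than labeling the whole dataset up front.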
These are the types of ML solutions possible to build for production to solve problems
in the real world. Now that you are aware of the possibilities for crafting ML solutions,
as the next step, it is critical to categorize your MLOps in line with your business and
technological needs. It's important for you to be able to identify the right requirements,
tools, methodology, and infrastructure needed to support your business and MLOps,
hence we will look into structuring MLOps in the next section.
MLOps can be categorized into small data ops, big data ops, large-scale MLOps,
and hybrid MLOps (this categorization is based on the author's experience and is a
recommended way to approach MLOps for teams and organizations):
• Big data: A quantity of data that cannot fit in the memory of a single typical
computer; for instance, > 1 TB
• Medium-scale data: A quantity of data that can fit in the memory of a single server;
for instance, from 10 GB to 1 TB
• Small-scale data: A quantity of data that easily fits in the memory of a laptop or a
PC; for instance, < 10 GB
With these factors in mind, let's look into the MLOps categories to identify the suitable
process and scale for implementing MLOps for your business problems or organization.
• Running into situations where much of the work is repeated by multiple people, such as crafting data and ML pipelines that do the same job, or training similar types of ML models.
• Working in silos and having minimal understanding of the parallel work of their
teammates. This leads to less transparency.
• Incurring huge costs, or higher costs than expected, due to the mundane and
repeated work.
• Code and data starting to grow independently.
• Artifacts not being audited, and hence not being reproducible.
Any of these can be costly and unsustainable for the team. If you are working in a team or
have a setup like the following, you can categorize your operations as small data ops:
Hybrid MLOps
Hybrid teams operate with experienced data scientists, data engineers, and DevOps
engineers, and these teams make use of ML capabilities to support their business
operations. They are further ahead in implementing MLOps compared to other teams.
They work with big data and open source software tools such as PyTorch, TensorFlow,
and scikit-learn, and hence have a requirement for efficient collaboration. They often
work on well-defined problems by implementing robust and scalable software engineering
practices. However, this team is still prone to challenges such as the following:
• Incurring huge costs, or more than expected, due to mundane and repeated work to
be done by data scientists, such as repeating data cleaning or feature engineering.
• Inefficient model monitoring and retraining mechanisms.
If your team or setup has the following characteristics, you can categorize your operations as hybrid MLOps:
• The team consists of data scientists, data engineers, and DevOps engineers.
• High requirement for efficient and effective collaboration.
• High requirement for big data processing capacity.
• High support requirements for open source technologies such as PyTorch,
TensorFlow, and scikit-learn for any kind of ML, from classical to deep learning,
and from supervised to unsupervised learning.
Large-scale MLOps
Large-scale operations are common in big companies with large or medium-sized
engineering teams consisting of data scientists, data engineers, and DevOps engineers. They
have data operations on the scale of big data, or with various types of data on various scales,
veracity, and velocity. Usually, their teams have multiple legacy systems to manage to support
their business operations. Such teams or organizations are prone to the following:
• Incurring huge costs, or more than expected, due to mundane and repeated work.
• Code and data starting to grow independently.
If your team or setup has the following characteristics, you can categorize your operations as large-scale MLOps:
• The team consists of data scientists, data engineers, and DevOps engineers.
• Large-scale inference and operations.
• Big data operations.
• ML model management on multiple resources.
• Big or multiple teams.
• Multiple use cases and models.
Once you have characterized your MLOps as per your business and technological needs,
a solid implementation roadmap ensures smooth development and implementation of a
robust and scalable MLOps solution for your organization. For example, a fintech start-up
processing 0-1,000 transactions a day would need small-scale data ops compared to a
larger financial institution that needs large-scale MLOps. Such categorization enables a
team or organization to be more efficient and robust.
Phase 1 – ML development
This is the genesis of implementing the MLOps framework for your problem; before beginning the implementation, the problem and solution must be clearly defined. In this phase, we take into account the system requirements to design and
implement a robust and scalable MLOps framework. We begin by selecting the right tools
and infrastructure needed (storage, compute, and so on) to implement the MLOps.
When the infrastructure is set up, we should be provisioned with the necessary workspace
and the development and test environments to execute ML experiments (training and
testing). We train the ML models using the development environment and test the
models for performance and functionality using test data in the development or test
environments, depending on the workflow or requirement. When infrastructure is set up
and the first ML model is trained, tested, serialized, and packaged, phase 1 of your MLOps
framework is set up and validated for robustness. Serializing and containerizing the models is an important step that standardizes them and gets them ready for deployment.
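The serialization step can be sketched with Python's standard pickle module (joblib or a framework-specific format would work equally well); the model and data here are toy stand-ins:

```python
# Model serialization sketch: a trained model is pickled so it can be packaged
# (for example, into a container) and later loaded unchanged for serving.
import pickle
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
model = LogisticRegression().fit(X, y)

blob = pickle.dumps(model)            # serialize to bytes (or write a .pkl file)
restored = pickle.loads(blob)         # what the serving container would load
print(restored.predict([[2.5]])[0])   # same behavior as the original model
```

The round trip through bytes is what guarantees the deployed model is byte-for-byte the one that was tested, which is central to the validation of phase 1.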
Next, we move to implement phase 2.
Phase 3 – Operations
Phase 3 is the core operations phase for the models deployed in phase 2. In this phase, we
monitor the deployed model performance in terms of model drift, bias, or other metrics
(we will delve into these terms and metrics in the coming chapters). Based on the model's
performance, we can enable continual learning via periodic model retraining and enable
alerts and actions. Simultaneously, we monitor logs and telemetry data from the production environment to detect any possible errors and resolve them on the go to ensure the
uninterrupted working of the production system. We also manage data pipelines, the ML
platform, and security on the go. With the successful implementation of this phase, we can
monitor the deployed models and retrain them in a robust, scalable, and secure manner.
In most cases, all three phases need to be implemented for your ML solution, but in some
cases just phases 1 and 2 are enough; for instance, when the ML models make batch
inferences and need not do inference in real time. By achieving these milestones and
implementing all three phases, we have set up a robust and scalable ML life cycle for our
applications systematically and sustainably.
Data
I used to believe that learning about data meant mastering tools such as Python, SQL, and regression. In reality, a tool is only as good as the person using it and their understanding of the context around it. The context and domain matter, from data cleaning to modeling to
interpretation. The best tools in the world won't fix a bad problem definition (or lack of
one). Knowing what problem to solve is a very context-driven and business-dependent
decision. Once you are aware of the problem and context, it enables you to discern the
right training data needed to solve the problem.
Training data is a vital part of ML systems. It plays a vital role in developing ML systems
compared to traditional software systems. As we have seen in the previous chapter, both
code and training data work in parallel to develop and maintain an ML system. It is not
only about the algorithm but also about the data. There are two aspects to ensure you have
the right data for algorithm training, which are to provide both the right quantity and
quality of data:
• Data quantity: Data scientists echo a common argument about their models: performance is poor because the quantity of data they were given was not sufficient, and if they had more data, the performance would have been better – are you familiar with such arguments? In many cases, simply having more data does not help, as quality is equally important. That said, your models can learn more insights and characteristics from your data if you have more samples for each class. For example, if you analyze anomalous financial transactions with many samples in your data, you will discover more types of anomalous transactions, whereas if there is only one anomalous case, ML is not useful.
The data requirements for ML projects should not solely focus on data quantity
itself, but also on the quality, which means the focus should not be on the number
of data samples but rather on the diversity of data samples. However, in some cases,
there are constraints on the quantity of data available to tackle some problems. For
example, let's suppose we work on models to predict the churn rate for an insurance
company. In that case, we can be restricted to considering data from a limited
period or using a limited number of samples due to the availability of data for a
certain time period; for example, 5 years (whereas the insurance company might
have operated for the last 50 years). The goal is to acquire data of the maximum
possible quantity and quality to train the best-performing ML models.
• Data quality: Data quality is an important factor for training ML models; it impacts
model performance. The more comprehensive or higher the quality of the data,
the better the ML model or application will work. Hence the process before the
training is important: cleaning, augmenting, and scaling the data. There are some
important dimensions of data quality to consider, such as consistency, correctness,
and completeness.
Data consistency refers to the correspondence and coherence of the data samples
throughout the dataset. Data correctness is the degree of accuracy and the degree to
which you can rely on the data to truly reflect events. Data correctness is dependent
on how the data was collected. Data completeness is reflected in the coverage of the data for each characteristic: for example, whether the data spans a comprehensive range of possible values to reflect an event.
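These three dimensions can be turned into simple automated checks, sketched here with pandas on a made-up weather table (the column names and valid ranges are assumptions for illustration):

```python
# Simple data quality checks for completeness, correctness, and consistency,
# on a tiny made-up weather table. Column names and ranges are illustrative.
import pandas as pd

df = pd.DataFrame({
    "temperature_c": [21.5, 19.0, None, 18.2],   # one missing value
    "humidity_pct": [40, 55, 61, 130],           # 130 is out of the valid range
})

completeness = 1 - df["temperature_c"].isna().mean()      # share of non-missing
correctness = df["humidity_pct"].between(0, 100).mean()   # share in valid range
consistency = df["temperature_c"].dtype == float          # uniform type/units

print(round(completeness, 2), round(correctness, 2), consistency)
```

Checks like these are cheap to run on every data ingestion, so quality problems surface before training rather than as degraded model performance afterward.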
With an appropriate quantity of good-quality data, you can be sure that your
ML models and applications will perform above the required standards. Hence,
having the right standards is vital for the application to perform and solve business
problems in the most efficient ways.
Requirements
The product or business/tech problem owner plays a key role in facilitating the building
of a robust ML system efficiently by identifying requirements and tailoring them
with regard to the scope of data, collection of data, and required data formats. These
requirements are vital inputs for developers of ML systems, such as data scientists or
ML engineers, to start architecting the solution to address the problem by analyzing
and correlating the given dataset based on the requirements. ML solution requirements
should consist of comprehensive data requirements. Data requirement specifications
consist of information about the quality and quantity of the data. The requirements can be
more extensive; for example, they can contain estimates of the expected predictive performance, expressed in terms of the performance metrics determined during requirements analysis and elicitation.
Tools and infrastructure
There is a surge in services provided by popular cloud service providers such as Microsoft,
AWS, and Google, which are complemented by data processing tools such as Airflow,
Databricks, and Data Lake. These are crafted to enable ML and deep learning, for
which there are great frameworks available such as scikit-learn, Spark MLlib, PyTorch,
TensorFlow, MXNet, and CNTK, among others. Tools and frameworks are many, but
procuring the right tools is a matter of choice and the context of your ML solution and
operations setup. Having the right tools will ensure high efficiency and automation for
your MLOps workflow. The options are many, the sky's the limit, but we have to start
from somewhere to reach the sky. For this reason, we will look to give you some hands-on
experience from here onward. It is always better to learn from real-life problems, and we
will do so by using the real-life business problem described in the next section.
Important note
Problem context:
You work as a data scientist in a small team with three other data scientists
for a cargo shipping company based in the port of Turku in Finland. 90%
of the goods imported into Finland come via cargo ships at the ports across
the country. For cargo shipping, weather conditions and logistics can be
challenging at times at the ports. Rainy conditions can distort operations and
logistics at the ports, which can affect the supply chain operations. Forecasting
rainy conditions in advance gives the possibility to optimize resources such as
human resources, logistics, and transport resources for efficient supply chain
operations at ports. Business-wise, forecasting rainy conditions in advance
enables ports to reduce operational costs by up to ~20% by enabling efficient
planning and scheduling of human resources, logistics, and transport resources
for supply chain operations.
Task:
You as a data scientist are tasked with developing an ML-driven solution to
forecast weather conditions 4 hours in advance at the port of Turku in Finland.
That will enable the port to optimize its resources, thereby enabling cost-
savings of up to 20%. To get started, you are provided with a historic weather
dataset covering a timeline of 10 years from the port of Turku (the dataset can
be accessed in the next chapter). Your task is to build a continuous-learning-
driven ML solution to optimize operations at the port of Turku.
To solve this problem, we will use Microsoft Azure, one of the most widely used cloud
services, and MLflow, an open source ML development tool, to get hands-on with using
resources. This way, we will get experience working on the cloud and with open source
software. Before starting the hands-on implementation in the next chapter, please make
sure to do the following:
Now, with this, you are all set to get hands-on with implementing an MLOps framework
for the preceding business problem.
Summary
In this chapter, we have learned about the ML solution development process, how
to identify a suitable ML solution to a problem, and how to categorize operations to
implement suitable MLOps. We got a glimpse into a generic implementation roadmap and
saw some tips for procuring essentials such as tools, data, and infrastructure to implement
your ML application. Lastly, we went through the business problem to be solved in the
next chapter by implementing an MLOps workflow (discussed in Chapter 1, Fundamentals
of MLOps Workflow) in which we'll get some hands-on experience in MLOps.
In the next chapter, we will go from theory to practical implementation: we will set up MLOps tools on Azure and start coding to clean the data for the business problem, gaining plenty of hands-on experience.
3
Code Meets Data
In this chapter, we'll get started with hands-on MLOps implementation as we learn by solving
a business problem using the MLOps workflow discussed in the previous chapter. We'll also
discuss effective methods of source code management for machine learning (ML), explore
data quality characteristics, and analyze and shape data for an ML solution.
We begin this chapter by categorizing the business problem to curate a best-fit MLOps
solution for it. Following this, we'll set up the required resources and tools to implement
the solution. Ten guiding principles for source code management for ML are discussed to apply clean code practices. We will discuss what constitutes good-quality data for ML
and much more, followed by processing a dataset related to the business problem and
ingesting and versioning it to the ML workspace. Most of the chapter is hands-on and
designed to equip you with a good understanding of and experience with MLOps. For
this, we're going to cover the following main topics in this chapter:
Without further ado, let's jump into demystifying the business problem and implementing
the solution using an MLOps approach.
The first step in solving a problem is to simplify and categorize it using an appropriate
approach. In the previous chapter, we discussed how to categorize a business problem to
solve it using ML. Let's apply those principles to chart a clear roadmap to implementing it.
First, we'll see what type of model we will train to yield the maximum business value.
Secondly, we will identify the right approach for our MLOps implementation.
In order to decide on the type of model to train, we can start by having a glance at
the dataset available on GitHub: https://github.com/PacktPublishing/
EngineeringMLOps.
Here is a snapshot of weather_dataset_raw.csv, in Figure 3.1. The file size is 10.7
MB, the number of rows is 96,453, and the file is in CSV format:
• Model type: In order to save 20% of the operational costs at the port of Turku, a supervised learning model is required to classify whether it will rain or not rain. The data is labeled, and the Weather condition column depicts whether an event recorded rain, snow, or clear conditions. This can be framed or relabeled as rain or no rain and used to perform binary classification. Hence, it is straightforward to solve the business problem with a supervised learning approach.
• MLOps approach: By observing the problem statement and data, here are the facts:
(a) Data: The training data is 10.7 MB. The data size is reasonably small (it cannot
be considered big data).
(b) Operations: We need to train, test, deploy, and monitor an ML model to forecast the
weather at the port of Turku every hour (4 hours in advance) when new data is recorded.
(c) Team size: A small/medium team of data scientists, no DevOps engineers.
Based on the preceding facts, we can categorize the operations into small team ops; there is no
need for big data processing and the team is small and agile. Now we will look at some suitable
tools to implement the operations needed to solve the business problem at hand.
For us to get a holistic understanding of MLOps implementation, we will implement the business problem using two different tools simultaneously:
We use these two tools to see how things work from a pure cloud-based approach and
from an open source / cloud-agnostic approach. All the code and CI/CD operations will
be managed and orchestrated using Azure DevOps, as shown in Figure 3.2:
Installing MLflow
We get started by installing MLflow, which is an open source platform for managing
the ML life cycle, including experimentation, reproducibility, deployment, and a central
model registry.
To install MLflow, go to your terminal and execute the following command:
pip install mlflow
After successful installation, test the installation by executing the following command to start the mlflow tracking UI:
mlflow ui
Upon running the mlflow tracking UI, you will be running a server listening at port
5000 on your machine, and it outputs a message like the following:
You can access and view the mlflow UI at http://localhost:5000. When you have
successfully installed mlflow and run the tracking UI, you are ready to install the next tool.
When the resource group is reviewed and created, you can set up and manage all the
services related to the ML solution in this resource group. The newly created resource
group will be listed in the resource group list.
You can find detailed instructions on creating an Azure Machine Learning instance
here: https://docs.microsoft.com/en-us/azure/machine-learning/
how-to-manage-workspace.
Azure DevOps
All the source code and CI/CD-related operations will be managed and orchestrated using
Azure DevOps. The code we manage in the repository in Azure DevOps will be used to
train, deploy, and monitor ML models enabled by CI/CD pipelines. Let's start by creating
an Azure DevOps subscription:
4. Import a repository from a public GitHub project from this repository: https://
github.com/PacktPublishing/EngineeringMLOps (as shown in Figure
3.5):
Figure 3.5 – Import the GitHub repository into the Azure DevOps project
After importing the GitHub repository, files from the imported repository will be
displayed.
JupyterHub
Lastly, we'll need an interactive data analysis and visualization tool to process data using
our code. For this, we use JupyterHub. This is a common data science tool used widely
by data scientists to process data, visualize data, and train ML models. To install it, follow
two simple steps:
• Modularity: It is better to have modular code than to have one big chunk.
Modularity encourages reusability and facilitates upgrading by replacing the
required components. To avoid needless complexity and repetition, follow this
golden rule:
Two or more ML components should be paired only when one of them uses
the other. If none of them uses each other, then pairing should be avoided.
An ML component that is not tightly paired with its environment can be more
easily modified or replaced than a tightly paired component.
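As a small illustration of this principle (the function names here are hypothetical, not from the book's repository), a preprocessing step split into single-purpose functions is easy to reuse or replace:

```python
import pandas as pd

def encode_rain_label(value):
    """Map a raw weather condition string to a binary rain label."""
    return 1 if value == 'rain' else 0

def add_rain_label(df, column='Weather_conditions'):
    """Return a copy of the DataFrame with a binary rain_label column."""
    out = df.copy()
    out['rain_label'] = out[column].map(encode_rain_label)
    return out

toy = pd.DataFrame({'Weather_conditions': ['rain', 'snow', 'clear', 'rain']})
print(add_rain_label(toy)['rain_label'].tolist())  # [1, 0, 0, 1]
```

Because encode_rain_label depends on nothing else, it can be unit-tested and swapped out (for example, for a multi-class encoder) without touching the rest of the pipeline.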
To learn more about the implementation of unit testing, read this documentation:
https://docs.python.org/3/library/unittest.html.
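For instance, a minimal unittest for a hypothetical temperature-clipping helper (not part of the book's code) might look like this:

```python
import unittest

def clip_temperature(value, low=-60.0, high=60.0):
    """Clamp a temperature reading (in degrees C) to a plausible range."""
    return max(low, min(high, value))

class TestClipTemperature(unittest.TestCase):
    def test_within_range_is_unchanged(self):
        self.assertEqual(clip_temperature(21.5), 21.5)

    def test_out_of_range_is_clamped(self):
        self.assertEqual(clip_temperature(999.0), 60.0)
        self.assertEqual(clip_temperature(-999.0), -60.0)

if __name__ == '__main__':
    unittest.main(argv=['unittest'], exit=False)
```

Small, deterministic tests like these run quickly in a CI pipeline and catch regressions before a model is retrained on bad inputs.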
• Version control (code, data and models): Git is used for version control of code in
ML systems. The purpose of version control is to ensure that all the team members
working on the system have access to up-to-date code and that code is not lost when
there is a hardware failure. One rule of working with Git should be to not break the
master (branch). This means when you have working code in the repository and
you add new features or make improvements, you do this in a feature branch, which
is merged to the master branch when the code is working and reviewed. Branches
should be given a short descriptive name, such as feature/label-encoder. Branch
naming and approval guidelines should be properly communicated and agreed upon
with the team to avoid any complexity and unnecessary conflicts. Code review is
done with pull requests to the repository of the code. Usually, it is best to review
code in small sets, less than 400 lines. In practice, it often means one module or a
submodule at a time.
Versioning of data is essential for ML systems as it helps us to keep track of which
data was used for a particular version of code to generate a model. Versioning
data can enable reproducing models and compliance with business needs and law.
We can always backtrack and see the reason for certain actions taken by the ML
system. Similarly, versioning of models (artifacts) is important for tracking which
version of a model has generated certain results or actions for the ML system. We
can also track or log parameters used for training a certain version of the model.
This way, we can enable end-to-end traceability for model artifacts, data, and code.
Version control for code, data, and models can enhance an ML system with great
transparency and efficiency for the people developing and maintaining it.
• Logging: In production, a logger is useful for monitoring and identifying important information. Print statements are fine for testing and debugging, but they are not ideal for production. A logger captures information, especially system information, warnings, and errors, that is quite useful for monitoring production systems.
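A minimal sketch of this principle using Python's built-in logging module (the logger name and messages are illustrative):

```python
import logging

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s %(levelname)s %(name)s: %(message)s')
logger = logging.getLogger('weather_pipeline')

def load_rows(n_rows):
    """Pretend to load data, reporting progress through the logger."""
    logger.info('Loading %d rows', n_rows)
    if n_rows == 0:
        logger.warning('No rows loaded; downstream steps may fail')
    return list(range(n_rows))

rows = load_rows(5)
logger.info('Loaded %d rows successfully', len(rows))
```

Unlike print statements, log records carry timestamps and severity levels, so production monitoring tools can filter and alert on them.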
• Error handling: Error handling is vital for handling edge cases, especially ones that
are hard to anticipate. It is recommended to catch and handle exceptions even if you
think you don't need to, as prevention is better than cure. Logging combined with
exception handling can be an effective way of dealing with edge cases.
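A small sketch of combining exception handling with logging (the parsing helper is hypothetical):

```python
import logging

logger = logging.getLogger('weather_pipeline')

def safe_to_float(raw):
    """Parse a raw sensor reading; log the failure and return None
    instead of crashing the whole pipeline on one bad value."""
    try:
        return float(raw)
    except (TypeError, ValueError):
        logger.exception('Could not parse sensor reading: %r', raw)
        return None

print(safe_to_float('21.4'))  # 21.4
print(safe_to_float('N/A'))   # None (the error is logged, not raised)
```

Catching the narrow exception types and logging the full traceback preserves the evidence needed to debug the edge case later, while the pipeline keeps running.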
Returns:
bool: The return value. True for success, False
otherwise.
"""
When these five characteristics are maximized, they ensure the highest data quality. With these principles in mind, let's delve into the implementation, where code meets data.
Firstly, let's assess the data and process it to get it ready for ML training. To get started,
clone the repository you imported to your Azure DevOps project (from GitHub):
Next, open your terminal and access the folder of the cloned repository and spin up
the JupyterLab server for data processing. To do so, type the following command in the
terminal:
jupyter lab
To process raw data and get it ready for ML, you will do the compute and data processing
on your local PC. We start by installing and importing the required packages and
importing the raw dataset (as shown in the dataprocessing.ipynb and .py scripts). The following Python instructions should be executed in the notebook:
%matplotlib inline
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
from matplotlib.pyplot import figure
import seaborn as sns
from azureml.core import Workspace, Dataset
#import dataset
df = pd.read_csv('Dataset/weather_dataset_raw.csv')
With this, you have imported the dataset into a pandas DataFrame, df, for further
processing.
Data preprocessing
Raw data cannot be directly passed to the ML model for training purposes. We have to refine
or preprocess the data before training the ML model. To further analyze the imported data,
we will perform a series of steps to preprocess the data into a suitable shape for the ML
training. We start by assessing the quality of the data to check for accuracy, completeness,
reliability, relevance, and timeliness. After this, we calibrate the required data and encode
text into numerical data, which is ideal for ML training. Lastly, we will analyze the
correlations and time series, and filter out irrelevant data for training ML models.
df.describe()
By using the describe function, we can observe descriptive statistics in the output as
follows:
df.dtypes
S_No                        int64
Timestamp                  object
Location                   object
Temperature_C             float64
Apparent_Temperature_C    float64
Humidity                  float64
Wind_speed_kmph           float64
Wind_bearing_degrees        int64
Visibility_km             float64
Pressure_millibars        float64
Weather_conditions         object
dtype: object
Most of the columns are numerical (float and int), as expected. The Timestamp
column is in object format, which needs to be changed to DateTime format:
df['Timestamp'] = pd.to_datetime(df['Timestamp'])
df.isnull().values.any()
Upon checking for null values, if any are discovered, the next step is to calibrate the missing data.
df['Weather_conditions'].fillna(method='ffill', inplace=True, axis=0)
NaN or null values have only been observed in the Weather_conditions column. We
replace the NaN values by using the fillna() method from pandas and the forward
fill (ffill) method. As weather is progressive, it is likely to replicate the previous event
in the data. Hence, we use the forward fill method, which replicates the last observed
non-null value until another non-null value is encountered.
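On a toy Series, the forward fill behaves as follows (the values are illustrative):

```python
import pandas as pd
import numpy as np

s = pd.Series(['rain', np.nan, np.nan, 'snow', np.nan])
filled = s.ffill()  # equivalent to fillna(method='ffill')
print(filled.tolist())  # ['rain', 'rain', 'rain', 'snow', 'snow']
```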
Label encoding
As machines do not understand human language or text, all text has to be converted into numbers. Before that, let's process the text. We have a Weather_conditions column in text with values or labels such as rain, snow, and clear. These values are found using pandas' value_counts() function, as follows:
df['Weather_conditions'].value_counts()
This will replace both snow and clear values with no_rain as both conditions
imply no rain conditions at the port. Now that labels are processed, we can convert the
Weather_conditions column into a machine-readable form or numbers using label
encoding. Label encoding is a method of converting categorical values into a machine-
readable form or numbers by assigning each category a unique value. As we have only
two categories, rain and no_rain, label encoding can be efficient as it converts these
values to 0 and 1. If there are more than two values, one-hot encoding is a good choice
because assigning incremental numbers to categorical variables can give the variables
higher priority or numerical bias during training. One-hot encoding prevents bias or
higher preference for any variable, ensuring neutral treatment of each value of a categorical variable. In our case, as we have only two categorical values, we perform label encoding using scikit-learn as follows:
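On a toy column holding the same two labels, the encoding step works like this (note that LabelEncoder assigns codes in alphabetical order, so no_rain becomes 0 and rain becomes 1):

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

toy = pd.DataFrame({'Weather_conditions': ['rain', 'no_rain', 'no_rain', 'rain']})

le = LabelEncoder()
labels = le.fit_transform(toy['Weather_conditions'])
print(list(le.classes_))   # ['no_rain', 'rain']
print(labels.tolist())     # [1, 0, 0, 1]
```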
Here, we import the LabelEncoder() function, which encodes the Weather_conditions column into 0s and 1s using the fit_transform() method. We then replace the previous textual column with the label-encoded, machine-readable column Weather_condition as follows:
y = pd.DataFrame(data=y, columns=["Weather_condition"])
df = pd.concat([df,y], axis=1)
df.drop(['Weather_conditions'], axis=1, inplace=True)
df['Future_weather_condition'] = df.Current_weather_condition.shift(4, axis=0)
df.dropna(inplace=True)
We will use pandas' dropna() function on the DataFrame to discard or drop null values,
because some rows will have null values due to shifting to a new column.
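The shift-and-drop pattern can be seen on a toy frame: shifting by 4 leaves the first 4 rows without a label, and dropna() removes them (the values here are illustrative):

```python
import pandas as pd

toy = pd.DataFrame({'Current_weather_condition': [1, 0, 0, 1, 1, 0]})
toy['Future_weather_condition'] = toy['Current_weather_condition'].shift(4)
toy_clean = toy.dropna()
print(len(toy), '->', len(toy_clean))  # 6 -> 2
```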
df.corr(method="pearson")
# Visualizing using heatmap
corrMatrix = df.corr()
sns.heatmap(corrMatrix, annot=True)
plt.show()
From the heatmap in Figure 3.8, we can see that the Temperature and Apparent_
Temperature_C coefficient is 0.99. S_No (Serial number) is a continuous value, which
is more or less like an incremental index for a DataFrame and can be discarded or filtered
out as it does not provide great value. Hence both Apparent_Temperature and S_No
are dropped or filtered. Now let's observe our dependent variable, Future_weather_
condition, and its correlation with other independent variables:
Anything between 0.5 and 1.0 has a positive correlation and anything between -0.5 and
-1.0 has a negative correlation. Judging from the graph, there is a positive correlation with
Current_weather_condition, and Temperature_C is also positively correlated
with Future_weather_c.
time = df['Timestamp']
temp = df['Temperature_C']
# plot graph
plt.plot(time, temp)
plt.show()
Figure 3.11 – Workspace credentials (Resource group, Subscription ID, and Workspace name)
When the data is uploaded to the data store, then we will register the dataset to the
workspace and version it as follows:
weather_ds = dataset.register(workspace=workspace,
                              name='weather_ds_portofTurku',
                              description='processed weather data')
Feature Store
A feature store complements central storage by storing important features and making them available for training or inference. A feature store is where you transform raw data into useful features that ML models can use directly to train and to make predictions. Raw data typically comes from various data sources: structured, unstructured, streaming, batch, and real-time. It all needs to be pulled, transformed (using a feature pipeline), and stored somewhere, and that somewhere can be the feature store. The feature store then makes the data available for consumption. Data scientists tend to duplicate work (especially data processing); this can be avoided with a centralized feature store. A feature store allows data scientists to efficiently share and reuse features with other teams, increasing their productivity as they don't have to preprocess features from scratch.
It is good to know the advantages of a feature store as it can be useful to fuel the ML pipeline (especially the data ingestion step); however, it is not suitable for all cases, and whether to use one depends on your use case. For our implementation, we will not use a feature store but will instead source data directly from the central storage where we have preprocessed and registered the datasets we need for training and testing. With ingested and versioned data, you are set to proceed toward building your ML pipeline. The ML pipeline will enable further feature engineering, feature scaling, and the curation of training and testing datasets that will be used to train ML models and tune hyperparameters. The ML pipeline and its functionalities will run on cloud computing resources, not locally on your computer as we did in this chapter. It will be purely cloud-based.
Summary
In this chapter, we have learned how to identify a suitable ML solution to a business
problem and categorize operations to implement suitable MLOps. We set up our tools,
resources, and development environment. 10 principles of source code management
were discussed, followed by data quality characteristics. Congrats! So far, you have
implemented a critical building block of the MLOps workflow – data processing and
registering processed data to the workspace. Lastly, we had a glimpse into the essentials of
the ML pipeline.
In the next chapter, you will do the most exciting part of MLOps: building the ML
pipeline. Let's press on!
4
Machine Learning
Pipelines
In this chapter, we will explore and implement machine learning (ML) pipelines by
going through hands-on examples using the MLOps approach. We will learn more
by solving the business problem that we've been working on in Chapter 3, Code Meets
Data. This theoretical and practical approach to learning will ensure that you will have
comprehensive knowledge of architecting and implementing ML pipelines for your
problems or your company's problems. An ML pipeline has modular scripts or code that
perform all the traditional steps in ML, such as data preprocessing, feature engineering,
and feature scaling before training or retraining any model.
We begin this chapter by ingesting the preprocessed data we worked on in the last chapter, then performing feature engineering and scaling it to get it in shape for ML training. We
will discover the principles of ML pipelines and implement them on the business problem.
Going ahead, we'll look into ML model training, hyperparameter tuning, and the testing
of the trained models. Finally, we'll learn about packaging the models and their needed
artifacts. We'll register the models for further evaluation and will deploy the ML models.
We are going to cover the following main topics in this chapter:
1. Data ingestion
2. Model training
3. Model testing
4. Model packaging
5. Model registering
We will implement all these steps of the pipeline using the Azure ML service (cloud-
based) and MLflow (open source) simultaneously for the sake of a diverse perspective.
Azure ML and MLflow are a power couple for MLOps: they exhibit the features shown in Table 4.1 and are each unique in their capabilities.
Without further ado, let's configure the needed compute resources for the ML pipeline
using the following steps:
1. Go to your ML workspace.
Select a suitable compute option based on your training needs and cost limitations and give it a name. For example, in Figure 4.4, a compute or virtual machine named Standard_D1_v2 is selected for the experiment: it is a CPU machine with 1 core, 3.5 GB of RAM, and 50 GB of disk space. To select the suggested machine configuration or size, check the Select from all options box in the Virtual machine size section. After selecting the desired virtual machine configuration or size, click the Next button to proceed, and you will see the screen shown in Figure 4.4.
4. Provision the compute resource created previously. After naming and creating
the needed compute resource, your compute resource is provisioned, ready, and
running for ML training on the cloud, as shown in Figure 4.5.
Now we'll begin with the hands-on implementation of the ML pipeline. Follow these steps
to implement the ML pipeline:
1. To start the implementation, clone the repository you have imported into the Azure
DevOps project. To clone the repository, click on the Clone button in the upper-
right corner from the Repos menu and then click on the Generate Git Credentials
button. A hash password will be created.
Figure 4.7 – Cloning an Azure DevOps Git repository (Generate Git Credentials)
2. Copy the HTTPS link from the Command Line section to get the Azure DevOps
repository link, like this:
https://xxxxxxxxx@dev.azure.com/xxxxx/Learn_MLOps/_git/
Learn_MLOps
3. Copy the password generated in step 1 and add it to the link from step 2 by placing the password right after the username, separated by a :, before the @ character. Then it is possible to use the following git clone command without getting permission errors:
git clone https://user:password_hash@dev.azure.com/user/
repo_created
4. Once you are running JupyterLab, we will access the terminal to clone the repository onto the Azure compute. To access the terminal, select the Terminal option from the Launcher tab. Another way to access the terminal directly is by using the Terminal link from the Application URI column in the list of compute instances in the Azure ML workspace. Go to the Terminal option of JupyterLab and run the following (as shown in Figure 4.7):
git clone https://xxxxxxxxx@dev.azure.com/xxxxx/Learn_
MLOps/_git/Learn_MLOps
Figure 4.8 – Clone the Azure DevOps Git repository on Azure compute
7. Next, we set up MLflow (for tracking experiments). Use the get_mlflow_tracking_uri() function to get the tracking URI where MLflow experiments and artifacts should be logged (in this case, for the provisioned training compute). Then use the set_tracking_uri() function to connect to that tracking URI (the uniform resource identifier of a specific resource). The tracking URI can point to a remote server, a database connection string, or a local path for logging data in a local directory. In our case, we point the tracking URI to the local path by default (on the provisioned training compute):
uri = workspace.get_mlflow_tracking_uri()
mlflow.set_tracking_uri(uri)
The URI defaults to the mlruns folder where MLflow artifacts and logs will be
saved for experiments.
By setting the tracking URI for your MLflow experiments, you have set the location
for MLflow to save its artifacts and logs in the mlruns folder (on your provisioned
compute). After executing these commands, check for the current path. You will find
the mlruns folder.
1. Using the Workspace() function from the Azure ML SDK, access the data from
the datastore in the ML workspace as follows:
from azureml.core import Workspace, Dataset
subscription_id = 'xxxxxx-xxxxxx-xxxxxxx-xxxxxxx'
resource_group = 'Learn_MLOps'
workspace_name = 'MLOps_WS'
workspace = Workspace(subscription_id, resource_group,
workspace_name)
Note
Insert your own credentials, such as subscription_id, resource_group, and workspace_name, and initiate a workspace object using these credentials.
When these instructions are successfully executed in the JupyterLab, you can run
the remaining blocks of code in the next cells.
2. Import the preprocessed dataset that was prepared in the previous chapter. The preprocessed dataset is imported using the get_by_name() function of the Dataset class from the Azure ML SDK, which retrieves the needed dataset:
# Importing the pre-processed dataset
dataset = Dataset.get_by_name(workspace, name='processed_weather_data_portofTurku')
print(dataset.name, dataset.version)
3. Upon successfully retrieving or mounting the dataset, you can confirm by printing
dataset.name and dataset.version, which should print processed_
weather_data_portofTurku 1 or as per the name you have given the
dataset previously.
4. After retrieving the preprocessed data, it is vital to split it into training and validation sets in order to train the ML model and evaluate it during the training phase and at later stages. Hence, we split it using an 80% (training set) and 20% (test set) ratio as follows:
df_training = df.iloc[:77160]
df_test = df.drop(df_training.index)
df_training.to_csv('Data/training_data.csv',index=False)
df_test.to_csv('Data/test_data.csv',index=False)
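The hard-coded index 77160 is roughly 80% of the 96,453 rows; the same split can be derived from the frame's length, as this sketch shows on a toy frame:

```python
import pandas as pd

toy = pd.DataFrame({'Temperature_C': range(10)})  # stand-in for the weather data

split_idx = int(len(toy) * 0.8)              # first 80% of rows for training
toy_training = toy.iloc[:split_idx]
toy_test = toy.drop(toy_training.index)      # remaining 20% for testing

print(len(toy_training), len(toy_test))  # 8 2
```

Note that slicing by position keeps the chronological order of the rows, which matters for time-series data like this weather dataset.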
5. After successfully splitting the data, these two datasets are stored and registered
to the datastore (connected to the Azure ML workspace) as follows:
datastore = workspace.get_default_datastore()
datastore.upload(src_dir='Data', target_path='data')
training_dataset = Dataset.Tabular.from_delimited_files(
    datastore.path('data/training_data.csv'))
validation_dataset = Dataset.Tabular.from_delimited_files(
    datastore.path('data/test_data.csv'))
training_ds = training_dataset.register(workspace=workspace,
    name='training_dataset',
    description='Dataset to use for ML training')
test_ds = validation_dataset.register(workspace=workspace,
    name='test_dataset',
    description='Dataset for validating ML models')
By using the register() function, we are able to register the training and test datasets,
which can be imported later from the datastore.
Next, we will import the training data and ingest it into the ML pipeline and use the
test dataset later to test the model's performance on unseen data in production or for
model analysis.
The training dataset is now retrieved and will be used to further train the ML models.
The goal is to train classification models to predict whether it will rain or not. Hence,
select the Temperature, Humidity, Wind_speed, Wind_bearing, Visibility,
Pressure, and Current_weather_conditions features to train the binary
classification models to predict weather conditions in the future (4 hours ahead).
1. Before training the ML models, selecting the right features and scaling the data is vital. Therefore, we select the features as follows. The variable X holds the independent variables and the variable y holds the dependent variable (the forecasted weather condition):
X = df[['Temperature_C', 'Humidity', 'Wind_speed_kmph',
'Wind_bearing_degrees', 'Visibility_km', 'Pressure_
millibars', 'Current_weather_condition']].values
y = df['Future_weather_condition'].values
2. Split the training data into the training and testing sets (for training validation
after training) using the train_test_split() function from sklearn. Fixing
the random seed (random_state) is needed to reproduce a training session by
keeping the samples from the previous experiment with the same configuration.
Hence, we will use random_state=1:
# Splitting the Training dataset into Train and Test set
for ML training
from sklearn.model_selection import train_test_split
X_train, X_val, y_train, y_val = train_test_split(X, y,
test_size=0.2, random_state=1)
With an 80% (training data) and 20% (test data) split, the training and test datasets
are now ready for feature scaling and ML model training.
3. For the ML model training to be optimal and efficient, the data needs to be on the same scale. Therefore, we scale the data using StandardScaler() from sklearn to calibrate all the numeric values in the data to the same scale:
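A sketch of the scaling step on toy arrays; the key point is that the scaler is fitted on X_train only and then reused on the validation data:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[10.0, 100.0], [20.0, 200.0], [30.0, 300.0]])
X_val = np.array([[15.0, 150.0]])

sc = StandardScaler()
X_train_scaled = sc.fit_transform(X_train)  # learn mean/std from training data
X_val_scaled = sc.transform(X_val)          # apply the same statistics

print(X_train_scaled.mean(axis=0))  # ~[0. 0.] after standardization
```

Fitting the scaler only on the training split avoids leaking validation-set statistics into training.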
With this step, the numeric values of the training data are standardized using StandardScaler: each feature is transformed to zero mean and unit variance based on the X_train statistics. Now we are ready to train ML models (the fun part)!
Now we have initiated an experiment in both the Azure ML workspace and MLflow.
The following training step will be monitored and logged.
The goal of this run or experiment is to train the best SVM model with the best
parameters. Grid Search is used to test the different parameter combinations and
optimize the convergence of the algorithm to the best performance. Grid Search
takes some time to execute (around 15 minutes on the STANDARD_DS11_V2 (2
cores, 14 GB RAM) compute machine). The result or the output of the Grid Search
suggests the best-performing parameters to be C=1 and the kernel as rbf. Using run.log(), we log the dataset used to train the model (the training set) to keep track of the experiment. This data is logged to the Azure ML workspace and the MLflow experiments.
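A minimal sketch of such a grid search (synthetic data and a small parameter grid stand in for the weather features; the actual grid used in the book may differ):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in for the 7 weather features
X_train, y_train = make_classification(n_samples=200, n_features=7,
                                       random_state=1)

param_grid = {'C': [0.1, 1, 10], 'kernel': ['linear', 'rbf']}
svc_grid = GridSearchCV(SVC(), param_grid, cv=3)
svc_grid.fit(X_train, y_train)  # evaluates every C/kernel combination with CV

print(svc_grid.best_params_)
```

GridSearchCV cross-validates every combination in the grid, which is why the search takes minutes on the full dataset but reliably surfaces the best parameter pair.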
4. Finally, using the best parameters, a new model is trained using C=1 and kernel='rbf' as follows:
svc = SVC(C=svc_grid.get_params(deep=True)['estimator__C'],
          kernel=svc_grid.get_params(deep=True)['estimator__kernel'])
svc.fit(X_train, y_train)

# Logging training parameters to the Azure ML and MLflow experiments
run.log("C", svc_grid.get_params(deep=True)['estimator__C'])
run.log("Kernel", svc_grid.get_params(deep=True)['estimator__kernel'])
After training the SVC classifier, the following output is shown:
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma='auto_deprecated',
    kernel='rbf', max_iter=-1, probability=False, random_state=None,
    shrinking=True, tol=0.001, verbose=False)
With this, we have trained the SVM model! We will now train the Random Forest
classifier model.
1. To start training the Random Forest classifier, initialize the experiment in the Azure ML workspace and the MLflow experiment as follows:
myexperiment = Experiment(workspace, "random-forest-classifier")
mlflow.set_experiment("mlflow-random-forest-classifier")
This is the expected result when finishing training the Random Forest model.
With this, you have successfully finished training the Random Forest model and,
in total, two ML models: the SVM classifier and the Random Forest classifier.
After training, it is vital to test the performance of the model in terms of accuracy
and other metrics to know whether the model is fit enough for the production or
testing environment.
Next, we will test the performance of the trained models on the test data that we split
before training the models.
We will measure these metrics for the trained model on the validation dataset. Let's see
the results for the SVM classifier and the Random Forest classifier.
predicted_svc = svc.predict(X_val)
acc = accuracy_score(y_val, predicted_svc)
fscore = f1_score(y_val, predicted_svc, average="macro")
precision = precision_score(y_val, predicted_svc, average="macro")
The results of the test data metrics are logged in the Azure ML workspace as per the
experiment. You can read these logs later after registering the model (we will register
the model in Registering models and production artifacts).
The model performance metrics on the test data samples are logged to the Azure ML workspace and MLflow experiments using the run.log() function.
Model packaging
After the trained model has been tested in the previous step, the model can be serialized
into a file to be exported to the test or the production environment. Serialized files
come with compatibility challenges, such as model interoperability, if not done right.
Model interoperability is a challenge, especially when models are trained using different
frameworks. For example, if model 1 is trained using sklearn and model 2 is trained
using TensorFlow, then model 1 cannot be imported or exported using TensorFlow for
further model fine-tuning or model inference.
To avoid this problem, ONNX offers an open standard for model interoperability. ONNX
stands for Open Neural Network Exchange. It provides a serialization standard for
importing and exporting models. We will use the ONNX format to serialize the models
to avoid compatibility and interoperability issues.
Using ONNX, the trained model is serialized using the skl2onnx library. The model is
serialized as the file svc.onnx for further exporting and importing of the model into
test and production environments:
The output of this code is a serialized svc.onnx file. Similarly, using ONNX, we will
convert the Random Forest model into a serialized file named rf.onnx for further
exporting and importing of the model into test and production environments:
The output of this code is a serialized rf.onnx file. Next, we will register these serialized
models to the model registry.
Let's register the models we serialized in the previous section by using the Model.register() function from the Azure ML SDK. By using this function, a serialized ONNX file is registered to the workspace for further use and deployment to the test and production environments. Let's register the serialized SVM classifier model (svc.onnx):
print('Name:', model.name)
print('Version:', model.version)
The model is registered by naming and tagging the model as per the need. We can confirm
the successful registering of the model by checking the registered model name and
version. The output will reflect the model name you used when registering (for example,
support-vector-classifier) and will show the model version as 1. Likewise, let's
register the serialized Random Forest classifier model (rf.onnx):
print('Name:', model.name)
print('Version:', model.version)
After successfully registering the model, the output of the print function will reflect the model name you used while registering (random-forest-classifier) and will show the model version as 1. Lastly, we will register production artifacts for inference. You can now see both models in the Models section of the Azure ML workspace, as shown in Figure 4.8:
Next, we serialize the scaler object (sc) used for data preprocessing with pickle:
with open('scaler.pkl', 'wb') as scaler_pkl:
    pickle.dump(sc, scaler_pkl)
The output of the code will save a serialized pickle file for the scaler with the filename
scaler.pkl. Next, we will register this file to the model registry to later download and
deploy together with our models for inference. The scaler is registered using the model
.register() function as follows:
scaler = Model.register(workspace=ws, model_path="scaler.pkl",
                        model_name="scaler")
print('Name:', scaler.name)
print('Version:', scaler.version)
Upon saving and registering the scaler object, the registered object can be found in the Azure ML workspace. Likewise, registered models can be tracked, as shown in Figure 4.8:
Congratulations! Both the SVM classifier and Random Forest classifier, along with the
serialized scaler, are registered in the model registry. These models can be downloaded
and deployed later. This brings us to the successful implementation of the ML pipeline!
Summary
In this chapter, we went through the theory of ML pipelines and practiced them by
building ML pipelines for a business problem. We set up tools, resources, and the
development environment for training these ML models. We started with the data
ingestion step, followed by the model training step, testing step, and packaging step, and
finally, we completed the registering step. Congrats! So far, you have implemented
a critical building block of the MLOps workflow.
In the next chapter, we will look into evaluating and packaging production models.
5
Model Evaluation
and Packaging
In this chapter, we will learn in detail about ML model evaluation and interpretability
metrics. This will enable us to have a comprehensive understanding of the performance
of ML models after training them. We will also learn how to package the models and
deploy them for further use (such as in production systems). We will study in detail how
we evaluated and packaged the models in the previous chapter and explore new ways of
evaluating and explaining the models to ensure a comprehensive understanding of the
trained models and their potential usability in production systems.
We begin this chapter by learning various ways of measuring, evaluating, and interpreting
the model's performance. We look at multiple ways of testing the models for production
and packaging ML models for production and inference. An in-depth study of the ML
models' evaluation will be carried out as you will be presented with a framework to
assess any kind of ML model and package it for production. Get ready to build a solid
foundation in terms of evaluation and get ML models ready for production. For this, we
are going to cover the following main topics in this chapter:
Cross-validation
Evaluating an ML model is vital to understanding its behaviour, and this can be tricky.
Normally, the dataset is split into two subsets: the training and the test sets. First, the
training set is used to train the model, and then the test set is used to test the model. After
this, the model's performance is evaluated to determine the error using metrics such as
the accuracy percentage of the model on test data.
This methodology is not reliable and comprehensive because accuracy for one test set
can be different from another test set. To avoid this problem, cross-validation provides a
solution by fragmenting or splitting the dataset into folds and ensuring that each fold is
used as a test set at some point, as shown in Figure 5.2:
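As a sketch (with a synthetic dataset standing in for real data), scikit-learn's cross_val_score runs this fold-wise evaluation in one call:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# 5-fold cross-validation: every sample serves in a test fold exactly once.
X, y = make_classification(n_samples=200, n_features=8, random_state=7)
scores = cross_val_score(SVC(), X, y, cv=5)  # one accuracy score per fold
print(scores.round(3), "mean:", scores.mean().round(3))
```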
Precision
When a classifier is trained, precision can be a vital metric in quantifying positive class
predictions made by the classifier that are actually true and belong to the positive class.
Precision quantifies the number of correct positive predictions.
For example, let's say we have trained a classifier to predict cats and dogs from images.
Upon inferring the trained model on the test images, the model is used for predicting/
detecting dogs from images (in other words, dogs being the positive class). Precision, in
this case, quantifies the number of correct dog predictions (positive predictions).
Precision is calculated as the ratio of correctly predicted positive examples to the total
number of predicted positive examples.
Precision = TruePositives / (TruePositives + FalsePositives)
Precision focuses on minimizing false positives. Precision ranges from 0 to 1, and a
high precision relates to a low false positive rate. The higher the precision, the better.
For example, consider an image classifier model that predicts whether a cancer patient
requires chemotherapy
treatment. If the model predicts that a patient should be submitted for chemotherapy
when it is not really necessary, this can be very harmful as the effects of chemotherapy can
be detrimental when not required. This case is a dangerous false positive. A high-precision
score will result in fewer false positives, while having a low-precision score will result
in a high number of false positives. Hence, for the chemotherapy treatment use case, the
prediction model should have a high precision score.
Recall
When a classifier is trained, recall can be used to quantify the positive class predictions
established from the total number of positive examples in the dataset. Recall measures
the number of correct positive predictions made out of the total number of positive
predictions that could have been made. Recall provides evidence of missed positive
predictions, unlike precision, which only tells us the correct positive predictions out of the
total number of positive predictions.
For example, take the same example discussed earlier, where we trained a classifier to
predict cats and dogs from images. Upon inferring the trained model on the test images,
the model is used for predicting/detecting dogs from images (in other words, dogs being
the positive class). Recall, in this case, quantifies the number of missed dog predictions
(positive predictions).
In this fashion, recall provides an empirical indication of the coverage of the positive class.
Recall = TruePositives / (TruePositives + FalseNegatives)
Recall focuses on minimizing false negatives. High recall relates to a low false negative
rate. The higher the recall, the better it is. For example, a model that analyzes the profile
data from a passenger in an airport tries to predict whether that passenger is a potential
terrorist. In this case, it is more secure to have false positives than false negatives. If the
models predict that an innocent person is a terrorist, this could be checked following a
more in-depth investigation. But if a terrorist passes, a number of lives could be in danger.
In this case, false positives are more tolerable than false negatives, as false positives
can be cleared with the help of an in-depth investigation. Recall should be high to avoid
false negatives. In this case, having a high recall score is prioritized over high precision.
F-score
In a case where we need to avoid both high false positives and high false negatives, f-score
is a useful measure for reaching this state. F-measure provides a way to consolidate both
precision and recall into single metrics that reflect both properties.
Neither precision nor recall portrays the whole story.
We can have the best precision with terrible recall, or, alternatively, terrible precision
with excellent recall. The F-measure expresses both precision and recall. It is measured
according to the following formula:
F-measure = (2 × Precision × Recall) / (Precision + Recall)
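A toy example (with made-up labels, where 1 stands for the positive dog class) shows how precision, recall, and the F-measure relate:

```python
from sklearn.metrics import precision_score, recall_score, f1_score

# Hypothetical labels: 1 = dog (positive class), 0 = cat.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 1, 1, 0, 0]  # one missed dog (FN), one false dog (FP)

# TP = 3, FP = 1, FN = 1 -> precision = 3/4, recall = 3/4, F1 = 0.75
print(precision_score(y_true, y_pred))  # 0.75
print(recall_score(y_true, y_pred))     # 0.75
print(f1_score(y_true, y_pred))         # 0.75
```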
Confusion matrix
The confusion matrix is a metric that reports the performance of classification models
on a set of test data samples for which prediction values are pre-known. It is a metric
in matrix form where a confusion matrix is an N X N matrix, and N is the number of
classes being predicted. For example, let's say we have two classes to predict (binary
classification), then N=2, and, as a result, we will have a 2 X 2 matrix, like the one shown
here in Figure 5.3:
Figure 5.3 is an example of a confusion matrix for binary classification between diabetic
and non diabetic patients. There are 181 test data samples on which predictions are
made to classify patient data samples into diabetic and non diabetic categories. Using a
confusion matrix, you can get critical insights to interpret the model's performance. For
instance, at a glance, you will know how many predictions made are actually true and
how many are false positives. Such insights are invaluable for interpreting the model's
performance in many cases. Here are what these terms mean in the context of the
confusion matrix:
• True positives (TP): These are cases predicted to be yes and are actually yes as per
test data samples.
• True negatives (TN): These are the cases predicted to be no and these cases are
actually no as per test data samples.
• False positives (FP): The model predicted yes, but they are no as per test data
samples. This type of error is known as a Type I error.
• False negatives (FN): The model predicted no, but they are yes as per test data
samples. This type of error is known as a Type II error.
The confusion matrix can provide a big picture of the predictions made on the test data
samples and such insights are significant in terms of interpreting the performance of
the model. The confusion matrix is the de facto error analysis metric for classification
problems, as most other metrics are derived from this one.
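A small sketch (with hypothetical diabetic/non-diabetic labels, not the 181 samples from Figure 5.3) shows how the four cells are obtained:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels: 1 = diabetic, 0 = non-diabetic.
y_true = [1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]

# sklearn lays the 2 x 2 matrix out as [[TN, FP], [FN, TP]].
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("TN:", tn, "FP:", fp, "FN:", fn, "TP:", tp)
```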
AUC-ROC
A different perspective for observing model performance can enable us to interpret
model performance and fine-tune it to derive better results. ROC and AUC curves can
enable such insights. Let's see how the Receiver Operating Characteristic (ROC) curve
can enable us to interpret model performance. The ROC curve is a graph exhibiting the
performance of a classification model at all classification thresholds. The graph uses two
parameters to depict the model's performance: True Positive Rate (TPR = TP / (TP + FN))
and False Positive Rate (FPR = FP / (FP + TN)).
The following diagram shows a typical ROC curve:
The AUC value varies from 0 to 1. If the AUC value is 1, the classifier is able to correctly
distinguish between all the positive and negative class points; if the AUC value is 0, the
classifier's predictions are perfectly inverted, ranking every positive example below every
negative example. An AUC value of 0.5 means the classifier is no better than random
guessing.
AUC helps us to rank predictions according to their accuracy, but it does not give us
absolute values. Hence it is scale-independent. Additionally, AUC is independent of
the classification threshold. The classification threshold chosen does not matter when
using AUC as AUC estimates the quality of the model's predictions irrespective of what
classification threshold is chosen.
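Since AUC is computed from the ranking of the predicted scores, no threshold is needed, as a small sketch with assumed scores shows:

```python
from sklearn.metrics import roc_auc_score

# Hypothetical true labels and predicted probabilities for the positive class.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

# AUC is the probability that a random positive outranks a random negative:
# of the 4 positive/negative pairs, 3 are ranked correctly -> 0.75.
auc = roc_auc_score(y_true, y_score)
print(auc)  # 0.75
```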
MCC
The Matthews correlation coefficient (MCC) takes all four categories of the confusion
matrix (TP, TN, FP, and FN) into account. This measure results in high scores only when a
prediction returns good rates for all these four categories. The MCC score ranges from -1
to +1:
MCC = ((TP × TN) - (FP × FN)) / sqrt((TP + FP) × (TP + FN) × (TN + FP) × (TN + FN))
For example, an MCC score of 0.12 suggests that the classifier is close to random, while
a score of 0.93 suggests that the classifier is good. MCC is a useful metric for helping to
measure the effectiveness of a classifier.
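A hypothetical example (labels chosen so that TP = 3, TN = 3, FP = 1, FN = 1) computes MCC directly with scikit-learn:

```python
from sklearn.metrics import matthews_corrcoef

# Hypothetical labels producing TP = 3, TN = 3, FP = 1, FN = 1.
y_true = [1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]

# MCC = (3*3 - 1*1) / sqrt(4 * 4 * 4 * 4) = 8 / 16 = 0.5
mcc = matthews_corrcoef(y_true, y_pred)
print(mcc)  # 0.5
```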
Purity
Purity is an external evaluation metric for cluster quality. To calculate purity, the clusters
are labeled according to the most common class in the cluster and then the accuracy of
this cluster assignment is measured by calculating the number of correctly assigned data
points and dividing by N (total number of data points clustered). Good clustering has
a purity value close to 1, and bad clustering has a purity value close to 0. Figure 5.5 is a
visual representation of an example of calculating purity, as explained below:
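The calculation can be sketched as follows (the cluster assignments and classes below are illustrative, not the ones from Figure 5.5):

```python
import numpy as np

# Hypothetical cluster assignments and true classes for 9 data points.
clusters = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
classes = np.array([0, 0, 1, 1, 1, 0, 2, 2, 2])

# Label each cluster by its majority class, then count correct assignments.
correct = sum(np.bincount(classes[clusters == c]).max()
              for c in np.unique(clusters))
purity = correct / len(classes)
print(purity)  # 7 correct out of 9 -> ~0.778
```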
Regret
Regret is a commonly used metric for hybrid models such as reinforcement learning
models. At each time step, you calculate the difference between the reward of the optimal
decision and the decision taken by your algorithm. Cumulative regret is then calculated by
summing this up. The minimum regret is 0 with the optimal policy. The smaller the regret,
the better an algorithm has performed.
Regret enables the actions of the agent to be assessed with respect to the best policy for the
optimal performance of the agent as shown in Figure 5.7. The shaded region in red is the
regret:
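The cumulative regret calculation can be sketched with hypothetical per-step rewards (an optimal reward of 1.0 at every step is assumed):

```python
# Reward of the optimal decision versus the agent's decision at each time step.
optimal_rewards = [1.0, 1.0, 1.0, 1.0, 1.0]
agent_rewards = [0.2, 0.5, 0.8, 1.0, 1.0]  # the agent improves over time

# Per-step regret is the gap to the optimum; cumulative regret is their sum.
cumulative_regret = sum(o - a for o, a in zip(optimal_rewards, agent_rewards))
print(round(cumulative_regret, 6))  # 1.5
```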
Model interpretability and explaining why the model is making certain decisions or
predictions can be vital in a number of business problems or industries. Using techniques
discussed earlier, we can interpret the model's performance, but there are still some
gray areas, such as deep learning models, which are black-box models. It is noticeable
in general that these models can be trained to achieve great results or accuracies on test
data, but it is hard to say why. In such scenarios, SHapley Additive exPlanations (SHAP)
can be useful for decoding what is happening with the predicted results and which
features contribute most to a prediction. SHAP was proposed in this paper (at NIPS): http://
papers.nips.cc/paper/7062-a-unified-approach-to-interpreting-
model-predictions.
SHAP works both for classification and regression models. The primary goal of SHAP is
to explain the model output prediction by computing the contribution of each feature.
The SHAP explanation method uses Shapley values to explain the feature importance for
model outputs or predictions. Shapley values are grounded in cooperative game theory,
and they describe how the model output is distributed among the input features, as
shown in Figure 5.8:
There are several SHAP explainer techniques, such as SHAP Tree Explainer, SHAP Deep
Explainer, SHAP Linear Explainer, and SHAP Kernel Explainer. Depending on the use
case, these explainers can provide useful information on model predictions and help us to
understand black-box models. Read more here: https://christophm.github.io/
interpretable-ml-book/shap.html
MIMIC explainer
Mimic explainer is an approach mimicking black-box models by training an interpretable
global surrogate model. These trained global surrogate models are interpretable models
that are trained to approximate the predictions of any black-box model as accurately as
possible. By using the surrogate model, a black-box model can be gauged or interpreted as
follows.
The following steps are implemented to train a surrogate model:
1. To train a surrogate model, start by selecting a dataset, X. This dataset can be the
same as the one used for training the black-box model or it can be another dataset
of similar distributions depending on the use case.
2. Get the predictions of the black-box model for the selected dataset, X.
3. Select an interpretable model type (linear model, decision tree, random forest, and
so on).
4. Using the dataset, X, and its predictions, train the interpretable model.
5. Now you have a trained surrogate model. Kudos!
6. Evaluate how well the surrogate model has reproduced the predictions of the
black-box model, for example, using R-squared or F-score.
7. Get an understanding of black-box model predictions by interpreting the surrogate
model.
The following interpretable models can be used as surrogate models: the Light Gradient
Boosting Machine (LightGBM), linear regression, stochastic gradient descent, random
forest, or decision tree models.
Surrogate models can enable ML solution developers to gauge and understand the black-
box model's performance.
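The steps above can be sketched as follows (the dataset, models, and fidelity metric are illustrative choices, with a decision tree mimicking a random forest):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier

# Steps 1-2: a dataset and the "black-box" model's predictions on it.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
black_box = RandomForestClassifier(random_state=0).fit(X, y)
bb_preds = black_box.predict(X)

# Steps 3-5: train an interpretable surrogate on the black-box predictions.
surrogate = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, bb_preds)

# Step 6: fidelity - how well the surrogate reproduces the black-box outputs.
fidelity = accuracy_score(bb_preds, surrogate.predict(X))
print("surrogate fidelity:", round(fidelity, 3))

# Step 7: interpret the surrogate, e.g. via surrogate.feature_importances_.
```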
Mean
The mean, or average, is the central value of the dataset. It is calculated by summing all the
values and dividing the sum by the number of values:
mean = (x1 + x2 + x3 + ... + xn) / n
Standard deviation
The standard deviation measures the dispersion of the values in the dataset. The lower the
standard deviation, the closer the data points to the mean. A widely spread dataset would
have a higher standard deviation.
Bias
Bias measures the strength (or rigidity) of the mapping function between the independent
(input) and dependent (output) variables. The stronger the assumptions of the model
regarding the functional form of the mapping, the greater the bias.
High bias is helpful when the underlying true (but unknown) model has matching
properties as the assumptions of the mapping function. However, you could get
completely off-track if the underlying model does not exhibit similar properties as the
functional form of the mapping. For example, the assumption that there is a linear
relationship in the variables when in reality it is highly non-linear and it would lead to a
bad fit:
• Low bias: Weak assumptions with regard to the functional form of the mapping of
inputs to outputs
• High bias: Strong assumptions with regard to the functional form of the mapping of
inputs to outputs
The bias is always a positive value. Here is an additional resource for learning more about
bias in ML. This article offers a broader explanation: https://kourentzes.com/
forecasting/2014/12/17/the-bias-coefficient-a-new-metric-for-
forecast-bias/.
Variance
The variance of the model is the degree to which the model's performance changes when
it is fitted on different training data. The impact of the specifics on the model is captured
by the variance.
A high variance model will change a lot with even small changes in the training dataset.
On the other hand, a low variance model wouldn't change much even with large changes
in the training dataset. The variance is always positive.
R-squared
R-squared, also known as the coefficient of determination, measures the variation in the
dependent variable that can be explained by the model. It is calculated as the explained
variation divided by the total variation. In simple terms, R-squared measures how close
the data points are to the fitted regression line.
The value of R-squared lies between 0 and 1. A low R-squared value indicates that most of
the variation in the response variable is not explained by the model, but by other factors
not included in it. In general, you should aim for a higher R-squared value because this
indicates that the model better fits the data.
RMSE
The root mean square error (RMSE) measures the difference between the predicted
values of the model and the observed (true) values. It is computed as the square root of
the mean of the squared prediction errors:
RMSE = sqrt(sum((predicted_i - observed_i)^2) / n)
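These regression statistics can be computed directly (toy values are assumed for the observed and predicted series):

```python
import numpy as np

# Hypothetical observed and predicted values for a regression model.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.0, 7.5, 9.0])

mean = y_true.mean()                             # (3 + 5 + 7 + 9) / 4 = 6.0
std = y_true.std()                               # dispersion around the mean
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))  # root mean square error

ss_res = np.sum((y_true - y_pred) ** 2)  # variation not explained by the model
ss_tot = np.sum((y_true - mean) ** 2)    # total variation
r2 = 1 - ss_res / ss_tot                 # R-squared
print(mean, round(std, 4), round(rmse, 4), round(r2, 4))
```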
There are many options, and you need to choose the right metric for real-world
production scenarios to have well-justified evaluations. Why might a data scientist or
a data science team select one evaluation metric over another, for instance, R-squared
over RMSE for a regression problem? It depends on the use case and the type of data.
Human bias
Just like the human brain, ML systems are subject to cognitive bias. Human cognitive
biases are processes that disrupt decision making and reasoning, leading to errors.
Examples of human bias include stereotyping, selective perception, the bandwagon
effect, priming, confirmation bias, observational selection bias, and the gambler's
fallacy. In many cases, it is vital to avoid such biases so that ML systems make rational
and optimal decisions; if we manage to detect and rectify human bias, ML systems can
be more pragmatic than humans. This will be especially useful in HITL-based systems.
Using bias testing, three types of
human biases can be identified and worked upon to maintain the ML system's decision
making such that it is free from human bias. These three human biases are as follows:
Interaction bias
When an ML system is fed a dataset containing entries of one particular type, an
interaction bias is introduced that prevents the algorithm from recognizing any other
types of entries. This type of bias can be identified in inference testing for trained models.
Methods such as SHAP and PFI can be useful in identifying feature bias.
Latent bias
Latent bias is experienced when multiple examples in the training set have a characteristic
that stands out. Then, the ones without that characteristic fail to be recognized by the
algorithm. For example, recently, the Amazon HR algorithm for selecting people based on
applications for roles within the company showed bias against women, the reason being
latent bias.
Selection bias
Selection bias is introduced to an algorithm when the selection of data for analysis is not
properly randomized. For example, in designing a high-performance face recognition
system, it is vital to include all possible types of facial structures and shapes and from
all ethnic and geographical samples, so as to avoid selection bias. Selection bias can be
identified by methods such as SHAP or PFI to observe model feature bias.
Optimal policy
In the case of human reinforcement learning, the goal of the system is to maximize the
rewards of the action in the current state. In order to maximize the rewards for actions,
optimal policy can be used as a metric to gauge the system. The optimal policy is the
policy where the action that maximizes the reward/return of the current state is chosen.
The optimal policy is the metric or state that is ideal for a system to perform at its best.
In a human reinforcement learning-based system, a human operator or teacher sets the
optimal policy as the goal of the system is to reach human-level performance.
Rate of automation
Automation is the process of automatically producing goods or getting a task done
through the use of robots or algorithms with no direct human assistance.
The level of automation of an ML system can be calculated using the rate of automation
of the total tasks. It is basically the percentage of tasks fully automated by the system, and
these tasks do not require any human assistance. It shows what percentage of tasks are
fully automated out of all the tasks. For example, AlphaGo, by DeepMind, operates with
100% automation, playing on its own to defeat human world champion players.
Risk rate
The probability of an ML model performing errors is known as the error rate. The error
rate is calculated based on the model's performance for production systems. The lower the
error rate, the better it is for an ML system. The goal of a human in the loop is to reduce
the error rate and teach the ML model to function at its most optimal.
Batch testing
Batch testing validates your model by performing testing in an environment that is
different from its training environment. Batch testing is carried out on a set of samples of
data to test model inference using metrics of choice, such as accuracy, RMSE, or f1-score.
Batch testing can be done in various types of computes, for example, in the cloud, or on a
remote server or a test server. The model is usually served as a serialized file, and the file is
loaded as an object and inferred on test data.
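A minimal sketch of batch testing (with a synthetic dataset and a pickled logistic regression standing in for the production model):

```python
import pickle

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Train in one "environment" and serialize the model.
X, y = make_classification(n_samples=300, random_state=1)
X_train, y_train, X_test, y_test = X[:200], y[:200], X[200:], y[200:]
blob = pickle.dumps(LogisticRegression(max_iter=500).fit(X_train, y_train))

# In the test environment: load the serialized file as an object and
# infer on a batch of test samples with a metric of choice.
model = pickle.loads(blob)
batch_accuracy = accuracy_score(y_test, model.predict(X_test))
print("batch accuracy:", round(batch_accuracy, 3))
```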
A/B testing
You will surely have come across A/B testing. It is often used in service design (websites,
mobile apps, and so on) and for assessing marketing campaigns. For instance, it is used to
evaluate whether a specific change in the design or tailoring content to a specific audience
positively affects business metrics such as user engagement, the click-through rate, or the
sales rate. A similar technique is applied in testing ML models using A/B testing. When
models are tested using A/B testing, the test will answer important questions such as the
following:
• Does the new model B work better in production than the current model A?
• Which of the two models' nominees work better in production to drive positive
business metrics?
To evaluate the results of A/B testing, statistical techniques are used based on the business
or operations to determine which model will perform better in production. A/B testing
is usually conducted as follows: real-time or live data is split into two sets, Set A and
Set B. Set A data is routed to the old model, and Set B data is routed to the new model.
To evaluate whether the new model (model B) performs better than the old model
(model A), various statistical techniques can be used to evaluate model performance (for
example, accuracy, precision, recall, F-score, and RMSE), depending on the business use
case or operations. Based on a statistical analysis of model performance in correlation
with business metrics (a positive change in business metrics), a decision is made to
replace the old model with the new one, or to determine which model is better.
A/B testing is performed methodically using statistical hypothesis testing, and this
hypothesis validates two sides of a coin – the null hypothesis and the alternate hypothesis.
The null hypothesis asserts that the new model does not increase the average value of
the monitoring business metrics. The alternate hypothesis asserts that the new model
improves the average value of the monitoring business metrics. Ultimately, A/B testing
is used to evaluate whether the new model drives a significant boost in specific business
metrics. There are various types of A/B testing, depending on business use cases and
operations, for example, the Z-test and the G-test (I recommend learning about these
and other tests). Choosing the right A/B test and metrics to evaluate can be a win-win
for your business and ML operations.
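As a sketch, a one-sided two-proportion z-test (with made-up conversion counts for the two traffic splits) checks whether model B's uplift is statistically significant:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical conversions out of users routed to each model.
conv_a, n_a = 120, 1000  # Set A -> old model A
conv_b, n_b = 150, 1000  # Set B -> new model B

p_a, p_b = conv_a / n_a, conv_b / n_b
p_pool = (conv_a + conv_b) / (n_a + n_b)  # pooled rate under the null hypothesis
se = np.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
z = (p_b - p_a) / se
p_value = 1 - norm.cdf(z)  # one-sided: does B beat A?
print("z:", round(z, 3), "p-value:", round(p_value, 4))
# A p-value below 0.05 would reject the null hypothesis of no improvement.
```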
Testing in CI/CD
Implementing testing as part of CI/CD pipelines can be rewarding in terms of automating
and evaluating (based on set criteria) the model's performance. CI/CD pipelines can be set
up in multiple ways depending on the operations and architecture in place, for instance:
• Upon a successful run of an ML pipeline, CI/CD pipelines can trigger a new model's
A/B test in the staging environment.
• When a new model is trained, it is beneficial to set up a dataset separate from the
test set to measure its performance against suitable metrics, and this step can be
fully automated.
• CI/CD pipelines can periodically trigger ML pipelines at a set time in a day to train
a new model, which uses live or real-time data to train a new model or fine-tune an
existing model.
• CI/CD pipelines can monitor the ML model's performance of the deployed model
in production, and this can be triggered or managed using time-based triggers or
manual triggers (initiated by team members responsible for quality assurance).
• CI/CD pipelines can provision two or more staging environments to perform A/B
testing on unique datasets to perform more diverse and comprehensive testing.
Portability
Packaging ML models into software artifacts enables them to be shipped or transported
from one environment to another. This can be done by shipping a single file, a set of files,
or a container. Either way, we can transport the artifacts and replicate the model in
various setups.
For example, a packaged model can be deployed in a virtual machine or serverless setup.
Inference
ML inference is a process that involves processing real-time data using ML models
to calculate an output, for example, a prediction or numerical score. The purpose of
packaging ML models is to be able to serve the ML models in real time for ML inference.
Effective ML model packaging (for example, a serialized model or container) can facilitate
deployment and serve the model to make predictions and analyze data in real time or in
batches.
Interoperability
ML model interoperability is the ability of two or more models or components to
exchange information and to use exchanged information in order to learn or fine-tune
from each other and perform operations with efficiency. Exchanged information can be
in the form of data or software artifacts or model parameters. Such information enables
models to fine-tune, retrain, or adapt to various environments from the experience of
other software artifacts in order to perform and be efficient. Packaging ML models is the
foundation for enabling ML model interoperability.
Deployment agnosticity
Packaging ML models into software artifacts such as serialized files or containers enables
the models to be shipped and deployed in various runtime environments, such as in a
virtual machine, a container serverless environment, a streaming service, microservices,
or batch services. It opens opportunities for portability and deployment agnosticity using
the same software artifacts that an ML model is packaged in.
Serialized files
Serialization is a vital process for packaging an ML model as it enables model portability,
interoperability, and model inference. Serialization is the method of converting an object
or a data structure (for example, variables, arrays, and tuples) into a storable artefact, for
example, into a file or a memory buffer that can be transported or transmitted (across
computer networks). The main purpose of serialization is to reconstruct the serialized file
into its previous data structure (for example, a serialized file into an ML model variable)
in a different environment. This way, a newly trained ML model can be serialized into
a file and exported into a new environment, where it can be de-serialized back into an ML
model variable or data structure for ML inferencing. A serialized file does not save or
include any of the previously associated methods or implementation. It only saves the data
structure as it is in a storable artefact such as a file.
Here are some popular serialization formats, as shown in Figure 5.1:
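A minimal round trip with pickle (a toy linear model is used for illustration) shows serialization and de-serialization:

```python
import pickle

from sklearn.linear_model import LinearRegression

# Fit a toy model, serialize it to bytes, then reconstruct it.
model = LinearRegression().fit([[0.0], [1.0], [2.0]], [0.0, 1.0, 2.0])
blob = pickle.dumps(model)      # serialize: object -> storable bytes
restored = pickle.loads(blob)   # de-serialize: bytes -> model object

# The restored object behaves like the original model.
print(round(restored.predict([[3.0]])[0], 3))  # 3.0
```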
Packetizing or containerizing
We often encounter diverse environments for production systems. Every environment
possesses different challenges when it comes to deploying ML models, in terms
of compatibility, robustness, and scalability. These challenges can be avoided by
standardizing some processes or modules and containers are a great way to standardize
ML models and software modules.
A container is a standard unit of software made up of code and all its dependencies. It
enables the quick and reliable operation of applications from one computing environment
to another. It enables the software to be environment- and deployment-agnostic.
Containers can be built and managed with Docker, which has become an industry
standard for developing and running containers.
Docker is an open source (https://opensource.com/resources/what-
open-source) tool. It has been developed to make it convenient to build, deploy,
and run applications by using containers. By using containers, a developer can package
an application with its components and modules, such as files, libraries, and other
dependencies, and deploy it as one package. Containers are a reliable way to run
applications using a Linux OS with customized settings. Docker containers are built using
Dockerfiles, which are used to containerize an application. After building a Docker image,
a Docker container is built. A Docker container is an application running with custom
settings as orchestrated by the developer. Figure 5.8 shows the process of building and
running a Docker container from a Dockerfile. A Dockerfile is built into a Docker image,
which is then run as a Docker container:
A Dockerfile, a Docker image, and a Docker container are the foundational components
for building and running containers: a Dockerfile is a text file containing the instructions
to build a Docker image; a Docker image is an immutable package of the application and
its dependencies; and a Docker container is a running instance of a Docker image.
ML models can be served in Docker containers for robustness, scalability, and deployment
agnosticity. In later chapters, we will deploy ML models using Docker for the purpose of
hands-on experience, hence, it is good to have a general understanding of this tool.
Microservices can be served in a REST API format, and this is a popular way to serve
ML models. Python frameworks such as Flask, Django, and FastAPI have become popular
for serving ML models as REST API microservices. To facilitate robust and scalable
system operations, software developers can interact with Dockerized microservices via
a REST API. To orchestrate Docker-based microservice deployments on
Kubernetes-supported infrastructure, Kubeflow is a good option. It is cloud-agnostic
and can also run on-premises or on local machines. Kubeflow is based on Kubernetes
but abstracts away the details and difficulties of Kubernetes, and it is a robust way
of serving a model. This is a tool worth exploring: https://www.kubeflow.org/docs/started/kubeflow-overview/.
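As a hedged sketch of this pattern (not code from this book), the following minimal Flask app wraps a placeholder `predict` function, standing in for a real model, as a REST endpoint:

```python
import json
from flask import Flask, request, jsonify

app = Flask(__name__)

def predict(features):
    # Placeholder for a real model: returns 1 if the first feature is positive
    return 1 if features and features[0] > 0 else 0

@app.route("/score", methods=["POST"])
def score():
    # Parse the JSON payload, run inference, and return the prediction as JSON
    payload = request.get_json(force=True)
    prediction = predict(payload.get("data", []))
    return jsonify({"prediction": prediction})

# To serve locally, one would call: app.run(host="0.0.0.0", port=5000)
```

Such a service can then be containerized with Docker and scaled with a container orchestrator, as described above.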
import pandas as pd
import numpy as np
import warnings
import pickle
from math import sqrt
warnings.filterwarnings('ignore')
from azureml.core.run import Run
from azureml.core.experiment import Experiment
from azureml.core.workspace import Workspace
from azureml.core.model import Model
# Connect to Workspace
ws = Workspace.from_config()
print(ws)
# Load Scaler and model to test
scaler = Model(ws, 'scaler').download(exist_ok=True)
svc_model = Model(ws, 'support-vector-classifier').download(exist_ok=True)
After running this code, you will see new files downloaded in the left panel in the
JupyterLab window.
import onnxruntime as rt
import numpy
sess = rt.InferenceSession("svc.onnx")
input_name = sess.get_inputs()[0].name
label_name = sess.get_outputs()[0].name
ML model inference
To perform ML model inference, scale the test data using the previously fitted
scaler's transform() method (refitting the scaler on test data would leak
information). Then perform inference on the test data with the ONNX session by
calling sess.run(), passing the input data, test_data, in float32 format. Lastly,
print the results of model inference:
With these steps, we have successfully downloaded the serialized model, loaded it into
a variable, and performed inference on a test data sample. The expected result of the
code block is the value 1.
Summary
In this chapter, we have explored the various methods to evaluate and interpret ML
models. We have learned about production testing methods and the importance of
packaging models, why and how to package models, and the various practicalities and
tools for packaging models for ML model inference in production. Lastly, to understand
the workings of packaging and de-packaging serialized models for inference, we
performed the hands-on implementation of ML model inference using serialized models
on test data.
In the next chapter, we will learn more about deploying your ML models. Fasten your
seatbelts and get ready to deploy your models to production!
Section 2:
Deploying Machine
Learning Models
at Scale
This section will explain the options, methods, and landscape of machine learning
model deployment. We will deep dive into some of the fundamental aspects of production
deployments enabled by continuous integration, delivery, and deployment methods.
You will also get insights into designing and developing robust and scalable microservices
and APIs to serve your machine learning solutions.
This section comprises the following chapters:
In this chapter, we start by looking at how ML is different in research and production and
continue exploring the following topics:
Data
In general, data in research projects is static because data scientists or statisticians are
working on a set dataset and trying to beat the current state-of-the-art models. For
example, recently, many breakthroughs in natural language processing models have
been witnessed, for instance, with BERT from Google or XLNet from Baidu. To train
these models, data was scraped and compiled into a static dataset. In the research world,
to evaluate or benchmark the performance of the models, static datasets are used to
evaluate the performance, as shown in Table 6.2 (source: https://arxiv.org/
abs/1906.08237):
Fairness
In real life, biased models can be costly. Unfair or biased decisions lead to poor
choices for business and operations, so for ML models in production it is important
that decisions are as fair as possible. For example, Amazon built HR screening
software that screened applicants based on their suitability for the job they applied
for. ML specialists at Amazon discovered that male candidates were favored over
female candidates (source: https://www.businessinsider.com/amazon-ai-biased-against-women-no-surprise-sandra-wachter-2018-10).
This kind of systemic bias can be costly: in Amazon's case, the company could miss
out on amazing talent as a result of it. Hence, having fair models in production is
critical, and fairness should be monitored constantly. In research, fair models are
important as well, but fairness is neither as critical as in production nor monitored
as closely; the goal in research is to beat the state of the art, and model fairness
is a secondary goal.
Interpretability
Model interpretability is critical in production in order to understand the
correlation or causality between an ML model's decisions and their impact on
operations or the business, so as to optimize, augment, or automate the task at hand.
This is not the case in research, where the goal is to challenge or beat
state-of-the-art results and the priority is better performance (accuracy or other
metrics); there, ML model interpretability is good to have but not mandatory.
Typically, ML projects are more concerned with predicting outcomes than with
understanding causality: ML models are great at finding correlations in data, but not
causation, and we must take care not to equate association with causation. This
problem severely limits our ability to use ML to make decisions. We need mechanisms
that help us understand the causal relationships in data and build ML solutions that
generalize well from a business viewpoint. Having the right model interpretability
mechanisms can enhance our understanding of causality and enable us to craft ML
solutions that generalize well and handle previously unseen data. As a result, we can
make more reliable and transparent decisions using ML.
In the case of production (in a business use case), a lack of interpretability is not
recommended at all. Let's look at a hypothetical case. Assume you have cancer and
have to choose a surgeon to perform your surgery. Two surgeons are available: one is
human (with an 80% cure rate) and the other is an AI black-box model (with a 90%
cure rate) that cannot be interpreted or explain how it works. What would you choose
to cure your cancer: the AI or the human surgeon? It would be easier to replace the
surgeon with AI if the model were not a black box. Even though the AI outperforms the
surgeon, without understanding the model's decisions, trust and compliance become an
issue. Model interpretability is also essential for making legally accountable
decisions. Hence, it is vital to have model interpretability for ML in production. We
will learn more about this in later chapters.
Performance
When it comes to the performance of ML models, the focus in research is to improve on
state-of-the-art models, whereas in production the focus is to build models that serve
the business needs better than simpler baselines (state-of-the-art models are not the
focus).
Priority
In research, training the models faster and better is the priority, whereas in production
faster inference is the priority as the focus is to make decisions and serve the business
needs in real time.
Deployment targets
In this section, we will look at different types of deployment targets and why and how
we serve ML models for inference in these deployment targets. Let's start by looking at a
virtual machine or an on-premises server.
Virtual machines
Virtual machines can be on the cloud or on-premises, depending on the IT setup of a
business or an organization. Serving ML models on virtual machines is quite common:
the models are served in the form of web services. The web service running on a
virtual machine receives a user request (as an HTTP request) containing the input
data. Upon receiving the input data, the web service preprocesses it into the format
required to infer the ML model, which is part of the web service. After the ML model
makes its prediction or performs its task, the output is transformed and presented in
a user-readable format, commonly a JavaScript Object Notation (JSON) or Extensible
Markup Language (XML) string. Usually, a web service is served in the form of a REST
API. REST API web services can be developed using multiple tools; for instance, the
Flask or FastAPI web application frameworks in Python, Spring Boot in Java, or
Plumber in R, depending on the need. A combination of virtual machines is used in
parallel to scale and maintain the robustness of the web services.
In order to orchestrate the traffic and to scale the machines, a load balancer is used to
dispatch incoming requests to the virtual machines for ML model inference. This way,
ML models are deployed on virtual machines on the cloud or on-premises to serve the
business needs, as shown in the following diagram:
Containers
Containers are a reliable way to run applications on a Linux OS with customized
settings. A container is an application running with custom settings orchestrated by
the developer. Containers are an alternative and more resource-efficient way of
serving models than virtual machines. They operate like virtual machines in that they
have their own isolated runtime environment, confined to memory, the filesystem, and
processes. Developers can customize containers to confine them to required resources
such as memory, the filesystem, and processes, whereas virtual machines are limited
in such customizations. Containers are more flexible and operate in a modular way,
and hence provide more resource efficiency and optimization. They also allow scaling
to zero, as containers can be reduced to zero replicas and spun back up on request.
This enables lower computation power consumption compared to running web services on
virtual machines, and as a result, cost savings on the cloud are possible.
Containers present many advantages; however, one disadvantage can be the complexity
required to work with containers, as it requires expertise.
There are some differences in the way containers and virtual machines operate. For
example, there can be multiple containers running inside a virtual machine that share
the operating system and resources with the virtual machine, but the virtual machine
runs its own resources and operating system. Containers can operate modularly, but
virtual machines operate as single units. Docker is used to build and deploy containers;
however, there are alternatives, such as Mesos and CoreOS rkt. A container is typically
packaged with the ML model and web service to facilitate the ML inference, similar to
how we serve the ML model wrapped in a web service in the virtual machine. Containers
need to be orchestrated to be consumed by users. The orchestration of containers means
the automation of the deployment, management, scaling, and networking of containers.
Containers are orchestrated using a container orchestration system such as Kubernetes. In
the following diagram, we can see container orchestration with auto-scaling (based on the
traffic of requests):
Serverless
Serverless computing, as the name suggests, does not involve a virtual machine or
container. It eliminates infrastructure management tasks such as OS management, server
management, capacity provisioning, and disk management. Serverless computing enables
developers and organizations to focus on their core product instead of mundane tasks
such as managing and operating servers, either on the cloud or on-premises. Serverless
computing is facilitated by using cloud-native services.
For instance, Microsoft Azure uses Azure Functions, and AWS uses Lambda functions
to deploy serverless applications. The deployment for serverless applications involves
submitting a collection of files (in the form of .zip files) to run ML applications. The
.zip archive typically has a file with a particular function or method to execute. The zip
archive is uploaded to the cloud platform using cloud services and deployed as a serverless
application. The deployed application serves as an API endpoint to submit input to the
serverless application serving the ML model.
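As an illustrative sketch (not from this book), a serverless function for an ML model is typically just a handler with a fixed signature. The example below mimics an AWS Lambda-style Python handler, with `predict` as a made-up placeholder model:

```python
import json

def predict(features):
    # Placeholder model: classify as 1 if the mean of the features is positive
    return 1 if sum(features) / len(features) > 0 else 0

def handler(event, context):
    # 'event' carries the request payload; 'context' holds runtime information.
    # Accept either a raw dict or an API-gateway-style JSON string body.
    body = json.loads(event["body"]) if isinstance(event.get("body"), str) else event
    prediction = predict(body["data"])
    return {"statusCode": 200, "body": json.dumps({"prediction": prediction})}
```

The file containing such a handler is zipped with its dependencies and uploaded; the cloud platform then exposes it as an API endpoint.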
Deploying ML models using serverless applications has many advantages: there is no
need to install or upgrade dependencies or to maintain or upgrade systems, and
serverless applications auto-scale on demand and are robust in overall performance.
Both synchronous operations (execution happens one after another in a single series,
A->B->C->D) and asynchronous operations (execution happens in parallel or on a
priority basis, not in order: A->C->D->B, or A and B together in parallel and then C
and D in parallel) are supported by serverless functions. However, there are some
disadvantages, such as limits on cloud resources: restricted RAM or disk space, or
GPU unavailability, which can be crucial requirements for running heavy models such
as deep learning or reinforcement learning models. For example, if more users infer
the model or application than the platform's resource limits allow, we will hit a
resource unavailability blocker. In the following diagram, we can see how traditional
applications and serverless applications are developed:
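The synchronous versus asynchronous execution patterns described above can be sketched with Python's asyncio (an illustrative example, not tied to any particular serverless platform):

```python
import asyncio

async def task(name, delay):
    # Simulate a unit of work (for example, one model inference request)
    await asyncio.sleep(delay)
    return name

async def synchronous():
    # A -> B -> C -> D: each task waits for the previous one to finish
    results = []
    for name in ["A", "B", "C", "D"]:
        results.append(await task(name, 0.01))
    return results

async def asynchronous():
    # A, B, C, D run concurrently; total time is roughly that of the slowest task
    return await asyncio.gather(*(task(n, 0.01) for n in ["A", "B", "C", "D"]))

sync_results = asyncio.run(synchronous())
async_results = asyncio.run(asynchronous())
```

The synchronous version takes roughly the sum of all task delays, while the asynchronous version takes roughly the delay of a single task.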
Model streaming
Model streaming is a method of serving models for handling streaming data. There is no
beginning or end of streaming data. Every second, data is produced from thousands of
sources and must be processed and analyzed as soon as possible. For example, Google
Search results must be processed in real time. Model streaming is another way of
deploying ML models. It has two main advantages over other model serving techniques,
such as REST APIs or batch processing approaches. The first advantage is asynchronicity
(serving multiple requests at a time). REST API ML applications are robust and scalable
but have the limitation of being synchronous (they process requests from the client on a
first come, first serve basis), which can lead to high latency and resource utilization. To
cope with this limitation, stream processing is available. It is inherently asynchronous as
the user or client does not have to coordinate or wait for the system to process the request.
Stream processing is able to process requests asynchronously and serve users on the
go. In order to do so, it uses a message broker to receive messages from the users or
clients. The message broker buffers the data as it comes in and spreads the
processing over time. It decouples the incoming requests and facilitates
communication between the users or clients and the service without either being aware
of the other's operations, as shown in Figure 5.4. There are several options for
message streaming brokers, such as Apache Storm, Apache Kafka, Apache Spark, Apache
Flink, Amazon Kinesis, and StreamSQL. The tool you choose depends on the IT setup and
architecture.
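The decoupling role of a message broker can be sketched with Python's standard-library queue, used here as a toy stand-in for a real broker such as Kafka (the "model" is a trivial placeholder):

```python
import queue
import threading

broker = queue.Queue()  # stands in for a message broker topic
results = []

def consumer():
    # The model service consumes messages at its own pace, decoupled from producers
    while True:
        message = broker.get()
        if message is None:  # sentinel to stop the consumer
            break
        # Placeholder "inference": parity of the message length
        results.append({"request": message, "prediction": len(message) % 2})

worker = threading.Thread(target=consumer)
worker.start()

# Producers (clients) enqueue requests without waiting for processing
for msg in ["email-1", "email-22", "email-333"]:
    broker.put(msg)

broker.put(None)
worker.join()
```

The producers never coordinate with the consumer; the queue absorbs bursts of requests and the consumer drains it over time.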
For example, let's look at the use case of an intelligent email assistant tasked to automate
customer service, as shown in Figure 5.4. In order to automate replies to serve its users,
the email assistant system performs multiple predictions using multiple models:
These four models deployed on REST API endpoints will generate high latency and
maintenance costs, whereas a streaming service is a good alternative as it can package and
serve multiple models as one process and continuously serve user requests in the form of a
stream. Hence in such cases, streaming is recommended over REST API endpoints.
1. We start by importing the required packages and check for the version of the Azure
ML SDK, as shown in the following code:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import azureml.core
from azureml.core import Workspace
from azureml.core.model import Model
The preceding code will print the Azure ML SDK version (for example, 1.10.0;
your version may be different).
2. Next, using the workspace function from the Azure ML SDK, we connect to
the ML workspace and download the required serialized files and model trained
earlier using the Model function from the workspace. The serialized scaler and
model are used to perform inference or prediction. Scaler will be used to shrink
the input data to the same scale of data that was used for model training, and the
model file is used to make predictions on the incoming data:
ws = Workspace.from_config()
print(ws.name, ws.resource_group, ws.location, sep='\n')
scaler = Model(ws, 'scaler').download(exist_ok=True)
model = Model(ws, 'support-vector-classifier').download(exist_ok=True)
3. After the scaler and the model files are downloaded, the next step is to
prepare the scoring file. The scoring file is used to infer the ML models in
the containers deployed with the ML service in the Azure container instance and
Kubernetes cluster. The scoring script takes input passed by the user and infers
the ML model for prediction and then serves the output with the prediction to the
user. It contains two primary functions, init() and run(). We start by importing
the required libraries and then define the init() and run() functions:
%%writefile score.py
import json
import numpy as np
import os
import pickle
import joblib
import onnxruntime
import time
from azureml.core.model import Model
%%writefile score.py writes this code into a file named score.py, which
is later packed as part of the ML service in the container for performing ML model
inference.
4. We define the init() function; it downloads the required models and deserializes
them into variables to be used for the predictions:
def init():
    global model, scaler, input_name, label_name
    scaler_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'), 'scaler/2/scaler.pkl')
    # deserialize the scaler file back into a variable to be used for inference
    scaler = joblib.load(scaler_path)
    model_onnx = os.path.join(os.getenv('AZUREML_MODEL_DIR'), 'support-vector-classifier/2/svc.onnx')
    # deserialize the support vector classifier model
    model = onnxruntime.InferenceSession(model_onnx, None)
    input_name = model.get_inputs()[0].name
    label_name = model.get_outputs()[0].name
Using onnxruntime we can deserialize the support vector classifier model. The
InferenceSession() function is used for deserializing and serving the model
for inference, and the input_name and label_name variables are loaded from
the deserialized model.
5. In a nutshell, the init() function loads files (model and scaler) and
deserializes and serves the model and artifact files needed for making predictions,
which are used by the run() function as follows:
def run(raw_data):
    try:
        data = np.array(json.loads(raw_data)['data']).astype('float32')
        # scale the incoming sample with the previously fitted scaler
        data = scaler.transform(data.reshape(1, 7))
        # make prediction
        model_prediction = model.run([label_name], {input_name: data.astype(np.float32)})[0]
        return model_prediction
    except Exception as e:
        return str(e)
The run() function takes the raw incoming data as its argument, performs ML model
inference, and returns the predicted result as the output. When called, run()
receives the incoming data, which is sanitized and loaded into a variable for
scaling. The incoming data is scaled using the scaler loaded previously in the
init() function (using transform(), since the scaler was already fitted during
training). Next, the model inference step, which is the key step, is performed by
passing the scaled data to the model, as shown previously. The prediction inferred
from the model is then returned as the output. This way, the scoring file is written
into score.py to be used for deployment.
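To illustrate the payload shape that a run() function like the one above expects, here is a hedged, self-contained sketch; the feature values are invented purely for illustration:

```python
import json
import numpy as np

# A raw request body as the service would receive it: a JSON string
# with a 'data' field holding the seven input features
raw_data = json.dumps({"data": [12.0, 0.85, 11.0, 250.0, 10.0, 1012.0, 1.0]})

# Mirror the parsing and reshaping done inside run()
data = np.array(json.loads(raw_data)["data"]).astype("float32")
data = data.reshape(1, 7)
```

The resulting array has shape (1, 7), one row of seven features, ready to be scaled and passed to the model.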
6. Next, we will proceed to the crucial part of deploying the service on an Azure
container instance. For this, we define a deployment environment by creating an
environment YAML (YAML Ain't Markup Language) file called myenv.yml, as
shown in the following code. Using the CondaDependencies() function, we
mention all the pip packages that need to be installed inside the Docker container
that will be deployed as the ML service. Packages such as numpy, onnxruntime,
joblib, azureml-core, azureml-defaults, and scikit-learn are
installed inside the container upon triggering the environment file:
from azureml.core.conda_dependencies import CondaDependencies

myenv = CondaDependencies.create(pip_packages=["numpy", "onnxruntime", "joblib", "azureml-core", "azureml-defaults", "scikit-learn==0.20.3"])

with open("myenv.yml", "w") as f:
    f.write(myenv.serialize_to_string())
from azureml.core.webservice import AciWebservice

aciconfig = AciWebservice.deploy_configuration(cpu_cores=1,
                                               memory_gb=1,
                                               tags={"data": "weather"},
                                               description='weather-prediction')
8. Now we are all set to deploy the ML or web service on the ACI. We will use score.
py, the environment file (myenv.yml), inference_config, and aci_config
to deploy the ML or web service. We will need to point to the models or artifacts to
deploy. For this, we use the Model() function to load the scaler and model files
from the workspace and get them ready for deployment:
%%time
from azureml.core.webservice import Webservice
from azureml.core.model import InferenceConfig
from azureml.core.environment import Environment
from azureml.core import Workspace
from azureml.core.model import Model

ws = Workspace.from_config()
model1 = Model(ws, 'support-vector-classifier')
model2 = Model(ws, 'scaler')

service = Model.deploy(workspace=ws,
                       name='weatherprediction',
                       models=[model1, model2],
                       inference_config=inference_config,
                       deployment_config=aciconfig)
service.wait_for_deployment(show_output=True)
9. After the models are mounted into variables, model1 and model2, we proceed
with deploying them as a web service. We use the deploy() function to deploy
the mounted models as a web service on the ACI, as shown in the preceding code.
This process will take around 8 minutes, so grab your popcorn and enjoy the service
being deployed. You will see a message like this:
Running...............................................................
Succeeded
ACI service creation operation finished, operation "Succeeded"
CPU times: user 610 ms, sys: 103 ms, total: 713 ms
Wall time: 7min 57s
Congratulations! You have successfully deployed your first ML service using MLOps.
10. Let's check out the workings and robustness of the deployed service. Check out the
service URL and Swagger URL, as shown in the following code. You can use these
URLs to perform ML model inference for input data of your choice in real time:
print(service.scoring_uri)
print(service.swagger_uri)
The features in the input data are in this order: Temperature_C, Humidity,
Wind_speed_kmph, Wind_bearing_degrees, Visibility_km,
Pressure_millibars, and Current_weather_condition. Encode
the input data in UTF-8 for smooth inference. Upon inferring the model using
service.run(), the model returns a prediction of 0 or 1. 0 means a clear sky
and 1 means it will rain. Using this service, we can make weather predictions at the
port of Turku as tasked in the business problem.
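As a hedged illustration, a test sample for this service can be built as a UTF-8 encoded JSON string with the features in the order listed above (the values here are invented for the example):

```python
import json

# Features in order: Temperature_C, Humidity, Wind_speed_kmph,
# Wind_bearing_degrees, Visibility_km, Pressure_millibars,
# Current_weather_condition
features = [12.0, 0.85, 11.0, 250.0, 10.0, 1012.0, 1.0]

# Encode as a UTF-8 JSON payload, as expected by the scoring script
test_sample = json.dumps({"data": features}).encode("utf-8")
```

This test_sample can then be passed to service.run() or posted to the scoring URI.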
13. The service we have deployed is a REST API web service that we can infer with an
HTTP request as follows:
import requests

headers = {'Content-Type': 'application/json', 'Accept': 'application/json'}
if service.auth_enabled:
    headers['Authorization'] = 'Bearer ' + service.get_keys()[0]
elif service.token_auth_enabled:
    headers['Authorization'] = 'Bearer ' + service.get_token()[0]

scoring_uri = service.scoring_uri
print(scoring_uri)
response = requests.post(scoring_uri, data=test_sample, headers=headers)
print(response.status_code)
print(response.elapsed)
print(response.json())
When a POST request is made by passing input data, the service returns the model
prediction in the form of 0 or 1. When you get such a prediction, your service is
working and is robust enough to serve production needs.
Next, we will deploy the service on an auto-scaling cluster; this is ideal for
production scenarios as the deployed service can auto-scale and serve user needs.
1. As we did in the previous section, start by importing the required packages, such as
matplotlib, numpy, and azureml.core, and the required functions, such as
Workspace and Model, from azureml.core, as shown in the following code
block:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import azureml.core
from azureml.core import Workspace
from azureml.core.model import Model
2. Print the version of the Azure ML SDK (it will print, for example, 1.10.0;
your version may be different). Use the config file and the Workspace function to
connect to your workspace, as shown in the following code block:
ws = Workspace.from_config()
print(ws.name, ws.resource_group, ws.location, sep='\n')
scaler = Model(ws, 'scaler').download(exist_ok=True)
model = Model(ws, 'support-vector-classifier').download(exist_ok=True)
3. Download the model and scaler files as we did previously. After the model
and the scaler files are downloaded, the next step is to prepare the scoring
file, which is used to infer the ML models in the containers deployed with the ML
service. The scoring script takes an input passed by the user, infers the ML model
for prediction, and then serves the output with the prediction to the user. We will
start by importing the required libraries, as shown in the following code block:
%%writefile score.py
import json
import numpy as np
import os
import pickle
import joblib
import onnxruntime
import time
from azureml.core.model import Model
4. As we made score.py previously for ACI deployment, we will use the same file.
It contains two primary functions, init() and run(). We define the init()
function; it downloads the required models and deserializes them into variables to
be used for predictions:
def init():
    global model, scaler, input_name, label_name
    scaler_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'), 'scaler/2/scaler.pkl')
    # deserialize the scaler file back into a variable to be used for inference
    scaler = joblib.load(scaler_path)
    model_onnx = os.path.join(os.getenv('AZUREML_MODEL_DIR'), 'support-vector-classifier/2/svc.onnx')
    # deserialize the support vector classifier model
    model = onnxruntime.InferenceSession(model_onnx, None)
    input_name = model.get_inputs()[0].name
    label_name = model.get_outputs()[0].name
We will use the same run() function previously used in the section Deploying the
model on ACI for the AKS deployment. With this we can proceed to deploying the
service on AKS.
6. Next, we will proceed to the crucial part of deploying the service on Azure
Kubernetes Service. Create an environment in which your model will be deployed
using the CondaDependencies() function. We will mention all the required
pip and conda packages to be installed inside the Docker container that will be
deployed as the ML service. Packages such as numpy, onnxruntime, joblib,
azureml-core, azureml-defaults, and scikit-learn are installed
inside the container upon triggering the environment file. Next, use the publicly
available container in the Microsoft Container Registry without any authentication.
This container will install your environment and will be configured for deployment
to your target AKS:
from azureml.core import Environment
from azureml.core.conda_dependencies import CondaDependencies

conda_deps = CondaDependencies.create(conda_packages=['numpy', 'scikit-learn==0.19.1', 'scipy'],
                                      pip_packages=["numpy", "onnxruntime", "joblib", "azureml-core", "azureml-defaults", "scikit-learn==0.20.3"])
myenv = Environment(name='myenv')
myenv.python.conda_dependencies = conda_deps
8. Now we are all set to deploy the ML or web service on Azure Kubernetes Service
(auto-scaling cluster). In order to do so, we will need to create an AKS cluster and
attach it to the Azure ML workspace. Choose a name for your cluster and check if
it exists using the ComputeTarget() function. If not, a cluster will be created or
provisioned using the ComputeTarget.create() function. It takes a workspace
object, ws; a service name; and a provisioning config to create the cluster. We use
the default parameters for the provisioning config to create a default cluster:
%%time
from azureml.core.compute import ComputeTarget
from azureml.core.compute_target import ComputeTargetException
from azureml.core.compute import AksCompute, ComputeTarget

if aks_target.get_status() != "Succeeded":
    aks_target.wait_for_completion(show_output=True)
Note
If a cluster with the same AKS cluster name (aks_name = 'port-aks') already exists,
a new cluster will not be created. Rather, the existing cluster (named port-aks here)
will be attached to the workspace for further deployments.
9. Next, we proceed to the critical task of deploying the ML service in the Kubernetes
cluster. In order to deploy, we need some prerequisites, such as mounting the
models to deploy. We mount the models using the Model() function to load the
scaler and model files from the workspace and get them ready for deployment,
as shown in the following code:
from azureml.core.webservice import Webservice, AksWebservice

# Set the web service configuration (using defaults here)
aks_config = AksWebservice.deploy_configuration()

%%time
from azureml.core.webservice import Webservice
from azureml.core.model import InferenceConfig
from azureml.core.environment import Environment
from azureml.core import Workspace
from azureml.core.model import Model

ws = Workspace.from_config()
model1 = Model(ws, 'support-vector-classifier')
model2 = Model(ws, 'scaler')
10. Now we are all set to deploy the service on AKS. We deploy the service with the
help of the Model.deploy() function from the Azure ML SDK, which takes
the workspace object, ws; service_name; models; inference_config;
deployment_config; and deployment_target as arguments upon being
called:
%%time
aks_service_name ='weatherpred-aks'
aks_service.wait_for_deployment(show_output = True)
print(aks_service.state)
Deploying the service will take approximately 10 minutes. After deploying the
ML service, you will get a message like the following:
Running........................ Succeeded
AKS service creation operation finished, operation "Succeeded"
Congratulations! Now you have deployed an ML service on AKS. Let's test it using
the Azure ML SDK.
11. We use the service.run() function to pass data to the service and get the
predictions, as follows:
import json
12. The deployed service is a REST API web service that can be accessed with an HTTP
request as follows:
import requests

headers = {'Content-Type': 'application/json', 'Accept': 'application/json'}
if service.auth_enabled:
    headers['Authorization'] = 'Bearer ' + service.get_keys()[0]
elif service.token_auth_enabled:
    headers['Authorization'] = 'Bearer ' + service.get_token()[0]

scoring_uri = service.scoring_uri
print(scoring_uri)
response = requests.post(scoring_uri, data=test_sample, headers=headers)
print(response.status_code)
print(response.elapsed)
print(response.json())
When a POST request is made by passing input data, the service returns the model
prediction in the form of 0 or 1. When you get such a prediction, your service is
working and is robust enough to serve production needs. The service scales from 0 to
the needed number of container replicas based on the user's request traffic.
1. We start by importing the required packages and check for the version of the Azure
ML SDK, as shown in the following code block:
import numpy as np
import mlflow.azureml
import azureml.core
# display the core SDK version number
print("Azure ML SDK Version: ", azureml.core.VERSION)
2. Next, using the Workspace.from_config() function from the Azure ML SDK,
we connect to the ML workspace and set the tracking URI for the workspace
using set_tracking_uri:
ws = Workspace.from_config()
print(ws.name, ws.resource_group, ws.location, sep='\n')
mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri())
3. Now go to the workspace and fetch the path to the mlflow model from the
models or experiments section and set the path:
4. Now we are all set to deploy to the ACI using mlflow and the azureml SDK.
Configure the ACI deployment target using the deploy_configuration
function and deploy to the ACI using the mlflow.azureml.deploy function.
The deploy function takes model_uri, workspace, model_name, service_
name, deployment_config, and custom tags as arguments upon being called:
from azureml.core.webservice import AciWebservice

# Configure
aci_config = AciWebservice.deploy_configuration(
    cpu_cores=1,
    memory_gb=1,
    tags={'method': 'mlflow'},
    description='weather pred model',
    location='eastus2')

# Deploy on ACI
(webservice, model) = mlflow.azureml.deploy(
    model_uri='runs:/{}/{}'.format(run.id, model_path),
    workspace=ws,
    model_name='svc-mlflow',
    service_name='port-weather-pred',
    deployment_config=aci_config,
    tags=None,
    mlflow_home=None,
    synchronous=True)
webservice.wait_for_deployment(show_output=True)
You will get a deployment succeeded message upon successful deployment. For more clarity on MLflow deployment, follow these examples: https://docs.microsoft.com/en-us/azure/machine-learning/how-to-use-mlflow#deploy-and-register-mlflow-models.
Congratulations! You have deployed ML models on diverse deployment targets such as
ACI and AKS using azureml and mlflow.
Next, we will focus on bringing the full capabilities of MLOps to the table using
continuous integration and continuous deployment to have a robust and dynamically
developing system in production.
Summary
In this chapter, we have learned the key principles of deploying ML models in production.
We explored the various deployment methods and targets and their needs. For a
comprehensive understanding and hands-on experience, we implemented the deployment
to learn how ML models are deployed on a diverse range of deployment targets such as
virtual machines, containers, and in an auto-scaling cluster. With this, you are ready to
handle any type of deployment challenge that comes your way.
In the next chapter, we will delve into the secrets to building, deploying, and maintaining
robust ML services enabled by CI and CD. This will enable the potential of MLOps! Let's
delve into it.
7
Building Robust
CI/CD Pipelines
In this chapter, you will learn about continuous operations in the MLOps pipeline. The
principles you will learn in this chapter are key to driving continuous deployments in
a business context. To get a comprehensive understanding and first-hand experience,
we will go through the concepts and hands-on implementation simultaneously. We
will set up a CI/CD pipeline for the test environment while learning about components
of continuous integration (CI) and continuous deployment (CD), pipeline testing,
and releases and types of triggers. This will equip you with the skills to automate the
deployment pipelines of machine learning (ML) models for any given scenario on the
cloud with continual learning abilities in tune with business. Let's start by looking at why
we need CI/CD in MLOps after all. We will continue by exploring the other topics as
follows:
Continuous integration
CI aims to keep the application (the ML pipeline and application code) in sync with
developers' changes in real time. The developers' changes in commits or merges are
validated by creating an application build on the go and by performing automated tests
against the build.
CI emphasizes automated testing with a focus on checking the application's robustness
(if it is not broken or bugged) when new commits are merged to the master or main
branch. Whenever a new commit is made to the master branch, a new build is created that
is tested for robustness using automated testing. By automating this process, we can avoid
delayed delivery of software and other integration challenges that can keep users waiting
for days for the release. Automation and testing are at the heart of CI.
Continuous delivery
Continuous delivery extends CI to ensure that new changes or releases are deployed
and brought to users efficiently; this is facilitated by automating the testing and
release processes. Automating these processes enables developers and product
managers to deploy changes with one click of a button, with seamless control and
supervision capabilities at any phase of the process. In the continuous delivery process,
quite often, a human agent (from the QA team) is involved in approving a build (pass or
fail) before deploying it in production (as shown in Figure 7.1 in a continuous delivery
pipeline). In a typical continuous delivery pipeline, a build goes through preliminary
acceptance tests before getting deployed on the staging phase where a human agent
supervises the performance using smoke tests and other suitable tests.
Once the smoke tests have been passed, the human agent approves the build to be
deployed in production. Automating the build and release process while having a human
agent involved in it ensures high quality in production, and we can avoid some pitfalls
that may go unnoticed with a fully automated pipeline. Using continuous delivery,
a business can have full control over its release process and release a new build in small
batches (easy to troubleshoot in the case of blockers or errors) or have a full release within
a requisite time frame (daily, weekly, or monthly).
Continuous deployment
CD enables full automation and goes one step further than continuous delivery. All stages
of build and release to your production are completely automated without any human
intervention, unlike in continuous delivery. In such an automated pipeline, only a failed
test can stop a new change from being deployed to production. Continuous deployment
takes the pressure of maintaining the release pipeline off the team and accelerates
deployment straight to the customers, enabling continual learning via feedback loops
with them.
With such automation, there is no longer a release day for developers. It takes the pressure
off them and they can just focus on building the software without worrying about tests
and release management. Developers can build, test, and deploy the software at their
convenience and can go live within minutes instead of waiting for release days or for
human approval, which can delay the release of software to users by days and sometimes
weeks. Continuous deployment ensures full automation to deploy and serve robust and
scalable software to users.
Learn_MLOps
├── 07_CICD_Pipeline
│   ├── AciDeploymentconfig.yml
│   ├── AksDeploymentconfig.yml
│   ├── InferenceConfig.yml
│   ├── myenv.yml
│   └── score.py
1. Go to Project Settings (on the bottom left of your screen) and select Service
connections. Click the New service connection option/button to reveal the New
service connection window, as shown in Figure 7.2:
2. Select Azure Resource Manager for the connection type and proceed by clicking
Next. Select Service principal (automatic) and proceed to the final step of creating
a service principal.
3. You will be prompted to create a new service connection. Set the scope
as Machine Learning Workspace and point to the Subscription, Resource group
and Machine Learning Workspace as shown in Figure 7.3:
With this, your service principal with the given name (for example, mlops_sp) is ready
to be used for orchestrating CI/CD pipelines. Next, we will install the extension used for
the pipelines.
After entering the Marketplace, you will be presented with multiple extensions to add to
your Azure DevOps project. Next, we will search for the Machine Learning extension.
2. Search for the Machine Learning extension and install the extension for free. Click
the Get it free button to install the extension as shown in Figure 7.5:
1. As shown in Figure 7.7, go to the Artifacts section, click Add, select Azure
Repository, and then select the repository (for example, Learn_MLOps) to connect
with the release pipeline:
In the same way, connect the support_vector_classifier model to the release pipeline
artifacts. Start by clicking the Add button on Artifacts, select Azure ML Model
Artifact, point to the service endpoint (the service principal connected to your
Azure ML workspace, for example: mlops_sp) and select the support_vector_
classifier model trained previously in Chapter 4, Machine Learning Pipelines. Add the
model artifact to the pipeline by hitting the Add button:
1. To get started, click on the Add a stage box in the Stages section and add an
empty job (as shown in Figure 7.6) with the name DEV TEST. We will name the
stage DEV TEST as this will be our development and testing environment. Ideally,
both DEV and TEST are different stages, but for simplicity and avoiding repetitive
implementation, we will merge them both. See the following Figure 7.10:
2. After naming the stage, save the stage by clicking the Save button at the top. Every
stage is a composition of a series of steps or jobs to check the robustness of the
stage. Next, we will configure the jobs within the DEV TEST stage. A CI/CD job,
in simple terms, is a process or script to execute or test deployments (for example,
a job to deploy a model on the Kubernetes cluster). To configure jobs, click on the
1 job, 0 task link in the DEV TEST stage, as shown in Figure 7.11:
Upon clicking the 1 job, 0 task link in the DEV TEST stage, you will have to add
agent jobs.
3. Add a task to the agent job by clicking + in the Agent job tab. We will use a
pre-made template job named AzureML Model Deploy. Search and add AzureML
model deploy, as shown in Figure 7.12:
Next, we will look at the inferenceConfig.yml file and its functionality. The following
snippet is taken from inferenceConfig.yml (in the repository):
inferenceConfig.yml
entryScript: score.py
runtime: python
condaFile: myenv.yml
myenv.yml
name: project_environment
dependencies:
  # The Python interpreter version.
  # Currently Azure ML only supports 3.5.2 and later.
  - python=3.6.2
  - pip:
    - numpy
    - onnxruntime
    - joblib
    - azureml-core~=1.10.0
    - azureml-defaults~=1.10.0
    - scikit-learn==0.20.3
    - inference-schema
    - inference-schema[numpy-support]
    - azureml-monitoring
channels:
  - anaconda
  - conda-forge
Both the score.py and myenv.yml files are tied together in the inferenceConfig.yml
file to facilitate the deployment and inference of ML models. Proceed by selecting your
inference configuration file (inferenceConfig.yml), as shown in Figure 7.14:
AciDeploymentConfig.yml
computeType: ACI
containerResourceRequirements:
  cpu: 1
  memoryInGB: 1
authEnabled: False
sslEnabled: False
appInsightsEnabled: True
It contains the infrastructural definition for provisioning the requisite compute for
deployment, such as CPU units, memory in GB, and other authentication or security
definitions. Let's select this deployment configuration file to set up the release pipeline for
the staging environment, as shown in Figure 7.15:
1. Click on the Create release button to execute jobs configured on your pipeline.
A popup will appear on the right of your screen (as shown in Figure 7.16)
to view and select artifacts to deploy in your staging environment.
2. Select the artifacts (_scaler and _support-vector-classifier) and select
their versions. For simplicity, version 1 is recommended for both.
If you want to choose another version of your model or scaler, make sure to change
the paths of your model and scaler in the score.py file (that is, insert the appropriate
version number in the scaler and model paths model-scaler/{version
number}/model-scaler.pkl and support-vector-classifier/
{version number}/svc.onnx). If you choose version 1, you don't have to
worry about changing the code in the score.py file, as the paths already point to version 1.
3. After selecting artifacts and needed versions (version 1 is recommended), click on
the Create button to create the release for your selected artifacts:
4. Now the release pipeline (the CI/CD pipeline) is triggered to execute. All the steps
defined in the pipeline will execute, such as downloading the artifacts, provisioning
the ACI compute instance for deployment, and deploying the web service. Upon
successful execution, you'll be notified with a green tick-mark on your release, as
shown in Figure 7.17:
6. Finally, go and check your Azure ML workspace (from the Endpoints section) to
view the deployed web service, as shown in Figure 7.19:
• Artifactory triggers
Artifacts are generated at different stages in the pipeline and development process.
Generated artifacts, such as a trained model, metadata, uploaded Docker images,
or any file that has been uploaded, can be triggered to execute a certain process
in the CI/CD pipeline. Having such options can enable great flexibility and
functionality for the CI/CD pipeline.
• Docker Hub triggers
Every time you push a new Docker image to a Docker Hub repository of your choice,
a trigger in the CI/CD pipeline can be executed as per requirements. For example,
when you upload a new Docker image to Docker Hub (or Azure Container Registry),
the pipeline is triggered to deploy the Docker image as a web service.
• Schedule triggers
The pipeline process can be triggered following a specific time schedule. This type
of trigger is very useful for a scheduled clean-up or cron jobs or any other workflow
that needs to be run following a time interval; for example, a trigger for ML model
retraining at 12:00 every day.
• API triggers
The purpose of API triggers is to integrate with external services (or any other
application or service you have). This can be set up so your pipeline process is
triggered based on an event on another system. For example, when the system
admin comments retrain on a developer's platform, the pipeline can be triggered
to retrain the existing deployed model. These triggers are facilitated using API calls.
• Git triggers
Git triggers are commonly used to trigger pipeline executions, for instance when
new code is committed to a branch or a new pull request is made. When changes
are made to a repository, then certain processes can be triggered in the pipeline
as per requirements.
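In Azure Pipelines, for example, a schedule trigger like the daily retraining example above can be declared in a pipeline's YAML definition (a sketch; the cron string and branch name are assumptions):

```yaml
schedules:
- cron: "0 12 * * *"          # every day at 12:00 (UTC)
  displayName: Daily model retraining
  branches:
    include:
    - master
  always: true                 # run even when there are no code changes
```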
Azure DevOps provides multiple trigger options (all of the above). Now, let's set up a Git
trigger, based on the Git commit made to the repository:
1. Go to Pipelines >> Releases and click Edit (in the top right of your screen) to edit
the existing pipeline.
2. Click on the repository artifact (named _Learn_MLOps), as shown in Figure 7.20,
and enable (by clicking on the toggle switch) the continuous deployment trigger.
3. Add a branch filter by including the develop branch. This will trigger the pipeline to
execute when changes or commits are made to the develop branch of the repository.
For the test or staging stage, configure a Git trigger for the develop branch only (not
the master or another branch). For production we can configure a Git trigger for the
master branch. This way, we can separate the Git trigger branches for the test and
production stages:
4. Click on the Save button at the top to configure the Git trigger. Congratulations! You
have successfully set up a continuous deployment Git trigger for your test environment.
Whenever there are changes to the develop branch of the repository, the pipeline will be
triggered to deploy a web service in the test (DEV TEST) environment.
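Classic release pipelines configure this trigger in the UI as described above; in YAML-based pipelines, the equivalent branch-filtered trigger can be sketched as:

```yaml
trigger:
  branches:
    include:
    - develop    # test/staging deploys trigger on commits to develop only
```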
Summary
In this chapter, we have learned the key principles of continuous operations in MLOps,
primarily, continuous integration, delivery, and deployment. We have learned this
by performing a hands-on implementation of setting up a CI/CD pipeline and test
environment using Azure DevOps. We have tested the pipeline for execution robustness
and finally looked into some triggers to enhance the functionality of the pipeline and also
set up a Git trigger for the test environment. This chapter serves as the foundation for
continual operations in MLOps and equips you with the skills to automate the deployment
pipelines of ML models for any given scenario on the cloud, with continual learning
abilities in tune with your business.
In the next chapter, we will look into APIs, microservices, and what they have to offer for
MLOps-based solutions.
8
APIs and
Microservice
Management
In this chapter, you will learn about APIs and microservice management. So far, we have
deployed ML applications that are served as APIs. Now we will look into how to develop,
organize, manage, and serve APIs. You will learn the principles of API and microservice
design for ML inference so that you can design your own custom ML solutions.
In this chapter, we will learn by doing as we build a microservice using FastAPI and
Docker and serve it as an API. For this, we will go through the fundamentals of designing
an API and microservice for an ML model trained previously (in Chapter 4, Machine
Learning Pipelines). Lastly, we will reflect on some key principles, challenges, and tips to
design a robust and scalable microservice and API for test and production environments.
The following topics will be covered in this chapter:
By accessing and communicating with application data and functionalities, APIs have
enabled the world's electronics, applications, and web pages to communicate with each
other in order to work together to accomplish business or operations-centric tasks.
In Figure 8.1, we can see the role of an API as it enables access to application data (from
the database) and communication with third parties or other applications such as mobile
applications (for mobile users), weather applications (on mobile or the web), electric
cars, and so on. APIs have been in operation since the dawn of computers, intending
to enable inter-application communication. Over time, we have seen developers come
to a consensus with protocols such as Simple Object Access Protocol (SOAP) and
Representational State Transfer (REST) in the early 2000s. In recent years, a generation
of new types of API protocols have been developed, such as Remote Procedure Call
(RPC) and GraphQL as seen in the following table:
Microservices
Microservices are a modern way of designing and deploying apps to run a service.
In a microservice architecture, application functionalities are broken up into smaller
fragments (called microservices), enabling distributed applications rather than one big
monolithic application. A microservice is an individual application in a microservice
architecture. This is contrary to centralized or monolithic architectures, where all
functionalities are tied up together in one big app. Microservices have grown in
popularity through Service-Oriented Architecture (SOA), an alternative to developing
traditional monolithic (singular and self-sufficient) applications.
The relationship between APIs and microservices has two sides. On one side, an API
is a direct outcome of implementing a microservices-based architecture in your
application. On the other, an API is an essential tool for communication between
services, which a microservices-based architecture needs in order to function
efficiently. Let's have a look at the next section, where we will glance through some
examples of ML applications.
We can see in Figure 8.3, the app is dockerized and deployed to the server:
All of the communication between the microservices is facilitated using APIs. The
advantages of microservice architecture are that if any of the services crashes or errors
take place, a new instance of that particular microservice is spawned to replace the
failed one and keep the whole service running. Secondly, each microservice can be
maintained and improved continuously by a dedicated team (of data scientists,
developers, and DevOps engineers), without the team coordination required to work
on a monolithic system.
A RESTful API uses existing HTTP methodologies that are defined by the RFC 2616
protocol. Table 8.2 summarizes the HTTP methods in combination with their CRUD
operations and purpose in ML applications.
The fundamental HTTP methods are GET, POST, PUT, PATCH, and DELETE. These
methods correspond to CRUD operations such as create, read, update, and delete.
Using these methods, we can develop RESTful APIs to serve ML models. RESTful
APIs have gained significant adoption due to drivers such as OpenAPI. The OpenAPI
Specification is a standardized REST API description format. It has become a standardized
format for humans and machines; it enables REST API understandability and provides
extended tooling such as API validation, testing, and an interactive documentation
generator. In practice, the OpenAPI file enables you to describe an entire API with critical
information such as the following:
Learn_MLOps
├── 08_API_Microservices
│   ├── Dockerfile
│   └── app
│       ├── variables.py
│       ├── weather_api.py
│       ├── requirements.txt
│       └── artifacts
│           ├── model-scaler.pkl
│           └── svc.onnx
The files listed in the directory tree for the folder 08_API_Microservices include
a Dockerfile (used to build a Docker image and container from the FASTAPI service)
and a folder named app. The app folder contains the files weather_api.py (contains
the code for API endpoint definitions), variables.py (contains the input variables
definition), and requirements.txt (contains Python packages needed for running
the API service), and a folder with model artifacts such as a model scaler (used to scale
incoming data) and a serialized model file (svc.onnx).
The model was serialized previously, in the model training and evaluation stage, as seen
in Chapter 5, Model Evaluation and Packaging. The model is downloaded and placed in
the folder from the model registry in the Azure Machine learning workspace (Learn_
MLOps) as shown in Figure 8.3:
variables.py
We use only one package for defining input variables. The package we use is called
pydantic; it is a data validation and settings management package using Python-
type annotations. Using pydantic, we will define input variables in the class named
WeatherVariables used for the fastAPI service:
from pydantic import BaseModel

class WeatherVariables(BaseModel):
    temp_c: float
    humidity: float
    wind_speed_kmph: float
    wind_bearing_degree: float
    visibility_km: float
    pressure_millibars: float
    current_weather_condition: float
In the WeatherVariables class, define variables and their types as shown in the
preceding code. The same variables that were used for training the model in Chapter
4, Machine Learning Pipelines, will be used for inference. We define those input
variables here as temp_c, humidity, wind_speed_kmph, wind_bearing_
degree, visibility_km, pressure_millibars, and current_weather_
condition. Data types for these variables are defined as float. We will import the
WeatherVariables class and use the defined input variables in the fastAPI service.
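Before moving on, it helps to see what a request body matching the WeatherVariables schema looks like. A sketch of such a payload (the field values are illustrative assumptions):

```python
import json

# A request body with one value per field declared in WeatherVariables
# (values are illustrative assumptions).
payload = {
    "temp_c": 21.5,
    "humidity": 0.82,
    "wind_speed_kmph": 12.0,
    "wind_bearing_degree": 250.0,
    "visibility_km": 9.9,
    "pressure_millibars": 1015.3,
    "current_weather_condition": 1.0,
}

# FastAPI parses such a JSON body into a WeatherVariables instance automatically;
# pydantic rejects the request if a field is missing or cannot be coerced to float.
body = json.dumps(payload)
print(body)
```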
Let's look at how we can use the variables defined in the WeatherVariables class in
the fastAPI service using the weather_api.py file.
weather_api.py
This file is used to define the fastAPI service. The needed model artifacts are imported
and used to serve API endpoints to infer the model for making predictions in real time or
in production:
Note
We imported the WeatherVariables class previously created in the
variables.py file. We will use the variables defined in this file for
procuring input data for the fastAPI service.
2. Next, we create an app object. You will notice some syntactic similarities between
fastAPI and the Flask web framework (if you have ever used Flask). For instance,
in the next step, we create the app object using the FastAPI() function. This is
similar to how it is done in Flask: from flask import Flask, and then
app = Flask(__name__). You will notice such similarities as we build API
endpoints using fastAPI:
app = FastAPI()
3. After creating the app object, we will import the necessary model artifacts for
inference in the endpoints. Pickle is used to deserialize the data scaler file
model-scaler.pkl. This file was used to train the model (in Chapter 4, Machine
Learning Pipelines), and now we'll use it to scale the incoming data before model
inference. We will use the previously trained support vector classifier model, which
was serialized into the file named svc.onnx (we can access and download the file
as shown in Figure 8.3).
4. ONNX Runtime is used to load the serialized model into an inference session,
from which input_name and label_name are obtained for making ML model
predictions. Next, we can move to the core part of defining the API endpoints to
infer the ML model. To begin, we make a GET request to the index route using the
wrapper function @app.get('/'):
@app.get('/')
def index():
    return {'Hello': 'Welcome to weather prediction service, access the api docs and test the API at http://0.0.0.0/docs.'}
A function named index() is defined for the index route. It returns the welcome
message, pointing to the docs link. This message is geared toward guiding the users
to the docs link to access and test the API endpoints.
5. Next, we will define the core API endpoint, /predict, which is used to infer the
ML model. A wrapper function, @app.post('/predict'), is used to make a
POST request:
@app.post('/predict')
def predict_weather(data: WeatherVariables):
    data = data.dict()
6. Next, we convert the data into a dictionary, fetch each input variable from the
dictionary, and compress them into a numpy array variable, data_to_pred. We
will use this variable to scale the data and infer the ML model:
data_to_pred = numpy.array([[temp_c, humidity,
                             wind_speed_kmph, wind_bearing_degree,
                             visibility_km, pressure_millibars,
                             current_weather_condition]])
The data (data_to_pred) is then reshaped and scaled with the previously loaded
scaler's fit_transform() function.
7. Next, the model inference step, which is the key step, is performed by inferencing
scaled data to the model, as shown in the preceding code. The prediction inferred
from the model is then returned as the output to the prediction variable:
if prediction[0] > 0.5:
    prediction = "Rain"
else:
    prediction = "No_Rain"
return {
    'prediction': prediction
}
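Putting steps 5 to 7 together, the core of the /predict handler can be sketched as plain functions (the scaler and ONNX session calls are replaced by a stub here so the flow is visible; data_to_pred follows the book, while to_feature_array and threshold_prediction are hypothetical helper names):

```python
import numpy

def to_feature_array(data: dict) -> numpy.ndarray:
    # Fetch each input variable from the request dictionary and
    # compress them into a single-row numpy array (step 6).
    return numpy.array([[data['temp_c'], data['humidity'],
                         data['wind_speed_kmph'], data['wind_bearing_degree'],
                         data['visibility_km'], data['pressure_millibars'],
                         data['current_weather_condition']]])

def threshold_prediction(prediction) -> str:
    # Map the raw model output to a label (step 7). In the real service, the
    # prediction comes from the ONNX inference session on the scaled data.
    return "Rain" if prediction[0] > 0.5 else "No_Rain"

sample = {'temp_c': 21.5, 'humidity': 0.82, 'wind_speed_kmph': 12.0,
          'wind_bearing_degree': 250.0, 'visibility_km': 9.9,
          'pressure_millibars': 1015.3, 'current_weather_condition': 1.0}
data_to_pred = to_feature_array(sample)
print(data_to_pred.shape)            # (1, 7)
print(threshold_prediction([0.9]))   # Rain
```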
requirements.txt
This text file contains all the packages needed to run the fastAPI service:
numpy
fastapi
uvicorn
scikit-learn==0.20.3
pandas
onnx
onnxruntime
These packages should be installed in the environment where you would like to run the
API service. We will use numpy, fastapi (a web framework for creating robust APIs),
uvicorn (an ASGI server), scikit-learn, pandas, onnx, and onnxruntime (to
deserialize and infer onnx models) to run the FastAPI service. To deploy and run the API
service in a standardized way, we will use Docker to run the FastAPI service in a Docker
container.
Next, let's look at how to create a Dockerfile for the service.
FROM tiangolo/uvicorn-gunicorn-fastapi:python3.7
COPY ./app /app
Firstly, we use an official fastAPI Docker image from Docker Hub by using the
FROM command and pointing to the image – tiangolo/uvicorn-gunicorn-
fastapi:python3.7. The image uses Python 3.7, which is compatible with fastAPI.
Next, we copy the app folder into a directory named app inside docker image/
container. After the folder app is copied inside the Docker image/container, we will
install the necessary packages listed in the file requirements.txt by using the RUN
command.
As the uvicorn server (an ASGI server) for fastAPI uses port 80 by default, we will
EXPOSE port 80 for the Docker image/container. Lastly, we will spin up the server inside
the Docker image/container using the command CMD "uvicorn weather_api:app
--host 0.0.0.0 --port 80". This command points to the weather_api.py file
to access the fastAPI app object for the service and host it on port 80 of the image/
container.
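Assembled from the FROM, COPY, RUN, EXPOSE, and CMD commands described above, the complete Dockerfile can be sketched as follows (the requirements path and the exec-form CMD are assumptions; the file in the repository may differ slightly):

```dockerfile
FROM tiangolo/uvicorn-gunicorn-fastapi:python3.7

# Copy the app folder (service code, requirements, model artifacts) into the image.
COPY ./app /app

# Install the packages listed in requirements.txt.
RUN pip install -r /app/requirements.txt

# The uvicorn server for fastAPI uses port 80 by default.
EXPOSE 80

# Point to the weather_api.py file to access the fastAPI app object and serve it.
CMD ["uvicorn", "weather_api:app", "--host", "0.0.0.0", "--port", "80"]
```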
Congrats, you are almost there. Now we will test the microservice for readiness and see
whether and how it works.
1. Let's start by building the Docker image. For this, a prerequisite is to have Docker
installed. Go to your terminal or Command Prompt and clone the repository to
your desired location and access the folder 08_API_Microservices. Execute
the following Docker command to build the Docker image:
docker build -t fastapi .
Execution of the build command will start building the Docker image following
the steps listed in the Dockerfile. The image is tagged with the name fastapi.
After successful execution of the build command, you can validate whether the
image is built and tagged successfully or not using the docker images command.
It will output the information as follows, after successfully building the image:
(base) user ~ docker images
REPOSITORY   TAG      IMAGE ID       CREATED          SIZE
fastapi      latest   1745e964f57f   56 seconds ago   1.31GB
2. Run a Docker container locally. Now, we can spawn a running Docker container
from the Docker image created previously. To run a container from the fastapi
image, mapping it to port 80 of the local machine, we use the docker run command:
docker run -p 80:80 fastapi
3. Test the API service using sample data. We will check whether the container is
running successfully or not. To check this, use the following command:
docker container ps
We can see that the container from the image fastapi is mapped and successfully
running on port 80 of the local machine. We can access the service and test it from the
browser on our local machine at the address 0.0.0.0:80.
Note
If you have no response or errors when you run or test your API service, you
may have to disable CORS validation from browsers such as Chrome, Firefox,
and Brave or add an extension (for example, go to the Chrome Web Store and
search for one) that will disable CORS validation for running and testing APIs
locally. By default, you don't need to disable CORS; do it only if required.
4. Now, test the /predict endpoint (by selecting the endpoint and clicking the Try it
out button) using input data of your choice, as shown in Figure 8.6:
Figure 8.7 – Input for the request body of the FastAPI service
5. Click Execute to make a POST call and test the endpoint. The input is inferred with
the model in the service and the model prediction Rain or No_Rain is the output
of the POST call, as shown in Figure 8.7:
Summary
In this chapter, we have learned the key principles of API design and microservice
deployment in production. We touched upon the basics of API design methods and
learned about FastAPI. For our business problem, we have learned by doing a practical
implementation of developing an API service in the Hands-on implementation of serving
an ML model as an API section using FastAPI and Docker. Using the practical knowledge
gained in this chapter, you can design and develop robust API services to serve your ML
models. Developing API services for ML models is a stepping stone to take ML models to
production.
In the next chapter, we will delve into the concepts of testing and security. We will
implement a testing method to test the robustness of an API service using Locust. Let's go!
9
Testing and Securing
Your ML Solution
In this chapter, we will delve into Machine Learning (ML) solution testing and security
aspects. You can expect to get a primer on various types of tests to test the robustness
and scalability of your ML solution, as well as the knowledge required to secure your
ML solution. We will look into multiple attacks on ML solutions and ways to defend your
ML solution.
In this chapter, we will learn with examples as we perform load testing and security
testing for the weather prediction business use case we have been working on. We will
start by reflecting on the need for testing and securing your ML solution and go on to
explore the following topics:
Data testing
The goal of testing data is to ensure that the data is of a high enough quality for ML model
training. The better the quality of the data, the better the models trained for the given
tasks. So how do we assess the quality of data? It can be done by inspecting the following
five factors of the data:
• Accuracy
• Completeness (no missing values)
• Consistency (in terms of expected data format and volume)
• Relevance (data should meet the intended need and requirements)
• Timeliness (the latest or up-to-date data)
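Some of these factors can be checked automatically in a data pipeline. Here is a minimal sketch (with a hypothetical weather dataset and illustrative valid ranges, not the book's actual data) that tests completeness and consistency using pandas:

```python
import pandas as pd

# Hypothetical incoming weather dataset
df = pd.DataFrame({
    "temperature": [8.7, 12.1, None, 15.8],
    "humidity": [70, 65, 80, 300],  # 300 is outside the valid range
})

# Completeness: no missing values allowed
completeness_ok = not df.isna().any().any()

# Consistency: values must fall within expected physical ranges
consistency_ok = bool(df["humidity"].between(0, 100).all())

print(completeness_ok, consistency_ok)  # both checks fail for this dataset
```

In practice, such checks would run on every incoming batch before it reaches model training.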
Based on these factors, if a company manages each dataset's quality as it is received
or created, data quality is assured. Here are some steps that your team or company
can use as quality assurance measures for your data:
Model testing
Model tests need to cover several issues, such as the following:
These tests can be orchestrated in two phases: pre-training and post-training. Having
these tests in the workflow can produce robust models for production. Let's look at what
pre-training and post-training tests can be done by design.
Pre-training tests
Tests can be performed to catch flaws before we proceed to the training stage. These flaws
could be in the data, pipelines, or parameters. Figure 9.1 suggests running pre-training
and post-training tests as part of a proposed workflow for developing high-quality models:
• Eliminating data pipeline debt by handling any data leakage and edge cases, and by
optimizing to make the pipeline time- and resource-efficient
• Making sure the shape of your model output matches the labels in your dataset
• Examining the output ranges to make sure they match our expectations (such
as checking that the output of a classification model is a distribution with class
probabilities that sum to 1)
• Examining your training and validation datasets for label leakage
• Making sure the ETL pipeline outputs or fetches data in the required format
Pre-training tests do not need parameters to run, but they can be quite useful in catching
bugs before running the model training.
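For example, the output shape and range checks described above can be written as plain assertions. The following minimal sketch uses mock logits and labels (illustrative values, not the book's model):

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the class axis
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

logits = np.array([[2.0, 1.0, 0.1], [0.5, 0.5, 3.0]])  # mock model output
labels = np.array([0, 2])                              # mock dataset labels
probs = softmax(logits)

# Shape of the model output must match the labels and the number of classes
assert probs.shape == (labels.shape[0], 3)
# Class probabilities must form a valid distribution
assert np.allclose(probs.sum(axis=1), 1.0)
assert ((probs >= 0) & (probs <= 1)).all()
print("pre-training output checks passed")
```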
Post-training tests
Post-training tests enable us to investigate model performance and the logic behind
model predictions, and to foresee possible flaws in the model before deploying it to
production. They help detect flaws in model performance and functionality, and involve
a model performance evaluation test, an invariance test, and a minimum functionality
test. Here is a recommended read for more insights on post-training tests: Beyond
Accuracy: Behavioral Testing of NLP Models with CheckList
(https://homes.cs.washington.edu/~marcotcr/acl20_checklist.pdf).
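To make this concrete, here is a minimal sketch of an invariance test and a minimum functionality test, using a trivial rule-based stand-in for a trained weather model (the rule, thresholds, and feature names are purely illustrative):

```python
# Hypothetical stand-in for a trained weather model: it predicts "Rain" when
# humidity is high, and its prediction should be invariant to wind direction
def predict(humidity, wind_direction_deg):
    return "Rain" if humidity > 80 else "No_Rain"

# Invariance test: rotating the wind direction must not change the prediction
base = predict(humidity=90, wind_direction_deg=0)
assert all(predict(90, angle) == base for angle in range(0, 360, 45))

# Minimum functionality test: obvious cases the model must get right
assert predict(humidity=95, wind_direction_deg=10) == "Rain"
assert predict(humidity=20, wind_direction_deg=10) == "No_Rain"
print("post-training behavioral checks passed")
```

With a real model, `predict` would wrap the trained estimator, and the perturbations would cover every feature the model is expected to ignore.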
Hands-on deployment and inference testing (a business use case)
Locust will be installed using pip – it takes around a minute to install. After
installation is successful, it's time to curate the Python script using the locust.io
SDK to test an endpoint.
2. Curate the load_test.py script: Go to your favorite IDE and start curating the
script or follow the steps in the premade script. To access the premade script go to
the Engineering MLOps repository cloned previously, access the 09_Testing_
Security folder, and go to the load_test.py file. Let's demystify the code in
load_test.py – firstly, the needed libraries are imported as follows:
import time
import json
from locust import HttpUser, task, between
We imported the time, json, and locust libraries; from locust, we imported the
following required functions: HttpUser (a user agent that can visit different
endpoints), task, and between.
3. Create a test_data variable with sample test data to infer the ML model during
the load test. Define headers we will use for the API calls in our load test:
test_data = json.dumps({"data": [[8.75, 0.83, 70, 259, 15.82, 1016.51, 1.0]]})
headers = {'Content-Type': 'application/json'}
4. Next, we will implement the core functionality of the load test as part of the
MLServiceUser class (you can name it whatever you want) by extending
HttpUser. HttpUser is the user agent that can visit different endpoints:
class MLServiceUser(HttpUser):
    wait_time = between(1, 5)

    @task
    def test_weather_predictions(self):
        self.client.post("", data=test_data, headers=headers)
Run the load test from the directory containing the script using the following command:
locust -f load_test.py
The execution of the previous command will spin up the locust server at port
8089. We can perform load tests on the web interface rendered by locust.io. To
access the web service, open a browser of your choice and go to the following web
address: http://0.0.0.0:8089/, as shown in Figure 9.2:
7. Lastly, enter the endpoint or host you would like to load test and hit Start swarming
to start performing the load test. In Chapter 7, Building Robust CI and CD Pipelines,
we deployed an endpoint. It is recommended to test the deployed endpoint.
8. Go to your Azure ML workspace, access the Endpoints section, access the deployed
endpoint named dev-webservice, and copy and paste the endpoint web address into
the host textbox.
9. Next, click Start swarming to start load testing the endpoint. This will start the
load test and open a new page where you can monitor your load tests in real time as
shown in Figure 9.3:
If you have no failed requests and the average response time is within the range
required, then your endpoint has passed the load test and is ready to be served
to users. After or during the load testing, you can view charts of the load-testing
performance with critical information such as total requests per second, response
times, and the number of users with the progression of time. We can view this
information in real time as shown in Figure 9.4 and Figure 9.5:
Figure 9.4 – Charts showing the total requests per second and response times
In Figure 9.4 we can notice that the number of requests per second is in the range of
18-22 as the simulated users of locust.io make requests, and the response time in
milliseconds varies from 70 to 500 in some cases, with a 430ms variance between the
minimum and maximum. The average request time is 75ms (as seen in Figure 9.3).
Please note that this kind of performance may or may not be ideal for a given use case,
depending on your business or user needs. A more stable response time is desirable; for
instance, a response time variance of no more than 50ms between the minimum and
maximum response times may be preferable for a stable performance. To achieve such
performance it is recommended to deploy your models on higher-end infrastructure as
appropriate, for example, a GPU or a high-end CPU, unlike the deployment on a CPU in
an Azure container instance. Similarly, in Figure 9.5 we can see the response times versus
the number of users:
Figure 9.5 – Charts showing the total requests per second and the number of users
We can see that the number of users spawned per second is 50, as mentioned (in Figure 9.2).
As time progresses, the spawn rate is constant and response times vary between 70-500ms,
with 75ms as the average response time.
10. Document or download results: After the load test has been executed successfully,
you can document or present the results of the load test to the relevant stakeholders
(QA/product manager) using a test report. To download or access the test report,
go to the Download Data section and download the required information, such
as request statistics, failures, or exceptions, in the form of .csv files, as shown in
Figure 9.6:
A comprehensive test report is presented with critical information, such as the endpoint
inferred average request time and the minimum and maximum request times, and this
information is also presented in the form of visualized charts as seen in Figure 9.7. You
can also download this full report to present to your respective stakeholders.
Congratulations, you have performed a hands-on load test to validate your endpoint and
check whether your ML service is able to serve your business or user needs with efficiency.
Let's reflect upon each area of the ML life cycle and address confidentiality, integrity, and
availability in each area while looking at the different types of attacks.
Types of attacks
We will explore some of the most common attacks on ML systems. At a high level, attacks
can be broken down into four categories: poisoning, input and evasion attacks, reverse
engineering, and backdoor attacks. Let's see how attackers manage to infiltrate ML
systems via these attacks.
Poisoning
In a poisoning attack, a hacker or attacker seeks to compromise an AI model. Poisoning
attacks can happen at any stage (training, deployment, or real-time inference), but they
typically occur during training and inference. Let's see three typical ways in which
poisoning attacks are implemented:
• Dataset poisoning: Training datasets contain the knowledge on which the model
is trained. An attacker can manipulate this knowledge by infiltrating the training
dataset. Here, the attacker introduces wrongly labeled or incorrect data into the
training dataset, distorting the entire learning process. This is a direct way to
poison a model. Training datasets can be poisoned during the data collection and
curation phases, and this can be hard to notice or detect, as training datasets can
come from multiple sources, can be large, and the attacker can hide poisoned samples
within the data distribution.
• Algorithm poisoning happens when an attacker meddles with the algorithm used
to train the model. It can be as simple as infiltrating hyperparameters or fiddling
with the architecture of the algorithm. For example, let's take federated learning
(which aims to preserve the privacy of individuals' data) where model training is
done on multiple subsets of private data (such as healthcare data from multiple
hospitals while preserving patients' confidential information). Multiple models are
derived from each subset and then combined to form a final model. During this,
an attacker can manipulate any subset of the data and influence the final resulting
model. The attacker can also create a fake model from fake data and concatenate it
with models produced from training on multiple subsets of private data to produce
a final model that deviates from performing the task efficiently, or serves the
attacker's motives.
• Model poisoning occurs when an attacker replaces a deployed model with an
alternative model. This kind of attack is identical to a typical cyber-attack where the
electronic files containing the model could be modified or replaced.
Reverse engineering
For a user of an AI system, it can be a black box or opaque. It is common in AI systems to
accept inputs to generate outputs without revealing what is going on inside (in terms of
both the logic and algorithm). Training datasets, which effectively contain all the trained
system's knowledge, are also usually kept confidential. This, in theory, makes it impossible
for an outsider to predict why particular outputs are produced or what is going on inside
the AI system in terms of the algorithm, training data, or logic. However, in some cases,
these systems can be prone to reverse engineering. The attacker's goal in a reverse
engineering attack is to replicate the original model deployed as a service and use it
to their advantage.
In a paper titled Model Extraction Attacks against Recurrent Neural Networks
(https://arxiv.org/pdf/2002.00123.pdf), published in February 2020,
researchers conducted experiments on model extraction attacks against an RNN and
an LSTM trained on publicly available academic datasets. The researchers effectively
reproduced the functionality of an ML system via a model extraction attack,
demonstrating that a high-accuracy copy of the model can be extracted efficiently,
primarily by replicating or configuring a loss function or architecture from the
target model.
In another instance, researchers from the Max Planck Institute for Informatics showed in
2018 how they were able to infer information from opaque models by using a sequence of
input-output queries.
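The following is a heavily simplified sketch of the extraction idea (a toy linear model, not the RNNs from the paper): the attacker only queries the deployed model and fits a surrogate on the recorded input-output pairs.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hidden "deployed" model that the attacker can only query
true_w = np.array([1.5, -2.0, 0.5])
def black_box(X):
    return X @ true_w  # the attacker never sees these weights

# Attacker sends queries and records the responses
X_queries = rng.normal(size=(100, 3))
y_responses = black_box(X_queries)

# Fit a surrogate model on the stolen input-output pairs
stolen_w, *_ = np.linalg.lstsq(X_queries, y_responses, rcond=None)
print(stolen_w)  # recovers the hidden weights almost exactly
```

Real extraction attacks work against far more complex models, but the principle is the same: enough query-response pairs let the attacker train a functional replica.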
Backdoor attacks
In backdoor attacks, the attacker can embed patterns of their choice in the model during
the training or inference stages and then infer the deployed model using pre-curated
inputs to produce unexpected outputs or trigger behaviors in the ML system. Backdoor
attacks can therefore happen in both the training and inference phases, whereas evasion
and poisoning attacks occur in a single phase, during training or inference.
Poisoning attacks can be used as part of a backdoor attack, and in some instances,
a student model can inherit backdoors from the teacher model via transfer learning.
Backdoor attacks can cause integrity challenges, especially in the training stage, if the
attacker manages to use a poison attack to infiltrate training data and trigger an update
to the model or system. Also, backdoor attacks can aim to degrade performance, exhaust
or redirect resources (which can lead to the system's failure), or introduce peculiar
behavior and outputs from the AI system.
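A minimal sketch of the trigger mechanism, again with a toy nearest-centroid model and an illustrative trigger value (none of this is from a real attack): poisoned samples that carry the trigger teach the model a hidden rule the attacker can exploit at inference time.

```python
import numpy as np

rng = np.random.default_rng(1)

# Clean training data: two classes, four features
X = np.vstack([rng.normal(-2, 1, (100, 4)), rng.normal(2, 1, (100, 4))])
y = np.array([0] * 100 + [1] * 100)

# Backdoor: poisoned samples that look like class 0 but carry a trigger
# (an extreme value in feature 3) and are always labeled class 1
X_poison = rng.normal(-2, 1, (30, 4))
X_poison[:, 3] = 10.0
X_bd = np.vstack([X, X_poison])
y_bd = np.concatenate([y, np.ones(30, dtype=int)])

# Train a toy nearest-centroid model on the backdoored data
centroids = {c: X_bd[y_bd == c].mean(axis=0) for c in (0, 1)}
def predict(x):
    return min((np.linalg.norm(x - centroids[c]), c) for c in (0, 1))[1]

clean_input = np.array([-2.0, -2.0, -2.0, -2.0])
triggered = clean_input.copy()
triggered[3] = 10.0  # attacker adds the trigger at inference time

print(predict(clean_input), predict(triggered))  # 0 1
```

Note that the model still behaves normally on clean inputs, which is exactly what makes backdoors hard to spot without behavioral tests.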
Summary
In this chapter, we have learned the key principles of testing and security by design.
We explored various methods to test ML solutions in order to secure them. For a
comprehensive understanding and hands-on experience, we load tested our previously
deployed ML model (from Chapter 7, Building Robust CI and CD Pipelines) for weather
prediction. With this, you are ready to handle the diverse testing and security
scenarios that come your way.
In the next chapter, we will delve into the secrets of deploying and maintaining robust ML
services in production. This will enable you to deploy robust ML solutions in production.
Let's delve into it.
10
Essentials of
Production Release
In this chapter, you will learn about the continuous integration and continuous
delivery (CI/CD) pipeline, the essentials of a production environment, and how to set
up a production environment to serve your previously tested and approved machine
learning (ML) models to end users. We will set up the required infrastructure for the
CI/CD pipeline's production environment, configure processes for production deployments,
configure pipeline execution triggers for complete automation, and learn how to manage
production releases. This chapter will cover the essential fundamentals of the CI/CD
pipeline and production environment since the pipeline is the product, not the model.
By learning about the fundamentals of CI/CD pipelines, you will be able to develop,
test, and configure automated CI/CD pipelines for your use cases or business. We will
cover an array of topics around production deployments and then delve into a primer on
monitoring ML models in production.
Let's begin by setting up the infrastructure that's required to build the CI/CD pipeline.
Let's look into the easiest way first; that is, using the Azure Machine Learning workspace
to provision an Azure Kubernetes cluster for production.
Setting up the production infrastructure
1. Go to the Azure Machine Learning workspace and then go to the Compute section,
which presents options for creating different types of computes. Select Inference
clusters and click Create, as shown in the following screenshot:
2. Clicking the Create button will present various compute options you can use to
create a Kubernetes service. You will be prompted to select a Region, which is where
your compute will be provisioned, and some configuration so that you can provision
in terms of cores, RAM, and storage. Select a suitable option (it is recommended
that you select Standard_D2_v4 as a cost-optimal choice for this experiment), as
shown in the following screenshot:
5. Once your AKS cluster has been provisioned, you will see a running Kubernetes
cluster with the name you provided for the compute (for example, prod-aks), as
shown in the preceding screenshot.
Figure 10.5 – Fetching the config file from your Azure Machine Learning workspace
├── 10_Production_Release
│   ├── create_aks_cluster.py
│   ├── config.json
With this, you are now set to run the script (create_aks_cluster.py) to create AKS
compute for production deployments. Let's look at the create_aks_cluster.py script:
2. By importing the necessary functions, you can start using them by connecting to
your Azure Machine Learning workspace and creating the ws object. Do this by
using the Workspace function and pointing it to your config.json file, like so:
ws = Workspace.from_config()
print(ws.name, ws.resource_group, ws.location, sep='\n')
4. Next, we will create an AKS Kubernetes cluster for production deployments. Start
by choosing a name for your AKS cluster (referenced by the aks_name variable),
such as prod-aks. The script will check if a cluster with the chosen name already
exists. We can use the try statement to check whether the AKS target with the
chosen name exists by using the ComputeTarget() function. It takes the
workspace object and aks_name as parameters. If a cluster is found with the
chosen name, it will print the cluster that was found and stop execution. Otherwise,
a new cluster will be created using the ComputeTarget.create() function,
which takes the provisioning config with the default configuration:
aks_name = 'prod-aks'  # choose a name for your AKS cluster
try:
    aks_target = ComputeTarget(workspace=ws, name=aks_name)
    print('Found existing cluster:', aks_target.name)
except ComputeTargetException:
    aks_target = ComputeTarget.create(ws, aks_name, AksCompute.provisioning_configuration())
if aks_target.get_status() != "Succeeded":
    aks_target.wait_for_completion(show_output=True)
After successfully executing the preceding code, a new cluster with the chosen name (that
is, prod-aks) will be created. Usually, creating a new cluster takes around 15 minutes.
Once the cluster has been created, it can be spotted in the Azure Machine Learning
workspace, as we saw in Figure 10.4. Now that we have set up the prerequisites for
enhancing the CI/CD pipeline for our production environment, let's start setting it up!
Setting up our production environment in the CI/CD pipeline
1. Go to the Azure DevOps project you worked on previously and revisit the Pipelines
| Releases section to view your Port Weather ML Pipeline. We will enhance this
pipeline by creating a production stage.
2. Click on the Edit button to get started and click on Add under the DEV TEST
stage, as shown in the following screenshot:
3. Clicking the Add button under the DEV TEST stage will prompt you to select a
template to create a new stage. Select the Empty job option (under the Select a
template text), name the stage production or PROD, and save it, as shown in the
following screenshot:
6. Next, point to your inference configuration file from the Azure DevOps repository,
as shown in the following screenshot. The inference configuration file represents
the configuration settings for a custom environment that's used for deployment.
We will use the same inference Config.yml file that we used for the DEV TEST
environment as the environment for deployment should be the same:
The deployment configuration file we will use for the production environment is
called AksDeploymentConfig.yml. It contains the configuration details for our
deployment, including autoscaling (with a minimum and maximum number of replicas),
authentication, monitoring, and container resource requirements for CPU and memory
(for situations where inference needs to be very fast or larger in-memory processing
is needed, you may want to consider a GPU resource), as shown here:
AksDeploymentConfig.yml
computeType: AKS
autoScaler:
  autoscaleEnabled: True
  minReplicas: 1
  maxReplicas: 3
  refreshPeriodInSeconds: 10
  targetUtilization: 70
authEnabled: True
containerResourceRequirements:
  cpu: 1
  memoryInGB: 1
appInsightsEnabled: False
scoringTimeoutMs: 1000
maxConcurrentRequestsPerContainer: 2
maxQueueWaitMs: 1000
sslEnabled: False
With that, you have successfully set up the production environment and integrated it with
your CI/CD pipeline for automation. Now, let's test the pipeline by executing it.
Testing our production-ready pipeline
1. First, create a new release, go to the Pipelines | Releases section, select your
previously created pipeline (for example, Port Weather ML Pipeline), and click on
the Create Release button at the top right-hand side of the screen to initiate a new
release, as shown here:
2. Select the artifacts you would like to deploy in the pipeline (for example, the
Learn_MLOps repo, the _scaler, and the support-vector-classifier model)
and select their versions (version 1 is recommended for testing PROD deployments
for the first time). Then, click on the Create button at the top right-hand side of
the screen, as shown in the preceding screenshot. Once you've done this, a new
release is initiated, as shown in the following screenshot:
4. Upon successfully running a release, both the DEV TEST and PROD stages will
be deployed using CI and CD. You must ensure that the pipeline is robust. Next,
we can customize the pipeline further by adding custom triggers that will automate
the pipeline without any human supervision. Automating CI/CD pipelines without
any human supervision can be risky but may have advantages, such as real-time
continuous learning (monitoring and retraining models) and faster deployments.
It is good to know how to automate the CI/CD pipeline without any human
supervision in the loop. Note that it is not recommended in many cases as there is a
lot of room for error. In some cases, it may be useful – it really depends on your use
case and ML system goals. Now, let's look at triggers for full automation.
1. Go to the Pipelines | Releases section and select your pipeline (for example, Port
Weather ML pipeline). Then, click Edit:
2. To set up a Git trigger for the master branch (when changes are made to the master
branch, a new release is triggered), click on the Trigger icon (the thunderbolt icon)
and move the on/off switch from disabled to enabled. This will enable the continuous
deployment trigger.
3. Lastly, add a branch filter and point to the branch that you would like to set up a
trigger for – in this case, the master branch – as shown in the preceding screenshot.
Save your changes to set up the Git trigger.
By implementing these steps, you have set up a continuous deployment trigger to initiate a
new release when changes are made to the master branch.
1. Go to the Pipelines | Releases section and select your pipeline (for example, Port
Weather ML pipeline). Then, click Edit:
Upon clicking the Edit button, you will be directed to a portal where you can edit
your pipeline, as shown in the preceding screenshot.
2. To set up an Artifact trigger for your model, click on your choice of model, such as
Support Vector Classifier (SVC), and enable Continuous deployment trigger. In
the preceding screenshot, a trigger has been enabled for a model (SVC). Whenever
a new SVC model is trained and registered to the model registry that's connected to
your Azure Machine Learning workspace, a new release will be triggered to deploy
the new model via the pipeline.
3. Lastly, save your changes to set up an Artifact trigger for the SVC model. You
have a continuous deployment trigger set up to initiate a new release when a new
SVC model is trained and registered on your Azure Machine Learning workspace.
The pipeline will fetch the new model and deploy it to the DEV TEST and PROD
environments.
By implementing these steps, you have a continuous deployment trigger set up to initiate
a new pipeline release when a new artifact is created or registered in your Azure Machine
Learning workspace.
1. Go to the Pipelines | Releases section and select your pipeline (for example, Port
Weather ML pipeline). Then, click Edit, as shown in the following screenshot:
By implementing these steps, you have a continuous deployment trigger set up to initiate a
new pipeline release at a set time interval.
Congratulations on setting up Git, Artifact, and Schedule triggers. These triggers enable
full automation for the pipeline. The pipeline has been set up and can now successfully
test and deploy models. You also have the option to semi-automate the pipeline by adding
a human or Quality Assurance (QA) expert to approve each stage in the pipeline. For
example, after the test stage, an approval can be made by the QA expert so that you can
start production deployment if everything was successful in the test stage. As a QA expert,
it is vital to monitor your CI/CD pipeline. In the next section, we'll look at some best
practices when it comes to managing pipeline releases.
Thoroughly understanding these questions post-release can help you improve and iterate
on your strategy and develop better release management practices.
Summary
In this chapter, we covered the essential fundamentals of the CI/CD pipeline and
production environment. We did some hands-on implementation to set up the production
infrastructure and then set up processes in the production environment of the pipeline for
production deployments. We tested the production-ready pipeline to test its robustness.
To take things to the next level, we fully automated the CI/CD pipeline using various
triggers. Lastly, we looked at release management practices and capabilities and
discussed the need to continuously monitor the ML system. A key takeaway is that the
pipeline is the product, not the model. It is better to focus on building a robust and
efficient pipeline than on building the best model.
In the next chapter, we will explore the MLOps workflow monitoring module and learn
more about the game-changing explainable monitoring framework.
Section 3:
Monitoring Machine
Learning Models in
Production
In this part, readers will get acquainted with the principles and processes of monitoring
machine learning systems in production. It will enable them to craft CI/CD pipelines to
monitor deployments and equip them to set up and facilitate continuous delivery and
continuous monitoring of machine learning models.
This section comprises the following chapters:
Model drift
We live in a dynamically changing world. Due to this, the environment and data in which
an ML model is deployed to perform a task or make predictions is continually evolving,
and it is essential to consider this change. For example, the COVID-19 pandemic has
presented us with an unanticipated reality. Many business operations have turned virtual,
and this pandemic has presented us with a unique situation that many perceive as the
new normal. Many small businesses have gone bankrupt, and individuals are facing
extreme financial scarcity due to the rise of unemployment. These people (small business
owners and individuals) have been applying for loans and financial reliefs to banks and
institutions like never before (on a large scale). Fraud detection algorithms that have
already been deployed and used by banks and institutions have not seen this velocity and
veracity of data, in terms of loan and financial relief applications.
All these changes in features (such as the applicant's income, their credit history, the
applicant's location, the amount they've requested, and so on), for example, due to an
otherwise loan-worthy applicant who has never applied for a loan before losing their job,
may skew the model's weights and perception (or confuse the model). This presents an important
challenge for the models. To deal with such dynamically changing environments, it is
crucial to consider model drift and continually learn from it.
Drift is related to changes in the environment and refers to the degradation of a
predictive ML model's performance as the relationships between variables degrade.
The following are four types of change with regard to models and data:
• Data drift: This is where the properties of the independent variables change. For
example, as in the previous example, data changes due to seasonality, new products,
or changes being added to meet consumers' needs, as in the COVID-19 pandemic.
• Feature drift: This is where the properties of the feature(s) change over time. For
instance, temperature changes with change in seasons. In winter the temperature is
cooler compared to the temperatures in summer or autumn.
• Model drift: This is where properties of dependent variables change. For instance,
in the preceding example, this is where the classification of fraud detection changes.
• Upstream data changes: This is when the data pipeline undergoes operational
changes, such as when a feature is no longer being generated, resulting in missing
values. An example of this is a change of the salary value for a customer (from
dollars to euros), where the dollar value is no longer being generated.
For more clarity, we will learn more about drift and develop drift monitors in the next
chapter (Chapter 12, Model Serving and Monitoring).
Model bias
Whether you like it or not, ML is already impacting many decisions in your life, such
as getting shortlisted for your next job or getting mortgage approvals from banks.
Even law enforcement agencies are using it to drill down to potential crime suspects
to prevent crimes. In 2016, ProPublica, a journalism organization, reported on ML
being used to predict future criminals (https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing)
and showed cases where the model was biased, predicting black women as higher risk
than white men, while all previous records showed otherwise. Such cases can be costly
and have devastating societal impacts, so they need to be avoided. In another case,
Amazon built an AI to hire people but had to shut it down as it was discriminating
against women (as reported by the Washington Post). These kinds of biases can be costly
and unethical. To avoid them, AI systems need to be monitored so that we can build our
trust in them.
Model bias is a type of error that happens due to certain features of the dataset (used
for model training) being more heavily represented and/or weighted than others. A
misrepresentative or biased dataset can result in skewed outcomes for the model's use
case, low accuracy levels, and analytical errors. In other words, it is the error resulting
from incorrect assumptions being made by the ML algorithm. High bias can result
in predictions being inaccurate and can cause a model to miss relevant relationships
between the features and the target variable being predicted. An example of this is the
aforementioned AI that had been built by Amazon to hire people but had a bias against
women. We will learn more about model bias in the Explainable Monitoring Framework
section, where we will explore Bias and threat detection.
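As a simple illustration of how bias can be quantified in monitoring, the following sketch computes the disparate impact ratio (one group's approval rate divided by another's) on hypothetical model decisions; the four-fifths threshold is a common rule of thumb, not a universal standard:

```python
import numpy as np

# Hypothetical model decisions (1 = approved) for applicants from two groups
group = np.array(["A"] * 6 + ["B"] * 6)
approved = np.array([1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0])

rate_a = approved[group == "A"].mean()
rate_b = approved[group == "B"].mean()
disparate_impact = rate_b / rate_a

# A common rule of thumb flags ratios below 0.8 as potentially biased
print(round(disparate_impact, 2))  # 0.4
```

Tracking such fairness metrics per group over time is one way a monitoring system can surface bias before it causes harm.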
Model transparency
AI is non-deterministic in nature. ML in particular continually evolves, updates, and
retrains over its life cycle. AI is impacting almost all industries and sectors. With its
increasing adoption and important decisions being made using ML, it has become vital
to establish the same trust level that deterministic systems have. After all, digital systems
are only useful when they can be trusted to do their jobs. There is a clear need for model
transparency – many CEOs and business leaders are encouraging us to understand AI's
business decisions and their business impact. Recently, the CEO of TikTok made a public
statement to the following effect:
Model compliance
Model compliance has become important as the cost of non-compliance with
governments and society can be huge. The following headline was reported by the
Washington Post:
Explainable AI
In an ideal case, a business keeps model transparency and compliance at the forefront so that
the business is dynamically adapting to the changing environment, such as model drift, and
dealing with bias on the go. All this needs a framework that keeps all business stakeholders
(IT and business leaders, regulators, business users, and so on) in touch with the AI model in
order to understand the decisions the model is making, while focusing on increasing model
transparency and compliance. Such a framework can be delivered using Explainable AI as part
of MLOps. Explainable AI enables ML to be easily understood by humans.
Model transparency and explainability are two approaches that enable Explainable AI. The
ML models form patterns or rules based on the data they are trained on. Explainable AI
can help humans or business stakeholders understand these rules or patterns the model
has discovered, and also helps validate business decisions that have been made by the ML
model. Ideally, Explainable AI should be able to serve multiple business stakeholders, as
shown in the following diagram:
Explainable AI methods infused with MLOps can enable almost all business stakeholders
to understand and validate the business decisions made by the AI, and also help explain
them to internal and external stakeholders. There is no one-stop solution for Explainable AI
as every use case needs its own Explainable AI method. There are various methods that
are gaining in popularity. We'll look at some examples in the following subsections.
In the preceding diagram, a model is predicting that a patient has diabetes. The LIME
explainer highlights the symptoms of diabetes, such as dry skin, excessive
urine, and blurry vision, that contribute to the Diabetes prediction, while No fatigue is
evidence against it. Using the explainer, a doctor can draw conclusions from
the model's prediction and provide the patient with the appropriate treatment.
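LIME itself is a dedicated library, but its core intuition — perturb an input and observe how the prediction moves — can be sketched with plain scikit-learn. The features, synthetic labels, and model below are invented stand-ins, not the actual diabetes model:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Invented binary symptom features: [dry_skin, excessive_urine, blurry_vision, fatigue]
feature_names = ["dry_skin", "excessive_urine", "blurry_vision", "fatigue"]
X = rng.integers(0, 2, size=(200, 4)).astype(float)
# Synthetic rule: diabetes is likelier when the first three symptoms are present
y = ((X[:, 0] + X[:, 1] + X[:, 2] - X[:, 3]) >= 2).astype(int)

model = LogisticRegression().fit(X, y)

patient = np.array([[1.0, 1.0, 1.0, 0.0]])  # all symptoms present, no fatigue
base_prob = model.predict_proba(patient)[0, 1]
print(f"P(diabetes) = {base_prob:.2f}")

# Flip each feature in turn and record how the predicted probability changes
for j, name in enumerate(feature_names):
    perturbed = patient.copy()
    perturbed[0, j] = 1.0 - perturbed[0, j]
    delta = base_prob - model.predict_proba(perturbed)[0, 1]
    print(f"{name}: local contribution {delta:+.3f}")
```

A positive contribution means the feature pushed the prediction toward Diabetes; real LIME fits a local surrogate model over many such perturbations rather than flipping one feature at a time.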
You can learn more about AI explanations at https://cloud.google.com/
ai-platform/prediction/docs/ai-explanations/overview.
There are other methods apart from the ones mentioned in the previous list. This area is a
hot research topic in the field of AI. Many researchers and business leaders are working on
Explainable AI methods to explain model decisions to internal and external
stakeholders. Having an Explainable AI-driven interface provisioned for multiple business
stakeholders can help them answer critical business questions. For instance, a business
leader needs to be able to answer, "How do these model decisions impact business?" while
for IT and Operations, it is vital to know the answer to "How do I monitor and debug?"
Answering these questions for multiple business stakeholders enables employees and
businesses to adapt to AI and maximize value from it, by ensuring model transparency
and model compliance while adapting to changing environments and managing model
bias and drift.
Monitor
The monitor module is dedicated to monitoring the application in production (serving the
ML model). Several factors are at play in an ML system, such as application performance
(telemetry data, throughput, server request time, failed requests, error handling, and so
on), data integrity and model drift, and changing environments. The monitor module
should capture vital information from the system logs in production to track the ML
system's robustness. Let's look at the importance and role of three of the monitor
module's functionalities: data integrity, model drift, and application performance.
Data integrity
Ensuring the data integrity of an ML application includes checking incoming (input
data to the ML model) and outgoing (ML model prediction) data to ensure ML systems'
integrity and robustness. The monitor module ensures data integrity by inspecting the
volume, variety, veracity, and velocity of the data in order to detect outliers or anomalies.
Detecting outliers or anomalies prevents ML systems from having poor performance
and being susceptible to security attacks (for example, adversarial attacks). Data integrity
coupled with efficient auditing can facilitate the desired performance of ML systems to
derive business value.
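A very simple version of such an inspection is to validate each incoming record against bounds learned from the training data. The following is a hedged sketch — the features, values, and 3-sigma rule are illustrative choices, not a prescribed method:

```python
import numpy as np

def fit_bounds(train_col, k=3.0):
    """Learn mean +/- k standard deviations as acceptable bounds for a feature."""
    mu, sigma = float(np.mean(train_col)), float(np.std(train_col))
    return mu - k * sigma, mu + k * sigma

def check_record(record, bounds):
    """Return the names of features whose values fall outside the learned bounds."""
    return [name for name, value in record.items()
            if not (bounds[name][0] <= value <= bounds[name][1])]

# Toy training data for two weather features
train = {
    "Temperature_C": np.array([18.0, 21.0, 19.5, 22.0, 20.0]),
    "Humidity": np.array([60.0, 65.0, 62.0, 70.0, 68.0]),
}
bounds = {name: fit_bounds(col) for name, col in train.items()}

print(check_record({"Temperature_C": 20.5, "Humidity": 64.0}, bounds))  # []
print(check_record({"Temperature_C": 95.0, "Humidity": 64.0}, bounds))  # ['Temperature_C']
```

Records flagged this way can be logged, quarantined, or routed to an alert instead of silently degrading the model's predictions.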
Model drift
If model drift is not measured, the model's performance can easily become sub-par and
can hamper the business with poor decision making and customer service. For example, it
is hard to foresee changes or trends in data during a black swan event such as COVID-19.
Here is some news that made it to the headlines:
Hence, it is important to monitor model drift in any form, such as data drift, concept drift,
or any upstream data changes, in order to adapt to changing environments, serve
businesses and customers in the most relevant way, and generate maximum business value.
Application performance
It is critical to monitor application performance to foresee and prevent any potential
failures, since this ensures the robustness of ML systems. Here, we can monitor the
critical system logs and telemetry data of the production deployment target (for example,
Kubernetes or an on-premises server). Monitoring application performance can give us
key insights in real time, such as the server's throughput, latency, server request time,
number of failed requests or control-flow errors, and so on. There is no hard-and-fast
way of monitoring applications; depending on your business use case, your application
performance monitoring can be tailored to keep the system up and
running and generating business value.
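As a minimal illustration, request latency can be captured with a small timing wrapper around the scoring function. This is a sketch — in production these numbers would be forwarded to a telemetry backend rather than kept in memory:

```python
import time
from functools import wraps

def timed(fn):
    """Record the wall-clock latency (in seconds) of each call in fn.latencies."""
    latencies = []
    @wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            latencies.append(time.perf_counter() - start)
    wrapper.latencies = latencies
    return wrapper

@timed
def score(payload):
    time.sleep(0.01)  # stand-in for actual model inference
    return {"prediction": "no rain"}

for _ in range(5):
    score({"Temperature_C": 20.5, "Humidity": 64.0})

median = sorted(score.latencies)[len(score.latencies) // 2]
print(f"served {len(score.latencies)} requests, median latency {median * 1000:.1f} ms")
```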
In terms of the monitor component, we monitored data integrity, model drift, and
application performance. In the next section, we will analyze how to monitor the data of
the model and application.
Analyze
Analyzing your ML system in production in real time is key to understanding the
performance of your ML system and ensuring its robustness. Humans play a key role in
analyzing model performance and detecting subtle anomalies and threats. Hence, having a
human in the loop can introduce great transparency and explainability to the ML system.
We can analyze model performance to detect any biases or threats and to understand
why the model makes decisions in a certain pattern. We can do this by applying
advanced techniques such as data slicing, adversarial attack prevention techniques, or by
understanding local and global explanations. Let's see how we can do this in practice.
Data slicing
There are a great number of success stories surrounding ML in terms of improving
businesses and life in general. However, there is still room to improve data tools for
debugging and interpreting models. One key area of improvement is understanding why
models perform poorly on certain parts or slices of data and how we can balance their
overall performance. A slice is a part or a subset of a dataset. Data slicing can help us
understand the model's performance on different types of sub-datasets. We can split the
dataset into multiple slices or subsets and study the model's behavior on them.
For example, let's consider a hypothetical case where we have trained a random forest
model to classify whether a person's income is above or below $50,000. The model has been
trained on the UCI census data (https://archive.ics.uci.edu/ml/datasets/
Census-Income+%28KDD%29). The results of the model for slices (or subsets) of data can
be seen in the following table. This table suggests that the overall metrics may be considered
acceptable as the overall log loss is low for all the data (see the All row). Log loss is a widely
used metric for binary classification problems; it measures how close the predicted
probability is to the actual/true value, which is 0 or 1 in the case of binary classification. The
more the predicted probability diverges from the actual value, the higher the log loss.
However, the individual slices tell a different story:
Data slicing enables us to see subtle biases and unseen correlations to understand why a
model might perform poorly on a subset of data. We can avoid these biases and improve
the model's overall performance by training the model using balanced datasets that
represent all the data slices (for example, using synthetic data or by undersampling, and so
on) or by tuning hyperparameters of the models to reduce overall biases. Data slicing can
provide an overview of model fairness and performance for an ML system, and can also
help an organization optimize the data and ML models to reach optimal performance and
decent fairness thresholds. Data slicing can help build trust in the AI system by offering
transparency and explainability into data and model performance.
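The mechanics of data slicing can be sketched in a few lines: split a labeled evaluation set on a categorical column and compute the same metric, log loss here, per slice. The data below is invented for illustration and is not the UCI census result from the table:

```python
import pandas as pd
from sklearn.metrics import log_loss

# Invented evaluation data: true labels and the model's predicted probabilities
df = pd.DataFrame({
    "education":  ["HS", "HS", "HS", "PhD", "PhD", "PhD"],
    "label":      [0, 1, 0, 1, 1, 0],
    "pred_proba": [0.1, 0.8, 0.2, 0.9, 0.4, 0.6],
})

# The overall metric can hide slice-level problems
overall = log_loss(df["label"], df["pred_proba"], labels=[0, 1])
print(f"overall log loss: {overall:.3f}")

# The per-slice metric surfaces them
for value, slice_df in df.groupby("education"):
    loss = log_loss(slice_df["label"], slice_df["pred_proba"], labels=[0, 1])
    print(f"slice education={value}: log loss {loss:.3f}")
```

In this toy example, the PhD slice has a noticeably higher log loss than the HS slice even though the overall number looks acceptable — exactly the pattern described above.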
Note
To get a comprehensive overview of data slicing and automated data slicing
methods, take a look at Automated Data Slicing for Model Validation:
A Big data - AI Integration Approach at https://arxiv.org/
pdf/1807.06068.pdf.
Figure 11.7 – Global explanation of the RNN model (using RNNVis) to understand the process as a
whole (how hidden states, layers, and so on impact model outputs and predictive processes)
Source: https://blog.acolyer.org/2019/02/25/understanding-hidden-
memories-of-recurrent-neural-networks/
Here, for example, the co-clustering visualization shows different word clouds for words
with positive and negative sentiments. Using global explanations, we can simulate the
model's predictive process and understand correlations with regards to parameters or
the model's architecture (for example, hidden states and layers). Global explanations
offer two perspectives of explainability: the high-level model process and the predictive
explanations. On the other hand, local explanations give insights into single predictions.
Both explanations are valuable if we wish to understand the model's performance and
validate it comprehensively.
In the analyze component, we can analyze the model's performance using the techniques
we have explored, such as data slicing, Bias and threat detection, and local and global
explanations. In the next section, we will learn how to govern and control ML systems to
efficiently guide them toward achieving operational or business objectives.
Govern
An ML system's efficacy depends on the way it is governed to achieve maximum
business value. A great part of system governance involves quality assurance and control,
as well as model auditing and reporting, to ensure end-to-end traceability and
compliance with regulations. Based on monitoring and analyzing the model's performance,
we can control and govern ML systems. Governance is driven by smart alerts and
actions to maximize business value. Let's look into how alerts and actions, model quality
assurance and control, and model auditing and reports orchestrate the ML system's
governance.
Model performance alerts are generated when the model experiences drift, anomalous
feature distributions, or bias. When such events are recorded, the system administrators or
developers are alerted via email, SMS, push notifications, or voice alerts. These alert
actions (automated or semi-automated) can be used to mitigate system performance
deterioration. Depending on the situation and need, some possible actions can be
invoked, such as the following:
• Understand and validate the data's statistical relations (for example, mean, median,
mode, and so on) for training, testing, and inferring data.
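The action above — validating statistical relations between training and inference data — can be automated as a small check. The following is a sketch; the 10% relative tolerance is an arbitrary illustrative choice:

```python
import numpy as np

def stats_summary(values):
    return {"mean": float(np.mean(values)), "median": float(np.median(values))}

def drifted(train_values, inference_values, rel_tol=0.10):
    """Flag a feature whose mean or median moved by more than rel_tol (relative)."""
    train_s, inf_s = stats_summary(train_values), stats_summary(inference_values)
    return any(abs(inf_s[k] - train_s[k]) > rel_tol * (abs(train_s[k]) + 1e-9)
               for k in train_s)

train_temp = np.array([18.0, 20.0, 19.0, 21.0, 20.0])
ok_temp    = np.array([19.0, 20.5, 18.5, 21.0])
hot_temp   = np.array([28.0, 30.0, 29.0, 31.0])

print(drifted(train_temp, ok_temp))   # False -> no alert
print(drifted(train_temp, hot_temp))  # True  -> raise an alert/action
```

When the check fires, the governance layer can trigger the alert actions described above, such as notifying the administrator or kicking off a retraining job.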
Model assessments based on auditing and reporting will ensure healthy, transparent, and
robust governance mechanisms for organizations and enable them to have end-to-end
traceability in order to comply with the regulators. Having such mechanisms will help save
organizations a great amount of time and resources and enable efficiency in interactions
with the regulators.
Summary
In this chapter, we learned about the key principles for monitoring an ML system. We
explored some common monitoring methods and the Explainable Monitoring Framework
(including the monitor, analyze, and govern stages). We then explored the concepts of
Explainable Monitoring thoroughly.
In the next chapter, we will delve into a hands-on implementation of the Explainable
Monitoring Framework. Using this, we will build a monitoring pipeline in order to
continuously monitor the ML system in production for the business use case (predicting
weather at the port of Turku).
The next chapter is quite hands-on, so buckle up and get ready!
12
Model Serving and Monitoring
In this chapter, we will reflect on the need to serve and monitor machine learning (ML)
models in production and explore different means of serving ML models for users or
consumers of the model. Then, we will revisit the Explainable Monitoring framework
from Chapter 11, Key Principles for Monitoring Your ML System, and implement it for
the business use case we have been solving using MLOps to predict the weather. The
implementation of an Explainable Monitoring framework is hands-on. We will infer the
deployed API and monitor and analyze the inference data using drifts (such as data drift,
feature drift, and model drift) to measure the performance of an ML system. Finally, we
will look at several concepts for governing ML systems to maintain robust performance
and drive continuous learning and delivery.
Let's start by reflecting on the need to monitor ML in production. Then, we will move on
to explore the following topics in this chapter:
In a typical scenario (in on-demand mode), a model is served as a service for users to
consume, as shown in Figure 12.2. Then, an external application on a machine or a human
makes a query to the prediction or ML service using their data. The ML service, upon
receiving a request, uses a load balancer to route the request to an available resource
(such as a container or an application) within the ML application. The load balancer also
manages resources within the ML service to orchestrate and generate new containers or
resources on demand. The load balancer redirects the query from the user to the model
running in a container within the ML application to get the prediction. On getting the
prediction, the load balancer returns it to the external application on the machine, or to
the human, that made the request. In this
way, the ML service is able to serve its users. The ML system orchestrates with the model
store or registry to keep itself updated with either the latest or best-performing models in
order to serve the users in the best manner. In comparison to this typical scenario where
users make a query, there is another use case where the model is served as a batch service.
One of the key advantages of batch processing is that unlike a REST API-based service, a
batch service might require lighter or less infrastructure. Writing a batch job is easier for a
data scientist compared to deploying an online REST service. This is because the data scientist
just needs to train a model or deserialize a trained model on a machine and perform batch
inference on a batch of data. The results of batch inference can be stored in a database as
opposed to sending responses to users or consumers. However, one major disadvantage is the
high latency and it not being in real time. Typically, a batch service can process hundreds or
thousands of samples at once. A series of tests can be used to determine the optimal batch size
to arrive at an acceptable latency for the use case. Typical batch sizes are powers of 2, such
as 32, 64, 128, or 512. Batch inference can be scheduled periodically and can serve many use cases
where latency is not an issue. One such example is discussed next.
A real-world example
One real-world example is a bank extracting information from batches of
text documents. A bank receives thousands of documents a day from its
partner institutions. It is not possible for a human agent to read through all
of them and highlight any red flags in the operations listed in the documents.
Batch inferencing is used to extract named entities and red flags from all the
documents received by the bank in one go. The results of the batch inference or
serving are then stored in a database.
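A batch job of this kind can be sketched in a few lines. Here, a toy model trained in place stands in for a deserialized production model, and an in-memory SQLite table stands in for the results database:

```python
import sqlite3
import numpy as np
from sklearn.linear_model import LogisticRegression

# Stand-in for "deserialize a trained model": train a tiny one in place
rng = np.random.default_rng(42)
X_train = rng.normal(size=(100, 3))
y_train = (X_train[:, 0] > 0).astype(int)
model = LogisticRegression().fit(X_train, y_train)

# A day's worth of incoming records, processed in fixed-size batches
incoming = rng.normal(size=(1000, 3))
batch_size = 128

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE predictions (row_id INTEGER, prediction INTEGER)")

for start in range(0, len(incoming), batch_size):
    batch = incoming[start:start + batch_size]
    preds = model.predict(batch)
    conn.executemany(
        "INSERT INTO predictions VALUES (?, ?)",
        [(start + i, int(p)) for i, p in enumerate(preds)],
    )
conn.commit()

(count,) = conn.execute("SELECT COUNT(*) FROM predictions").fetchone()
print(f"stored {count} predictions")
```

A scheduler (for example, a cron job or a pipeline trigger) would then run such a script periodically, with no REST infrastructure required.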
A real-world example
Consider a chatbot serving human customers to book flight tickets. It performs
contextual inference to serve human users.
A real-world example
Consider a social media company that has millions of users. The company uses
a single or common ML model for the recommender system to recommend
newsfeed articles or posts to users. As the volume of requests is high in order
to serve many users, it cannot depend on a REST API-based ML system (as
it is synchronous). A streaming solution is better as it provides asynchronous
inference for the company to serve its users. When a user logs into their
application or account hosted on a machine (such as a social media company
server), the application running on their machine infers the ML model (that is,
the recommender system) via a streaming service for recommendations for the
user newsfeed. Likewise, thousands of other users log in at the same time. The
streaming service can serve all of these users seamlessly. Note that this wouldn't
have been possible with the REST API service. By using a streaming service for
the recommender system model, the social media company is able to serve its
high volume of users in real time, avoiding significant lags.
• Problem context: You work as a data scientist in a small team with three other data
scientists for a cargo shipping company based in the port of Turku in Finland. 90%
of the goods imported into Finland arrive via cargo shipping at various ports across
the country. For cargo shipping, weather conditions and logistics can be challenging
at times. Rainy conditions can distort operations and logistics at the ports, which
can affect supply chain operations. Forecasting rainy conditions in advance
allows us to optimize resources such as human resources, logistics, and transport
resources for efficient supply chain operations at ports. Business-wise, forecasting
rainy conditions in advance enables ports to reduce their operational costs by up
to approximately 20% by enabling the efficient planning and scheduling of human
resources, logistics, and transport resources for supply chain operations.
So far, we have developed ML models and deployed them as REST API endpoints inside a
Kubernetes cluster at http://20.82.202.164:80/api/v1/service/weather-
prod-service/score (the address of your endpoint will be different).
Next, we will replicate a real-life inference scenario for this endpoint. To do this, we will use
the test dataset we had split and registered in Chapter 4, Machine Learning Pipelines, in the
Data ingestion and feature engineering section. Go to your Azure ML workspace and download
the test_data.csv dataset (which was registered as test_dataset) from the Datasets
section or the Blob storage that is connected to your workspace, as shown in Figure 12.5:
Figure 12.5 – Downloading the validation dataset (which was previously split and registered)
Get ready to infer the test_data.csv data with the REST API endpoint or ML service.
Go to the 12_Model_Serving_Monitoring folder and place the downloaded dataset
(test_data.csv) inside the folder. Next, access the inference.py file:
import json
import requests
import pandas as pd

data = pd.read_csv('test_data.csv')
data = data.drop(columns=['Timestamp', 'Location', 'Future_weather_condition'])

url = 'http://20.82.202.164:80/api/v1/service/weather-prod-service/score'
headers = {'Content-Type': 'application/json'}

for i in range(len(data)):
    inference_data = data.values[i].tolist()
    inference_data = json.dumps({"data": [inference_data]})
    r = requests.post(url, data=inference_data, headers=headers)
    print(str(i) + str(r.content))
5. Finally, we will loop through the data array by inferring each element in the array
with the endpoint. To run the script, simply replace 'url' with your endpoint and
run the following command in the Terminal (from the folder location) to execute
the script:
>> python3 inference.py
The running script will take around 10–15 minutes to infer all of the elements of the
inference data. After this, we can monitor the inference and analyze the results of the
inferring data. Let's monitor and analyze this starting with data integrity.
• Data integrity:
- Registering the target dataset
- Creating the data drift monitor
- Performing data drift analysis
- Performing feature drift analysis
• Model drift
• Application performance
Data integrity
To monitor data integrity for inference data, we need to monitor data drift and feature drift
to see whether there are any anomalous changes in the incoming data or any new patterns:
• Data drift: This is when the properties of the independent variables change. For
example, data changes can occur due to seasonality or the addition of new products
or changes in consumer desires or habits, as it did during the COVID-19 pandemic.
• Feature drift: This is when properties of the feature(s) change over time. For example,
the temperature is changing due to changing seasons or seasonality, that is, in summer,
the temperature is warmer compared to temperatures during winter or autumn.
To monitor drifts, we will measure the difference between the baseline dataset and the
target dataset. The first step is to define the baseline dataset and the target dataset. This
varies from use case to use case; we will use the following datasets as the baseline and
target datasets:
We will use the training dataset that we previously used to train our models as the baseline
dataset. This is because the model used in inference knows the patterns in the training
dataset very well. The training dataset is ideal for comparing how inference data changes
over time. We will compile all the inference data collected during inference into the
inference dataset and compare these two datasets (that is, the baseline dataset and the
target dataset) to gauge data and feature drifts for the target dataset.
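One common way to quantify the difference between a baseline and a target dataset is the Population Stability Index (PSI). The sketch below is a simplified stand-in for what a managed drift monitor computes internally; the distributions and the usual 0.1/0.2 thresholds are illustrative:

```python
import numpy as np

def psi(baseline, target, bins=10):
    """Population Stability Index between two 1-D feature samples."""
    edges = np.histogram_bin_edges(baseline, bins=bins)
    base_pct = np.histogram(baseline, bins=edges)[0] / len(baseline)
    targ_pct = np.histogram(target, bins=edges)[0] / len(target)
    # Avoid division by zero / log(0) for empty bins
    base_pct = np.clip(base_pct, 1e-6, None)
    targ_pct = np.clip(targ_pct, 1e-6, None)
    return float(np.sum((targ_pct - base_pct) * np.log(targ_pct / base_pct)))

rng = np.random.default_rng(7)
baseline = rng.normal(20.0, 2.0, size=5000)  # e.g. training-time Temperature_C
stable   = rng.normal(20.1, 2.0, size=5000)  # inference data, no real drift
shifted  = rng.normal(25.0, 2.0, size=5000)  # inference data after a shift

print(f"PSI (stable):  {psi(baseline, stable):.3f}")   # well below 0.1
print(f"PSI (shifted): {psi(baseline, shifted):.3f}")  # far above 0.2 -> drift
```

A rule of thumb often quoted in industry is that PSI below 0.1 means no significant change, while PSI above 0.2 warrants investigation; a managed monitor applies a similar comparison per feature on a schedule.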
1. Go to the Datasets section and click on Create dataset. Then, select the From
datastore option, as shown in Figure 12.6:
3. After selecting the modeldata datastore, you will be prompted to mention the
path of the file(s). Click on the Browse button to specify the path. You will be
presented with a list of files in your modeldata datastore. Go to the location where
you can spot an input.csv file. You can find this in the folder of your support
vector classifier model, which is inside the folder with your service name
(for example, prod-webservice). Then, go into the subfolders (the default,
inputs, and folders structured with dates), and go to the folder of your current date
to find the input.csv file. Select the input.csv file, as shown in Figure 12.8:
Figure 12.8 – Selecting path of the input.csv file (the Inputs-Inference-data registration)
4. After selecting the input.csv file, click on the Save button and change the last
part to include /**/inputs*.csv (as shown in Figure 12.9). This is an important
step that will refer to all of the input.csv files in the inputs folder dynamically.
Without referencing all of the input.csv files, we will confine the path to only
one input.csv file (which was selected previously in Figure 12.8). By referring to
all of the input.csv files, we will compile all of the input data (the inputs.csv
files) into the target dataset (for example, Inputs-Inference-Data):
Figure 12.9 – Referencing the path to dynamically access all the input.csv files
As shown in Figure 12.10, we can configure the settings and preview the dataset.
Point to the correct column names by selecting the Column headers dropdown and
then selecting Combine headers from all files. Check for the correct column names
(for example, Temperature_C and Humidity). After selecting the appropriate
column names, click on the Next button to advance to the next window. Select the
right schema by selecting all the columns you would like to monitor, along with
their data types, as shown in Figure 12.11:
Make sure that you select the Timestamp and Date properties in the $aml_dc_
scoring_timestamp column as these contain the timestamps of the inference. This
step is important: only a dataset in time-series format can be used (by the Azure drift
model) to compute drift. After selecting
the right schema by selecting all of the columns, click on Next to confirm all of the
necessary details (such as the name of the dataset, the dataset version, its path, and
more).
6. Click on the Create button to create the dataset. When your dataset has been
created successfully, you can view the dataset from the Dataset section in your
Azure ML workspace. Go to the Datasets section to confirm your dataset has been
created. Identify and click on your created dataset. Upon clicking, you will be able to
view the details of your registered inference dataset, as shown in Figure 12.12:
You can see all the essential attributes of your registered dataset in Figure 12.12. It is
important to note that the relative path is dynamic and references all
of the input.csv files. The result of referencing all of the input files is shown in Files
in dataset. This will show multiple files in the long run (that is, it shows 6 files after 6
days of registering the dataset). It might be 1 file for you, as you have only just registered
the dataset. With the passing of days or time, the number of input.csv files will keep
increasing as a new input.csv file is created in the datastore in Blob storage each day.
Congratulations on registering the inference data. Next, we will configure the data drift
monitor.
1. Go to your workspace and access the Datasets section. Then, select Dataset
Monitors (it is in preview mode at the moment, as this feature is still being tested).
Click on Create, as shown in Figure 12.13:
2. Upon selecting the Create button, you will be prompted to create a new data drift
monitor. Select the target dataset of your choice.
3. In the Registering the target dataset section, we registered the inputs.csv files as
Input-InferenceData. Select your inference dataset as the target dataset, as
shown in Figure 12.14:
Figure 12.15 – Select the baseline dataset and configure the monitor settings
5. After selecting the baseline dataset, you will be prompted to set up monitor settings,
such as the name of data drift monitor (for example, weather-Data-Drift), the
compute target to run data drift jobs, the frequency of data drift jobs (for example,
once a day), and the threshold for monitoring drift (for example, 60). You will also
be asked to provide an email address of your choice to receive notifications when the
data drift surpasses the set threshold.
6. After configuring the settings, create a data drift monitor. Go to your newly created
data drift (in the Datasets section, click on Dataset Monitors to view your drift
monitors), as shown in Figure 12.16:
7. Go to the Compute section, access the Compute clusters tab, and create a new
compute resource (for example, drift-compute – Standard_DS_V2 machine), as
shown in Figure 12.17:
9. Click on Analyze existing data and submit a run to analyze any existing inference
data, as shown in Figure 12.18:
• Feature drift analysis: You can assess individual features and their drift by scrolling
down to the Feature details section and selecting a feature of your choice. For
example, we can see the Temperature_C distribution over time feature, as shown in
Figure 12.20:
Model drift
Monitoring model drift enables us to keep a check on our model performance in
production. Model drift is where the properties of dependent variables change. For
example, in our case, this is the classification results of the weather (that is, rain or no
rain). Just as we set up data drift in the Creating the data drift monitor section, we can also
set up a model drift monitor to monitor model outputs. Here are the high-level steps to set
up model drift:
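Conceptually, the check mirrors data drift but operates on the model's outputs: compare the distribution of predictions in a baseline window against a recent window. Here is a sketch with made-up prediction logs (the 0.2 threshold is illustrative):

```python
import numpy as np

def prediction_rate_shift(baseline_preds, recent_preds):
    """Absolute change in the positive-class ('rain') prediction rate."""
    return abs(float(np.mean(recent_preds)) - float(np.mean(baseline_preds)))

# Made-up binary prediction logs (1 = rain, 0 = no rain)
baseline_preds = np.array([0, 1, 0, 0, 1, 0, 0, 1, 0, 0])  # 30% rain
recent_preds   = np.array([1, 1, 1, 0, 1, 1, 0, 1, 1, 1])  # 80% rain

shift = prediction_rate_shift(baseline_preds, recent_preds)
print(f"prediction rate shift: {shift:.2f}")
if shift > 0.2:  # illustrative threshold
    print("model drift suspected -> trigger analysis / retraining")
```

A managed model drift monitor applies this kind of comparison on the collected model outputs on a schedule, just as the data drift monitor does on the inputs.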
Application performance
You have deployed the ML service in the form of REST API endpoints, which can be
consumed by users. We can monitor these endpoints using Azure Application Insights
(enabled by Azure Monitor). To monitor our application performance, access the
Application Insights dashboard, as shown in Figure 12.23. Go to the Endpoints section
in your Azure ML service workspace and select the REST API endpoint your ML model
is deployed on. Click on Application Insights url to access the Application Insights
endpoint connected to your REST API endpoint:
From the Application Insights Overview section, we can monitor and analyze critical
application performance information for your ML service. Additionally, we can monitor
information such as failed requests, server response time, server requests, and availability
from the Overview section, as shown in Figure 12.24:
Based on these metrics and this information, we can monitor the application
performance. Ideally, we should not have any failed requests or long server response times.
To get deeper insights into the application performance, we can access the application
dashboard (by clicking on the button at the top of the screen), as shown in Figure 12.25:
Data slicing
For our use case, we will leave out data slicing as we do not have much variety in terms
of demographics or samples within the data (for example, sex, age groups, and more). To
measure the fairness of the model, we will focus on bias detection.
Summary
In this chapter, we learned about the key principles of serving ML models to our users
and monitoring them to maximize business value. We explored the different
means of serving ML models for users or consumers of the model and implemented the
Explainable Monitoring framework for a hypothetical business use case and deployed
a model. We carried out this hands-on implementation of an Explainable Monitoring
framework to measure the performance of ML systems. Finally, we discussed the need for
governing ML systems to ensure the robust performance of ML systems.
We will further explore the governance of ML systems and continual learning concepts in
the next and final chapter!
13
Governing the ML System for Continual Learning
In this chapter, we will reflect on the need for continual learning in machine learning
(ML) solutions. Adaptation is at the core of machine intelligence. The better the
adaptation, the better the system. Continual learning focuses on the external environment
and adapts to it. Enabling continual learning for an ML system can reap great benefits. We
will look at what is needed to successfully govern an ML system as we explore continuous
learning and study the governance component of the Explainable Monitoring Framework,
which helps us control and govern ML systems to achieve maximum value.
We will delve into the hands-on implementation of governance by enabling alert and
action features. Next, we will look into ways of assuring quality for models and controlling
deployments, and we'll learn the best practices to generate model audits and reports. Lastly,
we will learn about methods to enable model retraining and maintain CI/CD pipelines.
Let's start by reflecting on the need for continual learning and go on to explore the
following topics in the chapter:
Continual learning
Continual learning is built on the principle of continuously learning from data, human
experts, and the external environment. With adaptation at its core, it enables lifelong
learning: an ML system becomes more intelligent over time by monitoring and learning
from its environment and from the human experts assisting it. Continual learning can be
a powerful add-on to an ML system, allowing you to realize the maximum potential of an
AI system over time, and it is highly recommended. Let's have a look at an example:
Figure 13.1 – A loan issuing scenario – a traditional system versus an ML system assisted by a human
• Adaptation: In the most straightforward applications, the data distribution may stay
stable as data keeps coming in. However, many applications, such as recommendation
or anomaly detection systems, experience dynamically changing data drift as data
keeps flowing. In these cases, continual learning is important for adapting and
making accurate predictions. Hence, adapting to the changing nature of the data and
the environment is important.
• Scalability: A white paper published by IDC (https://www.seagate.com/
files/www-content/our-story/trends/files/idc-seagate-
dataage-whitepaper.pdf) suggests that by 2025, the rate of data generation
will grow to 160 ZB/year, and that we will not be able to store all of it. The paper
predicts that we will only be able to store between 3% and 12%. Data needs to be
processed on the fly; otherwise, it will be lost since the storage infrastructure cannot
keep up with the data that is produced. The main trick here is to process incoming
data once, store only the essential information, and then get rid of the rest.
• Relevance: Predictions from ML systems need to remain relevant and adapt to
changing contexts. Continual learning is needed to keep ML systems highly relevant
and valuable in changing contexts and environments.
• Performance: Continual learning will enable high performance for the
ML system, since it powers the ML system to be relevant by adapting to the
changing data and environment. In other words, being more relevant will improve
the performance of the ML system, for example, in terms of accuracy or other
metrics, by providing more meaningful or valuable predictions.
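The scalability point above, processing incoming data once, storing only the essential information, and discarding the rest, can be illustrated with a streaming-statistics routine such as Welford's algorithm. This is a minimal sketch of the idea, not code from the book's project:

```python
# Streaming statistics with Welford's algorithm: each record is processed
# exactly once, only a small summary is kept, and the raw value is discarded.
class RunningStats:
    def __init__(self):
        self.n = 0          # number of observations seen so far
        self.mean = 0.0     # running mean
        self.m2 = 0.0       # running sum of squared deviations from the mean

    def update(self, x):
        # Incremental (one-pass) update; no raw data needs to be stored.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def variance(self):
        # Sample variance from the stored summary alone.
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

stats = RunningStats()
for reading in [21.0, 22.5, 19.8, 23.1, 20.4]:  # e.g. streamed sensor data
    stats.update(reading)                        # raw reading can now be dropped

print(round(stats.mean, 2), round(stats.variance(), 2))
```

The same one-pass pattern underlies many online and continual learning methods: the model (here, just a mean and variance) is updated as data flows by, without ever storing the full stream.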
What is an alert?
An alert is a scheduled task that runs in the background to monitor an application and
check whether specific conditions are detected. An alert is driven by three things:
We can create alerts based on application performance to monitor aspects such as the following:
An important area of governing ML systems is dealing with errors, so let's turn our
attention to error handling.
These edge cases or errors are common and can be addressed in the application by using
try and except blocks. The strategy is to mitigate situations where your ML
system looks basic or naive to the user; for example, a chatbot that sends error
messages in the chat. In such cases, the cost of errors is high, and users will lose trust
in the ML system.
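As a generic illustration of this strategy, inference can be wrapped so that the user receives a graceful fallback while the real exception is logged for developers. This is a hypothetical sketch; `safe_predict`, `broken_model`, and the fallback text are illustrative names, not part of the book's codebase:

```python
import logging

def safe_predict(model_fn, features, fallback="Sorry, I could not process that."):
    """Return the model's prediction, or a graceful fallback on any error."""
    try:
        return model_fn(features)
    except Exception as exc:
        # Developers see the real error in the logs; the user never does.
        logging.error("Inference failed: %s", exc)
        return fallback

def broken_model(features):
    # Stand-in for a model that fails on unexpected input.
    raise ValueError("unexpected input shape")

print(safe_predict(broken_model, [1, 2, 3]))
```

The user-facing response stays friendly no matter what the model raises, which is exactly the trust-preserving behavior described above.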
We will implement some custom exception and error handling for the business use case
we have been building, and we will implement actions based on the alerts that are
generated. Let's get started:
2. This Instrumentation Key can be accessed from your Application Insights, which
should be connected to the ML application, as shown in the following screenshot:
    except Exception as e:
        tc.track_event('FileNotFound',
                       {'error_message': str(e)},
                       {'ErrorCode': 101})
    model_onnx = os.path.join(os.getenv('AZUREML_MODEL_DIR'),
                              'support-vector-classifier/2/svc.onnx')
    try:
        model = onnxruntime.InferenceSession(model_onnx, None)
    except Exception as e:
        tc.track_event('FileNotFound',
                       {'error_message': str(e)},
                       {'ErrorCode': 101})
    input_name = model.get_inputs()[0].name
    label_name = model.get_outputs()[0].name
    # variables to monitor model input and output data
    inputs_dc = ModelDataCollector(
        "Support vector classifier model",
        designation="inputs",
        feature_names=["Temperature_C", "Humidity",
                       "Wind_speed_kmph", "Wind_bearing_degrees",
                       "Visibility_km", "Pressure_millibars",
                       "Current_weather_condition"])
    prediction_dc = ModelDataCollector(
        "Support vector classifier model",
        designation="predictions",
        feature_names=["Future_weather_condition"])
Two custom events are tracked in the init function to monitor whether a
FileNotFound error occurs when we load the scaler and model artifacts. If
a file is not found, the tc.track_event() function will log the error message
that's generated by the exception and tag it with the custom code 101.
4. Likewise, we will implement some other custom events – that is, ValueNotFound,
OutofBoundsException, and InferenceError – in the run function:
@input_schema('data', NumpyParameterType(
    np.array([[34.927778, 0.24, 7.3899, 83, 16.1000, 1016.51, 1]])))
@output_schema(NumpyParameterType(np.array([0])))
def run(data):
    try:
        inputs_dc.collect(data)
    except Exception as e:
        tc.track_event('ValueNotFound',
                       {'error_message': str(e)},
                       {'ErrorCode': 201})
We use try and except to collect incoming data using the model data collector
function. This collects the incoming data and stores it in the blob storage
connected to the Azure ML service. If the incoming data contains anomalous
or missing values, an exception is raised. We will raise a ValueNotFound
event using the track_event function so that we can log the exception message
and a custom code (in this case, a custom number, 201, is assigned to track
the error). After collecting the incoming data, we will attempt to scale the data
before inference:
    try:
        # scale incoming data
        data = scaler.transform(data)
    except Exception as e:
        tc.track_event('ScalingException',
                       {'ScalingError': str(e)},
                       {'ErrorCode': 301})
Now, we will use try and except again to make sure the inference completes
without any exceptions. Let's see how we can handle exceptions in this case. Note
that we are deliberately accessing element number 2 of the list returned by
model.run. This causes an error during the model's inference, as we are referring
to a nonexistent element of the list:
    try:
        # model inference (index 2 is deliberately out of range)
        result = model.run([label_name],
                           {input_name: data.astype(np.float32)})[2]
    except Exception as e:
        tc.track_event('InferenceError',
                       {'error_message': str(e)},
                       {'ErrorCode': 401})
        output = 'error'
        return output
If an exception occurs while the model performs inference, we can use the
track_event() function to generate a custom event called InferenceError.
This will be logged in Application Insights with an error message and a custom
error code of 401. This way, we can log custom errors and exceptions in
Application Insights and generate actions based on them.
Now, let's look at how to investigate these errors in Application Insights using the error
logs and generate actions for it.
Setting up actions
We can set up alerts and actions based on the exception events that we created previously
(in the Dealing with errors section). In this section, we will set up an action in the form of
an email notification based on an alert that we've generated. Whenever an exception or
alert is generated in Application Insights, we will be notified via an email. Then, we can
investigate and solve it.
Let's set up an action (email) upon receiving an alert by going to Application Insights,
which should be connected to your ML system endpoint. You can access Application
Insights via your Azure ML workspace. Let's get started:
1. Go to Endpoints and check for Application Insights. Once you've accessed the
Application Insights dashboard, click on Transaction search, as shown in the
following screenshot, to check for your custom event logs (for example, inference
exception):
2. You can check the logs for the custom events that were generated when exceptions
and errors occurred, and then set up alerts and actions for these custom
events. To set up an alert and action, go to the Monitoring > Alerts section and
click on New alert rule, as shown in the following screenshot:
3. Here, you can create conditions for actions based on alerting. To set up a condition,
click on Add condition. You will be presented with a list of signals or log events
you can use to make conditions. Select InferenceError, as shown in the following
screenshot:
4. After selecting the signal or event of your choice, you will get to configure its
condition logic, as shown in the following screenshot. Configure the condition
by setting up a threshold for it. In this case, we will provide a threshold of 400 as
the error raises a value of 401 (since we had provided a custom value of 401 for
the InferenceError event). When an inference exception occurs, it raises an
InferenceError with a value above 400 (401, to be precise):
5. After setting up the threshold, you will be asked to configure other actions, such
as running an Automation Runbook, Azure Function, Logic App, or Secure
Webhook, as shown in the following screenshot. For now, we will not configure these
actions, but it is good to know that they exist, since we can run scripts or
applications as a backup mechanism to automate error debugging:
6. Finally, we will create a condition. Click Review and create to create the condition,
as shown in the preceding screenshot. Once you have created this condition, you
will see it in the Create alert rule panel, as shown in the following screenshot. Next,
set up an action by clicking on Add action groups and then Create action group:
7. Provide an email address so that you can receive notifications, as shown in the
following screenshot. Here, you can name your notification (in the Alert rule name
field) and provide the necessary information to set up an email alert action:
After providing all the necessary information, including an email, click on the Review +
Create button to configure the action (an email based on an error). Finally, provide alert
rule details such as Alert rule name, Description, and Severity, as shown in the following
screenshot:
8. Click on Create alert rule to create an email alert based on the error (for example,
InferenceError). With that, you have created an alert, so now it's time to test it.
Go to the 13_Govenance_Continual_Learning folder and access the test_
inference.py script (replace the URL with your endpoint link). Then, run the
script by running the following command:
python3 test_inference.py
9. Running the script will output an error. Stop the script after performing some
inferences. Within 5-10 minutes of the error, you will be notified of the error via
email, as shown in the following screenshot:
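For reference, a test script like test_inference.py typically just posts feature rows to the scoring endpoint. The sketch below is an assumption about its shape, not the book's actual script; the endpoint URL is a placeholder you must replace, and the payload format mirrors the 'data' key expected by the run() function's input_schema:

```python
import json

# Placeholder: replace with your deployed endpoint link.
scoring_url = "http://<your-endpoint>/score"

def build_request(features):
    # The service expects a JSON body with a 'data' key holding a batch of
    # feature rows, matching the input_schema of the run() function.
    return json.dumps({"data": [features]})

body = build_request([34.927778, 0.24, 7.3899, 83, 16.1, 1016.51, 1])
print(body)

# To actually call the service (requires the 'requests' package):
# import requests
# response = requests.post(scoring_url, data=body,
#                          headers={"Content-Type": "application/json"})
# print(response.status_code, response.text)
```

Running such requests in a loop is enough to trigger the InferenceError event repeatedly and fire the email alert configured above.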
Next, let's look at how to ensure we have quality assurance for models and can control
them in order to maximize business value.
• Enable continual learning by retraining your models and evaluating their performance.
• Evaluate the performance of all the models on a new dataset at periodic intervals.
• Raise an alert when an alternative model starts giving better performance or greater
accuracy than the existing model.
• Maintain a registry of models containing their latest performance details and reports.
• Maintain end-to-end lineages of all the models to reproduce them or explain their
performance to stakeholders.
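The champion/challenger check from the list above, raising an alert when an alternative model starts outperforming the deployed one, can be sketched as follows. The models here are stand-in callables with hypothetical names; in practice they would be loaded from the model registry and scored on a fresh evaluation dataset:

```python
def accuracy(model_fn, rows, labels):
    # Fraction of rows where the model's prediction matches the label.
    correct = sum(1 for row, y in zip(rows, labels) if model_fn(row) == y)
    return correct / len(labels)

def check_for_better_model(champion, challenger, rows, labels, margin=0.0):
    # Evaluate both models on the same fresh data; alert if the
    # challenger beats the champion by more than the margin.
    champ_acc = accuracy(champion, rows, labels)
    chall_acc = accuracy(challenger, rows, labels)
    alert = chall_acc > champ_acc + margin
    return champ_acc, chall_acc, alert

# Toy evaluation data and stand-in models for illustration only.
rows = [[0], [1], [2], [3]]
labels = [0, 1, 1, 1]
champion = lambda row: 0                           # always predicts 0
challenger = lambda row: 1 if row[0] > 0 else 0    # matches every label here

champ_acc, chall_acc, alert = check_for_better_model(
    champion, challenger, rows, labels)
print(champ_acc, chall_acc, alert)
```

A small margin parameter avoids swapping models on statistical noise; the alert itself could feed the same Application Insights action group configured earlier.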
Data audit
Data drives many of the decisions made by ML systems. Due to this, auditors
need to consider data for auditing and reporting by inspecting the training data,
testing data, inference data, and monitoring data. This is essential, and having
end-to-end traceability to track the use of data (for example, which dataset was
used to train which model) is needed for MLOps. Having a Git for Data type of
mechanism that versions data can enable auditors to reference, examine, and
document the data.
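The core of such a Git for Data mechanism is content-addressing: fingerprinting the exact bytes of a dataset so that a model's record can reference precisely which data trained it. Dedicated tools (DVC, for example) provide this end to end; the sketch below only illustrates the idea, and the dataset bytes and record fields are made up for the example:

```python
import hashlib

def dataset_fingerprint(raw_bytes):
    # A stable content hash: the same bytes always yield the same tag,
    # so a stored tag uniquely identifies the training data version.
    return hashlib.sha256(raw_bytes).hexdigest()[:12]

# Illustrative dataset contents.
train_csv = b"Temperature_C,Humidity\n21.0,0.64\n22.5,0.58\n"
version_tag = dataset_fingerprint(train_csv)

# Store the tag alongside the trained model's metadata so an auditor can
# later verify exactly which data produced the model.
model_record = {"model": "support-vector-classifier:2",
                "data_version": version_tag}
print(model_record["data_version"])
```

Any change to the dataset, even a single byte, changes the fingerprint, which is what gives auditors a tamper-evident link between data and model.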
• If a build is broken, the team should implement a fix-it-ASAP policy.
• Integrate automated acceptance tests.
• Require pull requests.
• Peer code review each story or feature.
• Audit system logs and events periodically (recommended).
• Regularly report metrics visibly to all team members (for example, via a Slack bot
or email notifications).
By implementing these practices, we can avoid high failure rates and make the CI/CD
pipeline robust, scalable, and transparent for all the team members.
Summary
In this chapter, we learned about the key principles of continual learning in ML solutions.
We learned about Explainable Monitoring (the governance component) by implementing
hands-on error handling and configuring actions to alert developers of ML systems using
email notifications. Lastly, we looked at ways to enable model retraining and how to
maintain the CI/CD pipeline. With this, you have been equipped with the critical skills to
automate and govern MLOps for your use cases.
Congratulations on finishing this book! The world of MLOps is constantly evolving for
the better. You are now equipped to help your business thrive using MLOps. I hope you
enjoyed reading and learning by completing the hands-on MLOps implementations. Go
out there and be the change you wish to see. All the best with your MLOps endeavors!