Five Principles for
Deploying and Managing
Linux in the Cloud
With Azure
Sam R. Alapati
The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Five Principles for
Deploying and Managing Linux in the Cloud, the cover image, and related trade dress
are trademarks of O’Reilly Media, Inc.
The views expressed in this work are those of the author, and do not represent the publisher’s views. While the publisher and the author have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the author disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.
This work is part of a collaboration between O’Reilly and Microsoft. See our statement of editorial independence.
978-1-492-04092-7
[LSI]
Table of Contents
Preface. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v
1. Introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
How the Cloud Is Being Used 10
Benefits of Cloud Computing 11
Types of Cloud Services: IaaS, PaaS, and SaaS 12
Types of Cloud Deployments 14
Cloud-Enabling Technology 14
Cloud Computing Architectures 18
Running Linux in the Cloud: The Role of Containers 19
Reference Architecture for Running a Web Application in
Multiple Regions 53
5. Principle 4: Ensure Your Linux VMs Are Secure and Backed Up. . . . . 65
Security in the Cloud 65
A Shared Responsibility Security Model in the Cloud 66
Security Concerns Due to Shared IT Resources 68
Cloud Security Tools and Mechanisms That Contribute to
Better Security 69
Disaster Recovery in the Cloud 70
Traditional DR Strategies Versus Cloud-Based Strategies 72
How the Cloud Shifts the DR Tradeoffs 75
Preface
such as Azure Migrate, offered by cloud vendors to support your
move.
High availability (through geographically disparate regions and multiple Availability Zones) and load balancing are two of the most common benefits offered by a cloud-based computing environment. Azure Virtual Machine Scale Sets (VMSSs) provide both high availability and scalability, and they support automatic scaling of server capacity based on performance metrics. Caching strategies and content delivery networks (CDNs) enhance the scalability of web applications in the cloud. You can adopt technology like Azure Storage replication to achieve high availability and durability.
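To make the idea of metric-driven scaling concrete, here is a minimal stdlib Python sketch of the kind of rule a scale set’s autoscale policy encodes. The function name, thresholds, and limits are illustrative assumptions, not part of any Azure API:

```python
# Hypothetical sketch of a scale-out/scale-in rule driven by a performance
# metric (average CPU). Thresholds and limits are illustrative.

def desired_capacity(current_instances, avg_cpu_percent,
                     scale_out_at=75, scale_in_at=25,
                     min_instances=2, max_instances=10):
    """Return the instance count an autoscale rule would target."""
    if avg_cpu_percent > scale_out_at:
        target = current_instances + 1      # add capacity under load
    elif avg_cpu_percent < scale_in_at:
        target = current_instances - 1      # shed idle capacity
    else:
        target = current_instances          # within the healthy band
    return max(min_instances, min(max_instances, target))

print(desired_capacity(4, 90))  # heavy load -> 5
```

A real scale set applies rules like this continuously against metrics it collects, clamping the result to the configured minimum and maximum instance counts.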
Monitoring server and application health and performance in the cloud can pose many problems, as compared to traditional systems monitoring. Application performance monitoring is usually a key component of your overall efforts in this regard. Dynamic resource allocation means you have less visibility into how resources are being utilized in the cloud. To get a meaningful, unified view of your cloud infrastructure, you may need to reach beyond cloud vendor–offered tools, such as Amazon CloudWatch, Microsoft Azure Monitor, and Google Stackdriver. There are several excellent third-party tools (Datadog, for example) that you can effectively integrate with a cloud-based environment like Azure.
In the cloud, security is based on a shared responsibility model, in which the cloud provider and the cloud user each have specific security responsibilities. The cloud provider is responsible for the security of the cloud, and the customer is tasked with security in the cloud environment. Shared IT resources in a public cloud are a natural cause of concern. A solid network security framework, practical configuration management tools, strong access controls, and virtual private clouds (VPCs) are some of the ways in which cloud consumers can strengthen their cloud security posture.
Effective cloud-based disaster recovery (DR) strategies differ from traditional DR strategies that rely heavily on off-site duplication of infrastructure and data. Cloud-based DR solutions offer features like elasticity and virtualization, which make it easier to offload backup and DR to the cloud. More likely than not, your backup and DR solution in the cloud will cost you less and be more dependable, with minimal downtime.
Finally, cloud environments pose special challenges in the areas of operational governance, legal issues, accessibility, and data disclosure regulations. The cloud service provider must satisfy the four fundamental requirements—security, compliance, privacy and control, and transparency—to effectively serve its customers in the cloud. Cloud consumers can adopt various strategies, such as role-based access controls, network controls, and hierarchical account provisioning, to enhance security and governance in a cloud environment.
CHAPTER 1
Introduction
consumers, according to previously agreed-upon Service Level Agreements (SLAs). The cloud provider provisions and manages the compute resources and owns the resources that it leases to the cloud consumers. However, it’s possible for a provider to resell the resources it leases from even larger cloud providers.
Regardless of whether it’s Amazon Web Services (AWS), Google
Cloud Platform (GCP), or Microsoft Azure, all clouds consist of a
set of physical assets that support virtual resources, such as virtual
machines (VMs). These computing assets and resources run within
datacenters located around the globe, in regions such as Western
Europe or the eastern United States.
The distribution of computing resources across the globe offers redundancy in failure situations and higher speed (lower latency), by locating computing resources closer to users. Software and hardware both become services in a cloud environment. It’s through these services that you gain access to the underlying resources. Leading cloud providers, such as AWS, GCP, and Microsoft Azure, offer a long list of services, such as computing, storage, databases, and identity and security, as well as big data and analytics services. You can mix and match the services to create custom computing infrastructures to meet your needs and then add your own applications on top of the infrastructure to build your computing environment.
Many cloud computing services let developers work with them via REST APIs, as well as via a command-line interface (CLI). All cloud vendors offer easy-to-use dashboards to control resources; manage billing, security, and users; and optimize your cloud usage.
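Working with a cloud REST API typically means sending HTTP requests and parsing JSON responses. Here is a small stdlib Python sketch that parses a canned response shaped like a “list VMs” call; the field names and values are illustrative, not a real Azure payload:

```python
import json

# A canned response body shaped like what a cloud REST API might return
# for a "list VMs" request. Fields are illustrative, not a real payload.
response_body = '''
{
  "value": [
    {"name": "web-01", "location": "westeurope", "powerState": "running"},
    {"name": "web-02", "location": "eastus", "powerState": "stopped"}
  ]
}
'''

vms = json.loads(response_body)["value"]
running = [vm["name"] for vm in vms if vm["powerState"] == "running"]
print(running)  # -> ['web-01']
```

A CLI or dashboard is, in essence, doing exactly this on your behalf: issuing the REST call and rendering the parsed JSON.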
cloud infrastructure. So the cloud is increasingly a venue for regular
enterprise IT workloads.
meaning that the cloud consumer has a high degree of control over the cloud environment. The consumer must configure and maintain the bare infrastructure provisioned by the cloud providers.
Platform as a service (PaaS)
PaaS is a computing model in which the cloud provider provisions, sets up, and manages all the computing infrastructure, such as servers, networks, and databases, and you do the rest. PaaS is a ready-to-use computing environment, since the resources and services are already deployed and configured. PaaS computing services include those that help you develop, test, and deliver custom software applications. Developers can quickly create their apps, and the cloud provider sets up and manages the underlying computing infrastructure. The cloud consumer can replace their entire on-premise computing environment in favor of a PaaS. Or they can use the PaaS to scale up their IT environment and/or reduce costs with the cloud environment.
Software as a service (SaaS)
SaaS is how a cloud provider delivers software applications on demand over the internet. In this model, the provider manages not only the infrastructure but also the software applications, with users connecting to the application over the internet. The software program is modeled as a shared cloud service and made available to users as a product. Cloud consumers have limited administrative and management control with a SaaS cloud delivery model.
A good example of a SaaS model is the transitioning of Adobe’s well-known Creative Suite to a SaaS model called Adobe Creative Cloud. As Adobe migrates more products to this model, it has signed a strategic partnership with Microsoft to make Microsoft Azure its preferred cloud platform.
“Adobe is offering consumer and enterprise applications in Azure, along with our next-gen applications, like Adobe Cloud Platform,” says Brandon Pulsipher, Vice President of Technical Operations and Managed Services at Adobe. “Our partnership with Microsoft demonstrates that cloud-native applications in Azure make great sense for large and small customers alike.”
Cloud-Enabling Technology
The cloud owes its phenomenal growth over the past decade to several technological developments, of which virtualization (server, storage, and network, among others) is but one. Other key innovations include various web technologies, service-oriented architectures, multitenant technologies, resource replication, cloud storage devices, and object storage. I briefly describe the main cloud-enabling technologies in the following sections.
Virtualization
The largest cloud platforms, such as AWS and Azure, have set up a number of massive datacenters across the world, specifically designed to deliver services at a massive scale. Synergy Research Group, which tracks IT and cloud-related markets, estimated that there would be 390 hyperscale datacenters in the world by the end of 2017. Each of the companies operating these large cloud platforms has at least 16 datacenter sites, on average, according to Synergy, with the biggest cloud providers (AWS, Microsoft, GCP, and IBM) operating the most datacenters.
Hyperscale virtualization is at the heart of cloud computing. A software layer called a hypervisor sits on the physical server and abstracts the machine’s resources. Most of us are familiar with server virtualization, but in the cloud, other resources, such as storage and networks, are also virtualized.
Cloud computing relies on virtualization, but it’s much more than simple virtualization. A cloud provider allocates virtual resources into centralized resource pools called a cloud. Cloud computing is the orchestration of these clouds of computing resources through management and automation software. In addition to virtualized resources, a cloud offers features such as self-service, automatic scaling, and enhanced security.
Virtualization is the process of converting a physical IT resource
into (multiple) virtual resources. Cloud-based systems virtualize
many types of IT resources, such as:
Servers
Physical servers are the basis of virtual servers.
Storage
Virtual storage devices or virtual disks are based on underlying
physical storage.
Network
Physical routers and switches can serve as the basis of logical
network fabrics, such as VLANs.
Power
You can abstract physical uninterruptible power supplies (UPSs) and power distribution units into virtual UPSs.
The best-known virtualization technology, of course, is server virtualization. In a nonvirtualized environment, the OS is configured for specific hardware, and you must usually reconfigure the OS if you modify the IT resources. Virtualization translates IT hardware into emulated and standardized software-based copies. Thus, virtual servers are hardware independent. It’s this hardware independence that enables you to move a virtual server to another virtualization host, without worrying about hardware-software compatibility requirements.
Virtualization, by allowing multiple virtual servers to share a single
physical server, enables server consolidation, which leads to higher
hardware utilization, load balancing, and optimization of computing
resources. On top of this, virtual machines can run different guest
operating systems on the same host. All these virtualization features
support the hallmarks of cloud computing, including on-demand
provisioning and usage, elasticity, scalability, and resiliency.
Web Technologies
Web technologies are used by cloud providers in two ways: as the implementation medium for web-based services, and as a management interface for cloud services. Well-known elements, such as Uniform Resource Locators (URLs), the Hypertext Transfer Protocol (HTTP), and markup languages, such as HTML and XML, are the essential components of the technology architecture of the web.
Web applications are distributed applications that use these web-based technologies, and their easy accessibility makes them part of all cloud-based environments. PaaS cloud deployment models help consumers develop and deploy their web applications by providing separate web server, application server, and database server environments. Many applications benefit from the cloud model, particularly from the elastic nature of cloud infrastructure provisioning. Cloud providers themselves use a lot of web technologies for enablement, most notably REST APIs and JSON.
Web services were the first popular medium for sophisticated web-based service logic. Web services are also called SOAP-based services, since they rely on the SOAP messaging format for exchanging requests and responses between web services. The API of a web service is described using a markup language called the Web Service Description Language (WSDL), and the messages exchanged by the web services are expressed using the XML Schema Definition (XSD) language (XML Schema).
Along with the Universal Description, Discovery, and Integration (UDDI) standard for regulating service registries where WSDL definitions can be published, XML Schema, SOAP, and WSDL are the essential components of early web service technologies. Later web service technologies (called WS-*) address other functional areas, such as security, transactions, and reliability.
Representational State Transfer (REST) services are based on a service architecture that operates according to a set of constraints to emulate the properties of the web. REST describes a set of architectural principles through which data is transmitted over a standard interface, such as HTTP. REST focuses on the design rules for creating stateless services. A client accesses the resources using unique URIs for the resources, and unique representations of the resources are returned to the client. With microservices or, at the very least, a proliferation of endpoints and applications, the cloud needs a lot of messaging, and so all cloud providers have queues, buses, notifications, and other message-passing and orchestration abilities.
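The two REST ideas just described, unique URIs for resources and stateless requests, can be illustrated with a toy Python handler; the URIs, resource fields, and function names here are invented for illustration only:

```python
# Toy illustration of REST's core idea: stateless access to resources
# through unique URIs. Each request is self-contained; no session state
# is kept between calls. All names and fields are illustrative.

RESOURCES = {
    "/vms/web-01": {"name": "web-01", "size": "Standard_B2s"},
    "/vms/web-02": {"name": "web-02", "size": "Standard_D2s_v3"},
}

def handle_get(uri):
    """Return (status, representation) for a GET on the given URI."""
    if uri in RESOURCES:
        return 200, RESOURCES[uri]   # a representation of the resource
    return 404, None

status, body = handle_get("/vms/web-01")
print(status, body["name"])  # -> 200 web-01
```

Because the handler keeps no per-client state, any number of identical servers can answer the same request, which is exactly what makes stateless services easy to scale behind a load balancer.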
Resource Replication
Resource replication is the creation of multiple instances of the same computing resource. Typically, virtualization strategies are used to implement the replication of the resources. For example, a hypervisor replicates multiple instances of a virtual server, using stored virtual server images. Most commonly, servers, cloud storage devices, and networks are replicated in a cloud environment.
you can access these object storage interface mechanisms via REST or web services.
For Linux system administrators, cloud storage represents new challenges that they’re not used to with their local storage area network/network attached storage (SAN/NAS) storage systems. Cloud storage involves a lot of REST-based storage operations versus filesystem operations. In Azure, for example, you have blob storage, files, managed disks, and third-party–provided NAS-like appliances. And that’s just for files (blobs). Key-value pairs, secrets, document storage, and ultimately, database persistence are a whole different ball game.
Measured Usage
Closely related to the ability to use computing resources on demand is the concept of measured usage. All cloud providers charge their consumers just for the IT resources used, rather than for the resources that are provisioned or allocated to the consumer. Measuring usage supports customer billing, as well as usage reporting.
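The distinction between provisioned and used resources is what makes metered billing attractive. A minimal Python sketch, with a made-up hourly rate, shows the arithmetic:

```python
# Sketch of measured (pay-per-use) billing: charges follow metered usage,
# not provisioned capacity. The rate is a made-up example, not a real price.

def monthly_charge(vm_hours_used, rate_per_hour=0.05):
    """Bill only for the hours actually metered."""
    return round(vm_hours_used * rate_per_hour, 2)

# A VM provisioned all month (~730 hours) but running only 200 hours
# is billed for the 200 metered hours:
print(monthly_charge(200))  # -> 10.0
```

The same metering data that drives the bill also drives usage reports, which is why the two are usually discussed together.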
Resource Pooling
Resource pooling is how a cloud provider pools a large amount of computing resources to service multiple consumers. The cloud provider dynamically allocates and deallocates virtual resources to cloud consumers according to fluctuations in demand. Multitenancy (multiple cloud consumers, unbeknownst to each other, sharing a single instance of a computing resource) supports resource pooling.
Dynamic Scalability (Elastic Resource Capability)
Dynamic scalability and elasticity refer to the ability of a cloud provider to transparently scale computing resources in response to the runtime conditions of a user’s environment. Virtualization enables cloud providers to maintain large pools of computing capacity on hand to service the needs of their customers with minimum delays. One of the key reasons for migrating to the cloud is its built-in elasticity, which obviates the need to incur large capital expenditures on infrastructure in anticipation of an organization’s growth.
Load Balancing
Load balancing is how a cloud platform manages online traffic by distributing workloads across multiple servers and other computing resources. Load balancing can be automatic or on demand. The goal of load balancing is to keep workload performance at the highest possible levels by preventing overloading of the computing resources, thus enhancing the user experience.
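The core loop of a load balancer, rotating over registered targets and skipping unhealthy ones, can be sketched in a few lines of stdlib Python; the target names and health flags are invented for illustration:

```python
import itertools

# Sketch of a load balancer's core behavior: round-robin over registered
# targets, routing only to those that pass their health check.
# Target names and health states are illustrative.

targets = {"vm-a": True, "vm-b": False, "vm-c": True}   # name -> healthy?
rotation = itertools.cycle(sorted(targets))

def pick_target():
    for _ in range(len(targets)):
        candidate = next(rotation)
        if targets[candidate]:          # route only to healthy targets
            return candidate
    raise RuntimeError("no healthy targets registered")

print([pick_target() for _ in range(4)])  # -> ['vm-a', 'vm-c', 'vm-a', 'vm-c']
```

The unhealthy target vm-b is transparently skipped, which is exactly how a real balancer keeps failed instances from receiving traffic while the rotation continues over the rest.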
Running microservices
Containers are ideal for running small, self-contained applications that perform single tasks or run single processes. You can, for example, run separate containers for a web server, application server, or message queue, among others. Since each container runs independently of the others, it’s easy to scale specific parts of the application up or down, as needed.
deployment process can pull the image, test the app, and deploy it to
production.
You can achieve easily replicable, speedy, reliable, and manageable deployments by orchestrating the deployment of the containers you use for CI/CD, using Kubernetes in the Azure Container Service. Figure 1-2 shows a container-based CI/CD architecture using Jenkins and Kubernetes on the Azure Container Service.
Figure 1-2. CI/CD with Jenkins and Kubernetes on the Azure Container Service
Azure Container Instances (ACI)
For running Docker containers on Azure VMs.
Azure Container Registry (ACR)
For storing and managing container images.
CHAPTER 2
Principle 1: Understand Which
Linux VMs Are Adaptable to the
Cloud
Running your services in the public cloud could mean lower maintenance costs, along with elasticity (the ability to quickly scale your infrastructure according to business demand), robust disaster recovery, and high-availability services. However, without thorough assessment and planning, a cloud migration effort can cost excessive time and money, and it can expose a paucity of required technological skills, along with security and compliance issues stemming from a lack of control over your cloud computing resources, bandwidth difficulties, and more.
In our discussion of how to migrate to the cloud, I chose not to dwell on the initial business case for a move to the cloud. I assume that a business case has been made for such a move, based on the cost, effort, potential pitfalls, long-term benefits, and the ease (or difficulty) of the migration and implementation. I focus on the technological implications of this move.
From a purely technical point of view, adopting the cloud is easier than setting up or expanding a datacenter-based computing infrastructure. However, it’s also easy to flounder in your cloud adoption effort if you don’t educate yourself well. Often the failure to successfully adopt the cloud doesn’t just leave you where you started—you’re actually likely to lose critical time and to waste your resources, which can put you behind your competitors.
There are many half-baked truths and pitfalls in cloud computing. To get the best out of cloud computing, it’s important for you to understand which parts will work for you and how to plan and implement a migration to the cloud effectively. In this chapter, I explain the importance of properly assessing and identifying your cloud readiness, along with the key phase of discovery during the migration to the cloud. Identifying the parts of the infrastructure that are cloud-capable is a key step in migrating to the cloud. Identification of pilot applications and development of detailed plans to implement the pilot project come later in the journey to the cloud.
Cloud Readiness Assessments
Whether you’re planning a complete migration to the cloud or you’d
like to move a couple of applications over, a cloud assessment is
your starting point. A good assessment takes your cloud goals and
determines the best ways to achieve them, by understanding the
changes you need to make and learning how the move impacts all
areas of your business.
Cloud migrations tend to be more complex than some might estimate, and a poorly done assessment prior to the move can make the migration even messier. A proper assessment reveals how ready your organization is, from both a technical and a business viewpoint. It should cover technology processes, the technology teams, and business elements. The assessment should set your expectations regarding the benefits you should reap and how to maximize the potential benefits from a move to the cloud.
Stakeholder Interviews
The main purpose of the stakeholder interviews is to communicate the organization’s vision for the cloud. The assessment team also gathers the expectations of the stakeholders regarding the potential performance of key enterprise applications in the cloud.
Prioritization
Select a set of noncritical applications and services to migrate as a proof of concept (POC) for the new cloud infrastructure and services, or to perform a risk analysis.
cloud-based system. This includes solution architects well versed in
the cloud, as well as DevOps personnel who can work with the cloud
vendor’s application deployment and CI/CD tools.
Cost Analysis
The assessment team must find out the organization’s cost expectations. For many organizations, a key reason for migrating to the cloud is that there are fewer capital outlays as compared to a traditional datacenter-based environment. However, if an organization doesn’t spend the effort to learn how to optimize cloud resource use, the cloud may turn out to be more expensive than expected. For example, spot purchases of surplus computing power are a powerful way to reduce the cost of running virtual servers in the cloud.
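A back-of-the-envelope Python calculation shows why spot pricing matters in a cost analysis; the rates below are hypothetical (real spot prices fluctuate with surplus capacity):

```python
# Illustration of the savings a cost analysis might attribute to spot
# (surplus-capacity) pricing. Both rates are made up for this example.

on_demand_rate = 0.096   # $/hour, hypothetical on-demand price
spot_rate = 0.024        # $/hour, hypothetical surplus (spot) price

hours = 730  # roughly one month of continuous running
savings = (on_demand_rate - spot_rate) * hours
print(f"monthly savings per VM: ${savings:.2f}")  # -> $52.56
```

Multiplied across a fleet of interruptible workloads, this kind of arithmetic is often what tips the cost comparison in the cloud’s favor.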
• Pre-deployment tasks
• Migration tasks
• Go-live tasks
Pre-Deployment Tasks
Pre-deployment tasks include broad-brush tasks, such as understanding the scope of the migration and creating the cloud architecture.
Creating the cloud architecture
Creating the cloud architecture involves selecting the types of cloud services that the organization must adopt, based on various criteria, such as business requirements, cost, performance, reliability, and scalability.
Migration Tasks
Migration tasks include setting up the necessary networks, creating your computing infrastructure, deploying the applications and databases, and planning the cutover from on-premise systems to the new cloud-based systems.
Go-Live Tasks
If all goes well during the performance testing and security assessment, perform the go-live tasks and cut over to the new systems. Careful, continuous monitoring of the new systems is critical, so you can quickly revert to the old systems if you run into unexpected glitches.
Azure Migrate can discover up to 1,000 VMs in a single discovery.
Figure 2-1 shows how the Azure Migrate service works by discovering information about your on-premise VMs.
After you move to the cloud, you need to continuously push application changes to VMs. Figure 2-2 shows how you can set up an immutable infrastructure CI/CD pipeline using Jenkins and Terraform on Azure Virtual Machine Scale Sets (VMSSs). An Azure VMSS lets you create and manage a group of identical, load-balanced VMs. The number of VM instances automatically increases or decreases, based on demand or per a schedule that you define.
Figure 2-2. Immutable Infrastructure CI/CD using Jenkins and Terraform on Azure VMSSs
Whether you use Java, Node.js, Go, or PHP to develop your applications, you’ll need a CI/CD pipeline to automatically push changes to the VMs that support those applications.
Here’s an architectural overview of the immutable infrastructure
shown in Figure 2-2:
For a CI/CD sample that uses a template with Jenkins and Terraform on Azure VMSSs, please read “CI/CD using Jenkins on Azure Virtual Machine Scale Sets”.
Deploying VMs using a template based on Jenkins and Terraform on an Azure VMSS makes it simple for system administrators to deploy their infrastructure. Here’s an example:
$ azure config mode arm   # switch the classic Azure CLI into Resource Manager mode
Azure Migrate attempts to map every disk attached to
an on-premise VM to a disk in Azure.
• Use the third-party tool CloudEndure to move a wider range of supported VMs to Azure. As with the Azure Site Recovery tool, CloudEndure uses replication during the migration of the VMs to Azure.
• If speed of migration is a key requirement, use a tool such as Velostrata, which quickly moves on-premise VMs to Azure by replicating just the VM’s compute runtime to Azure and replicating the VM’s storage slowly over time.
siderations before you can start working in the cloud. Although some of these concepts are also present in on-premise environments, several of them may be foreign to Linux system administrators.
Azure, for example, uses a capability called availability sets to help deploy reliable VM-based solutions. An availability set is a logical grouping construct to ensure that your VMs are distributed across multiple hardware clusters, isolated from each other. Azure guarantees that your VMs in an availability set are placed across multiple physical servers, racks, storage units, and network switches. If a hardware or software failure occurs, your applications remain available to users, since only a subset of your VMs is affected.
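The guarantee that only a subset of VMs shares any one piece of hardware can be sketched in stdlib Python. The fault-domain count and VM names below are illustrative; real availability sets use Azure-managed fault and update domains:

```python
# Sketch of how an availability set spreads VMs across fault domains
# (groups of hardware sharing power and networking). Illustrative only.

FAULT_DOMAINS = 3

def place(vms):
    """Round-robin placement, roughly what an availability set does."""
    return {vm: i % FAULT_DOMAINS for i, vm in enumerate(vms)}

placement = place(["web-1", "web-2", "web-3", "web-4"])

# If the hardware behind fault domain 0 fails, only a subset is affected:
survivors = [vm for vm, fd in placement.items() if fd != 0]
print(survivors)  # -> ['web-2', 'web-3']
```

Two of the four VMs keep serving traffic through the failure, which is the whole point of spreading an application tier across domains instead of packing it onto one rack.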
Availability sets are useful in scenarios such as the following, which cause VMs to be unavailable or to go into a failed state:
All the major cloud providers offer built-in load balancing. There are two broad types of load balancers—application load balancers and network load balancers. Application load balancers sit between incoming application traffic and computing resources, such as VMs. They monitor the health of the registered targets and route traffic to only the healthy targets. You can add and remove targets, such as computing instances, dynamically from a load balancer, without interrupting the flow of requests to your applications. Network load balancers work at the fourth layer of the Open Systems Interconnection (OSI) model.
The Azure Application Gateway (a Layer 7 load balancer) protects web applications against well-known web exploits. The Azure Security Center scans Azure cloud resources for vulnerabilities, such as web apps that aren’t protected by the Microsoft Web Application Firewall (WAF), a feature of the Azure Application Gateway. Azure Security Center will recommend an application gateway WAF for
LRS replicates your data within a storage scale unit, which is a collection of racks of storage units. Data replicas are spread across fault domains and upgrade domains, which represent groups of nodes that are considered a physical unit of failure or are upgraded together. This ensures data availability, even if a hardware failure affects a single storage rack, or when you upgrade a set of nodes.
LRS, however, doesn’t offer protection against a datacenter-level catastrophe, such as a fire, in which all your data replicas might be lost. As a result, Azure recommends geo-redundant storage (GRS) replication for most of your applications. For scenarios such as transactional applications that can’t accept any downtime, there’s also zone-redundant storage (ZRS) replication, in which customers can access data even if a single Availability Zone is unavailable, since ZRS replicates data synchronously across multiple Availability Zones.
Hardware failures
When VMs fail due to hardware issues, cloud providers automatically move the VMs to a different location and restart them. For example, following a hardware failure, Azure moves a VM to a different location and restarts it within 5–15 minutes. You can support a higher SLA by deploying two standalone nodes into an availability set.
Zero-Downtime Architectures
VMs are at the heart of many cloud environments. Since multiple VMs are run from a single physical server, the physical server is a single point of failure (SPOF) for all its guest VMs. Should a physical server crash or otherwise be affected, all the VMs running on that server can become unavailable. However, cloud VMs, in most cases, offer a very high level of uptime, with SLAs of 99.9% or higher.
The zero-downtime architecture is a failover system, in which the cloud provider moves virtual servers to a different physical server when the original physical host fails. You can place several physical servers in a server group managed by a fault tolerance system. The fault tolerance system automatically moves the virtual servers around while the VMs are live, thus avoiding any service interruptions.
In addition to strategies such as live migration of VMs, a cloud provider can also follow a strategy of resource replication by creating new VMs and cloud service instances automatically when a VM or a service experiences failure.
Caching
Caching is a common strategy to improve the performance and scalability of an application. Caching temporarily copies frequently accessed application data to fast storage. The closer the storage is to the application, the faster the application response times will be.
You can use either an open source caching solution, such as Redis, or a cloud provider–owned caching tool to implement a caching strategy that caches session state, semi-static transaction data, and HTML output. For example, Azure Redis Cache is an implementation of the open source Redis Cache that runs as a service in the Azure cloud. Any Azure application can access the Azure Redis Cache service.
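The pattern a Redis-backed cache typically implements is cache-aside: check the cache first, fall back to the slow store, then populate the cache with a time-to-live. Here is a stdlib Python sketch in which a dict stands in for the Redis instance; the key names, TTL, and data are illustrative:

```python
import time

# Stdlib sketch of the cache-aside pattern. The dict stands in for a
# Redis instance; keys, TTL, and data are illustrative.

cache, ttl = {}, 60
database = {"user:1": "Alice"}   # the slow, authoritative store

def get(key, now=None):
    now = now if now is not None else time.time()
    if key in cache and cache[key][1] > now:   # fresh cache hit
        return cache[key][0], "cache"
    value = database[key]                      # slow path: hit the database
    cache[key] = (value, now + ttl)            # populate for next time
    return value, "database"

assert get("user:1")[1] == "database"   # first read misses the cache
assert get("user:1")[1] == "cache"      # second read is served from cache
```

The TTL keeps semi-static data from going permanently stale, which is why this pattern suits session state and cached HTML output as described above.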
Application Performance Monitoring (APM)
and the Cloud
Traditional IT monitoring focuses on the computing environment: servers, storage, and networks, among other pieces. However, most cloud deployments don't require you to perform these standard monitoring functions yourself. The cloud provider monitors and manages the infrastructure that you're renting, so you don't need to worry as much about typical IT infrastructure issues, such as servers that crash, disks that fail, and networks that drop packets. Most of these traditional concerns shift to the provider.
You may not even have any servers or other infrastructure when you
are using a cloud environment. For example, you may use a service,
such as Azure App Service, to deploy your applications to the cloud.
And you may rely on Azure SQL databases and a hosted caching
service, such as Redis.
Serverless computing (Azure Functions and AWS Lambda) is a relatively new phenomenon that promises to grow in importance. Serverless architectures help developers deploy applications as chunks of business logic. The cloud provider spins up the necessary computing infrastructure to process the requests for the functions. From your perspective, there are no servers to manage, because the deployment unit is just code. You don't need to worry about provisioning the servers for running the functions, but you do need to know which functions are being heavily used and which are running slowly.
Monitoring your applications rather than your servers and other infrastructure components is key in a cloud environment. Application performance monitoring (APM) tools help you monitor your end-user experience and provide end-to-end visibility into your application stack. A good APM tool provides deep-dive component monitoring for your enterprise applications. It helps your development, middleware, database, and server experts troubleshoot performance bottlenecks and perform root-cause analysis across the cloud infrastructure.
APM tools replace guesswork and reduce your reliance on manual
monitoring processes. They help managers understand how the IT
services impact their business operations. By monitoring application
performance end-to-end and providing insights into capacity uti‐
lization, they enable businesses to make sound decisions about
56 | Chapter 4: Principle 3: Monitor Your Applications Running on Linux Across the Entire
Stack
resource allocation. They also help the IT groups monitor how well
the applications are meeting their SLAs, thus ensuring a good end-
user experience.
Log Analysis
Collecting and analyzing your system and application log data can offer insights into your cloud infrastructure. Efficient log analysis helps you gain operational insights with minimal time spent looking for anomalies across the cloud environment.
Performance Benchmarks
To measure system performance effectively, you must compare performance metrics against valid performance benchmarks. Without this comparison, it's hard to tell whether current performance is normal, and you won't be able to gauge the severity of potential issues. The key metrics to benchmark include:
• CPU usage
• Disk I/O
• Memory utilization
• Network performance
CPU metrics
CPU usage has traditionally been the most common performance
metric when monitoring Linux servers. You need to receive alerts
when server CPUs are reaching their saturation point. The key sta‐
tistic to watch is the percentage of time the CPU is in use. The raw
CPU percentage doesn’t tell the whole story—you want to dig
deeper and find out what percentage of CPU usage is for running
user applications (CPU user time) and what percentage is being used
by the system (CPU privileged time).
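As a minimal illustration of the user/privileged split, the following Python sketch reads the process-level counters via os.times(); server-wide monitoring would read the aggregate counters instead (for example, from /proc/stat on Linux), but the breakdown is the same idea:

```python
import os

def cpu_time_split():
    """Return (user, system) CPU seconds consumed by this process.
    'user' is time spent running application code; 'system' is time
    spent in the kernel on the process's behalf (privileged time)."""
    t = os.times()
    return t.user, t.system

# Burn a little CPU so the counters move.
sum(i * i for i in range(200_000))

user, system = cpu_time_split()
print(f"user (application) CPU time:   {user:.2f}s")
print(f"system (privileged) CPU time:  {system:.2f}s")
```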
I/O performance
Disk read and write metrics help you identify I/O bottlenecks.
Cloud providers offer multiple instance types, with each one opti‐
mized for specific types of workloads. Some instance types are
meant for high I/O-based workloads, and others, for heavy CPU
usage-related applications.
If you’re running applications that involve high amounts of writes
and you notice I/O bottlenecks, you can switch to a different
instance type that offers a higher number of input/output operations
per second (IOPS).
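To get a feel for what an IOPS figure measures, the following Python sketch times a batch of small synchronous writes and divides by the elapsed time. It is illustrative only; a purpose-built benchmark such as fio is the right tool for real measurements:

```python
import os
import tempfile
import time

def estimate_write_iops(n_ops=50, block_size=4096):
    """Rough write-IOPS estimate: perform n_ops synchronous 4 KiB
    writes to a temp file and divide by elapsed time."""
    buf = os.urandom(block_size)
    with tempfile.NamedTemporaryFile() as f:
        start = time.perf_counter()
        for _ in range(n_ops):
            f.write(buf)
            f.flush()
            os.fsync(f.fileno())  # force the write through to the device
        elapsed = time.perf_counter() - start
    return n_ops / elapsed

print(f"approximate write IOPS: {estimate_write_iops():,.0f}")
```

Comparing this kind of figure before and after switching instance types (or disk tiers) makes the IOPS difference between offerings concrete.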
Memory utilization
Monitoring memory usage is a crucial component of monitoring
VMs in the cloud. A low memory condition adversely impacts
application performance. Monitoring reveals the amount of used
and free memory for the instances. Paging events occur when an
application requests pages not available in memory.
In low memory situations, pages are written to disk to free up working memory. The application must then retrieve the page from disk, which is far slower. An excessive amount of paging drastically slows down an application. Spikes in paging indicate that the VM is unable to cope with the requests from the application.
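Paging activity is directly observable. The following Python sketch (Unix-only, via the standard resource module) reads the page-fault counters for the current process; major faults are the disk-backed paging events described above:

```python
import resource

def paging_counters():
    """Return (major_faults, minor_faults) for the current process.
    Major faults required reading a page in from disk -- the paging
    activity that slows applications down. Sustained spikes in major
    faults suggest memory pressure on the VM."""
    ru = resource.getrusage(resource.RUSAGE_SELF)
    return ru.ru_majflt, ru.ru_minflt

major, minor = paging_counters()
print(f"major (disk-backed) page faults: {major}")
print(f"minor page faults:               {minor}")
```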
Network performance
Network monitoring shows the rate at which network traffic is flow‐
ing in and out of a VM. Network metrics are shown in the statistic
bytes per second (bytes received per second and bytes sent per sec‐
ond), indicating the volume of network traffic.
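Bytes-per-second metrics are derived by sampling a cumulative byte counter twice and dividing by the interval. The following Python sketch shows the calculation; fake_nic_bytes is a hypothetical stand-in for a NIC counter (on a Linux VM, the real numbers live in /proc/net/dev):

```python
import time

def bytes_per_second(read_counter, interval=0.25):
    """Derive a bytes-per-second rate from two samples of a cumulative
    byte counter, the way per-VM network metrics (bytes received/sent
    per second) are computed from interface counters."""
    first = read_counter()
    time.sleep(interval)
    second = read_counter()
    return (second - first) / interval

# Hypothetical counter: "receives" one million bytes per second.
def fake_nic_bytes():
    return int(time.monotonic() * 1_000_000)

rate = bytes_per_second(fake_nic_bytes)
print(f"{rate:,.0f} bytes/sec")
```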
A major benefit of SaaS monitoring tools like Datadog is that they help you add metrics from multiple systems to performance dashboards, thus providing you a comprehensive view of your entire infrastructure, regardless of where the components live.
Cloud-Monitoring Tools
Although organizations typically pour a lot of effort and money into
application development, they don’t place a similar emphasis on
cloud-monitoring tools. The truth is that, without the visibility and insight provided by powerful cloud-monitoring tools, you don't know exactly how your applications are performing, and you have little guidance for improving them.
Cloud monitoring is a catchall phrase and includes a wide variety of
tools. Service providers offer their own out-of-the-box monitoring
tools, such as Amazon CloudWatch and Microsoft Azure Monitor.
However, these tools may not be adequate for many cloud consum‐
ers, especially those with multicloud and hybrid cloud architectures.
Cloud-monitoring tools can be in-house tools offered by the cloud
provider or tools offered by independent SaaS providers. Cloud
monitoring is increasingly being offered as a fully managed on-
demand service, with the service providing the tools for monitoring
both cloud and on-premise infrastructures and web applications.
Such cloud monitoring is delivered through SaaS-based software that tracks performance across the entire cloud stack. Cloud administrators and development teams can review the performance statistics in a central dashboard, and they can receive alerts about performance issues through email and SMS, among other options.
Cloud proprietary and third-party monitoring tools can also work
well together. There are specific advantages in using the two types of
tools. Cloud provider monitoring tools are preinstalled and precon‐
figured, so they’re ready to use, out of the box. SaaS monitoring
tools have the advantage that they help monitor more than one type
of cloud infrastructure, so they allow you to monitor all your appli‐
cations and services from a single point.
New Relic, SolarWinds, and PagerDuty are some of the well-known third-party monitoring tools. All leading cloud providers also offer built-in monitoring tools, as I explain in the following sections.
Amazon CloudWatch
Amazon CloudWatch is a monitoring service offered by AWS that helps you monitor application metrics and log files and react to changes in your AWS resources.
Google Stackdriver
Google Stackdriver offers monitoring and logging for applications
that you run in the Google Cloud and in AWS. Although Stackdriver
is natively integrated with Google Cloud Platform cloud products, it
lets you aggregate data across cloud platforms.
The Importance of a Comprehensive
Monitoring Solution
An effective monitoring solution must help you do the following:
CHAPTER 5
Principle 4: Ensure Your Linux VMs
Are Secure and Backed Up
According to the “RightScale 2018 State of the Cloud
Report,” security is a challenge for 77% of respondents.
It is the largest issue for enterprises starting out with
the cloud. For intermediate and advanced users, cloud
costs are the bigger challenge.
Cloud security follows a shared responsibility model, in which the cloud provider and the cloud customer divide up the security responsibilities.
The amount of work that the cloud customer must do in the security area depends on the specific cloud services that the customer chooses and on the sensitivity of their data. Regardless of the specific cloud services that you use, you'll always be responsible for certain security tasks, such as user account management and SSL/TLS encryption of data in transit, among others.
Let’s review the two areas of security that fall under the cloud pro‐
vider’s domain—service security and global infrastructure security.
Service Security
In addition to being responsible for the security of the cloud infra‐
structure, the cloud provider must securely configure all the man‐
aged services (such as databases and big data analytical services) that
it offers its users. The cloud provider handles all security tasks for
the managed services, such as hardening the operating system,
patching the databases with the latest security updates, configuring
firewalls, and setting up DR plans. The cloud customer's remaining security responsibilities for the managed services are correspondingly narrow.
These vulnerabilities can be avoided if the cloud consumers and the
cloud providers support compatible security frameworks. But in a
public cloud, you can’t ensure this.
The cloud provider always has privileged access to the data you store
in the cloud, since the trust boundaries overlap in the cloud. The
security of the data depends on the security policies and access con‐
trols enforced by both the cloud consumer and the provider. Since
most cloud IT resources are shared among users, it creates a poten‐
tial source of data exposure to malicious cloud consumers.
Access Control
Cloud providers generally excel in the provision of strong access
control mechanisms. Most cloud providers rely on centralized IAM
to manage users, security credentials (like passwords, access keys,
and permissions), and authorization policies that control which
resources and services users can access.
A disaster can take the form of physical damage caused by a fire or a flood. A disaster could also be the result of human error, such as accidental deletion of data.
Elastic and speedy provisioning of computing power is a primary
reason for the success of cloud computing. Just as you modernize
your server inventory with low-cost, virtual servers that you provi‐
sion on demand, and just as you provision low-cost cloud storage of
various types (object, block, or archive, for example), you must also
modernize your backup solutions to take advantage of the elasticity
and flexibility offered by a cloud environment.
Since a cloud environment usually requires you to transfer and store
large amounts of data, cloud users must learn to leverage the provid‐
er’s elastic cloud computing features to efficiently perform the data
transfer and storage.
Off-Site Storage with Tape Vaults
An off-site tape storage service-based DR strategy is vastly less expensive than maintaining a standby DR site, but it comes with much longer RTOs and RPOs. Following a disaster, you must set up and configure your alternate infrastructure before you can restore the data from tape storage, all of which takes considerable time.
Elasticity
Elasticity is one of the calling cards of a cloud environment. It’s very
easy to provision compute, storage, and networking services in the
cloud. You can provision, configure, and start up huge amounts of
these services within a few minutes. The elasticity offered by cloud
compute and storage is ideal for the infrequent requirements for
infrastructure in a DR solution.
Virtualization
A cloud computing system relies on virtual servers, which are easy to copy and back up to off-site DR datacenters and quick to restore following a disaster.
Simpler Management
One of the big advantages of using a cloud-based DR solution is that
you don’t need to perform any patching to bring the DR site up-to-
date. The DR site is automatically updated, and you can minimize
recovery issues by sequencing the recovery order of the tiers of multitier applications running on multiple VMs. You can also test your DR plans without adversely impacting your production workloads.
Lower Costs
A cloud-based BCDR solution allows you to lower your infrastruc‐
ture cost by using the cloud as the secondary site for running your
business during outages. You can avoid datacenter costs by moving
to the cloud and taking advantage of the geographical regions
offered by cloud providers and setting up DR between those regions.
Infrastructure costs are lower for a DR solution in the cloud because
you pay just for the resources you need to run your applications in
the cloud during outages. In addition, you can use automatic recov‐
ery to the cloud to keep your on-premise applications available dur‐
ing outages.
Reduced Downtime
Cloud-based BCDR solutions, such as Azure Site Recovery, can offer best-in-class RPOs and RTOs, backed by dependable SLAs. Instead of waiting weeks to replicate the infrastructure and recover data following a disaster, you can expect recovery in hours or even minutes.
Costs are also lower when you contract with a backup service and move your backups to it: there are no tape costs, for example, and management overhead is reduced.
Backup services don’t merely back up your data. They offer several
powerful features, such as ease of use (no scripts to back up your
data), central dashboards to manage the backups, and the ability to
export backup reports, to help in your compliance efforts. The most
important protection offered by a good cloud backup service is the
security of the data that’s backed up. The service allows you to set up
access controls to ensure that only authorized users perform backup
operations.
Microsoft Azure offers Azure Backup to protect your data. Azure Backup is a pay-as-you-go service that offers you flexibility as to the data you want to protect and the length of time for which you want to retain the backups. It helps you restore your VMs and physical servers, both in the cloud and on-premise.
You can use Azure Backup along with Azure Site Recovery to store backups in Azure and to replicate workloads to Azure rather than to a secondary site. Using these two services together simplifies the building of DR solutions, and both tools support hybrid environments. Because Azure Site Recovery replicates workloads to Azure, it eliminates the need for a dedicated secondary datacenter and is a better solution than running a secondary site that replicates data to the cloud.
Azure Site Recovery stores the replicated data in Azure Storage, and when a failover is required, it creates Azure VMs based on the replicated data. You can set your own RPO thresholds to determine how
often Azure creates data recovery points. Azure Site Recovery
reduces RTOs, with its automated recovery process. You can test
failovers to support DR drills without adversely affecting your pro‐
duction systems.
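The RPO threshold idea can be made concrete with a small calculation: given a set of recovery points and a failure time, the data loss is the gap back to the newest usable point. The helper below is hypothetical and only illustrates the concept; it is not an Azure Site Recovery API:

```python
from datetime import datetime, timedelta

def meets_rpo(recovery_points, failure_time, rpo):
    """Check whether the newest recovery point taken before a failure
    keeps the data loss within the RPO window."""
    usable = [p for p in recovery_points if p <= failure_time]
    if not usable:
        return False
    data_loss = failure_time - max(usable)  # data written after this point is lost
    return data_loss <= rpo

# Hourly recovery points, with a failure 30 minutes after the last one.
points = [datetime(2019, 1, 1, h, 0) for h in range(0, 12)]
failure = datetime(2019, 1, 1, 11, 30)

print(meets_rpo(points, failure, rpo=timedelta(hours=1)))     # True: 30 min of loss
print(meets_rpo(points, failure, rpo=timedelta(minutes=15)))  # False
```

A tighter RPO therefore demands more frequent recovery points, which is exactly the trade-off the RPO threshold setting controls.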
Data encryption and auditing of the provider’s datacenters might
seem to be obvious solutions to the cloud consumer's quandary. But the nature of cloud environments, where encryption may even hinder processing activity, and the infeasibility of consumer audits of the cloud service provider's datacenters mean that these solutions aren't practical in most cases.
Security
IT managers are concerned about potential vulnerabilities in the
cloud compared to their on-premise security. The cloud provider
must safeguard its customers’ data with rigorous security controls
and state-of-the-art security technology, including vulnerability
assessments and data encryption.
Compliance
The cloud provider must enable its customers to satisfy a wide range
of governmental and regulatory agency compliance standards, both
domestic and international, as well as industry certifications and
attestations.
Compliance road maps continuously evolve, and cloud users must
be assured that the cloud provider’s compliance strategies are also
evolving over time to meet increasingly stringent standards and reg‐
ulations.
Transparency
The cloud service provider must enable its customers to have full
visibility into their data, such as the locations where the provider
stores the data and how the provider manages it. Businesses must be
able to independently verify the storage, access, and security of their
data.
Networking controls
Network access in a hybrid cloud environment can include both internal and external (internet-based) network access. Virtual private clouds, such as Amazon Virtual Private Cloud and Azure Virtual Networks (VNets), logically isolate a portion of the public cloud so that a business's resources are kept apart from everyone else's.
Network security groups are virtual firewalls that consist of rules
that control the flow of network traffic by specifying how a cloud
resource, such as a VM, can connect to the internet or to other sub‐
nets in a virtual network.
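The rule-evaluation logic that the paragraph above describes can be sketched in a few lines. This simplified Python model checks rules in priority order with a default deny; real NSG rules also match on protocol and on source and destination address prefixes:

```python
def evaluate_nsg(rules, port, direction):
    """Evaluate traffic against NSG-style rules: rules are checked in
    priority order (lowest number first), the first match decides,
    and unmatched traffic is denied."""
    for rule in sorted(rules, key=lambda r: r["priority"]):
        if rule["direction"] == direction and port in rule["ports"]:
            return rule["action"]
    return "Deny"

rules = [
    {"priority": 100,  "direction": "Inbound", "ports": {22},      "action": "Allow"},
    {"priority": 200,  "direction": "Inbound", "ports": {80, 443}, "action": "Allow"},
    {"priority": 4096, "direction": "Inbound", "ports": set(range(65536)), "action": "Deny"},
]

print(evaluate_nsg(rules, 443, "Inbound"))   # Allow
print(evaluate_nsg(rules, 3389, "Inbound"))  # Deny
```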
The ISO 27001 and 27002 certifications, for example, provide assur‐
ance that the cloud provider has implemented a set of specific secu‐
rity controls and a system of management practices to ensure that
the controls function as they should.
In addition to the US standards, there are numerous regional or
national standards, such as Europe’s ENISA Information Assurance
Framework and Japan’s Cloud Security Mark. All these standards
require rigorous annual visits to the cloud providers’ facilities by
accredited auditors.
Nondisclosure Agreements
Cloud providers understandably guard proprietary information
about their physical architecture and their security and control sys‐
tems. However, the provider must be able to share certain aspects of
its architecture and internal security controls with its customers,
subject to the customers signing a nondisclosure agreement.
Summary
This book explained the strategies for deploying Linux environ‐
ments in the cloud, with a focus on Microsoft Azure. There are mul‐
tiple strategies for an organization to move to the cloud. Regardless
of the cloud vendor one chooses, the key to success in cloud envi‐
ronments is to follow a set of guiding principles for cloud opera‐
tions. Understanding how virtualization, and more recently,
containerization, and serverless computing play a crucial role is also
important to doing well in the cloud.
Planning a cloud migration is vital, since a poorly planned and
implemented cloud effort can set an organization back. Before you
start a cloud migration, it’s important to create a working cloud
adoption road map. Conducting effective cloud readiness assess‐
ments sets the tone for the ensuing migration. Accurate workload,
application, and database analysis reduces the surprise factor when
you make the move to the cloud.
Cloud migrations consist of distinct operations, such as a set of pre-deployment tasks to get you ready for the migration and the migration itself. Application performance monitoring tools provide visibility into your application stack, helping you troubleshoot performance issues in the cloud.
If you’re running a multicloud or hybrid cloud architecture, out-of-
the-box monitoring tools, such as CloudWatch or Azure Monitor,
may not be sufficient. There are powerful monitoring services and
tools offered by independent SaaS providers, as well as third-party
monitoring tools, like New Relic, PagerDuty, and SolarWinds. Iden‐
tifying the right metrics to monitor user experience, gathering uni‐
form metrics for on-premise and cloud-based services, and paying
attention to cloud service usage and costs are the key guidelines
when monitoring a cloud environment.
Cloud environments employ a shared security model, in which the
cloud vendor is in charge of securing the cloud infrastructure, and
you are responsible for securing your infrastructure and applications. Configuration management tools, access control mechanisms, and VPCs are some of the ways you can enhance the security of your cloud environment.
RTO and RPO determine the acceptable system downtime and data loss, respectively.
Cloud environments make it easier to set up a DR solution for your
systems, since they offer elasticity and virtualization of resources.
Instead of relying on outmoded conventional strategies, you can move your backups and disaster recovery solutions to the cloud and keep your business running with minimal downtime. A cloud-based solution, such as Azure Site Recovery, offers good RTOs and RPOs. Instead of using a homegrown backup system, you can take advantage of a cloud-based backup service (SaaS), such as Azure Backup, to safeguard your data.
Governance and compliance are two areas where a cloud environment poses special problems, due to multiple compliance, legal, accessibility, and data disclosure requirements. You can enhance security and governance in the cloud by using strategies such as RBAC,
hierarchical account provisioning, and network security groups.
Continuous security assessments in the cloud, through tools such as
Amazon Inspector, are key to mitigating vulnerabilities and reduc‐
ing the incidence of malicious attacks on your cloud environment.
Azure Security Center offers centralized security policy manage‐
ment, continuous security assessments, prioritized alerts, and
policy-based enablement of control and governance.