Unit 4 Infrastructure As A Service
Infrastructure as a service
Definition:
Infrastructure is any hardware or software resource used to execute and/or control computational operations. Infrastructure as a Service (IaaS) can thus be described as a mechanism for renting infrastructure to end users so they can carry out infrastructure-dependent work without having to implement the infrastructure themselves. This lets users concentrate on their work and frees them from the overhead of building and maintaining the required infrastructure. Users are generally charged on a pay-per-use basis or according to the contract conditions (Service Level Agreements, or SLAs) agreed between the user and the infrastructure provider. Brief information on IaaS and the types of resources that can be offered as IaaS follows. Figure 8 shows a simple logical view of the Infrastructure as a Service scenario of cloud computing.
Load Balancing Classification
Metrics for load balancing
It's difficult to talk about architectures without the perspective of utility. An architecture can be measured by a variety of characteristics, including cost, performance, remote access, and so on. Therefore, I first define a set of criteria by which cloud storage models are measured, and then explore some of the interesting implementations within cloud storage architectures.
First, let's discuss a general cloud storage architecture to set the context for the later exploration
of unique architectural features.
General architecture
Cloud storage architectures are primarily about delivery of storage on demand in a highly
scalable and multi-tenant way. Generically (see Figure 1), cloud storage architectures consist of a
front end that exports an API to access the storage. In traditional storage systems, this API is the
SCSI protocol; but in the cloud, these protocols are evolving. There, you can find Web service
front ends, file-based front ends, and even more traditional front ends (such as Internet SCSI, or
iSCSI). Behind the front end is a layer of middleware that I call the storage logic. This layer
implements a variety of features, such as replication and data reduction, over the traditional data-
placement algorithms (with consideration for geographic placement). Finally, the back end
implements the physical storage for data. This may be an internal protocol that implements
specific features or a traditional back end to the physical disks.
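To make the three layers concrete, here is a minimal, purely hypothetical Python sketch of the front end / storage logic / back end split described above; the class and method names are illustrative assumptions, not any provider's actual API.
```python
# Hypothetical sketch of the three-layer cloud storage architecture:
# a front end exporting an API, storage logic in the middle
# (replication), and a back end holding the physical data.

class BackEnd:
    """Stands in for the physical storage (e.g., a disk or node)."""
    def __init__(self, name):
        self.name = name
        self.blocks = {}          # key -> bytes

    def write(self, key, data):
        self.blocks[key] = data

    def read(self, key):
        return self.blocks.get(key)


class StorageLogic:
    """Middleware layer: here it implements simple replication across
    back ends; a real system adds data reduction, placement, etc."""
    def __init__(self, backends, replicas=2):
        self.backends = backends
        self.replicas = replicas

    def put(self, key, data):
        # Replicate to the first N back ends (placement is simplistic).
        for be in self.backends[: self.replicas]:
            be.write(key, data)

    def get(self, key):
        # Return the first replica found; tolerate a missing back end.
        for be in self.backends:
            data = be.read(key)
            if data is not None:
                return data
        raise KeyError(key)


class FrontEnd:
    """Front end exporting the access API (in practice REST, iSCSI, ...)."""
    def __init__(self, logic):
        self.logic = logic

    def put_object(self, key, data):
        self.logic.put(key, data)

    def get_object(self, key):
        return self.logic.get(key)


store = FrontEnd(StorageLogic([BackEnd("node-a"), BackEnd("node-b")]))
store.put_object("hello.txt", b"hello, cloud")
print(store.get_object("hello.txt"))   # b'hello, cloud'
```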
Characteristic        Description
Storage efficiency    Measure of how efficiently the raw storage is used
Cost                  Measure of the cost of the storage (commonly in dollars per gigabyte)
Manageability
One key focus of cloud storage is cost. If a client can buy and manage storage locally for less than the cost of leasing it in the cloud, the cloud storage market disappears. But cost can be divided into two
high-level categories: the cost of the physical storage ecosystem itself and the cost of managing
it. The management cost is hidden but represents a long-term component of the overall cost. For
this reason, cloud storage must be self-managing to a large extent. The ability to introduce new
storage where the system automatically self-configures to accommodate it and the ability to find
and self-heal in the presence of errors are critical. Concepts such as autonomic computing will
have a key role in cloud storage architectures in the future.
Access method
One of the most striking differences between cloud storage and traditional storage is the means
by which it's accessed (see Figure 2). Most providers implement multiple access methods, but
Web service APIs are common. Many of the APIs are implemented based on REST principles,
which imply an object-based scheme developed on top of HTTP (using HTTP as a transport).
REST APIs are stateless and therefore simple and efficient to provide. Many cloud storage
providers implement REST APIs, including Amazon Simple Storage Service (Amazon S3),
Windows Azure™, and Mezeo Cloud Storage Platform.
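As a brief illustration of such a Web service API, the following sketch stores and retrieves an object in Amazon S3 using the boto3 Python library, which issues the underlying REST calls over HTTP; the bucket name is a placeholder, and credentials are assumed to be configured in the environment.
```python
# Minimal sketch of object storage through a REST-based Web service API,
# using boto3 for Amazon S3. Bucket name is a placeholder; credentials
# are assumed to be configured in the environment.
import boto3

s3 = boto3.client("s3")

# PUT an object (a REST PUT under the hood).
s3.put_object(Bucket="example-bucket", Key="notes/hello.txt",
              Body=b"hello from the cloud")

# GET it back (a REST GET under the hood).
obj = s3.get_object(Bucket="example-bucket", Key="notes/hello.txt")
print(obj["Body"].read())   # b'hello from the cloud'
```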
One problem with Web service APIs is that they require integration with an application to take
advantage of the cloud storage. Therefore, common access methods are also used with cloud
storage to provide immediate integration. For example, file-based protocols such as
NFS/Common Internet File System (CIFS) or FTP are used, as are block-based protocols such as
iSCSI. Cloud storage providers such as Six Degrees, Zetta, and Cleversafe provide these access
methods.
Although the protocols mentioned above are the most common, other protocols are suitable for
cloud storage. One of the most interesting is Web-based Distributed Authoring and Versioning
(WebDAV). WebDAV is also based on HTTP and enables the Web as a readable and writable
resource. Providers of WebDAV include Zetta and Cleversafe in addition to others.
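Because WebDAV is HTTP with extra verbs, a generic HTTP client can exercise it. The sketch below uses Python's requests library against a hypothetical WebDAV server; the URL and credentials are assumptions.
```python
# Sketch of WebDAV access: the Web as a readable and writable resource.
# Server URL and credentials are hypothetical placeholders.
import requests

BASE = "https://webdav.example.com/dav"
auth = ("user", "password")

# Write a resource with a plain HTTP PUT.
requests.put(f"{BASE}/report.txt", data=b"quarterly numbers", auth=auth)

# Read it back with GET.
r = requests.get(f"{BASE}/report.txt", auth=auth)
print(r.content)

# List resources with the WebDAV-specific PROPFIND verb.
r = requests.request("PROPFIND", f"{BASE}/", auth=auth,
                     headers={"Depth": "1"})
print(r.status_code)   # 207 Multi-Status on a WebDAV server
```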
Figure 2. Cloud storage access methods
You can also find solutions that support multi-protocol access. For example, IBM® Smart
Business Storage Cloud enables both file-based (NFS and CIFS) and SAN-based protocols from
the same storage-virtualization infrastructure.
Performance
There are many aspects to performance, but the ability to move data between a user and a remote cloud storage provider represents the largest challenge to cloud storage. The problem lies with TCP, the workhorse of the Internet. TCP controls the flow of data based on packet acknowledgements from the peer endpoint. Packet loss, or late arrival, triggers congestion control, which further limits performance to avoid more global networking issues. TCP is ideal for moving small amounts of data through the global Internet but is less suitable for larger data movement, particularly as round-trip time (RTT) increases.
Amazon, through Aspera Software, solves this problem by removing TCP from the equation. A new protocol called the Fast and Secure Protocol (FASP™) was developed to accelerate bulk data movement in the face of large RTT and severe packet loss. The key is the use of UDP, the partner transport protocol to TCP. UDP leaves congestion management to the host, pushing this aspect up into the application-layer protocol of FASP (see Figure 3).
Figure 3. The Fast and Secure Protocol from Aspera Software
Using standard (non-accelerated) NICs, FASP efficiently uses the bandwidth available to the
application and removes the fundamental bottlenecks of conventional bulk data-transfer
schemes. The Related topics section provides some interesting statistics on FASP performance
over traditional WAN, intercontinental transfers, and lossy satellite links.
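FASP itself is proprietary, but the point about UDP can be illustrated: with UDP, pacing and retransmission are left entirely to the application. The sketch below is a loose, hypothetical illustration of application-controlled sending over UDP in Python, not FASP itself.
```python
# Loose illustration (not FASP): with UDP the application, not the
# kernel, decides how fast to send. Here a fixed-rate sender pushes
# datagrams at a target bandwidth regardless of RTT.
import socket
import time

DEST = ("127.0.0.1", 9999)        # placeholder destination
CHUNK = 1200                      # payload bytes per datagram
TARGET_BPS = 10_000_000           # application-chosen rate (10 Mbit/s)

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
data = b"x" * CHUNK
interval = (CHUNK * 8) / TARGET_BPS   # seconds between datagrams

for seq in range(1000):
    # A real protocol would add sequence numbers, ACKs, and loss-based
    # rate adaptation at this layer; TCP would do all of that for us.
    sock.sendto(seq.to_bytes(4, "big") + data, DEST)
    time.sleep(interval)
```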
Multi-tenancy
One key characteristic of cloud storage architectures is called multi-tenancy. This simply means
that the storage is used by many users (or multiple "tenants"). Multi-tenancy applies to many
layers of the cloud storage stack, from the application layer, where the storage namespace is
segregated among users, to the storage layer, where physical storage can be segregated for
particular users or classes of users. Multi-tenancy even applies to the networking infrastructure that connects users to storage, permitting quality of service and the carving out of bandwidth for particular users.
Scalability
You can look at scalability in a number of ways, but it is the on-demand view of cloud storage
that makes it most appealing. The ability to scale storage needs (both up and down) means
improved cost for the user and increased complexity for the cloud storage provider.
Scalability must be provided not only for the storage itself (functionality scaling) but also the
bandwidth to the storage (load scaling). Another key feature of cloud storage is geographic
distribution of data (geographic scalability), allowing the data to be nearest the users over a set of
cloud storage data centers (via migration). For read-only data, replication and distribution are
also possible (as is done using content delivery networks). This is shown in Figure 4.
Figure 4. Scalability of cloud storage
Internally, a cloud storage infrastructure must be able to scale. Servers and storage must be
capable of resizing without impact to users. As discussed in the Manageability section,
autonomic computing is a requirement for cloud storage architectures.
Availability
Once a cloud storage provider has a user's data, it must be able to provide that data back to the
user upon request. Given network outages, user errors, and other circumstances, this can be
difficult to provide in a reliable and deterministic way.
There are some interesting and novel schemes to address availability, such as information
dispersal. Cleversafe, a company that provides private cloud storage (discussed later), uses the
Information Dispersal Algorithm (IDA) to enable greater availability of data in the face of
physical failures and network outages. IDA, which was first created for telecommunication
systems by Michael Rabin, is an algorithm that allows data to be sliced with Reed-Solomon
codes for purposes of data reconstruction in the face of missing data. Further, IDA allows you to
configure the number of data slices, such that a given data object could be carved into four slices
with one tolerated failure or 20 slices with eight tolerated failures. Similar to RAID, IDA permits
the reconstruction of data from a subset of the original data, with some amount of overhead for
error codes (dependent on the number of tolerated failures). This is shown in Figure 5.
Figure 5. Cleversafe's approach to extreme data availability
With the ability to slice data along with Cauchy Reed-Solomon correction codes, the slices can
then be distributed to geographically disparate sites for storage. For a number of slices (p) and a
number of tolerated failures (m), the resulting overhead is p/(p-m). So, in the case of Figure 5,
the overhead to the storage system for p = 4 and m = 1 is 33%.
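Full Reed-Solomon coding is beyond a short example, but the p = 4, m = 1 case above can be mimicked with simple XOR parity: three data slices plus one parity slice, any three of which reconstruct the original. The minimal sketch below is illustrative only, not Cleversafe's actual IDA.
```python
# Illustrative p = 4, m = 1 dispersal using XOR parity (not the real
# IDA, which uses Reed-Solomon codes): any 3 of the 4 slices suffice.

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def disperse(data):
    """Split data into 3 equal slices plus 1 parity slice."""
    n = -(-len(data) // 3)                   # slice length (round up)
    data = data.ljust(3 * n, b"\x00")        # pad to a multiple of 3
    slices = [data[i * n:(i + 1) * n] for i in range(3)]
    parity = xor_bytes(xor_bytes(slices[0], slices[1]), slices[2])
    return slices + [parity]                 # 4/3 of original: 33% overhead

def reconstruct(slices):
    """Rebuild the data given exactly one missing (None) slice."""
    missing = slices.index(None)
    present = [s for s in slices if s is not None]
    rebuilt = present[0]
    for s in present[1:]:
        rebuilt = xor_bytes(rebuilt, s)      # XOR of the other three
    slices[missing] = rebuilt
    return b"".join(slices[:3])

pieces = disperse(b"critical business data!")
pieces[2] = None                             # lose any one slice
print(reconstruct(pieces).rstrip(b"\x00"))   # b'critical business data!'
```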
The downside of IDA is that it is processing intensive without hardware acceleration. Replication
is another useful technique and is implemented by a variety of cloud storage providers. Although
replication introduces a large amount of overhead (100%), it's simple and efficient to provide.
Control
A customer's ability to control and manage how his or her data is stored and the costs associated
with it is important. Numerous cloud storage providers implement controls that give users greater
control over their costs.
Amazon implements Reduced Redundancy Storage (RRS) to provide users with a means of
minimizing overall storage costs. Data is replicated within the Amazon S3 infrastructure, but
with RRS, the data is replicated fewer times with the possibility for data loss. This is ideal for
data that can be recreated or that has copies that exist elsewhere.
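With Amazon S3, the storage class is selected per object at upload time. A minimal boto3 sketch follows; the bucket name, key, and payload are placeholder assumptions.
```python
# Sketch: choosing Reduced Redundancy Storage per object in Amazon S3.
# Bucket, key, and body are placeholders; credentials assumed configured.
import boto3

s3 = boto3.client("s3")

# Easily recreated data can trade durability for cost via RRS.
s3.put_object(Bucket="example-bucket",
              Key="thumbnails/cat-small.jpg",
              Body=b"...jpeg bytes...",
              StorageClass="REDUCED_REDUNDANCY")
```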
Efficiency
Storage efficiency is an important characteristic of cloud storage infrastructures, particularly with
their focus on overall cost. The next section speaks to cost specifically, but this characteristic
speaks more to the efficient use of the available resources over their cost.
To make a storage system more efficient, it must store more data for the same amount of physical capacity. A common solution is data reduction, whereby the source data is reduced so that it requires less physical space. Two means to achieve this are compression (the reduction of data by encoding it using a different representation) and de-duplication (the removal of identical copies of data that may exist). Although both methods are useful, compression involves processing (re-encoding the data on the way into and out of the infrastructure), whereas de-duplication involves calculating signatures of data to search for duplicates.
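Both techniques are easy to sketch in Python with the standard library: zlib for compression and a SHA-256 hash as the de-duplication signature. The dict shown is a hypothetical in-memory stand-in for a block store.
```python
# Sketch of the two data-reduction techniques: compression (zlib) and
# de-duplication (content hashes as signatures). The dict stands in
# for a hypothetical block store.
import hashlib
import zlib

store = {}                       # signature -> compressed block

def put_block(data):
    sig = hashlib.sha256(data).hexdigest()
    if sig not in store:         # de-dup: identical data stored once
        store[sig] = zlib.compress(data)   # compression: smaller encoding
    return sig

def get_block(sig):
    return zlib.decompress(store[sig])

a = put_block(b"the same report" * 100)
b = put_block(b"the same report" * 100)    # duplicate: no new storage used
assert a == b and len(store) == 1
print(len(get_block(a)), "bytes recovered from",
      len(store[a]), "bytes stored")
```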
Cost
One of the most notable characteristics of cloud storage is the ability to reduce cost through its
use. This includes the cost of purchasing storage, the cost of powering it, the cost of repairing it
(when drives fail), as well as the cost of managing the storage. When viewing cloud storage from
this perspective (including SLAs and increasing storage efficiency), cloud storage can be
beneficial in certain use models.
An interesting peek inside a cloud storage solution is provided by a company called Backblaze (see Related topics for details). Backblaze set out to build inexpensive storage for a cloud storage offering. A Backblaze POD (a shelf of storage) packs 67TB into a 4U enclosure for under US$8,000.
This package consists of a 4U enclosure, a motherboard, 4GB of DRAM, four SATA controllers,
45 1.5TB SATA hard disks, and two power supplies. On the motherboard, Backblaze runs
Linux® (with JFS as the file system) and GbE NICs as the front end using HTTPS and Apache
Tomcat. Backblaze's software includes de-duplication, encryption, and RAID6 for data
protection. Backblaze's description of their POD (which shows you in detail how to build your
own) shows you the extent to which companies can cut the cost of storage, making cloud storage
a viable and cost-efficient option.
Cloud storage models
Thus far, I've talked primarily about cloud storage providers, but there are models for cloud
storage that allow users to maintain control over their data. Cloud storage has evolved into three
categories, one of which permits the merging of two categories for a cost-efficient and secure
option.
Much of this article has discussed public cloud storage providers, which present storage
infrastructure as a leasable commodity (both in terms of long-term or short-term storage and the
networking bandwidth used within the infrastructure). Private clouds use the concepts of public
cloud storage but in a form that can be securely embedded within a user's firewall. Finally,
hybrid cloud storage permits the two models to merge, allowing policies to define which data
must be maintained privately and which can be secured within public clouds (see Figure 6).
Figure 6. Cloud storage models
The cloud models are shown graphically in Figure 6. Examples of public cloud storage providers
include Amazon (which offers storage as a service). Examples of private cloud storage providers
include IBM, Parascale, and Cleversafe (which build software and/or hardware for internal
clouds). Finally, hybrid cloud providers include Egnyte, among others.
Virtual Machine Migration
Cloud NAS (Network Attached Storage)
What is Cloud NAS?
According to Technavio, Cloud NAS is gaining traction in the marketplace. But we still see a lot of confusion when people hear the terms “Cloud NAS”, “Cloud-based NAS”, or storage gateway. So what is Cloud NAS? A cloud NAS works like the legacy, on-premises NAS currently in your data center. But unlike traditional NAS or SAN infrastructures, a cloud NAS is not a physical machine; it’s software-based and designed to work in the cloud.
Cloud NAS is a “NAS in the cloud” that takes advantage of cloud computing to simplify infrastructure and reduce costs. Most cloud NAS products work on cloud providers like Amazon Web Services (AWS) and Microsoft Azure. Cloud NAS uses the cloud as the central source for all data, but still provides common enterprise NAS features.
Why do you need a Cloud NAS? The way we work has evolved, but data storage hasn’t changed substantially in over two decades. It’s time for storage to catch up. With the right set of capabilities, a cloud NAS shortens the amount of time it takes to migrate from an on-premises NAS to the cloud. It’s also much easier to manage than legacy, on-premises NAS systems.
A Cloud NAS provides significant benefits, including:
• Eliminate Legacy NAS Systems: A cloud NAS works with public cloud providers, so you’ll no longer need an on-premises NAS. Once you’re done migrating to the cloud, you’ll finally be able to unplug your legacy NAS and end your expensive maintenance renewal contracts.
• No More Local Backups or Tapes: Cloud providers such as AWS and Microsoft Azure
automatically backup and archive data, so you can consolidate backup and tape archive
operations from multiple sites to the cloud.
• Built-in Disaster Recovery: A cloud NAS uses the cloud as the central data source, letting you
consolidate all of your backup and DR under one roof. Since cloud providers use redundant
copies of data and multiple data centers to architect system durability in to their service, your
data is always recoverable. Your data is already stored off-site across multiple sites, so you don’t
have to worry about tape backups. Snapshots provide point-in-time recovery for as long as you
need it.
• Pay as You Go: You only pay your cloud provider for the storage you need. With cloud storage
becoming cheaper, you can instantly scale your cloud instances to best suit your needs and
not worry about costs.
Use cases for a Cloud NAS include:
• On-Premises to Cloud Backup: Replicate and backup your data from your VMware datacenter
to the cloud. Eliminate physical backup tape required by business compliance and archive data in
inexpensive S3 object storage or send to cold storage like AWS Glacier for long-term storage.
• New Apps and Proofs of Concept (POC): A cloud NAS lets developers quickly stand up
storage infrastructure for a new application or proof of concept project without any storage
hardware. Developers can easily create a storage infrastructure with just a few clicks.
• File Services for S3 Object Storage: Object storage systems provide a lower-cost, more durable, and more scalable alternative to traditional NAS and SAN hardware storage systems. However, they are optimized for object I/O, do not perform as well with file I/O, and lack the robust capabilities of traditional NAS filers. Frequently, object storage solutions have limited or low-performing file services. Cloud NAS enables customers to take advantage of the scalability, durability, and low cost of object storage and to replace expensive on-premises SAN and NAS equipment, while still providing file services for existing enterprise applications.
• Docker Persistent Storage: Docker cannot natively share volumes across multiple Docker
hosts. If data is not in a volume, the data disappears when you delete the Docker container. With
a cloud NAS, you can share persistent storage between Docker containers and hosts. Share
snapshots of your data to S3 or elsewhere for use even after your Docker container has exited.
• SaaS-Enable Applications: The growing trend from on-premises to Software-as-a-Service
(SaaS) deployments is undeniable. Traditional applications typically do not support block storage
or object storage. Converting your client/server applications to support block or object storage
requires application development and is usually slow and costly. For legacy applications with
incompatible file protocols, cloud NAS offers file services and support for NFS, CIFS/SMB,
iSCSI, and AFP protocols with Active Directory and LDAP integration to SaaS-enable your
existing applications for the cloud with ease.
Case study
Microsoft Azure
AWS (Amazon Web Services)
Google App Engine (GAE)
Scope:
• Overview
• Data storage
• Security
Overview
Programming languages support
Java:
• App Engine runs Java apps on a Java 7 virtual machine (Java 6 is currently supported as well).
• Uses the Java Servlet standard for web applications:
• WAR (Web Application ARchive) directory structure.
• Servlet classes
• Java Server Pages (JSP)
• Static and data files
• Deployment descriptor (web.xml)
• Other configuration files
Python:
• Uses WSGI (Web Server Gateway Interface) standard.
• Python applications can be written using:
• Webapp2 framework
• Django framework
• Any Python code that uses the CGI (Common Gateway Interface) standard.
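As a minimal sketch of a Python App Engine handler using the webapp2 framework listed above (assuming the Python 2.7 runtime; the route and message are illustrative placeholders):
```python
# Minimal webapp2 application for Google App Engine (Python runtime).
# The route and message are illustrative placeholders.
import webapp2

class MainPage(webapp2.RequestHandler):
    def get(self):
        # App Engine routes matching HTTP GETs here via app.yaml.
        self.response.headers["Content-Type"] = "text/plain"
        self.response.write("Hello from App Engine!")

app = webapp2.WSGIApplication([("/", MainPage)], debug=True)
```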
PHP (Experimental support):
• Local development servers are available to anyone for developing
and testing local applications.
• Only whitelisted applications can be deployed on Google App Engine.
(https://gaeforphp.appspot.com/).
Google’s Go:
• Go is Google’s open source programming environment.
• Tightly coupled with Google App Engine.
• Applications can be written using App Engine’s Go SDK.
Data storage