
CC-Notes 19-09-22


CLOUD COMPUTING

Course Code: 21F00205c


MCA – Semester - II
Course Objectives:
• To understand the need of Cloud Computing.
• To develop cloud applications.
• To demonstrate the design of the architecture for a new cloud application.
• To teach how to re-architect the existing application for the cloud.
Course Outcomes (CO): Student will be able to
• Outline the procedure for Cloud deployment (L4)
• Investigate different cloud service models and deployment models (L4)
• Compare different cloud services. (L4)
• Design applications for an organization which uses a cloud environment. (L6)
• Understand the concepts and challenges of big data and why existing technology is inadequate to analyze big
data. (L2)
UNIT – I Lecture Hrs:
Introduction to cloud computing: Introduction, Characteristics of cloud computing, Cloud Models, Cloud Services
Examples, Cloud Based services and applications
Cloud Concepts and Technologies: Virtualization, Load Balancing, Scalability and Elasticity, Deployment,
Replication, Monitoring, Software Defined Networking, Network Function Virtualization, MapReduce, Identity and Access
Management, Service Level Agreements, Billing.
Cloud Services and Platforms: Compute Services, Storage Services, Database Services, Application Services, Content
Delivery Services, Analytics Services, Deployment and Management Services, Identity & Access Management
Services, Open Source Private Cloud Software
UNIT – II Lecture Hrs:
Hadoop & MapReduce: Apache Hadoop, Hadoop MapReduce Job Execution, Hadoop Schedulers, Hadoop Cluster
Setup.
Cloud Application Design: Reference Architecture for Cloud Applications, Cloud Application Design Methodologies,
Data Storage Approaches.
Python Basics: Introduction, Installing Python, Python Data Types & Data Structures, Control Flow, Functions,
Modules, Packages, File Handling, Date/Time Operations, Classes.
UNIT – III Lecture Hrs:
Python for Cloud: Python for Amazon Web Services, Python for Google Cloud Platform, Python for Windows Azure,
Python for MapReduce, Python Packages of Interest, Python Web Application Framework, Designing a RESTful Web
API.
Cloud Application Development in Python: Design Approaches, Image Processing App, Document Storage App,
MapReduce App, Social Media Analytics App.
UNIT – IV Lecture Hrs:
Big Data Analytics: Introduction, Clustering Big Data, Classification of Big Data, Recommendation Systems.
Multimedia Cloud: Introduction, Case Study: Live Video Streaming App, Streaming Protocols, Case Study: Video
Transcoding App.
Cloud Application Benchmarking and Tuning: Introduction, Workload Characteristics, Application Performance
Metrics, Design Considerations for a Benchmarking Methodology, Benchmarking Tools, Deployment Prototyping,
Load Testing & Bottleneck Detection Case Study, Hadoop Benchmarking Case Study.
UNIT – V
Cloud Security: Introduction, CSA Cloud Security Architecture, Authentication, Authorization, Identity & Access
Management, Data Security, Key Management, Auditing.
Cloud for Industry, Healthcare & Education: Cloud Computing for Healthcare, Cloud Computing for
Energy Systems, Cloud Computing for Transportation Systems, Cloud Computing for Manufacturing Industry,
Cloud Computing for Education.
Migrating into a Cloud: Introduction, Broad Approaches to Migrating into the Cloud, the Seven-Step Model of
Migration into a Cloud.
Organizational Readiness and Change Management in the Cloud Age: Introduction, Basic Concepts of Organizational
Readiness, Drivers for Change: A Framework to Comprehend the Competitive Environment, Common Change
Management Models, Change Management Maturity Models, Organizational Readiness Self-Assessment.
Text Books:
1. Cloud Computing: A Hands-On Approach, by Arshdeep Bahga and Vijay Madisetti, Universities Press, 2016
2. Cloud Computing: Principles and Paradigms, by Rajkumar Buyya, James Broberg, and Andrzej Goscinski, Wiley,
2016



UNIT-1
Introduction to Cloud Computing:
Cloud Computing is the delivery of computing services such as servers, storage, databases,
networking, software, analytics, intelligence, and more, over the Cloud (Internet).

Cloud Computing provides an alternative to the on-premises datacentre. With an on-premises
datacentre, we have to manage everything, such as purchasing and installing hardware, virtualization,
installing the operating system, and any other required applications, setting up the network, configuring
the firewall, and setting up storage for data. After doing all the set-up, we become responsible for
maintaining it through its entire lifecycle.
But if we choose Cloud Computing, a cloud vendor is responsible for the hardware purchase and
maintenance. They also provide a wide variety of software and platform as a service. We can take any
required services on rent. The cloud computing services will be charged based on usage.

The cloud environment provides an easily accessible online portal that makes it handy for the user
to manage the compute, storage, network, and application resources. Some cloud service providers are
shown in the following figure.



Types of Cloud Computing / (Deployment Models):
Deployment models define the type of access to the cloud, i.e., how the cloud is located and accessed. A
cloud can have any of four types of access: Public, Private, Hybrid, and Community.

Cloud Computing Service Models:


Cloud computing is based on service models. These are categorized into three basic service models which
are -
1) Infrastructure-as–a-Service (IaaS)



2) Platform-as-a-Service (PaaS)
3) Software-as-a-Service (SaaS)

In short form:
Infrastructure-as-a-Service (IaaS)
IaaS provides access to fundamental resources such as physical machines, virtual machines,
virtual storage, etc.



Platform-as-a-Service (PaaS)
PaaS provides the runtime environment for applications, development and deployment tools, etc.
Software-as-a-Service (SaaS)
The SaaS model allows end-users to use software applications as a service.

History of Cloud Computing:


The concept of Cloud Computing came into existence in the 1950s with the implementation of
mainframe computers, accessible via thin/static clients. Since then, cloud computing has evolved
from static clients to dynamic ones and from software to services. The following diagram explains the
evolution of cloud computing:

Benefits /Advantages of cloud computing:


➢ Cost: It reduces the huge capital costs of buying hardware and software.
➢ Speed: Resources can be accessed in minutes, typically within a few clicks.
➢ Scalability: We can increase or decrease resources according to business requirements.
➢ Productivity: While using cloud computing, we spend less operational effort. We do not need to
apply patches or maintain hardware and software ourselves. In this way, the IT team
can be more productive and focus on achieving business goals.
➢ Reliability: Backup and recovery of data are less expensive and very fast for business continuity.
➢ Security: Many cloud vendors offer a broad set of policies, technologies, and controls that
strengthen our data security.
Cloud Computing has numerous advantages. Some of them are listed below -
• One can access applications as utilities, over the Internet.
• One can manipulate and configure the applications online at any time.
• It does not require installing any software to access or manipulate cloud applications.
• Cloud Computing offers online development and deployment tools and a programming runtime
environment through the PaaS model.
• Cloud resources are available over the network in a manner that provides platform-independent
access to any type of client.
• Cloud Computing offers on-demand self-service. The resources can be used without
interaction with the cloud service provider.
• Cloud Computing is highly cost effective because it operates at high efficiency with
optimum utilization. It just requires an Internet connection.
• Cloud Computing offers load balancing that makes it more reliable.



Risks related to Cloud Computing:
Although cloud Computing is a promising innovation with various benefits in the world of
computing, it comes with risks. Some of them are discussed below:
Security and Privacy
It is the biggest concern about cloud computing. Since data management and infrastructure
management in the cloud are provided by a third party, it is always a risk to hand over sensitive information
to cloud service providers.
Although cloud computing vendors ensure highly secure, password-protected accounts, any
sign of a security breach may result in loss of customers and business.
Lock In
It is very difficult for the customers to switch from one Cloud Service Provider (CSP) to
another. It results in dependency on a particular CSP for service.
Isolation Failure
This risk involves the failure of isolation mechanism that separates storage, memory, and routing
between the different tenants.
Management Interface Compromise
In the case of a public cloud provider, the customer management interfaces are accessible through the
Internet, which increases the risk of compromise.
Insecure or Incomplete Data Deletion
It is possible that data requested for deletion may not actually get deleted. This happens because of
either of the following reasons:
• Extra copies of the data are stored but are not available at the time of deletion
• The disk that stores data of multiple tenants is destroyed.

Characteristics of Cloud Computing:


The key characteristics of cloud computing are shown in the following diagram:



Cloud computing is becoming popular day by day. Continuous business expansion and growth
requires huge computational power and large-scale data storage systems. Cloud computing can help
organizations expand and securely move data from physical locations to the 'cloud' that can be accessed
anywhere.
Cloud computing has many features that make it one of the fastest growing industries at present.
The flexibility offered by cloud services in the form of their growing set of tools and technologies has
accelerated its deployment across industries. This section describes the essential features of cloud
computing.

1. Resources Pooling
Resource pooling is one of the essential features of cloud computing. Resource pooling means that a
cloud service provider can share resources among multiple clients, providing each with a different set of
services according to their needs. It is a multi-client strategy that can be applied to data storage,
processing and bandwidth-delivered services. The administration process of allocating resources in real-
time does not conflict with the client's experience.
2. On-Demand Self-Service
It is one of the important and essential features of cloud computing. This enables the client to
continuously monitor server uptime, capabilities and allocated network storage. This is a fundamental
feature of cloud computing, and a customer can also control the computing capabilities according to their
needs.
3. Easy Maintenance
This is one of the best cloud features. Servers are easily maintained, and downtime is minimal or
sometimes zero. Cloud computing powered resources often undergo several updates to optimize their
capabilities and potential. The updated versions are more compatible with devices and perform faster than previous
versions.
4. Scalability and Rapid Elasticity
A key feature and advantage of cloud computing is its rapid scalability. This cloud feature enables cost-
effective handling of workloads that require a large number of servers but only for a short period. Many
customers have workloads that can be run very cost-effectively due to the rapid scalability of cloud
computing.
5. Economical
This cloud feature helps in reducing the IT expenditure of the organizations. In cloud computing, clients
need to pay the administration for the space used by them. There are no hidden or additional charges
that need to be paid. Administration is economical, and more often than not, some space is allocated for
free.
6. Measured and Reporting Service
Reporting Services is one of the many cloud features that make it the best choice for organizations. The
measurement and reporting service is helpful for both cloud providers and their customers. This enables
both the provider and the customer to monitor and report which services have been used and for what
purposes. It helps in monitoring billing and ensuring optimum utilization of resources.



7. Security
Data security is one of the best features of cloud computing. Cloud services make a copy of the stored
data to prevent any kind of data loss. If one server loses data by any chance, the copied version is restored
from the other server. This feature comes in handy when multiple users are working on a particular file
in real-time, and one file suddenly gets corrupted.
8. Automation
Automation is an essential feature of cloud computing. The ability of cloud computing to automatically
install, configure and maintain a cloud service is known as automation in cloud computing. In simple
words, it is the process of making the most of the technology and minimizing the manual effort. However,
achieving automation in a cloud ecosystem is not that easy. This requires the installation and deployment
of virtual machines, servers, and large storage. On successful deployment, these resources also require
constant maintenance.
9. Resilience
Resilience in cloud computing means the ability of a service to quickly recover from any disruption. The
resilience of a cloud is measured by how fast its servers, databases and network systems restart and
recover from any loss or damage. Availability is another key feature of cloud computing. Since cloud
services can be accessed remotely, there are no geographic restrictions or limits on the use of cloud
resources.
10. Large Network Access
A big part of the cloud's characteristics is its ubiquity. The client can access cloud data or transfer data
to the cloud from any location with a device and internet connection. These capabilities are available
everywhere in the organization and are achieved with the help of internet. Cloud providers deliver that
large network access by monitoring and guaranteeing measurements that reflect how clients access cloud
resources and data: latency, access times, data throughput, and more.

Example for Cloud Computing Services Provider:


Azure Portal Overview
Azure portal is a platform where we can access and manage all our applications at one place. We
can build, manage, and monitor everything from simple web-apps to complex cloud applications using a
single console.
So, first of all, to log into the Azure portal, we need to register. And, if we are registering for the
first time, we will get 12 months of popular free services. And also, depending on the country, we will
get some amount of free credit that needs to be consumed within 30 days. And in addition to all these
things, we will get some services that are free forever.
So, make sure you are completely ready to try all the services before you register for Azure
because that credit is only available for 30 days.
Creating an Azure Account:
Step 1: Open https://azure.microsoft.com/en-us/free/
then click on Start free; it will redirect you to the next step.

Step 2: It will ask you to log in with your Microsoft account. If you already have a Microsoft account,
you can fill in the details and log in. If you don’t have one, you must sign up first to proceed further.
Step 3: After logging in to your Microsoft account, you will be redirected to the next page, as shown
below. Here you need to fill in the required fields, and they will ask for your credit card number to verify
your identity and to keep out spam and bots. You won’t be charged unless you upgrade to paid
services.



Cloud Based Services and Applications:
Here are some examples of Cloud computing applications:
Cloud Computing for Health Care:
Medical professionals can do diagnostics, host information, and analyze patients
remotely with the help of cloud computing. Cloud computing allows doctors to share
information quickly from anywhere. It also saves costs by allowing large data file transfers
instantly. This certainly increases efficiency.
Ultimately, cloud technology helps the medical team ensure patients receive the best
possible care without unnecessary delay. The condition of patients can also be updated in
seconds with the help of remote conferencing. Patients can access their own health
information from all of their care providers and store it in a personal health record (PHR).

Cloud Computing for Energy Systems:


Energy systems (such as smart grids, power plants, wind turbine farms, etc.) have thousands of
sensors that gather real-time maintenance data continuously for condition monitoring and failure
prediction purposes. These energy systems have a large number of critical components that must function
correctly so that the systems can perform their operations correctly.
Example: in systems such as power grids, real-time information is collected using specialized electrical
sensors called Phasor Measurement Units (PMU) at the substations. The information received from PMUs must
be monitored in real-time for estimating the state of the system and for predicting failures. Maintenance and repair
of such complex systems are not only expensive but also time consuming; failures can therefore cause huge losses
for the operators and supply outages for consumers. By storing all these data in cloud storage, the machine
maintenance data collected from a large number of sensors embedded in the machines can be processed and analyzed
in a cloud computing environment. Using the cloud, local decisions can be taken efficiently with global information.

Cloud Computing in Education:


Cloud computing is also useful in educational institutions for distance learning. It
offers various services for universities, colleges, professors, and teachers to reach thousands
of students all around the world. Companies like Google and Microsoft offer various services
free of charge to faculties, teachers, professors, and students from various learning
institutions. Various educational institutions across the world use these services to improve
their efficiency and productivity.

Government:
The U.S. military and government were early adopters of cloud computing. Their
Cloud incorporates social, mobile, and analytics technologies. However, they must adhere to
strict compliance and security measures (FIPS, FISMA, and FedRAMP), which protect
against cyber threats both domestically and abroad.
Big data Analytics:
Cloud computing helps data scientists analyze various data patterns and insights for better
predictions and decision making. There are many open-source big data development and
analytics tools available like Cassandra, Hadoop, etc., for this purpose.
Communication:
Cloud computing provides network-based access to communication tools like emails
and social media. WhatsApp also uses a cloud-based infrastructure to facilitate user
communications. All the information is stored in the service provider’s hardware.
Business Process:
Nowadays, many business processes like emails, ERP, CRM, and document
management have become cloud-based services. SaaS has become the most vital method for
enterprises. Some examples of SaaS include Salesforce, HubSpot.
Facebook, Dropbox, and Gmail:
Cloud computing can be used for the storage of files. It helps you automatically
synchronize the files from different devices like desktop, tablet, mobile, etc. Dropbox allows
users to store and access files up to 2 GB for free. It also provides an easy backup feature.
Social Networking platforms like Facebook demand powerful hosting to manage and
store data in real-time. Cloud-based communication provides click-to-call facilities from
social networking sites and access to the instant messaging system.
Citizen Services: The cloud technology can be used for handling citizen services too. It is
widely used for storing, managing, and updating citizen details, acknowledging forms, and even
verifying the current status of applications, all with the help of cloud computing.



Cloud Services Examples:
Examples In IaaS: Amazon EC2, Google Compute Engine, Azure VMs.
Amazon EC2(TM): Amazon Elastic Compute Cloud (EC2) is an Infrastructure as a service (IaaS)
offering from Amazon.com. EC2 is a web service that provides computing capacity in the form of virtual
machines that are launched in Amazon’s cloud computing environment.
Amazon EC2 allows users to launch instances on demand using a simple web-based interface.
Amazon provides pre-configured Amazon Machine Images (AMIs), which are templates of cloud
instances.
Instances can be launched with a variety of operating systems. Users can load their applications
on running instances and rapidly and easily increase or decrease capacity to meet the dynamic application
performance requirements with EC2.
With EC2, users can even provision hundreds or thousands of server instances simultaneously,
manage network access permissions, and monitor usage of resources through a web interface. Amazon EC2
provides instances of various computing capacities ranging from small instances to extra-large instances.
Example: from 1 virtual core with 1 EC2 compute unit, 1.7 GB memory and 160 GB instance storage, up to
4 virtual cores with 2 EC2 compute units each, 15 GB memory and 1690 GB instance storage.
Amazon EC2 also provides cluster GPU (graphics processing unit) instances and high
input/output instances. The pricing model for EC2 is based on the instance hours used for on-demand instances.

Google Compute Engine (GCE): is an IaaS offering from Google. GCE provides virtual machines of
various computing capacities ranging from small instances (e.g., 1 virtual core with 1.38 GCE unit and
1.7 GB memory) to high memory machine types (e.g., 8 virtual cores with 22 GCE units and 52GB
memory).

Windows Azure Virtual Machines is an IaaS offering from Microsoft. Azure VMs provide virtual
machines of various computing capacities ranging from small instances (1 virtual core with 1.75GB
memory) to memory intensive machine types (8 virtual cores with 56GB memory).

Google App Engine (GAE) is a Platform-as-a-Service (PaaS) offering from Google. GAE™ is a cloud-
based web service for hosting web applications and storing data. GAE allows users to build scalable and
reliable applications that run on the same systems that power Google’s own applications. GAE provides
a software development kit (SDK) for developing web applications software that can be deployed on
GAE.
Developers can develop and test their applications with GAE SDK on a local machine and then
upload it to GAE with a simple click of a button. Applications hosted in GAE are easy to build, maintain
and scale. Users don't need to worry about launching additional computing instances when the application
load increases. GAE provides seamless scalability by launching additional instances when application
load increases. GAE provides automatic scaling and load balancing capability. GAE supports
applications written in several programming languages.
With GAE's Java runtime environment, developers can build applications using the Java programming
language and standard Java technologies such as Java Servlets. GAE also provides runtime environments
for the Python and Go programming languages.
The pricing model for GAE is based on the amount of computing resources used. GAE provides
free computing resources for applications up to a certain limit. Beyond that limit, users are billed based
on the amount of computing resources used, such as the amount of bandwidth consumed, the number of
instance hours for front-end and back-end instances, the amount of stored data, channels, and recipients
emailed.
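As a rough sketch of what a GAE-hosted Python application can look like, the example below uses the Flask web framework, which the GAE Python runtime supports (the handler and route are illustrative; a real deployment also needs an app.yaml configuration file).

# Minimal sketch of a web application that could be deployed on Google App
# Engine's Python runtime using the Flask framework (illustrative only).
from flask import Flask

app = Flask(__name__)

@app.route('/')
def hello():
    # GAE routes incoming HTTP requests to this handler and launches
    # additional instances automatically as the request load increases.
    return 'Hello from Google App Engine!'

if __name__ == '__main__':
    app.run(host='127.0.0.1', port=8080)  # local testing before deployment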

SaaS – Salesforce Cloud Services:


Salesforce Sales Cloud (TM): is a cloud-based customer relationship management (CRM) Software-as-
a-Service (SaaS) offering. Users can access CRM application from anywhere through internet-enabled
devices such as workstations, laptops, tablets and smartphones. Sales Cloud allows sales representatives
to manage customer profiles, track opportunities, optimize campaigns from lead to close and monitor the
impact of campaigns.



Salesforce Service Cloud is a cloud-based customer service management SaaS. Service Cloud provides
companies a call-center like view and allows creating, tracking, routing and escalating cases. Service
Cloud can be fully integrated with a company’s call-center. Service Cloud also provides
self-service capabilities to customers.

Salesforce Marketing Cloud is a cloud-based social marketing SaaS offering. Marketing Cloud allows companies to
identify sales leads from social media, discover advocates, identify the most trending information on any
topic. Marketing cloud allows companies to pro-actively engage with customers, manage social listening,
create and deploy social content, manage and execute optimized social advertisement campaigns and
track the performance of social campaigns.

******
Cloud Concepts and Technologies:
Here we will learn about…
Virtualization
Load Balancing
Scalability & Elasticity
Deployment
Replication
Monitoring
MapReduce
Identity and Access Management
Service Level Agreements
Billing
Software Defined Networking
Network Function Virtualization
These are the key concepts and enabling technologies of cloud computing.

Virtualization:
Virtualization refers to the partitioning of the resources of a physical system (such as computing,
storage, network and memory) into multiple virtual resources. Virtualization is the key enabling
technology of cloud computing and allows pooling of resources. In cloud computing, resources are
pooled to serve multiple users using multi-tenancy (multiple tenants sharing the same physical resources).
Cloud allows the multiple users to be served by the same physical hardware. Users are assigned
virtual resources that run on top of the physical resources. Below diagram shows the architecture of
virtualization technology in cloud computing.

The physical resources such as computing, storage, memory and network resources are virtualized.
The virtualization layer partitions the physical resources into multiple virtual machines. The
virtualization layer allows multiple operating system instances to run concurrently as virtual machines on
the same underlying physical resources.



Hypervisor: A hypervisor is virtualization software on which guest operating systems run. The virtualization
layer consists of a hypervisor or a virtual machine monitor (VMM). The hypervisor presents a virtual
operating platform to a guest operating system (OS).
There are two types of hypervisors as shown in the following figures.
Type 1 hypervisors execute on the bare system (directly on the hardware). LynxSecure, RTS Hypervisor, Oracle VM,
Sun xVM Server and VirtualLogic VLX are examples of Type 1 hypervisors. The following diagram shows
the Type 1 hypervisor. Type 1 hypervisors do not have any host operating system because they are
installed directly on the bare system.

Type 2 hypervisor is a software interface that emulates the devices with which a system normally
interacts. Containers, KVM (Kernel-based Virtual Machine), Microsoft Hyper-V, VMware Fusion, Virtual
Server 2005 R2, Windows Virtual PC and VMware Workstation 6.0 are examples of Type 2 hypervisors.
The following diagram shows the Type 2 hypervisor.

Type 2 hypervisors, or hosted hypervisors, run on top of a conventional (main/host) operating system and
monitor the guest operating systems.
Guest OS: A guest OS is an operating system that is installed in a virtual machine in addition to
the host or main OS. In virtualization, the guest OS can be different from the host OS.
Various forms of virtualization approaches exist:
➢ Full Virtualization
➢ Para Virtualization
➢ Hardware Virtualization
Full Virtualization:
In full virtualization, the virtualization layer completely decouples (separates) the guest OS from
the underlying hardware. The guest OS requires no modification and is not aware that it is being
virtualized. Full virtualization is enabled by direct execution of user requests and binary translation of
OS requests.



Para-Virtualization:
In para-virtualization, the guest OS is modified to enable communication with the hypervisor to
improve performance and efficiency. The guest OS kernel is modified to replace non-virtualizable
instructions with hypercalls that communicate directly with the hypervisor in the virtualization layer.

Hardware Virtualization:
Hardware assisted virtualization is enabled by hardware features such as Intel’s Virtualization
Technology (VT-x) and AMD’s AMD-V. In hardware assisted virtualization, privileged and sensitive
calls are set to automatically trap to the hypervisor. Thus, there is no need for either binary translation
or para-virtualization.
Load Balancing:
One of the important features of cloud computing is scalability. Cloud computing resources can
be scaled up on demand to meet the performance requirements of applications.
Load balancing distributes workloads across multiple servers to meet the application demand.
The goals of load balancing techniques are to achieve maximum utilization of resources, minimize
response times and maximize throughput.
Load balancing distributes the incoming user requests across multiple resources. With load
balancing, cloud-based applications can achieve high availability and reliability. Since multiple resources
under a load balancer are used to serve the user requests, in the event of failure of one or more of the
resources, the load balancer can automatically reroute the user traffic to the healthy resources.
The routing of user requests is determined based on a load balancing algorithm. Commonly used
load balancing algorithms are as follows:
Round Robin:
In round robin load balancing, the servers are selected one by one to serve the incoming requests
in a non-hierarchical circular fashion with no priority assigned to a specific server.
Weighted Round Robin:
In weighted round robin load balancing, servers are assigned some weights. The incoming
requests are proportionally routed using a static or dynamic ratio of respective weights.
Low Latency: (Low Delay)
In low latency load balancing the load balancer monitors the latency of each server. Each
incoming request is routed to the server which has the lowest latency.
Least Connections: In least connections load balancing, the incoming requests are routed to the server
with the least number of connections.
Priority:
In priority load balancing, each server is assigned a priority. The incoming traffic is routed to the
highest priority server as long as the server is available. When the highest priority server fails, the
incoming traffic is routed to a server with a lower priority.
Overflow:
Overflow load balancing is similar to priority load balancing. When the incoming requests to
highest priority server overflow, the requests are routed to a lower priority server.
Sticky Sessions:
In this approach all the requests belonging to a user session are routed to the same server. These
sessions are called sticky sessions. The benefit of this approach is that it makes session management
simple. The drawback is that if a server fails, all the sessions belonging to that server are lost.
A session is a way to store information (in variables) to be used across multiple pages.
Unlike a cookie, the information is not stored on the user’s computer.

Session Database:
In this approach, all the session information is stored externally in a separate session database,
which is often replicated (duplicated) to avoid a single point of failure. Though this
approach involves the additional overhead of storing the session information, unlike the sticky
session approach it allows automatic failover.
Browser Cookies:
In this approach, the session information is stored on the client side in the form of browser
cookies. The benefit of this approach is that it makes the session management easy and has the least
amount of overhead for the load balancer.
URL re-writing:
In this approach, a URL re-write engine stores the session information by modifying the URLs
on the client side.
Though this approach avoids overhead on the load balancer, a drawback is that the amount of
session information that can be stored is limited. For applications that require larger amounts of session
information, this approach does not work.
Load balancing can be implemented in software or hardware. Software-based load balancers run
on standard operating systems, and like other cloud resources, load balancers are also virtualized.
Hardware-based load balancers implement load balancing algorithms in Application Specific
Integrated Circuits (ASICs).
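The following Python sketch illustrates two of the policies described above, round robin and least connections. It is an illustrative toy, not a production load balancer; the server names and connection counts are made up.

# Illustrative sketch of two load balancing policies.
import itertools

servers = ['server-1', 'server-2', 'server-3']

# Round robin: cycle through the servers with no priority.
round_robin = itertools.cycle(servers)

def route_round_robin():
    return next(round_robin)

# Least connections: track active connections and pick the least loaded server.
active_connections = {s: 0 for s in servers}

def route_least_connections():
    server = min(active_connections, key=active_connections.get)
    active_connections[server] += 1   # the chosen server now carries one more session
    return server

for _ in range(4):
    print(route_round_robin(), route_least_connections())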

Scalability & Elasticity:


Multi-tier applications such as e-Commerce, social networking, business-to-business, etc. can
experience rapid changes in their traffic. Each website has a different traffic pattern which is determined
by a number of factors generally hard to predict beforehand.
Modern web applications have multiple tiers of deployment with number of servers in each tier.
Capacity planning is an important task for such applications. Capacity planning involves determining the
right sizing of each tier of the application deployment in terms of the number of resources and the capacity
of each resource.



Capacity planning may be for computing, storage, memory or network resources. Capacity
planning is based on the demands of applications and accounts for worst-case peak loads of applications. If
the workloads of applications increase, the traditional approaches have been either to scale up or to scale out.
Scaling up involves upgrading the hardware resources (adding additional computing, memory,
storage or network capacity to existing resources).
Scaling out involves the addition of more resources of the same type.
Scaling up and scaling out approaches are based on demand forecasts at regular intervals of time.
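As a toy illustration, an elastic scaling decision can be expressed as a simple rule over a monitored metric; the thresholds, helper function and server counts below are made-up values, not part of any real cloud API.

# Toy sketch of a scaling decision driven by monitored CPU utilization.
SCALE_OUT_THRESHOLD = 75.0   # percent CPU above which we add a server
SCALE_IN_THRESHOLD  = 25.0   # percent CPU below which we remove a server

def scaling_decision(avg_cpu_percent, current_servers, min_servers=2):
    if avg_cpu_percent > SCALE_OUT_THRESHOLD:
        return current_servers + 1      # scale out: add one more server
    if avg_cpu_percent < SCALE_IN_THRESHOLD and current_servers > min_servers:
        return current_servers - 1      # scale in: release one server
    return current_servers              # keep current capacity

print(scaling_decision(avg_cpu_percent=82.0, current_servers=4))  # -> 5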
Deployment: Software deployment refers to the process of making the application work on a target
device.

Diagram: Cloud application deployment lifecycle


The above diagram shows the cloud application deployment lifecycle. Deployment prototyping
can help in making deployment architecture design choices. By comparing performance of alternative
deployment architectures, deployment prototyping can help in choosing the best and most effective
deployment architecture that can meet the application performance requirements. The following table
shows popular cloud deployment management tools.

Examples of popular cloud deployment management tools.

Deployment Design:



In this step the application deployment is created with various tiers as specified in the deployment
configuration. The variables in this step include the number of servers in each tier, computing, memory
and storage capacities of servers, server interconnections, load balancing and replication strategies.
The deployment is created by provisioning the cloud resources as specified in the deployment
configuration. The process of resource provisioning and deployment creation is often automated and
involves a number of steps such as launching server instances, configuring the servers, and
deploying the various tiers of the application on the servers.
Performance Evaluation:
Once the application is deployed in the cloud, the next step in the deployment lifecycle is to verify
whether the application meets the performance requirements with the deployment. This step involves
monitoring the workload on the application and measuring various workload parameters such as response
time and throughput. In addition to this, the utilization of servers (CPU, memory, disk, I/O, etc.) in each
tier is also monitored.
Deployment Refinement:
After evaluating the performance of the application, deployments are refined so that the
application can meet the performance requirements. Various alternatives can be considered here, such as vertical scaling
(scaling up), horizontal scaling (scaling out), alternative server interconnections, and alternative load
balancing and replication strategies.
Replication:
Replication is used to create and maintain multiple copies of the data in the cloud. Replication of
data is important for practical reasons such as business continuity and disaster recovery. In the event of
data loss at the primary location, organizations can continue to operate their applications from secondary
data sources. With real-time replication of data, organizations can achieve faster recovery from failures.
❖ Traditional business continuity and disaster recovery approaches don’t provide efficient, cost
effective and automated recovery of data.
❖ Cloud based data replication approaches provide replication of data in multiple locations,
automated recovery, low recovery point objective (RPO) and low recovery time objective (RTO).
Cloud enables rapid implementation of replication solutions for disaster recovery for small
and medium enterprises and large organizations. With cloud-based data replication organizations can
plan for disaster recovery without making any capital expenditures on purchasing, configuring and
managing secondary site locations. Cloud provides affordable replication solutions with pay-per-use /
pay-as-you-go pricing models.
There are three types of replication approaches as shown in the following diagram.
1. Array-based Replication
2. Network-based Replication
3. Host-based Replication.
Array-Based Replication:
Array-based replication is an approach to data backup in which compatible storage arrays use
built-in software to automatically copy data from one storage array to another.
Array-based replication software runs on one or more storage controllers resident in disk storage
systems, synchronously or asynchronously replicating data between similar storage array models at the
logical unit number (LUN) or volume block level. The term can refer to the creation of local copies of
data within the same array as the source data, as well as the creation of remote copies in an array situated
off site.
Array based replication uses Network Attached Storage (NAS) or Storage Area Network (SAN)
to replicate.
A drawback of array-based replication is that it requires similar arrays at the local and remote
locations.
The cost of setting up array-based replication is higher than that of the other approaches.
Network-based Replication:
In network-based replication, the replication occurs at the network layer between the servers
and the storage systems. By offloading replication from the server and storage systems, network-based
replication can work across a large number of server platforms and storage systems, making it ideal for
highly heterogeneous environments. One of the most widely used network-based replication techniques
is Continuous Data Protection (CDP).



Continuous data protection (CDP) is a network-based replication solution that provides the
capability to restore data and VMs to any previous PIT (Point In Time). Traditional data protection
technologies offer a limited number of recovery points. If a data loss occurs, the system can be rolled
back only to the last available recovery point. CDP, in contrast, tracks all the changes to the production
volumes and maintains consistent point-in-time images. This enables CDP to restore data to any
previous PIT.

Diagram of Replication approaches: (a) Array-based replication. (b) Network-based replication


(c) Host-based replication

Host-Based Replication:
Host-based replication runs on standard servers and uses software to transfer data from a local
to a remote location; that is, it uses the servers themselves to copy data from one site to another. Host-based replication
is conducted by software that resides on application servers and forwards data changes to another device.
The process is usually file-based and asynchronous: the software traps write input/output (I/O) and then
forwards the changes to the replication targets.
The host acts as the replication control mechanism. An agent installed on each host
communicates with the agents on the other hosts.



To enable efficient and secure data copying, host-based replication software products include
capabilities such as deduplication, compression, encryption, and throttling. Host-based replication can also
provide server and application failover capability to aid in disaster recovery.
Host-based replication with cloud-infrastructure provides affordable replication solutions. With
host-based replication, entire virtual machines can be replicated in real-time.

Monitoring:
Cloud resources can be monitored by monitoring services provided by the cloud service providers.
Monitoring services allow cloud users to collect and analyze the data on various monitoring metrics. The
following diagram shows a generic architecture for cloud monitoring service.

This monitoring service collects data on various system and application metrics from the cloud
computing instances. Monitoring services provide various pre-defined metrics. Users can also define
their custom metrics for monitoring the cloud resources. Users can define various actions based on the
monitoring data.
For example: auto-scaling a cloud deployment when the CPU usage of monitored resources
becomes high. Monitoring services also provide various statistics based on the monitoring data collected.
The following table shows the commonly used monitoring metrics for cloud computing resources.

Monitoring of cloud resources is important because it allows the users to keep track of the health
of applications and services deployed in the cloud.
Example: An organization which has its website hosted in the cloud can monitor the performance
of the website and also the website traffic.
With the monitoring data available at run-time users can make operational decisions such as
scaling up or scaling down cloud resources.
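For example, with AWS the CloudWatch monitoring service can be queried from Python using boto3, as in the sketch below (the instance ID is a placeholder and AWS credentials are assumed to be configured).

# Sketch: reading a CPU utilization metric from AWS CloudWatch with boto3.
import boto3
from datetime import datetime, timedelta

cloudwatch = boto3.client('cloudwatch', region_name='us-east-1')

stats = cloudwatch.get_metric_statistics(
    Namespace='AWS/EC2',
    MetricName='CPUUtilization',
    Dimensions=[{'Name': 'InstanceId', 'Value': 'i-0123456789abcdef0'}],  # placeholder
    StartTime=datetime.utcnow() - timedelta(hours=1),
    EndTime=datetime.utcnow(),
    Period=300,                 # one data point every 5 minutes
    Statistics=['Average']
)

for point in stats['Datapoints']:
    print(point['Timestamp'], point['Average'])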

Software Defined Networking:



Software-Defined Networking (SDN) is a networking architecture that gives networks more
programmability and flexibility by separating the control plane from the data plane. The role of software
defined networks in cloud computing lets users respond quickly to changes. SDN management makes
network configuration more efficient and improves network performance and monitoring.
Conventional network architectures are built with specialized hardware (switches, routers, etc.).
Network devices in conventional network architectures are getting exceedingly complex with the
increasing number of distributed protocols being implemented and the use of proprietary hardware and
interfaces. In this architecture, the control plane and data plane are coupled. The control plane is the part of the
network that carries the signaling and routing message traffic, while the data plane is the part of the
network that carries the payload data traffic.
The limitations of the conventional network architectures are as follows:
Complex Network Devices: Conventional networks are getting increasingly complex with more
and more protocols being implemented to improve link speeds and reliability. Due to the complexity of
devices, making changes in the networks to meet the dynamic traffic patterns has become increasingly
difficult.
Management Overhead: Conventional networks involve significant management overhead.
Network managers find it increasingly difficult to manage multiple network devices and interfaces from
multiple vendors.
Limited Scalability: The virtualization technologies used in cloud computing environments have
increased the number of virtual hosts requiring network access, which is becoming increasingly difficult
to provide with conventional networks.
SDN attempts to create network architectures that are simpler, inexpensive, scalable, agile and
easy to manage.

SDN Architecture



SDN Layers
The above diagrams show the SDN architecture and the SDN Layers in which the control and
data planes are decoupled and the network controller is centralized.
Software-based SDN controllers maintain a unified view of the network and make configuration,
management and provisioning simpler. The underlying infrastructure in SDN uses simple packet
forwarding hardware. The underlying network infrastructure is abstracted from the applications. Network
devices become simple with SDN as they do not require implementations of a large number of protocols.
Network devices receive instructions from the SDN controller on how to forward the packets. These
devices can be simpler and cost less as they can be built from standard hardware and software
components.
The key elements of SDN are as follows:
➢ Centralized Network Controller: With the control and data planes decoupled and the network
controller centralized, network administrators can rapidly configure the network.
➢ Programmable Open APIs: The SDN architecture supports programmable open APIs for the interface
between the SDN application and control layers. These open APIs allow implementing
various network services such as routing, quality of service (QoS), access control, etc.
➢ Standard Communication Interface (OpenFlow): SDN architecture uses a standard
communication interface between the control and infrastructure layers. OpenFlow, which is
defined by the Open Network Foundation (ONF) is the broadly accepted SDN protocol for the
Southbound interface.
Network Function Virtualization:
Network Function Virtualization (NFV) is a technology that leverages virtualization to
consolidate the heterogeneous network devices onto industry standard high-volume servers, switches and
storage.
NFV is complementary to SDN as NFV can provide the infrastructure on which SDN can run.
NFV and SDN are mutually beneficial to each other but not dependent.

NFV architecture



Network functions can be virtualized without SDN; similarly, SDN can run without NFV. The
above NFV architecture was standardized by the European Telecommunications Standards Institute
(ETSI). The Key elements of the NFV architecture are as follows;
➢ Virtualized Network Function (VNF): VNF is a software implementation of a network
function that is capable of running over the NFV Infrastructure (NFVI).
➢ NFV Infrastructure (NFVI): NFVI includes compute, network and storage resources that are
virtualized.
➢ NFV Management and Orchestration: NFV Management and Orchestration focuses on all
virtualization-specific management tasks and covers the orchestration and lifecycle management
of physical and/or software resources that support the infrastructure virtualization, and the
lifecycle management of VNFs.
NFV is applicable only to data plane and control plane functions in fixed and mobile networks.
MapReduce:
MapReduce is a parallel data processing model for processing and analysis of massive scale data.
MapReduce model has two phases: 1) Map and 2) Reduce.
MapReduce programs are written in a functional programming style to create Map and Reduce
functions. The input data to the map and reduce phases is in the form of key-value pairs. Run-time
systems for MapReduce are typically large clusters built of commodity (inexpensive) hardware.
The MapReduce run-time systems take care of tasks such as partitioning the data, scheduling jobs
and handling communication between nodes in the cluster. This makes it easier for programmers to analyze
massive scale data without worrying about tasks such as data partitioning and scheduling.

MapReduce workflow Diagram


The above diagram shows the flow of data for a MapReduce job. MapReduce programs
take a set of input key-value pairs and produce a set of output key-value pairs. MapReduce programs take
advantage of locality of data, and the data processing takes place on the nodes where the data resides.
In traditional approaches to data analysis, data is moved to the compute nodes, which results in
significant data transmission between the nodes in a cluster.
The MapReduce programming model moves the computation to where the data resides, which
decreases data transmission and improves efficiency.
The MapReduce programming model is well suited for parallel processing of massive scale data in
which the data analysis tasks can be accomplished by independent map and reduce operations.
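The classic word-count example illustrates the model: the map function emits (word, 1) pairs and the reduce function sums the counts for each word. The plain-Python sketch below mimics this flow; a real job would run on a framework such as Hadoop MapReduce.

# Plain-Python sketch of the MapReduce word-count example: map emits
# (key, value) pairs and reduce aggregates all values for each key.
from collections import defaultdict

def map_function(line):
    # Map phase: emit (word, 1) for every word in the input line.
    for word in line.split():
        yield (word.lower(), 1)

def reduce_function(word, counts):
    # Reduce phase: sum the counts emitted for each word.
    return (word, sum(counts))

lines = ["the cloud stores data", "the cloud processes data"]

# Shuffle step: group intermediate values by key before reducing.
grouped = defaultdict(list)
for line in lines:
    for word, count in map_function(line):
        grouped[word].append(count)

results = [reduce_function(word, counts) for word, counts in grouped.items()]
print(results)   # e.g. [('the', 2), ('cloud', 2), ('stores', 1), ...]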

Identity and Access Management:


Identity and Access Management (IDAM) for cloud describes the authentication and
authorization of users to provide secure access to cloud resources.
Organizations with multiple users can use IDAM services provided by the cloud service provider
for management of user identifiers and user permissions.
IDAM services allow organizations to centrally manage users, access permissions, security
credentials and access keys. IDAM services allow creation of user groups where all the users in a group
have the same access permissions. Identity and Access Management is enabled by a number of
technologies such as OAuth, Role-Based Access Control (RBAC), digital identities, security
tokens, identity providers, etc.

OAuth Example
OAuth is an open standard for authorization. It allows resource owners to share their private
resources stored on one site with another site without handing out their credentials. In the OAuth model,
an application (which is not the resource owner) requests access to resources controlled by the resource
owner (but hosted by the server). The resource owner grants permission to access the resources in the
form of a token and a matching shared secret.
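As a hedged illustration, exchanging an authorization grant for an access token typically amounts to a simple HTTPS request; in the Python sketch below the token endpoint, client credentials and authorization code are placeholders, not a real provider's values.

# Illustrative sketch of exchanging an OAuth authorization code for an
# access token; all URLs and credentials below are placeholders.
import requests

response = requests.post(
    'https://auth.example.com/oauth/token',      # placeholder token endpoint
    data={
        'grant_type': 'authorization_code',
        'code': 'AUTH_CODE_FROM_RESOURCE_OWNER', # placeholder grant
        'client_id': 'my-client-id',
        'client_secret': 'my-client-secret',
        'redirect_uri': 'https://app.example.com/callback',
    },
)
token = response.json().get('access_token')

# The token (not the owner's credentials) is then used to call the API.
api_response = requests.get(
    'https://api.example.com/photos',
    headers={'Authorization': f'Bearer {token}'},
)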



RBAC is an approach for restricting access to authorized users. The following diagram shows an example
of a typical RBAC framework. A user who wants to access cloud resources is required
to send his/her data to the system administrator, who assigns permissions and access control policies;
these are stored in the User Roles and Data Access Policies databases respectively.
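A minimal Python sketch of such a role-based check is shown below; the roles and access policies are illustrative only, not drawn from any particular cloud provider.

# Minimal sketch of a role-based access control (RBAC) check.
user_roles = {
    'alice': 'admin',
    'bob': 'developer',
}

# Data access policies: operations each role may perform on a resource type.
role_policies = {
    'admin': {'vm': {'launch', 'terminate'}, 'storage': {'read', 'write'}},
    'developer': {'vm': {'launch'}, 'storage': {'read'}},
}

def is_authorized(user, resource, operation):
    role = user_roles.get(user)
    allowed = role_policies.get(role, {}).get(resource, set())
    return operation in allowed

print(is_authorized('bob', 'vm', 'terminate'))    # False
print(is_authorized('alice', 'storage', 'write')) # True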
Service Level Agreements:
A service level agreement (SLA) for cloud specifies the level of service that is formally defined
as a part of the service contract with the cloud service provider.
SLAs specify, for each service, a minimum level of service guaranteed and a target level. SLAs contain
a number of performance metrics and the corresponding service level objectives. The table below lists
the common criteria in cloud SLAs.

Billing:
Cloud service providers offer a number of billing models, described as follows.
Elastic Pricing: In the elastic pricing or pay-as-you-use pricing model, customers are charged based on
their usage of cloud resources. Cloud computing provides the benefit of provisioning resources on-demand.
These on-demand provisioning and elastic pricing models bring cost savings for customers. The elastic
pricing model is suited for customers who consume cloud resources for short durations.
Fixed Pricing: In fixed pricing models, customers are charged a fixed amount per month for the cloud
resources. For example, a fixed amount can be charged per month for running a virtual machine instance,
irrespective of the actual usage. The fixed pricing model is suited for customers who want to use cloud
resources for longer durations.
Spot Pricing: Spot pricing models offer variable pricing for cloud resources which is driven by market
demand. When the demand for cloud resources is high, the prices increase and when the demand is lower,
the prices decrease.
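A small worked comparison of the elastic and fixed models is shown below; the hourly and monthly rates are made-up numbers used only for illustration.

# Worked comparison of elastic (pay-as-you-use) vs fixed pricing.
hourly_rate = 0.10        # elastic: charge per instance-hour used (made-up rate)
monthly_rate = 50.00      # fixed: flat charge per month, regardless of usage

def elastic_cost(hours_used):
    return hours_used * hourly_rate

# A customer running an instance 8 hours a day for 20 days:
short_usage = elastic_cost(8 * 20)   # 160 hours -> $16.00
# A customer running an instance 24 hours a day for 30 days:
long_usage = elastic_cost(24 * 30)   # 720 hours -> $72.00

print(short_usage, monthly_rate)  # elastic is cheaper for short durations
print(long_usage, monthly_rate)   # fixed is cheaper for long durations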
*****
Cloud Services and Platforms:
Here we will learn about various types of cloud computing services including:
1) Compute services,
2) storage,
3) database,
4) application,
5) content delivery,
6) analytics,
7) deployment
8) management and
9) identity & access management.
For each category of cloud services, examples of services provided by various cloud service
providers including Amazon, Google and Microsoft are described.



Compute Services:
Compute services provide dynamically scalable compute capacity in the cloud. Compute
resources can be provisioned on-demand in the form of virtual machines. Virtual machines can be created
from standard images provided by the cloud service provider (e.g., an Ubuntu image, a Windows Server
image, etc.) or from custom images created by the users.
A machine image is a template that contains a software configuration (operating system,
application server, and applications). Compute services can be accessed from the web consoles of these
services, which provide graphical user interfaces for provisioning, managing, and monitoring them.
Cloud service providers also provide APIs for various programming languages (such as Java, Python,
etc.) so that the services can be accessed and managed programmatically.
Features:
➢ Scalable: Compute services allow rapidly provisioning as many virtual machine instances as
required. The provisioned capacity can be scaled-up or down based on the workload levels.
Auto scaling policies can be defined for compute services that are triggered when the
monitored metrics (CPU usage, memory usage, etc.) go above pre-defined thresholds.
➢ Flexible: Compute services give a wide range of options for virtual machines with multiple
instance types, operating systems, zones/regions, etc.
➢ Secure: Compute services provide various security features that control the access to the
virtual machine instances such as security groups, access control lists, network fire-walls, etc.
Users can securely connect to the instances with SSH using authentication mechanisms such as
OAuth or security certificates and keypairs.
➢ Cost effective: Cloud service providers offer various billing options such as on-demand
instances which are billed per-hour, reserved instances which are reserved after one-time initial
payment, spot instances for which users can place bids, etc.
Amazon Elastic Compute Cloud:
Amazon Elastic Compute Cloud (EC2) is an Infrastructure as a service (IaaS) offering from
Amazon.com. EC2 is a web service that provides computing capacity in the form of virtual machines that
are launched in Amazon’s cloud computing environment.
Amazon EC2 allows users to launch instances on demand using a simple web-based interface.
Amazon provides pre-configured Amazon Machine Images (AMIs), which are templates of cloud
instances.
Instances can be launched with a variety of operating systems. Users can load their applications
on running instances and rapidly and easily increase or decrease capacity to meet the dynamic application
performance requirements with EC2.
With EC2, users can even provision hundreds or thousands of server instances simultaneously,
manage network access permissions, and monitor usage of resources through a web interface. Amazon EC2
provides instances of various computing capacities ranging from small instances to extra-large instances.
Example: from 1 virtual core with 1 EC2 compute unit, 1.7 GB memory and 160 GB instance storage, up to 4 virtual
cores with 2 EC2 compute units each, 15 GB memory and 1690 GB instance storage.



Google Compute Engine (GCE): is an IaaS offering from Google. GCE provides virtual machines of
various computing capacities ranging from small instances (e.g., 1 virtual core with 1.38 GCE unit and
1.7 GB memory) to high memory machine types (e.g., 8 virtual cores with 22 GCE units and 52GB
memory).
Windows Azure Virtual Machines: is an IaaS offering from Microsoft. Azure VMs provides virtual
machines of various computing capacities ranging from small instances (1 virtual core with 1.75GB
memory) to memory intensive machine types (8 virtual cores with 56GB memory).
Storage Services:
Cloud storage services allow storage and retrieval of any amount of data, at any time from
anywhere on the web. Most cloud storage services organize data into buckets or containers. Buckets or
containers store objects which are individual pieces of data.
Features:
➢ Scalability: Cloud storage services provide high capacity and scalability. Objects up to several
tera-bytes in size can be uploaded and multiple buckets/containers can be created on cloud
storages.
➢ Replication: When an object is uploaded it is replicated at multiple facilities and/or on multiple
devices within each facility.
➢ Access Policies: Cloud storage services provide several security features such as Access Control
Lists (ACLs), bucket/container level policies, etc. ACLs can be used to selectively grant access
permissions on individual objects.
➢ Encryption: Cloud storage services provide Server-Side Encryption (SSE) options to encrypt all
data stored in the cloud storage.
➢ Consistency: Strong data consistency is provided for all upload and delete operations. Therefore,
any object that is uploaded can be immediately downloaded after the upload is complete.
Example-1: Amazon Simple Storage Service (S3) is an online cloud-based data storage infrastructure
for storing and retrieving any amount of data. S3 provide highly reliable, scalable, fast, fully
redundant and affordable storage infrastructure.
Example-2: Google Cloud Storage.
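As a hedged illustration of the bucket/object model described above, the sketch below uses boto3 to store and retrieve an object in Amazon S3; the bucket name and file names are placeholders.

# Hedged sketch: create a bucket, upload an object, and download it back.
import boto3

s3 = boto3.client("s3")
s3.create_bucket(Bucket="my-example-bucket")   # bucket names must be globally unique
s3.upload_file("report.pdf", "my-example-bucket", "docs/report.pdf")
s3.download_file("my-example-bucket", "docs/report.pdf", "report_copy.pdf")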
Database Services:
Cloud database services allow you to set-up and operate relational or non-relational databases in
the cloud. The benefit of using cloud database services is that it relieves the application developers from
the time-consuming database administration tasks.
Popular relational databases provided by various cloud service providers include MySQL, Oracle,
SQL Server, etc.
The non-relational (No-SQL) databases provided by cloud service providers are mostly
proprietary solutions. No-SQL databases are usually fully-managed and deliver seamless throughput and
scalability.
Features:
➢ Scalability: Cloud database services allow provisioning as much compute and storage resources
as required to meet the application workload levels. Provisioned capacity can be scaled-up or
down.
➢ Reliability: Cloud database services are reliable and provide automated backup and snapshot
options.
➢ Performance: Cloud database services provide guaranteed performance with options such as
guaranteed input/output operations per second (IOPS) which can be provisioned upfront.
➢ Security: Cloud database services provide several security features to restrict the access to the
database instances and stored data, such as network firewalls and authentication mechanisms.
Examples:
Ex.1. Amazon Relational Database Service (RDS): is a web service that makes it easy to set up, operate and scale
a relational database in the cloud. The console provides an instance launch wizard that allows you to
select the type of database to create (MySQL, Oracle or SQL Server), the database instance size, allocated
storage, DB instance identifier, DB username and password.
Ex.2.Amazon DynamoDB is the non-relational (No-SQL) database from Amazon.
Ex.3.Google Cloud SQL is the relational database service from Google.
Ex.4.Windows Azure SQL Database is the relational database service from Microsoft.
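For illustration, the following hedged boto3 sketch works with Amazon DynamoDB (Ex.2); the table name and attributes are placeholders, and the table is assumed to already exist with student_id as its primary key.

# Hedged sketch: write and read an item in a DynamoDB (No-SQL) table.
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("Students")   # placeholder table name
table.put_item(Item={"student_id": "S101", "name": "Asha", "semester": 2})
response = table.get_item(Key={"student_id": "S101"})
print(response.get("Item"))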
Application Services:
Here we will learn about various cloud application services such as application runtimes and
frameworks, queuing services, email services, notification services and media services.
Application Runtime & Frameworks:
Cloud-based application runtimes and frameworks allow developers to develop and host
applications in the cloud. Application runtimes provide support for programming languages (e.g., Java,
Python, or Ruby). Application runtimes automatically allocate resources for applications and handle the
application scaling, without the need to run and maintain servers.
Example: Google App Engine is the platform-as-a-service (PaaS) from Google, which
includes both an application runtime and web frameworks.
GAE features include: Runtimes, Sandbox (isolated from other applications), Web
Frameworks, Datastore, Authentication, URL Fetch service, Email service, Image Manipulation, Task
Queues, Scheduled Tasks services.
Queuing Services:
Cloud-based queuing services allow de-coupling application components. The de-coupled
components communicate via messaging queues. Queues are useful for asynchronous processing.
Another use of queues is to act as overflow buffers to handle temporary volume spikes or mismatches in
message generation and consumption rates from application components.
Queuing services from various cloud service providers allow short messages of a few kilo-bytes
in size. Messages can be enqueued and read from the queues simultaneously.
Example:
Ex1: Amazon Simple Queue Service: is a queuing service from amazon. SQS is a distributed
queue that supports messages of up to 256 KB in size. SQS supports multiple writers and readers and
locks messages while they are being processed.
Ex2: Google Task Queue Service: is a queuing service from Google and is a part of the Google
App Engine platform. Task queues allow applications to execute tasks in background. Task is a unit of
work to be performed by an application.
There are two different configurations for Task Queues – Push Queue and Pull Queue.
Push Queue is the default queue that processes tasks based on the processing rate configured in
the queue definition.
Pull Queues allow task consumers to lease a specific number of tasks for a specific duration.
The tasks are processed and deleted before the lease ends.
Ex3: Windows Azure Queue Service: is a queuing service from Microsoft. Azure Queue service
allows storing large numbers of messages that can be accessed from anywhere in the world via
authenticated calls using HTTP or HTTPS. The size of the single message can be up to 64KB.
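As a hedged illustration of the queuing pattern, the sketch below uses boto3 with Amazon SQS (Ex1); the queue name and message body are placeholders.

# Hedged sketch: one component enqueues a message, another reads and deletes it.
import boto3

sqs = boto3.client("sqs")
queue_url = sqs.create_queue(QueueName="image-jobs")["QueueUrl"]

# Producer: enqueue a short message (SQS messages can be up to 256 KB).
sqs.send_message(QueueUrl=queue_url, MessageBody="resize image_42.png")

# Consumer: read the message and delete it once processed.
msgs = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=1)
for m in msgs.get("Messages", []):
    print("Processing:", m["Body"])
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=m["ReceiptHandle"])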
Email Services:
Cloud-based email services allow applications hosted in the cloud to send emails.
Ex.1: Amazon Simple Email Service: is bulk and transactional email-sending service from Amazon. SES
is an outbound-only email-sending service that allows applications hosted in the Amazon cloud to send
emails such as marketing emails, transactional emails and other types of correspondence.
Ex.2: Google Email Service: is part of the Google App Engine platform that allows App Engine
applications to send email messages on behalf of the app’s administrators, and on behalf of users with
Google Accounts. App Engine apps can also receive emails. Apps send messages using the Mail service
and receive messages in the form of HTTP requests initiated by App Engine and posted to the app.
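For illustration, a minimal boto3 sketch of sending a transactional email with Amazon SES is shown below; the sender and recipient addresses are placeholders and, in SES, must be verified before they can be used.

# Hedged sketch: send a simple text email through Amazon SES.
import boto3

ses = boto3.client("ses")
ses.send_email(
    Source="noreply@example.com",                          # placeholder, must be SES-verified
    Destination={"ToAddresses": ["student@example.com"]},  # placeholder recipient
    Message={
        "Subject": {"Data": "Welcome"},
        "Body": {"Text": {"Data": "Your cloud account has been created."}},
    },
)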
Notification Services:
Cloud-based notification services or push messaging services allow applications to push messages
to internet connected smart devices such as smartphones, tablets, etc. Push messaging services are based
on publish-subscribe model. Example: if consumers subscribe to various topics/channels provided by a
publisher/ producer. Whenever new content is available on one of those topics/ channels, the notification
service pushes that information out to the consumer.
Amazon Simple Notification Service: is a push messaging service from Amazon. SNS has two
types of clients – publishers and subscribers. Publishers communicate by sending messages on topics.
Subscribers are the consumers who subscribe to those topics to receive notifications.
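As a hedged illustration of the publish-subscribe model, the boto3 sketch below creates an SNS topic, subscribes an endpoint to it and publishes a message; the topic name and email address are placeholders.

# Hedged sketch: publisher and subscriber sides of Amazon SNS.
import boto3

sns = boto3.client("sns")
topic_arn = sns.create_topic(Name="order-updates")["TopicArn"]   # placeholder topic

# Subscriber side: subscribe an email endpoint to the topic.
sns.subscribe(TopicArn=topic_arn, Protocol="email", Endpoint="subscriber@example.com")

# Publisher side: every subscriber of the topic receives this notification.
sns.publish(TopicArn=topic_arn, Message="Order #42 has shipped")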
Google Cloud Messaging:
Google Cloud Messaging for Android provides push messaging for Android devices. GCM
allows applications to send data from the application servers to their users Android devices, and also to
receive messages from devices on the same connection.
Media Services:
Cloud service providers provide various types of media services that can be used by applications
for manipulating, transforming or transcoding media such as images, videos, etc.
Examples: (platforms)
Amazon Elastic Transcoder:
Amazon Elastic Transcoder is a cloud-based video transcoding service from Amazon. It can be
used to convert video files from their source format into various other formats. Elastic Transcoder
provides a number of pre-defined transcoding presets.
Google Images Manipulation Service:
Google Images Manipulation service is a part of the Google App Engine platform. Image
Manipulation service provides the capability to resize, crop, rotate, flip and enhance images. Image
Service accepts images in various formats including JPEG, PNG, WEBP, GIF, BMP, TIFF and ICO
formats and can return transformed images in JPEG, WEBP and PNG formats.
Windows Azure Media Services:
It provides the various media services such as encoding & format conversion, content protection
and on-demand and live streaming capabilities. Azure Media Services provides applications the
capability to build media workflows for uploading, storing, encoding, format conversion, content
protection, and media delivery.
Content Delivery Services: (CDN)
Cloud-based content delivery service include Content Delivery Networks (CDNs). A CDN is a
distributed system of servers located across multiple geographic locations to serve content to end-users
with high availability and high performance. CDNs are useful for serving static content like text,
images, scripts, etc.
Other examples (platforms)
Amazon CloudFront
Windows Azure Content Delivery Network
Analytics Services:
Cloud-based analytics services allow analyzing massive data sets stored in the cloud either in
cloud storage or in cloud databases using programming models such as MapReduce. Using cloud
analytics services applications can perform data-intensive tasks such as data mining, log file analysis,
machine learning, web indexing, etc.
Amazon Elastic MapReduce
is the MapReduce service from Amazon, based on the Hadoop framework running on Amazon EC2
and Amazon S3. EMR supports various job types:
• Custom JAR: Custom JAR job flow runs a Java program that you have uploaded to
Amazon S3.
• Hive program: Hive is a data warehouse system for Hadoop. You can use Hive to process
data using the SQL-like language, called Hive-QL.
• Streaming job: Streaming job flow runs a single Hadoop job consisting of map and reduce
functions implemented in a script or binary that you have uploaded to Amazon S3.
• Pig programs: Apache Pig is a platform for analyzing large data sets that consists of a
high-level language (Pig Latin) for expressing data analysis programs.
Google MapReduce Service:
Google MapReduce Service is a part of the App Engine platform. App Engine MapReduce is
optimized for the App Engine environment and provides capabilities such as automatic sharding for faster
execution, standard data input readers for iterating over blob and datastore data, standard output writers,
etc.
Google BigQuery:
Google BigQuery is a service for querying massive datasets. BigQuery allows querying datasets
using SQL-like queries. The BigQuery queries are run against append-only tables and use the processing
power of Google’s infrastructure for speeding up queries.
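For illustration, a hedged sketch of running a SQL-like query from Python using the google-cloud-bigquery client library is given below; the project, dataset and table names are placeholders.

# Hedged sketch: run an aggregate query on a BigQuery table and print the rows.
from google.cloud import bigquery

client = bigquery.Client()
query = """
    SELECT name, COUNT(*) AS total
    FROM `my_project.my_dataset.visits`
    GROUP BY name
    ORDER BY total DESC
    LIMIT 10
"""
for row in client.query(query).result():   # result() waits for the query job to finish
    print(row["name"], row["total"])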
Deployment & Management Services:
Cloud-based deployment & management services allow you to easily deploy and manage
applications in the cloud. These services automatically handle deployment tasks such as capacity
provisioning, load balancing, auto-scaling, and application health monitoring.
Example platforms:
Amazon Elastic Beanstalk:
Amazon provides a deployment service called Elastic Beanstalk. It allows you to quickly deploy
and manage applications in the AWS cloud. Elastic Beanstalk supports Java, PHP, .NET, Node.js,
Python, and Ruby applications.
With Elastic Beanstalk you just need to upload the application and specify configuration settings
in a simple wizard and the service automatically handles instance provisioning, server configuration, load
balancing and monitoring.

Identity & Access Management Services:


These IDAM services allow managing the authentication and authorization of users to provide
secure access to cloud resources. IDAM services are useful for organizations which have multiple users
who access the cloud resources. Using IDAM services you can manage user identifiers, user permissions,
security credentials and access keys.
Examples:
Amazon Identity & Access Management:
AWS Identity and Access Management (IAM) allows you to manage users and user permissions
for an AWS account. With IAM you can manage users, security credentials such as access keys, and
permissions that control which AWS resources users can access. IAM also allows you to control the creation,
rotation, and revocation of users' security credentials.
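For illustration, a hedged boto3 sketch of creating a user, generating security credentials and attaching a permission policy with AWS IAM is given below; the user name is a placeholder, while AmazonS3ReadOnlyAccess is an AWS managed policy.

# Hedged sketch: manage an IAM user, its access keys and its permissions.
import boto3

iam = boto3.client("iam")
iam.create_user(UserName="analytics-app")                           # placeholder user
key = iam.create_access_key(UserName="analytics-app")["AccessKey"]  # security credentials
print("AccessKeyId:", key["AccessKeyId"])

# Grant the user read-only access to S3 by attaching a managed policy.
iam.attach_user_policy(
    UserName="analytics-app",
    PolicyArn="arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess",
)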
Windows Azure Active Directory:
Is an Identity & Access Management Service from Microsoft. Azure Active Directory provides a
cloud-based identity provider that easily integrates with your on-premises active directory deployments
and also provides support for third party identity providers.

*********
Open-Source Private Cloud Software:
Open-source cloud software can be used to build private clouds. Here we will learn about
three popular open-source cloud platforms: 1) CloudStack, 2) Eucalyptus and 3) OpenStack.
CloudStack: Apache CloudStack is an open-source cloud software that can be used for creating private
cloud offerings. CloudStack manages the network, storage, and compute nodes to provide cloud
infrastructure. A CloudStack installation consists of a Management Server and the cloud infrastructure
that it manages.
The cloud infrastructure can be as simple as one host running the hypervisor or a large cluster of
hundreds of hosts. The management server allows you to configure and manage the cloud resources.
The management server manages one or more zones where each zone is typically a single
datacenter. Each zone has one or more pods. A pod is a rack of hardware comprising a switch and one
or more clusters. A cluster consists of one or more hosts and a primary storage. A host is a computing
node that runs guest virtual machines. The primary storage of a cluster stores the disk volumes for all the
virtual machines running on the hosts in that cluster. Each zone has a secondary storage that stores
templates, ISO images, and disk volume snapshots.



Eucalyptus:
Eucalyptus is an open-source private cloud software for building private and hybrid clouds that
are compatible with Amazon Web Services (AWS) APIs. The following diagram shows the architecture
of Eucalyptus.

The Node Controller (NC) hosts the virtual machine instances and manages the virtual network
endpoints. The cluster-level (availability-zone) consists of three components –
1. Cluster Controller (CC),
2. Storage Controller (SC) and
3. VMWare Broker.
• The CC manages the virtual machines and is the front-end for a cluster.
• The SC manages the Eucalyptus block volumes and snapshots to the instances within its
specific cluster. SC is equivalent to AWS Elastic Block Store (EBS).
• The VMWare Broker is an optional component that provides an AWS-compatible
interface for VMware environments.
At the cloud-level there are two components: 1. Cloud Controller (CLC), 2. Walrus.
CLC provides an administrative interface for cloud management and performs high-level
resource scheduling, system accounting, authentication and quota management.
Walrus is equivalent to Amazon S3 and serves as a persistent storage to all of the virtual machines
in the Eucalyptus cloud.
OpenStack:
OpenStack is a cloud operating system consisting of a collection of interacting services that control
computing, storage, and networking resources. The following diagram shows the architecture of
OpenStack.



The OpenStack services:
The compute service (called nova-compute) manages networks of virtual machines running on nodes,
providing virtual servers on demand.
The network service (called nova-networking) provides connectivity between the interfaces of other
OpenStack services.
The volume service (Cinder) manages storage volumes for virtual machines.
The object storage service (swift) allows users to store and retrieve files.
The identity service (keystone) provides authentication and authorization for other services.
The image registry (glance) acts as a catalog and repository for virtual machine images.
The scheduler (nova-scheduler) maps the nova-API calls to the appropriate OpenStack components. The
scheduler takes the virtual machine requests from the queue and determines where they should run.
The messaging service (rabbit-mq) acts as a central node for message passing between daemons.
Orchestration activities, such as running an instance, are performed by the nova-api service, which
accepts and responds to the end user compute API calls.
The dashboard (called horizon) provides web-based interface for managing OpenStack services.
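For illustration, a hedged sketch of accessing these OpenStack services from Python with the openstacksdk library is shown below; it assumes a clouds.yaml configuration containing a cloud entry named my-private-cloud, which is a placeholder.

# Hedged sketch: list Nova servers and Glance images on a private OpenStack cloud.
import openstack

conn = openstack.connect(cloud="my-private-cloud")   # placeholder cloud name
for server in conn.compute.servers():                # Nova (compute) instances
    print("server:", server.name)
for image in conn.image.images():                    # Glance (image registry) entries
    print("image:", image.name)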

***************
*****

UNIT-2
Hadoop & MapReduce:
Apache Hadoop The Apache™ Hadoop® project develops open-source software for reliable,
scalable, distributed computing.
What is Hadoop
Hadoop is an open-source framework from Apache and is used to store, process and analyze data
which is very huge in volume. Hadoop is written in Java and is not OLAP (Online Analytical
Processing). It is used for batch/offline processing. It is being used by Facebook, Yahoo, Google, Twitter,
LinkedIn and many more. Moreover, it can be scaled up just by adding nodes in the cluster.
The Hadoop ecosystem consists of a number of projects (modules) as below.
Modules of Hadoop
1. HDFS: Hadoop Distributed File System. It states that the files will be broken into blocks and
stored in nodes over the distributed architecture. HDFS is a distributed file system, it runs on large
clusters and provides high throughput access to data. HDFS was built to reliably store very large
files across machines in a large cluster built of commodity hardware.
HDFS stores each file as a sequence of blocks, all of which are the same size except the
last block. The blocks of each file are replicated on multiple machines in a cluster with a default
replication of 3 to provide fault tolerance.
2. Hadoop YARN: is a framework used for job scheduling and cluster resource management.
3. Hadoop Map Reduce: This is a framework which helps Java programs to do the parallel
computation on data using key value pair. The Map task takes input data and converts it into a
data set which can be computed in key-value pairs. The output of the Map task is consumed by the Reduce
task, and the output of the reducer gives the desired result.
4. Hadoop Common: These Java libraries are used to start Hadoop and are used by other Hadoop
modules. This module supports the other Hadoop modules.
5. HBase: HBase is an open source and sorted map data built on Hadoop. It is column oriented and
horizontally scalable. HBase is non-relational, distributed, column-oriented database that
provides structured data storage for large tables.
HBase is well suited for sparse data sets which are very common in big data use cases. Hbase
provides APIs enabling development in practically any programming language.
6. Zookeeper: Zookeeper is a high-performance distributed coordination service for maintaining
configuration information, naming, providing distributed synchronization and group services.
7. Pig: Pig is a data flow language and an execution environment for analyzing large datasets. Pig
compiler produces a sequence of MapReduce jobs that analyze data in HDFS using the Hadoop
MapReduce framework.
8. Hive: Hive is a distributed data warehouse infrastructure for Hadoop. Hive provides an SQL-like
language called HiveQL. HiveQL allows easy data summarization, ad-hoc querying, and analysis
of large datasets stored in HDFS.
9. Cassandra: Cassandra is a scalable multi-master database with no single point of failure.
Cassandra is designed to handle massive scale data spread across many servers and provides a
highly available service with no single point of failure. Cassandra is a No-SQL solution that
provides a structured key-value store.
10. Flume: Flume is a distributed, reliable and available service for collecting, aggregating and moving
large amounts of data from applications to HDFS.

Hadoop Architecture:
The Hadoop architecture is a package of the file system, MapReduce engine and the HDFS
(Hadoop Distributed File System). The MapReduce engine can be MapReduce/MR1 or YARN/MR2.
A Hadoop cluster consists of a single master and multiple slave nodes. The master node includes
Job Tracker, Task Tracker, Name Node, and Data Node whereas the slave node includes Data Node and
Task Tracker.



HDFS Layer (Hadoop Distributed File System):
The Hadoop Distributed File System (HDFS) is a distributed file system for Hadoop. It contains
a master/slave architecture. This architecture consists of a single NameNode that performs the role of master,
and multiple DataNodes that perform the role of slaves.
Both NameNode and DataNode are capable enough to run on commodity machines (widely
available machines). The Java language is used to develop HDFS. So, any machine that supports Java
language can easily run the NameNode and DataNode software.
NameNode
o It is a single master server exist in the HDFS cluster.
o As it is a single node, it may become a single point of failure.
o It manages the file system namespace by executing an operation like the opening, renaming and
closing the files.
o It simplifies the architecture of the system.
DataNode
o The HDFS cluster contains multiple DataNodes.
o Each DataNode contains multiple data blocks.
o These data blocks are used to store data.
o It is the responsibility of DataNode to read and write requests from the file system's clients.
o It performs block creation, deletion, and replication upon instruction from the NameNode.
Job Tracker
o The role of Job Tracker is to accept the MapReduce jobs from client and process the data by using
NameNode.
o In response, NameNode provides metadata to Job Tracker.
Task Tracker
o It works as a slave node for Job Tracker.
o It receives tasks and code from the Job Tracker and applies that code on the file. This process can also
be called a Mapper.
MapReduce Layer:
The MapReduce comes into existence when the client application submits the MapReduce job to
Job Tracker. In response, the Job Tracker sends the request to the appropriate Task Trackers. Sometimes,
the Task Tracker fails or time out. In such a case, that part of the job is rescheduled.

Advantages of Hadoop:
o Fast: In HDFS the data distributed over the cluster and are mapped which helps in faster retrieval.
Even the tools to process the data are often on the same servers, thus reducing the processing
time. It is able to process terabytes of data in minutes and petabytes in hours.
o Scalable: Hadoop cluster can be extended by just adding nodes in the cluster.
o Cost Effective: Hadoop is open source and uses commodity hardware to store data so it really
cost effective as compared to traditional relational database management system.
o Resilient to failure: HDFS has the property with which it can replicate data over the network, so
if one node is down or some other network failure happens, then Hadoop takes the other copy of
data and use it. Normally, data are replicated thrice but the replication factor is configurable.
Hadoop MapReduce Job Execution:
MapReduce is a framework using which we can write applications to process huge amounts of
data, in parallel, on large clusters of commodity hardware in a reliable manner.

What is MapReduce?
MapReduce is a processing technique and a program model for distributed computing based on
java. The MapReduce algorithm contains two important tasks, namely Map and Reduce.
Map takes a set of data and converts it into another set of data, where individual elements are
broken down into tuples (key/value pairs).
Secondly, reduce task, which takes the output from a map as an input and combines those data
tuples into a smaller set of tuples. As the sequence of the name MapReduce implies, the reduce task is
always performed after the map job.
The major advantage of MapReduce is that it is easy to scale data processing over multiple
computing nodes. Under the MapReduce model, the data processing primitives are called mappers and
reducers. Decomposing a data processing application into mappers and reducers is sometimes nontrivial.
But, once we write an application in the MapReduce form, scaling the application to run over hundreds,
thousands, or even tens of thousands of machines in a cluster are merely a configuration change. This
simple scalability is what has attracted many programmers to use the MapReduce model.
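As a hedged illustration, the classic word-count job can be decomposed into a mapper and a reducer written as two small Python scripts for Hadoop Streaming; the file names, HDFS paths and the streaming invocation in the comments are placeholders, not part of the course text.

# mapper.py - emits "word<TAB>1" for every word read from standard input.
import sys

for line in sys.stdin:
    for word in line.strip().split():
        print(word + "\t1")

# reducer.py - Hadoop sorts the mapper output by key, so all counts for a
# word arrive together and can be summed as a running total.
import sys

current_word, total = None, 0
for line in sys.stdin:
    word, count = line.rstrip("\n").split("\t")
    if word != current_word:
        if current_word is not None:
            print(current_word + "\t" + str(total))
        current_word, total = word, 0
    total += int(count)
if current_word is not None:
    print(current_word + "\t" + str(total))

# Illustrative invocation with the Hadoop Streaming jar:
# hadoop jar hadoop-streaming.jar -mapper mapper.py -reducer reducer.py \
#   -input /user/data/input -output /user/data/output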
The Algorithm for MapReduce:
• Generally, the MapReduce paradigm is based on sending the computation to where the data resides.
• MapReduce program executes in three stages, namely map stage, shuffle stage, and reduce stage.
o Map stage − The map or mapper’s job is to process the input data. Generally, the input data is in the
form of file or directory and is stored in the Hadoop file system (HDFS). The input file is passed to
the mapper function line by line. The mapper processes the data and creates several small chunks of
data.
o Reduce stage − This stage is the combination of the Shuffle stage and the Reduce stage. The
Reducer’s job is to process the data that comes from the mapper. After processing, it produces a new
set of output, which will be stored in the HDFS.
• During a MapReduce job, Hadoop sends the Map and Reduce tasks to the appropriate servers in the cluster.
• The framework manages all the details of data-passing such as issuing tasks, verifying task completion,
and copying data around the cluster between the nodes.
• Most of the computing takes place on nodes with data on local disks that reduces the network traffic.
• After completion of the given tasks, the cluster collects and reduces the data to form an appropriate result,
and sends it back to the Hadoop server.



MapReduce is a programming model designed to process large amounts of data in parallel by dividing the
job into several independent local tasks. Running the independent tasks locally reduces the network usage
drastically. To run the tasks locally, the tasks are moved to the data nodes where the data resides.
The below tasks occur when the user submits a MapReduce job to Hadoop.
1. The local job client prepares the job for submission and hands it off to the Job Tracker.
2. The Job Tracker schedules the job and distributes the map work among the Task Trackers for
parallel processing.
3. Each Task Tracker issues a Map Task.
4. The Job Tracker receives progress information from the Task Trackers.
5. Once the map phase results are available, the Job Tracker distributes the reduce work among the
Task Trackers for parallel processing.
6. Each Task Tracker issues a Reduce Task.
7. The Job Tracker receives progress information from the Task Trackers.
8. Once the Reduce task is completed, a cleanup task is performed.

The above diagram shows that the Hadoop cluster comprises a master node, a backup node and a number
of slave nodes. The master node runs the NameNode and JobTracker processes and the slave nodes run
the DataNode and TaskTracker components of Hadoop. The backup node runs the secondary NameNode
process. The functions of the main processes of Hadoop are described as follows:
1) NameNode:
NameNode keeps the directory tree of all files in the file system, and tracks where across
the cluster the file data is kept. It does not store the data of these files itself. Client applications
talk to the NameNode whenever need to locate a file or when they want to add/copy/move/delete
a file.
The NameNode responds to the successful requests by returning a list of relevant
DataNode servers where the data lives. NameNode serves as both directory namespace manager
and ‘inode table’ for the Hadoop DFS. There is a single NameNode running in any DFS
deployment.
2) Secondary NameNode:
HDFS is not currently a high availability system. The NameNode is a Single Point of
Failure for the HDFS Cluster. When the NameNode goes down, the file system goes offline. An
optional Secondary NameNode which is hosted on a separate machine creates checkpoints of the
namespace.
3) JobTracker:
The JobTracker is the service within Hadoop that distributes MapReduce tasks to specific
nodes in the cluster, ideally the nodes that have the data, or at least are in the same rack.
4) TaskTracker:
TaskTracker is a node in a Hadoop cluster that accepts Map, Reduce and Shuffle tasks
from the JobTracker. Each TaskTracker has a defined number of slots which indicate the number
of tasks that it can accept.
When the JobTracker tries to find a TaskTracker to schedule a map or reduce task it first
looks for an empty slot on the same node that hosts the DataNode containing the data. If an empty
slot is not found on the same node, then the JobTracker looks for an empty slot on a node in the
same rack.
5) DataNode:
A DataNode stores data in an HDFS file system. A functional HDFS file system has more
than one DataNode, with data replicated across them. DataNodes connect to the NameNode on
startup.
DataNodes respond to request from the NameNode for filesystem operations. Client
applications can talk directly to a DataNode, once the NameNode has provided the location of the
data. Similarly, MapReduce operations assigned to TaskTracker instances near a DataNode, talk
directly to the DataNode to access the files.
TaskTracker instances can be deployed on the same servers that host DataNode instances,
so that MapReduce operations are performed close to the data.



Steps of MapReduce Job Execution flow:
MapReduce processes the data in various phases with the help of different components. Let’s
discuss the steps of job execution in Hadoop.
1. Input Files
The data for a MapReduce job is stored in input files, which reside in HDFS. The input file format is arbitrary;
line-based log files and binary formats can also be used.
2. InputFormat
After that InputFormat defines how to split and read these input files. It selects the files or other objects
for input. InputFormat creates InputSplit.
3. InputSplits
It represents the data which will be processed by an individual Mapper. For each split, one map task is
created. Thus the number of map tasks is equal to the number of InputSplits. The framework divides each split into
records, which the mapper processes.
4. RecordReader
It communicates with the inputSplit. And then converts the data into key-value pairs suitable for reading
by the Mapper. RecordReader by default uses TextInputFormat to convert data into a key-value pair.
It communicates to the InputSplit until the completion of file reading. It assigns byte offset to each line
present in the file. Then, these key-value pairs are further sent to the mapper for further processing.
5. Mapper
It processes input record produced by the RecordReader and generates intermediate key-value pairs. The
intermediate output is completely different from the input pair. The output of the mapper is the full
collection of key-value pairs.
Hadoop framework doesn’t store the output of mapper on HDFS. It doesn’t store, as data is temporary
and writing on HDFS will create unnecessary multiple copies. Then Mapper passes the output to the
combiner for further processing.
6. Combiner
Combiner is Mini-reducer which performs local aggregation on the mapper’s output. It minimizes the
data transfer between mapper and reducer. So, when the combiner functionality completes, framework
passes the output to the partitioner for further processing.
7. Partitioner
Partitioner comes into the existence if we are working with more than one reducer. It takes the output of
the combiner and performs partitioning.



Partitioning of the output takes place on the basis of the key in MapReduce. A hash function on the key (or a
subset of the key) determines the partition.
On the basis of key value in MapReduce, partitioning of each combiner output takes place. And then the
record having the same key value goes into the same partition. After that, each partition is sent to a
reducer.
Partitioning in MapReduce execution allows even distribution of the map output over the reducer.
8. Shuffling and Sorting
After partitioning, the output is shuffled to the reduce nodes. Shuffling is the physical movement of
the data, which is done over the network once all the mappers have finished.
The framework then merges and sorts this intermediate output, which is provided as input to the reduce phase.
9. Reducer
The reducer takes the set of intermediate key-value pairs produced by the mappers as its input and
runs a reducer function on each of them to generate the output.
The output of the reducer is the final output, which the framework stores on HDFS.
10. RecordWriter
It writes the output key-value pairs from the Reducer phase to the output files.
11. OutputFormat
OutputFormat defines the way the RecordWriter writes these output key-value pairs to the output files. The
instances provided by Hadoop write files in HDFS. Thus OutputFormat instances write the final
output of the reducer on HDFS.
**************************
Hadoop Schedulers:
The Hadoop scheduler is a pluggable component that supports different scheduling
algorithms. The default scheduler in Hadoop is FIFO (First In, First Out). In addition to this, two
advanced schedulers are also available. The available schedulers are as follows:
1) FIFO Scheduler (First in First Out scheduler)
2) Fair Scheduler
3) Capacity Scheduler
The Fair Scheduler was developed at Facebook and the Capacity Scheduler was developed at Yahoo.
The pluggable scheduler framework provides the flexibility to support a variety of workloads with
varying priority and performance constraints.
Efficient job scheduling makes Hadoop a multi-tasking system. It can process multiple data sets
for multiple jobs for multiple users simultaneously.
1) FIFO: FIFO is the default scheduler in Hadoop. It maintains a work queue in which the jobs are
queued. The scheduler pulls jobs in first in first out manner (means oldest job first) for scheduling.
There is no concept of priority or size of job in FIFO scheduler.
Advantage:
• No need for configuration
• First Come First Serve
• simple to execute
Disadvantage:
• Priority of task doesn’t matter, so high priority jobs need to wait
• Not suitable for shared cluster

2) Fair Scheduler:
The Fair Scheduler allocates resources evenly between multiple jobs and also provides
capacity guarantees. Fair Scheduler assigns resources to jobs such that each job gets an equal
share of the available resources on average over time. Unlike the FIFO scheduler, which forms
a queue of jobs, the Fair Scheduler lets short jobs finish in reasonable time without starving long
jobs. Tasks slots are assigned to the new jobs, so that each job gets roughly the same amount of
CPU time.
The Fair Scheduler maintains a set of pools into which jobs are placed. Each pool has a
guaranteed capacity. A configuration file is used for specifying the pools and the guaranteed
capacities. When there is a single job running, all the resources are assigned to that job. When
there are multiple jobs in the pools, each pool gets at least as many task slots as guaranteed.
When a pool does not require the guaranteed share the excess capacity is split between
other jobs. With this mechanism resources will be utilized efficiently.
All the pools have equal share by default. It is possible to provide more or less share to
a pool by specifying the share in the configuration file.

3) Capacity Scheduler:
The Capacity Scheduler has similar functionality to the Fair Scheduler but adopts a
different scheduling philosophy. In the Capacity Scheduler, we need to define a number of named
queues. The Capacity Scheduler gives each queue its capacity when it contains jobs, and shares
any unused capacity between the queues. Within each queue FIFO scheduling with priority is
used.
Capacity Scheduler allows strict access control on queues. The access controls are
defined on a per-queue basis. Jobs are sorted based on when they are submitted and their
priorities.
Advantage:
• Best for working with Multiple clients or priority jobs in a Hadoop cluster
• Maximizes throughput in the Hadoop cluster
Disadvantage:
• More complex
• Not easy to configure for everyone



****
Python Basics: Introduction, Installing Python, Python data Types:
Python is a widely used general-purpose, high level programming language. It was created by Guido van
Rossum in 1991 and further developed by the Python Software Foundation. It was designed with an
emphasis on code readability, and its syntax allows programmers to express their concepts in fewer lines
of code.
Python is a programming language that lets you work quickly and integrate systems more efficiently.
Python programming language is suitable for cloud computing. Python 2.0 was released in the
year 2000 and Python 3.0 was released in the year 2008. The most recent release of Python is version
3.10.
The main characteristics of Python are:
Multi-paradigm programming language:
Python supports more than one programming paradigms including object-oriented programming
and structural programming.
Interpreted Language:
Python is an interpreted language and does not require an explicit compilation step. The Python
interpreter executes the program source code directly, statement by statement, as a processor or scripting
engine does.
Interactive Language:
Python provides an interactive mode in which the user can submit commands at the Python
prompt and interact with the interpreter directly.
The Key benefits of Python are:
Easy to Learn, read and maintain:
Python is a minimalistic language with relatively few keywords and has fewer syntactical
constructions as compared to other languages. Reading Python programs feels like English with pseudo-
code. Easy to learn, extremely powerful language for wide range of applications. Easy to maintain.
Object and Procedure Oriented:
Python supports both procedure-oriented programming and object-oriented programming.
Procedure oriented paradigm allows programs to be written around procedures or functions and allows
reuse of code.
Extendable:
Python is an extendable language and allows integration of low-level modules written in
languages such as C/C++.
Scalable: Due to the minimalistic nature of Python, it provides a manageable structure for large
programs.
Portable: Since Python is an interpreted language, programmers do not worry about compilation, linking
and loading of programs. Python programs can be directly executed from source code and copied from
one machine to other without worrying about portability.
The Python interpreter converts the source code to an intermediate form called byte codes and
then translates into the native language of specific system and then runs it.
Broad Library Support:



Python has a broad library support and works on various platforms such as Windows, Linux, Mac,
etc. There are a large number of Python packages available for various applications such as machine
learning, image processing, network programming, cryptography, etc.

Installing Python:
Python is a widely used high-level programming language. To write and execute code in python,
we first need to install Python on our system.
Installing Python on Windows takes a series of few easy steps.
Step 1 − Select Version of Python to Install
Python has various versions available with differences between the syntax and working of
different versions of the language. We need to choose the version which we want to use or need. There
are different versions of Python 2 and Python 3 available.
Step 2 − Download Python Executable Installer
On the web browser, in the official site of python (www.python.org), move to the Download for
Windows section.
All the available versions of Python will be listed. Select the version required by you and click
on Download. Let suppose, we chose the Python 3.9.1 version.
On clicking download, various available executable installers shall be visible with different
operating system specifications. Choose the installer which suits your operating system and
download the installer. Let us suppose we select the Windows installer (64-bit).
The download size is less than 30MB.
Step 3 − Run Executable Installer
We downloaded the Python 3.9.1 Windows 64 bit installer.
Run the installer. Make sure to select both the checkboxes at the bottom and then click Install Now.
Step 4 − Verify Python is installed on Windows
To ensure Python is successfully installed on your system, follow the given steps −
• Open the command prompt.
• Type ‘python’ and press enter.
• The version of the python which you have installed will be displayed if the python is
successfully installed on your windows.
Step 5 − Verify Pip was installed
PIP is a powerful package management system (Python Package Manager) for Python software
packages. Thus, make sure that you have it installed.
To verify if pip was installed, follow the given steps −
• Open the command prompt.
• Enter pip -V to check if pip was installed.
• The following output appears if pip is installed successfully.

We have successfully installed python and pip on our Windows system.


Python data Types & Data Structures:
Data Types:
Every programming language has built-in data types, including Python. Data types provide
information about the different kinds of data and variables.
A data type is a characteristic that tells the compiler (or interpreter) how a programmer intends
to use the data. There are two general categories of data types, differing whether the data is changeable
after definition:
1. Immutable. Data types that are not changeable after assignment.
2. Mutable. Data types that are changeable after assignment.



Variables store different types of data. Creating a variable of a particular data type creates an
object of a data type class. The Python interpreter automatically assumes a type when creating a variable.

Numeric:
In Python, numeric data type represents the data which has numeric value. Numeric value can
be integer, floating number or even complex numbers. These values are defined
as int, float and complex class in Python.
• Integers – This value is represented by int class. It contains positive or negative whole
numbers (without fraction or decimal). In Python there is no limit to how long an integer
value can be.
• Float – This value is represented by float class. It is a real number with floating point
representation. It is specified by a decimal point. Optionally, the character e or E followed
by a positive or negative integer may be appended to specify scientific notation.
• Complex Numbers – Complex number is represented by complex class. It is specified
as (real part) + (imaginary part)j. For example – 2+3j
Note – type( ) function is used to determine the type of data type.
Example Python Program:
# Python program to
# demonstrate numeric value
a = 5
print("Type of a: ", type(a))

b = 5.0
print("\nType of b: ", type(b))

c = 2 + 4j
print("\nType of c: ", type(c))

Output:
Type of a: <class 'int'>
Type of b: <class 'float'>
Type of c: <class 'complex'>
Sequence Type:
In Python, a sequence is an ordered collection of similar or different data types. Sequences allow us to
store multiple values in an organized and efficient fashion. There are several sequence types in
Python –
• String
• List
• Tuple
1) String
In Python, Strings are arrays of bytes representing Unicode characters. A string is a collection
of one or more characters put in a single quote, double-quote or triple quote. In python there is no
character data type, a character is a string of length one. It is represented by str class.
Creating String
Strings in Python can be created using single quotes or double quotes or even triple quotes.
# Python Program for
# Creation of String
# Creating a String
# With single Quotes
String1 = 'Welcome to Ashoka College'
print("String with the use of Single Quotes: ")
print(String1)
Output:
String with the use of Single Quotes:
Welcome to Ashoka College
List:
Lists are just like the arrays declared in other languages: an ordered collection
of data. It is very flexible as the items in a list do not need to be of the same type.
Creating List
Lists in Python can be created by just placing the sequence inside the square brackets [ ].
# Creating a List with
# the use of multiple values
List = ["Ashoka", "Women’s", "College"]
print("\nList containing multiple values:")
print(List[0])
print(List[2])

Output:
List containing multiple values:
Ashoka
College

Tuple
Just like a list, a tuple is also an ordered collection of Python objects. The only difference
between a tuple and a list is that tuples are immutable, i.e. tuples cannot be modified after
they are created. It is represented by the tuple class.
Creating Tuple
In Python, tuples are created by placing a sequence of values separated by
‘comma’ with or without the use of parentheses for grouping of the data sequence.
Tuples can contain any number of elements and of any datatype (like strings,
integers, list, etc.).
# Creating an empty tuple
Tuple1 = ()
print("Initial empty Tuple: ")
print(Tuple1)

# Creating a Tuple with the use of Strings
Tuple1 = ('apple', 'banana')
print("\nTuple with the use of String: ")
print(Tuple1)

# Creating a Tuple with
# the use of list
list1 = [1, 2, 4, 5, 6]
print("\nTuple using List: ")
print(tuple(list1))

Output:
Initial empty Tuple:
()

Tuple with the use of String:
('apple', 'banana')

Tuple using List:
(1, 2, 4, 5, 6)

Boolean:
Data type with one of the two built-in values, True or False.
# Python program to
# demonstrate boolean type
print(type(True))
print(type(False))
# print(type(true))   # NameError: 'true' is not defined (Python is case-sensitive)

Output:
<class 'bool'>
<class 'bool'>



Set
In Python, Set is an unordered collection of data type that is iterable, mutable and has no
duplicate elements. The order of elements in a set is undefined though it may consist of various
elements.
Creating Sets
Sets can be created by using the built-in set( ) function. Type of elements in a set need not be the same,
various mixed-up data type values can also be passed to the set.
# Creating a Set
set1 = set()
print("Initial blank Set: ")
print(set1)

# Creating a Set with
# the use of a String
set1 = set("How Are You")
print("\nSet with the use of String: ")
print(set1)

# Creating a Set with
# the use of a List
set1 = set(["Apple", "Banana", "Orange"])
print("\nSet with the use of List: ")
print(set1)

Output:
Initial blank Set:
set()

Set with the use of String:
{'o', 'r', 'A', 'w', 'u', ' ', 'H', 'e', 'Y'}

Set with the use of List:
{'Apple', 'Banana', 'Orange'}

Dictionaries:
Dictionary is a mapping data type or a kind of hash table that maps keys to values. Keys in a dictionary
can be of any data type, though numbers and strings are commonly used for keys. Values in a dictionary
can be any data type or object.
A Dictionary is an unordered set of key-value pairs of items. It is like an associative array or a hash
table where each key stores a specific value. A key can hold any primitive data type, whereas a value is an
arbitrary Python object.
(Hash Table is a data structure which stores data in an associative manner. In a hash table,
data is stored in an array format, where each data value has its own unique index value.)
# Creating a Dictionary
# with Integer Keys
Dic = {1: 'Ashoka', 2: 'College', 3: 'Kurnool'}
print("\nDictionary with the use of Integer Keys: ")
print(Dic)
type(Dic)

Output:
Dictionary with the use of Integer Keys:
{1: 'Ashoka', 2: 'College', 3: 'Kurnool'}
Out[8]: dict


Type Conversions:
The process of converting the value of one data type (integer, string,
float, etc.) to another data type is called type conversion.
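For example, the built-in constructor functions int( ), float( ), str( ) and list( ) perform explicit type conversion:

x = int("25")            # string to integer  -> 25
y = float(7)             # integer to float   -> 7.0
s = str(3.14)            # float to string    -> '3.14'
nums = list((1, 2, 3))   # tuple to list      -> [1, 2, 3]
print(type(x), type(y), type(s), type(nums))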



Control Flow Statements:
Control flow statements in Python are similar to those in other programming languages. Control flow
statements are used to control the flow of execution of the program or instructions.
if statement:
The if statement in Python is similar to the if statement in other languages. See the
following example:
Example-1:
num = 100
if num < 200:
    print("num is less than 200")
Output:
num is less than 200

Example-2:
#!/usr/bin/env python3
>>> x = 3
>>> if x < 10:
...     print('x below ten')
...
x below ten
>>> if x > 10:
...     print('x is greater than ten')
...
>>> if x > 1 and x < 4:
...     print('x is in range')
...
x is in range
>>>
Indentation and Blocks:
An if statement does not have to contain just a single statement; it can have a block. A block is more than one
statement.

The example below shows a code block with 3 statements (print). A block is seen by Python as a single
entity, that means that if the condition is true, the whole block is executed (every statement).

#!/usr/bin/env python3
x = 4
if x < 5:
    print("x is smaller than five")
    print("this means it's not equal to five either")
    print("x is an integer")
All programming languages can create blocks, but Python has a unique way of doing it: a block is
defined only by its indentation.
If-Else
You can use if statements to make an interactive program. Copy the program below and run
it.
It has several if statements, that are evaluated based on the keyboard input.
Because keyboard input is used, we use the equality sign (==) for string comparison.
The second string is typed, but we need a number. You can convert the string to an integer
using int().
It also makes use of the else keyword, this is the other evaluation case. When comparing
age (age < 5) the else means (>= 5), the opposite.
#!/usr/bin/env python3

gender = input("Gender? ")
gender = gender.lower()
if gender == "male":
    print("Your cat is male")
elif gender == "female":
    print("Your cat is female")
else:
    print("Invalid input")

age = int(input("Age of your cat? "))

if age < 5:
    print("Your cat is young.")
else:
    print("Your cat is adult.")



Elif
If you want to evaluate several cases, you can use the elif clause. elif is short for else if.
Unlike else, with elif you can add an expression.
That way instead of writing if over and over again, you can evaluate all cases quickly.
>>> x = 3
>>> if x == 2:
...     print('two')
... elif x == 3:
...     print('three')
... elif x == 4:
...     print('four')
... else:
...     print('something else')
...
three
>>>
Looping Statements:
A loop is used for iterating over a set of statements repeatedly. In Python we have two loop
constructs, for and while (there is no built-in do-while loop, but its behaviour can be emulated).
While both provide similar basic functionality, they differ in their syntax and
condition checking time.
1. While Loop:
In Python, a while loop is used to execute a block of statements repeatedly until a given
condition is satisfied. And when the condition becomes false, the line immediately after
the loop in the program is executed.
Syntax:
while expression:
    statement(s)
All the statements indented by the same number of character spaces after a programming
construct are considered to be part of a single block of code. Python uses indentation as its method of
grouping statements.
Example:
# Python program to illustrate
# while loop
count = 0
while (count < 3):
    count = count + 1
    print("Hello Geek")
Output:
Hello Geek
Hello Geek
Hello Geek
We can also use else in while loop:
# Python program to illustrate
# combining else with while
count = 0
while (count < 3):
    count = count + 1
    print("Hello Geek")
else:
    print("In Else Block")

Output:
Hello Geek
Hello Geek
Hello Geek
In Else Block
for in Loop: For loops are used for sequential traversal. For example: traversing a list or string or array
etc. In Python, there is no C style for loop, i.e., for (i=0; i<n; i++). There is “for in” loop which is
similar to for each loop in other languages. Let us learn how to use for in loop for sequential traversals.
Syntax:
for iterator_var in sequence:
    statements(s)
It can be used to iterate over a range and iterators.
# Python program to illustrate
# Iterating over range 0 to n-1
n = 4
for i in range(0, n):
    print(i)
Output:
0
1
2
3
Nested For Loop:
# Python program to illustrate
# nested for loops in Python
from __future__ import print_function
for i in range(1, 5):
    for j in range(i):
        print(i, end=' ')
    print()
Output:
1
2 2
3 3 3
4 4 4 4
Python doesn't have a do-while loop, but we can write a program that behaves like one.
A do-while loop checks the condition after executing the statements; it is like a while loop, but the body is
executed at least once.
---------------------------------------------------------------
Function in Python:
In Python, a function is a group of related statements that performs a specific task.
Functions help break our program into smaller and modular chunks. As our program grows larger and
larger, functions make it more organized and manageable.
Furthermore, it avoids repetition and makes the code reusable.
Syntax of Function

def function_name(parameters):
    """docstring"""
    statement(s)
Above shown is a function definition that consists of the following components.



1. Keyword def that marks the start of the function header.
2. A function name to uniquely identify the function. Function naming follows the same rules of
writing identifiers in Python.
3. Parameters (arguments) through which we pass values to a function. They are optional.
4. A colon (:) to mark the end of the function header.
5. Optional documentation string (docstring) to describe what the function does.
6. One or more valid python statements that make up the function body. Statements must have the
same indentation level (usually 4 spaces).
7. An optional return statement to return a value from the function.
Example of a function
def greet(name):
    """
    This function greets to
    the person passed in as
    a parameter
    """
    print("Hello, " + name + ". Good morning!")
How to call a function in python?
Once we have defined a function, we can call it from another function, program, or even the Python
prompt. To call a function we simply type the function name with appropriate parameters.
>>> greet('Paul')
Hello, Paul. Good morning!

Try running the above code in the Python program with the function definition to see the output.

def greet(name):
    """
    This function greets to
    the person passed in as
    a parameter
    """
    print("Hello, " + name + ". Good morning!")

greet('Paul')
Note: In python, the function definition should always be present before the
function call. Otherwise, we will get an error.
Python Modules:
A Python module is a file containing Python definitions and statements. A module can define
functions, classes, and variables. A module can also include runnable code. Grouping related code into
a module makes the code easier to understand and use. It also makes the code logically organized.
Example: create a simple module
# A simple module, calc.py
def add(x, y):
    return (x+y)

def subtract(x, y):
    return (x-y)
Import Module in Python – Import statement
We can import the functions, classes defined in a module to another module using the import
statement in some other Python source file.
Syntax:
import module
When the interpreter encounters an import statement, it imports the module if the module is present in
the search path. A search path is a list of directories that the interpreter searches for importing a module.



Note: This does not import the functions or classes directly instead imports the module only. To access
the functions inside the module the dot(.) operator is used.
Example: Importing modules in Python
# importing module calc.py
import calc
print(calc.add(10, 2))
Output:
12
# importing sqrt() and factorial from the
# module math
from math import sqrt, factorial
# if we simply do "import math", then
# math.sqrt(16) and math.factorial()
# are required.
print(sqrt(16))
print(factorial(6))
Output:
4.0
720

A module can be considered a collection of functionalities that become available when you include the module in your application. You can create your own module, save it, and use it in another application as well. Modules have the .py extension and can be saved externally, independent of the application.
So basically, a module is a file with a number of functions defined in it that can be imported as a whole into any application.
Modules increase the reusability of the code, as well as its scalability. That is why they are considered essential to programming.
A function is a block of organized, reusable code that is used to perform a single, related action. There are two types of functions: user-defined and built-in functions. Built-in functions are provided by Python to help in coding, like print( ), input( ), etc.
The difference between a function and a module in Python is that a function is specific to a task and fulfils one piece of functionality, while a module can define classes, functions, attributes, etc.

Create a Module
To create a module just save the code you want in a file with the file extension .py:
Example
Save this code in a file named mymodule.py
def greeting(name):
    print("Hello, " + name)
Use a Module
Now we can use the module we just created, by using the import statement:
Example
Import the module named mymodule, and call the greeting function:
import mymodule

mymodule.greeting("Jonathan")
Note: When using a function from a module, use the syntax: module_name.function_name.

Variables in Module
The module can contain functions, as already described, but also variables of all types (arrays,
dictionaries, objects etc):



Example
Save this code in the file mymodule.py
person1 = {
    "name": "John",
    "age": 36,
    "country": "Norway"
}
Example
Import the module named mymodule, and access the person1 dictionary:
import mymodule

a = mymodule.person1["age"]
print(a)

Naming a Module
You can name the module file whatever you like, but it must have the file extension .py
Re-naming a Module
You can create an alias when you import a module, by using the as keyword:
Example
Create an alias for mymodule called mx:
import mymodule as mx

a = mx.person1["age"]
print(a)

Built-in Modules
There are several built-in modules in Python, which you can import whenever you like.
Example
Import and use the platform module:
import platform

x = platform.system()
print(x)
Packages in Python:
We organize a large number of files in different folders and subfolders based on some criteria,
so that we can find and manage them easily.
In the same way, a package in Python takes the concept of the modular approach to the next logical level. As you know, a module can contain multiple objects, such as classes, functions, etc. A package can contain one or more related modules. Physically, a package is a folder containing one or more module files.
Let's create a package named mypackage, using the following steps:
• Create a new folder named D:\MyApp.
• Inside MyApp, create a subfolder with the name 'mypackage'.
• Create an empty __init__.py file in the mypackage folder.
• Using a Python-aware editor like IDLE, create modules greet.py and functions.py with the
following code:
File name: greet.py
def SayHello(name):
    print("Hello ", name)

File name:functions.py
def sum(x, y):
    return x + y

def average(x, y):
    return (x + y) / 2

def power(x, y):
    return x**y

That's it. We have created our package called mypackage. The following is the folder structure:

Package Folder Structure:
D:\MyApp
    mypackage
        __init__.py
        greet.py
        functions.py


Importing a Module from a Package:
Now, to test our package, navigate the command prompt to the MyApp folder and invoke the Python
prompt from there.

D:\MyApp>python

Import the functions module from the mypackage package and call its power() function.

>>> from mypackage import functions

>>> functions.power(3,2)
9

It is also possible to import specific functions from a module in the package.


>>> from mypackage.functions import sum
>>> sum(10,20)
30
>>> average(10,12)
Traceback (most recent call last):
File "<pyshell#13>", line 1, in <module>
NameError: name 'average' is not defined

__init__.py
The package folder contains a special file called __init__.py, which controls what the package exposes. It serves two purposes:
1. The Python interpreter recognizes a folder as a package if it contains an __init__.py file.
2. __init__.py exposes specified resources from its modules to be imported.
An empty __init__.py file is enough for Python to recognize the folder as a package; the modules inside can then be imported with the package prefix, as shown in the examples above. You can optionally add import statements in __init__.py to make functions from individual modules available directly from the package, as sketched below.
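For example, a possible __init__.py for mypackage (an illustrative sketch; it assumes the greet.py and functions.py modules shown above) could be:

# mypackage/__init__.py
# expose selected functions at the package level
from .functions import sum, average, power
from .greet import SayHello

With this __init__.py in place, the functions can be called directly after import mypackage, for example mypackage.average(10, 12).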

File Handling in Python:


Python supports file handling and allows users to handle files, i.e., to read and write files, along with many other file handling options.
Python treats files differently as text or binary, and this is important. Each line of a text file is a sequence of characters. Each line of a file is terminated with a special character, called the EOL or End of Line character, such as the newline character. It ends the current line and tells the interpreter a new one has begun. Let’s start with reading and writing files.
Working of open( ) function
Before performing any operation on the file like reading or writing, first, we have to open that
file. For this, we should use Python’s inbuilt function open( ) but at the time of opening, we have to
specify the mode, which represents the purpose of the opening file.
f = open(filename, mode)
Where the following mode is supported:
1. r: opens an existing file for a read operation.
2. w: opens a file for a write operation. If the file already contains some data, it will be overridden; if the file does not exist, it is created.
3. a: opens a file for an append operation. It won’t override existing data.
4. r+: to read and write data in the file. It does not delete the existing content and does not create a new file if the file doesn’t exist; data written overwrites the content from the beginning of the file.
5. w+: to write and read data. It deletes the existing content of the file (or creates the file if it doesn’t exist) before writing.
6. a+: to append and read data from the file. It won’t override existing data.
Take a look at the below example:
# a file named "abc.txt" will be opened in read mode
file = open('abc.txt', 'r')
# This will print every line one by one in the file
for each in file:
    print(each)
The open command will open the file in the read mode and the for loop will print each line
present in the file.
Working of read() mode
There is more than one way to read a file in Python. If you need to extract a string that contains all
characters in the file then we can use file.read( ). The full code would work like this:

# Python code to illustrate read() mode


file = open("file.txt", "r")
print (file.read())

Another way to read a file is to call a certain number of characters like in the following code
the interpreter will read the first five characters of stored data and return it as a string:
# Python code to illustrate read() mode character wise
file = open("file.txt", "r")
print (file.read(5))
Creating a file using write() mode
Let’s see how to create a file and how to write mode works, so in order to manipulate the file, write
the following in your Python environment:

# Python code to create a file


file = open('geek.txt','w')
file.write("This is the write command")
file.write("It allows us to write in a particular file")
file.close()

The close() command terminates all the resources in use and frees the system of this particular
program.
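As a good practice, a file can also be opened with a context manager, which closes it automatically. A minimal sketch (the file name geek.txt follows the example above):

# using the with statement: the file is closed automatically,
# even if an error occurs while writing
with open('geek.txt', 'w') as file:
    file.write("This is the write command\n")
    file.write("It allows us to write in a particular file\n")
# no explicit file.close() is needed here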
Example program:
f=open("poem.txt")
d=f.read()
d=d.replace("the","them")
f.close()
f=open("poem.txt","w")
f.write(d)
f.close()
(Before executing this, create a text file named poem.txt and type some text containing the word "the". When you execute the above program, all occurrences of "the" will be replaced with "them".)
Another example program:
def program2():
    f = open("MyFile.txt", "w")
    line1 = input("Enter the text:")
    line2 = input("Enter the text:")
    line3 = input("Enter the text:")
    new_line = "\n"
    f.write(line1)
    f.write(new_line)
    f.write(line2)
    f.write(new_line)
    f.write(line3)
    f.write(new_line)
    f.close()

program2()
Output:
Enter the text: Hi welcome to Python
Enter the text: We will learn Python programming in ashoka college
Enter the text: It is very easy to learn
Next, open the MyFile.txt file; it contains the text which you entered above.

In this way, we can handle files as per our requirements using Python code.

Date/Time Operations in Python:


Python has a built-in module named datetime to deal with dates and times in numerous ways. In this section, we are going to see basic datetime operations in Python.
There are six main object classes with their respective components in the datetime module mentioned
below:
1. datetime.date
2. datetime.time
3. datetime.datetime
4. datetime.tzinfo
5. datetime.timedelta
6. datetime.timezone
Now we will see the program for each of the functions under datetime module mentioned above.
datetime.date():
We can generate date objects from the date class. A date object represents a date having a year,
month, and day.
Syntax: datetime.date( year, month, day)

The strftime() method can be used to print the day, month, and year in various formats. Here are some of them:
current.strftime(“%m/%d/%y”) that prints in month(Numeric)/date/year format
current.strftime(“%b-%d-%Y”) that prints in month(abbreviation)-date-year format
current.strftime(“%d/%m/%Y”) that prints in date/month/year format



current.strftime(“%B %d, %Y”) that prints in month(words) date, year format

Example program:
from datetime import date

# You can create a date object containing the current date
# by using a classmethod named today()
current = date.today()

# print current day, month, and year individually
print("Current Day is :", current.day)
print("Current Month is :", current.month)
print("Current Year is :", current.year)

# strftime() creates a string representing the date in
# various formats
print("\n")
print("Let's print date, month and year in different-different ways")

format1 = current.strftime("%m/%d/%y")
# prints in month/date/year format
print("format1 =", format1)

format2 = current.strftime("%b-%d-%Y")
# prints in month(abbreviation)-date-year format
print("format2 =", format2)

format3 = current.strftime("%d/%m/%Y")
# prints in date/month/year format
print("format3 =", format3)

format4 = current.strftime("%B %d, %Y")
# prints in month(words) date, year format
print("format4 =", format4)

Output:
Current Day is : 28
Current Month is : 8
Current Year is : 2022

Let's print date, month and year in different-different ways
format1 = 08/28/22
format2 = Aug-28-2022
format3 = 28/08/2022
format4 = August 28, 2022

datetime.time():
A time object generated from the time class represents the local time.
Components:
• hour
• minute
• second
• microsecond
• tzinfo
Syntax: datetime.time(hour, minute, second, microsecond)
Example program:
from datetime import time

# time() takes hour, minute, second and
# microsecond respectively, in that order
# if no parameter is passed, time() takes 0 by default
defaultTime = time()

print("default_hour =", defaultTime.hour)
print("default_minute =", defaultTime.minute)
print("default_second =", defaultTime.second)
print("default_microsecond =", defaultTime.microsecond)

# passing parameters in different ways
# hour, minute and second respectively is the default order
time1 = time(10, 5, 25)
print("time_1 =", time1)

# assigning hour, minute and second to respective variables
time2 = time(hour=10, minute=5, second=25)
print("time_2 =", time2)

# assigning hour, minute, second and microsecond to
# respective variables
time3 = time(hour=10, minute=5, second=25, microsecond=55)
print("time_3 =", time3)

Output:
default_hour = 0
default_minute = 0
default_second = 0
default_microsecond = 0
time_1 = 10:05:25
time_2 = 10:05:25
time_3 = 10:05:25.000055

Similarly, we can work with date and time together using the datetime class described next.


datetime.datetime():
datetime.datetime() module shows the combination of a date and a time.
Components:
• year
• month
• day
• hour
• minute
• second,
• microsecond
• tzinfo
Syntax: datetime.datetime( year, month, day )
or
datetime.datetime(year, month, day, hour, minute, second, microsecond)
Current date and time using the strftime() method in different ways:
• strftime(“%d”) gives current day
• strftime(“%m”) gives current month
• strftime(“%Y”) gives current year
• strftime(“%H:%M:%S”) gives current time in an hour, minute, and second
format
• strftime(“%m/%d/%Y, %H:%M:%S”) gives date and time together

Example 1: Get Current Date and Time


import datetime

datetime_object = datetime.datetime.now()
print(datetime_object)

Output:
2022-08-28 19:06:02.457389



Here, we have imported datetime module using import datetime statement. One of the classes
defined in the datetime module is datetime class. We then used now() method to create
a datetime object containing the current local date and time.
Example 2: Get Current Date
import datetime

date_object = datetime.date.today()
print(date_object)

Output:
2022-08-28

Python Classes and Objects:


A class is a user-defined blueprint or prototype from which objects are created.
Classes are used to bind the data and functionality together. Creating a new class
creates a new type of object, allowing new instances of that type to be made.
Each class instance can have attributes attached to it for maintaining its state.
Class instances can also have methods (defined by their class) for modifying their
state.
To understand the need for creating a class in Python let’s consider an example.

Example
Create a class named Person, use the __init__() function to assign values for name
and age:

class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

p1 = Person("John", 36)
print(p1.name)
print(p1.age)

Output:
John
36

Note: The __init__() function is called automatically every time the class is used to create a new object.
self is used to represent the instance of the class. With it, you can access the attributes and methods of the class in Python. It binds the attributes with the given arguments. self is used in different places and is often thought to be a keyword; but unlike this in C++, self is not a keyword in Python. A small sketch illustrating self is shown below.
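A minimal sketch illustrating self (the greet() method is an illustrative addition to the Person example above):

class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def greet(self):
        # self.name refers to the attribute of the calling instance
        print("Hello, my name is " + self.name)

p1 = Person("John", 36)
p1.greet()    # prints: Hello, my name is John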



For more details, you can refer to the Cloud Computing PDF book.

UNIT-3
Python for Cloud: Python for Amazon web services, Python for Google Cloud Platform, Python for
windows Azure, Python for MapReduce, Python packages of Interest, Python web Application Frame
work, Designing a RESTful web API.
Cloud Application Development in Python: Design Approaches, Image Processing APP, Document
Storage App, MapReduce App, Social Media Analytics App.



Google Cloud SQL:
Fully managed relational database service for MySQL, PostgreSQL, and SQL Server with rich
extension collections, configuration flags, and developer ecosystems.
New customers get $300 in free credits to spend on Cloud SQL. You won’t be charged until you
upgrade.
• Reduce maintenance cost with fully managed MySQL, PostgreSQL and SQL
Server databases
• Ensure business continuity with reliable and secure services backed by 24/7
SRE team
• Automate database provisioning, storage capacity management, and other
time-consuming tasks
• Database observability made easy for developers with Cloud SQL Insights
• Easy integration with existing apps and Google Cloud services like GKE and
BigQuery
Key features
Fully managed
Cloud SQL automatically ensures your databases are reliable, secure, and scalable so that
your business continues to run without disruption. Cloud SQL automates all your backups,
replication, encryption patches, and capacity increases—while ensuring greater than 99.95%
availability, anywhere in the world.
Integrated
Access Cloud SQL instances from just about any application. Easily connect from App
Engine, Compute Engine, Google Kubernetes Engine, and your workstation. Open up analytics
possibilities by using BigQuery to directly query your Cloud SQL databases.
Reliable
Easily configure replication and backups to protect your data. Go further by enabling
automatic failover to make your database highly available. Your data is automatically
encrypted, and Cloud SQL is SSAE 16, ISO 27001, and PCI DSS compliant and supports HIPAA
compliance.
Easy migrations to Cloud SQL
Database Migration Service (DMS) makes it easy to migrate your production databases to
Cloud SQL with minimal downtime. This serverless offering eliminates the hassle of manually
provisioning, managing, and monitoring migration-specific resources. DMS leverages the
native replication capabilities of MySQL and PostgreSQL to maximize the fidelity and
reliability of your migration. And it’s available at no additional charge for native like-to-like
migrations to Cloud SQL.
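A minimal sketch of accessing a Cloud SQL for MySQL instance from Python (assumptions: the PyMySQL package is installed, the instance is reachable at a public IP, and the host, user, password and database values below are placeholders):

import pymysql

# connect to the Cloud SQL (MySQL) instance
connection = pymysql.connect(
    host="203.0.113.10",       # public IP of the Cloud SQL instance (placeholder)
    user="appuser",
    password="app-password",
    database="appdb"
)

with connection.cursor() as cursor:
    cursor.execute("SELECT VERSION()")   # simple query to verify connectivity
    print(cursor.fetchone())

connection.close()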



Python Web Application Framework:
Python Web framework is a collection of packages or modules that allow developers to write Web
applications or services. With it, developers don’t need to handle low-level details like protocols, sockets
or process/thread management.
Python web framework will help you with:
• Interpreting requests (getting form parameters, handling cookies and sessions,..)
• Producing responses (presenting data as HTML or in other formats,..)
• Storing data persistently (and other things)
Now, let’s look at the most useful and famous Python web frameworks to help you with web development.

Python Full-Stack Frameworks


A full-stack framework in Python is one which attempts to provide a complete solution for applications.
It attempts to supply components for each layer in the stack.

1. Django:
Django Python is a framework for perfectionists with deadlines. With it, you can build better Web apps in
much less time, and with less code. Django is known for its focus on automation. It also follows the DRY (Don’t Repeat Yourself) principle.
Django was originally developed for content-management systems, but is now used for many kinds of
web applications. This is because of its templating, automatic database generation, DB access layer,
and automatic admin interface generation. It also provides a web server for development use.
Giant companies that use Django Python are- Instagram, Pinterest, Disqus, Mozilla, The Washington
Times, and Bitbucket. In fact, when we think of the terms ‘framework’ and ‘Python’, the first thing that
comes to our minds is Django.
We will see more on Django in another lesson.



• Below is an architecture diagram for Django's MVT (Model-View-Template) pattern, in which the Model handles the data, the View contains the request-processing logic, and the Template handles presentation.

Django Architecture
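A minimal sketch of a Django view in the MVT pattern (illustrative; the Book model, books.html template and function name are assumptions, not part of the framework itself):

# views.py - the View queries the Model and renders a Template
from django.shortcuts import render
from .models import Book          # hypothetical model defined in models.py

def book_list(request):
    books = Book.objects.all()    # Model: fetch data from the database
    # Template: books.html presents the data passed in the context
    return render(request, 'books.html', {'books': books})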



2. Pyramid:
Pyramid is another popular full-stack framework to build complex web apps with ease.
It provides routing, renderers, and command-line tools for bootstrapping a project, but offers you the
ability to choose your database layer, templating system, etc. It helps in URL generation, templating, and
asset specifications, quality measurement, security management, comprehensive documentation, and
HTML structure generation.
Pyramid has active community support. Its ability to work equally well with small and full-scale
applications is one of its advantages. It also gives extensive testing support and offers flexible tools for
development.
3. Web2Py:
Web2py is an open-source full-stack framework that allows you to develop scalable, secure, and portable web apps quickly and easily. It has its own web-based IDE, debugger, and deployment controls. One can deploy, debug, test, administer the database, and maintain applications using this framework.
It has huge community support and it is backward compatible. Some of the other features are
built-in data protection, modular MVC architecture support, data security, and role-based access control.

Cloud Application Development in Python:
Design Approaches:
Here we will learn about application design approaches for the Infrastructure-as-a-Service (IaaS) and Platform-as-a-Service (PaaS) cloud service models.

8.1.1 Design methodology for IaaS service model

Traditional application design approaches such as Service-Oriented Architecture (SOA) use


component-based designs. The components in these approaches can span multiple tiers (such
as web, application and database tiers), which makes it difficult to map them to multi-tier
cloud architectures.
Here we will learn about the Cloud Component Model (CCM) approach for designing
cloud applications, which is a more recent approach that classifies the components based on
the types of functions performed and types of cloud resources used.
The CCM design approach is suited for applications that use the Infrastructure-as-a-
Service (IaaS) cloud service model.
With the IaaS model the developers get the flexibility to map the CCM architectures to
cloud deployments. Virtual machines for various tiers (such as web, application and database
tiers) can be provisioned and auto scaling options for each tier can be defined.
Figure 8.1 shows the steps involved in a CCM based application design approach.
In the first step, the building blocks (components) of the application are identified and grouped based on the type of functions performed and the type of cloud resources used.
In the second step, the interactions between the application components are defined. The CCM approach is based on loosely coupled and stateless designs.
Loosely coupled design:
In a loosely coupled design, components are independent, and changes in one will not affect the operation of others. This approach offers optimal flexibility and re-usability when components are added, replaced, or modified.
Stateless design: Transmission via stateless protocols means that data is transmitted without any information about the sender or receiver being retained by either. Thus, both are unaware of the “state” of the other.

Messaging queues are used for asynchronous communication. CCM components


expose functional interfaces (such as REST APIs for loose coupling) and performance
interfaces (for reporting the performance of components). An external status database is used
for storing the state.
In the third step, the application components are mapped to specific cloud resources (such as
web servers, application servers, database servers, etc.)
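A minimal sketch of the asynchronous communication between loosely coupled components mentioned above (illustrative; Python's standard queue module stands in here for a cloud messaging service such as Amazon SQS):

import queue
import threading

tasks = queue.Queue()              # stands in for a cloud message queue

def web_component():
    # the producer puts a message on the queue and returns immediately
    tasks.put({'image': 'photo.jpg', 'filter': 'BLUR'})

def worker_component():
    # the consumer processes messages independently of the producer
    job = tasks.get()
    print("Processing", job['image'], "with filter", job['filter'])
    tasks.task_done()

web_component()
threading.Thread(target=worker_component).start()
tasks.join()                       # wait until the worker has finished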

The benefits of CCM approach are as follows:


➢ Improved Application Performance: CCM uses loosely coupled components that
communicate asynchronously. Designing an application with loosely coupled components makes it possible to scale up (or scale out) the application components that limit the performance.
➢ Savings in Design, Testing & Maintenance Time: The CCM methodology achieves
savings in application design, testing and maintenance time. Savings in design time
come from use of standard components. Savings in testing time come from the use of
loosely coupled components which can be tested independently.
➢ Reduced Application Cost: Applications designed with CCM can leverage both vertical
and horizontal scaling options for improving the application performance. Both types of
scaling options involve additional costs for provisioning servers with higher computing
capacity or launching additional servers. Costs for cloud resources can be reduced by
identifying the application components which limit the performance and scaling up (or scaling out) cloud resources for only those components. This is not possible for applications which have tightly coupled and hard-wired systems.
➢ Reduced Complexity: A simplified deployment architecture is easier to design and manage. Therefore, depending on application performance and cost requirements, it may be more beneficial to scale vertically instead of horizontally. For example, if an equivalent amount of performance can be obtained at a more cost-effective rate, then deployment architectures can be simplified by using a small number of large server instances (vertical scaling) rather than a large number of small server instances (horizontal scaling).

8.1.2 Design methodology for PaaS service model:

For applications that use the Platform-as-a-service (PaaS) cloud service model, the
architecture and deployment design steps shown in Figure 8.1 are not required since the
platform takes care of the architecture and deployment.
In the component design step, the developers have to take into consideration the
platform specific features. For example, applications designed with Google App Engine (GAE)
can leverage the GAE Image Manipulation service for image processing tasks.
Different PaaS offerings such as Google App Engine, Windows Azure Web Sites, etc.,
provide platform specific software development kits (SDKs) for developing cloud applications.
Applications designed for specific PaaS offerings run in sandbox environments and are
allowed to perform only those actions that do not interfere with the performance of other
applications. The deployment and scaling are handled by the platform while the developers focus on application development using the platform-specific SDKs. Portability is a major constraint for PaaS-based applications, as it is difficult to move an application from one cloud vendor to another due to the use of vendor-specific APIs and PaaS SDKs.

8.2 Image Processing App:


Here we will learn how to develop a cloud-based Image Processing application.
This application provides online image filtering capability. Users can upload image files and
choose the filters to apply. The selected filters are applied to the image and the processed
image can then be downloaded.



Filtering is a technique for modifying or enhancing an image. For example, you can filter an
image to emphasize certain features or remove other features. Image processing operations
implemented with filtering include smoothing, sharpening, and edge enhancement.

Figure 8.2 shows the component design step for the image processing app. In this step we
identify the application components and group them based on the type of functions performed
and type of resources required. The web tier for the image processing app has front ends for
image submission and displaying processed images. The application tier has components for
processing the image submission requests, processing the submitted image and processing
requests for displaying the results. The storage tier comprises of the storage for processed
images.
Figure 8.3 shows the architecture design step which defines the interactions between the
application components. This application uses the Django framework; therefore, the web tier
components map to the Django templates and the application tier components map to the
Django views. A cloud storage is used for the storage tier. For each component, the
corresponding code box numbers are mentioned.
Figure 8.4 shows the deployment design for the image processing app. This is a multi-tier
architecture comprising of load balancer, application servers and a cloud storage for processed
images. For each resource in the deployment the corresponding Amazon Web Services (AWS)
cloud service is mentioned.

Box 8.4 shows the source code of the Django View of the Image Processing app. The
function home( ) in the view renders the image submission page. This function checks if the
request method is POST or not.
If the request method is not POST then the file upload form is rendered in the template.
Whereas, if the request is POST, the file selected by the user is uploaded to the media directory
(specified in the Django project settings). The selected filter is then applied on the uploaded file
in the applyfilter( ) function. This example uses the Python Imaging Library (PIL) for image
filtering operations.
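A minimal sketch along the lines of the view described above (illustrative, not the textbook's exact Box 8.4 code; the form field names, the /tmp/ path and the applyfilter() logic are assumptions):

from django.shortcuts import render
from PIL import Image, ImageFilter

def applyfilter(path, filtername):
    # apply the selected PIL filter and overwrite the uploaded image
    img = Image.open(path)
    if filtername == 'BLUR':
        img = img.filter(ImageFilter.BLUR)
    elif filtername == 'SHARPEN':
        img = img.filter(ImageFilter.SHARPEN)
    img.save(path)

def home(request):
    if request.method == 'POST':
        upload = request.FILES['file']           # uploaded image file
        filtername = request.POST['filter']      # filter chosen by the user
        path = '/tmp/' + upload.name             # stands in for the media directory
        with open(path, 'wb') as f:
            for chunk in upload.chunks():
                f.write(chunk)
        applyfilter(path, filtername)
        return render(request, 'result.html', {'image': path})
    return render(request, 'home.html')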

Document Storage App:
This topic will explain, how to develop a cloud-based document storage (Cloud Drive) application. This
application allows users to store documents on a cloud-based storage.
Figure 8.9 shows the component design step for the Cloud Drive App. In this step we identify the
application components and group them based on the type of functions performed and type of
resources required.
1) The web tier for the Cloud Drive app has front ends for uploading files, viewing/deleting files
and user profile.
2) The application tier has components for processing requests for uploading files, processing
requests for viewing/deleting files and the component that handles the registration, profile and
login functions.
3) The database tier comprises of a user credentials database.
4) The storage tier comprises of the storage for files.
Figure 8.10 shows the architecture design step which defines the interactions between the application
components. This application uses the Django framework; therefore, the web tier components map to the
Django templates and the application tier components map to the Django views. A MySQL database is
used for the database tier and a cloud storage is used for the storage tier. For each component, the
corresponding code box numbers are mentioned.
Figure 8.11 shows the deployment design for the Cloud Drive app. This is a multi-tier deployment
comprising of load balancer, application servers, cloud storage for storing documents and a database
server for storing user credentials. For each resource in the reference architecture the corresponding
Amazon Web Services (AWS) cloud service is mentioned.
Figure 8.12 shows a screenshot of the user registration page and
Figure 8.13 shows a screenshot of the login page. The Cloud Drive app uses Django's built-in user authentication system. Box 8.6 shows the source code for the Django template for the registration page.
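A minimal sketch of how an application-tier component could store an uploaded document in the cloud storage tier (assumptions: boto3 is installed, AWS credentials are configured, and the bucket name clouddrive-documents is a placeholder):

import boto3

s3 = boto3.client('s3')

def store_document(local_path, filename):
    # upload the file to the S3 bucket used as the storage tier
    s3.upload_file(local_path, 'clouddrive-documents', filename)
    # return a time-limited download link for viewing the file
    return s3.generate_presigned_url(
        'get_object',
        Params={'Bucket': 'clouddrive-documents', 'Key': filename},
        ExpiresIn=3600
    )

url = store_document('/tmp/report.pdf', 'report.pdf')
print(url)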

UNIT-4



Big Data Analytics: Introduction, Clustering Big Data, Classification of Big data, Recommendation of
Systems. Multimedia Cloud: Introduction, Case Study: Live video Streaming App, Streaming Protocols,
case Study: Video Transcoding App. Cloud Application Benchmarking and Tuning: Introduction,
Workload Characteristics, Application Performance Metrics, Design Considerations for a Benchmarking
Methodology, Benchmarking Tools, Deployment Prototyping, Load Testing & Bottleneck Detection case
Study, Hadoop benchmarking case Study.
Big Data Analytics Introduction:
The volume of data has exploded to unimaginable levels in the past decade, and at the same time, the price of data storage has systematically reduced. Private companies and research institutions capture terabytes of data about their users’ interactions from business, industrial, healthcare and social media systems, and also from sensors in devices such as mobile phones and automobiles. The challenge of this era is to make sense of this sea of data. This is where big data analytics comes into the picture.
Big Data Analytics largely involves collecting data from different sources, transforming it so that it becomes available to be consumed by analysts, and finally delivering data products useful to the organization's business.
The process of converting large amounts of unstructured raw data, retrieved from different
sources to a data product useful for organizations forms the core of Big Data Analytics.
Some examples:
➢ Data generated by social networks including text, images, audio and video data.
➢ Click-stream data generated by web applications such as e-Commerce to analyze user behavior.
➢ Machine sensor data collected from sensors embedded in industrial and energy systems for
monitoring their health and detecting failures.
➢ Healthcare data collected in electronic health record (EHR) systems.
➢ Logs generated by web applications
➢ Stock markets data
Characteristics of Big Data:
Velocity
Velocity refers to the speed of the data processing. High velocity is essential for the performance of any
big data process. This consists of the rate of change, activity bursts, and the linking of incoming data
sets.
Value
Value refers to the benefits that the organization derives from the data. Does it match the organization's goals? Does it help the organization enhance itself? It is among the most important characteristics of big data.
Volume
Volume refers to the quantity of the data that you have. We measure the volume of our data
in Gigabytes, Zettabytes (ZB), and Yottabytes (YB). According to industry trends, the volume of data
will rise substantially in the coming years.
Variety
Variety refers to the various types of big data. It is one of the most significant problems facing the big
data industry as it affects performance. It is vital to manage the variety of your data correctly by
organizing it. Variety is the various types of data you collect from different types of sources.
Veracity
Veracity refers to the accuracy of the data. It is one of the most critical characteristics of Big Data since
low veracity can significantly damage the accuracy of its results.
Validity
Validity refers to how valid and relevant the data is to be used for the intended purpose.
Volatility
Big data is constantly changing. The data you collected from a source a day ago may be different from what you find today. This is called the volatility of the data, and it affects the homogenization of the data.
Visualization
Visualization refers to displaying your insights generated by big data through visual representations such
as charts and graphs. It has recently become predominant as big data professionals regularly share their
insights with non-technical audiences.
Clustering Big Data:
Clustering is the process of grouping similar data items together. This means that data items that are more similar to each other (based on some similarity criteria) than to other data items are put in one cluster.
Clustering of big data happens in applications such as:
➢ Clustering social network data to find a group of similar users.
➢ Clustering electronic health record (EHR) data to find similar patients.
➢ Clustering sensor data to group similar or related faults in a machine.
➢ Clustering market research data to group similar customers.
➢ Clustering clickstream data to group similar users.
Clustering is achieved by clustering algorithms that belong to a broad category of algorithms called unsupervised machine learning. Unsupervised machine learning algorithms find patterns and hidden structure in data for which no training data is available.
Some popular clustering algorithms used for big data are listed below; a minimal K-means example follows the list.
1) K-means clustering
2) DBSCAN clustering
3) Hierarchical clustering algorithms.
etc.
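A minimal K-means sketch (assumptions: the scikit-learn and NumPy packages are installed; the six points below are a tiny illustrative sample, not big data):

from sklearn.cluster import KMeans
import numpy as np

# six 2-D points forming two obvious groups
X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
kmeans.fit(X)

print(kmeans.labels_)            # cluster assigned to each point
print(kmeans.cluster_centers_)   # coordinates of the two cluster centres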

Multimedia Cloud Introduction:


This topic introduces the principal concepts of multimedia cloud computing and presents a
novel framework. We address multimedia cloud computing from two perspectives: the multimedia-aware cloud (media cloud) and cloud-aware multimedia (cloud media).



First, we present a multimedia-aware cloud, which addresses how a cloud can perform distributed multimedia processing and storage and provide quality of service (QoS) provisioning for multimedia services.
To achieve a high QoS for multimedia services, we propose a Media-Edge Cloud (MEC) architecture in which storage, Central Processing Unit (CPU) and Graphics Processing Unit (GPU) clusters are placed at the edge to provide distributed parallel processing and QoS adaptation for various types of devices.
Then, multimedia services such as storage, sharing, authoring, adaptation, delivery, rendering and retrieval can optimally utilize cloud-computing resources to achieve a better Quality of Experience (QoE).

With the development of Web 2.0 and beyond, and through the increasing reach of high-speed internet to wireless devices, multimedia-rich web applications have become widely popular in recent years.
There are various types of multimedia web applications, including multimedia storage, processing, transcoding and streaming applications. Due to the higher resource requirements of multimedia applications, and the increasing demand for multimedia-rich web applications on wireless platforms, cloud computing is proving to be an efficient and cost-effective solution.
A multimedia cloud provides storage, processing, and streaming services to millions of mobile users around the world.

Figure 10.1 shows a reference architecture for a multimedia cloud. In this architecture:
The first layer is the infrastructure services layer, which includes computing and storage resources.
On top of the infrastructure services layer is the platform services layer; it includes frameworks and services for streaming and associated tasks such as transcoding and analytics for the development of multimedia applications.
The topmost layer is the applications layer, with applications such as live video streaming, video transcoding, video-on-demand, multimedia processing, etc.
Cloud-based multimedia applications alleviate the burden of installing and maintaining multimedia applications locally on the devices (desktops, tablets, smartphones, etc.) and provide access to rich multimedia content.
A multimedia cloud can have various service models such as IaaS, PaaS, and SaaS, which offer infrastructure, platform or application services as shown in the figure above.

Case Study: Live Video Streaming App:


Here we will see how to develop and deploy multimedia applications on the cloud. When it comes to live video streaming, video streaming applications have become very popular in recent years, with more and more users watching events broadcast live on the internet.

Live streamed events can be viewed by audiences around the world on different types of devices connected to the internet, such as laptops, desktops, tablets, smartphones, internet-TVs, etc. Streaming technology makes it possible to reach a much wider audience across a larger geographic area.

The above Figure 10.3 shows a screenshot of a live video streaming demo. This application allows on-demand creation of video streaming instances in the cloud.
The first step in the stream instance creation workflow is to specify the details of the stream.
Figures 10.4 and 10.5 show the second and third steps, in which an instance size is selected and then the instance is launched.



The live streaming application is created using the Django framework and uses Amazon EC2
cloud instances. For video stream encoding and publishing, the Adobe Flash Media Live Encoder and
Flash Media Server are used. This template uses the JQuery smart wizard to get user inputs for the
streaming instance. The form inputs are processed in a Django view.
After the user completes the stream instance creation wizard, an instance of Flash Media Server
is created in the Amazon cloud. Figure 10.6 shows the settings to use with the Flash Media Encoder
application on the client. This page also provides a link to the page in which the live stream can be
viewed.
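A minimal sketch of launching such a streaming server instance from Python with boto3 (illustrative; the AMI ID, key pair name and instance type are placeholders, and AWS credentials are assumed to be configured):

import boto3

ec2 = boto3.resource('ec2', region_name='us-east-1')

# launch one instance from an AMI that has the media server pre-installed
instances = ec2.create_instances(
    ImageId='ami-0123456789abcdef0',   # placeholder AMI ID
    InstanceType='t2.medium',          # instance size selected in the wizard
    KeyName='streaming-key',           # placeholder key pair
    MinCount=1,
    MaxCount=1
)
print("Launched instance:", instances[0].id)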

The Flash Media Server (FMS) URL and stream details are provided on the stream details page and are then entered in the Flash Media Encoder (FME) application. The client gets the video and audio feed from a camera and microphone, or gets a multiplexed feed from video/audio mixers. The video and audio formats and bit rates can be specified in FME.
After all settings are complete, streaming can be started by clicking on the start button. In this example, the RTMP protocol (Real-Time Messaging Protocol) is used for streaming.
Example program: refer to page 337 of the provided PDF file.



Streaming Protocols
As we know, ‘streaming’ is an act or instance of flowing. Here we are concerned with streaming media, which includes video and audio. With streaming media, the end user does not have to wait for a file to download completely before playing it. Instead, the media is sent in a continuous stream of data and is played as it arrives. The user needs a player, which is a special program that uncompresses the data and sends video data to the display and audio data to the speakers.
There are different types of streaming media systems, which include servers (e.g., Wowza Media Server, Windows Media Services, etc.) and clients (e.g., VLC Media Player, MPlayer, etc.). These streaming media systems are governed by certain streaming protocols.
Different types of Streaming protocols are:
1) RTMP (Real Time Messaging Protocol)
2) RTSP (Real Time Streaming Protocol)
3) HTTP Streaming Protocols
• HLS
• MPEG-DASH
• HDS
• Smooth Streaming
RTMP (Real Time Messaging Protocol)
Real Time Messaging Protocol (RTMP) was initially a proprietary protocol developed by
Macromedia for streaming audio, video and data over the Internet, between a Flash player and a server.
RTMP provides a bidirectional message multiplex service over a reliable stream transport, such
as TCP, intended to carry parallel streams of video, audio, and data messages, with associated timing
information, between a pair of communicating peers. Implementations typically assign different priorities
to different classes of messages, which can affect the order in which messages are enqueued to the underlying stream transport when transport capacity is constrained.
RTMP delivery of Flash Video is provided by licensed server software from Adobe, notably
Flash Media Server (FMS). FMS is installed on a networked server and manages streaming flash video
separately from the Web server hosting the flash movie (SWF (Shock Wave Flash)) and other HTML
content. Licensing FMS for high volume Web sites is very expensive. But there are affordable shared
hosting FMS services. Using a Flash Video Streaming Service (FVSS) provider is also an option for
high volume Flash Video deployment.

RTSP (Real Time Streaming Protocol)


The Real Time Streaming Protocol (RTSP) is a network control protocol designed for use in
entertainment and communications systems to control streaming media servers. The protocol is used for
establishing and controlling media sessions between end points. Clients of media servers issue VCR-like
commands, such as play and pause, to facilitate real-time control of playback of media files from the
server.
Real-time streaming protocol uses a combination of protocols such as TCP (connection based
protocol), UDP (connectionless protocol), and RTP to achieve various functions by maintaining
session/state between server and client through an identifier. In other words, the RTSP server and client
can send requests simultaneously by choosing the appropriate delivery mechanism, an advantage over
other protocol types.
The session begins with “Setup” from the client or already defined transport information that
indicates the server to allocate resource for data stream,”Play” where the data is transmitted according to
the request from client, “Pause” in which the streaming is temporarily disabled without actually
disconnecting the server, “Record” where the streaming data is recorded by the client as per the time-
stamp carrying the information of start and end time and “Close” where the resources are freed and the
client-server session comes to an end.
RTSP delivers continuous streams of requested data without actually storing it on the hard disk.
This is not recommended for those who do not want to compromise on video quality. But this will be
highly advantageous in many instances such as a conference which could be viewable to many people
at once regardless of location.



HTTP Streaming Protocols
Most of the common protocols available today support streaming over HTTP. In fact, these
protocols are just layers over HTTP, to make streaming video over HTTP work, so the end user does not
have to do anything special.

HLS (HTTP Live Streaming)


HTTP Live Streaming provides a reliable, cost-effective means of delivering continuous and
long-form video over the Internet. It allows a receiver to adapt the bit rate of the media to the current
network conditions in order to maintain uninterrupted playback at the best possible quality (adaptive
streaming). It supports interstitial content boundaries. It provides a flexible framework for media
encryption. It can efficiently offer multiple versions of the same content, such as audio translations.
It offers compatibility with large-scale HTTP caching infrastructure to support delivery to large
audiences.
HLS is native to popular iOS devices (such as the iPhone), whose users are accustomed to paying for apps and other services. HLS is not supported natively on Windows OS platforms.

MPEG-DASH:
Dynamic Adaptive Streaming over HTTP (DASH), also known as MPEG-DASH is a
developing ISO Standard. As the name suggests, DASH is a standard for adaptive streaming over HTTP
that has the potential to replace existing proprietary technologies like Microsoft Smooth Streaming,
Adobe Dynamic Streaming, and Apple HTTP Live Streaming (HLS). A unified standard
would be a boon to content publishers, who could produce one set of files that play on all DASH-compatible devices (iPad, iPhone, Nexus 7, Amazon Fire tablets, etc.).
MPEG-DASH works by breaking the content into a sequence of small HTTP-based file
segments, each segment containing a short interval of playback time of content that is potentially many
hours in duration, such as a movie or the live broadcast of a sports event. The content is made available
at a variety of different bit rates, i.e., alternative segments encoded at different bit rates covering aligned
short intervals of play back time are made available. While the content is being played back by an MPEG-
DASH client, the client automatically selects from the alternatives the next segment to download and
play back based on current network conditions. The client selects the segment with the highest bit rate
possible that can be downloaded in time for play back without causing stalls or re-buffering events in the
playback. Thus, an MPEG-DASH client can seamlessly adapt to changing network conditions, and
provide high quality play back with fewer stalls or re-buffering events.

HDS:
HDS, or HTTP Dynamic Streaming, is Adobe’s method for adaptive bitrate streaming of Flash Video. This method of streaming enables on-demand and live adaptive bitrate video delivery of MP4 media over regular HTTP connections. When recording is required, nDVR has to be implemented (for example, in Wowza) so that the live stream can be recorded on the go. HDS is mainly used in live applications where nDVR is implemented; chunks are not missed while recording is in progress, so it is highly useful in live streaming.
[Adaptive bitrate streaming adjusts video quality based on network conditions to improve
video streaming over HTTP networks. This process makes playback as smooth as possible
for viewers regardless of their device, location, or Internet speed.]
Smooth Streaming:
Smooth Streaming is introduced by Microsoft. It is an IIS (Internet Information Services)
Media Services extension that enables adaptive streaming of media to clients (which include
Silverlight) over HTTP. Smooth Streaming uses the simple but powerful concept of delivering small
content fragments (typically 2 seconds worth of video) and verifying that each has arrived within the
appropriate time and played back at the expected quality level. If one fragment doesn’t meet these
requirements, the next fragment delivered will be at a somewhat lower quality level. Conversely, when
conditions allow it, the quality of subsequent fragments will be at a higher level.
Here is an overview of the recognized file extensions and mime types for these protocols, plus
their browser playback support:



Case Study: Video Transcoding App:
Here we will see how to develop a video transcoding application based on a multimedia cloud. The demo application is built upon the Amazon Elastic Transcoder. Elastic Transcoder is a highly scalable, relatively easy-to-use service from Amazon. It allows converting video files from their source format into formats playable on devices such as smartphones, tablets and PCs.
Here are the steps involved in building a video transcoding application using Python and Django. The transcoding application allows users to upload video files and choose the conversion pre-sets. Figures 10.9, 10.10 and 10.11 show screenshots of the video file submission wizard.

The source code for the Django template for the home page is provided below. This template uses the jQuery smart wizard to get user inputs. The form inputs are processed in a Django view, described next.



Figure 10.12 shows a screenshot of the video transcoding app after the video file is submitted by the user. The video files submitted for transcoding are uploaded to an Amazon S3 bucket and a new transcoding job is then created. The user can then view the job status and obtain the download link for the transcoded video from the job status page. A minimal sketch of creating such a job from Python is shown below.
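A minimal sketch of creating an Elastic Transcoder job with boto3 (assumptions: boto3 is installed, AWS credentials are configured, and the pipeline ID, S3 keys and preset ID below are placeholders):

import boto3

transcoder = boto3.client('elastictranscoder', region_name='us-east-1')

# create a transcoding job for a video already uploaded to the
# pipeline's input S3 bucket
response = transcoder.create_job(
    PipelineId='1111111111111-abcd11',            # placeholder pipeline ID
    Input={'Key': 'uploads/source-video.mp4'},
    Output={
        'Key': 'transcoded/output-video.mp4',
        'PresetId': '1351620000001-000010'        # example preset ID (placeholder)
    }
)

print(response['Job']['Id'], response['Job']['Status'])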
