CC-Notes 19-09-22
The cloud environment provides an easily accessible online portal that makes it convenient for users
to manage compute, storage, network, and application resources. Some cloud service providers are
shown in the following figure.
In short form:
Infrastructure-as-a-Service (IaaS)
IaaS provides access to fundamental resources such as physical machines, virtual machines,
virtual storage, etc.
1. Resources Pooling
Resource pooling is one of the essential features of cloud computing. Resource pooling means that a
cloud service provider can share resources among multiple clients, providing each with a different set of
services according to their needs. It is a multi-client strategy that can be applied to data storage,
processing, and bandwidth-delivered services. The process of allocating resources in real time does not
conflict with the client's experience.
2. On-Demand Self-Service
This is one of the important and essential features of cloud computing. It enables the client to
continuously monitor server uptime, capabilities, and allocated network storage. A customer can also
control the computing capabilities according to their needs.
3. Easy Maintenance
This is one of the best cloud features. Servers are easily maintained, and downtime is minimal or
sometimes zero. Cloud-powered resources often undergo several updates to optimize their
capabilities and potential. The updates are compatible with devices and perform faster than previous
versions.
4. Scalability and Rapid Elasticity
A key feature and advantage of cloud computing is its rapid scalability. This cloud feature enables cost-
effective handling of workloads that require a large number of servers but only for a short period. Many
customers have workloads that can be run very cost-effectively due to the rapid scalability of cloud
computing.
5. Economical
This cloud feature helps in reducing the IT expenditure of organizations. In cloud computing, clients
need to pay the administration only for the space used by them. There are no hidden or additional
charges to be paid. Administration is economical, and more often than not, some space is allocated for
free.
6. Measured and Reporting Service
Reporting Services is one of the many cloud features that make it the best choice for organizations. The
measurement and reporting service is helpful for both cloud providers and their customers. This enables
both the provider and the customer to monitor and report which services have been used and for what
purposes. It helps in monitoring billing and ensuring optimum utilization of resources.
Step 2: It will ask you to log in with your Microsoft account. If you already have a Microsoft account,
you can fill in the details and log in. If you don't have one, you must sign up first to proceed further.
Step 3: After logging in to your Microsoft account, you will be redirected to the next page, as shown
below. Here you need to fill in the required fields; you will be asked for your credit card number to
verify your identity and to keep out spam and bots. You won't be charged unless you upgrade to paid
services.
Government:
The U.S. military and government were early adopters of cloud computing. Their
cloud incorporates social, mobile, and analytics technologies. However, they must adhere to
strict compliance and security measures (FIPS, FISMA, and FedRAMP). This protects
against cyber threats both domestically and abroad.
Big Data Analytics:
Cloud computing helps data scientists analyze various data patterns and insights for better
predictions and decision making. There are many open-source big data development and
analytics tools available, like Cassandra, Hadoop, etc., for this purpose.
Communication:
Cloud computing provides network-based access to communication tools like emails
and social media. WhatsApp also uses a cloud-based infrastructure to facilitate user
communications. All the information is stored in the service provider’s hardware.
Business Process:
Nowadays, many business processes like email, ERP, CRM, and document
management have become cloud-based services. SaaS has become the most vital delivery
method for enterprises. Some examples of SaaS include Salesforce and HubSpot.
Facebook, Dropbox, and Gmail:
Cloud computing can be used for the storage of files. It helps you automatically
synchronize the files from different devices like desktop, tablet, mobile, etc. Dropbox allows
users to store and access files up to 2 GB for free. It also provides an easy backup feature.
Social Networking platforms like Facebook demand powerful hosting to manage and
store data in real-time. Cloud-based communication provides click-to-call facilities from
social networking sites and access to the instant messaging system.
Citizen Services: Cloud technology can be used for handling citizen services too. It is
widely used for storing, managing, and updating citizen details and acknowledging forms;
even verifying the current status of applications can be performed with the help of cloud
computing.
Google Compute Engine (GCE) is an IaaS offering from Google. GCE provides virtual machines of
various computing capacities, ranging from small instances (e.g., 1 virtual core with 1.38 GCE units and
1.7 GB memory) to high-memory machine types (e.g., 8 virtual cores with 22 GCE units and 52 GB
memory).
Windows Azure Virtual Machines is an IaaS offering from Microsoft. Azure VMs provide virtual
machines of various computing capacities, ranging from small instances (1 virtual core with 1.75 GB
memory) to memory-intensive machine types (8 virtual cores with 56 GB memory).
Google App Engine (GAE) is a Platform-as-a-Service (PaaS) offering from Google. GAE™ is a cloud-
based web service for hosting web applications and storing data. GAE allows users to build scalable and
reliable applications that run on the same systems that power Google’s own applications. GAE provides
a software development kit (SDK) for developing web applications software that can be deployed on
GAE.
Developers can develop and test their applications with GAE SDK on a local machine and then
upload it to GAE with a simple click of a button. Applications hosted in GAE are easy to build, maintain
and scale. Users don't need to worry about launching additional computing instances when the application
load increases. GAE provides seamless scalability by launching additional instances when application
load increases. GAE provides automatic scaling and load balancing capability. GAE supports
applications written in several programming languages.
With GAE's Java runtime environment, developers can build applications using the Java programming
language and standard Java technologies such as Java Servlets. GAE also provides runtime environments
for the Python and Go programming languages.
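For illustration, a minimal GAE web application in Python might look like the following sketch, assuming the webapp2 framework bundled with the GAE Python SDK (the handler and URL mapping here are examples, not from the notes):

import webapp2

class MainPage(webapp2.RequestHandler):
    def get(self):
        # Respond to HTTP GET requests with a plain-text greeting
        self.response.headers['Content-Type'] = 'text/plain'
        self.response.write('Hello, App Engine!')

# Map the root URL to the handler; GAE serves this WSGI application
app = webapp2.WSGIApplication([('/', MainPage)], debug=True)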
The pricing model for GAE is based on the amount of computing resources used. GAE provides
free computing resources for applications up to a certain limit. Beyond that limit, users are billed based
on the amount of computing resources used, such as the amount of bandwidth consumed, the number of
instance hours for front-end and back-end instances, the amount of stored data, channels used, and
recipients emailed.
Salesforce Marketing Cloud is a social marketing SaaS offering. Marketing Cloud allows companies to
identify sales leads from social media, discover advocates, and identify the most trending information on
any topic. Marketing Cloud allows companies to proactively engage with customers, manage social
listening, create and deploy social content, manage and execute optimized social advertisement
campaigns, and track the performance of social campaigns.
******
Cloud Concepts and Technologies:
Here we will learn about…
Virtualization
Load Balancing
Scalability & Elasticity
Deployment
Replication
Monitoring
MapReduce
Identity and Access Management
Service Level Agreements
Billing
Software-Defined Networking
Network Function Virtualization
These are the key concepts and enabling technologies of cloud computing.
Virtualization:
Virtualization refers to the partitioning of the resources of a physical system (such as computing,
storage, network, and memory) into multiple virtual resources. Virtualization is the key enabling
technology of cloud computing and allows pooling of resources. In cloud computing, resources are
pooled to serve multiple users using multi-tenancy (multiple tenants sharing the same physical resources).
The cloud allows multiple users to be served by the same physical hardware. Users are assigned
virtual resources that run on top of the physical resources. The diagram below shows the architecture of
virtualization technology in cloud computing.
The physical resources such as computing, storage, memory, and network resources are virtualized.
The virtualization layer partitions the physical resources into multiple virtual machines. The
virtualization layer allows multiple operating system instances to run concurrently as virtual machines on
the same underlying physical resources.
Type 1 hypervisors or bare-metal hypervisors run directly on the host hardware and monitor the guest
operating systems.
Type 2 hypervisors or hosted hypervisors run on top of a conventional (main/host) operating system and
monitor the guest operating systems.
Guest OS: A guest OS is an operating system that is installed in a virtual machine in addition to
the host or main OS. In virtualization, the guest OS can be different from the host OS.
Various forms of virtualization approaches exist:
➢ Full Virtualization
➢ Para Virtualization
➢ Hardware Virtualization
Full Virtualization:
In full virtualization, the virtualization layer completely decouples (separates) the guest OS from
the underlying hardware. The guest OS requires no modification and is not aware that it is being
virtualized. Full virtualization is enabled by direct execution of user requests and binary translation of
OS requests.
Para Virtualization:
In para virtualization, the guest OS is modified to interact with the hypervisor through special
hypercalls that replace the non-virtualizable instructions; unlike full virtualization, the guest OS is aware
that it is being virtualized.
Hardware Virtualization:
Hardware assisted virtualization is enabled by hardware features such as Intel’s Virtualization
Technology (VT-x) and AMD’s AMD-V. In hardware assisted virtualization, privileged and sensitive
calls are set to automatically trap to the hypervisor. Thus, there is no need for either binary translation
or para-virtualization.
Load Balancing:
One of the important features of cloud computing is scalability. Cloud computing resources can
be scaled up on demand to meet the performance requirements of applications.
Load balancing distributes workloads across multiple servers to meet the application workloads.
The goals of load balancing techniques are to achieve maximum utilization of resources, minimize
response times, and maximize throughput.
Load balancing distributes the incoming user requests across multiple resources. With load
balancing, cloud-based applications can achieve high availability and reliability. Since multiple resources
under a load balancer are used to serve the user requests, in the event of failure of one or more of the
resources, the load balancer can automatically reroute the user traffic to the healthy resources.
The routing of user requests is determined based on a load balancing algorithm. Commonly used
load balancing algorithms are as follows:
Round Robin:
In round robin load balancing, the servers are selected one by one to serve the incoming requests
in a non-hierarchical circular fashion with no priority assigned to a specific server.
Weighted Round Robin:
In weighted round robin load balancing, servers are assigned weights. The incoming
requests are proportionally routed using a static or dynamic ratio of the respective weights (see the
sketch after this list of algorithms).
Low Latency: (Low Delay)
In low latency load balancing the load balancer monitors the latency of each server. Each
incoming request is routed to the server which has the lowest latency.
Least Connections: In least connections load balancing, the incoming requests are routed to the server
with the least number of connections.
Priority:
In priority load balancing, each server is assigned a priority. The incoming traffic is routed to the
highest priority server as long as the server is available. When the highest priority server fails, the
incoming traffic is routed to a server with a lower priority.
Overflow:
Overflow load balancing is similar to priority load balancing. When the incoming requests to
highest priority server overflow, the requests are routed to a lower priority server.
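A minimal sketch of the round robin and weighted round robin selection logic, assuming a hypothetical list of server names and static weights:

import itertools

# Hypothetical server pool (names are placeholders)
servers = ["server1", "server2", "server3"]

# Round robin: servers are selected one by one in a circular fashion
rr = itertools.cycle(servers)

# Weighted round robin: requests are routed in proportion to static weights
weights = {"server1": 3, "server2": 2, "server3": 1}
wrr = itertools.cycle([s for s, w in weights.items() for _ in range(w)])

for request in range(6):
    print("round robin ->", next(rr), "| weighted ->", next(wrr))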
Sticky Sessions:
In this approach, all the requests belonging to a user session are routed to the same server. These
sessions are called sticky sessions. The benefit of this approach is that it makes session management
simple. The drawback is that if a server fails, all the sessions belonging to that server are lost.
A session is a way to store information (in variables) to be used across multiple pages.
Unlike a cookie, the information is not stored on the user's computer.
Session Database:
In this approach, all the session information is stored externally in a separate session database,
which is often replicated (replication means duplication) to avoid a single point of failure. This
approach involves the additional overhead of storing the session information; however, unlike the sticky
session approach, it allows automatic failover.
Browser Cookies:
In this approach, the session information is stored on the client side in the form of browser
cookies. The benefit of this approach is that it makes the session management easy and has the least
amount of overhead for the load balancer.
URL re-writing:
In this approach, a URL re-write engine stores the session information by modifying the URLs
on the client side.
Though this approach avoids overhead on the load balancer, a drawback is that the amount of
session information that can be stored is limited. For applications that require larger amounts of session
information, this approach does not work.
Load balancing can be implemented in software or hardware. Software-based load balancers run
on standard operating systems, and like other cloud resources, load balancers are also virtualized.
Hardware-based load balancers implement load balancing algorithms in Application Specific
Integrated Circuits (ASICs).
Deployment Design:
Host-Based Replication:
Host-based replication runs on standard servers and uses software to transfer data from a local
to a remote location. It uses servers to copy data from one site to another. Host-based replication
is conducted by software that resides on application servers and forwards data changes to another device.
The process is usually file-based and asynchronous: the software traps write input/output (I/O) and then
forwards the changes to the replication targets.
The host acts as the replication control mechanism. An agent is installed on each host that
communicates with the agents on the other hosts.
Monitoring:
Cloud resources can be monitored by monitoring services provided by the cloud service providers.
Monitoring services allow cloud users to collect and analyze the data on various monitoring metrics. The
following diagram shows a generic architecture for cloud monitoring service.
This monitoring service collects data on various system and application metrics from the cloud
computing instances. Monitoring services provide various pre-defined metrics. Users can also define
their custom metrics for monitoring the cloud resources. Users can define various actions based on the
monitoring data.
For example: auto-scaling a cloud deployment when the CPU usage of monitored resources
becomes high. Monitoring services also provide various statistics based on the monitoring data collected.
The following table shows the commonly used monitoring metrics for cloud computing resources.
Monitoring of cloud resources is important because it allows the users to keep track of the health
of applications and services deployed in the cloud.
Example: An organization which has its website hosted in the cloud can monitor the performance
of the website and also the website traffic.
With the monitoring data available at run-time users can make operational decisions such as
scaling up or scaling down cloud resources.
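As a simple illustration, the following sketch monitors CPU usage and triggers a scale-up action when a threshold is crossed. The scale_up( ) function and the threshold are placeholders, not a real cloud API; psutil is a third-party library:

import psutil   # third-party library for reading system metrics

CPU_THRESHOLD = 80.0   # percent; a made-up scaling trigger

def scale_up():
    # Placeholder action: a real deployment would call the cloud
    # provider's API here to launch an additional instance
    print("High CPU detected - requesting an additional instance")

for _ in range(3):                          # a short monitoring loop
    cpu = psutil.cpu_percent(interval=5)    # average CPU over 5 seconds
    print("CPU usage:", cpu, "%")
    if cpu > CPU_THRESHOLD:
        scale_up()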
SDN Architecture
NFV architecture
OAuth Example
OAuth is an open standard for authorization. It allows resource owners to share their private
resources stored on one site with another site without handing out the credentials. In the OAuth model,
an application (which is not the resource owner) requests access to resources controlled by the resource
owner (but hosted by the server). The resource owner grants permission to access the resources in the
form of a token and a matching shared secret.
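A rough sketch of one OAuth 2.0 grant (client credentials) from the client application's side, using the requests library; the endpoint URLs and credentials below are hypothetical:

import requests

# Hypothetical authorization server and client credentials
TOKEN_URL = "https://auth.example.com/oauth/token"
CLIENT_ID = "my-app-id"
CLIENT_SECRET = "my-app-secret"

# Exchange the client credentials for an access token
resp = requests.post(TOKEN_URL, data={
    "grant_type": "client_credentials",
    "client_id": CLIENT_ID,
    "client_secret": CLIENT_SECRET,
})
token = resp.json()["access_token"]

# Access the protected resource with the token instead of user credentials
resp = requests.get("https://api.example.com/photos",
                    headers={"Authorization": "Bearer " + token})
print(resp.status_code)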
Billing:
Cloud service providers offer a number of billing models, described as follows.
Elastic Pricing: In the elastic pricing or pay-as-you-use model, customers are charged based on their
usage of cloud resources. Cloud computing provides the benefit of provisioning resources on demand.
These on-demand provisioning and elastic pricing models bring cost savings for customers. The elastic
pricing model is suited for customers who consume cloud resources for short durations.
Fixed Pricing: In fixed pricing models, customers are charged a fixed amount per month for the cloud
resources. For example, a fixed amount can be charged per month for running a virtual machine instance,
irrespective of the actual usage. The fixed pricing model is suited for customers who want to use cloud
resources for longer durations.
Spot Pricing: Spot pricing models offer variable pricing for cloud resources, driven by market
demand. When the demand for cloud resources is high, prices increase, and when the demand is lower,
prices decrease.
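The trade-off between elastic and fixed pricing can be illustrated with simple arithmetic, using made-up rates:

# Hypothetical rates for one virtual machine instance
elastic_rate = 0.10      # $ per instance-hour, pay-as-you-use
fixed_monthly = 50.00    # $ per month, irrespective of usage

hours_used = 300         # actual usage in a month

elastic_cost = elastic_rate * hours_used
print("Elastic cost:", elastic_cost)   # 30.0 -> cheaper for short durations
print("Fixed cost:", fixed_monthly)    # 50.0 -> cheaper beyond 500 hours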
*****
Cloud Services and Platforms:
Here we will learn about various types of cloud computing services, including:
1) Compute services,
2) Storage services,
3) Database services,
4) Application services,
5) Content delivery services,
6) Analytics services,
7) Deployment tools,
8) Management services, and
9) Identity & access management services.
For each category of cloud services, examples of services provided by various cloud service
providers including Amazon, Google and Microsoft are described.
*********
Open-Source Private Cloud Software:
Open-source cloud software can be used to build private clouds. Here we will learn about
two popular private cloud platforms: 1) CloudStack and 2) OpenStack.
CloudStack: Apache CloudStack is open-source cloud software that can be used for creating private
cloud offerings. CloudStack manages the network, storage, and compute nodes to provide cloud
infrastructure. A CloudStack installation consists of a Management Server and the cloud infrastructure
that it manages.
The cloud infrastructure can be as simple as one host running the hypervisor or a large cluster of
hundreds of hosts. The management server allows you to configure and manage the cloud resources.
The management server manages one or more zones, where each zone is typically a single
datacenter. Each zone has one or more pods. A pod is a rack of hardware comprising a switch and one
or more clusters. A cluster consists of one or more hosts and a primary storage. A host is a computing
node that runs guest virtual machines. The primary storage of a cluster stores the disk volumes for all the
virtual machines running on the hosts in that cluster. Each zone has a secondary storage that stores
templates, ISO images, and disk volume snapshots.
Eucalyptus:
Eucalyptus is open-source software for building AWS-compatible private clouds. The Node
Controller (NC) hosts the virtual machine instances and manages the virtual network
endpoints. The cluster level (availability zone) consists of three components –
1. Cluster Controller (CC),
2. Storage Controller (SC) and
3. VMWare Broker.
• The CC manages the virtual machines and is the front-end for a cluster.
• The SC manages the Eucalyptus block volumes and snapshots to the instances within its
specific cluster. SC is equivalent to AWS Elastic Block Store (EBS).
• The VMWare Broker is an optional component that provides an AWS-compatible
interface for VMware environments.
At the cloud-level there are two components: 1. Cloud Controller (CLC), 2. Walrus.
CLC provides an administrative interface for cloud management and performs high-level
resource scheduling, system accounting, authentication and quota management.
Walrus is equivalent to Amazon S3 and serves as a persistent storage to all of the virtual machines
in the Eucalyptus cloud.
OpenStack:
OpenStack is a cloud operating system consisting of a collection of interacting services that control
computing, storage, and networking resources. The following diagram shows the architecture of
OpenStack.
***************
*****
UNIT-2
Hadoop & MapReduce:
Apache Hadoop: The Apache™ Hadoop® project develops open-source software for reliable,
scalable, distributed computing.
What is Hadoop?
Hadoop is an open-source framework from Apache used to store, process, and analyze data
that is very huge in volume. Hadoop is written in Java and is not OLAP (Online Analytical
Processing). It is used for batch/offline processing. It is being used by Facebook, Yahoo, Google, Twitter,
LinkedIn, and many more. Moreover, it can be scaled up just by adding nodes to the cluster.
The Hadoop ecosystem consists of a number of projects (modules), as below.
Modules of Hadoop
1. HDFS: Hadoop Distributed File System. Files are broken into blocks and
stored on nodes over the distributed architecture. HDFS is a distributed file system; it runs on large
clusters and provides high-throughput access to data. HDFS was built to reliably store very large
files across machines in a large cluster built of commodity hardware.
HDFS stores each file as a sequence of blocks, all of which are the same size except the
last block. The blocks of each file are replicated on multiple machines in a cluster, with a default
replication factor of 3, to provide fault tolerance.
2. Hadoop YARN: A framework used for job scheduling and cluster resource management.
3. Hadoop MapReduce: This is a framework which helps Java programs to do parallel
computation on data using key-value pairs. The Map task takes input data and converts it into a
data set which can be computed as key-value pairs. The output of the Map task is consumed by the
Reduce task, and the output of the reducer gives the desired result.
4. Hadoop Common: These Java libraries are used to start Hadoop and are used by the other Hadoop
modules.
5. HBase: HBase is an open-source, sorted map datastore built on Hadoop. It is column-oriented and
horizontally scalable. HBase is a non-relational, distributed, column-oriented database that
provides structured data storage for large tables.
HBase is well suited for sparse data sets, which are very common in big data use cases. HBase
provides APIs enabling development in practically any programming language.
6. Zookeeper: Zookeeper is a high-performance distributed coordination service for maintaining
configuration information, naming, providing distributed synchronization and group services.
7. Pig: Pig is a data flow language and an execution environment for analyzing large datasets. Pig
compiler produces a sequence of MapReduce jobs that analyze data in HDFS using the Hadoop
MapReduce framework.
8. Hive: Hive is a distributed data warehouse infrastructure for Hadoop. Hive provides an SQL-like
language called HiveQL. HiveQL allows easy data summarization, ad-hoc querying, and analysis
of large datasets stored in HDFS.
9. Cassandra: Cassandra is a scalable multi-master database with no single point of failure.
Cassandra is designed to handle data at massive scale, spread across many servers, while
providing a highly available service with no single point of failure. Cassandra is a NoSQL
solution that provides a structured key-value store.
10. Flume: Flume is a distributed, reliable, and available service for collecting, aggregating, and
moving large amounts of data from applications to HDFS.
Hadoop Architecture:
The Hadoop architecture is a package of the file system, MapReduce engine and the HDFS
(Hadoop Distributed File System). The MapReduce engine can be MapReduce/MR1 or YARN/MR2.
A Hadoop cluster consists of a single master and multiple slave nodes. The master node includes
Job Tracker, Task Tracker, Name Node, and Data Node whereas the slave node includes Data Node and
Task Tracker.
Advantages of Hadoop:
o Fast: In HDFS the data is distributed over the cluster and mapped, which helps in faster retrieval.
Even the tools to process the data are often on the same servers, thus reducing processing
time. It is able to process terabytes of data in minutes and petabytes in hours.
o Scalable: A Hadoop cluster can be extended by just adding nodes to the cluster.
o Cost Effective: Hadoop is open source and uses commodity hardware to store data, so it is really
cost-effective compared to a traditional relational database management system.
o Resilient to failure: HDFS has the property that it can replicate data over the network, so
if one node is down or some other network failure happens, Hadoop takes the other copy of the
data and uses it. Normally, data is replicated thrice, but the replication factor is configurable.
Hadoop MapReduce Job Execution:
MapReduce is a framework using which we can write applications to process huge amounts of
data, in parallel, on large clusters of commodity hardware in a reliable manner.
What is MapReduce?
MapReduce is a processing technique and a programming model for distributed computing based on
Java. The MapReduce algorithm contains two important tasks, namely Map and Reduce.
Map takes a set of data and converts it into another set of data, where individual elements are
broken down into tuples (key/value pairs).
The Reduce task takes the output from a Map as input and combines those data
tuples into a smaller set of tuples. As the name MapReduce implies, the reduce task is
always performed after the map job.
The major advantage of MapReduce is that it is easy to scale data processing over multiple
computing nodes. Under the MapReduce model, the data processing primitives are called mappers and
reducers. Decomposing a data processing application into mappers and reducers is sometimes nontrivial.
But, once we write an application in the MapReduce form, scaling the application to run over hundreds,
thousands, or even tens of thousands of machines in a cluster is merely a configuration change. This
simple scalability is what has attracted many programmers to use the MapReduce model.
The Algorithm for MapReduce:
• Generally, the MapReduce paradigm is based on sending the computation to where the data resides.
• A MapReduce program executes in three stages, namely the map stage, the shuffle stage, and the reduce stage.
o Map stage − The map or mapper’s job is to process the input data. Generally, the input data is in the
form of file or directory and is stored in the Hadoop file system (HDFS). The input file is passed to
the mapper function line by line. The mapper processes the data and creates several small chunks of
data.
o Reduce stage − This stage is the combination of the Shuffle stage and the Reduce stage. The
Reducer’s job is to process the data that comes from the mapper. After processing, it produces a new
set of output, which will be stored in the HDFS.
• During a MapReduce job, Hadoop sends the Map and Reduce tasks to the appropriate servers in the cluster.
• The framework manages all the details of data-passing such as issuing tasks, verifying task completion,
and copying data around the cluster between the nodes.
• Most of the computing takes place on nodes with data on local disks that reduces the network traffic.
• After completion of the given tasks, the cluster collects and reduces the data to form an appropriate result,
and sends it back to the Hadoop server.
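As a simple illustration of mappers and reducers, the classic word count example can be written as two small Python scripts suitable for Hadoop Streaming, which reads from standard input and writes to standard output (the file names mapper.py and reducer.py are just conventions):

# mapper.py - emits a (word, 1) pair for each word in the input
import sys

for line in sys.stdin:
    for word in line.strip().split():
        print(word + "\t1")

# reducer.py - sums the counts for each word (input arrives sorted by key)
import sys

current_word, current_count = None, 0
for line in sys.stdin:
    word, count = line.strip().split("\t")
    if word == current_word:
        current_count += int(count)
    else:
        if current_word is not None:
            print(current_word + "\t" + str(current_count))
        current_word, current_count = word, int(count)
if current_word is not None:
    print(current_word + "\t" + str(current_count))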
The above diagram shows that a Hadoop cluster comprises a master node, a backup node, and a number
of slave nodes. The master node runs the NameNode and JobTracker processes and the slave nodes run
the DataNode and TaskTracker components of Hadoop. The backup node runs the Secondary NameNode
process. The functions of the main processes of Hadoop are described as follows:
1) NameNode:
The NameNode keeps the directory tree of all files in the file system and tracks where across
the cluster the file data is kept. It does not store the data of these files itself. Client applications
talk to the NameNode whenever they need to locate a file or when they want to add/copy/move/delete
a file.
The NameNode responds to successful requests by returning a list of relevant
DataNode servers where the data lives. The NameNode serves as both directory namespace manager
and 'inode table' for the Hadoop DFS. There is a single NameNode running in any DFS
deployment.
2) Secondary NameNode:
HDFS is not currently a high availability system. The NameNode is a Single Point of
Failure for the HDFS Cluster. When the NameNode goes down, the file system goes offline. An
optional Secondary NameNode which is hosted on a separate machine creates checkpoints of the
namespace.
3) JobTracker:
The JobTracker is the service within Hadoop that distributes MapReduce tasks to specific
nodes in the cluster, ideally the nodes that have the data, or at least are in the same rack.
4) TaskTracker:
TaskTracker is a node in a Hadoop cluster that accepts Map, Reduce and Shuffle tasks
from the JobTracker. Each TaskTracker has a defined number of slots which indicate the number
of tasks that it can accept.
When the JobTracker tries to find a TaskTracker to schedule a map or reduce task it first
looks for an empty slot on the same node that hosts the DataNode containing the data. If an empty
slot is not found on the same node, then the JobTracker looks for an empty slot on a node in the
same rack.
5) DataNode:
A DataNode stores data in an HDFS file system. A functional HDFS file system has more
than one DataNode, with data replicated across them. DataNodes connect to the NameNode on
startup.
DataNodes respond to requests from the NameNode for filesystem operations. Client
applications can talk directly to a DataNode once the NameNode has provided the location of the
data. Similarly, MapReduce operations assigned to TaskTracker instances near a DataNode talk
directly to the DataNode to access the files.
TaskTracker instances can be deployed on the same servers that host DataNode instances,
so that MapReduce operations are performed close to the data.
1) FIFO Scheduler:
The default scheduler in Hadoop is the FIFO scheduler, which maintains a queue of jobs
and schedules them in the order in which they are submitted.
2) Fair Scheduler:
The Fair Scheduler allocates resources evenly between multiple jobs and also provides
capacity guarantees. The Fair Scheduler assigns resources to jobs such that each job gets an equal
share of the available resources on average over time. Unlike the FIFO scheduler, which forms
a queue of jobs, the Fair Scheduler lets short jobs finish in reasonable time without starving long
jobs. Task slots are assigned to new jobs so that each job gets roughly the same amount of
CPU time.
The Fair Scheduler maintains a set of pools into which jobs are placed. Each pool has a
guaranteed capacity. A configuration file is used for specifying the pools and the guaranteed
capacities. When there is a single job running, all the resources are assigned to that job. When
there are multiple jobs in the pools, each pool gets at least as many task slots as guaranteed.
When a pool does not require the guaranteed share the excess capacity is split between
other jobs. With this mechanism resources will be utilized efficiently.
All the pools have equal share by default. It is possible to provide more or less share to
a pool by specifying the share in the configuration file.
3) Capacity Scheduler:
The Capacity Scheduler has similar functionality to the Fair Scheduler but adopts a
different scheduling philosophy. In the Capacity Scheduler, we define a number of named
queues. The Capacity Scheduler gives each queue its capacity when it contains jobs, and shares
any unused capacity between the queues. Within each queue, FIFO scheduling with priority is
used.
Capacity Scheduler allows strict access control on queues. The access controls are
defined on a per-queue basis. Jobs are sorted based on when they are submitted and their
priorities.
Advantage:
• Best for working with Multiple clients or priority jobs in a Hadoop cluster
• Maximizes throughput in the Hadoop cluster
Disadvantage:
• More complex
• Not easy to configure for everyone
Installing Python:
Python is a widely used high-level programming language. To write and execute code in python,
we first need to install Python on our system.
Installing Python on Windows takes a series of few easy steps.
Step 1 − Select Version of Python to Install
Python has various versions available with differences between the syntax and working of
different versions of the language. We need to choose the version which we want to use or need. There
are different versions of Python 2 and Python 3 available.
Step 2 − Download Python Executable Installer
In a web browser, on the official site of Python (www.python.org), move to the Download for
Windows section.
All the available versions of Python will be listed. Select the version you require and click
on Download. Suppose we chose the Python 3.9.1 version.
On clicking download, the various available executable installers will be visible, with different
operating system specifications. Choose the installer which suits your operating system and
download the installer. Suppose we select the Windows installer (64-bit).
The download size is less than 30 MB.
Step 3 − Run Executable Installer
We downloaded the Python 3.9.1 Windows 64-bit installer.
Run the installer. Make sure to select both the checkboxes at the bottom and then click Install Now.
Step 4 − Verify Python is installed on Windows
To ensure Python is successfully installed on your system, follow the given steps −
• Open the command prompt.
• Type 'python' and press Enter.
• The version of Python which you have installed will be displayed if Python was
successfully installed on your Windows system.
Step 5 − Verify Pip was installed
PIP is a powerful package management system (Python Package Manager) for Python software
packages. Thus, make sure that you have it installed.
To verify if pip was installed, follow the given steps −
• Open the command prompt.
• Enter pip -V to check if pip was installed.
• The following output appears if pip is installed successfully.
Numeric:
In Python, numeric data type represents the data which has numeric value. Numeric value can
be integer, floating number or even complex numbers. These values are defined
as int, float and complex class in Python.
• Integers – This value is represented by int class. It contains positive or negative whole
numbers (without fraction or decimal). In Python there is no limit to how long an integer
value can be.
• Float – This value is represented by float class. It is a real number with floating point
representation. It is specified by a decimal point. Optionally, the character e or E followed
by a positive or negative integer may be appended to specify scientific notation.
• Complex Numbers – Complex number is represented by complex class. It is specified
as (real part) + (imaginary part)j. For example – 2+3j
Note – type( ) function is used to determine the type of data type.
Example Python Program:
# Python program to
# demonstrate numeric values
a = 5
print("Type of a: ", type(a))
b = 5.0
print("\nType of b: ", type(b))
c = 2 + 4j
print("\nType of c: ", type(c))
Output:
Type of a: <class 'int'>
Type of b: <class 'float'>
Type of c: <class 'complex'>
Sequence Type:
In Python, a sequence is an ordered collection of similar or different data types. Sequences allow
storing multiple values in an organized and efficient fashion. There are several sequence types in
Python –
• String
• List
• Tuple
1) String
In Python, strings are arrays of bytes representing Unicode characters. A string is a collection
of one or more characters enclosed in single quotes, double quotes, or triple quotes. In Python there is no
character data type; a character is a string of length one. It is represented by the str class.
Creating String
Strings in Python can be created using single quotes or double quotes or even triple quotes.
# Python Program for
# Creation of String
# Creating a String
# With single Quotes
String1 = 'Welcome to Ashoka College'
print("String with the use of Single Quotes: ")
print(String1)
Output:
String with the use of Single Quotes:
Welcome to Ashoka College
List:
Lists are just like the arrays declared in other languages: an ordered collection of data. A list is
very flexible, as the items in a list do not need to be of the same type.
Creating List
Lists in Python can be created by just placing the sequence inside the square brackets [ ].
# Creating a List with
# the use of multiple values
List = ["Ashoka", "Women's", "College"]
print("\nList containing multiple values:")
print(List[0])
print(List[2])
Output:
List containing multiple values:
Ashoka
College
Tuple
Just like a list, a tuple is an ordered collection of Python objects. The only difference
between a tuple and a list is that tuples are immutable, i.e., tuples cannot be modified after
they are created. A tuple is represented by the tuple class.
Creating Tuple
In Python, tuples are created by placing a sequence of values separated by
‘comma’ with or without the use of parentheses for grouping of the data sequence.
Tuples can contain any number of elements and of any datatype (like strings,
integers, list, etc.).
# Creating an empty tuple
Tuple1 = ()
print("Initial empty Tuple: ")
print(Tuple1)

# Creating a Tuple with the use of Strings
Tuple1 = ('apple', 'banana')
print("\nTuple with the use of String: ")
print(Tuple1)

# Creating a Tuple with the use of a List
list1 = [1, 2, 4, 5, 6]
print("\nTuple using List: ")
print(tuple(list1))

Output:
Initial empty Tuple:
()
Tuple with the use of String:
('apple', 'banana')
Tuple using List:
(1, 2, 4, 5, 6)
Type Conversions:
The process of converting the value of one data type (integer, string, float, etc.) to another
data type is called type conversion.
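For example, the built-in functions int( ), float( ), and str( ) perform explicit type conversion:

# Explicit type conversion using built-in functions
x = int("25")        # string to integer -> 25
y = float(5)         # integer to float  -> 5.0
z = str(2.5)         # float to string   -> '2.5'
print(type(x), type(y), type(z))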
Example-2:
#!/usr/bin/env python3
>>> x = 3
>>> if x < 10:
...     print('x below ten')
The example below shows a code block with 3 statements (print). A block is seen by Python as a single
entity, which means that if the condition is true, the whole block is executed (every statement).
#!/usr/bin/env python3
x = 4
if x < 5:
    print("x is smaller than five")
    print("this means it's not equal to five either")
    print("x is an integer")
All programming languages can create blocks, but Python has a unique way of doing it. A block is
defined only by its indentation.
If-Else
You can use if statements to make an interactive program. Copy the program below and run
it.
It has several if statements that are evaluated based on the keyboard input.
Because keyboard input is read as a string, we use the equality sign (==) for string comparison.
The second input is typed as a string, but we need a number. You can convert the string to an
integer using int().
It also makes use of the else keyword; this is the other evaluation case. When comparing
age (age < 5), the else means the opposite (>= 5).
#!/usr/bin/env python3
name = input("Enter your name: ")
if name == "Ashoka":
    print("Hello Ashoka")
age = int(input("Enter your age: "))
if age < 5:
    print("You are younger than five")
else:
    print("You are five or older")
for in Loop: For loops are used for sequential traversal, for example, traversing a list, string, or array.
In Python, there is no C-style for loop, i.e., for (i=0; i<n; i++). There is a "for in" loop, which is
similar to the for-each loop in other languages. Let us learn how to use the for in loop for sequential traversals.
Syntax:
for iterator_var in sequence:
    statement(s)
It can be used to iterate over a range and iterators.
# Python program to illustrate
# iterating over range 0 to n-1
n = 4
for i in range(0, n):
    print(i)
Output :
0
1
2
3
Nested For Loop:
# Python program to illustrate
# nested for loops in Python
for i in range(1, 5):
    for j in range(i):
        print(i, end=' ')
    print()
Output:
1
2 2
3 3 3
4 4 4 4
Python doesn't have a do-while loop, but we can create a program like it.
A do-while loop checks its condition after executing the statement block. It is like a while loop, but its
body is executed at least once.
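One common way to emulate a do-while loop in Python is a while True loop with a break, as in this small sketch:

# Emulating do-while: the body executes at least once
i = 0
while True:
    print(i)      # statement executes before the condition is checked
    i = i + 1
    if i >= 3:    # condition checked after execution
        break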
---------------------------------------------------------------
Function in Python:
In Python, a function is a group of related statements that performs a specific task.
Functions help break our program into smaller and modular chunks. As our program grows larger and
larger, functions make it more organized and manageable.
Furthermore, it avoids repetition and makes the code reusable.
Syntax of Function
def function_name(parameters):
    """docstring"""
    statement(s)
Shown above is a function definition that consists of the following components: the def keyword that
marks the start of the function header, the function name, parameters through which we pass values, an
optional docstring, and one or more statements.
Run the code below, with the function definition, to see the output.
def greet(name):
    """
    This function greets
    the person passed in as
    a parameter
    """
    print("Hello, " + name + ". Good morning!")

greet('Paul')
Note: In python, the function definition should always be present before the
function call. Otherwise, we will get an error.
Python Modules:
A Python module is a file containing Python definitions and statements. A module can define
functions, classes, and variables. A module can also include runnable code. Grouping related code into
a module makes the code easier to understand and use. It also makes the code logically organized.
Example: create a simple module
# A simple module, calc.py
def add(x, y):
    return (x + y)

def subtract(x, y):
    return (x - y)
Import Module in Python – Import statement
We can import the functions, classes defined in a module to another module using the import
statement in some other Python source file.
Syntax:
import module
When the interpreter encounters an import statement, it imports the module if the module is present in
the search path. A search path is a list of directories that the interpreter searches for importing a module.
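For example, the calc module defined above can be imported and used like this:

# Importing the module calc.py
import calc

print(calc.add(10, 2))        # 12
print(calc.subtract(10, 2))   # 8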
A module can be considered as several functionalities that would be executed if you include the
module in your application. You can create your module, save it and use it in another application as well.
Modules have .py extension and can be saved externally, independent of the application.
So basically, a module is a file that has a bunch of functions defined in it that can be imported as
a whole file into any application.
Modules increase the reusability of the code, as well as its scalability. That's why they are
considered essential to programming.
A function is a block of organized, reusable code that is used to perform a single, related
action. There are two types of functions, user-defined and built-in functions. Built-in functions are
provided by python to help in coding like print( ), input( ), etc.
The difference between function vs module in Python is that a function is more specific to a
task, to fulfill a functionality while a module defines classes, functions, attributes, etc.
Create a Module
To create a module just save the code you want in a file with the file extension .py:
Example
Save this code in a file named mymodule.py
def greeting(name):
    print("Hello, " + name)
Use a Module
Now we can use the module we just created, by using the import statement:
Example
Import the module named mymodule, and call the greeting function:
import mymodule
mymodule.greeting("Jonathan")
Note: When using a function from a module, use the syntax: module_name.function_name.
Variables in Module
The module can contain functions, as already described, but also variables of all types (arrays,
dictionaries, objects, etc.). For example, add a dictionary to mymodule.py:
person1 = {"name": "John", "age": 36}
Then import the module and access the variable:
import mymodule
a = mymodule.person1["age"]
print(a)
Naming a Module
You can name the module file whatever you like, but it must have the file extension .py
Re-naming a Module
You can create an alias when you import a module, by using the as keyword:
Example
Create an alias for mymodule called mx:
import mymodule as mx
a = mx.person1["age"]
print(a)
Built-in Modules
There are several built-in modules in Python, which you can import whenever you like.
Example
Import and use the platform module:
import platform
x = platform.system()
print(x)
Packages in Python:
We organize a large number of files in different folders and subfolders based on some criteria,
so that we can find and manage them easily.
In the same way, a package in Python takes the concept of the modular approach to next logical
level. As you know, a module can contain multiple objects, such as classes, functions, etc. A package
can contain one or more relevant modules. Physically, a package is a folder containing one or more
module files.
Let's create a package named mypackage, using the following steps:
• Create a new folder named D:\MyApp.
• Inside MyApp, create a subfolder with the name 'mypackage'.
• Create an empty __init__.py file in the mypackage folder.
• Using a Python-aware editor like IDLE, create modules greet.py and functions.py with the
following code:
File name: greet.py
def SayHello(name):
    print("Hello ", name)
File name: functions.py
def sum(x, y):
    return x + y

def average(x, y):
    return (x + y) / 2

def power(x, y):
    return x ** y
That's it. We have created our package called mypackage. The following is the folder structure:
D:\MyApp>python
Import the functions module from the mypackage package and call its power() function:
>>> from mypackage import functions
>>> functions.power(3, 2)
9
__init__.py
The package folder contains a special file called __init__.py, which stores the package's content. It
serves two purposes:
1. The Python interpreter recognizes a folder as the package if it contains __init__.py file.
2. __init__.py exposes specified resources from its modules to be imported.
An empty __init__.py file makes all functions from the above modules available when this package
is imported. Note that __init__.py is essential for the folder to be recognized by Python as a package.
You can optionally define functions from individual modules to be made available.
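For example, the following __init__.py would expose the package's functions directly; this is a small illustration based on the modules above:

# mypackage/__init__.py
from .greet import SayHello
from .functions import sum, average, power

With this, after import mypackage, the function can be called as mypackage.SayHello("Ashoka").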
Another way to read a file is to request a certain number of characters; in the following code,
the interpreter reads the first five characters of the stored data and returns them as a string:
# Python code to illustrate read() mode character wise
file = open("file.txt", "r")
print (file.read(5))
Creating a file using write() mode
Let's see how to create a file and how write mode works. To manipulate the file, write
the following in your Python environment:
# Python code to create a file and write text into it
file = open("file.txt", "w")
file.write("This is the write command. ")
file.write("It allows us to write to a particular file.")
file.close()
The close() command terminates all the resources in use and frees the system of this particular
program.
Example program:
f=open("poem.txt")
d=f.read()
d=d.replace("the","them")
f.close()
f=open("poem.txt","w")
f.write(d)
f.close()
(Before executing this, you have to create a text file named poem.txt and type some text containing
the word "the". Then execute the above program; all occurrences of "the" will be replaced with "them".)
Another example program:
def program2():
    f = open("MyFile.txt", "w")
    line1 = input("Enter the text:")
    line2 = input("Enter the text:")
    line3 = input("Enter the text:")
    new_line = "\n"
    f.write(line1)
    f.write(new_line)
    f.write(line2)
    f.write(new_line)
    f.write(line3)
    f.write(new_line)
    f.close()

program2()
Output:
Enter the text: Hi welcome to Python
Enter the text: We will learn Python programming in ashoka college
Enter the text: It is very easy to learn
Next, if you open the MyFile.txt file, it contains the above text which you entered.
Like this we can handle the files as per our requirement by using Python code.
We can use strftime( ) to print the day, month, and year in various formats. Here are some of them:
current.strftime("%m/%d/%y") prints in month(numeric)/date/year format
current.strftime("%b-%d-%Y") prints in month(abbreviation)-date-year format
current.strftime("%d/%m/%Y") prints in date/month/year format
Example program:
# Let's print date, month, and year in different ways
from datetime import date
current = date.today()

format1 = current.strftime("%m/%d/%y")
# prints in month(numeric)/date/year format
print("format1 =", format1)

format2 = current.strftime("%b-%d-%Y")
# prints in month(abbreviation)-date-year format
print("format2 =", format2)

format3 = current.strftime("%d/%m/%Y")
# prints in date/month/year format
print("format3 =", format3)

format4 = current.strftime("%B %d, %Y")
# prints in month(full name) date, year format
print("format4 =", format4)

Output:
format1 = 08/28/22
format2 = Aug-28-2022
format3 = 28/08/2022
format4 = August 28, 2022
datetime.time():
A time object generated from the time class represents the local time.
Components:
• hour
• minute
• second
• microsecond
• tzinfo
Syntax: datetime.time(hour, minute, second, microsecond)
Example program:
from datetime import time

# time(hour, minute, second, microsecond)
t = time(11, 34, 56)
print("hour =", t.hour)
print("minute =", t.minute)
print("second =", t.second)
print("microsecond =", t.microsecond)
Example
Create a class named Person, use the __init__() function to assign values for name
and age:
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

p1 = Person("John", 36)
print(p1.name)
print(p1.age)

Output:
John
36
Note: The __init__() function is called automatically every time the class is being
used to create a new object.
self is used to represent the instance of the class. With this keyword, you can access the attributes
and methods of the class in Python. It binds the attributes to the given arguments. self is used in
many places and is often thought to be a keyword, but unlike this in C++, self is not a keyword in Python.
UNIT-3
Python for Cloud: Python for Amazon Web Services, Python for Google Cloud Platform, Python for
Windows Azure, Python for MapReduce, Python packages of interest, Python web application
frameworks, Designing a RESTful web API.
Cloud Application Development in Python: Design Approaches, Image Processing APP, Document
Storage App, MapReduce App, Social Media Analytics App.
1. Django:
Django is a Python framework "for perfectionists with deadlines." With it, you can build better web apps
in much less time, and with less code. Django is known for its focus on automation. It also follows
the DRY (Don't Repeat Yourself) principle.
Django was originally developed for content-management systems, but is now used for many kinds of
web applications. This is because of its templating, automatic database generation, DB access layer,
and automatic admin interface generation. It also provides a web server for development use.
Giant companies that use Django are Instagram, Pinterest, Disqus, Mozilla, The Washington
Times, and Bitbucket. In fact, when we think of the terms 'framework' and 'Python', the first thing that
comes to our minds is Django.
We will see more on Django in another lesson.
Django Architecture
For applications that use the Platform-as-a-service (PaaS) cloud service model, the
architecture and deployment design steps shown in Figure 8.1 are not required since the
platform takes care of the architecture and deployment.
In the component design step, the developers have to take into consideration the
platform specific features. For example, applications designed with Google App Engine (GAE)
can leverage the GAE Image Manipulation service for image processing tasks.
Different PaaS offerings such as Google App Engine, Windows Azure Web Sites, etc.,
provide platform specific software development kits (SDKs) for developing cloud applications.
Applications designed for specific PaaS offerings run in sandbox environments and are
allowed to perform only those actions that do not interfere with the performance of other
applications. The deployment and scaling is handled by the platform while the developers focus
on the application development using the platform-specific SDKs. Portability is a major
constraint for PaaS based applications as it is difficult to move the application from one cloud
vendor to the other due to the use of vendor-specific APIs and PaaS SDKs.
Figure 8.2 shows the component design step for the image processing app. In this step we
identify the application components and group them based on the type of functions performed
and type of resources required. The web tier for the image processing app has front ends for
image submission and displaying processed images. The application tier has components for
processing the image submission requests, processing the submitted image and processing
requests for displaying the results. The storage tier comprises the storage for processed
images.
Figure 8.3 shows the architecture design step which defines the interactions between the
application components. This application uses the Django framework; therefore, the web tier
components map to the Django templates and the application tier components map to the
Django views. A cloud storage is used for the storage tier. For each component, the
corresponding code box numbers are mentioned.
Figure 8.4 shows the deployment design for the image processing app. This is a multi-tier
architecture comprising a load balancer, application servers, and cloud storage for processed
images. For each resource in the deployment, the corresponding Amazon Web Services (AWS)
cloud service is mentioned.
Box 8.4 shows the source code of the Django View of the Image Processing app. The
function home( ) in the view renders the image submission page. This function checks if the
request method is POST or not.
If the request method is not POST then the file upload form is rendered in the template.
Whereas, if the request is POST, the file selected by the user is uploaded to the media directory
(specified in the Django project settings). The selected filter is then applied on the uploaded file
in the applyfilter( ) function. This example uses the Python Imaging Library (PIL) for image
filtering operations.
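The view might be structured along these lines. This is a simplified sketch with assumed template names and form field names, not the book's exact Box 8.4 code:

# A minimal sketch of the view described above (names are assumptions)
from django.shortcuts import render
from PIL import Image, ImageFilter   # Python Imaging Library

def applyfilter(path):
    # Apply a blur filter using PIL and overwrite the uploaded file
    img = Image.open(path)
    img.filter(ImageFilter.BLUR).save(path)

def home(request):
    if request.method == 'POST':
        # Save the uploaded file to the media directory
        f = request.FILES['image']
        path = 'media/' + f.name
        with open(path, 'wb') as dest:
            for chunk in f.chunks():
                dest.write(chunk)
        # Apply the selected filter and show the result
        applyfilter(path)
        return render(request, 'result.html', {'image': path})
    # For GET requests, render the file upload form
    return render(request, 'home.html')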
With the development of Web 2.0 and beyond, and through the increasing reach of high-speed
internet to wireless applications, multimedia-rich web applications have become widely popular in recent
years.
There are various types of multimedia web applications, including multimedia storage,
processing, transcoding, and streaming applications. Due to the higher resource requirements of
multimedia applications, and the increasing demand for multimedia-rich web applications on wireless
platforms, cloud computing is proving to be an efficient and cost-effective solution.
Multimedia Cloud provides storage, processing, and streaming services to millions of mobile
users around the world.
Figure 10.1 shows a reference architecture for a multimedia cloud. In this architecture:
The first layer is the infrastructure services layer, which includes computing and storage resources.
On top of the infrastructure services layer is the platform services layer; it includes frameworks and
services for streaming and associated tasks, such as transcoding and analytics, for the development of
multimedia applications.
The topmost layer contains the applications, such as live video streaming, video transcoding,
video-on-demand, multimedia processing, etc.
Cloud-based multimedia applications alleviate the burden of installing and maintaining
multimedia applications locally on devices (desktops, tablets, smartphones, etc.) and provide access
to rich multimedia content.
A multimedia cloud can have various service models, such as IaaS, PaaS, and SaaS, that offer
infrastructure, platform, or application services, as shown in the above figure.
Figure 10.3 shows a screenshot of a live video streaming demo. This application allows on-demand
creation of video streaming instances in the cloud.
In the first step of the stream instance creation workflow, the user specifies the details of the stream.
Figures 10.4 and 10.5 show the second and third steps, in which an instance size is selected and then the
instance is launched.
The Flash Media Server (FMS) URL and stream details are provided on the stream details page
and then entered into the Flash Media Encoder (FME) application. The client gets the video and audio
feed from a camera and microphone, or gets a multiplexed feed from video/audio mixers. The video and
audio formats and bit rates can be specified in FME.
After all settings are complete, streaming can be started by clicking on the start button. In this
example, RTMP protocol (Real-Time Messaging Protocol) is used for streaming.
Example program: see page 337 of the provided PDF file.
MPEG-DASH:
Dynamic Adaptive Streaming over HTTP (DASH), also known as MPEG-DASH, is a
developing ISO standard. As the name suggests, DASH is a standard for adaptive streaming over HTTP
that has the potential to replace existing proprietary technologies like Microsoft Smooth Streaming,
Adobe Dynamic Streaming, and Apple HTTP Live Streaming (HLS). A unified standard
would be a boon to content publishers, who could produce one set of files that play on all DASH-
compatible devices (iPad, iPhone, Nexus 7, Fire tablets, etc.).
MPEG-DASH works by breaking the content into a sequence of small HTTP-based file
segments, each segment containing a short interval of playback time of content that is potentially many
hours in duration, such as a movie or the live broadcast of a sports event. The content is made available
at a variety of different bit rates, i.e., alternative segments encoded at different bit rates covering aligned
short intervals of play back time are made available. While the content is being played back by an MPEG-
DASH client, the client automatically selects from the alternatives the next segment to download and
play back based on current network conditions. The client selects the segment with the highest bit rate
possible that can be downloaded in time for play back without causing stalls or re-buffering events in the
playback. Thus, an MPEG-DASH client can seamlessly adapt to changing network conditions, and
provide high quality play back with fewer stalls or re-buffering events.
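The client's adaptation logic can be sketched as picking the highest bitrate that fits the measured bandwidth. The bitrates and the safety margin below are made-up illustrative numbers:

# Available renditions of the next segment, in kbps
bitrates = [235, 750, 1750, 3000, 4300]

def select_bitrate(measured_bandwidth_kbps):
    # Choose the highest bitrate that can be downloaded in time,
    # keeping a safety margin to avoid stalls and re-buffering
    usable = measured_bandwidth_kbps * 0.8
    candidates = [b for b in bitrates if b <= usable]
    return candidates[-1] if candidates else bitrates[0]

print(select_bitrate(2000))   # 1600 kbps budget -> selects 750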
HDS:
HDS, or HTTP Dynamic Streaming, is Adobe's method for adaptive bitrate streaming of Flash
video. This method of streaming enables on-demand and live adaptive bitrate video delivery of MP4
media over regular HTTP connections. When there is recording to be done, we have to implement
nDVR (e.g., in Wowza), so we can record the live stream on the go. HDS is mainly used in live
applications where nDVR is implemented. Chunks will not be missed while recording is going on,
so it is highly useful in live streaming.
[Adaptive bitrate streaming adjusts video quality based on network conditions to improve
video streaming over HTTP networks. This process makes playback as smooth as possible
for viewers regardless of their device, location, or Internet speed.]
Smooth Streaming:
Smooth Streaming is introduced by Microsoft. It is an IIS (Internet Information Services)
Media Services extension that enables adaptive streaming of media to clients (which include
Silverlight) over HTTP. Smooth Streaming uses the simple but powerful concept of delivering small
content fragments (typically 2 seconds worth of video) and verifying that each has arrived within the
appropriate time and played back at the expected quality level. If one fragment doesn’t meet these
requirements, the next fragment delivered will be at a somewhat lower quality level. Conversely, when
conditions allow it, the quality of subsequent fragments will be at a higher level.
Here is an overview of the recognized file extensions and mime types for these protocols, plus
their browser playback support:
The source code for the Django template for the home page is provided below. This template uses the
jQuery Smart Wizard to get user inputs. The form inputs are processed in a Django view, described earlier.