Module 1 - Cloud Computing

Notes are as per the VTU syllabus.
MODULE I: Distributed System Models and Enabling Technologies

Content: Scalable Computing over the Internet; System Models for
Distributed and Cloud Computing; Difference between Parallel,
Distributed and Cloud Computing.
..\Diff btw cec, pc,dc,cc.docx
Grid Computing:
Def/Meaning: Grid computing (also known as distributed computing)
is a collection of computers working together to perform various
tasks. It distributes the workload across multiple systems,
allowing computers to contribute their individual resources to a
common goal.

Computing: The study or use of computers for a particular goal.

Assess: Estimate /Evaluate/Determine/Judge.

1.1 SCALABLE COMPUTING OVER THE INTERNET

 Over the past 60 years, computing technology has undergone a series of
platform and environment changes.

 In this section, we assess evolutionary changes in machine architecture,
operating system platform, network connectivity, and application workload.

 Instead of using a centralized computer to solve computational problems,
parallel and distributed computers came to be used.

 Distributed computing has become data-intensive and network-centric.

 These large-scale Internet applications have significantly enhanced the
quality of life and information services in society today.

1.1.1 The Age of Internet Computing

 Billions of people use the Internet every day.

 As a result, supercomputer sites and large data centers must provide
high-performance computing services to huge numbers of Internet users
concurrently.

 Because of this high demand, the Linpack Benchmark for high-performance
computing (HPC) applications is no longer optimal for measuring system
performance.
 The emergence of computing clouds instead demands high-throughput
computing (HTC) systems built with parallel and distributed computing
technologies [5,6,19,25].

 We have to upgrade data centers using fast servers, storage systems, and
high-bandwidth networks. The purpose is to advance network-based
computing and web services with the emerging new technologies.

1.1.1.1 The Platform Evolution



Table 1.1: Evolution of computing technology over the years

Year / Technology / Goal
1950 to 1970: A handful of mainframes, including the IBM 360 and
CDC 6400. Goal: to satisfy the demands of large businesses and
government organizations.
1960 to 1980: Lower-cost minicomputers such as the DEC PDP-11 and
VAX series. Goal: same as above.
1970 to 1990: Widespread use of personal computers built with VLSI
microprocessors. Goal: same as above.
1980 to 2000: Massive numbers of portable computers and pervasive
devices appeared in both wired and wireless applications. Goal:
same as above, plus personal online usage.
Since 1990: HPC and HTC systems.

**** MPP: Massively Parallel Processors

******************************************************************
Cluster: A cluster is often a collection of homogeneous compute
nodes that are physically connected in close range to one another.

Proliferation: A rapid increase in the number or amount of
something.

Leverage: Power to influence people and get the result you want.
******************************************************************

1.1.1.2 High-Performance Computing

For many years, HPC systems emphasized raw speed performance.

The speed of HPC systems increased from Gflops in the early 1990s to
Pflops by 2010.

This improvement was driven mainly by demands from the scientific,
engineering, and manufacturing communities.

FOR EXAMPLE:
The Top 500 most powerful computer systems in the world are measured by
floating-point speed in Linpack benchmark results. However, the number of
supercomputer users is limited to less than 10% of all computer users. Today, the
majority of computer users are using desktop computers or large servers when they
conduct Internet searches and market-driven computing tasks.

******************************************************************
• The LINPACK Benchmarks are a measure of a system's floating-point
computing power. Introduced by Jack Dongarra, they measure how fast
a computer solves a dense n-by-n system of linear equations Ax = b,
which is a common task in engineering.
• Throughput: the number of tasks completed per unit time.
EX: heavy usage of Internet and Web 2.0 applications.
• In computing, floating point operations per second (FLOPS, flops
or flop/s) is a measure of computer performance, useful in fields
of scientific computation that require floating-point calculations.

Name         Unit      Value
kiloFLOPS    kFLOPS    10^3
megaFLOPS    MFLOPS    10^6
gigaFLOPS    GFLOPS    10^9
teraFLOPS    TFLOPS    10^12
petaFLOPS    PFLOPS    10^15
exaFLOPS     EFLOPS    10^18
zettaFLOPS   ZFLOPS    10^21
yottaFLOPS   YFLOPS    10^24
******************************************************************
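As a small illustration (ours, not from the notes), the following
minimal Python sketch solves a dense system Ax = b with NumPy and
estimates the achieved flop rate, using the conventional (2/3)n^3
flop count for an LU-based solve; the problem size is arbitrary.

# Minimal LINPACK-style illustration: solve a dense n-by-n system
# Ax = b and estimate the floating-point rate. Assumes NumPy.
import time
import numpy as np

n = 2000
rng = np.random.default_rng(0)
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)

start = time.perf_counter()
x = np.linalg.solve(A, b)        # LU factorization + triangular solves
elapsed = time.perf_counter() - start

flops = (2.0 / 3.0) * n**3       # conventional flop count for the solve
print(f"Residual: {np.linalg.norm(A @ x - b):.2e}")
print(f"Approx. rate: {flops / elapsed / 1e9:.2f} GFLOPS")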

1.1.1.3 High-Throughput Computing

• The development of market-oriented high-end computing systems is
undergoing a strategic change from an HPC paradigm to an HTC
paradigm.

About the HTC paradigm:

• This HTC paradigm pays more attention to high-flux computing.

• The main application of high-flux computing is Internet searches
and web services accessed by millions or more users simultaneously.
The performance goal thus shifts to measuring high throughput, or
the number of tasks completed per unit of time.

HTC addresses the following problems at many data and enterprise
computing centers:
- cost
- energy savings
- reliability
- security

1.1.1.4 Three New Computing Paradigms

Advances in the following technologies:

1. Radio-frequency identification (RFID),
2. Global Positioning System (GPS), and
3. sensor technologies

have together triggered the development of the Internet of Things
(IoT).

****************************************************
• A nautical mile is a unit of measurement used in both air and
marine navigation,[2] and for the definition of territorial
waters.[3] Historically, it was defined as one minute (1/60 of a
degree) of latitude. Today the international nautical mile is
defined as exactly 1852 metres, which converts to about 1.15
imperial/US miles. The derived unit of speed is the knot, one
nautical mile per hour. 1 nautical mile = 1.852 km.
****************************************************
• Radio-frequency identification (RFID) uses electromagnetic fields
to automatically identify and track tags attached to objects. The
tags contain electronically stored information. Passive tags
collect energy from a nearby RFID reader's interrogating radio
waves. Active tags have a local power source (such as a battery)
and may operate hundreds of meters from the RFID reader. Unlike a
barcode, the tag need not be within the line of sight of the
reader, so it may be embedded in the tracked object. RFID is one
method of automatic identification and data capture (AIDC).[1]

RFID EXAMPLE

• RFID technology is implemented in tracking and in passive,
non-contact data read/update systems. It uses a transponder, more
commonly known as an RF tag, which can be electronically programmed
to contain and communicate unique information. This information is
read from a distance over radio waves, and can be automatically
transferred into tracking software or similar.
****************************************************
GPS: The Global Positioning System is a satellite-based navigation
system consisting of a network of 24 orbiting satellites that are
eleven thousand nautical miles up in space, in six different
orbital paths. The satellites are constantly moving, making two
complete orbits around the Earth in just under 24 hours. If you do
the math, that is about 3.9 kilometers per second. That's really
moving!
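A quick worked check of that speed figure (our calculation, not
from the notes), using the numbers above: an altitude of 11,000
nautical miles, an Earth radius of about 6,371 km, and two orbits
in roughly 24 hours:

$$ r \approx 11000 \times 1.852\ \text{km} + 6371\ \text{km} \approx 26{,}743\ \text{km} $$
$$ v \approx \frac{2 \times 2\pi r}{86{,}400\ \text{s}} \approx \frac{336{,}000\ \text{km}}{86{,}400\ \text{s}} \approx 3.9\ \text{km/s} $$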
• The GPS satellites are referred to as NAVSTAR satellites. Of
course, no GPS introduction would be complete without learning the
really neat stuff about the satellites too! The first GPS satellite
was launched way back in February 1978.
IoT
• The Internet of Things (IoT) is the extension of Internet
connectivity into physical devices and everyday objects. Embedded
with electronics, Internet connectivity, and other forms of
hardware (such as sensors), these devices can communicate and
interact with others over the Internet, and they can be remotely
monitored and controlled.

1.1.1.5 Computing Paradigm Distinctions


• Ref: PDF on the differences between cec, dc, pc, and cc.
• System efficiency is decided by speed, programming, and energy
factors (i.e., throughput per watt of energy consumed).

1.1.2 Scalable Computing Trends and New Paradigms

• Designers and programmers want to predict the technological
capabilities of future systems.
• Jim Gray's paper, "Rules of Thumb in Data Engineering," is an
excellent example of how technology affects applications and vice
versa.

1.1.2.1 Degrees of Parallelism

• The degree of parallelism (DOP) is a metric that indicates how
many operations can be or are being simultaneously executed by a
computer. It is especially useful for describing the performance
of parallel programs and multiprocessor systems.

• A program running on a parallel computer may utilize different
numbers of processors at different times. For each time period,
the number of processors used to execute a program is defined as
the degree of parallelism. The plot of the DOP as a function of
time for a given program is called the parallelism profile.
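As a small worked illustration (ours, not from the notes): if a
parallelism profile records how long the program spent at each DOP,
the average parallelism is the time-weighted mean. A minimal Python
sketch with made-up profile values:

# Average parallelism from a DOP profile: a list of
# (duration, processors-in-use) segments. Values are illustrative.
profile = [(2.0, 1), (5.0, 4), (3.0, 8), (1.0, 2)]   # (seconds, DOP)

total_time = sum(t for t, _ in profile)
average_parallelism = sum(t * dop for t, dop in profile) / total_time
print(f"Average parallelism: {average_parallelism:.2f}")   # ~4.36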
HOW THE DEGREE OF PARALLELISM HAS EVOLVED:
BLP, WLP, ILP, DLP, TLP, JLP

• BLP: Bit-Level Parallelism
• WLP: Word-Level Parallelism
• ILP: Instruction-Level Parallelism
• DLP: Data-Level Parallelism
• TLP: Task-Level Parallelism
• JLP: Job-Level Parallelism

• Fifty years ago, when hardware was bulky and expensive, most
computers were designed in a bit-serial fashion. In this scenario,
bit-level parallelism (BLP) converts bit-serial processing to
word-level processing gradually.
• Over the years, users graduated from 4-bit microprocessors
to 8-,16-, 32-, and 64-bit CPUs. This led us to the next
wave of improvement, known as instruction-level
parallelism (ILP), in which the processor executes multiple
instructions simultaneously rather than only one instruction
at a time.
• For the past 30 years, we have practiced ILP through
pipelining, superscalar computing, VLIW (very long
instruction word) architectures, and multithreading. ILP
requires branch prediction, dynamic scheduling,
speculation, and compiler support to work efficiently.

• Data-level parallelism (DLP) was made popular through SIMD
(single instruction, multiple data) and vector machines using
vector or array types of instructions (see the sketch after this
list). DLP requires even more hardware support and compiler
assistance to work properly.
• Ever since the introduction of multicore processors and
chip multiprocessors (CMPs), we have been exploring task-
level parallelism (TLP).
• A modern processor explores all of the aforementioned
parallelism types. In fact, BLP, ILP, and DLP are well
supported by advances in hardware and compilers.
However, TLP is far from being very successful due to
difficulty in programming and compilation of code for
efficient execution on multicore CMPs. As we move from
parallel processing to distributed processing, we will see an
increase in computing granularity to job-level parallelism
(JLP). It is fair to say that coarse-grain parallelism is built
on top of fine-grain parallelism.
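As a small illustration of the DLP idea above (ours, not from the
notes), NumPy array expressions apply one operation across many
data elements at once, in the spirit of SIMD/vector instructions:

# Data-level parallelism in spirit: one vectorized array expression
# replaces an element-by-element loop. Assumes NumPy is installed.
import numpy as np

a = np.arange(8, dtype=np.float64)
b = np.ones(8, dtype=np.float64)

# Scalar, element-at-a-time version:
c_scalar = np.empty_like(a)
for i in range(len(a)):
    c_scalar[i] = 2.0 * a[i] + b[i]

# Vectorized version: one expression over whole arrays, which NumPy
# (and the underlying hardware) can execute with SIMD-style operations.
c_vector = 2.0 * a + b
assert np.allclose(c_scalar, c_vector)
print(c_vector)   # [ 1.  3.  5.  7.  9. 11. 13. 15.]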
****************************************************
A bit (short for binary digit) is the smallest unit of data in a
computer. A bit has a single binary value, either 0 or 1. Although
computers usually provide instructions that can test and manipulate
bits, they generally are designed to store data and execute
instructions in bit multiples called bytes. In most computer
systems, there are eight bits in a byte.
****************************************************

1.1.2.2 Innovative Applications

• Both HPC and HTC systems desire transparency in many application
aspects. For example, data access, resource allocation, process
location, concurrency in execution, job replication, and failure
recovery should be made transparent to both users and system
management.
************************************
Genomic analysis is the identification, measurement or comparison
of genomic features such as DNA sequence, structural variation,
gene expression, or regulatory and functional element annotation
at a genomic scale.
************************************

Utility Computing (profitable computing)

• Utility computing focuses on a business model in which customers
receive computing resources from a paid service provider.
• All grid/cloud platforms are regarded as utility service
providers.
• Major technological challenges include all aspects of computer
science and engineering.
EX: users demand new network-efficient processors, scalable memory
and storage schemes, distributed OSes, middleware for machine
virtualization, new programming models, effective resource
management, and application program development.

1.1.2.4 The Hype Cycle of New Technologies

Hype cycle def/meaning: The cycle shows the expectations for a
technology at five different stages. The expectations rise sharply
from the trigger period to a high peak of inflated expectations.

The five stages, viz.:
1. Technology trigger
2. Peak of inflated expectations
3. Trough of disillusionment
4. Slope of enlightenment
5. Plateau of productivity

• Fig 1.3
1.1.3 The Internet of Things and Cyber-Physical Systems

1.1.3.1 The Internet of Things
Def/meaning:
• The traditional Internet connects machines to machines or web
pages to web pages.
• The concept of the IoT was introduced in 1999 at MIT [40]. The
IoT refers to the networked interconnection of everyday objects,
tools, devices, or computers.

• One can view the IoT as a wireless network of sensors that
interconnect all things in our daily life. These things can be
large or small and they vary with respect to time and place. The
idea is to tag every object using RFID or a related sensor or
electronic technology such as GPS.

• IoT researchers have estimated that every human being will be
surrounded by 1,000 to 5,000 objects.

• The IoT needs to be designed to track 100 trillion static or
moving objects simultaneously. The IoT demands universal
addressability of all of the objects or things.

• To reduce the complexity of identification, search, and storage,
one can set a threshold to filter out fine-grain objects. The IoT
obviously extends the Internet and is more heavily developed in
Asian and European countries.
----------------------------------------------------------------------------
The Massachusetts Institute of Technology (MIT) is a private
research university in Cambridge, Massachusetts.

Threshold: an intensity that must be exceeded for a certain
reaction to occur.
----------------------------------------------------------------------------

1.1.3.2 Cyber-Physical Systems

• A cyber-physical system (CPS) is the result of interaction
between computational processes and the physical world.
• A CPS integrates "cyber" (heterogeneous, asynchronous) with
"physical" (concurrent and information-dense) objects.

• A CPS merges the "3C" technologies of computation,
communication, and control into an intelligent closed feedback
system between the physical world and the information world, a
concept which is actively explored in the United States.
• The IoT emphasizes various networking connections among physical
objects, while the CPS emphasizes exploration of virtual reality
(VR) applications in the physical world. We may transform how we
interact with the physical world just as the Internet transformed
how we interact with the virtual world.

1.3 SYSTEM MODELS FOR DISTRIBUTED AND CLOUD COMPUTING
• Distributed and cloud computing systems are built over a
large number of autonomous computer nodes. These node
machines are interconnected by SANs, LANs, or WANs in
a hierarchical manner.
• With today’s networking technology, a few LAN switches
can easily connect hundreds of machines as a working
cluster. A WAN can connect many local clusters to form a
very large cluster of clusters. In this sense, one can build a
massive system with millions of computers connected to
edge networks.
• Massive systems are considered highly scalable, and can reach
web-scale connectivity, either physically or logically. In Table
1.2, massive systems are classified into four groups:
• clusters,
• P2P networks,
• computing grids, and
• Internet clouds over huge data centers.

Table 1.2: Classification of Parallel and Distributed Computing
Systems.

1.3.1.1 Cluster Architecture

• Cluster architecture represents the arrangement of servers, I/O
devices, and disk arrays through various connections.

1.3.1.2 Single System Image (SSI)

• Definition/Meaning:
An SSI is an illusion created by software or hardware that
presents a collection of resources as one integrated, powerful
resource. The SSI also makes the cluster appear like a single
machine to the user.
• About SSI Designing:
• Greg Pfister [38] has indicated that an ideal cluster should
merge multiple system images into a single-system image
(SSI).
• Cluster designers desire a cluster operating system or some
middleware to support SSI at various levels, including the
sharing of CPUs, memory, and I/O across all cluster nodes.
1.3.1.3 Hardware, Software, and Middleware Support

• Clusters are designed using various combinations of hardware,
software, and middleware techniques.
• In this chapter we discuss cluster design principles for both
small and large clusters.
• MPPs: Clusters exploring massive parallelism are known as MPPs.
• Almost all HPC clusters in the Top 500 list are also MPPs.

• The building blocks of clusters are computer nodes (PCs,
workstations, servers, or SMPs), special communication software
such as PVM or MPI, and a network interface card in each computer
node (a minimal MPI sketch follows below).
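As a small illustration of such communication software (ours, not
from the notes), a minimal "hello world" using the mpi4py binding
of MPI; it assumes mpi4py and an MPI runtime such as Open MPI are
installed on the cluster nodes:

# Minimal MPI example with mpi4py: every process in the cluster job
# reports its rank. Assumes mpi4py and an MPI runtime are installed.
from mpi4py import MPI

comm = MPI.COMM_WORLD       # communicator spanning all started processes
rank = comm.Get_rank()      # this process's ID within the communicator
size = comm.Get_size()      # total number of processes in the job

print(f"Hello from process {rank} of {size}")

# Launch across nodes with, e.g.:  mpiexec -n 4 python hello_mpi.py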

Operating system for clusters:
Most clusters run under the Linux operating system.

Interconnection between nodes:
Computer nodes are interconnected by a high-bandwidth network such
as Gigabit Ethernet, Myrinet, or InfiniBand.

• Middleware support is needed to create SSI or high availability
(HA).
• Both sequential and parallel applications can run on the
cluster, and special parallel environments are needed to
facilitate use of the cluster resources.
• Users may want all distributed memory to be shared by all
servers by forming distributed shared memory (DSM).
• However, many SSI features are expensive or difficult to achieve
at various cluster operational levels.

1.3.1.4 Major Cluster Design Issues

• Unfortunately, a cluster-wide OS for complete resource sharing
is not available yet.
• Middleware or OS extensions were developed at the user space to
achieve SSI at selected functional levels.
• Without this middleware, cluster nodes cannot work together
effectively to achieve cooperative computing.
• The software environments and applications must rely on the
middleware to achieve high performance.
• The cluster benefits come from scalable performance, efficient
message passing, high system availability, seamless fault
tolerance, and cluster-wide job management.

1.3.2 Grid Computing Infrastructures (arrangements)

• Grid computing meaning: the evolution from the Internet to web
and grid services is certainly playing a major role in the growth
of grid computing.

Table 1.3: Critical Design Issues and Feasible Implementations

Computational Grids:
Definition: A computational grid offers an infrastructure that
couples computers, software/middleware, special instruments,
people, and sensors together.
Construction/ Architecture:
• The grid is often constructed across LAN, WAN, or
Internet backbone networks at a regional, national, or
global scale.
• The computers used in a grid are primarily workstations,
servers, clusters, and supercomputers .
• At the server end, a grid is a network.
• At the client end, a grid is a wired or wireless terminal
device.
• The grid integrates computing, communication, content, and
transactions as rented services.
1.3.2.2 Grid Families

• Computational grids/data grids are built primarily at the
national level.

1.3.3 Peer-to-Peer Network Families

1.3.3.1 Peer-to-Peer Network Structure/System
Definition/Meaning: P2P stands for "peer to peer." In a P2P
network, the "peers" are computer systems connected to each other
via the Internet. Files can be shared directly between systems on
the network without the need for a central server. In other words,
each computer on a P2P network becomes a file server as well as a
client.
• In a traditional client-server system, client machines (PCs and
workstations) are connected to a central server for compute,
email, file access, and database applications.
• The P2P architecture, by contrast, offers a distributed model of
networked systems.

In detail: P2P systems

• In a P2P system, every node acts as both a client and a server.
• Peer machines are simply client computers connected to the
Internet.
• In a P2P system, no master-slave relationship exists; because of
this, all client machines act autonomously and can join or leave
the system freely.
• No central coordination or central database is needed.
• In other words, no peer machine has a global view of the
entire P2P system.
• The system is self-organizing with distributed control.

Overlay Network

Meaning/Definition:
• A P2P overlay network characterizes the logical connectivity
among the peers.
• Data items/files are distributed across the participating peers.
• Based on communication or file-sharing needs, the peer IDs form
an overlay network at the logical level.
• This overlay is a virtual network formed by mapping each
physical machine to its peer ID, logically, through a virtual
mapping as shown in Fig 1.17.
//** Virtual networking is a technology that facilitates data
communication between two or more virtual machines (VMs). It is
similar to traditional computer networking but provides
interconnection between VMs, virtual servers, and other related
components in a virtualized computing environment. **//

An Underlay Network is the physical infrastructure above which an
overlay network is built. It is the underlying network responsible
for the delivery of packets across networks.
An Overlay Network is a virtual network that is built on top of an
underlying network infrastructure (the Underlay Network). In
effect, the "underlay" provides a "service" to the overlay.
(A table contrasting "Underlay Network" and "Overlay Network"
appeared here in the original notes.)

-----------------------------------------------------------------------------

When a new peer joins the system, its peer ID is added as a node
in the overlay network. When an existing peer leaves the system,
its peer ID is automatically removed from the overlay network.

• Classification:
1. Unstructured overlay network
2. Structured overlay network

Unstructured overlay network:
An unstructured overlay is characterized by a random graph.

• There is no fixed route to send files or messages among the
nodes. Often, flooding is applied to send a query to all nodes in
an unstructured overlay, resulting in heavy network traffic and
nondeterministic search results (a small sketch of flooding
follows the note below).
// *** A flooding attack is one of the serious threats to network
security on web servers, resulting in loss of bandwidth and
overload for both the user and the service provider's web
server. *** //
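A minimal sketch of flooding a query through an unstructured
overlay (ours, not from the notes; the graph, TTL value, and names
are illustrative):

# Flooding in an unstructured P2P overlay: forward the query to every
# neighbor until the TTL expires. Graph and values are illustrative.
from collections import deque

overlay = {                      # adjacency list: peer ID -> neighbor IDs
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C"],
}
has_file = {"A": False, "B": False, "C": False, "D": True}

def flood_query(start, ttl):
    """Return peers holding the file, reachable within ttl hops."""
    hits, visited = [], {start}
    queue = deque([(start, ttl)])
    while queue:
        peer, hops_left = queue.popleft()
        if has_file[peer]:
            hits.append(peer)
        if hops_left == 0:
            continue
        for neighbor in overlay[peer]:   # forward to ALL neighbors
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, hops_left - 1))
    return hits

print(flood_query("A", ttl=2))   # ['D']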

• Structured overlay network:
A structured overlay follows a certain connectivity topology and
rules for inserting and removing nodes (peer IDs) from the overlay
graph. Many structured overlays place peer IDs on a
consistent-hashing ring, as sketched below.
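A minimal consistent-hashing sketch of how a structured overlay
can assign keys to peers and handle joins and leaves (a
hypothetical example of ours, not from the notes):

# Structured overlay sketch: peers sit on a hash ring; each key is
# owned by the first peer clockwise from the key's hash.
import bisect
import hashlib

def ring_hash(name: str) -> int:
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % 2**32

class HashRing:
    def __init__(self):
        self.points = []                 # sorted (hash, peer) pairs

    def join(self, peer: str):           # rule for inserting a node
        bisect.insort(self.points, (ring_hash(peer), peer))

    def leave(self, peer: str):          # rule for removing a node
        self.points.remove((ring_hash(peer), peer))

    def owner(self, key: str) -> str:    # fixed lookup rule, no flooding
        h = ring_hash(key)
        i = bisect.bisect(self.points, (h, "")) % len(self.points)
        return self.points[i][1]

ring = HashRing()
for p in ["peerA", "peerB", "peerC"]:
    ring.join(p)
print(ring.owner("song.mp3"))    # deterministic owner for the key
ring.leave("peerB")              # only keys owned by peerB move
print(ring.owner("song.mp3"))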
P2P Network Families
1.3.3.4 P2P Computing Challenges

• P2P computing faces three types of heterogeneity problems: in
hardware, software, and network requirements.

CHALLENGES:
• There are too many hardware models and architectures to select
from; incompatibility exists between software and the OS; and
there are different network connections and protocols.

• We need system scalability as the workload increases. System
scaling is directly related to performance and bandwidth. P2P
networks do have these properties.

• Data location also affects collective performance. Data
locality, network proximity, and interoperability are three
design objectives in distributed P2P applications.

• P2P performance is affected by routing efficiency and
self-organization by the participating peers.

• Fault tolerance, failure management, and load balancing are
other important issues in using overlay networks.
• Lack of trust among peers poses another problem; peers are
strangers to one another.
• Security, privacy, and copyright violations are major worries
for those in the industry in terms of applying P2P technology in
business applications [35].
• In a P2P network, all clients provide resources including
computing power, storage space, and I/O bandwidth.
• The distributed nature of P2P networks also increases
robustness, because limited peer failures do not form a
single point of failure.
• By replicating data across multiple peers, one can avoid losing
data when nodes fail. On the other hand, disadvantages of P2P
networks do exist. Because the system is not centralized,
managing it is difficult.
• In addition, the system lacks security. Anyone can log on to the
system and cause damage or abuse.

1.3.4 Cloud Computing over the Internet

Definition/Meaning of cloud computing:
• Cloud computing has been defined differently by many users and
designers. For example, IBM, a major player in cloud computing,
has defined it as follows: "A cloud is a pool of virtualized
computer resources. A cloud can host a variety of different
workloads, including batch-style backend jobs and interactive,
user-facing applications."
1.3.4.1 Internet Clouds
• An Internet cloud is a combination of users, user requests, paid
services, and hardware, software, storage, and network services.
• The idea of Internet clouds is to move desktop computing to a
service-oriented platform using server clusters and huge
databases at data centers.
• The cloud ecosystem must be designed to be secure, trustworthy,
and dependable.
1.3.4.2 The Cloud Landscape

Cloud computing has been introduced because traditional systems
have encountered several performance bottlenecks (hurdles):
- constant system maintenance,
- poor utilization, and
- increasing costs associated with hardware/software upgrades.
• Cloud computing is an on-demand paradigm that resolves or
relieves us from these problems.
Cloud platforms:
 Infrastructure as a Service (IaaS)
 Platform as a Service (PaaS)
 Software as a Service (SaaS)

 Infrastructure as a Service (IaaS)

 This model puts together the infrastructure demanded by users,
namely servers, storage, networks, and the data center fabric.

 The user can deploy and run specific applications on multiple
VMs running guest OSes.

 The user does not manage or control the underlying cloud
infrastructure, but can specify when to request and release the
needed resources (a sketch of such a request follows below).
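As one concrete illustration (ours; the notes do not name a
provider), requesting and releasing a VM with AWS's boto3 SDK. The
AMI ID and instance type are placeholders, and credentials are
assumed to be configured:

# IaaS sketch with boto3 (AWS EC2): request a VM, then release it.
# The AMI ID below is a placeholder; real IDs vary by region.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Request the needed resource: one small VM running a guest OS.
resp = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder machine image
    InstanceType="t3.micro",
    MinCount=1,
    MaxCount=1,
)
instance_id = resp["Instances"][0]["InstanceId"]
print("Provisioned:", instance_id)

# Release the resource when no longer needed; billing then stops.
ec2.terminate_instances(InstanceIds=[instance_id])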

 Platform as a Service (PaaS)

 This model enables the user to deploy user-built applications
onto a virtualized cloud platform.

 PaaS includes middleware, databases, development tools, and some
runtime support such as Web 2.0 and Java.

 The platform includes both hardware and software integrated with
specific programming interfaces.

 The provider supplies the API and software tools (e.g., Java,
Python, Web 2.0, .NET).

 The user is freed from managing the cloud infrastructure.

 Software as a Service (SaaS)

 This refers to browser-initiated application software delivered
to thousands of paid cloud customers.

 The SaaS model applies to business processes, industry
applications, customer relationship management (CRM), enterprise
resource planning (ERP), human resources (HR), and collaborative
applications.

 On the customer side, there is no upfront (in-advance) investment
in servers or software licensing. On the provider side, costs are
rather low compared with conventional hosting of user
applications.

 End of module.
 Brief Review: