Distributed Programming Handout - Module 1
Step At A Time
DISTRIBUTED PROGRAMMING
Detailed Lesson notes with practical exercises and applications in Distributed Programming
MODULE 1
OVERVIEW OF DISTRIBUTED COMPUTING
Table of Contents
1 General Overview on Distributed Computing and Systems
1.1 Distributed Computing in Cloud Computing
1.1.1 Distributed Computing vs. Cloud Computing
1.1.2 Distributed Cloud vs. Edge Computing
1.2 Key Advantages
1.3 Distributed Computing Data Flow
1.4 Introduction to Distributed Programming
1.4.1 Distributed Algorithms
1.4.2 Enterprise Computing Platforms
1.4.3 Enterprise Architectures
1.4.4 Why use Distributed Programming?
1.4.5 Distributed Programming On The Cloud
1.4.6 Programming the cloud
1.5 Architectures Used in Distributed Computing
1.6 Applications of Distributed Systems
1.7 Properties of a Distributed System
1.8 Benefits and Problems With Distributed System
1.9 Differences Between Distributed Systems and Distributed Computing
1.10 Introduction To Parallel Computing
1.11 Difference Between Parallel and Distributed Computing
1.12 How Does Distributed Computing Work?
1.13 Module Summary
1.14 TUTORIALS
1 General Overview on Distributed Computing and Systems
Distributed computing is a field of computer science that studies distributed systems. A
distributed system is a system whose components are located on different networked computers,
which communicate and coordinate their actions by passing messages to one another. The
components interact with one another to achieve a common goal.
A computer program that runs within a distributed system is called a distributed program,
and distributed programming is the process of writing such programs.
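To make message passing concrete, here is a minimal sketch in Java. It simulates the two nodes as a server thread and a client within one process; in a real distributed system the two endpoints would run on different networked machines. The class name and port number are made up for the example.

import java.io.*;
import java.net.*;

public class MessagePassingDemo {
    public static void main(String[] args) throws Exception {
        ServerSocket server = new ServerSocket(5000); // opened first to avoid a connect race

        // Node A: accepts one connection, receives a message, and replies.
        Thread nodeA = new Thread(() -> {
            try (Socket peer = server.accept();
                 BufferedReader in = new BufferedReader(new InputStreamReader(peer.getInputStream()));
                 PrintWriter out = new PrintWriter(peer.getOutputStream(), true)) {
                out.println("ACK: " + in.readLine());
            } catch (IOException e) { e.printStackTrace(); }
        });
        nodeA.start();

        // Node B: connects to node A, sends a message, and prints the reply.
        try (Socket socket = new Socket("localhost", 5000);
             PrintWriter out = new PrintWriter(socket.getOutputStream(), true);
             BufferedReader in = new BufferedReader(new InputStreamReader(socket.getInputStream()))) {
            out.println("hello from node B");
            System.out.println(in.readLine()); // prints "ACK: hello from node B"
        }
        nodeA.join();
        server.close();
    }
}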
Distributed computing is becoming a popular way to meet the demand for higher performance in
both high-performance scientific computing and more "general-purpose" applications. There are
many reasons for the increasing acceptance and adoption of distributed computing, such as
performance, the availability of computers to connect, fault tolerance, and resource sharing.
By connecting several machines together, more computation power, memory, and I/O bandwidth
can be accessed. Distributed computing can be implemented in a variety of ways. For example, a
group of workstations interconnected by an appropriate high-speed network (abbreviated to
cluster) may even provide supercomputer-level computational power.
Distributed computing divides a single task between multiple computers. Each computer can
communicate with others via the network. All computers work together to achieve a common goal.
Thus, they all work as a single entity. A computer in the distributed system is a node while a
collection of nodes is a cluster.
Distributed computing systems are usually treated differently from parallel computing systems
or shared-memory systems, where multiple computers share a common memory pool that is used
for communication between the processors.
A distributed system is a system that encompasses several components that exist on different
devices, usually located at different geographical locations. These physically separated
components are usually connected over a network and communicate through message passing.
These components can be computer systems, resources, and processes, and are also referred to as
nodes.
Although the different components in a distributed system are located in different places, the
entire system functions as a single unit to process tasks and share resources. Furthermore,
there are two major types of distributed systems: peer-to-peer systems, and client/server systems.
Peer-to-peer systems, as the name suggests, have all their components processing tasks and sharing
resources equally. Due to the peer-to-peer connection between nodes, there is usually no need for
centralized control in the network.
As opposed to peer-to-peer distributed systems, the nodes in client/server systems take on the roles
of clients and servers. The client devices, or components, request resources, while the servers
provide resources to the clients. There are also other alternative distributed systems, such as n-tier
and three-tier distributed systems.
Distributed memory systems use multiple computers to solve a common problem, with
computation distributed among the connected computers (nodes) and using message-passing to
communicate between the nodes. For example, grid computing, studied in the previous section, is
a form of distributed computing where the nodes may belong to different administrative domains.
Another example is the network-based storage virtualization solution described in an earlier
section in this chapter, which used distributed computing between data and metadata servers.
As data volumes have exploded and application performance demands have increased,
distributed computing has become extremely common in database and application design. It is
especially valuable for scaling: as data volumes grow, the extra load can be handled by simply
adding more hardware to the system. Contrast this with traditional "big iron" environments
consisting of powerful computer servers, in which load growth must be handled by upgrading
and replacing the hardware.
1.1 Distributed Computing in Cloud Computing
Cloud computing is the approach that makes cloud-based software and services available on
demand for users. Just as offline resources let you perform various computing operations, data
and applications in the cloud do the same, but remotely, through the internet.
In a distributed cloud, the public cloud infrastructure utilizes multiple locations and data centers
to store and run the software applications and services. With this implementation, distributed
clouds are more efficient and performance-driven.
A distributed cloud computing architecture, also called distributed computing architecture, is made
up of distributed systems and clouds.
Distributed clouds optimally utilize the resources spread over an extensive network, irrespective
of where users are.
Cloud architects combine these two approaches to build performance-oriented cloud computing
networks that serve global network traffic fast and with maximum uptime.
1.1.1 Distributed Computing vs. Cloud Computing
The growth of cloud computing options and vendors has made distributed computing even more
accessible. Although cloud computing instances themselves do not automatically enable
distributed computing, there are many different types of distributed computing software that run
in the cloud to take advantage of the quickly available computing resources.
1.2 Key Advantages
With the ease and speed in which new computing resources can be provisioned, distributed
computing enables greater levels of agility when handling growing workloads. This enables
“elasticity,” in which a cluster of computers can be expanded or contracted easily depending on
the immediate workload requirements.
1.3 Distributed Computing Data Flow
We can make distinctions in the distributed-computing flow model based on the relationship
between the task manager and the computing devices and what the task is. This relationship can
result in the computing devices being closely coupled, where there are frequent transfers of
information between devices, or loosely coupled, where there may be little to no transfer of
information between computing devices. Tasks may range from having a coarse granularity, where
each task is dedicated to a single computing device, to having a fine granularity, where a task is
subdivided among several devices and the computing is done concurrently.
When the task has a coarse granularity and the computing device relationship is loosely coupled,
then the distributed-computing flow model takes the form of a computing cluster or computing
resource management system, where tasks are allocated to each computing device based on
resource availability. Thus, each computing device communicates with the cluster server or
resource manager. The figure below shows the flows for an example of a computing cluster.
The flows in this type of distributed-computing flow model are similar to those in the client–server
flow model, where communications are primarily between each client and the server. A difference
here is that the direction of the flows is not necessarily from the computing server to its clients. In
fact, the size of the task initialization file (which is, in a sense, a request) sent from the server to
each computing device may be much smaller than the size of the results of the computation, which
is sent from the computing device to the server. In this model the flow directionality is asymmetric,
but in the opposite direction from the client–server flow model. Also, each of the flows between
the computing devices and their server is independent of the other flows. There is no
synchronization among individual flows. The critical flows for this model are from the computing
devices to their server. Since the flows for this model are asymmetric, in the direction toward the
server, the server acts as a data sink, while the computing devices act as data sources.
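The sketch below is a single-machine analogue of this model in Java, with a thread pool standing in for the computing devices: each task receives a small initialization value, the flows are independent of one another, and the results returned to the manager are much larger than the inputs. All names and sizes are invented for the example.

import java.util.*;
import java.util.concurrent.*;

public class ClusterFlowSketch {
    public static void main(String[] args) throws Exception {
        ExecutorService devices = Executors.newFixedThreadPool(4); // four "computing devices"
        List<Future<double[]>> results = new ArrayList<>();

        for (int task = 0; task < 8; task++) {
            final int seed = task;                  // the small task-initialization "file"
            results.add(devices.submit(() -> {
                double[] r = new double[1000];      // the result is much larger than the input
                for (int i = 0; i < r.length; i++) r[i] = Math.sin(seed + i);
                return r;
            }));
        }
        for (Future<double[]> f : results)          // the manager acts as the data sink
            System.out.println("received " + f.get().length + " values");
        devices.shutdown();
    }
}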
When the task has a fine granularity and the computing node relationship is closely coupled, then
the distributed-computing flow model behaves like a simplified parallel processing system, where
each task is subdivided, based on the degree of parallelism in the application and the topology of
the problem, among several computing devices. These devices work concurrently on the problem,
exchanging information with neighbor devices and expecting (and waiting for) updated
information. The task manager sets up the computing devices and starts the task with an
initialization file as shown below.
Flows in this type of distributed-computing flow model can have the most stringent performance
requirements of any of the models. Since computing devices may block (halt their computations)
while waiting for information from neighbor devices, the timing of information transfer between
computing devices becomes critical. This has a direct impact on the delay and delay variation
requirements for the network connecting the devices. Although each individual flow has
directionality, collectively there is little or no overall directionality. Individual flows in this model
can be grouped to indicate which neighbor devices a computing device will communicate with for
a given problem or topology. For example, a problem may be configured such that a computing
device will communicate with one, two, four, or six of its closest neighbors.
For this model, critical flows are between computing devices. When a device will transfer the same
information to several neighbors simultaneously, multicasting should be considered to optimize
flow performance. There are no clear data sources or sinks for this model. The climate-modeling
problem could also be considered with a distributed-computing flow model, depending on the task
granularity and degree of coupling within the system.
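A toy Java version of this closely coupled pattern is sketched below: two workers each update half of an array and must exchange boundary values every iteration. A CyclicBarrier stands in for the network synchronization; in a real system each exchange would be a message, and a worker would block waiting for its neighbor's update, exactly as described above. The problem size and boundary values are invented for the example.

import java.util.Arrays;
import java.util.concurrent.CyclicBarrier;

public class NeighborExchangeSketch {
    static final double[] cells = new double[10];        // cells 0 and 9 are fixed boundaries
    static final CyclicBarrier barrier = new CyclicBarrier(2);

    public static void main(String[] args) throws Exception {
        cells[0] = 100.0;                                 // boundary condition
        Thread t1 = new Thread(worker(1, 4));             // worker owning cells 1..4
        Thread t2 = new Thread(worker(5, 8));             // worker owning cells 5..8
        t1.start(); t2.start(); t1.join(); t2.join();
        System.out.println(Arrays.toString(cells));
    }

    static Runnable worker(int from, int to) {
        return () -> {
            try {
                for (int step = 0; step < 1000; step++) {
                    double[] next = new double[to - from + 1];
                    for (int i = from; i <= to; i++)      // reads include the neighbor's boundary cells
                        next[i - from] = (cells[i - 1] + cells[i + 1]) / 2;
                    barrier.await();                      // wait until both workers have read
                    for (int i = from; i <= to; i++)
                        cells[i] = next[i - from];        // publish this worker's updates
                    barrier.await();                      // wait until both workers have written
                }
            } catch (Exception e) { throw new RuntimeException(e); }
        };
    }
}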
Flow requirements will vary between the computing cluster and parallel system models, depending
on the degrees of coupling and the granularity in the task. Depending on the application and
amount of analysis you want to put into this model, you can use the computing cluster and parallel
system models as they are, or modify the task granularity and degree of coupling to suit your needs.
1.4 Introduction to Distributed Programming
Modern distributed analytics engines can automatically parallelize and distribute tasks and can
tolerate faults.
1.4.1 Distributed Algorithms
Distributed Algorithms are designed for programming distributed systems. They differ from
centralized algorithms because they are unaware of any global state or a global time frame.
Issues:
• Modeling: transition systems, state charts, temporal logic
• Communication, Timing, and Synchronization
• Routing Algorithms
• Virtual Circuits and Packet Switching
• Kinds of algorithms: wave algorithms, traversal algorithms, election algorithms, snapshot
algorithms (a toy election sketch follows this list)
• Distributed Termination Detection
• Distributed Deadlock Detection
• Distributed Failure Detection
• Stabilization
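As a taste of one item on this list (see the election entry above), here is a drastically simplified, sequential Java simulation of a ring election: a token circulates once around the ring, and each node replaces it with the larger of the token and its own identifier, so after a full pass the token carries the winner's id. A real election algorithm, such as Chang–Roberts, performs this exchange asynchronously with messages; the code below is a toy, not a faithful protocol implementation.

import java.util.List;

public class RingElectionToy {
    // Simulates one full pass of a token around a ring of nodes with unique ids.
    static int elect(List<Integer> ids) {
        int token = ids.get(0);                         // the initiator injects its own id
        for (int id : ids) token = Math.max(token, id); // each node forwards the larger id
        return token;                                   // after a full pass, the highest id has won
    }

    public static void main(String[] args) {
        System.out.println("Leader: " + elect(List.of(7, 3, 12, 5))); // prints "Leader: 12"
    }
}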
1.4.2 Enterprise Computing Platforms

Java EE
• Runs on a JVM
• From Sun
• Fully implemented on many operating systems
• Maintained and enhanced by the Java Community Process (comprised of hundreds of companies and organizations)
• Source code for the entire framework freely available
• Mature
• Kind of a standard

.NET
• Runs on the CLR (Common Language Runtime)
• From Microsoft
• Fully implemented on Windows; partially implemented on other operating systems
• Maintained and enhanced by Microsoft
• Some source code is proprietary
• Mature
• Kind of a marketing strategy; however, some "components" are official standards (e.g., C#)
1.4.3 Enterprise Architectures
In the old days, and today for the most trivial of applications, we see client-server organizations.
Two-tier architectures are almost always too fragile, and they soon gave way to three-tier
architectures.
The idea here is that any one of the three layers can be completely re-implemented without
affecting the others.
The middle layer completely isolates the front end from any knowledge of the database. The UI
doesn't even know what the data source is. It just makes calls like fetchCustomerById(24337).
Software running in the middle tier is called middleware. Middleware products are also
called containers, since they host and manage the business objects. They can manage lifecycles,
transactions, memory, authentication, concurrency, distribution, security, sessions, resource
pooling, logging and lots of other "system-level plumbing things" so developers only have to
concentrate on business logic.
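A small sketch of this idea in Java follows. The CustomerService and Customer names are illustrative, not from any particular framework; the point is that the UI programs against the interface and never learns what the data source is.

public interface CustomerService {
    Customer fetchCustomerById(int id);   // the UI only ever sees this call
}

record Customer(int id, String name) {}

// One possible middle-tier implementation; it could equally read from a file, a
// remote service, or an in-memory cache without the UI changing at all.
class DatabaseCustomerService implements CustomerService {
    @Override
    public Customer fetchCustomerById(int id) {
        // a real implementation would query the database here
        return new Customer(id, "stub");
    }
}

class Demo {
    public static void main(String[] args) {
        CustomerService service = new DatabaseCustomerService(); // wired up by the container in practice
        System.out.println(service.fetchCustomerById(24337));
    }
}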
There's no need to stop at three tiers. You'll often hear the term n-tier.
Sometimes applications are classified by the complexity of the client, ranging from thin clients
that only handle presentation to thick clients that also carry application logic.
1.4.4 Why use Distributed Programming?
Figure: (a) A sequential program with serial (S1) and parallel (P1) parts. (b) A parallel/distributed
program that corresponds to the sequential program in (a), whereby the parallel parts can be either
distributed across machines or run concurrently on a single machine.
Distributed programs have also found broad applications beyond science, such as search engines,
web servers, and databases. One example is the Folding@Home project, which uses distributed
computing on all kinds of systems, from supercomputers to personal PCs, to perform molecular
dynamics simulations of protein dynamics. Without parallelization, Folding@Home would not be
able to access nearly as many computational resources. For example, running a Hadoop
MapReduce program on a single VM instance is not as effective as running it on a large-scale
cluster of VM instances. Of course, completing jobs earlier on the cloud leads to a reduction in
cost, which is a key objective for cloud users.
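As a toy, single-machine illustration of the MapReduce idea (plain Java, not Hadoop), the sketch below splits the input, maps each piece to words in parallel, and reduces the counts per word; a real MapReduce job does the same with the pieces spread across a cluster of machines.

import java.util.*;
import java.util.stream.*;

public class WordCountSketch {
    public static void main(String[] args) {
        List<String> lines = List.of("to be or not to be", "to be is to do");

        Map<String, Long> counts = lines.parallelStream()              // "map" the pieces in parallel
                .flatMap(line -> Arrays.stream(line.split(" ")))
                .collect(Collectors.groupingBy(w -> w, Collectors.counting())); // "reduce" per word

        System.out.println(counts); // e.g. {not=1, be=3, or=1, to=4, is=1, do=1}
    }
}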
Distributed programs also help alleviate subsystem bottlenecks. For instance, I/O devices, such as
disks and network interface cards, typically represent major bottlenecks in terms of bandwidth,
performance, and/or throughput. By distributing work across machines, data can be served from
multiple disks simultaneously, offering an increased aggregate I/O bandwidth, improving
performance, and maximizing throughput. In summary, distributed programs play a critical role in
rapidly solving various computing problems and effectively mitigating resource bottlenecks. This
action improves performance, increases throughput, and reduces cost, especially in the cloud.
1.4.5 Distributed Programming On The Cloud
A distributed system consists of networked computers that communicate and coordinate
their actions to solve a particular problem or offer a specific service. Because a cloud is defined as
a set of internet-based software, platform, and infrastructure services offered through a cluster (or
clusters) of networked computers (i.e., datacenters), a cloud is thus a distributed system. Another
consequence of our definition is that distributed programs (versus sequential or parallel programs)
will be the norm in clouds. In particular, we define distributed programs in this section as parallel
programs that run on separate processors at different machines. Thus, the only way for tasks in
distributed programs to interact over a distributed system is either by sending and receiving
messages explicitly or by reading and writing from/to a shared distributed memory supported by
the underlying distributed system (e.g., by using distributed shared memory [DSM] hardware
architecture). We next identify the different models by which distributed programs for clouds
(or cloud programs) can be built and recognize some of the challenges that cloud programs must
address.
• Two common cloud realities, virtual environments and the diversity of datacenter
components, introduce heterogeneity that complicates task scheduling and masks
hardware and software differences among cloud nodes.
• To avoid deadlocks and transitive closures and to guarantee mutually exclusive
access, which are highly desirable capabilities in distributed settings, the underlying
system must provide, and the designer must exploit, effective synchronization
mechanisms (a small lock-ordering sketch follows this list).
• As failure likelihood increases with cloud scale, system designs must employ fault-
tolerance mechanisms, including task resiliency, distributed checkpointing, and
message logging.
• For effective and efficient execution, task and job schedulers must support control of
task locality, parallelism, and elasticity as well as service-level objectives (SLOs).
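As a concrete taste of the synchronization point above, here is a small, self-contained Java sketch of one classic technique: acquiring locks in a fixed global order, so that two concurrent transfers between the same accounts can never deadlock. The Account class and transfer method are invented for the example.

import java.util.concurrent.locks.ReentrantLock;

public class LockOrderingSketch {
    static class Account {
        final int id;
        long balance;
        final ReentrantLock lock = new ReentrantLock();
        Account(int id, long balance) { this.id = id; this.balance = balance; }
    }

    static void transfer(Account from, Account to, long amount) {
        // Always lock the lower-id account first. Without this rule, two opposite
        // transfers could each hold one lock and wait forever for the other.
        Account first = from.id < to.id ? from : to;
        Account second = from.id < to.id ? to : from;
        first.lock.lock();
        try {
            second.lock.lock();
            try {
                from.balance -= amount;
                to.balance += amount;
            } finally { second.lock.unlock(); }
        } finally { first.lock.unlock(); }
    }

    public static void main(String[] args) {
        Account a = new Account(1, 100), b = new Account(2, 100);
        transfer(a, b, 30);
        transfer(b, a, 10);
        System.out.println(a.balance + " " + b.balance); // prints "80 120"
    }
}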
Addressing all of these development considerations and cloud issues imposes a major burden on
programmers. Designing, developing, verifying, and debugging all (or even some) of these
capabilities present inherently difficult problems and can introduce significant correctness and
performance challenges, in addition to consuming significant time and resources.
1.4.6 Programming the cloud
Modern distributed analytics engines promise to relieve developers of these responsibilities. These
engines provide application programming interfaces (APIs) that enable users to present their
programs as simple, sequential functions. The engines then automatically create, parallelize,
synchronize, and schedule tasks and jobs. They also handle failures without requiring user
involvement. At the end of this unit, we detail how distributed analytics engines effectively
abstract and address the challenges of developing cloud programs. In the next section, however,
we first present the two traditional distributed programming models: shared memory and message
passing. Second, we discuss the computation models that cloud programs can employ.
Specifically, we explain the synchronous and asynchronous computation models. Third, we
present the two main parallelism categories of cloud programs, data parallelism and graph
parallelism. Last, we describe the architectural models that cloud programs can typically utilize:
master-subordinate and peer-to-peer architectures.
1.5 Architectures Used in Distributed Computing
Various hardware and software architectures are used for distributed computing. At a lower
level, it is necessary to interconnect multiple CPUs with some sort of network, regardless of
whether that network is printed onto a circuit board or made up of loosely coupled devices and
cables. At a higher level, it is necessary to interconnect processes running on those CPUs with
some sort of communication system.
Distributed programming typically falls into one of several basic architectures: client–
server, three-tier, n-tier, or peer-to-peer; or categories: loose coupling, or tight coupling.
• Client–server: architectures where smart clients contact the server for data, then format
and display it to the users. Input at the client is committed back to the server when it
represents a permanent change. In this model, the client fetches data from the server
directly, then formats the data and renders it for the end user. To modify this data, end
users can directly submit their edits back to the server.
Consider, for example, a company like Amazon that stores customer information. When a
customer updates their address or phone number, the client sends the change to the server,
which updates the information in the database.
• Three-tier: architectures that move the client intelligence to a middle tier so that stateless
clients can be used. This simplifies application deployment. Most web applications are
three-tier.
This middle tier holds the client data, releasing the client from the burden of managing its
own information. The client typically accesses its data through a web application. This
reduces and automates much of the work of both the client application and the user.
For example, consider a cloud storage service with the ability to store your files and a
document editor. Such a storage solution can make your files available anywhere through
the internet, saving you from managing data on your local machine.
• n-tier: architectures that refer typically to web applications which further forward their
requests to other enterprise services. This type of application is the one most responsible
for the success of application servers. Enterprises need business logic to interact with
various backend data tiers and frontend presentation tiers. This logic sends requests to
multiple enterprise network services easily. That’s why large organizations prefer the n-
tier or multi-tier distributed computing model.
For example, an enterprise network with n-tiers that collaborate when a user publishes a
social media post to multiple platforms. The post itself goes from data tier to presentation
tier.
• Peer-to-peer: architectures where there are no special machines that provide a service or
manage the network resources. Instead all responsibilities are uniformly divided among
all machines, known as peers. Peers can serve both as clients and as servers. Examples of
this architecture include BitTorrent and the bitcoin network. Unlike the hierarchical client
and server model, this model comprises peers. Each peer can act as a client or server,
depending upon the request it is processing. These peers share their computing power,
decision-making power, and capabilities to work better in collaboration.
For example, blockchain nodes that collaboratively work to make decisions regarding
adding, deleting, and updating data in the network.
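A minimal sketch of the peer idea in Java is shown below: every node runs the same program and plays both the server role and the client role. The command-line arguments, port numbers, and one-line echo protocol are made up for the example.

import java.io.*;
import java.net.*;

public class Peer {
    public static void main(String[] args) throws Exception {
        int myPort = Integer.parseInt(args[0]);

        // Server role: answer every incoming message with an echo.
        new Thread(() -> {
            try (ServerSocket server = new ServerSocket(myPort)) {
                while (true) {
                    try (Socket s = server.accept();
                         BufferedReader in = new BufferedReader(new InputStreamReader(s.getInputStream()));
                         PrintWriter out = new PrintWriter(s.getOutputStream(), true)) {
                        out.println("peer " + myPort + " echoes: " + in.readLine());
                    }
                }
            } catch (IOException e) { e.printStackTrace(); }
        }).start();

        // Client role: if another peer's host and port were given, call it.
        if (args.length >= 3) {
            try (Socket s = new Socket(args[1], Integer.parseInt(args[2]));
                 PrintWriter out = new PrintWriter(s.getOutputStream(), true);
                 BufferedReader in = new BufferedReader(new InputStreamReader(s.getInputStream()))) {
                out.println("hello from " + myPort);
                System.out.println(in.readLine());
            }
        }
    }
}

One might start java Peer 5001 in one terminal and java Peer 5002 localhost 5001 in another; the second process serves on port 5002 while also acting as a client of the first, which is exactly the dual role peers play.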
Another aspect of distributed computing architecture is the method of communicating and
coordinating work among concurrent processes. Through various message-passing protocols,
processes may communicate directly with one another, typically in a master/slave relationship.
Alternatively, a "database-centric" architecture can enable distributed computing to be done
without any form of direct inter-process communication, by utilizing a shared database.
Database-centric architecture in particular provides relational processing analytics in a
schematic architecture allowing for live environment relay. This enables distributed computing
functions both within and beyond the parameters of a networked database.
1.7 Properties of a Distributed System
• Fault tolerance: the failure of a node in a distributed system doesn’t affect the overall
performance of the entire system.
• Scalability: a distributed system is flexible in terms of the number of nodes that participate
in the network. The system is capable of handling a growth or reduction in size.
• Resource sharing: the connection of nodes in a distributed system allows for the sharing
of resources. For example, instead of each computer system being connected to one printer,
a single printer can be shared by all the participating nodes in the system.
• Transparency: the ability of a distributed system to present itself as a single unit,
concealing the fact that its resources are physically separate and located in multiple
components.
1.8 Benefits and Problems With Distributed System
• Distributed systems are highly reliable, mainly because there are multiple computer
systems. Therefore, in the event of one component failing, the distributed system will still
function as normal.
• Distributed systems combine the processing capabilities of multiple nodes to render
maximum performance.
• Additional components can be added to the system due to the flexible nature of distributed
systems.
• The connection of multiple devices allows for the sharing of resources.
• It is generally cheaper to implement a distributed system in which multiple devices
connect and share resources than to maintain a single centralized system.
1.9 Differences Between Distributed Systems and Distributed Computing
Distributed system: a collection of independent computers that are connected by an
interconnection network.
Distributed computing: a method of computer processing in which different parts of a computer
program run on two or more computers that communicate with each other over a network.
Figure (a) is a schematic view of a typical distributed system; the system is represented as a
network topology in which each node is a computer and each line connecting the nodes is a
communication link.
Figure (b) shows the same distributed system in more detail: each computer has its own local
memory, and information can be exchanged only by passing messages from one node to another
using the available communication links.
1.10 Introduction To Parallel Computing
Figure (c) shows a parallel system in which each processor has direct access to a shared
memory.
There are multiple advantages to parallel computing. As there are multiple processors working
simultaneously, parallel computing increases CPU utilization and improves performance.
Moreover, failure in one processor does not affect the functionality of other processors.
Therefore, parallel computing provides reliability. On the other hand, adding processors is
costly. Furthermore, if one processor requires the results of another, it may have to wait for
them, which introduces latency.
1.11 Difference Between Parallel and Distributed Computing
a) Definition
Parallel computing is a form of computation in which many calculations are carried out
simultaneously within a single machine, whereas distributed computing runs different parts of a
program on multiple networked computers. Hence, this is the fundamental difference between
parallel and distributed computing.
b) Number of computers
The number of computers involved is a difference between parallel and distributed computing.
Parallel computing occurs in a single computer whereas distributed computing involves multiple
computers.
c) Functionality
In parallel computing, multiple processors execute multiple tasks at the same time. However, in
distributed computing, multiple computers perform tasks at the same time. Hence, this is another
difference between parallel and distributed computing.
d) Memory
Moreover, memory is a major difference between parallel and distributed computing. In parallel
computing, the computer can have a shared memory or distributed memory. In distributed
computing, each computer has its own memory.
e) Communication
Also, one other difference between parallel and distributed computing is the method of
communication. In parallel computing, the processors communicate with each other using a bus.
In distributed computing, computers communicate with each other via the network.
f) Usage
Parallel computing helps to increase the performance of the system. In contrast, distributed
computing allows scalability, sharing resources and helps to perform computation tasks efficiently.
So, this is also a difference between parallel and distributed computing.
1.12 How Does Distributed Computing Work?
In a distributed system, the nodes must detect and handle errors in connected components of the
network so that the network doesn't fail and stays fault-tolerant.
Advanced distributed systems have automated processes and APIs to help them perform better.
From the customization perspective, distributed clouds are a boon for businesses. Cloud service
providers can connect on-premises systems to the cloud computing stack so that enterprises can
transform their entire IT infrastructure without discarding old setups. Instead, they can extend
existing infrastructure through comparatively fewer modifications.
The cloud service provider controls the application upgrades, security, reliability, adherence to
standards, governance, and disaster recovery mechanism for the distributed infrastructure.
1.13 Module Summary
➢ The shared-memory model assumes a shared address space that is accessible by all
tasks. Tasks communicate with each other by reading and writing to this shared address
space, and these accesses must be explicitly synchronized (using constructs
such as barriers, semaphores, and locks). OpenMP is an example of a shared-memory
programming model.
➢ In the message-passing model, tasks do not share an address space and can only
communicate with each other by explicitly sending and receiving messages. MPI is an
example of a message-passing programming model.
➢ Programming models are also classified as synchronous and asynchronous, based on the
orchestration of the various tasks that are running in parallel. Synchronous programming
models force all component tasks to operate in lockstep mode, while asynchronous models
do not.
➢ Programs can also be classified according to the type of parallelism they embody: they
can be either data parallel or graph parallel.
➢ Data-parallel models focus on distributing the data over multiple machines while running
the same code on each. This type of model is also called the single-program, multiple-data
(SPMD) model.
➢ Graph parallelism models focus on distributing computation as opposed to data. This type
of model is also called the multiple-program, multiple-data (MPMD) model.
➢ Tasks in a distributed programming model can be arranged into two distinct architectural
models: asymmetric/master-subordinate and symmetric/peer-to-peer architectures.
➢ The master-subordinate organization requires one or more tasks to be specifically
designated as the master tasks, which will coordinate the execution of the program among
the subordinate tasks.
➢ The peer-to-peer organization consists of a set of tasks that are all equal but require more
complicated schemes to organize computation and make decisions.
➢ Major challenges in building cloud programs include managing scalability,
communication, heterogeneity, synchronization, fault tolerance, and scheduling.
➢ Programs cannot be sped up indefinitely, by virtue of Amdahl's law, which expresses the
limit on the speedup of a program as a function of the fraction of the program's time spent
executing code that is serial in nature (a worked statement of the law follows this list).
➢ Efficiently managing communication among distributed tasks dictates performance for
many applications. Strategies to improve communication bottlenecks in the cloud include
colocating highly communicating tasks and effectively managing the partitioning of data
to map data to nodes that are closest to it.
➢ Clouds bring heterogeneity in terms of the underlying physical hardware, which is typically
masked from the end user through virtualization. Programs running on the cloud that can
account for and adjust based on heterogeneous hardware can benefit in terms of
performance.
➢ Robust synchronization techniques are a must in distributed programming to deal with
issues such as deadlocks.
➢ Fault tolerance poses a serious challenge in programming for clouds. Programs must
anticipate and recover from software and hardware failures while running in the
cloud.
➢ Task and job scheduling techniques take into account the unique nature of cloud resources
in order to maximize performance.
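For reference, Amdahl's law can be stated as follows, where S is the fraction of the program's running time that is inherently serial and N is the number of processors (the statement and the numbers below are standard, supplied here for illustration):

\[
\text{Speedup}(N) = \frac{1}{S + \frac{1 - S}{N}}, \qquad
\lim_{N \to \infty} \text{Speedup}(N) = \frac{1}{S}.
\]

For example, with S = 0.25 (a quarter of the work is serial), four processors give a speedup of 1/(0.25 + 0.75/4) ≈ 2.29, and no number of processors can push the speedup beyond 1/0.25 = 4.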
1.14 TUTORIALS
1. What Are Distributed Systems?
2. How does a distributed system work?
3. What are the types of distributed systems?
4. Why would you design a system as a distributed system? List some advantages of
distributed systems.
5. List some disadvantages or problems of distributed systems that local-only systems do not
show (or at least not as strongly).
6. List three properties of distributed systems
7. Give a definition of middleware and show in a small diagram where it is positioned.
8. What is the transparency dogma in distributed systems middleware and what is wrong with
it?
9. What is the difference between Asynchronous and Parallel programming?
10. What is a Single-point-of-failure and how can distribution help here?
11. What kind of reliable connection is provided by a TCP/IP-based socket? Is this reliability
enough for distributed calls, or when does it break down?
12. What is the advantage if your server side processing uses threads instead of a single
process?
13. What is the problem behind keeping state for a client on a server?
14. What is a proxy? Give an example of where a proxy can be used.
15. What are the differences between a local call and a remote call?
16. What are stub and skeleton, and why are they needed in remote procedure calls?
17. How is a distributed system different from distributed computing?
18. What Is Distributed Debugging?
19. What is distributed system design?
20. What Are The Security Mechanisms Used In Distributed Computing?
21. What Is Meant By Client Server Communication?
22. Differentiate Between Synchronous And Asynchronous Communication?
23. List The Two Types Of Thread Scheduling And Explain Each.
24. What is the difference between networking and internetworking?
25. What is meant by distributed garbage collection?