
LECTURE NOTES ON COS 464 CONCURRENT PROGRAMMING
PART 1
by Dr (Mrs) Asogwa

TABLE OF CONTENTS

1. Introduction

2. Basic Concepts

3. Introduction to Concurrency and Parallel Computing

4. Parallel Architecture

5. Threads Analysis

6. Basic Software and Operating System

7. References

1. Introduction

This is an introductory lecture note on Concurrent Programming. It will lead the student to an understanding of the foundational concepts of concurrent programming. It contains the Introduction, Basic Concepts related to concurrent programming, Introduction to Concurrency and Parallel Computing, The Need for Parallel Computers, Parallel Architecture, Threads Analysis, and Basic Software and Operating System.
The reader is advised to get a firm grip of each discussion rather than stumble on without it. To make an 'A' in the course, the student is advised to attend classes, study the notes after each day's lecture, ask the lecturer questions whenever issues arise and obtain answers, do the associated assignments, and take the examination confidently without depending on 'microchips' (crib notes).
2. Basic Concepts
At this level, it is assumed that you are already familiar with concepts associated with
programming. The concepts below are described just to refresh your memory.
Program: A program is a set of ordered instructions given to a computer to follow in
order to execute a specified task.
Programming: Programming is the act of writing computer programs.
Sequential programming: The act of writing computer instructions (code) that are executed
one after another.
Structured programming: The act of using structured programming constructs (such as IF, WHILE, FOR, etc.) to write computer programs.

In this course, you will advance to writing programs whose parts are not limited to running
one after another but can run at the same time. Writing programs that run in this manner is
referred to as concurrent programming.

Concurrent: This means more than one thing happening or being done at the same time, such as
running two segments of code, printing and formatting, or browsing and downloading.

Transistor: This is a semiconductor device that acts as a gate or switch in electrical
equipment such as a computer; it controls the flow of current and can also amplify
electrical signals.

Si: silicon (Si), a nonmetallic chemical element in the carbon family. The name silicon
derives from the Latin silex or silicis, meaning “flint” or “hard stone.” Amorphous
elemental silicon was first isolated and described as an element in 1824 by Jöns Jacob
Berzelius, a Swedish chemist.

Multiprogramming: This is a way of running many programs at the same time in a uniprocessor.

Pipelining: a means of running a sequence of instructions such that one instruction starts
while another is still running or has not finished running. In other words, instructions
overlap in time. Formally, it is a form of computer organization in which successive steps of
an instruction sequence are executed in turn by a sequence of modules able to operate
concurrently, so that another instruction can begin before the previous one has finished.
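
As a quick illustration that is not part of the original notes, the sketch below (in Java) compares the number of cycles needed to run n instructions with and without pipelining, assuming an idealized k-stage pipeline with one-cycle stages and no stalls; the stage and instruction counts are invented for the example.

// Idealized pipeline timing: n instructions on a k-stage pipeline with
// one-cycle stages and no stalls, versus strictly sequential execution.
public class PipelineTiming {
    public static void main(String[] args) {
        int k = 5;    // pipeline stages (assumed)
        int n = 100;  // instructions (assumed)
        int sequentialCycles = n * k;        // each instruction occupies all k stages alone
        int pipelinedCycles  = k + (n - 1);  // fill the pipe once, then one instruction completes per cycle
        System.out.println("Sequential: " + sequentialCycles + " cycles");
        System.out.println("Pipelined : " + pipelinedCycles + " cycles");
        System.out.println("Speedup   : " + (double) sequentialCycles / pipelinedCycles);
    }
}

With these assumed numbers the pipelined machine needs 104 cycles instead of 500, which is the sense in which instructions overlap in time.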

Parallel programming/parallel computing: Parallel computing refers to the process of breaking
down a larger problem into smaller, independent, often similar parts that can be executed
simultaneously by multiple processors communicating via shared memory; the results are then
combined upon completion into an overall output.

It covers how to use more than one processor or computer to complete a task. Parallel
programming can also be done by dividing the problem or the data among different processors
and allowing them to exchange information. There are different tools and methods for parallel
programming, such as MPI, Pthreads, and OpenMP. Parallel programming is useful for solving
complex or large-scale problems that require high performance or efficiency.
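
To make the idea concrete, here is a minimal sketch of splitting data among workers and combining their results, written with plain Java threads on one machine rather than MPI, Pthreads or OpenMP; the array contents and the number of workers are assumed purely for illustration.

import java.util.Arrays;

// Each worker sums its own slice of the array; the partial sums are combined
// at the end, mirroring the decompose-compute-combine pattern described above.
public class ParallelSum {
    public static void main(String[] args) throws InterruptedException {
        int[] data = new int[1_000_000];
        Arrays.fill(data, 1);

        int workers = 4;                      // assumed number of processors/threads
        long[] partial = new long[workers];   // one slot per worker, so no sharing conflicts
        Thread[] threads = new Thread[workers];

        int chunk = data.length / workers;
        for (int w = 0; w < workers; w++) {
            final int id = w;
            final int from = w * chunk;
            final int to = (w == workers - 1) ? data.length : from + chunk;
            threads[w] = new Thread(() -> {
                long sum = 0;
                for (int i = from; i < to; i++) sum += data[i];
                partial[id] = sum;            // each worker writes only its own slot
            });
            threads[w].start();
        }
        for (Thread t : threads) t.join();    // wait for all workers to finish

        long total = 0;
        for (long p : partial) total += p;    // combine the partial results
        System.out.println("Total = " + total);
    }
}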

nm (nanometre): a metric unit of length equal to one thousand-millionth of a metre.

Nanocomputing: Nanocomputing is a term coined for the representation and manipulation of data
by computers that are smaller than a microcomputer.

3. Introduction to Concurrency and Parallel Computing

Concurrency, as a computing term, describes things happening at the same time on a computer
system. Whenever things happen at the same time, we are interested in which components are
doing what at the same time. As a result, two types of concurrency can be identified. The
first is quasi (false or pseudo) concurrency, in which the various peripheral devices and the
single processor of a uniprocessor computer system are doing things at the same time: while
the single processor executes a program, the peripheral devices are performing the next input
and output operations. It may also involve the various processing elements of the
uniprocessor computer performing their respective functions while the single processor
executes a program. This type of concurrency involves a uniprocessor computer system. The
second type of concurrency involves the multiple processors of a multiprocessor computer
system or parallel computer, each of which executes a part of a program at the same time.
True concurrency, therefore, relates to parallel computing: the act of using a parallel
computer, whose processors execute parts of a program concurrently or in parallel, with the
aim of solving a specific problem. This lecture focuses on true concurrency as it relates to
parallel computing.

3.1 The Rationale for True Concurrency and Parallel Computing


Some contemporary computer applications require high-speed or high-performance computers.
Allowing the multiple processors of a parallel computer to execute pieces of a program
concurrently or in parallel, with the aim of solving a particular problem, is intended to
increase the speed of the computer system.

In the past, many efforts have been made towards increasing the speed of computer systems,
such as multiprogramming, pipelining and so on. These efforts have yielded good results;
however, the most effective way of increasing the speed of computer systems is to use a
multiprocessor computer system or parallel computer and allow the various processors to
execute parts of a program that solves a specific problem concurrently or in parallel.
Increasing the speed of computer systems in this way is aimed at developing high-performance
computers.

Furthermore, Moore's law can be used to explain the rationale for true concurrent
programming. In 1965, Gordon Moore predicted that the number of transistors on computer chips
would double every eighteen months. A lot of computer scientists considered this law to be an
economic prediction rather than a physical one. Nevertheless, the effects of Moore's
law/prediction are as follows:
 The speed of computers will double every eighteen months, because increasing the number
of transistors on a chip increases the speed of the computer.
 The speed of computers will double as the number of transistors doubles every
eighteen months.
 The size of transistors will continue to shrink, to half of its original size, every
eighteen months.
Though Moore's law and its effects held for some decades, recent findings show that there is
a hardware limitation on the continued reduction in the size of transistors in the attempt to
increase the number of transistors on a chip and thereby increase its speed. This limit or
threshold point has now been reached; as a result, the size of transistors cannot be reduced
much further. The reason for this limit is that all semiconductor devices, such as
transistors, are Si based. It can be assumed that a circuit element will take up at least a
single Si atom, and the covalent bond in Si has a length of only a fraction of a nanometre
(roughly 0.2 nm), so the limit of miniaturization is at hand; at the moment that limit has
been reached. As a result of this limitation, further increases in the speed of computers can
no longer be realized simply by increasing the number of transistors. Computer scientists are
therefore looking into other avenues for increasing the speed of computers. Two such possible
ways are Parallel Computing and Nanocomputing. Parallel Computing is promising, as it has
recorded tremendous success, while Nanocomputing remains a technology for the future, about
two decades from now.

The diagrams below illustrate Moore’s Law and the various interpretations of Moore’s
Law.

[Diagrams: number of transistors plotted against year; speed plotted against number of transistors, showing the threshold point.]
At the Threshold point, any further increase in the number of transistors will reduce the
speed of the computer. This justifies the need to introduce more than one processor on the
computer system.

3.2 The Need for Concurrency


Concurrency is needed for the following reasons:
i. Concurrency results in the sharing of resources; it supports techniques such as
coordinating the execution of processes and memory allocation.
ii. It provides scheduling, which maximizes throughput.
iii. It creates a multiuser environment even though hardware resources are limited.
iv. It provides computation speedup through parallel execution.
v. It enables modularity, that is, it divides system functions by separating them into
processes.

Disadvantages of Concurrency
Though concurrency has the benefits listed above, it also has some disadvantages, such as:
i. It can lead to problems like deadlock (a small sketch follows this list).
ii. It can result in resource starvation.
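
The following is a minimal sketch of the deadlock problem mentioned in (i); the two locks, lockA and lockB, are invented for the example. Each thread acquires one lock and then waits forever for the lock held by the other, so neither thread ever finishes.

// Two threads each hold one lock and wait for the other's lock: a deadlock.
public class DeadlockDemo {
    private static final Object lockA = new Object();
    private static final Object lockB = new Object();

    public static void main(String[] args) {
        Thread t1 = new Thread(() -> {
            synchronized (lockA) {
                pause(100);                   // give t2 time to take lockB
                synchronized (lockB) {        // waits forever: t2 holds lockB
                    System.out.println("t1 done");
                }
            }
        });
        Thread t2 = new Thread(() -> {
            synchronized (lockB) {
                pause(100);                   // give t1 time to take lockA
                synchronized (lockA) {        // waits forever: t1 holds lockA
                    System.out.println("t2 done");
                }
            }
        });
        t1.start();
        t2.start();
    }

    private static void pause(long ms) {
        try { Thread.sleep(ms); } catch (InterruptedException ignored) { }
    }
}

One common remedy is to make every thread acquire the locks in the same order, which removes the circular wait.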

3.3 True Concurrency Requires Cooperation and Communication

Because each of the processors executes concurrently, or at the same time, with the aim of
solving a specific problem, they must cooperate by communicating with each other. Data must
be exchanged and communicated among the various processors. This means that the various
processors must be networked for the purpose of communication. Determining the most
efficient route for communication among the processors helps to minimize communication
overheads. The root of all successful human endeavours is cooperation through communication.
Any problem to be solved requires that the problem be decomposed, or split, and distributed
to the various processors. Each processor solves its own sub-problem and communicates the
result to the others, so that the results of the processed data can be collated with the aim
of producing the final result.

3.4 Illustrating with Concept of Division of Labour in Economics

The concept of division of labour in Economics requires that a task be split into various
parts and allocated to various workers. The workers perform some of the tasks at the same
time, while others wait until some have been completed. In the course of executing the work,
the workers cooperate and communicate with each other. The main advantage of this division of
labour is an increase in the level of production, or an increase in the amount of work done.
The same concept is used in parallel computing: the computing task is split into parts, and
the parts are allocated to the various processors of the parallel computer. Some of the
computing tasks are executed at the same time by the parallel processors, while some wait
until others have completed. The processors communicate and cooperate with each other by
exchanging data and synchronizing each other's activities. The result is an increase in
throughput. However, low productivity can be the result of poor communication facilities that
the workers use to communicate with each other. The same applies to parallel computing:
throughput can be low as a result of inefficient means of communication between the various
processors; therefore, the most efficient means of communication between the processors is
desired. One way of realizing efficient communication between the processors is to devise the
most efficient algorithm that can be used to route messages from one processor to another.
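
As a small illustration of what an efficient routing decision looks like, the sketch below assumes, purely for the example, that the processors are connected in a bidirectional ring (one of the configurations discussed later under interconnection networks) and picks whichever direction round the ring gives the fewer hops.

// On a bidirectional n-processor ring, route a message from src to dst
// in whichever direction needs fewer hops.
public class RingRouting {
    static int hops(int n, int src, int dst) {
        int oneWay   = Math.floorMod(dst - src, n);   // hops going round one way
        int otherWay = Math.floorMod(src - dst, n);   // hops going round the other way
        return Math.min(oneWay, otherWay);            // take the shorter direction
    }

    public static void main(String[] args) {
        // From processor 1 to processor 10 on a 12-processor ring:
        // 3 hops going backwards rather than 9 hops going forwards.
        System.out.println(hops(12, 1, 10));
    }
}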

3.5 Research Area under Parallel Computing

Parallel computing, which requires the use of a parallel computer to solve problems in a
reasonable time, has defined new research areas that add a parallel dimension to computing.
Whatever we can do with traditional computing on a uniprocessor computer has a parallel
dimension on a parallel computer system. As a result, the following research areas fall under
parallel computing: Parallel Architecture, Parallel Algorithms, Parallel/True Concurrent
Programming, Parallel Programming Languages, and Parallel Computer System Performance
Evaluation. Each of these broad research areas will be discussed briefly.

3.5.1 Parallel Architecture


This considers the various ways of arranging the components of a parallel computer, such as
the multiple processors, memory, communication network, etc., with the aim of realizing a
functional parallel computer.

3.5.2 Parallel Algorithm

This involves the analysis and development of the algorithms that will be used to write a
parallel or true concurrent program.

3.5.3 True Concurrent/Parallel Programming


This research area considers the use of parallel programming construct/control structures
to write programs that the parallel computer will execute.

3.5.4 Parallel Programming Languages

This research area includes the design of programming constructs that can be used to write a
parallel program or true concurrent program.

3.5.5 Parallel Computer System Performance Evaluation

This research area examines the performance of parallel computers with the aim of optimizing
the performance of the system.

4. PARALLEL ARCHITECTURE
4.0 Introduction
Writing a true concurrent program or a parallel program requires that we understand the
architecture of the computer system that will execute the program. Parallel architecture
refers to the various ways of organizing or arranging the components of a parallel computer
with the aim of enhancing the performance of the system. These components include the
following: memory, parallel processors, peripheral devices, and the interconnection network
connecting the various processors. This chapter uses Flynn's Taxonomy to examine the various
architectures of computers, including parallel computers.

4.1 Flynn’s Taxonomy


Speed as a characteristic of computers has led to an increase in the demand for
high-performance computers; as a result, there has been a technological leap in
single-processor design, from the vacuum-tube architectures of the 1940s to the current
single-processor RISC architectures. In 1972, Flynn classified the diverse architectures that
had been developed from the 1940s to that time into four broad areas. Flynn's classification
of the various computer architectures has been captioned 'FLYNN'S TAXONOMY', and it is based
on the number of streams of instructions and data that the computer executes at a time.
Flynn's taxonomy for classifying computer architecture follows:
i) SISD, Single Instruction, Single Data
ii) SIMD, Single Instruction, Multiple Data
iii) MISD, Multiple Instruction, Single Data
iv) MIMD, Multiple Instruction, Multiple Data
As I consider each of these classifications, I shall identify the various architectures
within each and discuss in detail those that can be used for concurrent programming.

4.2 SISD: SINGLE INSTRUCTION, SINGLE DATA


All the architectures within the Single Instruction, Single Data classification are
single-processor architectures. This is purely the John von Neumann architecture: it executes
a single instruction stream sequentially on a single data stream. However, temporal
parallelism can be introduced by allowing sequential execution to be overlapped in time by a
number of functional units within the processor; this is pipelining.
An example of an architecture within this classification is the CRAY-1 computer. Though a
single-processor architecture, it has multiple pipelines that allow both scalar and vector
operations to be performed in parallel. This helps to increase the performance of the
computer; as a result, many have regarded it as the first true supercomputer.

4.3 SIMD: SINGLE INSTRUCTION, MULTIPLE DATA


This classification includes all computers that execute a single instruction on multiple
streams of data. It requires multiple processors that execute the same instruction on
multiple data streams at the same time.
An example of an architecture within this classification is the array processor architecture.

4.4 MISD, MULTIPLE INSTRUCTIONS, SINGLE DATA


This classification means that multiple instructions are applied to a single stream of data.
An example of an architecture that comes close to this classification is the systolic array
architecture. However, no practical architecture falls squarely into the MISD classification,
so it is generally treated as an empty or theoretical category.
4.5 MIMD: MULTIPLE INSTRUCTIONS, MULTIPLE DATA


This classification allows each of the processors to apply its own instructions to its own
data, thus allowing multiple instructions to be executed on multiple data streams
simultaneously. Two broad MIMD architectures can be identified:
 Shared memory MIMD
 Distributed memory MIMD
Each of these architectures has its own method of communication between the processors.
Each of them will be considered in turn.

4.5.1 SHARED MEMORY, MIMD


In a shared memory MIMD architecture, all the processors access a shared memory; therefore,
the control that is used for communication and synchronization must allow exclusive access
when updating shared variables.
The diagram that follows illustrates this further:
[Diagram: MIMD processors connected through an interconnection network to a shared memory.]
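
The following is a minimal programming sketch of the "exclusive access to update shared variables" requirement, with Java threads standing in for the MIMD processors and a synchronized block acting as the control that serialises updates; the counter and the thread count are assumed for illustration.

// Several threads (the "processors") update one shared variable; the lock
// guarantees that only one of them updates it at a time.
public class SharedMemoryUpdate {
    private static long sharedCounter = 0;           // the variable in "shared memory"
    private static final Object lock = new Object();

    public static void main(String[] args) throws InterruptedException {
        Thread[] processors = new Thread[4];
        for (int p = 0; p < processors.length; p++) {
            processors[p] = new Thread(() -> {
                for (int i = 0; i < 100_000; i++) {
                    synchronized (lock) {             // exclusive access while updating
                        sharedCounter++;
                    }
                }
            });
            processors[p].start();
        }
        for (Thread t : processors) t.join();
        System.out.println(sharedCounter);            // always 400000 with the lock in place
    }
}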

4.5.2 DISTRIBUTED MEMORY, MIMD


In a distributed memory MIMD architecture, each of the multiple processors has its own
private memory. Communication between the processors is through an interconnection network.
Transputers are devices that combine memory and a processor on a single chip.
The diagram that follows illustrates this further:
[Diagram: MIMD processors, each with its own private memory, connected through an interconnection network.]
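
The sketch below imitates distributed memory MIMD inside a single program, an assumption made only for illustration: each "processor" keeps its data in local variables, its private memory, and a result is shared only by sending a message over a channel, modelled here with a Java BlockingQueue standing in for the interconnection network.

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// One worker computes in its private memory and sends the result as a message;
// the receiver never touches the worker's variables directly.
public class MessagePassingSketch {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Long> channel = new ArrayBlockingQueue<>(1);  // the "interconnection network"

        Thread worker = new Thread(() -> {
            long localSum = 0;                  // private memory of the worker
            for (int i = 1; i <= 1000; i++) localSum += i;
            try {
                channel.put(localSum);          // send the result as a message
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();

        long received = channel.take();         // the receiving "processor" waits for the message
        System.out.println("Received " + received);
        worker.join();
    }
}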

4.5.3 Hybrid MIMD


The two MIMD architectures can be combined to realize the benefits of both. In a distributed
memory MIMD architecture, although each of the processors has its own memory, the concept of
virtual memory can be used so that a processor can use the memory of another processor
whenever its own memory capacity is not sufficient. It is called virtual memory because the
processor behaves as though it is using its own memory while, in reality, it is using the
memory of another processor. Similarly, in a shared memory MIMD architecture, the common
memory of the parallel computer can be divided into memory modules, one module for each of
the processors, thereby allowing each processor to have its own memory while still sharing a
common memory; this is called interleaved memory MIMD.

4.5.4 INTERCONNECTION NETWORK


Different network configurations can be used to connect the various processors. The common
network configurations include the following:

 Fully Connected Network

[Diagram: 8-processor fully connected configuration, with every processor linked directly to every other processor.]


In a fully connected configuration, every processor is connected to every other processor.
Counting each direction of a bidirectional link separately, a fully connected n-processor
configuration has a maximum of n*(n-1) connections.
 Chain/Bus

0 1 2 3 4 5 6 7

8-processor chain
In a Chain/Bus configuration, the processors are arranged in a linear form. Each processor is
connected to two other processors, except the processors at the front and back of the chain.
Counting both directions, the maximum number of connections in an n-processor chain is
2*(n-1).
 Two-dimensional Mesh

0 1 2 3 4

5 6 7 8 9

10 11 12 13 14

15 16 17 18 19
20-processor mesh

This configuration is also known as an array. It consists of processors arranged in a grid.
As an array configuration, a two-dimensional mesh of n processors can be decomposed into i
rows and j columns of processors. Each row of the array configuration can be regarded as a
horizontal chain, while each column can be regarded as a vertical chain. A two-dimensional
mesh of i rows and j columns of processors has a total of i*j processors and, counting both
directions, a maximum of 2*(i-1)*j + 2*(j-1)*i connections.
 Ring

[Diagram: 12-processor ring, processors numbered 0 to 11 arranged in a circle, each connected to its two neighbours.]

A ring is one of the most common configurations. Like a chain/bus, the processors are
arranged in a linear form and each processor is connected to two other processors; in a
bidirectional ring, messages travel in both directions around the ring. Unlike a chain/bus,
the processors at the front and back ends of the chain are also connected to each other.
Therefore, for a bidirectional n-processor ring, the maximum number of connections is 2*n.

 Torus

0 1 2 3 4 5 6

7 8 9 10 11 12 13

14 15 16 17 18 19 20

21 22 23 24 25 26 27

28 29 30 31 32 33 34

35 36 37 38 39 40 41

42-processor torus
A torus configuration can be regarded as a two-dimensional mesh in which each row and each
column of the array of processors is a ring. Therefore, the torus configuration can be
regarded as a ring of rings of processors. A torus configuration with i rows and j columns of
processors has a total of i*j processors and, counting both directions, a total of
2*i*j + 2*j*i = 4*i*j connections.
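
The small helper below simply evaluates the connection-count formulas given in the preceding subsections, counting each direction of a bidirectional link separately as the notes do; the sizes used in main() match the example figures.

// Connection counts for the configurations above.
public class TopologyLinks {
    static int fullyConnected(int n) { return n * (n - 1); }
    static int chain(int n)          { return 2 * (n - 1); }
    static int ring(int n)           { return 2 * n; }
    static int mesh(int i, int j)    { return 2 * (i - 1) * j + 2 * (j - 1) * i; }
    static int torus(int i, int j)   { return 4 * i * j; }

    public static void main(String[] args) {
        System.out.println("8-processor fully connected: " + fullyConnected(8)); // 56
        System.out.println("8-processor chain          : " + chain(8));          // 14
        System.out.println("12-processor ring          : " + ring(12));          // 24
        System.out.println("4x5 mesh (20 processors)   : " + mesh(4, 5));        // 62
        System.out.println("6x7 torus (42 processors)  : " + torus(6, 7));       // 168
    }
}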

5. THREADS ANALYSIS

Threads are an important topic in concurrent programming and in operating systems. To be able
to write and run concurrent programs, one needs to understand what threads and processes are.
A thread is related to a process, but there is a distinct difference between them. A thread
is a basic unit of execution by the CPU. It is the smallest sequence of instructions that can
be managed independently by a scheduler (a scheduler is a program that arranges operations
into an appropriate sequence). A thread can also be defined as the smallest unit of execution
to which a processor allocates time. A thread is not a complete program; it can only run
within a program.

To understand this better, assume that there is a newspaper lying on top of a table for users
to read. Before anybody reads it, that newspaper is like a program that has not started
running. Then you come in, see the newspaper and pick it up to read. Starting to read a
section of the newspaper is like starting to run a part of a program. Note that you do not
read all the news in the newspaper at once; you may just pick the sports section on the
African Cup of Nations (AFCON) and start reading. The entire newspaper is likened to a
program, and the sports section is likened to a thread, which is a unit of execution cutting
through the code and the data structures of the program (that is, a thread of control that is
running in the program, or the portion of the program to which the CPU is allocating
attention at a particular time). Then your friend comes in, sees a section on politics in the
same newspaper and starts reading. Now you and your friend are reading different sections of
the same newspaper, and this is likened to running multiple threads of the same program. What
do you think will happen if the two of you want to read the same section at once? There will
be conflict, right? In the same way, if two or more threads want to update (read/write, etc.)
the same data structure at the same time, there will be conflicts such as race conditions and
deadlock; the operating system and the programmer therefore use schedulers and
synchronization to allocate resources appropriately and ensure that there is no conflict. A
small sketch of such a conflict follows.
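
Here is a minimal sketch of that conflict: two threads update the same shared counter with no coordination, so some increments are lost and the final value usually comes out well below the expected 200000. The loop counts are invented for the example.

// Two threads increment the same counter without synchronization; the
// read-modify-write of counter++ is not atomic, so updates are lost.
public class RaceConditionDemo {
    private static int counter = 0;            // the shared data structure

    public static void main(String[] args) throws InterruptedException {
        Runnable work = () -> {
            for (int i = 0; i < 100_000; i++) {
                counter++;                     // not atomic: read, add one, write back
            }
        };
        Thread first = new Thread(work);
        Thread second = new Thread(work);
        first.start();
        second.start();
        first.join();
        second.join();
        System.out.println("Expected 200000, got " + counter);
    }
}

Guarding the update with a lock, or using an atomic variable, removes the conflict.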

Components of a thread

A thread comprises

- A thread ID
- A program counter
- A set of registers and,
- A stack

A thread can share program code, data and files with other threads within a program, but each
thread must be individually assigned its own stack and set of registers, as the sketch below
illustrates.
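
The short sketch below illustrates these components in Java (the names and counts are assumed): both threads run the same code and can read the same shared field, but the variable named local lives on each thread's own stack, so each thread gets its own private copy of it.

// Two threads share the program code and the static field, but each has its
// own stack, so the local variable is private to each thread.
public class ThreadComponentsDemo {
    static int shared = 0;                     // data visible to every thread in the program

    public static void main(String[] args) throws InterruptedException {
        shared = 42;                           // written once before the threads start
        Runnable body = () -> {
            int local = 0;                     // lives on this thread's own stack
            for (int i = 0; i < 5; i++) local++;
            System.out.println(Thread.currentThread().getName()
                    + ": local=" + local + ", shared=" + shared);
        };
        Thread a = new Thread(body, "worker-A");
        Thread b = new Thread(body, "worker-B");
        a.start();
        b.start();
        a.join();
        b.join();
    }
}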

Advantages of Threads

i. Responsiveness: threads provide a responsive user interface, with different tasks
executing independently (for example, browsing a page while downloading your email); a
small sketch follows this list.
ii. Resource sharing: since threads share the resources of their process, there is no
need for heavyweight inter-process communication.
iii. More economical: there is no need to allocate separate resources, such as memory,
for each thread.
iv. Threads exploit multiprocessor architectures and hence increase speed.
v. Independent execution: if one thread fails, it need not affect the functioning of the
other threads.
vi. Threads allow a program to operate more efficiently by doing multiple things at the
same time.
vii. Threads can be used to perform complicated tasks in the background without
interrupting the main program.
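
As a small sketch of the responsiveness point in (i), the example below runs a long task, standing in for a download, in a background thread while the main thread carries on with other work; the timings are assumed for illustration.

// A background thread does the slow work while the main thread stays responsive.
public class BackgroundTaskDemo {
    public static void main(String[] args) throws InterruptedException {
        Thread download = new Thread(() -> {
            try {
                Thread.sleep(2000);                    // pretend to download for two seconds
                System.out.println("Download finished");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        download.start();

        for (int i = 1; i <= 4; i++) {                 // the main thread keeps working meanwhile
            System.out.println("Main thread still browsing... " + i);
            Thread.sleep(500);
        }
        download.join();
    }
}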

Process

A process is an instance of an executing program. A process to be executed must have code,
which contains the instructions; data, to be manipulated by the instructions; registers, for
temporary storage; and a stack, to keep track of the active procedure calls. Threads that are
in one process share the resources (such as code, data and files) of that process.

A process can also be defined as the combination of a program and all the states of the
threads executing in the program. The main difference between a process and a thread is that
a thread cannot run on its own, but runs within a program, while a process is a program in
execution. In the context of a browser program, for example, one thread may be downloading a
page while another allows the user to browse a page, another fetches a page from a remote
server, and so on.

Dissimilarities between a thread and a process

- Threads are dependent on one another since they share some resources, hence there may be
conflict; processes are independent of one another.
- Memory space is not protected between threads, since threads share memory, whereas
processes run in separate, protected address spaces.

References
[1] Nir Piterman, Introduction to Concurrent Programming, Chalmers University of Technology /
University of Gothenburg, SP1 2021/2022.
[2] J. L. Peterson and A. Silberschatz (1985), Operating System Concepts, Addison-Wesley
Publishing Company.
[3] Chien-Min Wang and Sheng-De Wang, Structured Partitioning of Concurrent Programs for
Execution on Multiprocessors, J. Parallel Computing.
[4] I. N. Herstein (1976), Topics in Algebra, John Wiley & Sons Inc.
[5] S. Aryan and B. Gaither (1998), Parallel Algorithm Development Workbench, Proc. of the
1998 ACM/IEEE Conference on Supercomputing.
[6] O. E. Oguike, Concurrent Programming Lecture Notes.
[7] Introduction to Parallel Computing: Lecture Notes from IGNOU, The People's University.
