02 - Introduction To Concurrent Systems Programming
◼ In the last lecture we introduced Real-Time systems and talked about
two different classifications of such systems:
◼ Hard vs Soft
◼ Event vs Time driven
◼ (see http://en.wikipedia.org/wiki/Concurrency_(computer_science) )
What do we mean by CPUs and Cores?
int main(void)
{
    // Create threads using CThread objects
    CThread t1(ChildThread1, ACTIVE, NULL);
    CThread t2(ChildThread2, ACTIVE, NULL);

    // Wait for the threads to terminate
    t1.WaitForThread();
    t2.WaitForThread();
}

Questions: How many threads does this program contain? What cores/CPUs do they run on?
Multi-Tasking: Pros and Cons
Advantages:
• Utilises CPU power – can harness all available cores, so software performance
grows with the number of cores and CPUs in the system
• Flexibility – system can be distributed across several servers, perhaps in different
countries
• Scalability – a single executable can be run multiple times as separate tasks.
Challenges:
• Decomposing/architecting system into appropriate concurrent tasks
• Communication between parallel tasks
• Synchronization between parallel tasks
• Debugging and testing (especially difficult)
Example: Multi-Tasking in a Distributed System
◼ Multi-tasking is used extensively in web-servers where the same program code to
handle a connection from a single remote client is run multiple times as a thread on the
server, one for each new client, leading to highly scalable solutions.
◼ Denial of service attacks attempt to crash the server by making hundreds of thousands
of connections per second and overwhelming the CPU/Memory resources of the server.
(see http://en.wikipedia.org/wiki/Denial-of-service_attack ).
[Diagram: three clients connect over the Internet to a server. Three server threads are
instantiated from the same code to handle the 3 concurrent clients; they share the same
code loaded into memory, but each has its own storage for variables, stack, etc.]
Another Example of Scalability
◼ An elevator system comprising 4 elevators.
◼ Instead of writing one big program to control all 4 elevators, we could just
write a single task to control 1 elevator and run 4 copies of the executable
with an elevator scheduler to handle and delegate floor requests to each.
[Diagram: floor requests go to an Elevator Scheduler task, which issues commands to the
elevators. One big main() controlling everything vs. four copies of the same small main()
executable, one per elevator.]
X = B² – 4AC
◼ We could easily write the solution for this using 1 line of C/C++ code .
◼ However, the resulting 1 line solution will run no faster on a quad-core CPU
than it would on a single core CPU.
◼ This is because neither the compiler, the CPU, nor the operating system has the
ability to automatically partition the problem into several smaller tasks that
can be executed in parallel.
◼ However, a programmer could theoretically architect the previous expression
into 3 much smaller tasks (implemented as threads on multiple cores): one
thread computing B², and a second thread computing 4AC.
Note: Because neither of these 2 expressions depends upon the outcome of the
other, these two threads could be executed in parallel (i.e. at the same time).
In fact an operating system could even allocate these new tasks to run on separate
CPUs and/or cores, at the same time.
◼ Finally we could write a 3rd thread to perform the ‘-’ (subtraction) i.e. take the
output of the previous two threads and subtract them.
Problems: Data Dependencies, Communication and Synchronisation
◼ Designing this 3rd thread however highlights the difficulties of parallel
programming.
◼ That 3rd thread is required to communicate with and synchronise itself to the
output of the two previous threads.
◼ The impact of this is that one thread must be designed to wait for the other
two threads to complete. This in turn limits the amount of parallelism in the
solution and thus limits the speed at which it can be calculated. Of course
pipelining the data and results might help.
◼ For example, you may have to throw away or modify some of those “classic”
algorithms that appear in text books and courses like CPSC 260/259 because
they have evolved to be the fastest solution for sequential execution on a
SINGLE core.
Parallel Programming Approaches
◼ It’s very common and frustrating to find that after spending many hours
attempting to create a parallel version of a sequential algorithm, it often
performs worse than the sequential solution.
◼ Problems that lend themselves well to parallel processing include
◼ MPEG/JPEG decoding,
◼ Image processing,
◼ Weather forecasting,
◼ Finite element analysis etc.
◼ Others, such as generating the Fibonacci series, do not. This is because the
formula for calculating the next element in the series is given by the equation
F(n) = F(n−1) + F(n−2).
◼ That is, each element in the series can only be calculated after the previous
two, i.e. sequentially; it is thus not possible to calculate the elements of the
series in parallel. For a good overview of parallel programming visit
https://computing.llnl.gov/tutorials/parallel_comp
Speedup and Parallel Programming

[Chart: speedup (0–7) plotted against the number of CPUs (1, 2, 4) and the number of
cores per CPU (1, 2, 4).]
Questions:
◼ Why is speedup not linear, i.e. when you double the numbers of cores why does speedup
not appear to double?
◼ Why might doubling the number of CPUs not always yield the same speedup as doubling
the number of cores?
◼ Can speedup ever be < 1 ?
Implementing Multi-tasking (concurrency)
◼ A cheap form of 'fake' multitasking can be implemented by the system through clever
programming designed to make it appear as if the system is executing several tasks
repeatedly and in parallel. There's no attempt to make anything faster.
◼ With a carefully crafted program decomposed into smaller tasks, each of which is brief
and can be executed over and over again, inside a loop, we can design a system that
gives the illusion of concurrency.
◼ The program below demonstrates this concept using a loop built into a program.
◼ Tasks are simulated via functions called repetitively within the loop.
int main(void)
{
    while(1) {
        Monitor_Temperature();
        Control_flow_rate();
        Update_Display();
    }
}

Tasks are 'simulated' with function calls which are invoked rapidly and repeatedly;
the infinite loop creates the illusion that all tasks run concurrently.
Advantages and Limitations of Pseudo-Multitasking Systems
◼ Simple learning curve: just a bunch of simple functions inside a forever loop.
◼ However, to preserve the illusion, no task/function may:
◼ Sit in its own internal loop consuming large amounts of CPU time, or
◼ Get involved in some operation that would cause it to delay its return,
such as waiting for input from a keyboard.
◼ If either of the 2 points above is violated, then the other tasks/functions in the
main loop are prevented from executing and the illusion of
concurrency is lost, so its use is limited.
◼ Problems with task protection. Any “task” that crashes may wipe out all the
other tasks.
[Diagram: tasks distributed across networked nodes (e.g. Node C, Node D) connected by a
network.]
◼ With a distributed system each task still has its own dedicated networked
CPU.
◼ A Real time system might require the use of a deterministic network based
on the concept of token passing or prioritised arbitration. Here a node can
transmit only when it has a token or has been given permission and can only
send one packet of information after which it has to release the token for
use by any other node. It can carry on transmitting only when it gets the
token/permission again.
◼ Here the delay in transmitting an arbitrary-sized message from one node to
another can be calculated: in the worst case a node must wait for every other
node to send one packet before regaining the token, so the delay is related to
the number of nodes on the network and the time taken to transmit one packet.
[Diagram: a message split into packets M, one packet sent per token rotation.]
◼ Advantages/Drawbacks
◼ Any CPU/core can run any task. For example, one CPU/core could start a
task, but have it completed by another.
◼ CPUs are easier to integrate under the control of one single host operating
system, e.g. Windows controlling the multiple CPUs and cores inside your
laptop/desktop computer.
◼ Clusters are typically used for:
◼ Weather forecasting,
◼ Web serving, where 1000's of clients
connect to a server every second
◼ http://en.wikipedia.org/wiki/Computer_cluster
◼ http://en.wikipedia.org/wiki/Grid_computing
◼ http://en.wikipedia.org/wiki/Blade_server
◼ https://en.wikipedia.org/wiki/Beowulf_cluster
[Photo: racks containing "blade servers", complete slide-in computers with
disk drives, all running under one operating system.]
Clustering used in Data Centers
◼ Clusters are often arranged into massive data centers to handle tens of thousands of
simultaneous user connections. The image below is Microsoft's data center in San
Antonio, Texas, which would typically house >100,000 blade servers.
◼ Microsoft acknowledges that it has over 1 million servers around the world (less than
Google, more than Amazon).