Parallel Programming With Pthreads
Michael Ibrahim
Lesson Preview
• What is a thread?
• How are threads different from processes?
• What data structures are used to implement and manage threads?
Visual Metaphor
A thread is like a … worker
A thread:
• Is an active entity
• Executing unit of a process
• Works simultaneously with others
• Many threads executing
A worker:
• Is an active entity
• Executing unit of a product order
• Works simultaneously with others
• Many workers completing product orders
• Requires coordination
• Sharing of tools, parts, workstations
Processes Threads
Concurrency Control & Coordination
Synchronization mechanisms:
• Mutual Exclusion
• Exclusive access for only one thread at a time
• Mutex
• Condition variable
• Waiting for a specific condition, established by other threads, before proceeding
• Waking up other threads from wait state
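As a minimal sketch of mutual exclusion in Pthreads, assume several threads increment one shared counter. The names counter, counter_lock, add_many and run_counter_demo, and the constants, are illustrative and do not come from the lecture.

```c
#include <assert.h>
#include <pthread.h>

#define N_WORKERS 4            /* illustrative values */
#define N_INCREMENTS 100000

static long counter = 0;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Each worker adds to the shared counter; the mutex makes every
   read-modify-write step exclusive to one thread at a time. */
static void *add_many(void *unused) {
    (void)unused;
    for (int i = 0; i < N_INCREMENTS; i++) {
        pthread_mutex_lock(&counter_lock);
        counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Starts the workers, waits for them, and returns the final count. */
long run_counter_demo(void) {
    pthread_t workers[N_WORKERS];
    counter = 0;
    for (int i = 0; i < N_WORKERS; i++) {
        int rc = pthread_create(&workers[i], NULL, add_many, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < N_WORKERS; i++)
        pthread_join(workers[i], NULL);
    return counter;
}
```

With the lock in place the result is always N_WORKERS * N_INCREMENTS; without it, increments from different threads can overwrite each other and the total may come out lower.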
Threads and Thread Creation
• Thread type
• Thread data structure
• Join (thread)
• Terminate a thread
Thread Creation Example
Mutual Exclusion
Making safe_insert safe
Producer/Consumer Example
• What if the process you wish to perform with mutual exclusion needs
to occur only under certain conditions?
Producer/Consumer Pseudocode
Condition Variable
Condition Variable API
• Condition type
• Wait(mutex, cond)
• Mutex is automatically released and re-acquired on wait
• Signal (cond)
• Notify only one thread waiting on condition
• Broadcast(cond)
• Notify all waiting threads
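In Pthreads the generic operations above correspond to pthread_cond_wait, pthread_cond_signal and pthread_cond_broadcast. The sketch below shows one way to use wait and broadcast, assuming a simple "ready" flag as the condition; the names m, ready_cond, ready, released, waiter and run_broadcast_demo are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

#define N_WAITERS 3                       /* illustrative */

static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t ready_cond = PTHREAD_COND_INITIALIZER;
static int ready = 0;                     /* the condition being waited on */
static int released = 0;                  /* waiters that got past the wait */

static void *waiter(void *unused) {
    (void)unused;
    pthread_mutex_lock(&m);
    while (!ready)                        /* re-check after every wake-up */
        pthread_cond_wait(&ready_cond, &m);   /* Wait(mutex, cond) */
    released++;                           /* the mutex is held again here */
    pthread_mutex_unlock(&m);
    return NULL;
}

/* Starts the waiters, releases them all at once, returns how many ran. */
int run_broadcast_demo(void) {
    pthread_t t[N_WAITERS];
    ready = 0;
    released = 0;
    for (int i = 0; i < N_WAITERS; i++) {
        int rc = pthread_create(&t[i], NULL, waiter, NULL);
        assert(rc == 0);
        (void)rc;
    }
    pthread_mutex_lock(&m);
    ready = 1;
    pthread_cond_broadcast(&ready_cond);  /* Broadcast(cond): wake all waiters */
    pthread_mutex_unlock(&m);
    for (int i = 0; i < N_WAITERS; i++)
        pthread_join(t[i], NULL);
    return released;
}
```

Replacing the broadcast with pthread_cond_signal would wake only one of the threads that are already waiting, which is why broadcast is the safer choice when every waiter should proceed.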
Condition Variable Quiz
Recall the consumer code from the previous condition variable example.
Instead of ‘while’, why did we not simply use ‘if’?
• ‘While’ can support multiple consumer threads
• Cannot guarantee access to m once the condition is signaled
• The list can change before the consumer gets access again
• All of the above
Avoiding Common Mistakes
• Keep track of the mutex/condition variable used with each resource
• E.g., mutex_type m1; // mutex for var1
• Check that you are always (and correctly) using lock & unlock
• E.g., did you forget to lock/unlock?
• Use a single mutex to access a single resource
• With different mutexes guarding the same resource, operations will occur concurrently on the shared resource
• Check that you are signaling the correct condition
• Check that you are not using signal when broadcast is needed
• With signal, only one thread will proceed … remaining threads will continue to wait
Spurious Wake-Ups
• They usually happen because, in between the time when the condition
variable was signaled and when the waiting thread finally ran, another
thread ran and changed the condition.
Deadlocks
Interrupts vs. Signals
Interrupts:
• Events generated externally by components other than the CPU
(I/O devices, timers, other CPUs)
• Determined based on the physical platform
• Appear asynchronously
Signals:
• Events triggered by the CPU & software running on it
• Determined based on the operating system
• Appear synchronously or asynchronously
Interrupts
• Functionalities
– Thread management, e.g. creation and joining
– Thread synchronization primitives
• Mutex
• Condition variables
• Reader/writer locks
• Pthread barrier
– Thread-specific data
• Build with gcc -pthread (or link with -lpthread)
Thread Creation
• Initially, main() program comprises a single, default thread
– All other threads must be explicitly created
int pthread_create(
pthread_t *thread,
const pthread_attr_t *attr,
void *(*start_routine)(void *),
void * arg);
• thread: An opaque, unique identifier for the new thread returned by the subroutine
• attr: An opaque attribute object that may be used to set thread attributes
You can specify a thread attributes object, or NULL for the default values
• start_routine: the C routine that the thread will execute once it is created
• arg: A single argument that may be passed to start_routine. It must be passed by
reference as a pointer cast to void *. NULL may be used if no argument is to be
passed.
Opaque object: A letter is an opaque object to the mailman; only the sender and
receiver know its contents.
Thread Creation
• pthread_create creates a new thread and makes it
runnable, i.e. in principle it can start executing immediately
– can be called any number of times from anywhere within your code
• Once created, threads are peers, and may create other threads
• There is no implied hierarchy or dependency between threads
Example 1: pthread_create
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#define NUM_THREADS 5

/* Thread start routine: prints the id that main passed by value through arg. */
void *PrintHello(void *thread_id) {
    long tid = (long)thread_id;
    printf("Hello World! It's me, thread #%ld!\n", tid);
    pthread_exit(NULL);
}

int main(int argc, char *argv[]) {
    pthread_t threads[NUM_THREADS];
    long t;
    for (t = 0; t < NUM_THREADS; t++) {
        printf("In main: creating thread %ld\n", t);
        int rc = pthread_create(&threads[t], NULL, PrintHello, (void *)t);
        if (rc) {
            printf("ERROR; return code from pthread_create() is %d\n", rc);
            exit(-1);
        }
    }
    /* Exiting main with pthread_exit lets the created threads keep running. */
    pthread_exit(NULL);
}

One possible output:
In main: creating thread 0
In main: creating thread 1
In main: creating thread 2
In main: creating thread 3
Hello World! It's me, thread #0!
In main: creating thread 4
Hello World! It's me, thread #1!
Hello World! It's me, thread #3!
Hello World! It's me, thread #2!
Hello World! It's me, thread #4!
Terminating Threads
• pthread_exit is used to explicitly exit a thread
– Called after a thread has completed its work and is no longer
required to exist
• If main() finishes before the threads it has created
– If main() exits with pthread_exit(), the other threads will continue to
execute
– Otherwise, they will be automatically terminated when
main() finishes
• The programmer may optionally specify a termination
status, which is stored as a void pointer for any thread that
may join the calling thread
• Cleanup: the pthread_exit() routine does not close
files
– Any files opened inside the thread will remain open after the thread
is terminated
Thread Attribute
int pthread_create(
pthread_t *thread,
const pthread_attr_t *attr,
void *(*start_routine)(void *),
void * arg);
• Make sure that all passed data is thread safe (no data races)
– Either it cannot be changed by other threads
– Or it is changed only in a deterministic way
• Thread coordination
Example 2: Argument Passing
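#include <pthread.h>
#include <stdio.h>
#include <unistd.h>
#define NUM_THREADS 8

struct thread_data {
    int thread_id;
    char *message;
};

/* One argument record per thread, so no two threads share a record. */
struct thread_data thread_data_array[NUM_THREADS];

void *PrintHello(void *threadarg) {
    int taskid;
    char *hello_msg;
    sleep(1);
    struct thread_data *my_data = (struct thread_data *) threadarg;
    taskid = my_data->thread_id;
    hello_msg = my_data->message;
    printf("Thread %d: %s\n", taskid, hello_msg);
    pthread_exit(NULL);
}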
Example 2: Argument Passing
int main(int argc, char *argv[]) {
    pthread_t threads[NUM_THREADS];
    int t;
    char *messages[NUM_THREADS];
    messages[0] = "English: Hello World!";
    messages[1] = "French: Bonjour, le monde!";
    messages[2] = "Spanish: Hola al mundo";
    messages[3] = "Klingon: Nuq neH!";
    messages[4] = "German: Guten Tag, Welt!";
    messages[5] = "Russian: Zdravstvytye, mir!";
    messages[6] = "Japan: Sekai e konnichiwa!";
    messages[7] = "Latin: Orbis, te saluto!";
    for (t = 0; t < NUM_THREADS; t++) {
        /* Each thread gets a pointer to its own record. */
        struct thread_data *thread_arg = &thread_data_array[t];
        thread_arg->thread_id = t;
        thread_arg->message = messages[t];
        pthread_create(&threads[t], NULL, PrintHello, (void *) thread_arg);
    }
    pthread_exit(NULL);
}

One possible output:
Thread 3: Klingon: Nuq neH!
Thread 0: English: Hello World!
Thread 1: French: Bonjour, le monde!
Thread 2: Spanish: Hola al mundo
Thread 5: Russian: Zdravstvytye, mir!
Thread 4: German: Guten Tag, Welt!
Thread 6: Japan: Sekai e konnichiwa!
Thread 7: Latin: Orbis, te saluto!
Wait for Thread Termination
Suspend execution of the calling thread until the target thread terminates
#include <pthread.h>
int pthread_join(
pthread_t thread,
void **value_ptr);
• thread: the thread to wait for
• value_ptr: pointer to a location that receives the value the terminating thread
passed to pthread_exit
• Recommendation
– Be careful if your application uses libraries or other objects that
don't explicitly guarantee thread-safeness.
– When in doubt, assume that they are not thread-safe until
proven otherwise
– This can be done by "serializing" the calls to the uncertain
routine, etc.
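The two ideas on this slide, joining a thread to collect its termination status and serializing calls to a routine that may not be thread-safe, can be sketched together. legacy_next_id is a hypothetical stand-in for such a routine; legacy_lock, worker and run_join_demo are likewise illustrative names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define N_THREADS 4                       /* illustrative */

/* Hypothetical routine that is NOT thread-safe: it updates hidden static state. */
static int legacy_next_id(void) {
    static int last = 0;
    return ++last;
}

static pthread_mutex_t legacy_lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *unused) {
    (void)unused;
    pthread_mutex_lock(&legacy_lock);     /* serialize calls to the routine */
    int id = legacy_next_id();
    pthread_mutex_unlock(&legacy_lock);
    pthread_exit((void *)(intptr_t)id);   /* termination status for the joiner */
}

/* Creates the workers, joins each one, and returns the sum of their ids. */
int run_join_demo(void) {
    pthread_t t[N_THREADS];
    int sum = 0;
    for (int i = 0; i < N_THREADS; i++) {
        int rc = pthread_create(&t[i], NULL, worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < N_THREADS; i++) {
        void *status;
        pthread_join(t[i], &status);      /* blocks until t[i] has terminated */
        sum += (int)(intptr_t)status;
    }
    return sum;
}
```

On the first call the four threads receive the ids 1 to 4 in some order, so the sum is 10 regardless of scheduling. Passing a small integer through the void pointer avoids returning the address of a local variable, which would be invalid after the thread exits.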
Why PThreads (not processes)?
• The primary motivation
– To realize potential program performance gains
• Compared to the cost of creating and managing a process
– A thread can be created with much less OS overhead
• Managing threads requires fewer system resources than
managing processes
• All threads within a process share the same address space
• Inter-thread communication is more efficient and, in many cases,
easier to use than inter-process communication
pthread_create vs fork
• Timing results for the fork() subroutine and the
pthread_create() subroutine
– Timings reflect 50,000 process/thread creations
– units are in seconds
– no optimization flags
Why pthreads
• Potential performance gains and practical advantages over non-
threaded applications:
– Overlapping CPU work with I/O
• For example, a program may have sections where it is performing a long
I/O operation
• While one thread is waiting for an I/O system call to complete, CPU
intensive work can be performed by other threads.
• Asynchronous event handling
– Tasks which service events of indeterminate frequency and duration can be
interleaved
– For example, a web server can both transfer data from previous requests
and manage the arrival of new requests.
AXPY with PThreads
• y = α·x + y
– x and y are vectors of size N
• In C, x[N], y[N]
– α is scalar
• Decomposition and mapping to pthreads
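One possible decomposition, sketched here under the assumption of a simple block distribution, gives each thread a contiguous slice of the index range. The struct range, axpy_chunk and run_axpy_demo names, the vector length and the thread count are illustrative choices, not taken from the lecture.

```c
#include <assert.h>
#include <pthread.h>

#define N 1000                 /* vector length (illustrative) */
#define NUM_THREADS 4

static double x[N], y[N];
static double alpha = 2.0;

struct range { int begin, end; };     /* half-open index range [begin, end) */

/* Each thread updates only its own slice of y, so no locking is needed. */
static void *axpy_chunk(void *arg) {
    struct range *r = (struct range *)arg;
    for (int i = r->begin; i < r->end; i++)
        y[i] = alpha * x[i] + y[i];
    return NULL;
}

/* Fills x and y with test data, runs the threaded AXPY, returns y[N-1]. */
double run_axpy_demo(void) {
    pthread_t threads[NUM_THREADS];
    struct range ranges[NUM_THREADS];
    for (int i = 0; i < N; i++) {
        x[i] = i;
        y[i] = 1.0;
    }
    int chunk = (N + NUM_THREADS - 1) / NUM_THREADS;   /* ceiling division */
    for (int t = 0; t < NUM_THREADS; t++) {
        ranges[t].begin = t * chunk;
        ranges[t].end = (t + 1) * chunk < N ? (t + 1) * chunk : N;
        int rc = pthread_create(&threads[t], NULL, axpy_chunk, &ranges[t]);
        assert(rc == 0);
        (void)rc;
    }
    for (int t = 0; t < NUM_THREADS; t++)
        pthread_join(threads[t], NULL);
    return y[N - 1];
}
```

Because the slices do not overlap, the only coordination needed is the final join. Each thread receives a pointer to its own range record, which follows the argument-passing rule from Example 2.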
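/* Shared state for the producer-consumer example */
pthread_cond_t cond_queue_empty, cond_queue_full;
pthread_mutex_t task_queue_cond_lock;
int task_available;

int main(void) {
    /* declarations and initializations */
    task_available = 0;
    pthread_cond_init(&cond_queue_empty, NULL);
    pthread_cond_init(&cond_queue_full, NULL);
    pthread_mutex_init(&task_queue_cond_lock, NULL);
    /* create and join producer and consumer threads */
}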
• Two conditions:
• Queue is full: (task_available == 1) cond_queue_full
• Queue is empty: (task_available == 0) cond_queue_empty
• A mutex protecting access to the queue (the critical section, CS): task_queue_cond_lock
Producer-Consumer Using Condition Variables
void *producer(void *producer_thread_data) {
    while (!done()) {
        create_task();
        pthread_mutex_lock(&task_queue_cond_lock);
        /* 1: wait until the queue is empty; the wait releases the mutex
              while blocked and re-acquires it when awakened */
        while (task_available == 1)
            pthread_cond_wait(&cond_queue_empty,
                              &task_queue_cond_lock);
        /* 2: critical section (CS) */
        insert_into_queue();
        task_available = 1;
        pthread_mutex_unlock(&task_queue_cond_lock);
        /* 3: tell the consumer that a task is available */
        pthread_cond_signal(&cond_queue_full);
    }
    return NULL;
}
Producer:
1. Wait for queue to become empty, notified by consumer through cond_queue_empty
2. Insert task into queue
3. Signal consumer through cond_queue_full
Producer-Consumer Using Condition Variables
void *consumer(void *consumer_thread_data) {
    while (!done()) {
        pthread_mutex_lock(&task_queue_cond_lock);
        /* 1: wait until a task is available; the wait releases the mutex
              while blocked and re-acquires it when awakened */
        while (task_available == 0)
            pthread_cond_wait(&cond_queue_full,
                              &task_queue_cond_lock);
        /* 2: critical section (CS) */
        my_task = extract_from_queue();
        task_available = 0;
        pthread_mutex_unlock(&task_queue_cond_lock);
        /* 3: tell the producer that the queue is empty again */
        pthread_cond_signal(&cond_queue_empty);
        process_task(my_task);
    }
    return NULL;
}
Consumer:
1. Wait for queue to become full, notified by producer through cond_queue_full
2. Extract task from queue
3. Signal producer through cond_queue_empty
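The producer and consumer steps can be put together into a small self-contained program. This is a sketch under simplifying assumptions: the queue holds a single integer (slot), tasks are the numbers 1 to N_TASKS, and "processing" a task just adds it to a sum. The synchronization names match the slides; slot, N_TASKS and run_producer_consumer_demo are illustrative.

```c
#include <assert.h>
#include <pthread.h>

#define N_TASKS 100                        /* illustrative */

static pthread_mutex_t task_queue_cond_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond_queue_empty = PTHREAD_COND_INITIALIZER;
static pthread_cond_t cond_queue_full = PTHREAD_COND_INITIALIZER;
static int task_available = 0;
static int slot;                           /* the one-element "queue" */

static void *producer(void *unused) {
    (void)unused;
    for (int task = 1; task <= N_TASKS; task++) {
        pthread_mutex_lock(&task_queue_cond_lock);
        while (task_available == 1)        /* 1: wait for an empty queue */
            pthread_cond_wait(&cond_queue_empty, &task_queue_cond_lock);
        slot = task;                       /* 2: insert into queue */
        task_available = 1;
        pthread_mutex_unlock(&task_queue_cond_lock);
        pthread_cond_signal(&cond_queue_full);     /* 3: signal consumer */
    }
    return NULL;
}

static void *consumer(void *sum_out) {
    long sum = 0;
    for (int n = 0; n < N_TASKS; n++) {
        pthread_mutex_lock(&task_queue_cond_lock);
        while (task_available == 0)        /* 1: wait for a full queue */
            pthread_cond_wait(&cond_queue_full, &task_queue_cond_lock);
        int my_task = slot;                /* 2: extract task from queue */
        task_available = 0;
        pthread_mutex_unlock(&task_queue_cond_lock);
        pthread_cond_signal(&cond_queue_empty);    /* 3: signal producer */
        sum += my_task;                    /* process the task outside the lock */
    }
    *(long *)sum_out = sum;
    return NULL;
}

/* Runs one producer and one consumer; returns the sum of all tasks consumed. */
long run_producer_consumer_demo(void) {
    pthread_t p, c;
    long sum = 0;
    task_available = 0;
    int rc = pthread_create(&p, NULL, producer, NULL);
    assert(rc == 0);
    rc = pthread_create(&c, NULL, consumer, &sum);
    assert(rc == 0);
    (void)rc;
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return sum;
}
```

Every task is consumed exactly once, so the returned sum is 1 + 2 + ... + 100 = 5050. The task is processed after the mutex is released, which keeps the critical section short.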
Thread and Synchronization Attributes
• Three major objects
– pthread_t
– pthread_mutex_t
– pthread_cond_t
• Default attributes when being created/initialized
– NULL
• An attributes object is a data-structure that describes entity
(thread, mutex, condition variable) properties.
– Once these properties are set, the attributes object can be
passed to the method initializing the entity.
– Enhances modularity, readability, and ease of modification.
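The attribute pattern described above (initialize an attributes object, set properties, pass it to the creating or initializing call) might look as follows for a thread and a mutex. The particular properties chosen here, a joinable thread and an error-checking mutex, are only examples; noop and run_attr_demo are hypothetical names.

```c
#define _XOPEN_SOURCE 700   /* exposes PTHREAD_MUTEX_ERRORCHECK under strict -std= modes */
#include <assert.h>
#include <pthread.h>

static void *noop(void *unused) {
    (void)unused;
    return NULL;
}

/* Creates a thread and a mutex through explicit attributes objects.
   Returns 0 if both behave as configured. */
int run_attr_demo(void) {
    pthread_attr_t attr;
    pthread_mutexattr_t mattr;
    pthread_mutex_t lock;
    pthread_t tid;
    int state = -1;

    /* Thread attributes: set properties first, then pass to pthread_create. */
    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
    pthread_attr_getdetachstate(&attr, &state);
    int rc = pthread_create(&tid, &attr, noop, NULL);
    assert(rc == 0);
    pthread_attr_destroy(&attr);       /* the running thread is not affected */
    pthread_join(tid, NULL);

    /* Mutex attributes: same pattern with pthread_mutex_init. */
    pthread_mutexattr_init(&mattr);
    pthread_mutexattr_settype(&mattr, PTHREAD_MUTEX_ERRORCHECK);
    rc = pthread_mutex_init(&lock, &mattr);
    assert(rc == 0);
    (void)rc;
    pthread_mutexattr_destroy(&mattr);

    pthread_mutex_lock(&lock);
    /* An error-checking mutex reports a relock by its owner instead of deadlocking. */
    int relock = pthread_mutex_lock(&lock);
    pthread_mutex_unlock(&lock);
    pthread_mutex_destroy(&lock);

    return (state == PTHREAD_CREATE_JOINABLE && relock != 0) ? 0 : 1;
}
```

Passing NULL instead of &attr or &mattr selects the defaults, which is what the earlier examples do.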
Composite Synchronization Constructs
• Pthread Mutex and Condition Variables are two basic sync
operations.
• Higher level constructs can be built using basic constructs.
– Read-write locks
– Barriers
    /* 2: count this reader and grant the read lock */
    l->readers++;
    pthread_mutex_unlock(&(l->read_write_lock));
}
Reader lock:
1. If there is a write lock or pending writers, perform a condition wait
2. Otherwise, increment the count of readers and grant the read lock
Read-Write Locks
void mylib_rwlock_wlock(mylib_rwlock_t *l) {
    pthread_mutex_lock(&(l->read_write_lock));
    l->pending_writers++;
    /* 1: wait while a writer or any readers hold the lock */
    while ((l->writer > 0) || (l->readers > 0)) {
        pthread_cond_wait(&(l->writer_proceed),
                          &(l->read_write_lock));
    }
    /* 2: the lock is free; take it as the writer */
    l->pending_writers--;
    l->writer++;
    pthread_mutex_unlock(&(l->read_write_lock));
}
Writer lock:
1. If there are readers or writers, increment pending writers
count and wait.
2. On being woken, decrement pending writers count and
increment writer count
Read-Write Locks
void mylib_rwlock_unlock(mylib_rwlock_t *l) {
    pthread_mutex_lock(&(l->read_write_lock));
    if (l->writer > 0)          /* 1: only writer */
        l->writer = 0;
    else if (l->readers > 0)    /* 2: only reader */
        l->readers--;
    pthread_mutex_unlock(&(l->read_write_lock));
    /* 3, 4: notification of a pending writer or of waiting readers goes here */
}
Reader/Writer unlock:
1. If there is a write lock then unlock
2. If there are read locks, decrement count of read locks.
3. If the read count becomes 0 and there is a pending writer, notify writer
4. Otherwise if there are pending readers, let them all go through
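The reader lock, writer lock and unlock steps can be combined into one compilable sketch. Several parts are assumptions rather than the lecture's code: the struct layout, the mylib_rwlock_init function, the condition variable name readers_proceed, and the choice to issue the notifications while still holding the mutex. The small driver (writer_thread, reader_thread, run_rwlock_demo) is purely illustrative.

```c
#include <assert.h>
#include <pthread.h>

typedef struct {
    int readers;                      /* active readers */
    int writer;                       /* 1 while a writer holds the lock */
    int pending_writers;              /* writers blocked in wlock */
    pthread_cond_t readers_proceed;   /* assumed name, not from the slides */
    pthread_cond_t writer_proceed;
    pthread_mutex_t read_write_lock;
} mylib_rwlock_t;

void mylib_rwlock_init(mylib_rwlock_t *l) {
    l->readers = l->writer = l->pending_writers = 0;
    pthread_mutex_init(&l->read_write_lock, NULL);
    pthread_cond_init(&l->readers_proceed, NULL);
    pthread_cond_init(&l->writer_proceed, NULL);
}

void mylib_rwlock_rlock(mylib_rwlock_t *l) {
    pthread_mutex_lock(&l->read_write_lock);
    while (l->pending_writers > 0 || l->writer > 0)
        pthread_cond_wait(&l->readers_proceed, &l->read_write_lock);
    l->readers++;
    pthread_mutex_unlock(&l->read_write_lock);
}

void mylib_rwlock_wlock(mylib_rwlock_t *l) {
    pthread_mutex_lock(&l->read_write_lock);
    l->pending_writers++;
    while (l->writer > 0 || l->readers > 0)
        pthread_cond_wait(&l->writer_proceed, &l->read_write_lock);
    l->pending_writers--;
    l->writer++;
    pthread_mutex_unlock(&l->read_write_lock);
}

void mylib_rwlock_unlock(mylib_rwlock_t *l) {
    pthread_mutex_lock(&l->read_write_lock);
    if (l->writer > 0)
        l->writer = 0;                                /* step 1 */
    else if (l->readers > 0)
        l->readers--;                                 /* step 2 */
    if (l->readers == 0 && l->pending_writers > 0)
        pthread_cond_signal(&l->writer_proceed);      /* step 3 */
    else if (l->pending_writers == 0)
        pthread_cond_broadcast(&l->readers_proceed);  /* step 4 */
    pthread_mutex_unlock(&l->read_write_lock);
}

/* Illustrative driver: two writers increment, two readers only look. */
static mylib_rwlock_t rw;
static int shared_value = 0;

static void *writer_thread(void *unused) {
    (void)unused;
    for (int i = 0; i < 1000; i++) {
        mylib_rwlock_wlock(&rw);
        shared_value++;
        mylib_rwlock_unlock(&rw);
    }
    return NULL;
}

static void *reader_thread(void *unused) {
    (void)unused;
    for (int i = 0; i < 1000; i++) {
        mylib_rwlock_rlock(&rw);
        int v = shared_value;         /* read-only access */
        (void)v;
        mylib_rwlock_unlock(&rw);
    }
    return NULL;
}

int run_rwlock_demo(void) {
    pthread_t t[4];
    mylib_rwlock_init(&rw);
    shared_value = 0;
    pthread_create(&t[0], NULL, writer_thread, NULL);
    pthread_create(&t[1], NULL, writer_thread, NULL);
    pthread_create(&t[2], NULL, reader_thread, NULL);
    pthread_create(&t[3], NULL, reader_thread, NULL);
    for (int i = 0; i < 4; i++)
        pthread_join(t[i], NULL);
    return shared_value;
}
```

This variant favors writers: new readers wait as soon as a writer is pending, so a steady stream of readers cannot starve a writer. Both writers complete all of their increments, so the driver returns 2000.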
Barrier
• A barrier holds one or multiple threads until all
threads participating in the barrier have reached the
barrier point
Barrier
• Needs a counter, a mutex and a condition variable
– The counter keeps track of the number of threads that have
reached the barrier.
• If the count is less than the total number of threads, the
threads execute a condition wait.
– The last thread entering (master) wakes up all the threads
using a condition broadcast.
typedef struct {
    int count;
    pthread_mutex_t count_lock;
    pthread_cond_t ok_to_proceed;
} mylib_barrier_t;

void mylib_barrier(mylib_barrier_t *b, int num_threads) {
    pthread_mutex_lock(&(b->count_lock));
    b->count++;                                    /* 1 */
    if (b->count == num_threads) {                 /* 2: last thread to arrive */
        b->count = 0;
        pthread_cond_broadcast(&(b->ok_to_proceed));
    } else {                                       /* 3 */
        while (pthread_cond_wait(&(b->ok_to_proceed),
                                 &(b->count_lock)) != 0)
            ;
    }
    pthread_mutex_unlock(&(b->count_lock));
}
Barrier
1. Each thread increments the counter and checks whether all threads have arrived
2. The thread (master) that detects that all have arrived signals the others to proceed
3. If not all threads have arrived, the thread waits
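The three steps can also be written with an explicit predicate around the condition wait, which makes the barrier robust against spurious wake-ups and safe to reuse across phases. The sketch below assumes an extra generation field for that purpose; the field, mylib_init_barrier and the driver names (phase_worker, run_barrier_demo, BARRIER_THREADS) are illustrative additions, not part of the lecture's code.

```c
#include <assert.h>
#include <pthread.h>

typedef struct {
    int count;                        /* threads that have arrived in this phase */
    int generation;                   /* assumed extra field: current phase number */
    pthread_mutex_t count_lock;
    pthread_cond_t ok_to_proceed;
} mylib_barrier_t;

void mylib_init_barrier(mylib_barrier_t *b) {
    b->count = 0;
    b->generation = 0;
    pthread_mutex_init(&b->count_lock, NULL);
    pthread_cond_init(&b->ok_to_proceed, NULL);
}

void mylib_barrier(mylib_barrier_t *b, int num_threads) {
    pthread_mutex_lock(&b->count_lock);
    int my_generation = b->generation;
    b->count++;                                   /* step 1 */
    if (b->count == num_threads) {                /* step 2: last arrival */
        b->count = 0;
        b->generation++;
        pthread_cond_broadcast(&b->ok_to_proceed);
    } else {                                      /* step 3: wait for the phase to end */
        while (my_generation == b->generation)
            pthread_cond_wait(&b->ok_to_proceed, &b->count_lock);
    }
    pthread_mutex_unlock(&b->count_lock);
}

/* Illustrative driver: phase 1 writes, barrier, phase 2 reads everything. */
#define BARRIER_THREADS 4
static mylib_barrier_t barrier;
static int phase1[BARRIER_THREADS];
static int seen[BARRIER_THREADS];

static void *phase_worker(void *arg) {
    int id = *(int *)arg;
    phase1[id] = 1;                               /* phase 1 work */
    mylib_barrier(&barrier, BARRIER_THREADS);     /* wait for all threads */
    int sum = 0;
    for (int i = 0; i < BARRIER_THREADS; i++)     /* phase 2 sees all phase 1 results */
        sum += phase1[i];
    seen[id] = sum;
    return NULL;
}

int run_barrier_demo(void) {
    pthread_t t[BARRIER_THREADS];
    int ids[BARRIER_THREADS];
    int total = 0;
    mylib_init_barrier(&barrier);
    for (int i = 0; i < BARRIER_THREADS; i++) {
        ids[i] = i;
        phase1[i] = 0;
        seen[i] = 0;
    }
    for (int i = 0; i < BARRIER_THREADS; i++) {
        int rc = pthread_create(&t[i], NULL, phase_worker, &ids[i]);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < BARRIER_THREADS; i++) {
        pthread_join(t[i], NULL);
        total += seen[i];
    }
    return total;
}
```

Each of the four threads observes all four phase 1 results after the barrier, so the driver returns 16. Without the barrier call, a fast thread could read phase1 entries that other threads have not written yet.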
Flat/Linear vs Tree/Log Barrier
• Linear/Flat barrier.
– O(n) for n threads
– A single master to collect information of all threads and notify them to
continue
• Tree/Log barrier
– Organize threads in a tree logically
– Multiple submasters collect and notify
– Runtime grows as O(log n)
Barrier
• Thread mechanisms
• Mutexes, condition variables
• Using threads
• Problems, solutions and design approaches