
CO4 – Concurrency

19CS2106R

Operating Systems Design

Session 31: Locking: Spin Locks
© 2020 KL University
Recap of CO3
• Operating system organization: creating and running the first process
• Page tables: paging, hardware, process address space
• Page tables: physical memory allocation
• System calls, exceptions and interrupts, assembly trap handlers
• Disk driver and disk scheduling
• Manipulation of the process address space
• Page tables: user part of an address space, sbrk, exec
• Memory management policies: swapping, demand paging
• Memory management policies: page faults and replacement algorithms
• TLB, segmentation
• Hybrid approach: paging and segmentation, multi-level paging
CO4 Topics
• Locking
• Inter-process communication
• Models of Inter-process communication
• Thread API, Condition Variables
• Mutex, Concurrent Linked List
• Semaphores
• Concurrency Control Problems
• Deadlocks
• Boot Loader
Process memory layout
A program is a file containing a range of information that describes how to construct a process at run time.
The memory allocated to each process is composed of a number of parts, usually referred to as segments.
These segments are as follows:
a. Text: the instructions of the program.
b. Initialized data: global and static variables that are explicitly initialized.
c. Uninitialized data (bss): global and static variables that are not explicitly initialized.
d. Heap: an area from which programs can dynamically allocate extra memory.
e. Stack: a piece of memory that grows and shrinks as functions are called and return, and that is used to
allocate storage for local variables and function call linkage information.
Several more segment types exist in an a.out, containing the symbol table, debugging information, linkage
tables for dynamic shared libraries, and the like. These additional sections don't get loaded as part of the
program's image executed by a process.
The size(1) command reports the sizes (in bytes) of the text, data, and bss segments. For example:

$ size /usr/bin/cc /bin/sh

text    data   bss    dec     hex    filename
79606   1536   916    82058   1408a  /usr/bin/cc
619234  21120  18260  658614  a0cb6  /bin/sh
Stack and Heap Segment
Stack Segment
• The stack segment is used to store local variables, function parameters, and
the return address. (A return address is the memory address where the CPU
will continue execution after returning from a function call.)
• Local variables are those declared inside the opening curly brace of a function
body (including main()) or of any other block, and not declared static. Their
scope is limited to the enclosing block, and they live only while execution
remains inside that block.
Heap Segment
• The heap area is allocated to each process by the OS when the process is created. Dynamic
memory is obtained from the heap with the malloc(), calloc(), and realloc() function calls,
and can only be accessed via pointers. The process address space grows and shrinks at
runtime as memory is allocated and deallocated. Memory is given back to the heap using
free(). Data structures such as linked lists and trees are easily implemented using heap
memory. Keeping track of heap memory is an overhead; if not managed properly, it may
lead to memory leaks.
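A minimal sketch contrasting the segments (illustrative only; the variable names are my own, not from the slides):

#include <stdio.h>
#include <stdlib.h>

int counter;        /* uninitialized data (bss) */
int limit = 10;     /* initialized data segment */

void demo(void)
{
    int local = 42;                   /* stack: gone when demo() returns */
    int *p = malloc(sizeof(int));     /* heap: lives until free() */
    if (p == NULL)
        return;
    *p = local;
    printf("stack %d, heap %d\n", local, *p);
    free(p);                          /* forgetting this is a memory leak */
}

int main(void)
{
    demo();
    return 0;
}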
Typical memory arrangement (figure)
Race Condition Example - 1
As an example of why we need locks, consider several processors sharing a single
disk, such as the IDE disk in xv6. The disk driver maintains a linked list of the outstanding disk requests (4226) and
processors may add new requests to the list concurrently (4354). If there were no concurrent requests, you might
implement the linked list as follows:
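A sketch reconstructing the xv6 book's insert example (the line-number comments are kept because the discussion below refers to lines 15 and 16):

#include <stdlib.h>

struct list {
    int data;
    struct list *next;
};

struct list *list = 0;

void insert(int data)
{
    struct list *l;

    l = malloc(sizeof *l);
    l->data = data;
    l->next = list;    // line 15: point the new node at the current head
    list = l;          // line 16: publish the new node as the head
}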
Race Condition: when multiple CPUs update the same data simultaneously,
then without careful design such parallel access is likely to yield
incorrect results or a broken data structure.

A race condition occurs when multiple processes are trying to do something with shared data and the
final outcome depends on the order in which the processes run.
Race Condition
This implementation is correct if executed in isolation. However, the code is not correct if
more than one copy executes concurrently. If two CPUs execute insert at the same time, it
could happen that both execute line 15 before either executes 16 (see Figure 4-1). If this
happens, there will now be two list nodes with next set to the former value of list. When
the two assignments to list happen at line 16, the second one will overwrite the first; the
node involved in the first assignment will be lost.
The lost update at line 16 is an example of a race condition. A race condition is a
situation in which a memory location is accessed concurrently, and at least one access is a
write. A race is often a sign of a bug, either a lost update (if the accesses are writes) or a
read of an incompletely-updated data structure. The outcome of a race depends on the
exact timing of the two CPUs involved and how their memory operations are ordered by
the memory system, which can make race-induced errors difficult to reproduce and debug.
Race Condition Example - 2 (i = 5, shared)

• You also need to synchronize two or more threads that might try to modify
the same variable at the same time. Consider the case in which you
increment a variable. The increment operation is usually broken down into
three steps:
1. Read the memory location into a register.
2. Increment the value in the register.
3. Write the new value back to the memory location.
• When two or more processes are reading or writing some shared data and
the final result depends on who runs precisely when, we have a race
condition.
• A race condition occurs when two or more operations happen in an
undefined order.
• Race conditions should be avoided because they can cause subtle errors in
applications and are difficult to debug.
This is a simple example of the kind of problem that can occur when shared
resources are not accessed atomically; a demonstration follows below.
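A minimal, hypothetical demonstration of the unsynchronized three-step increment using POSIX threads (the counter name and iteration count are illustrative):

#include <pthread.h>
#include <stdio.h>

long counter = 0;                  /* shared, like i in the slide */

void *worker(void *arg)
{
    for (int n = 0; n < 1000000; n++)
        counter++;                 /* read, increment, write: not atomic */
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;

    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    /* Expected 2000000; with the race, the result is usually smaller. */
    printf("counter = %ld\n", counter);
    return 0;
}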
Sequence-number-increment Problem Example - 3
• The technique used by print spoolers is to have a file for each printer that
contains the next sequence number to be used. The file is just a single line
containing the sequence number in ASCII. Each process that needs to assign a
sequence number goes through three steps:
1. it reads the sequence number file,
2. it uses the number and increments it,
3. and writes it back.

• The problem is that in the time a single process takes to execute these three
steps, another process can perform the same three steps. Chaos can result, as
we will see in the examples that follow.
sequence-number-increment problem

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>

#define MAXLINE  4096           /* max text line length */
#define SEQFILE  "seqno"        /* filename */
#define LOCKFILE "seqno.lock"

void my_lock(int), my_unlock(int);

int main(int argc, char **argv)
{
    int     fd;
    long    i, seqno;
    pid_t   pid;
    ssize_t n;
    char    line[MAXLINE + 1];

    pid = getpid();
    fd = open(SEQFILE, O_RDWR, 0666);
    for (i = 0; i < 20; i++) {
        my_lock(fd);                    /* lock the file */
        lseek(fd, 0L, SEEK_SET);        /* rewind before read */
        n = read(fd, line, MAXLINE);
        line[n] = '\0';                 /* null terminate for sscanf */
        n = sscanf(line, "%ld\n", &seqno);
        printf("%s: pid = %ld, seq# = %ld\n", argv[0], (long) pid, seqno);
        seqno++;                        /* increment sequence number */
        snprintf(line, sizeof(line), "%ld\n", seqno);
        lseek(fd, 0L, SEEK_SET);        /* rewind before write */
        write(fd, line, strlen(line));
        my_unlock(fd);                  /* unlock the file */
    }
    exit(0);
}

void my_lock(int fd)        /* no locking yet: deliberately a no-op */
{
    return;
}

void my_unlock(int fd)
{
    return;
}
If the sequence number in the file is initialized to one, and a single copy of the program
is run, we get the following output:

[vishnu@team-osd ~]$ cc seqnumnolock.c
[vishnu@team-osd ~]$ vi seqno
[vishnu@team-osd ~]$ ./a.out
./a.out: pid = 5448, seq# = 1
./a.out: pid = 5448, seq# = 2
...
./a.out: pid = 5448, seq# = 19
./a.out: pid = 5448, seq# = 20

When the sequence number is again initialized to one, and the program is run twice
in the background, we have the following output:

[vishnu@team-osd ~]$ vi seqno
[vishnu@team-osd ~]$ ./a.out & ./a.out &
[1] 7891
[2] 7892
./a.out: pid = 7892, seq# = 1
./a.out: pid = 7892, seq# = 2
...
./a.out: pid = 7891, seq# = 8        (7891 repeats numbers 7892 already used)
...
./a.out: pid = 7892, seq# = 19
./a.out: pid = 7892, seq# = 20
./a.out: pid = 7891, seq# = 20       (duplicate: a lost update)
./a.out: pid = 7891, seq# = 21
...
./a.out: pid = 7891, seq# = 36
[1]- Done        ./a.out
[2]+ Done        ./a.out

Two runs of 20 iterations each should leave the sequence number at 40, but the output
stops at 36 and several numbers are assigned twice: a race condition.
Critical Section
• A critical section is a block of code that only one process at a
time can execute
• The critical section problem is to ensure that only one process
at a time is allowed to be operating in its critical section
• Each process takes permission from the operating system to enter
into the critical section

do {
    entry section
        critical section
    exit section
} while (TRUE);
The term critical section is used to refer to a section of code that accesses a shared resource and
whose execution should be atomic; that is, its execution should not be interrupted by another
thread that simultaneously accesses the same shared resource.
Mutual exclusion
• If a process is executing in its critical section, then no other process is
allowed to execute in that critical section
• No two processes can be in the same critical section at the same time.
This is called mutual exclusion
Locks: The Basic Idea
• Ensure that any critical section executes as if it were a single
atomic instruction.
• An example: the canonical update of a shared variable

balance = balance + 1;

• Add some code around the critical section


lock_t lk; // some globally-allocated lock 'mutex'
...
lock(&lk);
balance = balance + 1;
unlock(&lk);
Locks: The Basic Idea
• Lock variable holds the state of the lock.
• available (or unlocked or free)
• No thread holds the lock.
• acquired (or locked or held)
• Exactly one thread holds the lock and presumably is in a critical section.
The semantics of the lock()
• lock()
• Try to acquire the lock.
• If no other thread holds the lock, the thread will acquire the lock.
• Enter the critical section.
• This thread is said to be the owner of the lock.

• Other threads are prevented from entering the critical section while the first
thread that holds the lock is in there.
Building A Lock
Efficient locks provided mutual exclusion at low cost.
Building a lock need some help from the hardware and the
OS.
Design
Evaluating locks – Basic criteria
• Mutual exclusion
• Does the lock work, preventing multiple threads
from entering a critical section?

• Fairness
• Does each thread contending for the lock get a
fair shot at acquiring it once it is free? (Starvation)

• Performance
• The time overheads added by using the lock
Controlling Interrupts
• Disable interrupts for critical sections
• One of the earliest solutions used to provide mutual exclusion
• Invented for single-processor systems

void lock() {
    DisableInterrupts();
}
void unlock() {
    EnableInterrupts();
}

• Problems:
• Requires too much trust in applications
• A greedy (or malicious) program could monopolize the processor
• Does not work on multiprocessors
• Code that masks or unmasks interrupts tends to be executed slowly by modern CPUs
Why hardware support needed?
First attempt: use a flag denoting whether the lock is held or not.
The code below has problems.

typedef struct __lock_t { int flag; } lock_t;

void init(lock_t *mutex) {
    // 0 -> lock is available, 1 -> held
    mutex->flag = 0;
}

void lock(lock_t *mutex) {
    while (mutex->flag == 1)  // TEST the flag
        ;                     // spin-wait (do nothing)
    mutex->flag = 1;          // now SET it!
}

void unlock(lock_t *mutex) {
    mutex->flag = 0;
}
Why hardware support needed? (Cont.)
• Problem 1: No mutual exclusion (assume flag = 0 to begin)

Thread 1                              Thread 2
call lock()
while (flag == 1)
interrupt: switch to Thread 2
                                      call lock()
                                      while (flag == 1)
                                      flag = 1;
interrupt: switch to Thread 1
flag = 1; // set flag to 1 (too!)

• Problem 2: Spin-waiting wastes time waiting for another thread.

• So, we need an atomic instruction supported by hardware:
the test-and-set instruction, also known as atomic exchange
Test And Set (Atomic Exchange)
• An instruction to support the creation of simple locks

int TestAndSet(int *ptr, int new) {
    int old = *ptr;  // fetch old value at ptr
    *ptr = new;      // store 'new' into ptr
    return old;      // return the old value
}

• Returns (tests) the old value pointed to by ptr.
• Simultaneously updates (sets) that value to new.
• This sequence of operations is performed atomically.
A Simple Spin Lock using test-and-set

typedef struct __lock_t {
    int flag;
} lock_t;

void init(lock_t *lock) {
    // 0 indicates that lock is available,
    // 1 that it is held
    lock->flag = 0;
}

void lock(lock_t *lock) {
    while (TestAndSet(&lock->flag, 1) == 1)
        ;  // spin-wait
}

void unlock(lock_t *lock) {
    lock->flag = 0;
}

• Note: To work correctly on a single processor, this lock requires a
preemptive scheduler.
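A runnable sketch of this lock in use, with C11 atomic_exchange standing in for the hardware TestAndSet (the worker/balance names are illustrative):

#include <stdatomic.h>
#include <pthread.h>
#include <stdio.h>

typedef struct { atomic_int flag; } lock_t;   /* 0 = free, 1 = held */

lock_t lk = { 0 };
long balance = 0;

void lock(lock_t *l)
{
    while (atomic_exchange(&l->flag, 1) == 1)
        ;  /* spin-wait */
}

void unlock(lock_t *l)
{
    atomic_store(&l->flag, 0);
}

void *worker(void *arg)
{
    for (int i = 0; i < 1000000; i++) {
        lock(&lk);
        balance = balance + 1;     /* critical section */
        unlock(&lk);
    }
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;

    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("balance = %ld (expect 2000000)\n", balance);
    return 0;
}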
A Simple Spin Lock using open()
We are guaranteed that only one process at a time can create the file (i.e., obtain the lock), and to release the lock,
we just unlink the file
void my_lock(int fd)
{
    int tempfd;

    while ((tempfd = open(LOCKFILE, O_RDWR|O_CREAT|O_EXCL, 0644)) < 0) {
        if (errno != EEXIST)
            printf("open error for lock file");
        /* someone else has the lock: loop around and try again */
    }
    close(tempfd);    /* we created the file, so we hold the lock */
}

void my_unlock(int fd)
{
    unlink(LOCKFILE);    /* release the lock by removing the file */
}
Evaluating Spin Locks
• Correctness: yes
• The spin lock only allows a single thread to enter the critical section.

• Fairness: no
• Spin locks don't provide any fairness guarantees.
• Indeed, a thread spinning may spin forever.

• Performance:
• On a single CPU, performance overheads can be quite painful.
• If the number of threads roughly equals the number of CPUs, spin locks work
reasonably well.
Compare-and-Swap
• Test whether the value at the address (ptr) is equal to expected.
• If so, update the memory location pointed to by ptr with the new value.
• In either case, return the actual value at that memory location.

int CompareAndSwap(int *ptr, int expected, int new)
{
    int actual = *ptr;
    if (actual == expected)
        *ptr = new;
    return actual;
}

Compare-and-swap hardware atomic instruction (C-style)

void lock(lock_t *lock) {
    while (CompareAndSwap(&lock->flag, 0, 1) == 1)
        ;  // spin
}

Spin lock with compare-and-swap
Load-Linked and Store-Conditional

int LoadLinked(int *ptr) {
    return *ptr;
}

int StoreConditional(int *ptr, int value) {
    if (no one has updated *ptr since the LoadLinked to this address) {
        *ptr = value;
        return 1;  // success!
    } else {
        return 0;  // failed to update
    }
}

• The store-conditional only succeeds if no intervening store to the address has
taken place.
• success: return 1 and update the value at ptr to value.
• failure: the value at ptr is not updated and 0 is returned.
Load-Linked and Store-Conditional (Cont.)

void lock(lock_t *lock) {
    while (1) {
        while (LoadLinked(&lock->flag) == 1)
            ;  // spin until it's zero
        if (StoreConditional(&lock->flag, 1) == 1)
            return;  // if set-it-to-1 was a success: all done
                     // otherwise: try it all over again
    }
}

void unlock(lock_t *lock) {
    lock->flag = 0;
}

Using LL/SC To Build A Lock

A more concise form of lock() using LL/SC:

void lock(lock_t *lock) {
    while (LoadLinked(&lock->flag) || !StoreConditional(&lock->flag, 1))
        ;  // spin
}
Implementation of locks in xv6
Locking Introduction
• xv6 runs on multiprocessors
• Computers with multiple CPUs executing independently
• These multiple CPUs share physical RAM, and xv6 exploits the sharing
to maintain data structures that all CPUs read and write
• This sharing raises the possibility of one CPU reading a data structure
while another CPU is mid-way through updating it
• When multiple CPUs update the same data simultaneously, then without
careful design such parallel access is likely to yield incorrect results or
a broken data structure.
Locking Introduction
• Even on a uni-processor, an interrupt routine that uses the same data
as some interruptible code could damage the data if the interrupt
occurs at just the wrong time
• Any code that accesses shared data concurrently must have a strategy
for maintaining correctness despite concurrency.
• The concurrency may arise from accesses by multiple cores, or by
multiple threads, or by interrupt code.
• xv6 uses a handful of simple concurrency control strategies; much more
sophistication is possible. The most basic of these is the lock.
Locking Introduction
• A lock provides mutual exclusion, ensuring that only one CPU at a
time can hold the lock.
• If a lock is associated with each shared data item, and the code
always holds the associated lock when using a given item, then we
can be sure that the item is used from only one CPU at a time.
• In this situation, we say that the lock protects the data item.
Code: Locks
• xv6 has two types of locks: spin-locks and sleep-locks.
• xv6 represents a spin-lock as a struct spinlock.
Code: Locks
• The important field in the structure is locked
• a word that is zero when the lock is available and non-zero when it is
held.
• Logically, xv6 should acquire a lock by executing code like
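A sketch following the xv6 sources: struct spinlock (from spinlock.h; uint is xv6's typedef for unsigned int) and the broken acquire loop, with line-number comments matching the lines 25 and 26 discussed below:

struct spinlock {
    uint locked;        // Is the lock held?
    // For debugging:
    char *name;         // Name of lock.
    struct cpu *cpu;    // The cpu holding the lock.
    uint pcs[10];       // The call stack that locked the lock.
};

void broken_acquire(struct spinlock *lk)
{
    for (;;) {
        if (!lk->locked) {     // line 25: test
            lk->locked = 1;    // line 26: set
            break;
        }
    }
}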
Code: Locks
• This code does not guarantee mutual exclusion on a multiprocessor.
• It could happen that two CPUs simultaneously reach line 25 and see that
lk->locked is zero.
• Then both grab the lock by executing line 26, lk->locked = 1;
• At this point, two different CPUs hold the lock, which
violates the mutual exclusion property.
• Rather than helping us avoid race conditions, this
implementation of acquire has its own race condition.
• The problem here is that lines 25 and 26 execute as
separate actions.
• In order for the routine above to be correct, lines 25
and 26 must execute in one atomic (i.e., indivisible)
step.
Code: Locks
• To execute those two lines atomically, xv6 relies on a special x86
instruction, xchg:
xchg(volatile uint *addr, uint newval)
• Locks (i.e., spinlocks) in xv6 are implemented using the xchg atomic
instruction.
• In one atomic operation, xchg swaps a word in memory with the
contents of a register.
• The function acquire repeats this xchg instruction in a loop:
acquire(struct spinlock *lk)
• Each iteration atomically reads lk->locked and sets it to 1:
while (xchg(&lk->locked, 1) != 0)
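A condensed sketch of xv6's acquire from spinlock.c (pushcli, holding, mycpu and getcallerpcs are xv6 helpers; some checks are omitted here):

void acquire(struct spinlock *lk)
{
    pushcli();    // disable interrupts to avoid deadlock with interrupt handlers
    if (holding(lk))
        panic("acquire");

    // The xchg is atomic: spin until we swap a 0 out of lk->locked.
    while (xchg(&lk->locked, 1) != 0)
        ;

    // Tell the C compiler and the processor not to move loads or stores
    // past this point, so the critical section's memory references happen
    // strictly after the lock is acquired.
    __sync_synchronize();

    // Record info about lock acquisition for debugging.
    lk->cpu = mycpu();
    getcallerpcs(&lk, lk->pcs);
}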
Code: Locks
• If the lock is already held, lk->locked will already be 1, so the xchg
returns 1 and the loop continues.
• If the xchg returns 0, however, acquire has successfully acquired
the lock: lk->locked was 0 and is now 1, so the loop can stop.
• Once the lock is acquired, acquire records, for debugging, the CPU
and stack trace that acquired the lock.
• If a process forgets to release a lock, this information can help to
identify the culprit.
• These debugging fields are protected by the lock and must only be
edited while holding the lock.
Code: Locks
• The function release is the opposite of acquire: it clears the
debugging fields and then releases the lock.
release(struct spinlock *lk)
• The function uses an assembly instruction to clear locked, because
clearing this field should be atomic, so that the xchg instruction won't
see a subset of the 4 bytes that hold locked updated.
• The x86 guarantees that a 32-bit movl updates all 4 bytes atomically.
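A condensed sketch of xv6's release from spinlock.c, showing the atomic movl the slide mentions (error checks abbreviated):

void release(struct spinlock *lk)
{
    if (!holding(lk))
        panic("release");

    lk->pcs[0] = 0;    // clear the debugging fields first
    lk->cpu = 0;

    // Make the critical section's stores visible to other CPUs before
    // the lock is seen as released.
    __sync_synchronize();

    // Release the lock with a single atomic 32-bit store; a plain C
    // assignment might not be compiled to an atomic instruction.
    asm volatile("movl $0, %0" : "+m" (lk->locked) : );

    popcli();    // re-enable interrupts if this was the outermost acquire
}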
Code: Using locks
• xv6 uses locks in many places to avoid race conditions
• A hard part about using locks is deciding how many locks to use and
which data and invariants each lock protects.
• There are a few basic principles.
• First, any time a variable can be written by one CPU at the same time
that another CPU can read or write it, a lock should be introduced to
keep the two operations from overlapping.
• Second, remember that locks protect invariants: if an invariant
involves multiple memory locations, typically all of them need to be
protected by a single lock to ensure the invariant is maintained.
Code: Using locks
• These two rules say when locks are necessary but nothing about when
locks are unnecessary.
• It is important for efficiency not to lock too much, because locks
reduce parallelism.
• If parallelism isn’t important, then one could arrange to have only a
single thread and not worry about locks.
• A simple kernel can do this on a multiprocessor by having a single lock
that must be acquired on entering the kernel and released on exiting
the kernel
Sleep locks
• Sometimes xv6 code needs to hold a lock for a long time.
• For example, the file system keeps a file locked while reading and
writing its content on the disk, and these disk operations can take tens
of milliseconds.
• Efficiency demands that the processor be yielded while waiting so that
other threads can make progress, and this in turn means that xv6 needs
locks that work well when held across context switches.
• xv6 provides such locks in the form of sleep-locks.
• Xv6 sleep-locks support yielding the processor during their critical
sections.
Sleep locks
• This property poses a design challenge: if thread T1 holds lock L1 and has
yielded the processor, and thread T2 wishes to acquire L1, we have to
ensure that T1 can execute while T2 is waiting so that T1 can release L1.
• T2 can’t use the spin-lock acquire function here: it spins with interrupts
turned off, and that would prevent T1 from running.
• To avoid this deadlock, the sleep-lock acquire routine (called
acquiresleep) yields the processor while waiting, and does not disable
interrupts
acquiresleep(struct sleeplock *lk)
• At a high level, a sleep-lock has a locked field that is protected by a
spinlock, and acquiresleep’s call to sleep atomically yields the CPU and
releases the spin-lock.
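A sketch of acquiresleep after xv6's sleeplock.c (sleep atomically releases the protecting spinlock and yields the CPU, reacquiring the spinlock before it returns):

void acquiresleep(struct sleeplock *lk)
{
    acquire(&lk->lk);          // short spin-lock protecting the locked field
    while (lk->locked)
        sleep(lk, &lk->lk);    // yield the CPU until the holder releases
    lk->locked = 1;
    lk->pid = myproc()->pid;   // record the holder for debugging
    release(&lk->lk);
}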
Sleep locks
• The result is that other threads can execute while acquiresleep waits.
• Because sleep-locks leave interrupts enabled, they cannot be used in
interrupt handlers.
• Because acquiresleep may yield the processor, sleep-locks cannot be
used inside spin-lock critical sections (though spin-locks can be used
inside sleep-lock critical sections).
• xv6 uses spin-locks in most situations, since they have low overhead.
• It uses sleep-locks only in the file system, where it is convenient to be
able to hold locks across lengthy disk operations
Limitations of locks
• Locks often solve concurrency problems cleanly, but there are times
when they are awkward.
• Sometimes a function uses data which must be guarded by a lock,
but the function is called both from code that already holds the lock
and from code that wouldn't otherwise need the lock.
• One way to deal with this is to have two variants of the function: one
that acquires the lock, and another that expects the caller to already
hold the lock (a sketch follows below).
• Another approach is for the function to require callers to hold the lock
whether the caller needs it or not.
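A hypothetical sketch of the two-variant pattern (the names thing, foo and foo_locked are illustrative, not from xv6):

struct thing {
    struct spinlock lock;
    int value;
};

// Variant for callers that already hold t->lock.
void foo_locked(struct thing *t)
{
    t->value++;    // safe: the caller guarantees the lock is held
}

// Variant for callers that do not hold the lock.
void foo(struct thing *t)
{
    acquire(&t->lock);
    foo_locked(t);
    release(&t->lock);
}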
Limitations of locks
• Kernel developers need to be aware of such requirements.
• It might seem that one could simplify situations where both caller and
callee need a lock by allowing "recursive locks",
• so that if a function holds a lock, any function it calls is allowed to
re-acquire the lock.
Locks in xv6
• initlock
• acquire
• release
• Spin Lock
• wakeup
• bcache.lock
• cons.lock
• ftable.lock
Thank you
