Chapter 3
Memory Management
3.1 Introduction
Memory is central to the operation of a modern computer system. Memory is a large array of
words or bytes, each with its own address. The CPU fetches instructions from memory
according to the value of the program counter. These instructions may cause additional
loading from and storing to specific memory addresses.
A typical instruction execution cycle, for example, will first fetch an instruction from
memory. The instruction is then decoded and may cause operands to be fetched from
memory. After the instruction has been executed on the operands, results may be stored back
in memory.
The purpose of memory management is to ensure fair, secure, orderly, and efficient use of
memory. The task of memory management includes keeping track of used and free memory
space, as well as when, where, and how much memory to allocate and deallocate. It is also
responsible for swapping processes in and out of main memory.
The normal procedure is to select one of the processes in the input queue and to load that
process into memory. As the process is executed, it accesses instructions and data from
memory. Eventually, the process terminates, and its memory space is declared available.
In most cases, a user program will go through several steps (some of which may be optional)
before being executed. Addresses may be represented in different ways during these steps.
Addresses in the source program are generally symbolic (such as COUNT). A compiler will
typically bind these symbolic addresses to relocatable addresses (such as "14 bytes from the
beginning of this module"). The linkage editor or loader will in turn bind these relocatable
addresses to absolute addresses (such as 74014). Each binding is a mapping from one
address space to another.
Classically, the binding of instructions and data to memory addresses can be done at any step
along the way:
Compile time: If it is known at compile time where the process will reside in
memory, then absolute code can be generated by the compiler.
Load time: If it is not known at compile time where the process will reside in
memory, then the compiler must generate relocatable code. In this case, final binding
is delayed until load time.
Execution time: If the process can be moved during its execution from one memory
segment to another, then binding must be delayed until run time.
We shall see in this chapter how these various bindings can be implemented effectively in a
computer system.
With dynamic loading, a routine is not loaded until it is called; until then, routines are kept
on disk in a relocatable load format. Dynamic loading does not require special support from
the operating system. It is the responsibility of the users to design their programs to take
advantage of such a scheme.
3.1.4 Overlays
In our discussion so far, the entire program and data of a process must be in physical memory
for the process to execute. The size of a process is limited to the size of physical memory. So
that a process can be larger than the amount of memory allocated to it, a technique called
overlays is sometimes used. The idea of overlays is to keep in memory only those instructions
and data that are needed at any given time. When other instructions are needed, they are
loaded into space that was occupied previously by instructions that are no longer needed.
The use of overlays is currently limited to microcomputer and other systems that have limited
amounts of physical memory and that lack hardware support for more advanced techniques.
The compile-time and load-time address-binding schemes result in an environment where the
logical and physical addresses are the same. However, the execution-time address-binding
scheme results in an environment where the logical and physical addresses differ. In this case,
we usually refer to the logical address as a virtual address. We use logical address and
virtual address interchangeably. The set of all logical addresses generated by a program is
referred to as a logical address space; the set of all physical addresses corresponding to these
logical addresses is referred to as a physical address space. Thus, in the execution-time
address-binding scheme, the logical and physical address spaces differ.
The run-time mapping from virtual to physical addresses is done by the memory-
management unit (MMU), which is a hardware device. There are a number of different
schemes for accomplishing such a mapping, as will be discussed later. For the time being, we
shall illustrate this mapping with a simple MMU scheme.
In this case the base register is now called a relocation register. The value in the relocation
register is added to every address generated by a user process at the time it is sent to memory.
For example, if the base is at 14,000, then an attempt by the user to address location 0 is
dynamically relocated to location 14,000; an access to location 346 is mapped to location
14,346, as shown in Figure 3.1.
Notice that the user program never sees the real physical addresses. The user program deals
with logical addresses. The memory-mapping hardware converts logical addresses into
physical addresses.
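This mapping amounts to a single addition. The following Python sketch (an illustration, not
part of any real MMU interface) reproduces the example above:

    RELOCATION_REGISTER = 14000  # loaded by the OS as part of the context switch

    def to_physical(logical_address):
        # Every address the process generates is offset by the register's value.
        return logical_address + RELOCATION_REGISTER

    print(to_physical(0))    # 14000
    print(to_physical(346))  # 14346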
3.3 Swapping
A process needs to be in memory to be executed. A process, however, can be swapped
temporarily out of memory to a backing store, and then brought back into memory for
continued execution. For example, assume a multiprogramming environment with a round-
robin CPU-scheduling algorithm. When a quantum expires, the memory manager will start to
swap out the process that just finished, and to swap in another process to the memory space
that has been freed (Figure 3.2). In the meantime, the CPU scheduler will allocate a time slice
to some other process in memory. When each process finishes its quantum, it will be
swapped with another process.
Swapping requires a backing store. The backing store is commonly a fast disk. It must be
large enough to accommodate copies of all memory images for all users, and must provide
direct access to these memory images. The system maintains a ready queue consisting of all
processes whose memory images are on the backing store or in memory and are ready to run.
Whenever the CPU scheduler decides to execute a process, it calls the dispatcher. The
dispatcher checks to see whether the next process in the queue is in memory. If the process is
not, and there is no free memory region, the dispatcher swaps out a process currently in
memory and swaps in the desired process. It should be clear that the context-switch time in
such a swapping system is fairly high.
3.4 Contiguous Memory Allocation
It is desirable to have several user processes residing in memory at the same time. In
contiguous memory allocation, each process is contained in a single contiguous section of
memory. The base (relocation) and limit registers hold the smallest physical memory address
of the process and the size of its address range, respectively.
Figure 3.3: The program can access memory between the base and the limit.
When the CPU scheduler selects a process for execution, the dispatcher loads the relocation
and limit registers with the correct values as part of the context switch. Because every address
generated by the CPU is checked against these registers, we can protect both the operating
system and the other users’ programs and data from being modified by this running process.
The base register makes it impossible for a program to reference any part of memory below
its own region, and the limit register makes it impossible to reference any part of memory
beyond that region.
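The combined check-and-relocate step can be sketched as follows; the limit value of 3,000 is
an assumed process size, chosen only for illustration:

    BASE, LIMIT = 14000, 3000  # LIMIT (the process size) is an assumed value

    def translate(logical_address):
        # The hardware rejects any address outside [0, LIMIT) before relocating.
        if not (0 <= logical_address < LIMIT):
            raise MemoryError("trap: addressing error")
        return logical_address + BASE

    print(translate(346))  # 14346
    # translate(5000) would trap rather than touch another process's memory.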
Note that the relocation-register scheme provides an effective way to allow the operating-
system size to change dynamically. This flexibility is desirable in many situations. For
example, the operating system contains code and buffer space for device drivers. If a device
driver (or other operating-system service) is not commonly used, it is undesirable to keep the
code and data in memory, as we might be able to use that space for other purposes. Such code
is sometimes called transient operating-system code; it comes and goes as needed. Thus,
using this code changes the size of the operating system during program execution.
The operating system keeps a table indicating which parts of memory are available and which
are occupied. Initially, all memory is available for user processes, and is considered as one
large block of available memory, a hole. When a process arrives and needs memory, we
search for a hole large enough for this process. If we find one, we allocate only as much
memory as is needed, keeping the rest available to satisfy future requests.
When a process is allocated space, it is loaded into memory and it can then compete for the
CPU. When a process terminates, it releases its memory, which the operating system may
then fill with another process from the input queue.
At any given time, we have a list of available block sizes and the input queue. Memory is
allocated to processes until, finally, the memory requirements of the next process cannot be
satisfied; no available block of memory (hole) is large enough to hold that process. The
operating system can then wait until a large enough block is available, or it can skip down the
input queue to see whether the smaller memory requirements of some other process can be
met.
In general, there is at any time a set of holes, of various sizes, scattered throughout memory.
When a process arrives and needs memory, we search this set for a hole that is large enough
for this process. If the hole is too large, it is split into two: One part is allocated to the arriving
process; the other is returned to the set of holes. When a process terminates, it releases its
block of memory, which is then placed back in the set of holes. If the new hole is adjacent to
other holes, we merge these adjacent holes to form one larger hole. At this point, we may
need to check whether there are processes waiting for memory and whether this newly freed
and recombined memory could satisfy the demands of any of these waiting processes. This
procedure is a particular instance of the general dynamic storage allocation problem, which is
how to satisfy a request of size n from a list of free holes. There are many solutions to this
problem. The set of holes is searched to determine which hole is best to allocate. First-fit,
best-fit, and worst-fit are the most common strategies used to select a free hole from the set of
available holes.
First-fit: Allocate the first hole that is big enough. We can stop searching as soon as
we find a free hole that is large enough.
Best-fit: Allocate the smallest hole that is big enough. We must search the entire list,
unless the list is kept ordered by size. This strategy produces the smallest leftover
hole.
Worst-fit: Allocate the largest hole. Again, we must search the entire list, unless it is
sorted by size. This strategy produces the largest leftover hole, which may be more
useful than the smaller leftover hole from a best-fit approach.
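The three strategies can be sketched in Python over a bare list of hole sizes; hole addresses,
splitting, and coalescing are omitted for brevity:

    def first_fit(holes, request):
        # Take the first hole that is big enough; the scan can stop early.
        for i, size in enumerate(holes):
            if size >= request:
                return i
        return None

    def best_fit(holes, request):
        # Search the entire list for the smallest hole that still fits.
        fits = [(size, i) for i, size in enumerate(holes) if size >= request]
        return min(fits)[1] if fits else None

    def worst_fit(holes, request):
        # Search the entire list for the largest hole.
        fits = [(size, i) for i, size in enumerate(holes) if size >= request]
        return max(fits)[1] if fits else None

    print(first_fit([500, 200, 300], 250))  # 0
    print(best_fit([500, 200, 300], 250))   # 2
    print(worst_fit([500, 200, 300], 250))  # 0

Each function returns the index of the chosen hole, or None when no hole is large enough; a
real allocator would then split the chosen hole as described above.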
Simulations have shown that both first-fit and best-fit are better than worst-fit in terms of
decreasing both time and storage utilization. Neither first-fit nor best-fit is clearly better
than the other in terms of storage utilization, but first-fit is generally faster.
Another problem that arises with the multiple-partition allocation scheme is internal
fragmentation. Consider a hole of 18,464 bytes. Suppose that the next process requests
18,462 bytes. If we allocate exactly the requested block, we are left with a hole of 2 bytes.
The overhead to keep track of this hole will be substantially larger than the hole itself. The
general approach is to allocate very small holes as part of the larger request. Thus, the
allocated memory may be slightly larger than the requested memory. The difference between
these two numbers is internal fragmentation: memory that is internal to a partition but is not
being used.
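That rule can be sketched as follows; the tracking threshold of 16 bytes is an assumed value:

    TRACKING_THRESHOLD = 16  # assumed: smallest leftover worth tracking

    def allocate_from_hole(hole_size, request):
        # Returns (bytes allocated, internal fragmentation).
        leftover = hole_size - request
        if leftover < TRACKING_THRESHOLD:
            return hole_size, leftover  # hand over the whole hole
        return request, 0               # split: the leftover stays a free hole

    print(allocate_from_hole(18464, 18462))  # (18464, 2): 2 bytes internal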
3.5 Paging
Another possible solution to the external fragmentation problem is to permit the logical
address space of a process to be noncontiguous, thus allowing a process to be allocated
physical memory wherever the latter is available. One way of implementing this solution is
through the use of a paging scheme.
Physical memory is broken into fixed-sized blocks called frames. Logical memory is also
broken into blocks of the same size called pages. When a process is to be executed, its pages
are loaded into any available memory frames from the backing store. The backing store is
divided into fixed-sized blocks that are of the same size as the memory frames.
Every address generated by the CPU is divided into two parts: a page number (p) and a page
offset (d). The page number is used as an index into a page table. The page table contains the
base address of each page in physical memory. This base address is combined with the page
offset to define the physical memory address that is sent to the memory unit. The paging
model of memory is shown in Figure 3.4.
The page size (like the frame size) is defined by the hardware. The size of a page is typically
a power of 2 varying between 512 bytes and 8192 bytes per page, depending on the computer
architecture. If the size of the logical address space is 2^m, and a page size is 2^n addressing
units (bytes or words), then the high-order m - n bits of a logical address designate the page
number, and the n low-order bits designate the page offset. Thus, the logical address is as
follows:

    | page number p (m - n bits) | page offset d (n bits) |

where p is an index into the page table and d is the displacement within the page.
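The split is a pair of bit operations, as the following sketch shows; the widths m = 16 and
n = 10 (1,024-byte pages) are assumed values chosen for illustration:

    M, N = 16, 10  # assumed: 16-bit logical addresses, 1,024-byte pages

    def split(logical_address):
        page_number = (logical_address >> N) & ((1 << (M - N)) - 1)  # high m - n bits
        page_offset = logical_address & ((1 << N) - 1)               # low n bits
        return page_number, page_offset

    print(split(5000))  # (4, 904), since 5000 = 4 * 1024 + 904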
For example, consider the memory of Figure 3.5. Using a page size of 4 bytes and a physical
memory of 32 bytes (8 frames), we show an example of how the user's view of memory can be
mapped into physical memory. Logical address 0 is page 0, offset 0. Indexing into the page
table, we find that page 0 is in frame 5. Thus, logical address 0 maps to physical address 20
(= (5 x 4) + 0). Logical address 3 (page 0, offset 3) maps to physical address 23 (= (5 x 4) +
3). Logical address 4 is page 1, offset 0; according to the page table, page 1 is mapped to
frame 6. Thus, logical address 4 maps to physical address 24 (= (6 x 4) + 0). Logical address
13 maps to physical address 9.
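The translation in this example can be reproduced with the page table held as a list indexed
by page number. The frames for pages 0 and 1 come from the text, the frame for page 3 is
inferred from the mapping of address 13 to address 9, and the frame for page 2 is an
assumption:

    PAGE_SIZE = 4
    # Page table for Figure 3.5: index = page number, value = frame number.
    page_table = [5, 6, 1, 2]

    def translate(logical_address):
        page, offset = divmod(logical_address, PAGE_SIZE)
        return page_table[page] * PAGE_SIZE + offset

    print(translate(0))   # 20
    print(translate(3))   # 23
    print(translate(4))   # 24
    print(translate(13))  # 9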
Notice that paging itself is a form of dynamic relocation. When we use a paging scheme, we
have no external fragmentation: Any free frame can be allocated to a process that needs it.
However, we may have some internal fragmentation.
Figure 3.5: Paging example for a 32-byte memory with 4-byte pages.
3.6 Segmentation
The user's view of memory is not the same as the actual physical memory. The user’s view is
mapped onto physical memory. What is the user’s view of memory? Does the user think of
memory as a linear array of bytes, some containing instructions and others containing data, or
is there some other preferred memory view? There is general agreement that the user or
programmer of a system does not think of memory as a linear array of bytes. Rather, the user
prefers to view memory as a collection of variable-sized segments, with no necessary
ordering among segments.
In segmentation, a logical address consists of a segment number s and an offset d. The
segment number is used as an index into the segment table. The offset d of the logical
address must be between 0 and the segment limit. If this offset is legal, it is added to the
segment base to produce the address in physical memory of the desired byte. The segment
table is thus essentially an array of base-limit register pairs.
As an example, consider the situation shown in Figure 3.6. We have five segments numbered
from 0 through 4. The segments are stored in physical memory as shown. The segment table
has a separate entry for each segment, giving the beginning address of the segment in
physical memory (the base) and the length of that segment (the limit). For example, segment
2 is 400 bytes long, and begins at location 4300. Thus, a reference to byte 53 of segment 2 is
mapped onto location 4300 + 53 = 4353. A reference to segment 3, byte 852, is mapped to
3200 (the base of segment 3) + 852 = 4052. A reference to byte 1222 of segment 0 would
result in a trap to the operating system, as this segment is only 1000 bytes long.
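The lookup can be sketched as follows. The base and limit of segment 2, the base of segment
3, and the length of segment 0 come from the text; the remaining table entries are assumed
values filled in for illustration:

    # Segment table as (base, limit) pairs, indexed by segment number.
    segment_table = [
        (1400, 1000),  # segment 0: 1000 bytes long (base assumed)
        (6300,  400),  # segment 1: assumed
        (4300,  400),  # segment 2: 400 bytes beginning at 4300
        (3200, 1100),  # segment 3: base 3200, limit assumed
        (4700, 1000),  # segment 4: assumed
    ]

    def translate(segment, offset):
        base, limit = segment_table[segment]
        if not (0 <= offset < limit):
            raise MemoryError("trap: addressing error")  # offset past the limit
        return base + offset

    print(translate(2, 53))   # 4353
    print(translate(3, 852))  # 4052
    # translate(0, 1222) would trap: segment 0 is only 1000 bytes long.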
Figure 3.7: Diagram showing virtual memory that is larger than physical memory.
The following are situations in which the entire program is not required to be fully loaded:
User-written error-handling routines are used only when an error occurs in the data or
computation.
Certain options and features of a program may be used rarely.
Many tables are assigned a fixed amount of address space even though only a small
amount of the table is actually used.
The ability to execute a program that is only partially in memory would confer many
benefits:
Less I/O would be needed to load or swap each user program into memory.
A program would no longer be constrained by the amount of physical memory that is
available.
Because each user program could take less physical memory, more programs could be run at
the same time, with a corresponding increase in CPU utilization and throughput.
With demand-paged virtual memory, pages are only loaded when they are demanded during
program execution; pages that are never accessed are thus never loaded into physical
memory. A demand-paging system is similar to a paging system with swapping (Figure 3.8)
where processes reside in secondary memory (usually a disk).
When we want to execute a process, we swap it into memory. Rather than swapping the
entire process into memory, however, we use a lazy swapper. A lazy swapper never swaps a
page into memory unless that page will be needed. Since we are now viewing a process as a
sequence of pages, rather than as one large contiguous address space, use of the term swapper
is technically incorrect. A swapper manipulates entire processes, whereas a pager is
concerned with the individual pages of a process. We thus use pager, rather than swapper, in
connection with demand paging.
When a process is to be swapped in, the pager guesses which pages will be used before the
process is swapped out again. Instead of swapping in a whole process, the pager brings only
those necessary pages into memory. Thus, it avoids reading into memory pages that will not
be used anyway, decreasing the swap time and the amount of physical memory needed.
If a process tries to access a page that was not brought into memory, that access causes a
page fault. When the program calls or jumps to an instruction that is not in memory, a page
fault occurs and the operating system must go and get the missing page from disk. The
process is blocked while the necessary page is being located and read in.
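The valid-bit check that triggers this sequence can be sketched as follows; the page-table
contents and the loader are hypothetical stand-ins, not a real operating-system interface:

    import random

    NUM_FRAMES = 8
    # Hypothetical page table: frame number if resident, None if still on disk.
    page_table = {0: 5, 1: None, 2: 3}

    def load_from_backing_store(page):
        # Stand-in for the disk read; a real pager would pick a victim frame
        # if none were free, and the faulting process would block meanwhile.
        return random.randrange(NUM_FRAMES)

    def access(page):
        frame = page_table[page]
        if frame is None:                        # valid bit clear: page fault
            frame = load_from_backing_store(page)
            page_table[page] = frame
        return frame

    print(access(1))  # faults once, then the page is resident
    print(access(1))  # no fault on the second access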
For our example reference string (7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1), our
three frames are initially empty. The first three
references (7, 0, 1) cause page faults and are brought into these empty frames. The next
reference (2) replaces page 7, because page 7 was brought in first. Since 0 is the next
reference and 0 is already in memory, we have no fault for this reference. The first reference
to 3 results in replacement of page 0, since it is now first in line. Because of this replacement,
the next reference, to 0, will fault. Page 1 is then replaced by page 0. This process continues
as shown in Figure 3.9. Every time a fault occurs, we show which pages are in our three
frames. There are 15 page faults altogether.
The FIFO page-replacement algorithm is easy to understand and program. However, its
performance is not always good.
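The fault count can be checked with a short simulation. The reference string below is
reconstructed from the positions cited in the text (7, 0, 1 first; page 0 again at reference 5;
page 7 again at reference 18; and so on):

    from collections import deque

    REFS = [7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1]

    def fifo_faults(refs, num_frames=3):
        frames, order, faults = set(), deque(), 0
        for page in refs:
            if page not in frames:
                faults += 1
                if len(frames) == num_frames:
                    frames.remove(order.popleft())  # evict the page loaded earliest
                frames.add(page)
                order.append(page)
        return faults

    print(fifo_faults(REFS))  # 15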
An optimal page-replacement algorithm replaces the page that will not be used for the longest
period of time. Applied to our example reference string, the optimal algorithm would yield
nine page faults, as shown in Figure 3.10.
The first three references cause faults that fill the three empty frames. The reference to page
2 replaces page 7, because 7 will not be used until reference 18, whereas page 0 will be used
at 5, and page 1 at 14. The reference to page 3 replaces page 1, as page 1 will be the last of
the three pages in memory to be referenced again. With only nine page faults, optimal
replacement is much better than a FIFO algorithm, which resulted in fifteen faults. (If we
ignore the first three, which all algorithms must suffer, then optimal replacement is twice as
good as FIFO replacement.) In fact, no replacement algorithm can process this reference
string in three frames with fewer than nine faults.
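Optimal replacement needs to look at the future of the reference string, which a simulation
can do but a real system cannot. A sketch over the same reconstructed string:

    REFS = [7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1]

    def optimal_faults(refs, num_frames=3):
        frames, faults = set(), 0
        for i, page in enumerate(refs):
            if page not in frames:
                faults += 1
                if len(frames) == num_frames:
                    future = refs[i + 1:]
                    # Evict the resident page whose next use lies farthest in
                    # the future (or that is never referenced again).
                    def next_use(p):
                        return future.index(p) if p in future else len(future)
                    frames.remove(max(frames, key=next_use))
                frames.add(page)
        return faults

    print(optimal_faults(REFS))  # 9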
LRU (least recently used) replacement replaces the page that has not been used for the longest
period of time. The result of applying LRU replacement to our example reference string is
shown in Figure 3.11. The LRU algorithm produces 12 faults. Notice that the first five faults
are the same as those of the optimal replacement. When the reference to page 4 occurs,
however, LRU replacement
sees that, of the three frames in memory, page 2 was used least recently. The most recently
used page is page 0, and just before that page 3 was used. Thus, the LRU algorithm replaces
page 2, not knowing that page 2 is about to be used. When it then faults for page 2, the LRU
algorithm replaces page 3 since, of the three pages in memory {0, 3, 4}, page 3 is the least
recently used. LRU replacement with 12 faults is still much better than FIFO replacement
with 15. The LRU policy is often used as a page-replacement algorithm and is considered to
be quite good.
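LRU can be simulated by keeping the resident pages in recency order; here an OrderedDict
plays that role, again over the reconstructed reference string:

    from collections import OrderedDict

    REFS = [7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1]

    def lru_faults(refs, num_frames=3):
        frames, faults = OrderedDict(), 0  # keys ordered least- to most-recent
        for page in refs:
            if page in frames:
                frames.move_to_end(page)        # a hit refreshes the page's recency
            else:
                faults += 1
                if len(frames) == num_frames:
                    frames.popitem(last=False)  # evict the least recently used page
                frames[page] = True
        return faults

    print(lru_faults(REFS))  # 12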