Operating Systems: Chapter 3: Memory Management
3. At load time
• Similar to at link-edit time, but do not fix the starting address.
• Program can be loaded anywhere.
• Program can move but cannot be split.
• Need modest hardware: base/limit registers.
• Loader sets the base/limit registers.
4. At execution time
• Addresses translated dynamically during execution.
• Hardware needed to perform the virtual to physical address translation quickly.
• Currently dominates.
• Much more information later.
Extensions
• Dynamic Loading
• When executing a call, check if module is loaded.
• If not loaded, call linking loader to load it and update tables.
• Slows down calls (indirection) unless you rewrite code dynamically.
• Not used much.
• Dynamic Linking
• The traditional linking described above is today often called static linking.
• With dynamic linking, frequently used routines are not linked into the program. Instead, just a stub is linked.
• When the routine is called, the stub checks to see if the real routine is loaded (it may have been loaded by another program).
• If not loaded, load it.
• If already loaded, share it. This needs some OS help so that different jobs sharing the library don't overwrite each
other's private memory.
• Advantages of dynamic linking.
• Saves space: Routine only in memory once even when used many times.
• Bug fix to dynamically linked library fixes all applications that use that library, without having to relink the
application.
• Disadvantages of dynamic linking.
• New bugs in dynamically linked library infect all applications.
• Applications ``change'' even when they haven't changed.
Note: I will place ** before each memory management scheme.
3.2: Swapping
Moving entire processes between disk and memory is called swapping.
Homework: 4
• There are more processes than holes. Why?
• Because next to a process there might be a process or a hole, but next to a hole there must be a process.
• So we can have ``runs'' of processes but not of holes.
• If, after a process, a process and a hole are equally likely, you get about twice as many processes as holes.
• Base and limit registers are used.
• Storage keys are not good since compactifying would require changing many keys.
• Storage keys might also need a fine granularity to permit the boundaries to move by small amounts; hence many keys would need to be changed.
MVT also introduces the ``Replacement Question'': which victim to swap out?
We will study this question more when we discuss demand paging.
Considerations in choosing a victim
• Cannot replace a job that is pinned, i.e. whose memory is tied down. For example, if Direct Memory Access (DMA) I/O is scheduled
for this process, the job is pinned until the DMA is complete.
• Victim selection is a medium term scheduling decision
• Job that has been in a wait state for a long time is a good candidate.
• Often choose as a victim a job that has been in memory for a long time.
• Another consideration is how long the victim should stay swapped out.
• For demand paging, where swapping out a page is not as drastic as swapping out a job, choosing the victim is an important memory
management decision and we shall study several policies.
NOTEs:
1. So far the schemes presented have had two properties:
i. Each job is stored contiguously in memory. That is, the job is contiguous in physical addresses.
ii. Each job cannot use more memory than exists in the system. That is, the virtual address space cannot exceed the physical
address space.
2. Tanenbaum now attacks the second item. I wish to do both and start with the first.
3. Tanenbaum (and most of the world) uses the term ``paging'' to mean what I call demand paging. This is unfortunate as it mixes
together two concepts.
i. Paging (dicing the address space) to solve the placement problem and essentially eliminate external fragmentation.
ii. Demand fetching, to permit the total memory requirements of all loaded jobs to exceed the size of physical memory.
4. Tanenbaum (and most of the world) uses the term virtual memory as a synonym for demand paging. Again I consider this
unfortunate.
i. Demand paging is a fine term and is quite descriptive.
ii. Virtual memory ``should'' be used in contrast with physical memory to describe any virtual to physical address translation.
** (non-demand) Paging
Simplest scheme to remove the requirement of contiguous physical memory.
• Chop the program into fixed size pieces called pages (invisible to the programmer).
• Chop the real memory into fixed size pieces called page frames or simply frames.
• Size of a page (the page size) = size of a frame (the frame size).
• Sprinkle the pages into the frames.
• Keep a table (called the page table) having an entry for each page. The page table entry or PTE for page p contains the number of
the frame f that contains page p.
================ Start Lecture #11 ================
Example: Assume a decimal machine with page size = frame size = 1000.
Assume PTE 3 contains 459.
Then virtual address 3372 corresponds to physical address 459372.
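To make the arithmetic concrete, here is a minimal C sketch of this translation; the page table contents are made up, and of course a real MMU does this in hardware rather than in software.

#include <stdio.h>

#define PAGE_SIZE 1000            /* decimal machine from the example above */

/* Made-up page table: page_table[p] holds the frame number for page p. */
static int page_table[] = {100, 250, 830, 459, 911};

int main(void) {
    int va = 3372;                          /* virtual address */
    int page   = va / PAGE_SIZE;            /* 3   */
    int offset = va % PAGE_SIZE;            /* 372 */
    int frame  = page_table[page];          /* 459 */
    int pa = frame * PAGE_SIZE + offset;    /* 459372 */
    printf("virtual %d -> physical %d\n", va, pa);
    return 0;
}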
Properties of (non-demand) paging.
• Entire job must be memory resident to run.
• No holes, i.e. no external fragmentation.
• If there are 50 frames available and the page size is 4KB, then a job requiring <= 200KB will fit, even if the available frames are
scattered over memory.
• Hence (non-demand) paging is useful.
• Introduces internal fragmentation approximately equal to 1/2 the page size for every process (really every segment).
• Can have a job unable to run due to insufficient memory and have some (but not enough) memory available. This is not called
external fragmentation since it is not due to memory being fragmented.
• Eliminates the placement question. All pages are equally good since there is no external fragmentation.
• Replacement question remains.
• Since page boundaries occur at ``random'' points and can change from run to run (the page size can change with no effect on the
program--other than performance), pages are not appropriate units of memory to use for protection and sharing. This is discussed
further when we introduce segmentation.
Homework: 13
Address translation
• Each memory reference turns into 2 memory references
1. Reference the page table
2. Reference central memory
• This would be a disaster!
• Hence the MMU caches page#-->frame# translations. This cache is kept near the processor and can be accessed rapidly.
• This cache is called a translation lookaside buffer (TLB) or translation buffer (TB).
• For the above example, after referencing virtual address 3372, entry 3 in the TLB would contain 459.
• Hence a subsequent access to virtual address 3881 would be translated to physical address 459881 without a memory reference.
Choice of page size is discussed below.
Homework: 8.
Contents of a PTE
Each page has a corresponding page table entry (PTE). The information in a PTE is for use by the hardware. Information set by and used by
the OS is normally kept in other OS tables. The page table format is determined by the hardware so access routines are not portable. The
following fields are often present.
1. The valid bit. This tells if the page is currently loaded (i.e., is in a frame). If set, the frame pointer is valid. It is also called the
presence or presence/absence bit. If a page is accessed with the valid bit zero, a page fault is generated by the hardware.
2. The frame number. This is the main reason for the table. It is needed for virtual to physical address translation.
3. The modified bit. Indicates that some part of the page has been written since it was loaded. This is needed when the page is evicted so
the OS knows that the page must be written back to disk.
4. The referenced bit. Indicates that some word in the page has been referenced. Used to select a victim: unreferenced pages make good
victims by the locality property.
5. Protection bits. For example one can mark text pages as execute only. This requires that boundaries between regions with different
protection are on page boundaries. Normally many consecutive (in logical address) pages have the same protection so many page
protection bits are redundant. Protection is more naturally done with segmentation.
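As a concrete (purely hypothetical) illustration, a 32-bit PTE holding these fields might be declared as the following C struct; real layouts and field widths are dictated by the MMU hardware.

#include <stdint.h>
#include <stdio.h>

/* Hypothetical 32-bit PTE layout illustrating the fields listed above.
   Real layouts and widths are dictated by the hardware. */
struct pte {
    uint32_t valid      : 1;   /* presence/absence bit        */
    uint32_t modified   : 1;   /* a.k.a. dirty bit            */
    uint32_t referenced : 1;   /* used when choosing a victim */
    uint32_t protection : 3;   /* e.g., read/write/execute    */
    uint32_t frame      : 26;  /* frame number                */
};

int main(void) {
    struct pte p = {0};
    p.valid = 1;
    p.frame = 459;
    printf("PTE is %zu bytes; frame = %u\n", sizeof p, (unsigned)p.frame);
    return 0;
}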
Multilevel page tables
For large virtual address spaces the page table itself is paged: the virtual address is split into three pieces, P#1, P#2, and the offset, and translation proceeds in two levels.
• P#1 gives the index into the first level page table.
• Follow the pointer in the corresponding PTE to reach the frame containing the relevant 2nd level page table.
• P#2 gives the index into this 2nd level page table.
• Follow the pointer in the corresponding PTE to reach the (originally) requested frame.
• Offset gives the offset in this frame where the requested word is located.
Do an example on the board
The VAX used a 2-level page table structure, but with some
wrinkles (see Tanenbaum for details).
Naturally, there is no need to stop at 2 levels. In fact the SPARC
has 3 levels and the Motorola 68030 has 4 (and the number of bits of Virtual Address used for P#1, P#2, P#3, and P#4 can be varied).
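A C sketch of the two-level lookup, assuming a 32-bit virtual address split 10/10/12 (P#1/P#2/offset); the tables are simulated with in-memory arrays and the numbers are made up.

#include <stdint.h>
#include <stdio.h>

/* Assumed split of a 32-bit virtual address: 10-bit P#1, 10-bit P#2, 12-bit offset. */
#define OFFSET_BITS 12
#define LEVEL_BITS  10
#define PAGE_SIZE   (1u << OFFSET_BITS)

/* Toy tables: one first-level table whose entries point at second-level
   tables, and one second-level table holding frame numbers. */
static uint32_t second_level[1 << LEVEL_BITS];     /* frame numbers            */
static uint32_t *first_level[1 << LEVEL_BITS];     /* -> second-level tables   */

static uint32_t translate(uint32_t va) {
    uint32_t p1     = va >> (OFFSET_BITS + LEVEL_BITS);          /* index into level 1 */
    uint32_t p2     = (va >> OFFSET_BITS) & ((1 << LEVEL_BITS) - 1);
    uint32_t offset = va & (PAGE_SIZE - 1);
    uint32_t *table = first_level[p1];   /* in the toy, a pointer to the 2nd-level table */
    uint32_t frame  = table[p2];         /* frame holding the requested page             */
    return frame * PAGE_SIZE + offset;
}

int main(void) {
    first_level[1] = second_level;     /* P#1 = 1 has a second-level table ...        */
    second_level[2] = 459;             /* ... whose entry for P#2 = 2 is frame 459    */
    uint32_t va = (1u << 22) | (2u << 12) | 372;   /* P#1=1, P#2=2, offset=372 */
    printf("virtual %u -> physical %u\n", va, translate(va));
    return 0;
}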
An associative memory is searched by content rather than by address: you give the value of one field (the index field) and all entries are searched in parallel for a match. For example, if the index field is Animal and Iguana is given, the associative memory returns the matching entry
Izzy | Iguana | Quiet | Brown
A Translation Lookaside Buffer or TLB is an associative memory where the index field is the page number. The other fields include the
frame number, dirty bit, valid bit, and others.
• A TLB is small and expensive but at least it is fast. When the page number is in the TLB, the frame number is returned very quickly.
• On a miss, the page number is looked up in the page table. The record found is placed in the TLB and a victim is discarded. There is
no placement question since all entries are accessed at the same time. But there is a replacement question.
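A software sketch of the TLB lookup; the page table and TLB size are made up, a trivial round-robin replacement stands in for whatever the hardware actually does, and a real TLB searches its entries in parallel rather than with a loop.

#include <stdio.h>

#define PAGE_SIZE   1000      /* decimal machine from the earlier example */
#define TLB_ENTRIES 4

struct tlb_entry { int valid; int page; int frame; };

static struct tlb_entry tlb[TLB_ENTRIES];
static int page_table[] = {100, 250, 830, 459, 911};   /* made up */
static int next_victim;                                 /* trivial round-robin replacement */

static int translate(int va) {
    int page = va / PAGE_SIZE, offset = va % PAGE_SIZE;
    for (int i = 0; i < TLB_ENTRIES; i++)               /* hardware does this in parallel */
        if (tlb[i].valid && tlb[i].page == page)
            return tlb[i].frame * PAGE_SIZE + offset;   /* TLB hit: no extra memory reference */
    int frame = page_table[page];                       /* TLB miss: consult the page table */
    tlb[next_victim] = (struct tlb_entry){1, page, frame};
    next_victim = (next_victim + 1) % TLB_ENTRIES;      /* the replacement question */
    return frame * PAGE_SIZE + offset;
}

int main(void) {
    printf("%d\n", translate(3372));   /* miss: 459372 */
    printf("%d\n", translate(3881));   /* hit:  459881 */
    return 0;
}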
Homework: 15.
3.4: Page Replacement Algorithms (PRAs)
When a page must be brought in and no frame is free, which page should be evicted?
Random
A lower bound on performance. Any decent scheme should do better.
3.4.1: The optimal page replacement algorithm (opt PRA) (aka Belady's min PRA)
Replace the page whose next reference will be furthest in the future.
• Also called Belady's min algorithm.
• Provably optimal. That is, generates the fewest number of page faults.
• Unimplementable: Requires predicting the future.
• Good upper bound on performance.
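Although OPT cannot be implemented on line, it is easy to simulate off line once the whole reference string is known. A small C sketch with a made-up reference string and 3 frames:

#include <stdio.h>

#define NFRAMES 3

/* Position at which page p is next referenced at or after position start;
   a page never referenced again is the best possible victim. */
static int next_use(const int *refs, int n, int start, int p) {
    for (int i = start; i < n; i++)
        if (refs[i] == p)
            return i;
    return n;                     /* "infinitely" far in the future */
}

int main(void) {
    int refs[] = {0, 1, 2, 3, 0, 1, 4, 0, 1, 2, 3, 4};
    int n = sizeof refs / sizeof refs[0];
    int frames[NFRAMES], loaded = 0, faults = 0;

    for (int i = 0; i < n; i++) {
        int hit = 0;
        for (int j = 0; j < loaded; j++)
            if (frames[j] == refs[i]) hit = 1;
        if (hit) continue;
        faults++;
        if (loaded < NFRAMES) {              /* free frame available */
            frames[loaded++] = refs[i];
            continue;
        }
        int victim = 0;                      /* evict the page used furthest in the future */
        for (int j = 1; j < NFRAMES; j++)
            if (next_use(refs, n, i + 1, frames[j]) >
                next_use(refs, n, i + 1, frames[victim]))
                victim = j;
        frames[victim] = refs[i];
    }
    printf("OPT: %d page faults\n", faults);
    return 0;
}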
LIFO PRA
This is terrible! Why?
Ans: All but the last frame are frozen once loaded so you can replace only one frame. This is especially bad after a phase shift in the program
when it is using all new pages.
Shared pages
Really should share segments.
• Must keep reference counts or something so that when a process terminates, pages (even dirty pages) it shares with another process
are not automatically discarded.
• Similarly, a reference count would make a widely shared page (correctly) look like a poor choice for a victim.
• A good place to store the reference count would be in a structure pointed to by both PTEs. If stored in the PTEs, must keep them
consistent between processes.
Backing Store
The issue is where on disk do we put pages.
• For program text, which is presumably read only, a good choice is the file itself.
• What if we decide to keep the data and stack each contiguous on the backing store? Since data and stack grow, we must be prepared to grow
the space on disk, which leads to the same issues and problems we saw with MVT.
• If those issues/problems are painful, we can scatter the pages on the disk.
• That is we employ paging!
• This is NOT demand paging.
• Need a table to say where the backing space for each page is located.
• This corresponds to the page table used to tell where in real memory a page is located.
• The format of the ``memory page table'' is determined by the hardware since the hardware modifies/accesses it.
• The format of the ``disk page table'' is decided by the OS designers and is machine independent.
• If the format of the memory page table was flexible, then we might well keep the disk information in it as well.
Paging Daemons
Done earlier
6. Now the O/S has a clean frame (this may be much later in wall clock time if a victim frame had to be written). The O/S schedules an
I/O to read the desired page into this clean frame. Process A is blocked (perhaps for the second time) and hence the process scheduler
is invoked to perform a context switch.
7. A disk interrupt occurs when the I/O completes (trap / asm / OS determines I/O done). The PTE is updated.
8. The O/S may need to fix up process A (e.g. reset the program counter to re-execute the instruction that caused the page fault).
9. Process A is placed on the ready list and eventually is chosen by the scheduler to run. Recall that process A is executing O/S code.
10. The OS returns to the first assembly language routine.
11. The assembly language routine restores registers, etc. and ``returns'' to user mode.
Process A is unaware that all this happened.
3.7: Segmentation
Up to now, the virtual address space has been contiguous.
• Among other issues this makes memory management difficult when there are more than two dynamically growing regions.
• With two regions you start them on opposite sides of the virtual space as we did before.
• Better is to have many virtual address spaces each starting at zero.
• This split up is user visible.
• Without segmentation (equivalently, with just one segment) all procedures are packed together, so if
one changes in size all the virtual addresses following it change and the program must be re-linked.
• Eases flexible protection and sharing (share a segment). For example, can have a shared library.
Homework: 29.
** Two Segments
Late PDP-10s and TOPS-10
• One shared text segment, that can also contain shared (normally read only) data.
• One (private) writable data segment.
• Permission bits on each segment.
• Which kind of segment is better to evict?
• Swap out shared segment hurts many tasks.
• The shared segment is read only (probably) so no writeback is needed.
• ``One segment'' is OS/MVT done above.
** Three Segments
Traditional Unix shown at right.
1. Shared text marked execute only.
2. Data segment (global and static variables).
3. Stack segment (automatic variables).
** Four Segments
Just kidding.
** Demand Segmentation
Same idea as demand paging applied to segments.
• If a segment is loaded, its base and limit are stored in the STE and the valid bit is set in the STE.
• The STE is accessed for each memory reference (not really: TLB).
• If the segment is not loaded, the valid bit is unset. The base and limit as well as the disk address of the segment are stored in an OS
table.
• A reference to a non-loaded segment generates a segment fault (analogous to a page fault).
• To load a segment, we must solve both the placement question and the replacement question (for demand paging, there is no
placement question).
• I believe demand segmentation was once implemented but am not sure. It is not used in modern systems.
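A C sketch of the translation a loaded segment would get, using STEs with base/limit and a valid bit as described above; the table contents are made up.

#include <stdio.h>
#include <stdlib.h>

/* Sketch of a segment table entry as described above; the contents are made up. */
struct ste {
    int valid;          /* is the segment loaded?           */
    long base;          /* physical address where it starts */
    long limit;         /* length of the segment            */
};

static struct ste seg_table[] = {
    {1, 100000, 5000},      /* segment 0: text  */
    {1, 300000, 2000},      /* segment 1: data  */
    {0, 0, 8000},           /* segment 2: not loaded -> segment fault */
};

static long translate(int seg, long offset) {
    struct ste *s = &seg_table[seg];
    if (!s->valid) {
        printf("segment fault on segment %d\n", seg);   /* OS would now load it */
        exit(1);
    }
    if (offset >= s->limit) {
        printf("protection violation\n");
        exit(1);
    }
    return s->base + offset;
}

int main(void) {
    printf("%ld\n", translate(1, 150));   /* 300150 */
    translate(2, 10);                     /* segment fault */
    return 0;
}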
The following table mostly from Tanenbaum compares demand paging with demand segmentation.
Consideration                              Demand Paging                        Demand Segmentation
Programmer aware                           No                                   Yes
How many addr spaces                       1                                    Many
VA size > PA size                          Yes                                  Yes
Protect individual procedures separately   No                                   Yes
Accommodate elements with changing sizes   No                                   Yes
Ease user sharing                          No                                   Yes
Why invented                               Let the VA size exceed the PA size   Sharing, protection, independent addr spaces
Operating Systems: Chapter 4: File Systems
Requirements
1. Size: Store very large amounts of data.
2. Persistence: Data survives the creating process.
3. Access: Multiple processes can access the data concurrently.
Solution: Store data in files that together form a file system.
4.1: Files
4.1.1: File Naming
Very important. A major function of the file system.
• Does each file have a unique name?
Answer: Often no. We will discuss this below when we study links.
• Extensions, e.g. the ``html'' in ``class-notes.html''.
1. Conventions just for humans: letter.teq (my convention).
2. Conventions giving default behavior for some programs.
• The emacs editor thinks .html files should be edited in html mode but
can edit them in any mode and can edit any file in html mode.
• Netscape thinks .html means an html file, but
<html> ... </html> works as well
• Gzip thinks .gz means a compressed file but accepts a --suffix flag
• Case sensitive?
Unix: yes. Windows: no.
Memory-Mapped Files
The implementation is via segmentation with demand paging, but the backing store for the pages is the file itself. This all sounds great
but ...
• How do you tell the length of a newly created file? You know which pages were written but not which words in those pages. So
a file of one byte and a file of ten bytes both look like one page.
• What if the same file is accessed both by (ordinary read/write) I/O and by memory mapping?
• What if the file is bigger than the size of virtual memory (will not be a problem for systems built 3 years from now as all will
have enormous virtual memory sizes).
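On Unix a process can ask for exactly this treatment of an ordinary file with mmap(). A small sketch that maps a file and counts its newlines through the mapping (error handling kept minimal):

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char *argv[]) {
    if (argc != 2) { fprintf(stderr, "usage: %s file\n", argv[0]); return 1; }
    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }
    struct stat sb;
    if (fstat(fd, &sb) < 0) { perror("fstat"); return 1; }
    if (sb.st_size == 0) { fprintf(stderr, "empty file\n"); close(fd); return 0; }

    /* The file's pages become part of our virtual address space; they are
       brought in by demand paging, with the file itself as backing store. */
    char *p = mmap(NULL, sb.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    long nl = 0;
    for (off_t i = 0; i < sb.st_size; i++)
        if (p[i] == '\n') nl++;
    printf("%ld lines\n", nl);

    munmap(p, sb.st_size);
    close(fd);
    return 0;
}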
4.2: Directories
Unit of organization.
Contiguous allocation
• This is like OS/MVT.
• The entire file is stored as one piece.
• Simple and fast for access, but ...
• Problem with growing files
• Must either evict the file itself or the file it is bumping into.
• Same problem with an OS/MVT kind of system if jobs grow.
• Problem with external fragmentation.
• Not used for general purpose systems. Ideal for systems where files do not change size.
Homework: 7.
Linked allocation
• The directory entry contains a pointer to the first block of the file.
• Each block contains a pointer to the next.
• Horrible for random access.
• Not used.
Inodes
• Used by unix.
• Directory entry points to inode (index-node).
• Inode points to first few data blocks, often called direct blocks.
• Inode also points to an indirect block, which points to disk blocks.
• Inode also points to a double indirect block, which points to indirect blocks ...
• For some implementations there are triple indirect blocks as well.
• The inode is in memory for open files. So references to direct blocks take just one I/O.
• For big files most references require two I/Os (indirect + data).
• For huge files most references require three I/Os (double indirect, indirect, and data).
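A back-of-the-envelope calculation of how much file these pointers can reach, assuming 4KB blocks, 4-byte block pointers, and 12 direct pointers (the real numbers vary between Unix implementations):

#include <stdio.h>

int main(void) {
    /* Assumed parameters; real Unix file systems differ. */
    long long block   = 4096;     /* bytes per disk block         */
    long long ptr     = 4;        /* bytes per block pointer      */
    long long ndirect = 12;       /* direct pointers in the inode */
    long long ppb     = block / ptr;            /* pointers per block: 1024 */

    long long direct = ndirect * block;
    long long single = ppb * block;             /* via the indirect block        */
    long long dbl    = ppb * ppb * block;       /* via the double indirect block */
    long long triple = ppb * ppb * ppb * block; /* via the triple indirect block */

    printf("direct blocks only: %lld bytes\n", direct);
    printf("+ single indirect:  %lld bytes\n", direct + single);
    printf("+ double indirect:  %lld bytes\n", direct + single + dbl);
    printf("+ triple indirect:  %lld bytes\n", direct + single + dbl + triple);
    return 0;
}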
4.3.2: Implementing Directories
Recall that a directory is a mapping that converts file (or subdirectory) names to the files (or subdirectories) themselves.
Unix
• Each entry contains a name and a pointer to the corresponding inode.
• Metadata is in the inode.
• Early unix had limit of 14 character names.
• Name field now is varying length.
• To go down a level in directory takes two steps: get inode, get file (or subdirectory).
• Do on the blackboard the steps for /allan/gottlieb/courses/os/class-notes.html
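A toy C sketch of that walk: a tiny in-memory directory table (all inode numbers are made up) and a loop that alternates the two steps, directory search then inode fetch (the inode fetch is only indicated by a comment).

#include <stdio.h>
#include <string.h>

/* A toy in-memory "file system": each directory entry maps
   (parent inode number, name) -> inode number.  Inode 2 is the root,
   as in traditional Unix.  The numbers are made up for illustration. */
struct dirent { int parent; const char *name; int inode; };

static struct dirent table[] = {
    {2,  "allan",            10},
    {10, "gottlieb",         17},
    {17, "courses",          23},
    {23, "os",               31},
    {31, "class-notes.html", 44},
};

static int lookup(int dir, const char *name) {
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (table[i].parent == dir && strcmp(table[i].name, name) == 0)
            return table[i].inode;
    return -1;                      /* not found */
}

int main(void) {
    char path[] = "/allan/gottlieb/courses/os/class-notes.html";
    int inode = 2;                  /* start at the root directory */
    for (char *comp = strtok(path, "/"); comp; comp = strtok(NULL, "/")) {
        inode = lookup(inode, comp);    /* step 1: search the directory   */
        if (inode < 0) { printf("%s: not found\n", comp); return 1; }
        /* step 2 (not shown): read that inode to find the file's blocks */
    }
    printf("final inode: %d\n", inode);
    return 0;
}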
Homework: 11
Hard Links
• Symmetric multinamed files.
• When a hard link is created, another name is created for the same file.
• The two names have equal status.
• It is not, I repeat NOT true that one name is the ``real name'' and the other is ``just
a link''.
Start with an empty file system (i.e., just the root directory) and then execute:
cd /
mkdir /A; mkdir /B
touch /A/X; touch /B/Y
ln /B/Y /A/New
Assume Bob created /B and /B/Y and Alice created /A, /A/X, and /A/New. Later Bob tires
of /B/Y and removes it by executing
rm /B/Y
The file /A/New is still fine (see third diagram on the right). But it is owned by Bob, who
can't find it! If the system enforces quotas, Bob will likely be charged (as the owner), but he
can neither find nor delete the file (since Bob cannot unlink, i.e. remove, files from /A).
Since hard links are only permitted to files (not directories), the resulting file system is a dag (directed acyclic graph). That is, there are
no directed cycles. We will now proceed to give away this useful property by studying symlinks, which can point to directories.
Symlinks
• Asymmetric multinamed files.
• When a symlink is created another file is created, one that points to the original file.
Again start with an empty file system and this time execute
cd /
mkdir /A; mkdir /B
touch /A/X; touch /B/Y
ln -s /B/Y /A/New
Backups
All modern systems support full and incremental dumps.
• A level 0 dump is called a full dump (i.e., dumps everything).
• A level n dump (n>0) is called an incremental dump and the standard unix utility dumps all files that have changed since the
previous level n-1 dump.
• Other dump utilities dump all files that have changed since the last level n dump.
• Keep on the disk the dates of the most recent level i dumps for all i. In Unix this is traditionally in /etc/dumpdates.
• What about the nodump attribute?
• Default policy (for Linux at least) is to dump such files anyway when doing a full dump, but not dump them for
incremental dumps.
• Another way to say this is that the nodump attribute is honored for level n dumps if n>0.
• The dump command has an option to override the default policy (can specify k so that nodump is honored for level n
dumps if n>k).
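A small C sketch of the inclusion rule just described, with made-up dump times; the real dump utility reads /etc/dumpdates and the file's modification time.

#include <stdbool.h>
#include <stdio.h>
#include <time.h>

/* Simplified rule from above: a level-n dump (n > 0) includes a file iff it
   changed since the previous level n-1 dump, and the nodump attribute is
   honored only for incremental dumps.  Times and levels are made up. */
static bool include_in_dump(int level, time_t mtime,
                            const time_t last_dump[], bool nodump) {
    if (level == 0)
        return true;                      /* full dump: nodump ignored by default */
    if (nodump)
        return false;                     /* honored for incremental dumps */
    return mtime > last_dump[level - 1];  /* changed since previous level n-1 dump */
}

int main(void) {
    time_t dumps[] = {1000, 2000};        /* level 0 at t=1000, level 1 at t=2000 */
    printf("%d\n", include_in_dump(2, 2500, dumps, false));   /* 1: include */
    printf("%d\n", include_in_dump(2, 1500, dumps, false));   /* 0: skip    */
    printf("%d\n", include_in_dump(0, 1500, dumps, true));    /* 1: full dump */
    return 0;
}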
Consistency
• Fsck (file system check) and chkdsk (check disk)
• If the system crashed, it is possible that not all metadata was written to disk. As a result the file system may be
inconsistent. These programs check, and often correct, inconsistencies.
• Scan all inodes (or the FAT) to check that each block is in exactly one file or on the free list, but not both (see the sketch after this list).
• Also check that the number of links to each file (part of the metadata in the file's inode) is correct (by looking at all
directories).
• Other checks as well.
• Offers to ``fix'' the errors found (for most errors).
• ``Journaling'' file systems
• An idea from database theory (transaction logs).
• Eliminates the need for fsck.
• NTFS has had journaling from day 1.
• Many Unix systems have it. IBM's AIX converted to journaling in the early 90s.
• Linux does not yet have journaling, a serious shortcoming. It is under very active development.
• FAT does not have journaling.
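Returning to the fsck block check mentioned above, a minimal C sketch of the in-use/free-list counters with made-up tables; a real fsck builds these counts by reading every inode and the free list from disk.

#include <stdio.h>

#define NBLOCKS 8

int main(void) {
    /* Made-up data: which blocks the inodes claim, and the free list. */
    int blocks_in_files[] = {1, 3, 4, 6};
    int free_list[]       = {0, 2, 4, 7};     /* note: block 4 appears in both */

    int in_use[NBLOCKS] = {0}, is_free[NBLOCKS] = {0};
    for (size_t i = 0; i < sizeof blocks_in_files / sizeof *blocks_in_files; i++)
        in_use[blocks_in_files[i]]++;
    for (size_t i = 0; i < sizeof free_list / sizeof *free_list; i++)
        is_free[free_list[i]]++;

    for (int b = 0; b < NBLOCKS; b++) {
        if (in_use[b] + is_free[b] == 0)
            printf("block %d: missing (add it to the free list)\n", b);
        else if (in_use[b] > 0 && is_free[b] > 0)
            printf("block %d: both in a file and on the free list\n", b);
        else if (in_use[b] > 1)
            printf("block %d: in more than one file\n", b);
    }
    return 0;
}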
4.4: Security
Very serious subject. Could easily be a course in itself. My treatment is very brief.
• Intruders
• Sadly an enormous problem.
• The NYU ``greeting'' no longer includes the word ``welcome'' since that was somehow interpreted as some sort of
license to break in.
• Indeed, the greeting is not friendly.
• It once was.
• Below I have a nasty version from a few years ago.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
WARNING: UNAUTHORIZED PERSONS ........ DO NOT PROCEED
~~~~~~~ ~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~
This computer system is operated by New York University (NYU) and may be
accessed only by authorized users. Authorized users are granted specific,
limited privileges in their use of the system. The data and programs
in this system may not be accessed, copied, modified, or disclosed without
prior approval of NYU. Access and use, or causing access and use, of this
computer system by anyone other than as permitted by NYU are strictly pro-
hibited by NYU and by law and may subject an unauthorized user, including
unauthorized employees, to criminal and civil penalties as well as NYU-
initiated disciplinary proceedings. The use of this system is routinely
monitored and recorded, and anyone accessing this system consents to such
monitoring and recording. Questions regarding this access policy or other
topics should be directed (by e-mail) to comment@nyu.edu or (by phone) to
212-998-3333.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
• Privacy
• An enormously serious (societal) subject.
Viruses
• A virus attaches itself to (``infects'') a part of the system so that it remains until explicitly removed. In particular, rebooting the
system does not remove it.
• Attach to an existing program or to a portion of the disk that is used for booting.
• When the virus is run it tries to attach itself to other files.
• Often implemented the same way as a binary patch: Change the first instruction to jump to somewhere where you put the
original first instruction, then your patch, then a jump back to the second instruction.
Passwords
• Software to crack passwords is publicly available.
• Use this software for prevention.
• One way to prevent password cracking is to instead use one-time passwords, e.g. SecurID.
• Current practice here and elsewhere is that when you telnet to a remote machine, your password is sent in the clear along the
ethernet.
So maybe .rhosts aren't that bad after all.
Physical identification
Opens up a bunch of privacy questions. For example, should we require fingerprinting for entering the subway?
Homework: 15, 16, 19, 24.
4.5.3: Capabilities
Keep the rows of the matrix separate and drop the null entries.
4.5.4: Protection models
Give objects and subjects security levels and enforce:
• A subject may read only those objects whose level is at or below her own.
• A subject may write only those objects whose level is at or above her own.
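These two rules (no read up, no write down) are the classic multilevel security model, usually attributed to Bell and LaPadula. A tiny C sketch, with levels represented as integers (higher = more secret):

#include <stdbool.h>
#include <stdio.h>

/* Multilevel security check: a subject may read at or below its own level,
   and write at or above its own level. */
static bool may_read(int subject_level, int object_level)  { return object_level <= subject_level; }
static bool may_write(int subject_level, int object_level) { return object_level >= subject_level; }

int main(void) {
    printf("secret(2) reads  public(0): %d\n", may_read(2, 0));    /* 1: allowed */
    printf("secret(2) writes public(0): %d\n", may_write(2, 0));   /* 0: denied  */
    return 0;
}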