Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
SlideShare a Scribd company logo
Introduction to Unix




CS 167            II–1   Copyright © 2006 Thomas W. Doeppner. All rights reserved.




                                                                                     II–1
Outline

                     •   Processes
                     •   File Abstraction
                     •   Directories
                     •   File Representation
                     •   File-Oriented System Calls




            CS 167                            II–2      Copyright © 2006 Thomas W. Doeppner. All rights reserved.




  In this lecture we present a brief introduction to a few of the more important aspects of the
Unix operating system. In particular, we look at the Unix notions of processes and files—two
concepts that are important throughout the course.




                                                                                                                    II–2
Processes

                    • Fundamental abstraction of program execution
                      – memory
                      – processor(s)
                          - each processor abstraction is a thread
                      – “execution context”




           CS 167                            II–3      Copyright © 2006 Thomas W. Doeppner. All rights reserved.




  Unix, as do many operating systems, uses the notion of a process as its fundamental
abstraction of program execution. Each program runs in a separate process. Processes are
protected from one another in the sense that the actions of one process cannot directly harm
others. The abstraction comprises the memory of a program (known as its address space—
the collection of locations that can be referenced by the process), the execution agents
(processor abstractions), and other information, known collectively as the execution context,
representing such things as the files the process is currently accessing, how it responds to
exceptions, to external stimuli, etc.
  The processor abstraction is often called a thread. In “traditional” Unix programs,
processes have only one thread, so we’ll use the word process to include the single thread
running inside of it. Later in this course, when we cover multithreaded programming, we’ll be
more careful and use the word thread when we are discussing the processor abstraction.




                                                                                                                   II–3
The Unix Address Space

                                            stack



                                           dynamic
                                             bss

                                             data

                                             text



            CS 167                            II–4       Copyright © 2006 Thomas W. Doeppner. All rights reserved.




  A Unix process’s address space appears to be three regions of memory: a read-only text
region (containing executable code); a read-write region consisting of initialized data (simply
called data), uninitialized data (BSS—a directive from an ancient assembler (for the IBM 704
series of computers), standing for Block Started by Symbol and used to reserve space for
uninitialized storage), and a dynamic area; and a second read-write region containing the
process’s user stack (a standard Unix process contains only one thread of control).
  The first area of read-write storage is often collectively called the data region. Its dynamic
portion grows in response to sbrk system calls. Most programmers do not use this system
call directly, but instead use the malloc and free library routines, which manage the dynamic
area and allocate memory when needed by in turn executing sbrk system calls.
  The stack region grows implicitly: whenever an attempt is made to reference beyond the
current end of stack, the stack is implicitly grown to the new reference. (There are system-
wide and per-process limits on the maximum data and stack sizes of processes.)




                                                                                                                     II–4
Creating a Process: Before




                   fork( )

                   parent process

          CS 167                            II–5      Copyright © 2006 Thomas W. Doeppner. All rights reserved.




The only way to create a new process is to use the fork system call.




                                                                                                                  II–5
Creating a Process: After




                     fork( )                             fork( )
                     // returns p                        // returns 0

                     parent process                       child process
                                                            (pid = p)
            CS 167                            II–6      Copyright © 2006 Thomas W. Doeppner. All rights reserved.




  By executing fork the parent process creates an almost exact clone of itself which we call
the child process. This new process executes the same text as its parent, but contains a copy
of the data and a copy of the stack. This copying of the parent to create the child can be very
time-consuming. We discuss later how it is optimized.
  Fork is a very unusual system call: one thread of control flows into it but two threads of
control flow out of it, each in a separate address space. From the parent’s point of view, fork
does very little: nothing happens to the parent except that fork returns the process ID (PID—
an integer) of the new process. The new process starts off life by returning from fork. It
always views fork as returning a zero.




                                                                                                                    II–6
Loading a New Image

                                                                       args




                                                                prog’s bss
                                                               prog’s data
                     exec(prog, args)
                                                                prog’s text
                         Before                                        After

            CS 167                            II–7      Copyright © 2006 Thomas W. Doeppner. All rights reserved.




  Most of the time the purpose of creating a new process is to run a new (i.e., different)
program. Once a new process has been created, it can use the exec system call to load a new
program image into itself, replacing the prior contents of the process’s address space. Exec is
passed the name of a file containing a fully relocated program image (which might require
further linking via a runtime linker). The previous text region of the process is replaced with
the text of the program image. The data, BSS and dynamic areas of the process are “thrown
away” and replaced with the data and BSS of the program image. The contents of the
process’s stack are replaced with the arguments that are passed to the main procedure of the
program.




                                                                                                                    II–7
Fork/Exec Example

                     if (fork( ) == 0) {
                        // child process
                        –– set up I/O in child ––
                        execv(newprogram, parameters);
                        // load new image
                        // if we get here, there’s a problem
                     }
                     // parent process continues here



            CS 167                            II–8      Copyright © 2006 Thomas W. Doeppner. All rights reserved.




  The slide shows a typical example of the use of the fork and exec system calls. The parent
process calls fork and two threads of control, one in the parent process and one in the child
process, return. If a thread sees that fork returns 0, then it must be in the child process.
What typically happens next is that this thread sets up the I/O descriptors of the new
process (which we discuss shortly) and then execs a new program. Here we use the execv
system call, which passes the arguments to the system in a vector (hence the v). If execv
succeeds, it does not return—there is nothing to return to! The process contains an entirely
new image. Thus if execv does return, there must have been an error (for example, the file
did not exist).
  If fork returns a positive value, then this value is the process ID of the child process and
the thread executing this code must be in the parent process (i.e., it is the original thread
that called fork in the first place). It might continue on and do something totally independent
of the child process, or it might at some point wait until the child terminates.




                                                                                                                    II–8
Environment

                     • Set of pairs: <name, value>
                        – e.g. <term, xterm>, <TZ, US/Eastern>
                        – maintained in user-mode address space
                            - Accessed using library routines
                                • getenv
                                • putenv
                     • Initially established via exec:
                        – execve(char *file, char *argv[ ], char *envp[ ])
                            - supplied in third argument
                        – execv(char *file, char *argv[ ])
                            - Inherited from caller

            CS 167                            II–9       Copyright © 2006 Thomas W. Doeppner. All rights reserved.




   A process’s environment is a set of names and values that’s maintained in its user-mode
address space. It’s established when the process image is set up via exec. As shown in the
slide, there’s a variant of exec, execve, with which one can supply the environment explicitly.
If execv is used, the environment is passed to the new image directly from the old.




                                                                                                                     II–9
System Calls

                     • Sole interface between user and kernel
                     • Implemented as library routines that execute
                       trap instructions to enter kernel
                     • Errors indicated by return of –1; error code is
                       in errno

                        if (write(fd, buffer, bufsize) == –1) {
                            // error!
                            printf("error %dn", errno);
                            // see perror
                        }

            CS 167                            II–10      Copyright © 2006 Thomas W. Doeppner. All rights reserved.




  System calls, such as fork, execv, read, write, etc., are the only means for application
programs to communicate directly with the kernel: they form an API (application program
interface) to the kernel. When a program calls such a routine, it is actually placing a call to a
subroutine in a system library. The body of this subroutine contains a hardware-specific trap
instruction which transfers control and some parameters to the kernel. On return to this
library return, the kernel provides an indication of whether or not there was an error and
what the error was. The error indication is passed back to the original caller via the
functional return value of the library routine. If there was an error, a positive-integer code
identifying it is stored in the global variable errno. Rather than simply print this code out, as
shown in the slide, one might instead print out an informative error message. This can be
done via the perror routine.




                                                                                                                     II–10
System Calls


                                  other stuff
                                 kernel stack


                                  kernel text




         trap into kernel




                            write(fd, buf, len)


CS 167                               II–11        Copyright © 2006 Thomas W. Doeppner. All rights reserved.




                                                                                                              II–11
Multiple Processes
                 other stuff
                kernel stack

                 other stuff
                kernel stack

                 other stuff
                kernel stack

                 other stuff
                kernel stack

                kernel text




CS 167             II–12       Copyright © 2006 Thomas W. Doeppner. All rights reserved.




                                                                                           II–12
The File Abstraction

                     • A file is a simple array of bytes
                     • Files are made larger by writing beyond their
                       current end
                     • Files are named by paths in a naming tree
                     • System calls on files are synchronous




            CS 167                            II–13      Copyright © 2006 Thomas W. Doeppner. All rights reserved.




   As discussed three pages ago, most programs perform file I/O using library code layered on
top of kernel code. In this section we discuss just the kernel aspects of file I/O, looking at
the abstraction and the high-level aspects of how this abstraction is implemented.
   The Unix file abstraction is very simple: files are simply arrays of bytes. Many systems have
special system calls to make a file larger. In Unix, you simply write where you’ve never
written before, and the file “magically” grows to the new size (within limits). The names of
files are equally straightforward—just the names labeling the path that leads to the file
within the directory tree. Finally, from the programmer’s point of view, all operations on files
appear to be synchronous—when an I/O system call returns, as far as the process is
concerned, the I/O has completed. (Things are different from the kernel’s point of view, as
discussed later.)




                                                                                                                     II–13
Directories
                                 unix    etc   home    pro       dev


                                   passwd   motd        twd


                                                              unix           ...



                                                       slide1 slide2




            CS 167                             II–14         Copyright © 2006 Thomas W. Doeppner. All rights reserved.




  Here is a portion of a Unix directory tree. The ovals represent files, the rectangles represent
directories (which are really just special cases of files).




                                                                                                                         II–14
Directory Representation
                            Component Name            Inode Number
                                        directory entry



                                        .               1
                                        ..              1
                                       unix            117
                                       etc             4
                                      home             18
                                       pro             36
                                       dev             93



            CS 167                            II–15          Copyright © 2006 Thomas W. Doeppner. All rights reserved.




  A directory consists of an array of pairs of component name and inode number, where the
latter identifies the target file’s inode to the operating system (an inode is data structure
maintained by the operating system that represents a file). Note that every directory contains
two special entries, “.” and “..”. The former refers to the directory itself, the latter to the
directory’s parent (in the case of the slide, the directory is the root directory and has no
parent, thus its “..” entry is a special case that refers to the directory itself).




                                                                                                                         II–15
Hard Links
                                 unix    etc   home    pro       dev


                                                        twd


                                                              unix           ...
                                   image motd


                                                       slide1 slide2

              % ln /unix /etc/image
              # link system call

            CS 167                             II–16         Copyright © 2006 Thomas W. Doeppner. All rights reserved.




  Here are two directory entries referring to the same file. This is done, via the shell, through
the ln command which creates a (hard) link to its first argument, giving it the name specified
by its second argument.
  The shell’s “ln” command is implemented using the link system call.




                                                                                                                         II–16
Directory Representation
                                          .              1
                                          ..             1
                                         unix           117
                                         etc            4
                                        home            18
                                         pro            36
                                         dev            93


                                          .              4
                                          ..             1
                                        image           117
                                        motd            33

            CS 167                              II–17         Copyright © 2006 Thomas W. Doeppner. All rights reserved.




  Here are the (abbreviated) contents of both the root (/) and /etc directories, showing how
/unix and /etc/image are the same file. Note that if the directory entry /unix is deleted (via
the shell’s “rm” command), the file (represented by inode 117) continues to exist, since there
is still a directory entry referring to it. However if /etc/image is also deleted, then the file has
no more links and is removed. To implement this, the file’s inode contains a link count,
indicating the total number of directory entries that refer to it. A file is actually deleted only
when its inode’s link count reaches zero.
  Note: suppose a file is open, i.e. is being used by some process, when its link count
becomes zero. Rather than delete the file while the process is using it, the file will continue to
exist until no process has it open. Thus the inode also contains a reference count indicating
how many times it is open: in particular, how many system file table entries point to it. A file
is deleted when and only when both the link count and this reference count become zero.
  The shell’s “rm” command is implemented using the unlink system call.
  Note that /etc/.. refers to the root directory.




                                                                                                                          II–17
Soft Links
                                 unix   etc   home     pro       dev


                                                        twd


                                                              unix           ...      mylink
                                   image twd


                                                       slide1 slide2                            /unix
                                        /home/twd
         % ln –s /unix /home/twd/mylink
         % ln –s /home/twd /etc/twd
         # symlink system call
            CS 167                             II–18         Copyright © 2006 Thomas W. Doeppner. All rights reserved.




   Differing from a hard link, a soft link (or symbolic link) is a special kind of file containing
the name of another file. When the kernel processes such a file, rather than simply retrieving
its contents, it makes use of the contents by replacing the portion of the directory path that
it has already followed with the contents of the soft-link file and then following the resulting
path. Thus referencing /home/twd/mylink results in the same file as referencing /unix.
Referencing      /etc/twd/unix/slide1     results   in    the    same     file  as     referencing
/home/twd/unix/slide1.
   The shell’s “ln” command with the “-s” flag is implemented using the symlink system call.




                                                                                                                         II–18
Working Directory

                     • Maintained in kernel (as an inode number) for
                       each process
                        – paths not starting from “/” start with the
                          working directory
                        – changed by use of the chdir system call
                        – displayed (via shell) using “pwd”
                            - how is this done?




            CS 167                            II–19     Copyright © 2006 Thomas W. Doeppner. All rights reserved.




  The working directory is maintained (as the inode number of the directory) in the kernel for
each process. Whenever a process attempts to follow a path that doesn’t start with “/”, it
starts at its working directory (rather than at “/”).




                                                                                                                    II–19
Standard File Descriptors

                     main( ) {
                      char buf[BUFSIZE];
                      int n;
                      const char* note = "Write failedn";

                         while ((n = read(0, buf, sizeof(buf))) > 0)
                           if (write(1, buf, n) != n) {
                                   (void)write(2, note, strlen(note));
                                   exit(EXIT_FAILURE);
                           }
                         return(EXIT_SUCCESS);
                     }


            CS 167                            II–20     Copyright © 2006 Thomas W. Doeppner. All rights reserved.




  The file descriptors 0, 1, and 2 are opened to access your terminal when you log in, and are
preserved across forks, unless redirected.




                                                                                                                    II–20
Representing an Open File (1)
                      file descriptor            system file                       active inode
                            table                   table                              table
                      0
                      1
                      2
                      3

                      .

                      .


                      .


                     n–1
                                           ref count f pointer
                                                 access     inode




                                        disk                   buffer cache

            CS 167                                   II–21          Copyright © 2006 Thomas W. Doeppner. All rights reserved.




  A process’s set of open files is represented by a data structure known as the file descriptor
table, which is used to map file descriptors, representing open files, to system file table
entries, each representing an open file. Each process has a separate file descriptor table, but
there is exactly one system file table for the entire system. Each active file (i.e., a file that is
open or otherwise being used) is represented by an inode that is entered in the active inode
table. Recently accessed blocks of data from the file are kept in the kernel’s data section in
what is known as the buffer cache—an area of memory reserved for buffering data from files.
Thus data from these blocks can be accessed again and again without wasting time going to
the disk. When a disk block is modified, the modifications are written in the buffer cache and
only later are written to disk. This allows the process doing the modification to proceed
without having to wait for the data to be written to disk.




                                                                                                                                II–21
Representing an Open File (2)
                      file descriptor              system file                      active inode
                            table                     table                             table
                      0
                      1
                      2
                      3
             fdrw
                      .                        1    rw      0
                      .


                      .


                     n–1
                                           ref count f pointer
                                                 access     inode
           fdrw = open("x", O_RDWR);




                                        disk                     buffer cache

            CS 167                                       II–22       Copyright © 2006 Thomas W. Doeppner. All rights reserved.




  This slide shows what happens when a file is opened.
  The lowest-numbered available file descriptor is allocated from the file descriptor table.
Next, an entry in the system file table is allocated, and the file descriptor table entry is set to
point to the system file table entry. An inode-table entry for the file is allocated (or found if it
already exists) and the system file table entry is set to point to it. Additional fields of the
system file table entry are initialized, including:
       • a reference count
               • each file-descriptor-table entry pointing to it contribute one to the reference
                 count
               • the entry is freed when the reference count goes to zero
       • the allowed access (i.e., how the file was opened—read-only, read-write, etc.)
       • the file pointer (i.e., the location within the file at which the next transfer will start)
               • after every read and write system call the file pointer is incremented by the
                 number of bytes actually transferred, thus facilitating sequential I/O




                                                                                                                                 II–22
Global View


            other stuff

            kernel stack


            other stuff

            kernel stack


            other stuff

            kernel stack


            other stuff

            kernel stack


            kernel text


CS 167           II–23     Copyright © 2006 Thomas W. Doeppner. All rights reserved.




                                                                                       II–23
Allocation of File Descriptors

                   • Whenever a process requests a new file
                     descriptor, the lowest numbered file
                     descriptor not already associated with an
                     open file is selected; thus
                         #include <fcntl.h>
                         #include <unistd.h>

                         close(0);
                         fd = open("file", O_RDONLY);

                      – will always associate file with file descriptor 0
                        (assuming that the open succeeds)

          CS 167                            II–24      Copyright © 2006 Thomas W. Doeppner. All rights reserved.




One can depend on always getting the lowest available file descriptor.




                                                                                                                   II–24
Reading From a File
                     #include <sys/types.h>
                     #include <unistd.h>
                     ssize_t read(int fd, void *buffer, size_t n)


                      – read up to n bytes of data into buffer
                      – returns number of bytes transferred
                      – returns 0 on end of file
                      – returns –1 on error




            CS 167                            II–25      Copyright © 2006 Thomas W. Doeppner. All rights reserved.




  The read system call tells the system to read up to n bytes of data into the area of memory
pointed to by buffer from the file referred to by fd. It returns the number of bytes actually
transferred, or, as usual, –1 if there was an error.
  Read will transfer fewer bytes than specified if:
               • the number of bytes left in the file was less than n
               • the read request was interrupted by a signal after some of the bytes were
                 transferred
                • the file is a pipe, FIFO, or special device with less than n bytes immediately
                  available for reading
  It transfers as many bytes as are present up to the given maximum; if it returns a zero,
that means the end of the file has been reached.




                                                                                                                     II–25
Writing To a File
                     #include <sys/types.h>
                     #include <unistd.h>
                     ssize_t write(int fd, void *buffer, size_t n)


                      – write up to n bytes of data from buffer
                      – returns number of bytes transferred
                      – returns –1 on error




            CS 167                              II–26    Copyright © 2006 Thomas W. Doeppner. All rights reserved.




  The write system call tells the system to write up to n bytes of data from the area of
memory pointed to by buffer into the file referred to by fd. It returns the number of bytes
actually transferred, or, as usual, –1 if there was an error.
  If write transfers fewer bytes than specified, then something caused the transfer to stop
prematurely:
       • a signal interrupted the system call
               • this is pretty complicated and we’ll discuss it in more detail later. However,
                 this would happen if the requested size was large enough so that the write
                 was split by the kernel into a number of segments, at least one segment was
                 written (otherwise the call would have returned with an error and errno set
                 to EINTR), and the signal occurred while a subsequent segment was being
                 written
       • a filesize limit was reached
       • an I/O error occurred
  The response in all these cases should be to attempt to rewrite those bytes that were not
transferred; if a signal had interrupted the previous try, then the next try will succeed
(unless again interrupted by a signal); if a filesize limit had been reached or an I/O error has
occurred, this next write will yield the appropriate error code.




                                                                                                                     II–26
Example
               main( ) {
                  char buf[BUFSIZE]; int nread;
                  const char* note = "Write failedn";
                  while ((nread = read(0, buf, sizeof(buf))) > 0) {
                       int bytes_left = nread; int bpos = 0;
                       while ((n = write(1, &buf[bpos], bytes_left)) < bytes_left) {
                           if (n == –1) {
                                write(2, note, strlen(note));
                                exit(EXIT_FAILURE);
                           }
                           bytes_left −= n; bpos += n;
                       }
                  }
                  return(EXIT_SUCCESS);
               }

            CS 167                             II–27      Copyright © 2006 Thomas W. Doeppner. All rights reserved.




  Here we copy from the program’s standard input to its standard output, but take
advantage of what we’ve learned about the behavior of write. In particular, since it’s not
guaranteed that write will transfer all the bytes requested, we must supply code to make sure
that all the data does get transferred. Thus, after each call to write (except, for reasons of
space on the slide, when we write to file descriptor 2), we check to see how much data was
written and call write again, if necessary, to handle the unwritten data.




                                                                                                                      II–27
Random Access

                     #include <sys/types.h>
                     #include <unistd.h>
                     off_t lseek(int fd, off_t offset, int whence)
                      – sets the file pointer for fd:
                             if whence is SEEK_SET, the pointer is set to
                             offset bytes;
                             if whence is SEEK_CUR, the pointer is set to its
                             current value plus offset bytes;
                             if whence is SEEK_END, the pointer is set to the
                             size of the file plus offset bytes
                      – it returns the (possibly) updated value of the file
                        pointer relative to the beginning of the file. Thus,
                                  n = lseek(fd, (off_t)0, SEEK_CUR);
                        returns the current value of the file pointer for fd
            CS 167                             II–28       Copyright © 2006 Thomas W. Doeppner. All rights reserved.




   To effect random access to files (i.e., access files other than sequentially), you first set the
file pointer, then perform a read or write. Setting the file pointer is done with the lseek
system call.




                                                                                                                       II–28
lseek Example

                     • Example: printing a text file backwards:
                           fd = open("textfile", O_RDONLY);
                           /* go to last char in file */
                           fptr = lseek(fd, (off_t)–1, SEEK_END);
                           while (fptr != –1) {
                                    read(fd, buf, 1);
                                    write(1, buf, 1);
                                    fptr = lseek(fd, (off_t)–2, SEEK_CUR);
                           }




            CS 167                             II–29      Copyright © 2006 Thomas W. Doeppner. All rights reserved.




  This example prints the contents of a file backwards. Note what’s happening in the while
statement: it continues as long as fptr does not return –1, a value it will return if the call to
lseek fails. Our intent is that the last successful call to lseek sets the file pointer to 0. The
next call, which attempts to set it to –1, will fail, thus causing lseek to return –1 (recall that
failed system calls always return –1).




                                                                                                                      II–29
Representing an Open File (3)
                    file descriptor              system file                      active inode
                          table                     table                             table
                    0
                    1
                    2
                    3
                                             1     r 10
          fdrw
                    .                        1    rw 20
                    .
            fdr
                    .


                   n–1
                                         ref count f pointer
                                               access     inode
       fdrw = open("x", O_RDWR);
       fdr = open("x", O_RDONLY);
       write(fdrw, buf, 20);
       read(fdr, buf2, 10);
                                      disk                     buffer cache

          CS 167                                     II–30         Copyright © 2006 Thomas W. Doeppner. All rights reserved.




In this slide we see the effect of two opens of the same file within the same process.




                                                                                                                               II–30
Multiple Descriptors; One File

                     • How are standard file descriptors set up?
                        – suppose 1 and 2 are opened separately

                         while ((n = read(0, buf, sizeof(buf))) > 0)
                          if (write(1, buf, n) != n) {
                                  (void)write(2, note, strlen(note));
                                  exit(EXIT_FAILURE);
                          }

                        – error message clobbers data bytes!




            CS 167                            II–31      Copyright © 2006 Thomas W. Doeppner. All rights reserved.




   By convention, file descriptors 1 and 2 are used for processes’ normal and diagnostic
output. Normally they both refer to the display, and thus diagnostic output is intermingled
with normal output. Suppose, however, one wanted to redirect both file descriptors so that
all output, normal and diagnostic, was sent to a file. One might open this file twice, once as
file descriptor 1 and again as file descriptor 2, thereby creating two system file table entries.
As file descriptor 1 receives output, the offset field of its file table entry advances with each
write. After 1000 bytes have been written (sequentially), the offset field is set to 1000,
representing the current end-of-file.
   If at this point a diagnostic message is written to file descriptor 2, it will start at the
beginning of the file, overwriting the data already there, since file descriptor 2’s file table
entry’s offset is still at 0. This outcome is certainly not desirable.




                                                                                                                     II–31
dup System Calls

                     • dup returns a new file descriptor referring to
                       the same file as its argument

                           int dup(int fd)

                     • dup2 is similar, but it allows you to specify
                       the new file descriptor

                           int dup2(int oldfd, int newfd)




            CS 167                            II–32      Copyright © 2006 Thomas W. Doeppner. All rights reserved.




  We can use one of the dup system calls to solve this problem. dup obeys our rule of always
allocating the lowest available file descriptor. However, with dup2, one can specify, via the
second argument, which file descriptor is allocated. If the second argument is the file
descriptor of an open file, the file is first closed, then associated with the file of the first
argument.




                                                                                                                     II–32
dup Example

                     /* redirect stdout and stderr to same file */
                     /* assumes file descriptor 0 is in use */
                     close(1);
                     open("file", O_WRONLY|O_CREAT, 0666);
                     close(2);
                     dup(1);

                     /* alternatively, replace last two lines with: */
                     dup2(1, 2);



            CS 167                           II–33      Copyright © 2006 Thomas W. Doeppner. All rights reserved.




  Here we see how to use dup and dup2 to set file descriptors 1 and 2 to refer to the same
system file-table entry. Note the extra argument to open. We’ve given open the O_CREAT flag,
which tells the system that if the file does not exist, it should create it. The third argument
helps to specify the access permissions assigned to the file if it’s created by this call. We
discuss this in detail in a few slides.




                                                                                                                    II–33
Representing an Open File (4)
                     file descriptor              system file                      active inode
                           table                     table                             table
                     0
                     1
                     2
                     3
           fdrw
                     .                        2    rw 20
          fdrw2
                     .


                     .


                    n–1
                                          ref count f pointer
                                                access     inode
         fdrw = open("x", O_RDWR);
         fdrw2 = dup(fdrw);
         write(fdrw, buf, 20);

                                       disk                     buffer cache

           CS 167                                     II–34         Copyright © 2006 Thomas W. Doeppner. All rights reserved.




 The dup system call causes two file descriptors to refer to the same file table entry and
hence share the offset.




                                                                                                                                II–34
Representing an Open File (5)
                      file descriptor              system file                      active inode
                            table                     table                             table
                      0
                      1
                      2
                      3
                                               2     r 10
            fdrw
                      .                        4    rw 20
           fdrw2
                      .
              fdr
                      .


                     n–1
                                           ref count f pointer
                                                 access     inode



          fork( )


                                        disk                     buffer cache

            CS 167                                     II–35         Copyright © 2006 Thomas W. Doeppner. All rights reserved.




   If our process executes a fork system call, creating a child process, the child is given a file-
descriptor table that’s a copy of its parent’s. Of course, the reference counts on the system-
file-table entries are increased appropriately.




                                                                                                                                 II–35
I/O Redirection

                     % who > file &

                       if (fork( ) == 0) {
                            char *args[ ] = {"who", 0};
                            close(1);
                            open("file", O_WRONLY|O_TRUNC, 0666);
                            execv("who", args);
                            printf("you screwed upn");
                            exit(1);
                       }


            CS 167                            II–36      Copyright © 2006 Thomas W. Doeppner. All rights reserved.




  This is an example of what a shell might do to handle I/O redirection: it first creates a new
process in which to run a command (“who”, in this case). In the new process it closes file
descriptor 1 (standard output—to which normal output is written). It then opens “file” (the
arguments indicate that “file” is only to be written to, that any prior contents are erased, and
that if “file” didn’t already exist, it will be created with read and write permission for all;
assuming that file descriptor 0 is not available (it’s assigned to standard input), file
descriptor 1 will be assigned to “file”. Assuming that execv succeeds, when “who” runs, its
output is written to “file”.
  Note that the parent process does not wait for its child to terminate; it goes on to execute
further commands. (This behavior occurs because we’ve placed an “&” at the end of the
command line.)
  Note the args argument to execv: By convention, the first argument to each command is
the name of the command (“who” in this case). To indicate that there are no further
arguments, a zero is supplied.
  Note that we aren’t checking for errors: this is only because doing so would cause the
resulting code not to fit in the slide. You should always check for errors.




                                                                                                                     II–36
More I/O Redirection

            % who >& file &

                 if (fork( ) == 0) {
                      char *args[ ] = {"who", 0};
                      close(1);
                      close(2);
                      open("file", O_WRONLY|O_TRUNC|O_CREAT, 0666);
                      dup(1);
                      execv("who", args);
                      …
                 }



            CS 167                             II–37       Copyright © 2006 Thomas W. Doeppner. All rights reserved.




  In this example (using csh syntax), we run “who” with both standard output (file descriptor
1) and standard error (file descriptor 2) redirected to “file”. It’s done in such a way that writes
to either standard output or standard error go the current end of the file.




                                                                                                                       II–37
Waiting For Termination

                     % who

                       if ((pid = fork( )) == 0) {
                           ...
                           execv("who", args);
                           ...
                       }
                       while (pid != wait(0))
                           ;



            CS 167                           II–38      Copyright © 2006 Thomas W. Doeppner. All rights reserved.




   Here we execute a shell command and wait for it to terminate before going on to the next
command (i.e., what’s normally done). The wait system call causes the caller to wait (i.e., not
execute) until one of its children terminate and then returns the process ID of the process
that’s terminated (if its argument is nonzero, it points to an area of memory at which will be
written status information about the child, such as the argument it passed to exit). Here our
first process waits until the process it created to run “who” terminates.




                                                                                                                    II–38
Open
             #include <sys/types.h>
             #include <sys/stat.h>
             #include <fcntl.h>
             int open(const char *path, int options [, mode_t mode])

                     – options
                        -   O_RDONLY   open for reading only
                        -   O_WRONLY   open for writing only
                        -   O_RDWR     open for reading and writing
                        -   O_APPEND   set the file offset to end of file prior to each
                                       write
                        - O_CREAT      if the file does not exist, then create it,
                                       setting its mode to mode adjusted by umask
                        - O_EXCL       if O_EXCL and O_CREAT are set, then
                                       open fails if the file exists
                        - O_TRUNC      delete any previous contents of the file
                        - O_NONBLOCK   don’t wait if I/O can’t be done immediately
            CS 167                            II–39       Copyright © 2006 Thomas W. Doeppner. All rights reserved.




  Here’s a partial list of the options available as the second argument to open. (Further
options are often available, but they depend on the version of Unix.) Note that the first three
options are mutually exclusive: one, and only one, must be supplied. We discuss the third
argument to open, mode, shortly.




                                                                                                                      II–39
File Access Permissions

                     • Who’s allowed to do what?
                       – who
                          - user (owner)
                          - group
                          - others (rest of the world)
                       – what
                          - read
                          - write
                          - execute




            CS 167                            II–40      Copyright © 2006 Thomas W. Doeppner. All rights reserved.




  Each file has associated with it a set of access permissions indicating, for each of three
classes of principals, what sorts of operations on the file are allowed. The three classes are
the owner of the file, known as user, the group owner of the file, known simply as group, and
everyone else, known as others. The operations are grouped into the classes read, write, and
execute, with their obvious meanings. The access permissions apply to directories as well as
to ordinary files, though the meaning of execute for directories is not quite so obvious: one
must have execute permission for a directory file in order to follow a path through it.
  The system, when checking permissions, first determines the smallest class of principals
the requester belongs to: user (smallest), group, or others (largest). It then, within the chosen
class, checks for appropriate permissions.




                                                                                                                     II–40
Permissions Example
             % ls -lR
             .:
             total 2
             drwxr-x--x      2 tom      adm           1024 Dec 17 13:34 A
             drwxr-----      2 tom      adm           1024 Dec 17 13:34 B

             ./A:
             total 1
             -rw-rw-rw-      1 tom      adm            593 Dec 17 13:34 x

             ./B:
             total 2
             -r--rw-rw-      1 tom      adm            446 Dec 17 13:34 x
             -rw----rw-      1 trina    adm            446 Dec 17 13:45 y
            CS 167                            II–41        Copyright © 2006 Thomas W. Doeppner. All rights reserved.




  In the current directory are two subdirectories, A and B, with access permissions as shown
in the slide. Note that the permissions are given as a string of characters: the first character
indicates whether or not the file is a directory, the next three characters are the permissions
for the owner of the file, the next three are the permissions for the members of the file’s
group’s members, and the last three are the permissions for the rest of the world.
  Quiz: the users tom and trina are members of the adm group; andy is not.
       May andy list the contents of directory A?
       May andy read A/x?
       May trina list the contents of directory B?
       May trina modify B/y?
       May tom modify B/x?
       May tom read B/y?




                                                                                                                       II–41
Setting File Permissions
                    #include <sys/types.h>
                    #include <sys/stat.h>
                    int chmod(const char *path, mode_t mode)

                       – sets the file permissions of the given file to those
                         specified in mode
                       – only the owner of a file and the superuser may
                         change its permissions
                       – nine combinable possibilities for mode
                         (read/write/execute for user, group, and others)
                           - S_IRUSR (0400), S_IWUSR (0200), S_IXUSR (0100)
                           - S_IRGRP (040), S_IWGRP (020), S_IXGRP (010)
                           - S_IROTH (04), S_IWOTH (02), S_IXOTH (01)


           CS 167                            II–42      Copyright © 2006 Thomas W. Doeppner. All rights reserved.




  The chmod system call (and the similar chmod shell command) are used to change the
permissions of a file. Note that the symbolic names for the permissions are rather
cumbersome; what is often done is to use their numerical equivalents instead. Thus the
combination of read/write/execute permission for the user (0700), read/execute permission
for the group (050), and execute-only permission for others (01) can be specified simply as
0751.




                                                                                                                    II–42
Creating a File

             • Use either open or creat
                     – open (const char *pathname, int flags, mode_t mode)
                         - flags must include O_CREAT
                     – creat(const char *pathname, mode_t mode)
                         - open is preferred
             • The mode parameter helps specify the permissions of
               the newly created file
                     – permissions = mode & ~umask




            CS 167                             II–43      Copyright © 2006 Thomas W. Doeppner. All rights reserved.




   Originally in Unix one created a file only by using the creat system call. A separate
O_CREAT flag was later given to open so that it, too, can be used to create files. The creat
system call fails if the file already exists. For open, what happens if the file already exists
depends upon the use of the flags O_EXCL and O_TRUNC. If O_EXCL is included with the
flags (e.g., open(“newfile”, O_CREAT|O_EXCL, 0777)), then, as with creat, the call fails if the
file exists. Otherwise, the call succeeds and the (existing) file is opened. If O_TRUNC is
included in the flags, then, if the file exists, its previous contents are eliminated and the file
(whose size is now zero) is opened.
   When a file is created by either open or creat, the file’s initial access permissions are the
bitwise AND of the mode parameter and the complement of the process’s umask (explained
in the next slide).




                                                                                                                      II–43
Umask

                     • Standard programs create files with
                       “maximum needed permissions” as mode
                        – compilers: 0777
                        – editors: 0666
                     • Per-process parameter, umask, used to turn
                       off undesired permission bits
                        – e.g., turn off all permissions for others, write
                          permission for group: set umask to 027
                            - compilers: permissions = 0777 & ~(027) =
                              0750
                            - editors: permissions = 0666 & ~(027) = 0640
                        – set with umask system call or (usually) shell
                          command
            CS 167                            II–44      Copyright © 2006 Thomas W. Doeppner. All rights reserved.




   The umask (often called the “creation mask”) allows programs to have wired into them a
standard set of maximum needed permissions as their file-creation modes. Users then have,
as part of their environment (via a per-process parameter that is inherited by child processes
from their parents), a limit on the permissions given to each of the classes of security
principals. This limit (the umask) looks like the 9-bit permissions vector associated with each
file, but each one-bit indicates that the corresponding permission is not to be granted. Thus,
if umask is set to 022, then, whenever a file is created, regardless of the settings of the mode
bits in the open or creat call, read permission for group and others is not to be included with
the file’s access permissions.
   You can determine the current setting of umask by executing the umask shell command
without any arguments.




                                                                                                                     II–44

More Related Content

02unixintro

  • 1. Introduction to Unix CS 167 II–1 Copyright © 2006 Thomas W. Doeppner. All rights reserved. II–1
  • 2. Outline • Processes • File Abstraction • Directories • File Representation • File-Oriented System Calls CS 167 II–2 Copyright © 2006 Thomas W. Doeppner. All rights reserved. In this lecture we present a brief introduction to a few of the more important aspects of the Unix operating system. In particular, we look at the Unix notions of processes and files—two concepts that are important throughout the course. II–2
  • 3. Processes • Fundamental abstraction of program execution – memory – processor(s) - each processor abstraction is a thread – “execution context” CS 167 II–3 Copyright © 2006 Thomas W. Doeppner. All rights reserved. Unix, as do many operating systems, uses the notion of a process as its fundamental abstraction of program execution. Each program runs in a separate process. Processes are protected from one another in the sense that the actions of one process cannot directly harm others. The abstraction comprises the memory of a program (known as its address space— the collection of locations that can be referenced by the process), the execution agents (processor abstractions), and other information, known collectively as the execution context, representing such things as the files the process is currently accessing, how it responds to exceptions, to external stimuli, etc. The processor abstraction is often called a thread. In “traditional” Unix programs, processes have only one thread, so we’ll use the word process to include the single thread running inside of it. Later in this course, when we cover multithreaded programming, we’ll be more careful and use the word thread when we are discussing the processor abstraction. II–3
  • 4. The Unix Address Space stack dynamic bss data text CS 167 II–4 Copyright © 2006 Thomas W. Doeppner. All rights reserved. A Unix process’s address space appears to be three regions of memory: a read-only text region (containing executable code); a read-write region consisting of initialized data (simply called data), uninitialized data (BSS—a directive from an ancient assembler (for the IBM 704 series of computers), standing for Block Started by Symbol and used to reserve space for uninitialized storage), and a dynamic area; and a second read-write region containing the process’s user stack (a standard Unix process contains only one thread of control). The first area of read-write storage is often collectively called the data region. Its dynamic portion grows in response to sbrk system calls. Most programmers do not use this system call directly, but instead use the malloc and free library routines, which manage the dynamic area and allocate memory when needed by in turn executing sbrk system calls. The stack region grows implicitly: whenever an attempt is made to reference beyond the current end of stack, the stack is implicitly grown to the new reference. (There are system- wide and per-process limits on the maximum data and stack sizes of processes.) II–4
  • 5. Creating a Process: Before fork( ) parent process CS 167 II–5 Copyright © 2006 Thomas W. Doeppner. All rights reserved. The only way to create a new process is to use the fork system call. II–5
  • 6. Creating a Process: After fork( ) fork( ) // returns p // returns 0 parent process child process (pid = p) CS 167 II–6 Copyright © 2006 Thomas W. Doeppner. All rights reserved. By executing fork the parent process creates an almost exact clone of itself which we call the child process. This new process executes the same text as its parent, but contains a copy of the data and a copy of the stack. This copying of the parent to create the child can be very time-consuming. We discuss later how it is optimized. Fork is a very unusual system call: one thread of control flows into it but two threads of control flow out of it, each in a separate address space. From the parent’s point of view, fork does very little: nothing happens to the parent except that fork returns the process ID (PID— an integer) of the new process. The new process starts off life by returning from fork. It always views fork as returning a zero. II–6
  • 7. Loading a New Image args prog’s bss prog’s data exec(prog, args) prog’s text Before After CS 167 II–7 Copyright © 2006 Thomas W. Doeppner. All rights reserved. Most of the time the purpose of creating a new process is to run a new (i.e., different) program. Once a new process has been created, it can use the exec system call to load a new program image into itself, replacing the prior contents of the process’s address space. Exec is passed the name of a file containing a fully relocated program image (which might require further linking via a runtime linker). The previous text region of the process is replaced with the text of the program image. The data, BSS and dynamic areas of the process are “thrown away” and replaced with the data and BSS of the program image. The contents of the process’s stack are replaced with the arguments that are passed to the main procedure of the program. II–7
  • 8. Fork/Exec Example if (fork( ) == 0) { // child process –– set up I/O in child –– execv(newprogram, parameters); // load new image // if we get here, there’s a problem } // parent process continues here CS 167 II–8 Copyright © 2006 Thomas W. Doeppner. All rights reserved. The slide shows a typical example of the use of the fork and exec system calls. The parent process calls fork and two threads of control, one in the parent process and one in the child process, return. If a thread sees that fork returns 0, then it must be in the child process. What typically happens next is that this thread sets up the I/O descriptors of the new process (which we discuss shortly) and then execs a new program. Here we use the execv system call, which passes the arguments to the system in a vector (hence the v). If execv succeeds, it does not return—there is nothing to return to! The process contains an entirely new image. Thus if execv does return, there must have been an error (for example, the file did not exist). If fork returns a positive value, then this value is the process ID of the child process and the thread executing this code must be in the parent process (i.e., it is the original thread that called fork in the first place). It might continue on and do something totally independent of the child process, or it might at some point wait until the child terminates. II–8
  • 9. Environment • Set of pairs: <name, value> – e.g. <term, xterm>, <TZ, US/Eastern> – maintained in user-mode address space - Accessed using library routines • getenv • putenv • Initially established via exec: – execve(char *file, char *argv[ ], char *envp[ ]) - supplied in third argument – execv(char *file, char *argv[ ]) - Inherited from caller CS 167 II–9 Copyright © 2006 Thomas W. Doeppner. All rights reserved. A process’s environment is a set of names and values that’s maintained in its user-mode address space. It’s established when the process image is set up via exec. As shown in the slide, there’s a variant of exec, execve, with which one can supply the environment explicitly. If execv is used, the environment is passed to the new image directly from the old. II–9
  • 10. System Calls • Sole interface between user and kernel • Implemented as library routines that execute trap instructions to enter kernel • Errors indicated by return of –1; error code is in errno if (write(fd, buffer, bufsize) == –1) { // error! printf("error %dn", errno); // see perror } CS 167 II–10 Copyright © 2006 Thomas W. Doeppner. All rights reserved. System calls, such as fork, execv, read, write, etc., are the only means for application programs to communicate directly with the kernel: they form an API (application program interface) to the kernel. When a program calls such a routine, it is actually placing a call to a subroutine in a system library. The body of this subroutine contains a hardware-specific trap instruction which transfers control and some parameters to the kernel. On return to this library return, the kernel provides an indication of whether or not there was an error and what the error was. The error indication is passed back to the original caller via the functional return value of the library routine. If there was an error, a positive-integer code identifying it is stored in the global variable errno. Rather than simply print this code out, as shown in the slide, one might instead print out an informative error message. This can be done via the perror routine. II–10
  • 11. System Calls other stuff kernel stack kernel text trap into kernel write(fd, buf, len) CS 167 II–11 Copyright © 2006 Thomas W. Doeppner. All rights reserved. II–11
  • 12. Multiple Processes other stuff kernel stack other stuff kernel stack other stuff kernel stack other stuff kernel stack kernel text CS 167 II–12 Copyright © 2006 Thomas W. Doeppner. All rights reserved. II–12
  • 13. The File Abstraction • A file is a simple array of bytes • Files are made larger by writing beyond their current end • Files are named by paths in a naming tree • System calls on files are synchronous CS 167 II–13 Copyright © 2006 Thomas W. Doeppner. All rights reserved. As discussed three pages ago, most programs perform file I/O using library code layered on top of kernel code. In this section we discuss just the kernel aspects of file I/O, looking at the abstraction and the high-level aspects of how this abstraction is implemented. The Unix file abstraction is very simple: files are simply arrays of bytes. Many systems have special system calls to make a file larger. In Unix, you simply write where you’ve never written before, and the file “magically” grows to the new size (within limits). The names of files are equally straightforward—just the names labeling the path that leads to the file within the directory tree. Finally, from the programmer’s point of view, all operations on files appear to be synchronous—when an I/O system call returns, as far as the process is concerned, the I/O has completed. (Things are different from the kernel’s point of view, as discussed later.) II–13
  • 14. Directories unix etc home pro dev passwd motd twd unix ... slide1 slide2 CS 167 II–14 Copyright © 2006 Thomas W. Doeppner. All rights reserved. Here is a portion of a Unix directory tree. The ovals represent files, the rectangles represent directories (which are really just special cases of files). II–14
  • 15. Directory Representation Component Name Inode Number directory entry . 1 .. 1 unix 117 etc 4 home 18 pro 36 dev 93 CS 167 II–15 Copyright © 2006 Thomas W. Doeppner. All rights reserved. A directory consists of an array of pairs of component name and inode number, where the latter identifies the target file’s inode to the operating system (an inode is data structure maintained by the operating system that represents a file). Note that every directory contains two special entries, “.” and “..”. The former refers to the directory itself, the latter to the directory’s parent (in the case of the slide, the directory is the root directory and has no parent, thus its “..” entry is a special case that refers to the directory itself). II–15
  • 16. Hard Links unix etc home pro dev twd unix ... image motd slide1 slide2 % ln /unix /etc/image # link system call CS 167 II–16 Copyright © 2006 Thomas W. Doeppner. All rights reserved. Here are two directory entries referring to the same file. This is done, via the shell, through the ln command which creates a (hard) link to its first argument, giving it the name specified by its second argument. The shell’s “ln” command is implemented using the link system call. II–16
  • 17. Directory Representation . 1 .. 1 unix 117 etc 4 home 18 pro 36 dev 93 . 4 .. 1 image 117 motd 33 CS 167 II–17 Copyright © 2006 Thomas W. Doeppner. All rights reserved. Here are the (abbreviated) contents of both the root (/) and /etc directories, showing how /unix and /etc/image are the same file. Note that if the directory entry /unix is deleted (via the shell’s “rm” command), the file (represented by inode 117) continues to exist, since there is still a directory entry referring to it. However if /etc/image is also deleted, then the file has no more links and is removed. To implement this, the file’s inode contains a link count, indicating the total number of directory entries that refer to it. A file is actually deleted only when its inode’s link count reaches zero. Note: suppose a file is open, i.e. is being used by some process, when its link count becomes zero. Rather than delete the file while the process is using it, the file will continue to exist until no process has it open. Thus the inode also contains a reference count indicating how many times it is open: in particular, how many system file table entries point to it. A file is deleted when and only when both the link count and this reference count become zero. The shell’s “rm” command is implemented using the unlink system call. Note that /etc/.. refers to the root directory. II–17
  • 18. Soft Links unix etc home pro dev twd unix ... mylink image twd slide1 slide2 /unix /home/twd % ln –s /unix /home/twd/mylink % ln –s /home/twd /etc/twd # symlink system call CS 167 II–18 Copyright © 2006 Thomas W. Doeppner. All rights reserved. Differing from a hard link, a soft link (or symbolic link) is a special kind of file containing the name of another file. When the kernel processes such a file, rather than simply retrieving its contents, it makes use of the contents by replacing the portion of the directory path that it has already followed with the contents of the soft-link file and then following the resulting path. Thus referencing /home/twd/mylink results in the same file as referencing /unix. Referencing /etc/twd/unix/slide1 results in the same file as referencing /home/twd/unix/slide1. The shell’s “ln” command with the “-s” flag is implemented using the symlink system call. II–18
  • 19. Working Directory • Maintained in kernel (as an inode number) for each process – paths not starting from “/” start with the working directory – changed by use of the chdir system call – displayed (via shell) using “pwd” - how is this done? CS 167 II–19 Copyright © 2006 Thomas W. Doeppner. All rights reserved. The working directory is maintained (as the inode number of the directory) in the kernel for each process. Whenever a process attempts to follow a path that doesn’t start with “/”, it starts at its working directory (rather than at “/”). II–19
  • 20. Standard File Descriptors main( ) { char buf[BUFSIZE]; int n; const char* note = "Write failedn"; while ((n = read(0, buf, sizeof(buf))) > 0) if (write(1, buf, n) != n) { (void)write(2, note, strlen(note)); exit(EXIT_FAILURE); } return(EXIT_SUCCESS); } CS 167 II–20 Copyright © 2006 Thomas W. Doeppner. All rights reserved. The file descriptors 0, 1, and 2 are opened to access your terminal when you log in, and are preserved across forks, unless redirected. II–20
  • 21. Representing an Open File (1) file descriptor system file active inode table table table 0 1 2 3 . . . n–1 ref count f pointer access inode disk buffer cache CS 167 II–21 Copyright © 2006 Thomas W. Doeppner. All rights reserved. A process’s set of open files is represented by a data structure known as the file descriptor table, which is used to map file descriptors, representing open files, to system file table entries, each representing an open file. Each process has a separate file descriptor table, but there is exactly one system file table for the entire system. Each active file (i.e., a file that is open or otherwise being used) is represented by an inode that is entered in the active inode table. Recently accessed blocks of data from the file are kept in the kernel’s data section in what is known as the buffer cache—an area of memory reserved for buffering data from files. Thus data from these blocks can be accessed again and again without wasting time going to the disk. When a disk block is modified, the modifications are written in the buffer cache and only later are written to disk. This allows the process doing the modification to proceed without having to wait for the data to be written to disk. II–21
  • 22. Representing an Open File (2) file descriptor system file active inode table table table 0 1 2 3 fdrw . 1 rw 0 . . n–1 ref count f pointer access inode fdrw = open("x", O_RDWR); disk buffer cache CS 167 II–22 Copyright © 2006 Thomas W. Doeppner. All rights reserved. This slide shows what happens when a file is opened. The lowest-numbered available file descriptor is allocated from the file descriptor table. Next, an entry in the system file table is allocated, and the file descriptor table entry is set to point to the system file table entry. An inode-table entry for the file is allocated (or found if it already exists) and the system file table entry is set to point to it. Additional fields of the system file table entry are initialized, including: • a reference count • each file-descriptor-table entry pointing to it contribute one to the reference count • the entry is freed when the reference count goes to zero • the allowed access (i.e., how the file was opened—read-only, read-write, etc.) • the file pointer (i.e., the location within the file at which the next transfer will start) • after every read and write system call the file pointer is incremented by the number of bytes actually transferred, thus facilitating sequential I/O II–22
  • 23. Global View other stuff kernel stack other stuff kernel stack other stuff kernel stack other stuff kernel stack kernel text CS 167 II–23 Copyright © 2006 Thomas W. Doeppner. All rights reserved. II–23
  • 24. Allocation of File Descriptors • Whenever a process requests a new file descriptor, the lowest numbered file descriptor not already associated with an open file is selected; thus #include <fcntl.h> #include <unistd.h> close(0); fd = open("file", O_RDONLY); – will always associate file with file descriptor 0 (assuming that the open succeeds) CS 167 II–24 Copyright © 2006 Thomas W. Doeppner. All rights reserved. One can depend on always getting the lowest available file descriptor. II–24
  • 25. Reading From a File #include <sys/types.h> #include <unistd.h> ssize_t read(int fd, void *buffer, size_t n) – read up to n bytes of data into buffer – returns number of bytes transferred – returns 0 on end of file – returns –1 on error CS 167 II–25 Copyright © 2006 Thomas W. Doeppner. All rights reserved. The read system call tells the system to read up to n bytes of data into the area of memory pointed to by buffer from the file referred to by fd. It returns the number of bytes actually transferred, or, as usual, –1 if there was an error. Read will transfer fewer bytes than specified if: • the number of bytes left in the file was less than n • the read request was interrupted by a signal after some of the bytes were transferred • the file is a pipe, FIFO, or special device with less than n bytes immediately available for reading It transfers as many bytes as are present up to the given maximum; if it returns a zero, that means the end of the file has been reached. II–25
  • 26. Writing To a File #include <sys/types.h> #include <unistd.h> ssize_t write(int fd, void *buffer, size_t n) – write up to n bytes of data from buffer – returns number of bytes transferred – returns –1 on error CS 167 II–26 Copyright © 2006 Thomas W. Doeppner. All rights reserved. The write system call tells the system to write up to n bytes of data from the area of memory pointed to by buffer into the file referred to by fd. It returns the number of bytes actually transferred, or, as usual, –1 if there was an error. If write transfers fewer bytes than specified, then something caused the transfer to stop prematurely: • a signal interrupted the system call • this is pretty complicated and we’ll discuss it in more detail later. However, this would happen if the requested size was large enough so that the write was split by the kernel into a number of segments, at least one segment was written (otherwise the call would have returned with an error and errno set to EINTR), and the signal occurred while a subsequent segment was being written • a filesize limit was reached • an I/O error occurred The response in all these cases should be to attempt to rewrite those bytes that were not transferred; if a signal had interrupted the previous try, then the next try will succeed (unless again interrupted by a signal); if a filesize limit had been reached or an I/O error has occurred, this next write will yield the appropriate error code. II–26
  • 27. Example main( ) { char buf[BUFSIZE]; int nread; const char* note = "Write failedn"; while ((nread = read(0, buf, sizeof(buf))) > 0) { int bytes_left = nread; int bpos = 0; while ((n = write(1, &buf[bpos], bytes_left)) < bytes_left) { if (n == –1) { write(2, note, strlen(note)); exit(EXIT_FAILURE); } bytes_left −= n; bpos += n; } } return(EXIT_SUCCESS); } CS 167 II–27 Copyright © 2006 Thomas W. Doeppner. All rights reserved. Here we copy from the program’s standard input to its standard output, but take advantage of what we’ve learned about the behavior of write. In particular, since it’s not guaranteed that write will transfer all the bytes requested, we must supply code to make sure that all the data does get transferred. Thus, after each call to write (except, for reasons of space on the slide, when we write to file descriptor 2), we check to see how much data was written and call write again, if necessary, to handle the unwritten data. II–27
  • 28. Random Access #include <sys/types.h> #include <unistd.h> off_t lseek(int fd, off_t offset, int whence) – sets the file pointer for fd: if whence is SEEK_SET, the pointer is set to offset bytes; if whence is SEEK_CUR, the pointer is set to its current value plus offset bytes; if whence is SEEK_END, the pointer is set to the size of the file plus offset bytes – it returns the (possibly) updated value of the file pointer relative to the beginning of the file. Thus, n = lseek(fd, (off_t)0, SEEK_CUR); returns the current value of the file pointer for fd CS 167 II–28 Copyright © 2006 Thomas W. Doeppner. All rights reserved. To effect random access to files (i.e., access files other than sequentially), you first set the file pointer, then perform a read or write. Setting the file pointer is done with the lseek system call. II–28
  • 29. lseek Example • Example: printing a text file backwards: fd = open("textfile", O_RDONLY); /* go to last char in file */ fptr = lseek(fd, (off_t)–1, SEEK_END); while (fptr != –1) { read(fd, buf, 1); write(1, buf, 1); fptr = lseek(fd, (off_t)–2, SEEK_CUR); } CS 167 II–29 Copyright © 2006 Thomas W. Doeppner. All rights reserved. This example prints the contents of a file backwards. Note what’s happening in the while statement: it continues as long as fptr does not return –1, a value it will return if the call to lseek fails. Our intent is that the last successful call to lseek sets the file pointer to 0. The next call, which attempts to set it to –1, will fail, thus causing lseek to return –1 (recall that failed system calls always return –1). II–29
  • 30. Representing an Open File (3) file descriptor system file active inode table table table 0 1 2 3 1 r 10 fdrw . 1 rw 20 . fdr . n–1 ref count f pointer access inode fdrw = open("x", O_RDWR); fdr = open("x", O_RDONLY); write(fdrw, buf, 20); read(fdr, buf2, 10); disk buffer cache CS 167 II–30 Copyright © 2006 Thomas W. Doeppner. All rights reserved. In this slide we see the effect of two opens of the same file within the same process. II–30
  • 31. Multiple Descriptors; One File • How are standard file descriptors set up? – suppose 1 and 2 are opened separately while ((n = read(0, buf, sizeof(buf))) > 0) if (write(1, buf, n) != n) { (void)write(2, note, strlen(note)); exit(EXIT_FAILURE); } – error message clobbers data bytes! CS 167 II–31 Copyright © 2006 Thomas W. Doeppner. All rights reserved. By convention, file descriptors 1 and 2 are used for processes’ normal and diagnostic output. Normally they both refer to the display, and thus diagnostic output is intermingled with normal output. Suppose, however, one wanted to redirect both file descriptors so that all output, normal and diagnostic, was sent to a file. One might open this file twice, once as file descriptor 1 and again as file descriptor 2, thereby creating two system file table entries. As file descriptor 1 receives output, the offset field of its file table entry advances with each write. After 1000 bytes have been written (sequentially), the offset field is set to 1000, representing the current end-of-file. If at this point a diagnostic message is written to file descriptor 2, it will start at the beginning of the file, overwriting the data already there, since file descriptor 2’s file table entry’s offset is still at 0. This outcome is certainly not desirable. II–31
  • 32. dup System Calls • dup returns a new file descriptor referring to the same file as its argument int dup(int fd) • dup2 is similar, but it allows you to specify the new file descriptor int dup2(int oldfd, int newfd) CS 167 II–32 Copyright © 2006 Thomas W. Doeppner. All rights reserved. We can use one of the dup system calls to solve this problem. dup obeys our rule of always allocating the lowest available file descriptor. However, with dup2, one can specify, via the second argument, which file descriptor is allocated. If the second argument is the file descriptor of an open file, the file is first closed, then associated with the file of the first argument. II–32
  • 33. dup Example /* redirect stdout and stderr to same file */ /* assumes file descriptor 0 is in use */ close(1); open("file", O_WRONLY|O_CREAT, 0666); close(2); dup(1); /* alternatively, replace last two lines with: */ dup2(1, 2); CS 167 II–33 Copyright © 2006 Thomas W. Doeppner. All rights reserved. Here we see how to use dup and dup2 to set file descriptors 1 and 2 to refer to the same system file-table entry. Note the extra argument to open. We’ve given open the O_CREAT flag, which tells the system that if the file does not exist, it should create it. The third argument helps to specify the access permissions assigned to the file if it’s created by this call. We discuss this in detail in a few slides. II–33
  • 34. Representing an Open File (4) file descriptor system file active inode table table table 0 1 2 3 fdrw . 2 rw 20 fdrw2 . . n–1 ref count f pointer access inode fdrw = open("x", O_RDWR); fdrw2 = dup(fdrw); write(fdrw, buf, 20); disk buffer cache CS 167 II–34 Copyright © 2006 Thomas W. Doeppner. All rights reserved. The dup system call causes two file descriptors to refer to the same file table entry and hence share the offset. II–34
  • 35. Representing an Open File (5) file descriptor system file active inode table table table 0 1 2 3 2 r 10 fdrw . 4 rw 20 fdrw2 . fdr . n–1 ref count f pointer access inode fork( ) disk buffer cache CS 167 II–35 Copyright © 2006 Thomas W. Doeppner. All rights reserved. If our process executes a fork system call, creating a child process, the child is given a file- descriptor table that’s a copy of its parent’s. Of course, the reference counts on the system- file-table entries are increased appropriately. II–35
  • 36. I/O Redirection % who > file & if (fork( ) == 0) { char *args[ ] = {"who", 0}; close(1); open("file", O_WRONLY|O_TRUNC, 0666); execv("who", args); printf("you screwed upn"); exit(1); } CS 167 II–36 Copyright © 2006 Thomas W. Doeppner. All rights reserved. This is an example of what a shell might do to handle I/O redirection: it first creates a new process in which to run a command (“who”, in this case). In the new process it closes file descriptor 1 (standard output—to which normal output is written). It then opens “file” (the arguments indicate that “file” is only to be written to, that any prior contents are erased, and that if “file” didn’t already exist, it will be created with read and write permission for all; assuming that file descriptor 0 is not available (it’s assigned to standard input), file descriptor 1 will be assigned to “file”. Assuming that execv succeeds, when “who” runs, its output is written to “file”. Note that the parent process does not wait for its child to terminate; it goes on to execute further commands. (This behavior occurs because we’ve placed an “&” at the end of the command line.) Note the args argument to execv: By convention, the first argument to each command is the name of the command (“who” in this case). To indicate that there are no further arguments, a zero is supplied. Note that we aren’t checking for errors: this is only because doing so would cause the resulting code not to fit in the slide. You should always check for errors. II–36
  • 37. More I/O Redirection % who >& file & if (fork( ) == 0) { char *args[ ] = {"who", 0}; close(1); close(2); open("file", O_WRONLY|O_TRUNC|O_CREAT, 0666); dup(1); execv("who", args); … } CS 167 II–37 Copyright © 2006 Thomas W. Doeppner. All rights reserved. In this example (using csh syntax), we run “who” with both standard output (file descriptor 1) and standard error (file descriptor 2) redirected to “file”. It’s done in such a way that writes to either standard output or standard error go the current end of the file. II–37
  • 38. Waiting For Termination % who if ((pid = fork( )) == 0) { ... execv("who", args); ... } while (pid != wait(0)) ; CS 167 II–38 Copyright © 2006 Thomas W. Doeppner. All rights reserved. Here we execute a shell command and wait for it to terminate before going on to the next command (i.e., what’s normally done). The wait system call causes the caller to wait (i.e., not execute) until one of its children terminate and then returns the process ID of the process that’s terminated (if its argument is nonzero, it points to an area of memory at which will be written status information about the child, such as the argument it passed to exit). Here our first process waits until the process it created to run “who” terminates. II–38
  • 39. Open #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> int open(const char *path, int options [, mode_t mode]) – options - O_RDONLY open for reading only - O_WRONLY open for writing only - O_RDWR open for reading and writing - O_APPEND set the file offset to end of file prior to each write - O_CREAT if the file does not exist, then create it, setting its mode to mode adjusted by umask - O_EXCL if O_EXCL and O_CREAT are set, then open fails if the file exists - O_TRUNC delete any previous contents of the file - O_NONBLOCK don’t wait if I/O can’t be done immediately CS 167 II–39 Copyright © 2006 Thomas W. Doeppner. All rights reserved. Here’s a partial list of the options available as the second argument to open. (Further options are often available, but they depend on the version of Unix.) Note that the first three options are mutually exclusive: one, and only one, must be supplied. We discuss the third argument to open, mode, shortly. II–39
  • 40. File Access Permissions • Who’s allowed to do what? – who - user (owner) - group - others (rest of the world) – what - read - write - execute CS 167 II–40 Copyright © 2006 Thomas W. Doeppner. All rights reserved. Each file has associated with it a set of access permissions indicating, for each of three classes of principals, what sorts of operations on the file are allowed. The three classes are the owner of the file, known as user, the group owner of the file, known simply as group, and everyone else, known as others. The operations are grouped into the classes read, write, and execute, with their obvious meanings. The access permissions apply to directories as well as to ordinary files, though the meaning of execute for directories is not quite so obvious: one must have execute permission for a directory file in order to follow a path through it. The system, when checking permissions, first determines the smallest class of principals the requester belongs to: user (smallest), group, or others (largest). It then, within the chosen class, checks for appropriate permissions. II–40
  • 41. Permissions Example % ls -lR .: total 2 drwxr-x--x 2 tom adm 1024 Dec 17 13:34 A drwxr----- 2 tom adm 1024 Dec 17 13:34 B ./A: total 1 -rw-rw-rw- 1 tom adm 593 Dec 17 13:34 x ./B: total 2 -r--rw-rw- 1 tom adm 446 Dec 17 13:34 x -rw----rw- 1 trina adm 446 Dec 17 13:45 y CS 167 II–41 Copyright © 2006 Thomas W. Doeppner. All rights reserved. In the current directory are two subdirectories, A and B, with access permissions as shown in the slide. Note that the permissions are given as a string of characters: the first character indicates whether or not the file is a directory, the next three characters are the permissions for the owner of the file, the next three are the permissions for the members of the file’s group’s members, and the last three are the permissions for the rest of the world. Quiz: the users tom and trina are members of the adm group; andy is not. May andy list the contents of directory A? May andy read A/x? May trina list the contents of directory B? May trina modify B/y? May tom modify B/x? May tom read B/y? II–41
  • 42. Setting File Permissions #include <sys/types.h> #include <sys/stat.h> int chmod(const char *path, mode_t mode) – sets the file permissions of the given file to those specified in mode – only the owner of a file and the superuser may change its permissions – nine combinable possibilities for mode (read/write/execute for user, group, and others) - S_IRUSR (0400), S_IWUSR (0200), S_IXUSR (0100) - S_IRGRP (040), S_IWGRP (020), S_IXGRP (010) - S_IROTH (04), S_IWOTH (02), S_IXOTH (01) CS 167 II–42 Copyright © 2006 Thomas W. Doeppner. All rights reserved. The chmod system call (and the similar chmod shell command) are used to change the permissions of a file. Note that the symbolic names for the permissions are rather cumbersome; what is often done is to use their numerical equivalents instead. Thus the combination of read/write/execute permission for the user (0700), read/execute permission for the group (050), and execute-only permission for others (01) can be specified simply as 0751. II–42
  • 43. Creating a File • Use either open or creat – open (const char *pathname, int flags, mode_t mode) - flags must include O_CREAT – creat(const char *pathname, mode_t mode) - open is preferred • The mode parameter helps specify the permissions of the newly created file – permissions = mode & ~umask CS 167 II–43 Copyright © 2006 Thomas W. Doeppner. All rights reserved. Originally in Unix one created a file only by using the creat system call. A separate O_CREAT flag was later given to open so that it, too, can be used to create files. The creat system call fails if the file already exists. For open, what happens if the file already exists depends upon the use of the flags O_EXCL and O_TRUNC. If O_EXCL is included with the flags (e.g., open(“newfile”, O_CREAT|O_EXCL, 0777)), then, as with creat, the call fails if the file exists. Otherwise, the call succeeds and the (existing) file is opened. If O_TRUNC is included in the flags, then, if the file exists, its previous contents are eliminated and the file (whose size is now zero) is opened. When a file is created by either open or creat, the file’s initial access permissions are the bitwise AND of the mode parameter and the complement of the process’s umask (explained in the next slide). II–43
  • 44. Umask • Standard programs create files with “maximum needed permissions” as mode – compilers: 0777 – editors: 0666 • Per-process parameter, umask, used to turn off undesired permission bits – e.g., turn off all permissions for others, write permission for group: set umask to 027 - compilers: permissions = 0777 & ~(027) = 0750 - editors: permissions = 0666 & ~(027) = 0640 – set with umask system call or (usually) shell command CS 167 II–44 Copyright © 2006 Thomas W. Doeppner. All rights reserved. The umask (often called the “creation mask”) allows programs to have wired into them a standard set of maximum needed permissions as their file-creation modes. Users then have, as part of their environment (via a per-process parameter that is inherited by child processes from their parents), a limit on the permissions given to each of the classes of security principals. This limit (the umask) looks like the 9-bit permissions vector associated with each file, but each one-bit indicates that the corresponding permission is not to be granted. Thus, if umask is set to 022, then, whenever a file is created, regardless of the settings of the mode bits in the open or creat call, read permission for group and others is not to be included with the file’s access permissions. You can determine the current setting of umask by executing the umask shell command without any arguments. II–44