
Introduction to Unix

CS 167 II–1 Copyright © 2006 Thomas W. Doeppner. All rights reserved.

II–1

Outline

• Processes
• File Abstraction
• Directories
• File Representation
• File-Oriented System Calls


In this lecture we present a brief introduction to a few of the more important aspects of the
Unix operating system. In particular, we look at the Unix notions of processes and files—two
concepts that are important throughout the course.

II–2

Processes

• Fundamental abstraction of program execution
– memory
– processor(s)
- each processor abstraction is a thread
– “execution context”


Unix, as do many operating systems, uses the notion of a process as its fundamental
abstraction of program execution. Each program runs in a separate process. Processes are
protected from one another in the sense that the actions of one process cannot directly harm
others. The abstraction comprises the memory of a program (known as its address space—
the collection of locations that can be referenced by the process), the execution agents
(processor abstractions), and other information, known collectively as the execution context,
representing such things as the files the process is currently accessing, how it responds to
exceptions, to external stimuli, etc.
The processor abstraction is often called a thread. In “traditional” Unix programs,
processes have only one thread, so we’ll use the word process to include the single thread
running inside of it. Later in this course, when we cover multithreaded programming, we’ll be
more careful and use the word thread when we are discussing the processor abstraction.

II–3

The Unix Address Space

[Figure: layout of the address space, top to bottom: stack, dynamic, bss, data, text]


A Unix process’s address space appears to be three regions of memory: a read-only text
region containing executable code; a read-write region consisting of initialized data (simply
called data), uninitialized data (BSS), and a dynamic area; and a second read-write region
containing the process’s user stack (a standard Unix process contains only one thread of
control). The name BSS comes from a directive in an ancient assembler for the IBM 704 series
of computers; it stands for Block Started by Symbol and was used to reserve space for
uninitialized storage.
The first area of read-write storage is often collectively called the data region. Its dynamic
portion grows in response to sbrk system calls. Most programmers do not use this system
call directly, but instead use the malloc and free library routines, which manage the dynamic
area and allocate memory when needed by in turn executing sbrk system calls.
The stack region grows implicitly: whenever an attempt is made to reference beyond the
current end of stack, the stack is implicitly grown to the new reference. (There are system-
wide and per-process limits on the maximum data and stack sizes of processes.)

II–4

Creating a Process: Before

[Figure: the parent process calling fork( )]


The only way to create a new process is to use the fork system call.

II–5

Creating a Process: After

[Figure: fork( ) returns p in the parent process and 0 in the child process, whose pid is p]

By executing fork the parent process creates an almost exact clone of itself which we call
the child process. This new process executes the same text as its parent, but contains a copy
of the data and a copy of the stack. This copying of the parent to create the child can be very
time-consuming. We discuss later how it is optimized.
Fork is a very unusual system call: one thread of control flows into it but two threads of
control flow out of it, each in a separate address space. From the parent’s point of view, fork
does very little: nothing happens to the parent except that fork returns the process ID (PID—
an integer) of the new process. The new process starts off life by returning from fork. It
always views fork as returning a zero.

II–6

Loading a New Image

[Figure: a process's address space before and after exec(prog, args); afterwards it holds
args on the stack, prog's bss, prog's data, and prog's text]


Most of the time the purpose of creating a new process is to run a new (i.e., different)
program. Once a new process has been created, it can use the exec system call to load a new
program image into itself, replacing the prior contents of the process’s address space. Exec is
passed the name of a file containing a fully relocated program image (which might require
further linking via a runtime linker). The previous text region of the process is replaced with
the text of the program image. The data, BSS and dynamic areas of the process are “thrown
away” and replaced with the data and BSS of the program image. The contents of the
process’s stack are replaced with the arguments that are passed to the main procedure of the
program.

II–7

Fork/Exec Example

if (fork( ) == 0) {
    // child process
    // ... set up I/O in child ...
    execv(newprogram, parameters);   // load new image
    // if we get here, there's a problem
}
// parent process continues here


The slide shows a typical example of the use of the fork and exec system calls. The parent
process calls fork and two threads of control, one in the parent process and one in the child
process, return. If a thread sees that fork returns 0, then it must be in the child process.
What typically happens next is that this thread sets up the I/O descriptors of the new
process (which we discuss shortly) and then execs a new program. Here we use the execv
system call, which passes the arguments to the system in a vector (hence the v). If execv
succeeds, it does not return—there is nothing to return to! The process contains an entirely
new image. Thus if execv does return, there must have been an error (for example, the file
did not exist).
If fork returns a positive value, then this value is the process ID of the child process and
the thread executing this code must be in the parent process (i.e., it is the original thread
that called fork in the first place). It might continue on and do something totally independent
of the child process, or it might at some point wait until the child terminates.
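
The fork/exec pattern described above, with the parent waiting for the child, might be sketched as follows. This is a minimal illustration rather than the course's code: the helper name run_program is hypothetical, and waitpid is used to collect the child's exit status.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run the program at `path` with argument vector `argv` in a child process.
 * Returns the child's exit status, or -1 if it did not exit normally.
 * (run_program is a hypothetical helper name.) */
int run_program(const char *path, char *const argv[]) {
    pid_t pid = fork();
    if (pid == -1) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        /* child: fork returned 0 */
        execv(path, argv);
        /* execv returns only if it failed */
        perror("execv");
        _exit(127);
    }
    /* parent: fork returned the child's process ID */
    int status;
    if (waitpid(pid, &status, 0) == -1) {
        perror("waitpid");
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, calling run_program("/bin/sh", argv) with argv set to {"sh", "-c", "exit 3", NULL} should return 3, and a path that does not exist should yield the 127 that the child passes to _exit after execv fails.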

II–8

Environment

• Set of pairs: <name, value>
– e.g. <term, xterm>, <TZ, US/Eastern>
– maintained in user-mode address space
- Accessed using library routines
• getenv
• putenv
• Initially established via exec:
– execve(char *file, char *argv[ ], char *envp[ ])
- supplied in third argument
– execv(char *file, char *argv[ ])
- Inherited from caller


A process’s environment is a set of names and values that’s maintained in its user-mode
address space. It’s established when the process image is set up via exec. As shown in the
slide, there’s a variant of exec, execve, with which one can supply the environment explicitly.
If execv is used, the environment is passed to the new image directly from the old.
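
As a rough sketch of supplying the environment explicitly, the function below starts /bin/sh with an environment containing a single pair and lets the shell report whether it sees that pair. The function name and the GREETING variable are made up for illustration, and the sketch assumes /bin/sh exists.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run a shell whose environment is exactly { GREETING=hello }.
 * The shell exits with status 0 if it sees that value.
 * Returns the shell's exit status, or -1 on failure. */
int child_sees_explicit_env(void) {
    char *argv[] = {"sh", "-c", "test \"$GREETING\" = hello", NULL};
    char *envp[] = {"GREETING=hello", NULL};   /* hypothetical variable */

    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        execve("/bin/sh", argv, envp);   /* third argument: new environment */
        _exit(127);                      /* reached only if execve failed */
    }
    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Had execv been used instead, the shell would have inherited the caller's environment, and GREETING would be visible only if the caller had already set it (for instance with putenv).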

II–9

System Calls

• Sole interface between user and kernel
• Implemented as library routines that execute
trap instructions to enter kernel
• Errors indicated by return of –1; error code is
in errno

if (write(fd, buffer, bufsize) == -1) {
    // error!
    printf("error %d\n", errno);
    // see perror
}


System calls, such as fork, execv, read, write, etc., are the only means for application
programs to communicate directly with the kernel: they form an API (application program
interface) to the kernel. When a program calls such a routine, it is actually placing a call to a
subroutine in a system library. The body of this subroutine contains a hardware-specific trap
instruction which transfers control and some parameters to the kernel. On return to this
library routine, the kernel provides an indication of whether or not there was an error and
what the error was. The error indication is passed back to the original caller via the
functional return value of the library routine. If there was an error, a positive-integer code
identifying it is stored in the global variable errno. Rather than simply print this code out, as
shown in the slide, one might instead print out an informative error message. This can be
done via the perror routine.
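
A small sketch of this error-reporting convention follows. The wrapper name checked_write is hypothetical; strerror is used here in place of perror so that the message text can be combined with other output.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Attempt a write; return 0 on success or the errno value on failure.
 * (checked_write is a hypothetical wrapper name.) */
int checked_write(int fd, const void *buf, size_t len) {
    if (write(fd, buf, len) == -1) {
        int saved = errno;   /* save errno before calling anything else */
        fprintf(stderr, "write failed: %s\n", strerror(saved));
        return saved;
    }
    return 0;
}
```

Writing to a descriptor that is not open, such as -1, fails with EBADF, so checked_write(-1, "x", 1) returns EBADF and prints the corresponding message.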

II–10

System Calls

[Figure: a process calls write(fd, buf, len) and traps into the kernel; the kernel side shows
kernel text, a kernel stack, and other stuff]


II–11

Multiple Processes
[Figure: four processes, each with its own kernel stack and other stuff, all sharing a single
kernel text]


II–12

The File Abstraction

• A file is a simple array of bytes
• Files are made larger by writing beyond their
current end
• Files are named by paths in a naming tree
• System calls on files are synchronous


As discussed three pages ago, most programs perform file I/O using library code layered on
top of kernel code. In this section we discuss just the kernel aspects of file I/O, looking at
the abstraction and the high-level aspects of how this abstraction is implemented.
The Unix file abstraction is very simple: files are simply arrays of bytes. Many systems have
special system calls to make a file larger. In Unix, you simply write where you’ve never
written before, and the file “magically” grows to the new size (within limits). The names of
files are equally straightforward—just the names labeling the path that leads to the file
within the directory tree. Finally, from the programmer’s point of view, all operations on files
appear to be synchronous—when an I/O system call returns, as far as the process is
concerned, the I/O has completed. (Things are different from the kernel’s point of view, as
discussed later.)
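
To illustrate how a file grows simply by being written past its current end, here is a minimal sketch. The function name and the scratch-file template under /tmp are assumptions made for the example; lseek, used to position the write, is covered later in these notes.

```c
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create an empty scratch file, write one byte at offset `where`,
 * and return the resulting file size (or -1 on error). */
off_t size_after_write_at(off_t where) {
    char path[] = "/tmp/growdemoXXXXXX";   /* assumed scratch location */
    int fd = mkstemp(path);
    if (fd == -1)
        return -1;
    unlink(path);   /* the file continues to exist until fd is closed */

    off_t size = -1;
    struct stat st;
    if (lseek(fd, where, SEEK_SET) != -1 &&
        write(fd, "x", 1) == 1 &&
        fstat(fd, &st) == 0)
        size = st.st_size;
    close(fd);
    return size;
}
```

Writing a single byte at offset 4095 of an empty file leaves a file of size 4096; no separate call was needed to enlarge it.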

II–13

Directories
[Figure: a portion of the directory tree, reconstructed here as an indented listing]
/
    unix
    etc
        passwd
        motd
    home
        twd
            unix
                slide1
                slide2
            ...
    pro
    dev


Here is a portion of a Unix directory tree. The ovals represent files, the rectangles represent
directories (which are really just special cases of files).

II–14

Directory Representation
Component Name    Inode Number
(each row is one directory entry)

.                 1
..                1
unix              117
etc               4
home              18
pro               36
dev               93


A directory consists of an array of pairs of component name and inode number, where the
latter identifies the target file’s inode to the operating system (an inode is a data structure
maintained by the operating system that represents a file). Note that every directory contains
two special entries, “.” and “..”. The former refers to the directory itself, the latter to the
directory’s parent (in the case of the slide, the directory is the root directory and has no
parent, thus its “..” entry is a special case that refers to the directory itself).
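
The pairs of component name and inode number can be observed from user code with the standard directory-reading library routines. The sketch below is illustrative only; list_directory is a hypothetical name.

```c
#include <dirent.h>
#include <stdio.h>

/* Print each entry of the directory at `path` as "inode-number name".
 * Returns the number of entries, or -1 if the directory cannot be opened. */
int list_directory(const char *path) {
    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;
    int count = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        printf("%10llu %s\n",
               (unsigned long long)entry->d_ino, entry->d_name);
        count++;
    }
    closedir(dir);
    return count;
}
```

On most systems the listing for "/" includes the "." and ".." entries, and the two typically show the same inode number, matching the special case described above.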

II–15

Hard Links

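[Figure: the same directory tree after the link is made; /etc now contains image as well as
motd, and /etc/image and /unix refer to the same file]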

% ln /unix /etc/image
# link system call


Here are two directory entries referring to the same file. This is done, via the shell, through
the ln command which creates a (hard) link to its first argument, giving it the name specified
by its second argument.
The shell’s “ln” command is implemented using the link system call.

II–16

Directory Representation
Root directory (/):
.                 1
..                1
unix              117
etc               4
home              18
pro               36
dev               93

/etc directory:
.                 4
..                1
image             117
motd              33


Here are the (abbreviated) contents of both the root (/) and /etc directories, showing how
/unix and /etc/image are the same file. Note that if the directory entry /unix is deleted (via
the shell’s “rm” command), the file (represented by inode 117) continues to exist, since there
is still a directory entry referring to it. However if /etc/image is also deleted, then the file has
no more links and is removed. To implement this, the file’s inode contains a link count,
indicating the total number of directory entries that refer to it. A file is actually deleted only
when its inode’s link count reaches zero.
Note: suppose a file is open, i.e. is being used by some process, when its link count
becomes zero. Rather than delete the file while the process is using it, the file will continue to
exist until no process has it open. Thus the inode also contains a reference count indicating
how many times it is open: in particular, how many system file table entries point to it. A file
is deleted when and only when both the link count and this reference count become zero.
The shell’s “rm” command is implemented using the unlink system call.
Note that /etc/.. refers to the root directory.
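
The link-count behavior can be checked with the link, unlink, and stat system calls. The following is a sketch under assumptions: the function name and scratch-file names are made up, and it assumes /tmp is on a file system that supports hard links.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if a second hard link shares the inode and raises the link
 * count to 2, and the file survives removal of its first name. */
int hard_link_demo(void) {
    char first[] = "/tmp/linkdemoXXXXXX";   /* assumed scratch location */
    char second[sizeof(first) + 4];
    struct stat a, b;
    int ok = 0;

    int fd = mkstemp(first);
    if (fd == -1)
        return 0;
    close(fd);
    snprintf(second, sizeof(second), "%s.ln", first);

    if (link(first, second) == 0) {
        if (stat(first, &a) == 0 && stat(second, &b) == 0 &&
            a.st_ino == b.st_ino && a.st_nlink == 2 &&
            unlink(first) == 0 &&                       /* remove one name */
            stat(second, &b) == 0 && b.st_nlink == 1)   /* file still exists */
            ok = 1;
        unlink(second);   /* link count reaches zero: file is deleted */
    }
    unlink(first);        /* harmless if the name is already gone */
    return ok;
}
```

The st_nlink field reported by stat is the inode's link count discussed above.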

II–17

Soft Links

[Figure: the directory tree with two soft links added: /home/twd/mylink, whose contents are
/unix, and /etc/twd, whose contents are /home/twd]
% ln -s /unix /home/twd/mylink
% ln -s /home/twd /etc/twd
# symlink system call

Unlike a hard link, a soft link (or symbolic link) is a special kind of file containing
the name of another file. When the kernel processes such a file, rather than simply retrieving
its contents, it makes use of the contents by replacing the portion of the directory path that
it has already followed with the contents of the soft-link file and then following the resulting
path. Thus referencing /home/twd/mylink results in the same file as referencing /unix.
Referencing /etc/twd/unix/slide1 results in the same file as referencing
/home/twd/unix/slide1.
The shell’s “ln” command with the “-s” flag is implemented using the symlink system call.
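
A sketch of how a soft link looks from a program: readlink returns the link's contents, stat follows the link to the target, and lstat examines the link file itself. The function name and scratch-file names are assumptions for the example.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if a soft link contains its target's name, resolves to the
 * target's inode via stat, and is itself reported as a link by lstat. */
int soft_link_demo(void) {
    char target[] = "/tmp/symdemoXXXXXX";   /* assumed scratch location */
    char linkname[sizeof(target) + 4];
    char contents[sizeof(target)];
    struct stat via_link, direct, link_itself;
    int ok = 0;

    int fd = mkstemp(target);
    if (fd == -1)
        return 0;
    close(fd);
    snprintf(linkname, sizeof(linkname), "%s.sl", target);

    if (symlink(target, linkname) == 0) {
        ssize_t n = readlink(linkname, contents, sizeof(contents) - 1);
        if (n >= 0) {
            contents[n] = '\0';   /* readlink does not null-terminate */
            ok = strcmp(contents, target) == 0 &&
                 stat(linkname, &via_link) == 0 &&      /* follows the link */
                 stat(target, &direct) == 0 &&
                 via_link.st_ino == direct.st_ino &&
                 lstat(linkname, &link_itself) == 0 &&  /* does not follow */
                 S_ISLNK(link_itself.st_mode);
        }
        unlink(linkname);
    }
    unlink(target);
    return ok;
}
```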

II–18

Working Directory

• Maintained in kernel (as an inode number) for
each process
– paths not starting from “/” start with the
working directory
– changed by use of the chdir system call
– displayed (via shell) using “pwd”
- how is this done?


The working directory is maintained (as the inode number of the directory) in the kernel for
each process. Whenever a process attempts to follow a path that doesn’t start with “/”, it
starts at its working directory (rather than at “/”).

II–19

Standard File Descriptors

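#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BUFSIZE 1024   /* assumed value; the slide does not define it */

int main( ) {
    char buf[BUFSIZE];
    ssize_t n;
    const char* note = "Write failed\n";

    while ((n = read(0, buf, sizeof(buf))) > 0)
        if (write(1, buf, n) != n) {
            (void)write(2, note, strlen(note));
            exit(EXIT_FAILURE);
        }
    return(EXIT_SUCCESS);
}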


The file descriptors 0, 1, and 2 are opened to access your terminal when you log in, and are
preserved across forks, unless redirected.

II–20

Representing an Open File (1)
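[Figure: a process's file descriptor table (entries 0 through n-1) refers to entries in the
system file table; each such entry holds a ref count, access, f pointer, and inode pointer;
the inode pointers refer to the active inode table, which is backed by the disk and the
buffer cache]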


A process’s set of open files is represented by a data structure known as the file descriptor
table, which maps each file descriptor to a system file table entry; each such entry
represents one open of a file. Each process has a separate file descriptor table, but
there is exactly one system file table for the entire system. Each active file (i.e., a file that is
open or otherwise being used) is represented by an inode that is entered in the active inode
table. Recently accessed blocks of data from the file are kept in the kernel’s data section in
what is known as the buffer cache—an area of memory reserved for buffering data from files.
Thus data from these blocks can be accessed again and again without wasting time going to
the disk. When a disk block is modified, the modifications are written in the buffer cache and
only later are written to disk. This allows the process doing the modification to proceed
without having to wait for the data to be written to disk.

II–21

[Figure: after the open, file descriptor fdrw refers to a new system file table entry with
ref count 1, access rw, f pointer 0, and a pointer to the file's inode]
fdrw = open("x", O_RDWR);


This slide shows what happens when a file is opened.
The lowest-numbered available file descriptor is allocated from the file descriptor table.
Next, an entry in the system file table is allocated, and the file descriptor table entry is set to
point to the system file table entry. An inode-table entry for the file is allocated (or found if it
already exists) and the system file table entry is set to point to it. Additional fields of the
system file table entry are initialized, including:
• a reference count
• each file-descriptor-table entry pointing to it contributes one to the reference
count
• the entry is freed when the reference count goes to zero
• the allowed access (i.e., how the file was opened—read-only, read-write, etc.)
• the file pointer (i.e., the location within the file at which the next transfer will start)
• after every read and write system call the file pointer is incremented by the
number of bytes actually transferred, thus facilitating sequential I/O

II–22

Global View

[Figure: global view of four processes, each with its own kernel stack and other stuff,
sharing the kernel text]


II–23

Allocation of File Descriptors

• Whenever a process requests a new file
descriptor, the lowest numbered file
descriptor not already associated with an
open file is selected; thus
#include <fcntl.h>
#include <unistd.h>

close(0);
fd = open("file", O_RDONLY);

– will always associate file with file descriptor 0
(assuming that the open succeeds)


One can depend on always getting the lowest available file descriptor.
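
The lowest-available rule can be checked with a short experiment. This is a sketch only; the function name is hypothetical and /dev/null is used simply as a file that can always be opened for reading.

```c
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if, after a descriptor is closed, the next open reuses it. */
int reopen_reuses_lowest_fd(void) {
    int low = open("/dev/null", O_RDONLY);
    if (low == -1)
        return 0;
    int high = open("/dev/null", O_RDONLY);   /* gets a larger descriptor */
    if (high == -1) {
        close(low);
        return 0;
    }
    close(low);                               /* `low` is now the lowest free */
    int again = open("/dev/null", O_RDONLY);
    int same = (again == low);
    if (again != -1)
        close(again);
    close(high);
    return same;
}
```

This relies on no other thread opening files at the same time, which holds for the single-threaded processes considered here.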

II–24

Reading From a File
#include <sys/types.h>
#include <unistd.h>
ssize_t read(int fd, void *buffer, size_t n)

– read up to n bytes of data into buffer
– returns number of bytes transferred
– returns 0 on end of file
– returns –1 on error


The read system call tells the system to read up to n bytes of data into the area of memory
pointed to by buffer from the file referred to by fd. It returns the number of bytes actually
transferred, or, as usual, –1 if there was an error.
Read will transfer fewer bytes than specified if:
• the number of bytes left in the file was less than n
• the read request was interrupted by a signal after some of the bytes were
transferred
• the file is a pipe, FIFO, or special device with less than n bytes immediately
available for reading
It transfers as many bytes as are present up to the given maximum; if it returns a zero,
that means the end of the file has been reached.
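
Because read may return fewer bytes than requested, a caller that needs exactly n bytes (or everything up to end of file) has to loop. One possible sketch, with the hypothetical name read_fully:

```c
#include <errno.h>
#include <unistd.h>

/* Read until `n` bytes have been transferred or end of file is reached.
 * Returns the number of bytes read, or -1 on error. */
ssize_t read_fully(int fd, void *buffer, size_t n) {
    char *p = buffer;
    size_t total = 0;
    while (total < n) {
        ssize_t got = read(fd, p + total, n - total);
        if (got == -1) {
            if (errno == EINTR)
                continue;   /* interrupted before any transfer: retry */
            return -1;
        }
        if (got == 0)
            break;          /* end of file */
        total += (size_t)got;
    }
    return (ssize_t)total;
}
```

If only five bytes are available on a pipe whose write end has been closed, a request for ten returns five: the first read delivers the five bytes and the second reports end of file.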

II–25

Writing To a File
#include <unistd.h>
ssize_t write(int fd, const void *buffer, size_t n)

– write up to n bytes of data from buffer
– returns number of bytes transferred
– returns –1 on error


The write system call tells the system to write up to n bytes of data from the area of
memory pointed to by buffer into the file referred to by fd. It returns the number of bytes
actually transferred, or, as usual, –1 if there was an error.
If write transfers fewer bytes than specified, then something caused the transfer to stop
prematurely:
• a signal interrupted the system call
• this is pretty complicated and we’ll discuss it in more detail later. However,
this would happen if the requested size was large enough so that the write
was split by the kernel into a number of segments, at least one segment was
written (otherwise the call would have returned with an error and errno set
to EINTR), and the signal occurred while a subsequent segment was being
written
• a filesize limit was reached
• an I/O error occurred
The response in all these cases should be to attempt to rewrite those bytes that were not
transferred; if a signal had interrupted the previous try, then the next try will succeed
(unless again interrupted by a signal); if a filesize limit had been reached or an I/O error has
occurred, this next write will yield the appropriate error code.

II–26

Example
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BUFSIZE 1024   /* assumed value; the slide does not define it */

int main( ) {
    char buf[BUFSIZE]; ssize_t nread, n;
    const char* note = "Write failed\n";
    while ((nread = read(0, buf, sizeof(buf))) > 0) {
        ssize_t bytes_left = nread; ssize_t bpos = 0;
        while ((n = write(1, &buf[bpos], bytes_left)) < bytes_left) {
            if (n == -1) {
                write(2, note, strlen(note));
                exit(EXIT_FAILURE);
            }
            bytes_left -= n; bpos += n;
        }
    }
    return(EXIT_SUCCESS);
}


Here we copy from the program’s standard input to its standard output, but take
advantage of what we’ve learned about the behavior of write. In particular, since it’s not
guaranteed that write will transfer all the bytes requested, we must supply code to make sure
that all the data does get transferred. Thus, after each call to write (except, for reasons of
space on the slide, when we write to file descriptor 2), we check to see how much data was
written and call write again, if necessary, to handle the unwritten data.

II–27

Random Access

#include <unistd.h>
off_t lseek(int fd, off_t offset, int whence)
– sets the file pointer for fd:
if whence is SEEK_SET, the pointer is set to
offset bytes;
if whence is SEEK_CUR, the pointer is set to its
current value plus offset bytes;
if whence is SEEK_END, the pointer is set to the
size of the file plus offset bytes
– it returns the (possibly) updated value of the file
pointer relative to the beginning of the file. Thus,
n = lseek(fd, (off_t)0, SEEK_CUR);
returns the current value of the file pointer for fd

To effect random access to files (i.e., access files other than sequentially), you first set the
file pointer, then perform a read or write. Setting the file pointer is done with the lseek
system call.

II–28

lseek Example

• Example: printing a text file backwards:
fd = open("textfile", O_RDONLY);
/* go to last char in file */
fptr = lseek(fd, (off_t)-1, SEEK_END);
while (fptr != -1) {
    read(fd, buf, 1);
    write(1, buf, 1);
    fptr = lseek(fd, (off_t)-2, SEEK_CUR);
}


This example prints the contents of a file backwards. Note what’s happening in the while
statement: the loop continues as long as fptr is not –1, the value lseek returns when it
fails. Each read advances the file pointer by one byte, so seeking back two bytes positions it
at the preceding character. Our intent is that the last successful call to lseek sets the file
pointer to 0. The next call, which attempts to set it to –1, will fail, thus causing lseek to
return –1 (recall that failed system calls always return –1).

II–29

[Figure: two system file table entries for the same inode: fdr refers to an entry with ref
count 1, access r, f pointer 10; fdrw refers to an entry with ref count 1, access rw,
f pointer 20]
fdr = open("x", O_RDONLY);
write(fdrw, buf, 20);
read(fdr, buf2, 10);


In this slide we see the effect of two opens of the same file within the same process.

II–30

Multiple Descriptors; One File

• How are standard file descriptors set up?
– suppose 1 and 2 are opened separately

while ((n = read(0, buf, sizeof(buf))) > 0)
if (write(1, buf, n) != n) {
(void)write(2, note, strlen(note));
exit(EXIT_FAILURE);
}

– error message clobbers data bytes!


By convention, file descriptors 1 and 2 are used for processes’ normal and diagnostic
output. Normally they both refer to the display, and thus diagnostic output is intermingled
with normal output. Suppose, however, one wanted to redirect both file descriptors so that
all output, normal and diagnostic, was sent to a file. One might open this file twice, once as
file descriptor 1 and again as file descriptor 2, thereby creating two system file table entries.
As file descriptor 1 receives output, the offset field of its file table entry advances with each
write. After 1000 bytes have been written (sequentially), the offset field is set to 1000,
representing the current end-of-file.
If at this point a diagnostic message is written to file descriptor 2, it will start at the
beginning of the file, overwriting the data already there, since file descriptor 2’s file table
entry’s offset is still at 0. This outcome is certainly not desirable.

II–31

dup System Calls

• dup returns a new file descriptor referring to
the same file as its argument

int dup(int fd)

• dup2 is similar, but it allows you to specify
the new file descriptor

int dup2(int oldfd, int newfd)


We can use one of the dup system calls to solve this problem. dup obeys our rule of always
allocating the lowest available file descriptor. With dup2, however, one can specify, via the
second argument, which file descriptor is allocated. If the second argument is already the
descriptor of an open file, that descriptor is first closed and then made to refer to the same
file as the first argument.

II–32

dup Example

/* redirect stdout and stderr to same file */
/* assumes file descriptor 0 is in use */
close(1);
open("file", O_WRONLY|O_CREAT, 0666);
close(2);
dup(1);

/* alternatively, replace last two lines with: */
dup2(1, 2);


Here we see how to use dup and dup2 to set file descriptors 1 and 2 to refer to the same
system file-table entry. Note the extra argument to open. We’ve given open the O_CREAT flag,
which tells the system that if the file does not exist, it should create it. The third argument
helps to specify the access permissions assigned to the file if it’s created by this call. We
discuss this in detail in a few slides.

II–33

[Figure: file descriptors fdrw and fdrw2 both refer to one system file table entry with
ref count 2, access rw, f pointer 20]
fdrw2 = dup(fdrw);
write(fdrw, buf, 20);


The dup system call causes two file descriptors to refer to the same file table entry and
hence share the offset.
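
The difference between a descriptor obtained with dup and one obtained by a second open can be observed by querying the file pointers with lseek. The sketch below mirrors the names used in the slides (fdrw, fdrw2, fdr); the function name and the scratch file are assumptions.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Write 20 bytes through fdrw, then record the file pointers of
 * fdrw, its dup fdrw2, and a separately opened fdr in offsets[0..2].
 * Returns 0 on success, -1 on failure. */
int offsets_after_write(off_t offsets[3]) {
    char path[] = "/tmp/dupdemoXXXXXX";   /* assumed scratch location */
    char data[20] = {0};
    int fdrw = mkstemp(path);             /* opened read-write */
    if (fdrw == -1)
        return -1;
    int fdrw2 = dup(fdrw);                /* same system file table entry */
    int fdr = open(path, O_RDONLY);       /* a separate entry */
    int rc = -1;

    if (fdrw2 != -1 && fdr != -1 &&
        write(fdrw, data, sizeof(data)) == (ssize_t)sizeof(data)) {
        offsets[0] = lseek(fdrw, 0, SEEK_CUR);
        offsets[1] = lseek(fdrw2, 0, SEEK_CUR);
        offsets[2] = lseek(fdr, 0, SEEK_CUR);
        rc = 0;
    }
    if (fdr != -1)
        close(fdr);
    if (fdrw2 != -1)
        close(fdrw2);
    close(fdrw);
    unlink(path);
    return rc;
}
```

After the write, fdrw and fdrw2 both report an offset of 20, while the separately opened fdr still reports 0.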

II–34

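[Figure: after fork( ), the child's file descriptor table is a copy of the parent's; the entry
referred to by fdr now has ref count 2 (access r, f pointer 10), and the entry shared by fdrw
and fdrw2 has ref count 4 (access rw, f pointer 20)]
fork( )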


If our process executes a fork system call, creating a child process, the child is given a file-
descriptor table that’s a copy of its parent’s. Of course, the reference counts on the system-
file-table entries are increased appropriately.
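
Since parent and child share system file table entries after a fork, a write by the child advances the file pointer that the parent sees. A minimal sketch of this, with a hypothetical function name and an assumed scratch file:

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that writes 5 bytes to an inherited descriptor; return
 * the parent's file pointer for that descriptor afterwards (-1 on error). */
off_t parent_offset_after_child_write(void) {
    char path[] = "/tmp/forkdemoXXXXXX";   /* assumed scratch location */
    int fd = mkstemp(path);
    if (fd == -1)
        return -1;
    unlink(path);

    off_t offset = -1;
    pid_t pid = fork();
    if (pid == 0) {
        /* child: its descriptor refers to the same system file table entry */
        _exit(write(fd, "child", 5) == 5 ? 0 : 1);
    }
    if (pid > 0) {
        int status;
        if (waitpid(pid, &status, 0) == pid &&
            WIFEXITED(status) && WEXITSTATUS(status) == 0)
            offset = lseek(fd, 0, SEEK_CUR);   /* parent's view of the pointer */
    }
    close(fd);
    return offset;
}
```

The parent never writes to the file, yet its file pointer ends up at 5.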

II–35

I/O Redirection

% who > file &

if (fork( ) == 0) {
    char *args[ ] = {"who", 0};
    close(1);
    open("file", O_WRONLY|O_TRUNC|O_CREAT, 0666);
    execv("who", args);
    printf("you screwed up\n");
    exit(1);
}


This is an example of what a shell might do to handle I/O redirection: it first creates a new
process in which to run a command (“who”, in this case). In the new process it closes file
descriptor 1 (standard output, to which normal output is written). It then opens “file”; the
arguments indicate that “file” is only to be written to, that any prior contents are erased, and
that if “file” didn’t already exist, it will be created with read and write permission for all
(subject to the umask). Assuming that file descriptor 0 is not available (it’s assigned to
standard input), file descriptor 1 will be assigned to “file”. Assuming that execv succeeds,
when “who” runs, its output is written to “file”.
Note that the parent process does not wait for its child to terminate; it goes on to execute
further commands. (This behavior occurs because we’ve placed an “&” at the end of the
command line.)
Note the args argument to execv: By convention, the first argument to each command is
the name of the command (“who” in this case). To indicate that there are no further
arguments, a zero is supplied.
Note that we aren’t checking for errors: this is only because doing so would cause the
resulting code not to fit in the slide. You should always check for errors.

II–36

More I/O Redirection

% who >& file &

if (fork( ) == 0) {
char *args[ ] = {"who", 0};
close(1);
close(2);
open("file", O_WRONLY|O_TRUNC|O_CREAT, 0666);
dup(1);
execv("who", args);
…
}


In this example (using csh syntax), we run “who” with both standard output (file descriptor
1) and standard error (file descriptor 2) redirected to “file”. It’s done in such a way that writes
to either standard output or standard error go to the current end of the file, because the two
descriptors share a single file offset.

II–37

Waiting For Termination

% who

if ((pid = fork( )) == 0) {
...
execv("who", args);
...
}
while (pid != wait(0))
;


Here we execute a shell command and wait for it to terminate before going on to the next
command (i.e., what’s normally done). The wait system call causes the caller to wait (i.e., not
execute) until one of its children terminates and then returns the process ID of the
terminated child. If wait’s argument is nonzero, it points to an area of memory into which
status information about the child is written, such as the argument it passed to exit. Here
our first process waits until the process it created to run “who” terminates.

II–38

Open
#include <sys/stat.h>
#include <fcntl.h>
int open(const char *path, int options [, mode_t mode])

– options
- O_RDONLY open for reading only
- O_WRONLY open for writing only
- O_RDWR open for reading and writing
- O_APPEND set the file offset to end of file prior to each
write
- O_CREAT if the file does not exist, then create it,
setting its mode to mode adjusted by umask
- O_EXCL if O_EXCL and O_CREAT are set, then
open fails if the file exists
- O_TRUNC delete any previous contents of the file
- O_NONBLOCK don’t wait if I/O can’t be done immediately

Here’s a partial list of the options available as the second argument to open. (Further
options are often available, but they depend on the version of Unix.) Note that the first three
options are mutually exclusive: one, and only one, must be supplied. We discuss the third
argument to open, mode, shortly.

II–39

File Access Permissions

• Who’s allowed to do what?
– who
- user (owner)
- group
- others (rest of the world)
– what
- read
- write
- execute


Each file has associated with it a set of access permissions indicating, for each of three
classes of principals, what sorts of operations on the file are allowed. The three classes are
the owner of the file, known as user, the group owner of the file, known simply as group, and
everyone else, known as others. The operations are grouped into the classes read, write, and
execute, with their obvious meanings. The access permissions apply to directories as well as
to ordinary files, though the meaning of execute for directories is not quite so obvious: one
must have execute permission for a directory file in order to follow a path through it.
The system, when checking permissions, first determines the smallest class of principals
the requester belongs to: user (smallest), group, or others (largest). It then, within the chosen
class, checks for appropriate permissions.
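
The class-selection rule just described can be written as a small pure function. This is a simplified sketch, not the kernel's code: may_access is a hypothetical name, a process is assumed to belong to exactly one group, and the superuser is ignored.

```c
#include <sys/stat.h>
#include <sys/types.h>

/* Decide whether a requester (uid, gid) may perform the operations in
 * `want` (a combination of 4 = read, 2 = write, 1 = execute) on a file
 * with the given permission bits, owner, and group.
 * Simplification: one group per process, no superuser. */
int may_access(mode_t mode, uid_t file_uid, gid_t file_gid,
               uid_t uid, gid_t gid, int want) {
    int granted;
    if (uid == file_uid)
        granted = (mode >> 6) & 7;   /* user class (smallest) */
    else if (gid == file_gid)
        granted = (mode >> 3) & 7;   /* group class */
    else
        granted = mode & 7;          /* others (largest) */
    return (granted & want) == want;
}
```

Only one class is ever consulted. With permissions rw----rw- (0606), a member of the file's group who is not the owner is denied read access, even though everyone outside the group is granted it. The function checks a single file only; following a path additionally requires execute permission on each directory along the way.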

II–40

Permissions Example
% ls -lR
.:
total 2
drwxr-x--x 2 tom adm 1024 Dec 17 13:34 A
drwxr----- 2 tom adm 1024 Dec 17 13:34 B

./A:
total 1
-rw-rw-rw- 1 tom adm 593 Dec 17 13:34 x

./B:
total 2
-r--rw-rw- 1 tom adm 446 Dec 17 13:34 x
-rw----rw- 1 trina adm 446 Dec 17 13:45 y

In the current directory are two subdirectories, A and B, with access permissions as shown
in the slide. Note that the permissions are given as a string of characters: the first character
indicates whether or not the file is a directory, the next three characters are the permissions
for the owner of the file, the next three are the permissions for the members of the file’s
group, and the last three are the permissions for the rest of the world.
Quiz: the users tom and trina are members of the adm group; andy is not.
May andy list the contents of directory A?
May andy read A/x?
May trina list the contents of directory B?
May trina modify B/y?
May tom modify B/x?
May tom read B/y?

II–41

Setting File Permissions
#include <sys/stat.h>
int chmod(const char *path, mode_t mode)

– sets the file permissions of the given file to those
specified in mode
– only the owner of a file and the superuser may
change its permissions
– nine combinable possibilities for mode
(read/write/execute for user, group, and others)
- S_IRUSR (0400), S_IWUSR (0200), S_IXUSR (0100)
- S_IRGRP (040), S_IWGRP (020), S_IXGRP (010)
- S_IROTH (04), S_IWOTH (02), S_IXOTH (01)


The chmod system call (and the similar chmod shell command) is used to change the
permissions of a file. Note that the symbolic names for the permissions are rather
cumbersome; what is often done is to use their numerical equivalents instead. Thus the
combination of read/write/execute permission for the user (0700), read/execute permission
for the group (050), and execute-only permission for others (01) can be specified simply as
0751.

II–42

Creating a File

• Use either open or creat
– open (const char *pathname, int flags, mode_t mode)
- flags must include O_CREAT
– creat(const char *pathname, mode_t mode)
- open is preferred
• The mode parameter helps specify the permissions of
the newly created file
– permissions = mode & ~umask


Originally in Unix one created a file only by using the creat system call. A separate
O_CREAT flag was later given to open so that it, too, can be used to create files. If the file
already exists, creat does not fail: it truncates the file to zero length and opens it for
writing, behaving like open with the flags O_WRONLY|O_CREAT|O_TRUNC. For open, what
happens if the file already exists depends upon the use of the flags O_EXCL and O_TRUNC. If
O_EXCL is included with the flags (e.g., open("newfile", O_WRONLY|O_CREAT|O_EXCL, 0777)),
then the call fails if the file exists. Otherwise, the call succeeds and the (existing) file
is opened. If O_TRUNC is included in the flags, then, if the file exists, its previous
contents are eliminated and the file (whose size is now zero) is opened.
When a file is created by either open or creat, the file’s initial access permissions are the
bitwise AND of the mode parameter and the complement of the process’s umask (explained
in the next slide).

II–43

Umask

• Standard programs create files with
“maximum needed permissions” as mode
– compilers: 0777
– editors: 0666
• Per-process parameter, umask, used to turn
off undesired permission bits
– e.g., turn off all permissions for others, write
permission for group: set umask to 027
- compilers: permissions = 0777 & ~(027) =
0750
- editors: permissions = 0666 & ~(027) = 0640
– set with umask system call or (usually) shell
command

The umask (often called the “creation mask”) allows programs to have wired into them a
standard set of maximum needed permissions as their file-creation modes. Users then have,
as part of their environment (via a per-process parameter that is inherited by child processes
from their parents), a limit on the permissions given to each of the classes of security
principals. This limit (the umask) looks like the 9-bit permissions vector associated with each
file, but each one-bit indicates that the corresponding permission is not to be granted. Thus,
if umask is set to 022, then, whenever a file is created, regardless of the settings of the mode
bits in the open or creat call, write permission for group and others is not included in
the file’s access permissions.
You can determine the current setting of umask by executing the umask shell command
without any arguments.
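
The rule permissions = mode & ~umask can be checked from a program with the umask system call. The sketch below uses hypothetical names and an assumed scratch directory under /tmp; it restores the caller's umask before returning.

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create a file with requested mode `mode` while the umask is `mask`;
 * return the permission bits the file actually received (or -1). */
int created_permissions(mode_t mode, mode_t mask) {
    char dir[] = "/tmp/umaskdemoXXXXXX";   /* assumed scratch location */
    char path[sizeof(dir) + 8];
    int result = -1;

    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(path, sizeof(path), "%s/newfile", dir);

    mode_t old = umask(mask);   /* umask returns the previous mask */
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, mode);
    umask(old);                 /* restore the caller's mask */

    if (fd != -1) {
        struct stat st;
        if (fstat(fd, &st) == 0)
            result = (int)(st.st_mode & 0777);
        close(fd);
        unlink(path);
    }
    rmdir(dir);
    return result;
}
```

With a umask of 027, a requested mode of 0666 yields 0640 and a requested mode of 0777 yields 0750, matching the editor and compiler cases on the slide.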

II–44
