Introduction To UNIX System Programming: by Armin R. Mikler
Overview
Buffered vs. non-buffered I/O; basic system calls
Processes: What's a process anyway? The fork() system call. Coordinating processes (wait, exit, etc.)
Inter-Process Communication: pipes
Login
username, password
The command interpreter (the shell) is itself a running program. UNIX commands are (often small) programs. What else does the shell do?
Basic Commands:
who am I, pwd, who, what, ps (*), finger, ls, mkdir, rm (-i -f -r), touch, cat (note: there is no dog), grep
Editors
emacs, vi, joe, sed, and others
Compilers/Interpreters
gcc, g++, perl, java, etc.
man pages
Use the manual pages to get information about a specific command or system call. The UNIX manual is divided into sections. Careful!! The same name can (and does) appear in different sections with a different meaning; for example, write(1) is a user command, while write(2) is a system call. Use man -s i subject (System V style, e.g., Solaris) or man i subject (e.g., Linux) to refer to section i.
man -k keyword(s)
prints the header line of each manual page that contains the keyword(s)
apropos keyword(s)
same as man -k
Questions:
Which manual section contains UNIX user commands? Which manual section contains UNIX system calls? What is the difference between commands and system calls?
Files
UNIX Input/Output operations are based on the concept of files. Files are an abstraction of specific I/O devices. A very small set of system calls provides the primitives that give direct access to the I/O facilities of the UNIX kernel. Most I/O operations rely on the use of these primitives. We must remember that the basic I/O primitives are system calls, executed by the kernel. What does that mean to us as programmers???
open: Opens a file for reading or writing, or creates an empty file.
creat: Creates an empty file (note the spelling: creat, not create).
close: Closes a previously opened file.
read: Extracts information from a file.
write: Places information into a file.
lseek: Moves to a specific byte in the file.
unlink: Removes a file.
remove: Removes a file (a C library function that calls unlink() for files and rmdir() for directories).
A rudimentary example:
    #include <fcntl.h>   /* open() and the O_* flags */
    #include <stdio.h>   /* perror() */
    #include <unistd.h>  /* read(), close(), ssize_t */

    int main(void)
    {
        int fd;              /* a file descriptor */
        ssize_t nread;       /* number of bytes read */
        char buf[1024];      /* data buffer */

        /* open the file "data" for reading */
        fd = open("data", O_RDONLY);
        if (fd == -1) {
            perror("open");
            return 1;
        }
        /* read in up to 1024 bytes of data */
        nread = read(fd, buf, sizeof buf);
        if (nread == -1)
            perror("read");
        /* close the file */
        close(fd);
        return 0;
    }
The system can execute in user mode or kernel mode! Memory is divided into user space and kernel space! What happens when we write to a file?
The write call forces a switch into the kernel (from user mode to kernel mode). What??
The system copies the specified number of bytes from user space into kernel space (into kernel buffers).
The system wakes up the device driver to write these buffers to the physical device (immediately only if the file system is in synchronous mode; otherwise the data is written out later).
The system may select a new process to run.
Finally, control is returned to the process that executed the write call.
Un-buffered I/O
Every read and write is executed by the kernel. Hence, every read and write will cause a context switch in order for the system routines to execute. Why do we suffer performance loss? How can we reduce the loss of performance? ==> We could try to move as much data as possible with each system call. How can we measure the performance?
Buffered I/O
explicit - collect as many bytes as you can before writing to the file, and read more than a single byte at a time, while still using the basic UNIX I/O primitives.
Careful !! Your program may behave differently on different systems. Here, the programmer explicitly controls the buffer size.
implicit - use the stream facility provided by <stdio.h>: FILE *fp, fopen, fprintf, fflush, fclose, etc. A FILE structure contains a buffer (in user space) whose size is usually tied to the disk blocking factor (traditionally 512 or 1024 bytes; often larger on modern systems).
File Locking
Consider the following problem: Processes can obtain a unique integer by reading from a file. The file contains a single integer (at all times), which must be incremented by the process that executes a read. Since multiple processes can compete for the file (a unique integer), we must make sure that the file access is synchronized. HOW??
lockf()
lockf() is a C library function for locking records (byte ranges) of a file; the file must be open for writing. Its prototype is int lockf(int fd, int func, off_t size); (declared in <unistd.h>). The func parameter is one of:
F_ULOCK: 0 (unlock a locked section)
F_LOCK: 1 (lock a section, blocking until it is available)
F_TLOCK: 2 (test and lock a section; fails instead of blocking)
F_TEST: 3 (test a section for locks)
See the UNIX manual pages!!
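A size of 0 locks from the current file offset to the end of the file, including any data appended later. So if we rewind the file before locking AND use 0L as the size parameter, the entire file is locked. lseek(fd, 0L, SEEK_SET) (SEEK_SET is 0) can be used to rewind the file (fd) to the beginning.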
flock()
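flock() is a UNIX system call (originally from BSD) to apply or remove an advisory lock on an open file. The locking is only on an advisory basis (not absolute): it constrains only processes that also call flock(). Prototype: int flock(int fd, int operation), where operation is LOCK_SH (shared), LOCK_EX (exclusive) or LOCK_UN (unlock), optionally ORed with LOCK_NB (do not block). See the manual pages.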
UNIX Processes
A program that has started is manifested in the context of a process. A process in the system is represented by:
Process Identification Elements
Process State Information
Process Control Information
User Stack
Private User Address Space, Programs and Data
Shared Address Space
Process Identification, Process State Information, and Process Control Information constitute the Process Control Block (PCB). The processor state information includes the Program Status Word (PSW). All information needed by the OS to manage the process is contained in the PCB. A UNIX process can be in a variety of states:
User running: process executes in user mode
Kernel running: process executes in kernel mode
Ready to run in memory: process is waiting to be scheduled
Asleep in memory: process is waiting for an event
Ready to run, swapped: process is ready to run but must be swapped in first
Preempted: process is returning from kernel to user mode, but the system has scheduled another process instead
Created: process is newly created and not yet ready to run
Zombie: process no longer exists, but it leaves a record for its parent process to collect
In UNIX, a new process is created by means of the fork() system call. The OS performs the following functions:
It allocates a slot in the process table for the new process.
It assigns a unique ID to the new process.
It makes a copy of the parent's process image (except shared memory, which is shared rather than copied).
It assigns the child process to the Ready to Run state.
It returns the ID of the child to the parent process, and 0 to the child.
Note that fork() is called once but returns twice: once in the parent and once in the child process.
fork()
pid_t fork(void) is the prototype of the fork() call (declared in <unistd.h>). Remember that fork() returns twice:
in the newly created (child) process, with return value 0
in the calling (parent) process, with return value = pid of the new process
A negative return value (-1) indicates that the call has failed and no child was created.
Different return values are the key to distinguishing the parent process from the child process! The child process is an exact copy of the parent, yet it is a copy, i.e., an identical but separate process image.
A fork() Example
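    #include <stdio.h>      /* printf(), fflush() */
    #include <sys/types.h>  /* pid_t */
    #include <unistd.h>     /* fork() */

    int main(void)
    {
        pid_t pid;          /* process id */

        printf("just one process before the fork()\n");
        fflush(stdout);     /* don't let the child inherit unflushed output */
        pid = fork();
        if (pid == 0)
            printf("I am the child process\n");
        else if (pid > 0)
            printf("I am the parent process\n");
        else
            printf("DANGER Mr. Robinson - the fork() has failed\n");
        return 0;
    }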
exit() terminates the calling process. Its prototype is void exit(int status) (declared in <stdlib.h>), where status is used as the return value of the process. exit(i) can be used to announce success (by convention 0) or failure (nonzero) to the parent process.
The wait() call is used to temporarily suspend the parent process until one of the child processes terminates.
The prototype is pid_t wait(int *status) (declared in <sys/wait.h>), where status is a pointer to an integer to which the child's status information is assigned. wait() returns the pid of a child when any one of the children terminates, or -1 when the calling process has no children.
more coordination
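To wait for a particular child process to terminate, we can use the waitpid() call: pid_t waitpid(pid_t pid, int *status, int options). A pid of -1 waits for any child, just like wait().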
getpid() returns the process id
getppid() returns the parent's process id
getuid() returns the (real) user id
Use the manual pages for more id information.
A child process whose parent has terminated is referred to as an orphan. When a child exits and its parent has not (yet) called wait() for it, a zombie emerges.
A zombie is not really a process, as it has terminated, but the system retains an entry in the process table for the non-existing child process. A zombie is put to rest when the parent finally executes a wait().
When a parent terminates, its orphans and zombies are adopted by the init process (process id 1) of the system, which collects the zombies.
Inter-Process Communication
In addition to synchronizing different processes, we may want to be able to communicate data between them. Note that we are dealing with processes on the same machine. Hence, we can use shared memory segments to send messages between processes. One way to establish a communication channel between processes with a parent-child relationship is through the concept of pipes. We can use the pipe() system call to create a pipe.
UNIX Pipes
At the UNIX command level, we can use pipes to channel the output of one command into another
ls | wc
Prototype: int pipe(int filedes[2])
filedes[0] will be a file descriptor open for reading
filedes[1] will be a file descriptor open for writing
The return value of pipe() is 0 on success and -1 if it could not successfully open the file descriptors.
Overview
What is IPC? How can we achieve IPC?
The pipe at the shell level! The pipe between processes! The pipe() system call! Closing the pipe! Programming with pipes.
FIFOs (named pipes); FIFOs vs. regular pipes; steps for using a FIFO
Size of a pipe; non-blocking read() and write(); the select() system call
What is IPC
Inter-Process Communication allows different processes to exchange information and synchronize their actions. Why do processes have to synchronize their actions? We need to distinguish how processes may be related:
Parent/child relationship, i.e., the child process was created by the parent
Processes that are not related, yet execute on the same host
Processes that are not related and execute on different hosts
some similarities
how can we exchange information between the main() function and any of the other functions func()? how do we produce side effects in func() that are visible in main()? what do we need to do to guarantee that func() accesses the same variables as main()?
The trick is to either allow different functions to work with identical memory locations or to create a communication channel in the form of parameter lists or return values.
[Figure: processes on one host sharing resources through the OS kernel; processes on different hosts, each with its own OS kernel, connected by a network]
Processes need to use some facility that they have in common. Both processes must speak the same IPC language. What facilities can two or more processes share when they reside on the same host?
Memory
File system space
Communication facilities
A common communication protocol provided by the OS (signals)
In addition to synchronizing different processes, we may want to be able to communicate data between them. For the time being, we are dealing with processes on the same machine. Hence, we can use shared memory segments to send messages between processes. A pipe is a one-way communication channel which can be used to connect two related processes.
Pipes contd
Unix provides a construct called a pipe: a communication channel through which two processes can exchange information. One way to establish a communication channel between processes with a parent-child relationship is through the concept of pipes. Why do the processes need to be related?
At the UNIX command level, we can use pipes to channel the output of one command into another
ls | wc: the shell actually creates a child process for each command and uses exec() to execute the corresponding program (i.e., ls and wc).
How does the shell implement the pipe-command i.e., ls|wc ?? How would you implement the ability to pipe?? Discuss....
Prototype: int pipe(int filedes[2])
filedes[0] will be a file descriptor open for reading
filedes[1] will be a file descriptor open for writing
The return value of pipe() is 0 on success and -1 if it could not successfully open the file descriptors.
example
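    #include <stdio.h>   /* printf(), perror() */
    #include <stdlib.h>  /* exit() */
    #include <unistd.h>  /* pipe(), read(), write() */

    int main(void)
    {
        int p[2];
        char buf[64];

        if (pipe(p) == -1) {
            perror("pipe call");
            exit(1);
        }
        /* at this point we have a pipe p with p[0] opened for reading
           and p[1] opened for writing - just like a file */
        write(p[1], "hi there", 9);   /* 8 characters plus the terminating '\0' */
        read(p[0], buf, 9);
        printf("%s\n", buf);
        return 0;
    }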
A pipe to itself ?
[Figure: a single process that write()s into and read()s from its own pipe]
The child was created by a fork() call that was executed by the parent. The child process is an image of the parent process ---> all the file descriptors that are opened by the parent are now available in the child. The file descriptors refer to the same I/O entity, in this case a pipe. The pipe is inherited by the child and may be passed on to the grandchildren by the child process, or to other children by the parent. This can easily lead to a chaotic conglomeration of pipes throughout our system of processes.
[Figure: parent and child processes, each able to read() from the shared pipe]
The fix
[Figure: child process and parent process with their write() and read() ends of the pipe]
The file descriptors associated with a pipe can be closed with the close(fd) system call. Some rules:
A read() on a pipe will generally block until either data appears or all processes have closed the write file descriptor of the pipe; in the latter case, read() returns 0 (end-of-file)!
Closing the write fd while other processes are still writing to the pipe does not have any effect on them!
Closing the read fd while others are still reading will not have any effect on them!
Closing the last read fd while others are still writing will cause an error (EPIPE) to be returned by the write, and a signal (SIGPIPE) is sent by the kernel (Broken Pipe!!)
In most cases, we only transfer small amounts of data through a pipe, but for some applications we may want to send and receive large data blocks. A valid question is: How much data will fit into a pipe?? Why do we care? Remember: a (blocking) write() will block until the requested number of bytes have been written. POSIX guarantees that writes of up to PIPE_BUF bytes are atomic and requires PIPE_BUF to be at least 512 bytes! The actual capacity of a pipe is often larger (65536 bytes by default on Linux).