Interlude: Process API: 5.1 The Fork System Call
ASIDE: INTERLUDES
Interludes will cover more practical aspects of systems, including a
particular focus on operating system APIs and how to use them. If you don’t
like practical things, you could skip these interludes. But you should like
practical things, because, well, they are generally useful in real life;
companies, for example, don’t usually hire you for your non-practical skills.
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char *argv[]) {
    printf("hello world (pid:%d)\n", (int) getpid());
    int rc = fork();
    if (rc < 0) {          // fork failed; exit
        fprintf(stderr, "fork failed\n");
        exit(1);
    } else if (rc == 0) {  // child (new process)
        printf("hello, I am child (pid:%d)\n", (int) getpid());
    } else {               // parent goes down this path (main)
        printf("hello, I am parent of %d (pid:%d)\n",
               rc, (int) getpid());
    }
    return 0;
}
Figure 5.1: Calling fork() (p1.c)
OPERATING SYSTEMS [VERSION 1.00] WWW.OSTEP.ORG
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main(int argc, char *argv[]) {
    printf("hello world (pid:%d)\n", (int) getpid());
    int rc = fork();
    if (rc < 0) {          // fork failed; exit
        fprintf(stderr, "fork failed\n");
        exit(1);
    } else if (rc == 0) {  // child (new process)
        printf("hello, I am child (pid:%d)\n", (int) getpid());
    } else {               // parent goes down this path (main)
        int rc_wait = wait(NULL);
        printf("hello, I am parent of %d (rc_wait:%d) (pid:%d)\n",
               rc, rc_wait, (int) getpid());
    }
    return 0;
}
Figure 5.2: Calling fork() And wait() (p2.c)
You might also have noticed: the output (of p1.c) is not deterministic.
When the child process is created, there are now two active processes in
the system that we care about: the parent and the child. Assuming we
are running on a system with a single CPU (for simplicity), then either
the child or the parent might run at that point. In our example (above),
the parent did and thus printed out its message first. In other cases, the
opposite might happen, as we show in this output trace:
prompt> ./p1
hello world (pid:29146)
hello, I am child (pid:29147)
hello, I am parent of 29147 (pid:29146)
prompt>
The CPU scheduler, a topic we’ll discuss in great detail soon, determines
which process runs at a given moment in time; because the scheduler is
complex, we cannot usually make strong assumptions about what it will
choose to do, and hence which process will run first. This non-determinism,
as it turns out, leads to some interesting problems, particularly in
multi-threaded programs; hence, we’ll see a lot more non-determinism when
we study concurrency in the second part of the book.
THREE EASY PIECES, © 2008–18, ARPACI-DUSSEAU
In this example (p2.c), the parent process calls wait() to delay its
execution until the child finishes executing. When the child is done,
wait() returns to the parent.
Adding a wait() call to the code above makes the output deterministic.
Can you see why? Go ahead, think about it.
(waiting for you to think .... and done)
Now that you have thought a bit, here is the output:
prompt> ./p2
hello world (pid:29266)
hello, I am child (pid:29267)
hello, I am parent of 29267 (rc_wait:29267) (pid:29266)
prompt>
With this code, we now know that the child will always print first.
Why do we know that? Well, it might simply run first, as before, and
thus print before the parent. However, if the parent does happen to run
first, it will immediately call wait(); this system call won’t return until
the child has run and exited². Thus, even when the parent runs first, it
politely waits for the child to finish running, then wait() returns, and
then the parent prints its message.
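The examples so far pass NULL to wait() and only look at its return value. As a minimal sketch of what else the parent can learn, the helper below (child_exit_code() is a hypothetical name, not something from the figures) passes a status pointer to wait() and decodes it with the standard WIFEXITED and WEXITSTATUS macros. It assumes the calling process has no other children, so that wait() can only return the child it just created.

```c
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

// Hypothetical helper (not from the text): fork a child that exits with
// `code`, wait for it, and return the exit code the parent observed.
// Returns -1 if fork() or wait() failed or the child did not exit normally.
int child_exit_code(int code) {
    pid_t rc = fork();
    if (rc < 0) {
        return -1;                 // fork failed
    } else if (rc == 0) {
        _exit(code);               // child: terminate right away with `code`
    }
    int status;
    pid_t rc_wait = wait(&status); // parent: blocks until the child has exited
    if (rc_wait != rc || !WIFEXITED(status)) {
        return -1;                 // unexpected child, or abnormal termination
    }
    return WEXITSTATUS(status);    // low 8 bits of the child's exit code
}
```

Calling child_exit_code(42) from main() should hand back 42 in the parent, no matter which of the two processes the scheduler happens to run first.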
prompt> ./p3
hello world (pid:29383)
hello, I am child (pid:29384)
29 107 1030 p3.c
hello, I am parent of 29384 (rc_wait:29384) (pid:29383)
prompt>
² There are a few cases where wait() returns before the child exits; read the man page
for more details, as always. And beware of any absolute and unqualified statements this book
makes, such as “the child will always print first” or “UNIX is the best thing in the world, even
better than ice cream.”
³ On Linux, there are six variants of exec(): execl(), execlp(), execle(),
execv(), execvp(), and execvpe(). Read the man pages to learn more.
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <sys/wait.h>

int main(int argc, char *argv[]) {
    printf("hello world (pid:%d)\n", (int) getpid());
    int rc = fork();
    if (rc < 0) {          // fork failed; exit
        fprintf(stderr, "fork failed\n");
        exit(1);
    } else if (rc == 0) {  // child (new process)
        printf("hello, I am child (pid:%d)\n", (int) getpid());
        char *myargs[3];
        myargs[0] = strdup("wc");   // program: "wc" (word count)
        myargs[1] = strdup("p3.c"); // argument: file to count
        myargs[2] = NULL;           // marks end of array
        execvp(myargs[0], myargs);  // runs word count
        printf("this shouldn't print out");
    } else {               // parent goes down this path (main)
        int rc_wait = wait(NULL);
        printf("hello, I am parent of %d (rc_wait:%d) (pid:%d)\n",
               rc, rc_wait, (int) getpid());
    }
    return 0;
}
Figure 5.3: Calling fork(), wait(), And exec() (p3.c)
The fork() system call is strange; its partner in crime, exec(), is not
so normal either. What it does: given the name of an executable (e.g., wc),
and some arguments (e.g., p3.c), it loads code (and static data) from that
executable and overwrites its current code segment (and current static
data) with it; the heap and stack and other parts of the memory space of
the program are re-initialized. Then the OS simply runs that program,
passing in any arguments as the argv of that process. Thus, it does not
create a new process; rather, it transforms the currently running program
(formerly p3) into a different running program (wc). After the exec()
in the child, it is almost as if p3.c never ran; a successful call to exec()
never returns.
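Because a successful exec() never returns, any code placed after the call runs only when the exec() failed. The sketch below makes that explicit. The helper run_program() is a hypothetical name, and the exit code 127 for "could not exec" is a convention borrowed from common shells rather than anything the text requires; the sketch also assumes the caller has no other children outstanding.

```c
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

// Hypothetical helper (not from the text): run argv[0] with the given
// NULL-terminated argument array in a child process and return its exit
// status. Returns 127 if the child could not exec, -1 on other errors.
int run_program(char *const argv[]) {
    pid_t rc = fork();
    if (rc < 0) {
        return -1;                  // fork failed
    } else if (rc == 0) {
        execvp(argv[0], argv);      // on success, this call never returns
        perror("execvp");           // reached only if execvp() failed
        _exit(127);                 // assumed convention: 127 = exec failure
    }
    int status;
    if (wait(&status) != rc || !WIFEXITED(status)) {
        return -1;                  // wait failed or child died abnormally
    }
    return WEXITSTATUS(status);
}
```

With an argument array such as {"wc", "p3.c", NULL}, the parent gets back whatever status wc exits with; with the name of a program that does not exist, the child falls through to perror() and the parent sees 127.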
The shell is just a user program⁴. It shows you a prompt and then
waits for you to type something into it. You then type a command (i.e.,
the name of an executable program, plus any arguments) into it; in most
cases, the shell then figures out where in the file system the executable
resides, calls fork() to create a new child process to run the command,
calls some variant of exec() to run the command, and then waits for the
command to complete by calling wait(). When the child completes, the
shell returns from wait() and prints out a prompt again, ready for your
next command.
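The prompt, fork(), exec(), wait() cycle just described can be sketched in a few dozen lines. This is a toy under stated assumptions, not how any real shell is written: the names split_line(), run_command_line(), and shell_loop() are hypothetical, arguments are split on whitespace only (no quoting, redirection, or pipes), MAX_ARGS is an arbitrary limit, and execvp() is relied upon to search PATH for the executable.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

#define MAX_ARGS 64  // arbitrary limit chosen for this sketch

// Split `line` in place on whitespace; fills argv (NULL-terminated)
// and returns the number of arguments found.
int split_line(char *line, char *argv[], int max_args) {
    int argc = 0;
    char *tok = strtok(line, " \t\n");
    while (tok != NULL && argc < max_args - 1) {
        argv[argc++] = tok;
        tok = strtok(NULL, " \t\n");
    }
    argv[argc] = NULL;
    return argc;
}

// Run one command line: fork a child, exec the command, wait for it.
// Returns the child's exit status, 0 for an empty line, -1 on error.
int run_command_line(char *line) {
    char *argv[MAX_ARGS];
    if (split_line(line, argv, MAX_ARGS) == 0) {
        return 0;                   // nothing typed
    }
    pid_t rc = fork();
    if (rc < 0) {
        perror("fork");
        return -1;
    } else if (rc == 0) {
        execvp(argv[0], argv);      // child becomes the command
        perror(argv[0]);            // reached only if exec failed
        _exit(127);
    }
    int status;
    if (wait(&status) < 0 || !WIFEXITED(status)) {
        return -1;
    }
    return WEXITSTATUS(status);
}

// Print a prompt, read a line, run it; stop at end of input.
void shell_loop(FILE *in) {
    char line[1024];
    for (;;) {
        printf("prompt> ");
        fflush(stdout);
        if (fgets(line, sizeof(line), in) == NULL) {
            break;
        }
        run_command_line(line);
    }
}
```

A main() that simply calls shell_loop(stdin) gives you something you can type commands into; the parent only prints the next prompt after wait() has returned.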
The separation of fork() and exec() allows the shell to do a whole
bunch of useful things rather easily. For example:
prompt> wc p3.c > newfile.txt
In the example above, the output of the program wc is redirected into
the output file newfile.txt (the greater-than sign is how said redirection
is indicated). The way the shell accomplishes this task is quite simple:
when the child is created, before calling exec(), the shell closes
standard output and opens the file newfile.txt. By doing so, any output
from the soon-to-be-running program wc is sent to the file instead of the
screen.
Figure 5.4 shows a program that does exactly this. This redirection
works because of an assumption about how the operating system manages
file descriptors. Specifically, UNIX systems start looking for free file
descriptors at zero. In this case, STDOUT_FILENO will be the first
available one and thus get assigned when open() is called. Subsequent
writes by the child process to the standard output file descriptor, for
example by routines such as printf(), will then be routed transparently
to the newly-opened file instead of the screen.
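The lowest-free-descriptor rule can be checked directly, without touching standard output at all. The sketch below is only an illustration: reuses_lowest_descriptor() is a hypothetical helper, and it assumes a single-threaded caller so that nothing else opens or closes descriptors in between. It opens a file twice, closes the first descriptor, and opens the file again to see which number comes back.

```c
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper (not from the text): returns 1 if open() hands back
// the descriptor number that was just closed, 0 if not, -1 on open failure.
// Assumes no other thread is opening or closing descriptors meanwhile.
int reuses_lowest_descriptor(const char *path) {
    int first = open(path, O_RDONLY);   // lowest free descriptor right now
    if (first < 0) {
        return -1;
    }
    int second = open(path, O_RDONLY);  // next free descriptor after `first`
    if (second < 0) {
        close(first);
        return -1;
    }
    close(first);                       // `first` is now the lowest free slot
    int third = open(path, O_RDONLY);   // expected to land in that slot
    if (third < 0) {
        close(second);
        return -1;
    }
    int same = (third == first && third < second);
    close(second);
    close(third);
    return same;
}
```

The close-then-open trick with STDOUT_FILENO is the same thing in disguise: it lands on descriptor 1 only as long as descriptor 0 (standard input) is still open when open() is called.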
Here is the output of running the p4.c program:
prompt> ./p4
prompt> cat p4.output
32 109 846 p4.c
prompt>
⁴ And there are lots of shells; tcsh, bash, and zsh to name a few. You should pick one,
read its man pages, and learn more about it; all UNIX experts do.
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <fcntl.h>
#include <sys/wait.h>

int main(int argc, char *argv[]) {
    int rc = fork();
    if (rc < 0) {          // fork failed; exit
        fprintf(stderr, "fork failed\n");
        exit(1);
    } else if (rc == 0) {  // child: redirect standard output to a file
        close(STDOUT_FILENO);
        open("./p4.output", O_CREAT|O_WRONLY|O_TRUNC, S_IRWXU);

        // now exec "wc"...
        char *myargs[3];
        myargs[0] = strdup("wc");   // program: "wc" (word count)
        myargs[1] = strdup("p4.c"); // argument: file to count
        myargs[2] = NULL;           // marks end of array
        execvp(myargs[0], myargs);  // runs word count
    } else {               // parent goes down this path (main)
        int rc_wait = wait(NULL);
    }
    return 0;
}
Figure 5.4: All Of The Above With Redirection (p4.c)
You’ll notice (at least) two interesting tidbits about this output. First,
when p4 is run, it looks as if nothing has happened; the shell just prints
the command prompt and is immediately ready for your next command.
However, that is not the case; the program p4 did indeed call fork() to
create a new child, and then run the wc program via a call to execvp().
You don’t see any output printed to the screen because it has been
redirected to the file p4.output. Second, you can see that when we cat the
output file, all the expected output from running wc is found. Cool, right?
UNIX pipes are implemented in a similar way, but with the pipe()
system call. In this case, the output of one process is connected to an
in-kernel pipe (i.e., queue), and the input of another process is connected
to that same pipe; thus, the output of one process is seamlessly used as
input to the next, and long and useful chains of commands can be strung
together. As a simple example, consider looking for a word in a file, and
then counting how many times said word occurs; with pipes and the
utilities grep and wc, it is easy: just type grep -o foo file | wc -l
into the command prompt and marvel at the result.
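To make the in-kernel queue a little more concrete, here is a minimal sketch of one half of that plumbing: a child whose standard output is connected to a pipe, with the parent reading from the other end. The helper read_from_child() is a hypothetical name, and the sketch uses dup2() to point standard output at the pipe, which is a different (though common) technique from the close-then-open trick shown in Figure 5.4.

```c
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

// Hypothetical helper (not from the text): the child writes `msg` to its
// standard output, which has been connected to a pipe; the parent reads up
// to cap-1 bytes into `buf` and NUL-terminates it. `cap` must be at least 1.
// Returns the number of bytes read, or -1 if pipe() or fork() failed.
ssize_t read_from_child(const char *msg, char *buf, size_t cap) {
    int fds[2];                          // fds[0]: read end, fds[1]: write end
    if (pipe(fds) < 0) {
        return -1;
    }
    pid_t rc = fork();
    if (rc < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    } else if (rc == 0) {
        close(fds[0]);                   // child does not read from the pipe
        if (fds[1] != STDOUT_FILENO) {
            if (dup2(fds[1], STDOUT_FILENO) < 0) {
                _exit(1);
            }
            close(fds[1]);               // stdout now refers to the pipe
        }
        if (write(STDOUT_FILENO, msg, strlen(msg)) < 0) {
            _exit(1);
        }
        _exit(0);
    }
    close(fds[1]);                       // parent must close its write end,
                                         // or read() would never see EOF
    size_t total = 0;
    ssize_t n;
    while (total < cap - 1 &&
           (n = read(fds[0], buf + total, cap - 1 - total)) > 0) {
        total += (size_t) n;
    }
    buf[total] = '\0';
    close(fds[0]);
    wait(NULL);                          // reap the child
    return (ssize_t) total;
}
```

A shell building a pipeline does roughly the same thing for both sides: the writer's standard output is pointed at the write end, the reader's standard input at the read end, and each side then calls exec().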
Finally, while we have just sketched out the process API at a high level,
there is a lot more detail about these calls out there to be learned and
digested; we’ll learn more, for example, about file descriptors when we
talk about file systems in the third part of the book. For now, suffice it
to say that the fork()/exec() combination is a powerful way to create
and manipulate processes.
cesses, and exercise full control over them (pause them, kill them, etc.).
Users generally can only control their own processes; it is the job of the
operating system to parcel out resources (such as CPU, memory, and disk)
to each user (and their processes) to meet overall system goals.
5.7 Summary
We have introduced some of the APIs dealing with UNIX process
creation: fork(), exec(), and wait(). However, we have just skimmed
the surface. For more detail, read Stevens and Rago [SR05], of course,
particularly the chapters on Process Control, Process Relationships, and
Signals. There is much to extract from the wisdom therein.
References
[C63] “A Multiprocessor System Design” by Melvin E. Conway. AFIPS ’63 Fall Joint Computer
Conference. New York, USA, 1963. An early paper on how to design multiprocessing systems; may
be the first place the term fork() was used in the discussion of spawning new processes.
[J16] “They could be twins!” by Phoebe Jackson-Edwards. The Daily Mail. March 1, 2016.
Available: www.dailymail.co.uk/femail/article-3469189/Photos-children-look-IDENTICAL-parents-age-sweep-web.html.
This hard-hitting piece of journalism shows a bunch of weirdly similar child/parent photos and is
frankly kind of mesmerizing. Go ahead, waste two minutes of your life and check it out. But don’t
forget to come back here! This, in a microcosm, is the danger of surfing the web.
[L83] “Hints for Computer Systems Design” by Butler Lampson. ACM Operating Systems
Review, Volume 15:5, October 1983. Lampson’s famous hints on how to design computer systems.
You should read it at some point in your life, and probably at many points in your life.
[QI15] “With Great Power Comes Great Responsibility” by The Quote Investigator. Available:
https://quoteinvestigator.com/2015/07/23/great-power. The quote investigator
concludes that the earliest mention of this concept is 1793, in a collection of decrees made at the French
National Convention. The specific quote: “Ils doivent envisager qu’une grande responsabilité est la
suite inséparable d’un grand pouvoir”, which roughly translates to “They must consider that great
responsibility follows inseparably from great power.” Only in 1962 did the following words appear in
Spider-Man: “...with great power there must also come–great responsibility!” So it looks like the French
Revolution gets credit for this one, not Stan Lee. Sorry, Stan.
Homework (Code)
In this homework, you are to gain some familiarity with the process
management APIs about which you just read. Don’t worry – it’s even
more fun than it sounds! You’ll in general be much better off if you find
as much time as you can to write some code, so why not start now?
Questions
1. Write a program that calls fork(). Before calling fork(), have the main
process access a variable (e.g., x) and set its value to something (e.g., 100).
What value is the variable in the child process? What happens to the
variable when both the child and parent change the value of x?
2. Write a program that opens a file (with the open() system call) and then
calls fork() to create a new process. Can both the child and parent
access the file descriptor returned by open()? What happens when they are
writing to the file concurrently, i.e., at the same time?
3. Write another program using fork(). The child process should print “hello”;
the parent process should print “goodbye”. You should try to ensure that
the child process always prints first; can you do this without calling wait() in
the parent?
4. Write a program that calls fork() and then calls some form of exec() to
run the program /bin/ls. See if you can try all of the variants of exec(),
including (on Linux) execl(), execle(), execlp(), execv(), execvp(),
and execvpe(). Why do you think there are so many variants of the same
basic call?
5. Now write a program that uses wait() to wait for the child process to finish
in the parent. What does wait() return? What happens if you use wait()
in the child?
6. Write a slight modification of the previous program, this time using waitpid()
instead of wait(). When would waitpid() be useful?
7. Write a program that creates a child process, and then in the child closes
standard output (STDOUT_FILENO). What happens if the child calls printf()
to print some output after closing the descriptor?
8. Write a program that creates two children, and connects the standard output
of one to the standard input of the other, using the pipe() system call.