
Operating System Support for Virtual Machines

Samuel T. King, George W. Dunlap, Peter M. Chen

Computer Science and Engineering Division


Department of Electrical Engineering and Computer Science
University of Michigan
http://www.eecs.umich.edu/CoVirt

Abstract: A virtual-machine monitor (VMM) is a useful technique for adding functionality below existing operating system and application software. One class of VMMs (called Type II VMMs) builds on the abstractions provided by a host operating system. Type II VMMs are elegant and convenient, but their performance is currently an order of magnitude slower than that achieved when running outside a virtual machine (a standalone system). In this paper, we examine the reasons for this large overhead for Type II VMMs. We find that a few simple extensions to a host operating system can make it a much faster platform for running a VMM. Taking advantage of these extensions reduces the virtualization overhead of a Type II VMM to 14-35%, even for workloads that exercise the virtual machine intensively.

1. Introduction

A virtual-machine monitor (VMM) is a layer of software that emulates the hardware of a complete computer system (Figure 1). The abstraction created by the VMM is called a virtual machine. The hardware emulated by the VMM typically is similar or identical to the hardware on which the VMM is running.

Virtual machines were first developed and used in the 1960s, with the best-known example being IBM's VM/370 [Goldberg74]. Several properties of virtual machines have made them helpful for a wide variety of uses. First, they can create the illusion of multiple virtual machines on a single physical machine. These multiple virtual machines can be used to run applications on different operating systems, to allow students to experiment conveniently with building their own operating system [Nieh00], to enable existing operating systems to run on shared-memory multiprocessors [Bugnion97], and to simulate a network of independent computers. Second, virtual machines can provide a software environment for debugging operating systems that is more convenient than using a physical machine. Third, virtual machines provide a convenient interface for adding functionality, such as fault injection [Buchacker01], primary-backup replication [Bressoud96], and undoable disks. Finally, a VMM provides strong isolation between virtual-machine instances.

[Figure 1 diagram: the Type I VMM stack (guest applications, guest operating system, VMM, host hardware) and the Type II VMM stack (guest applications, guest operating system, VMM, host operating system, host hardware).]
Figure 1: Virtual-machine structures. A virtual-machine monitor is a software layer that runs on a host platform and provides
an abstraction of a complete computer system to higher-level software. The host platform may be the bare hardware (Type I
VMM) or a host operating system (Type II VMM). The software running above the virtual-machine abstraction is called guest
software (operating system and applications).
This isolation allows a single server to run multiple, untrusted applications safely [Whitaker02, Meushaw00] and to provide security services such as monitoring systems for intrusions [Chen01, Dunlap02, Barnett02].

As a layer of software, VMMs build on a lower-level hardware or software platform and provide an interface to higher-level software (Figure 1). In this paper, we are concerned with the lower-level platform that supports the VMM. This platform may be the bare hardware, or it may be a host operating system. Building the VMM directly on the hardware lowers overhead by reducing the number of software layers and enabling the VMM to take full advantage of the hardware capabilities. On the other hand, building the VMM on a host operating system simplifies the VMM by allowing it to use the host operating system's abstractions.

Our goal for this paper is to examine and reduce the performance overhead associated with running a VMM on a host operating system. Building it on a standard Linux host operating system leads to an order of magnitude performance degradation compared to running outside a virtual machine (a standalone system). However, we find that a few simple extensions to the host operating system reduce virtualization overhead to 14-35%, which is comparable to the speed of virtual machines that run directly on the hardware.

The speed of a virtual machine plays a large part in determining the domains for which virtual machines can be used. Using virtual machines for debugging, student projects, and fault-injection experiments can be done even if virtualization overhead is quite high (e.g. 10x slowdown). However, using virtual machines in production environments requires virtualization overhead to be much lower. Our CoVirt project on computer security depends on running all applications inside a virtual machine [Chen01]. To keep the system usable in a production environment, we would like the speed of our virtual machine to be within a factor of 2 of a standalone system.

The paper is organized as follows. Section 2 describes two ways to classify virtual machines, focusing on the higher-level interface provided by the VMM and the lower-level platform upon which the VMM is built. Section 3 describes UMLinux, which is the VMM we use in this paper. Section 4 describes a series of extensions to the host operating system that enable virtual machines built on the host operating system to approach the speed of those that run directly on the hardware. Section 5 evaluates the performance benefits achieved by each host OS extension. Section 6 describes related work, and Section 7 concludes.

2. Virtual machines

Virtual-machine monitors can be classified along many dimensions. This section classifies VMMs along two dimensions: the higher-level interface they provide and the lower-level platform they build upon.

The first way we can classify VMMs is according to how closely the higher-level interface they provide matches the interface of the physical hardware. VMMs such as VM/370 [Goldberg74] for IBM mainframes and VMware ESX Server [Waldspurger02] and VMware Workstation [Sugerman01] for x86 processors provide an abstraction that is identical to the hardware underneath the VMM. Simulators such as Bochs [Boc] and Virtutech Simics [Magnusson95] also provide an abstraction that is identical to physical hardware, although the hardware they simulate may differ from the hardware on which they are running.

Several aspects of virtualization make it difficult or slow for a VMM to provide an interface that is identical to the physical hardware. Some architectures include instructions whose behavior depends on whether the CPU is running in privileged or user mode (sensitive instructions), yet which can execute in user mode without causing a trap to the VMM [Robin00]. Virtualizing these sensitive-but-unprivileged instructions generally requires binary instrumentation, which adds significant complexity and may add significant overhead. In addition, emulating I/O devices at the low-level hardware interface (e.g. memory-mapped I/O) causes execution to switch frequently between the guest operating system accessing the device and the VMM code emulating the device. To avoid the overhead associated with emulating a low-level device interface, most VMMs encourage or require the user to run a modified version of the guest operating system. For example, the VAX VMM security kernel [Karger91], VMware Workstation's guest tools [Sugerman01], and Disco [Bugnion97] all add special drivers in the guest operating system to accelerate the virtualization of some devices. VMMs built on host operating systems often require additional modifications to the guest operating system. For example, the original version of SimOS adds special signal handlers to support virtual interrupts and requires relinking the guest operating system into a different range of addresses [Rosenblum95]; similar changes are needed by User-Mode Linux [Dike00] and UMLinux [Buchacker01].

Other virtualization strategies make the higher-level interface further different from the underlying hardware.
The Denali isolation kernel does not support instructions that are sensitive but unprivileged, adds several virtual instructions and registers, and changes the memory management model [Whitaker02]. Microkernels provide higher-level services above the hardware to support abstractions such as threads and inter-process communication [Golub90]. The Java virtual machine defines a virtual architecture that is completely independent from the underlying hardware.

A second way to classify VMMs is according to the platform upon which they are built [Goldberg73]. Type I VMMs such as IBM's VM/370, Disco, and VMware's ESX Server are implemented directly on the physical hardware. Type II VMMs are built completely on top of a host operating system. SimOS, User-Mode Linux, and UMLinux are all implemented completely on top of a host operating system. Other VMMs are a hybrid between Type I and II: they operate mostly on the physical hardware but use the host OS to perform I/O. For example, VMware Workstation [Sugerman01] and Connectix VirtualPC [Con01] use the host operating system to access some virtual I/O devices.

A host operating system makes a very convenient platform upon which to build a VMM. Host operating systems provide a set of abstractions that map closely to each part of a virtual machine [Rosenblum95]. A host process provides a sequential stream of execution similar to a CPU; host signals provide similar functionality to interrupts; host files and devices provide similar functionality to virtual I/O devices; host memory mapping and protection provides similar functionality to a virtual MMU. These features make it possible to implement a VMM as a normal user process with very little code.

Other reasons contribute to the attractiveness of using a Type II VMM. Because a Type II VMM runs as a normal process, the developer or administrator of the VMM can use the full power of the host operating system to monitor and debug the virtual machine's execution. For example, the developer or administrator can examine or copy the contents of the virtual machine's I/O devices or memory or attach a debugger to the virtual-machine process. Finally, the simplicity of Type II VMMs and the availability of several good open-source implementations make them an excellent platform for experimenting with virtual-machine services.

A potential disadvantage of Type II VMMs is performance. Current host operating systems do not provide sufficiently powerful interfaces to the bare hardware to support the intensive usage patterns of VMMs. For example, compiling the Linux 2.4.18 kernel inside the UMLinux virtual machine takes 18 times as long as compiling it directly on a Linux host operating system. VMMs that run directly on the bare hardware achieve much lower performance overhead. For example, VMware Workstation 3.1 compiles the Linux 2.4.18 kernel with only a 30% overhead relative to running directly on the host operating system.

The goal of this paper is to examine and reduce the order-of-magnitude performance overhead associated with running a VMM on a host operating system. We find that a few simple extensions to a host operating system can make it a much faster platform for running a VMM, while preserving the conceptual elegance of the Type II approach.

3. UMLinux

To conduct our study, we use a Type II VMM called UMLinux [Buchacker01]. UMLinux was developed by researchers at the University of Erlangen-Nürnberg for use in fault-injection experiments. UMLinux is a Type II VMM: the guest operating system and all guest applications run as a single process (the guest-machine process) on a host Linux operating system. UMLinux provides a higher-level interface to the guest operating system that is similar but not identical to the underlying hardware. As a result, the machine-dependent portion of the guest Linux operating system must be modified to use the interface provided by the VMM. Simple device drivers must be added to interact with the host abstractions used to implement the devices for the virtual machine; a few assembly-language instructions (e.g. iret and in/out) must be replaced with function calls to emulation code; and the guest kernel must be relinked into a different address range [Hoxer02]. About 17,000 lines of code were added to the guest kernel to work on the new platform. Applications compiled for the host operating system work without modification on the guest operating system.

UMLinux uses functionality from the host operating system that maps naturally to virtual hardware. The guest-machine process serves as a virtual CPU; host files and devices serve as virtual I/O devices; a host TUN/TAP device serves as a virtual network; host signals serve as virtual interrupts; and host memory mapping and protection serve as a virtual MMU. The virtual machine's memory is provided by a host file that is mapped into different parts of the guest-machine process's address space. We store this host file in a memory file system (ramfs) to avoid needlessly writing the virtual machine's transient state to disk.
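As a rough illustration of this arrangement (the sketch below is ours, not UMLinux code; the file path, size, and flags are assumptions), the virtual machine's "physical" memory can be a file on a ramfs mount, pieces of which the guest-machine process maps at fixed guest virtual addresses:

/* Hedged sketch: a ramfs-backed "physical" memory file for the virtual
 * machine.  The path and the 192 MB size are illustrative only; error
 * handling is omitted for brevity. */
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define VM_PHYS_MEM_BYTES (192UL << 20)   /* guest "physical" memory */

int open_guest_memory(void)
{
    /* Keeping the file on ramfs avoids needlessly writing the virtual
     * machine's transient state to disk. */
    int fd = open("/ramfs/vm-phys-mem", O_RDWR | O_CREAT, 0600);
    ftruncate(fd, VM_PHYS_MEM_BYTES);
    return fd;
}

/* Map one guest "physical" page (an offset in the memory file) at a
 * chosen guest virtual address inside the guest-machine process. */
void *map_guest_page(int fd, unsigned long guest_vaddr, off_t phys_offset)
{
    return mmap((void *)guest_vaddr, 4096, PROT_READ | PROT_WRITE,
                MAP_SHARED | MAP_FIXED, fd, phys_offset);
}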
The address space of the guest-machine process differs from a normal host process because it contains both the host and guest operating system address ranges (Figure 2).

[Figure 2 diagram: the guest-machine process's address space, with the host operating system at the top (0xc0000000-0xffffffff), the guest operating system below it (0x70000000-0xbfffffff), and the guest application at the bottom (0x0-0x6fffffff).]

Figure 2: UMLinux address space. As with all Linux processes, the host kernel address space occupies [0xc0000000, 0xffffffff], and the host user address space occupies [0x0, 0xc0000000). The guest kernel occupies the upper portion of the host user space [0x70000000, 0xc0000000), and the current guest application occupies the remainder of the host user space [0x0, 0x70000000).

In a standard Linux process, the operating system occupies addresses [0xc0000000, 0xffffffff] while the application is given [0x0, 0xc0000000). Because the UMLinux guest-machine process must hold both the host and guest operating systems, the address space for the guest operating system must be moved to occupy [0x70000000, 0xc0000000), which leaves [0x00000000, 0x70000000) for guest applications. The guest kernel memory is protected using host mmap and munmap system calls. To facilitate this protection, UMLinux maintains a virtual current privilege level, which is analogous to the x86 current privilege level. This is used to differentiate between guest user and guest kernel modes, and the guest kernel memory will be accessible or protected according to the virtual privilege level.
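To make this concrete, the following minimal sketch (ours, not UMLinux source) shows how the guest-machine process could expose or hide the guest kernel region as the virtual privilege level changes. UMLinux itself performs the equivalent work with mmap and munmap on the ramfs-backed memory file; mprotect is used here only to keep the example short.

/* Hedged sketch: toggling protection on the guest kernel region
 * [0x70000000, 0xc0000000) around guest kernel/user transitions. */
#include <sys/mman.h>

#define GUEST_KERNEL_START ((void *)0x70000000UL)
#define GUEST_KERNEL_SIZE  (0xc0000000UL - 0x70000000UL)

static int virtual_cpl = 3;  /* virtual current privilege level: 0 = guest kernel, 3 = guest user */

/* Entering guest kernel mode: make the guest kernel region accessible. */
void enter_guest_kernel_mode(void)
{
    mprotect(GUEST_KERNEL_START, GUEST_KERNEL_SIZE, PROT_READ | PROT_WRITE);
    virtual_cpl = 0;
}

/* Returning to guest user mode: hide the guest kernel region again. */
void enter_guest_user_mode(void)
{
    mprotect(GUEST_KERNEL_START, GUEST_KERNEL_SIZE, PROT_NONE);
    virtual_cpl = 3;
}

Section 4.2 shows that issuing such calls over a large address range on every transition is a major source of overhead.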

[Figure 3 diagram: the VMM process and the guest-machine process (guest application and guest operating system) running above the host operating system and host hardware.]

Figure 3: UMLinux system structure. UMLinux uses two host processes. The guest-machine process executes the guest operating system and all guest applications. The VMM process uses ptrace to mediate access between the guest-machine process and the host operating system.

Figure 3 shows the basic system structure of UMLinux. In addition to the guest-machine process, UMLinux uses a VMM process to implement the VMM.

The VMM process serves two purposes: it redirects to the guest operating system signals and system calls that would otherwise go to the host operating system, and it restricts the set of system calls allowed by the guest operating system. The VMM process uses ptrace to mediate access between the guest-machine process and the host operating system. Figure 4 shows the sequence of steps taken by UMLinux when a guest application issues a system call.

The VMM process is also invoked when the guest kernel returns from its SIGUSR1 handler and when the guest kernel protects its address space from the guest application process. A similar sequence of context switches occurs on each memory, I/O, and timer exception received by the guest-machine process.

4. Host OS support for Type II VMMs

A host operating system makes an elegant and convenient base upon which to build and run a VMM such as UMLinux. Each virtual hardware component maps naturally to an abstraction in the host OS, and the administrator can interact conveniently with the guest-machine process just as it does with other host processes. However, while a host OS provides sufficient functionality to support a VMM, it does not provide the primitives needed to support a VMM efficiently.

In this section, we investigate three bottlenecks that occur when running a Type II VMM, and we eliminate these bottlenecks through simple changes to the host OS.

We find that three bottlenecks are responsible for the bulk of the virtualization overhead. First, UMLinux's system structure with two separate host processes causes an inordinate number of context switches on the host. Second, switching between the guest kernel and the guest user space generates a large number of memory protection operations. Third, switching between two guest application processes generates a large number of memory mapping operations.
[Figure 4 diagram: control transfers among the guest application, guest operating system, VMM process, and host operating system, numbered 1-8 as listed below.]
1. guest application issues system call; intercepted by VMM process via ptrace
2. VMM process changes system call to no-op (getpid)
3. getpid returns; intercepted by VMM process
4. VMM process sends SIGUSR1 signal to guest SIGUSR1 handler
5. guest SIGUSR1 handler calls mmap to allow access to guest kernel data; intercepted by VMM process
6. VMM process allows mmap to pass through
7. mmap returns to VMM process
8. VMM process returns to guest SIGUSR1 handler, which handles the guest application’s system call

Figure 4: Guest application system call. This picture shows the steps UMLinux takes to transfer control to the guest operating
system when a guest application process issues a system call. The mmap call in the SIGUSR1 handler must reside in guest user
space. For security, the rest of the SIGUSR1 handler should reside in guest kernel space. The current UMLinux implementation
includes an extra section of trampoline code to issue the mmap; this trampoline code is started by manipulating the guest machine
process’s context and finishes by causing a breakpoint to the VMM process; the VMM process then transfers control back to the
guest-machine process by sending a SIGUSR1.

4.1. Extra host context switches

The VMM process in UMLinux uses ptrace to intercept key events (system calls and signals) executed by the guest-machine process. ptrace is a powerful tool for debugging, but using it to create a virtual machine causes the host OS to context switch frequently between the guest-machine process and the VMM process (Figure 4).

We can eliminate most of these context switches by moving the VMM process's functionality into the host kernel. We encapsulate the bulk of the VMM process functionality in a VMM loadable kernel module. We also modified a few lines in the host kernel's system call and signal handling to transfer control to the VMM kernel module when the guest-machine process executes a system call or receives a signal. The VMM kernel module and other hooks in the host kernel were implemented in 150 lines of code (not including comments).

Moving the VMM process's functionality into the host kernel drastically reduces the number of context switches in UMLinux. For example, transferring control to the guest kernel on a guest system call can be done in just two context switches (Figure 5). It also simplifies the system conceptually, because the VMM kernel module has more control over the guest-machine process than is provided by ptrace. For example, the VMM kernel module can directly change the protections of the guest-machine process's address space, whereas the ptracing VMM process must cause the guest-machine process to make multiple system calls to change protections.
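For concreteness, the sketch below (ours, not the UMLinux VMM source) shows the ptrace-based interception corresponding to steps 1-3 of Figure 4; steps 4-8 (sending SIGUSR1 and mediating the handler's mmap) are omitted, and the orig_eax register name assumes the 32-bit x86 platform used in this paper.

/* Hedged sketch: how a ptracing VMM process can neutralize one guest
 * system call, in the style of Figure 4, steps 1-3. */
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/user.h>
#include <sys/wait.h>

long intercept_one_syscall(pid_t guest)
{
    int status;
    struct user_regs_struct regs;

    /* Step 1: run the guest-machine process until it enters a system call. */
    ptrace(PTRACE_SYSCALL, guest, 0, 0);
    waitpid(guest, &status, 0);
    if (WIFEXITED(status))
        return -1;

    /* Step 2: remember the guest's system call number, then rewrite it so
     * the host executes a harmless getpid() instead. */
    ptrace(PTRACE_GETREGS, guest, 0, &regs);
    long guest_syscall = regs.orig_eax;
    regs.orig_eax = SYS_getpid;
    ptrace(PTRACE_SETREGS, guest, 0, &regs);

    /* Step 3: let the no-op call complete; the VMM process is notified
     * again at system-call exit. */
    ptrace(PTRACE_SYSCALL, guest, 0, 0);
    waitpid(guest, &status, 0);

    /* Step 4 would now send SIGUSR1 so that the guest kernel's handler
     * services guest_syscall itself. */
    return guest_syscall;
}

Each of these stops and restarts forces the host to switch between the guest-machine process and the VMM process, which is exactly the cost the VMM kernel module removes.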
[Figure 5 diagram: control transfers among the guest application, guest operating system, VMM kernel module, and host operating system, numbered 1-4 as listed below.]

1. guest application issues system call; intercepted by VMM kernel module
2. VMM kernel module calls mmap to allow access to guest kernel data
3. mmap returns to VMM kernel module
4. VMM kernel module sends SIGUSR1 to guest SIGUSR1 handler

Figure 5: Guest application system call with VMM kernel module. This picture shows the steps taken by UMLinux with a VMM kernel module to transfer control to the guest operating system when a guest application issues a system call.

4.2. Protecting guest kernel space from guest application processes

The guest-machine process switches frequently between guest user mode and guest kernel mode. The guest kernel is invoked to service guest system calls and other exceptions issued by a guest application process and to service signals initiated by virtual I/O devices. Each time the guest-machine process switches from guest kernel mode to guest user mode, it must first prevent access to the guest kernel's portion of the address space [0x70000000, 0xc0000000). Similarly, each time the guest-machine process switches from guest user mode to guest kernel mode, it must first enable access to the guest kernel's portion of the address space. The guest-machine process performs these address space manipulations by making the host system calls mmap, munmap, and mprotect.

Unfortunately, calling mmap, munmap, or mprotect on large address ranges incurs significant overhead, especially if the guest kernel accesses many pages in its address space. In contrast, a standalone host machine incurs very little overhead when switching between user mode and kernel mode. The page table on x86 processors need not change when switching between kernel mode and user mode, because the page table entry for a page can be set to simultaneously allow kernel-mode access and prevent user-mode access.

We developed two solutions that use the x86 paged segments and privilege modes to eliminate the overhead incurred when switching between guest kernel mode and guest user mode. Linux normally uses paging as its primary mechanism for translation and protection, using segments only to switch between privilege levels. Linux uses four segments: kernel code segment, kernel data segment, user code segment, and user data segment. Normally, all four segments span the entire address range. Linux normally runs all host user code in CPU privilege ring 3 and runs host kernel code in CPU privilege ring 0. Linux uses the supervisor-only bit in the page table to prevent code running in CPU privilege ring 3 from accessing the host operating system's data (Figure 6).

Our first solution protects the guest kernel space from guest user code by changing the bound on the user code and data segments (Figure 7). When the guest-machine process is running in guest user mode, the VMM kernel module shrinks the user code and data segments to span only [0x0, 0x70000000). When the guest-machine process is running in guest kernel mode, the VMM kernel module grows the user code and data segments to their normal range of [0x0, 0xffffffff]. This solution added only 20 lines of code to the VMM kernel module and is the solution we currently use.

One limitation of the first solution is that it assumes the guest kernel space occupies a contiguous region directly below the host kernel space. Our second solution allows the guest kernel space to occupy arbitrary ranges of the address space within [0x0, 0xc0000000) by using the page table's supervisor-only bit to distinguish between guest kernel mode and guest user mode (Figure 8). In this solution, the VMM kernel module marks the guest kernel's pages as accessible only by supervisor code (rings 0-2), then runs the guest-machine process in ring 1 while in guest kernel mode. When running in ring 1, the CPU can access pages marked as supervisor in the page table, but it cannot execute privileged instructions (such as changing the segment descriptor). To prevent the guest-machine process from accessing host kernel space, the VMM kernel module shrinks the user code and data segment to span only [0x0, 0xc0000000). The guest-machine process runs in ring 3 while in guest user mode, which prevents guest user code from accessing the guest kernel's data. This allows the VMM kernel module to protect arbitrary pages in [0x0, 0xc0000000) from guest user mode by setting the supervisor-only bit on those pages. It does still require the host kernel and user address ranges to each be contiguous.
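The page-table side of the second solution can be illustrated with the standard x86 page-table entry bits (this fragment is ours, not the VMM kernel module): leaving the user/supervisor bit clear on a guest kernel page makes it reachable from rings 0-2, so the guest-machine process can touch it in ring 1 (guest kernel mode) but faults on it in ring 3 (guest user mode).

/* Hedged sketch: building page-table entries for solution 2. */
#include <stdint.h>

#define PTE_PRESENT 0x001u
#define PTE_RW      0x002u
#define PTE_USER    0x004u   /* clear => supervisor-only (rings 0-2) */

/* Page-table entry for a guest kernel page: ring-3 guest user code
 * cannot access it, but ring-1 guest kernel code can. */
uint32_t guest_kernel_pte(uint32_t frame)
{
    return (frame & 0xfffff000u) | PTE_PRESENT | PTE_RW;
}

/* Page-table entry for a guest application page: accessible from ring 3. */
uint32_t guest_user_pte(uint32_t frame)
{
    return (frame & 0xfffff000u) | PTE_PRESENT | PTE_RW | PTE_USER;
}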
[Figure 6 and Figure 7 diagrams: segment bounds and page protections for a normal Linux host process and for the guest-machine process in guest user mode under solution 1.]

Figure 6: Segment and page protections when running a normal Linux host process. A normal Linux host process runs in CPU privilege ring 3 and uses the user code and data segment. The segment bounds allow access to all addresses, but the supervisor-only bit in the page table prevents the host process from accessing the host operating system's data. In order to protect the guest kernel's data with this setup, the guest-machine process must munmap or mprotect [0x70000000, 0xc0000000) before switching to guest user mode.

Figure 7: Segment and page protections when running the guest-machine process in guest user mode (solution 1). This solution protects the guest kernel space from guest application processes by changing the bound on the user code and data segments to [0x0, 0x70000000) when running guest user code. When the guest-machine process switches to guest kernel mode, the VMM kernel module grows the user code and data segments to their normal range of [0x0, 0xffffffff].
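As a small worked example of the segment-bound manipulation shown in Figure 7 (this is our sketch, not the 20-line UMLinux patch), an x86 segment descriptor with 4 KB granularity and limit L covers bytes [0, (L+1)*4096), so guest user mode needs a user-segment limit of 0x6ffff and guest kernel mode needs the normal limit of 0xfffff:

/* Hedged sketch: the descriptor-limit arithmetic behind solution 1.
 * The real patch writes these limits into the user code and data
 * descriptors in the GDT on each guest kernel/user transition. */
#include <stdint.h>
#include <stdio.h>

static uint32_t descriptor_limit(uint64_t bytes)   /* 4 KB-granular limit field */
{
    return (uint32_t)(bytes / 4096 - 1);
}

int main(void)
{
    printf("guest user mode limit:   0x%x\n", descriptor_limit(0x70000000ULL));   /* 0x6ffff */
    printf("guest kernel mode limit: 0x%x\n", descriptor_limit(0x100000000ULL));  /* 0xfffff */
    return 0;
}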
4.3. Switching between guest application processes

A third bottleneck in a Type II VMM occurs when switching address spaces between guest application processes. Changing guest address spaces means changing the current mapping between guest virtual pages and the pages in the virtual machine's "physical" memory file. Changing this mapping is done by calling munmap for the outgoing guest application process's virtual address space, then calling mmap for each resident virtual page in the incoming guest application process. UMLinux minimizes the calls to mmap by mapping pages on demand, i.e. as the incoming guest application process faults in its address space. Even with this optimization, however, UMLinux generates a large number of calls to mmap, especially when the working sets of the guest application processes are large.

To improve the speed of guest context switches, we enhance the host OS to allow a single process to maintain several address space definitions. Each address space is defined by a separate set of page tables, and the guest-machine process switches between address space definitions via a new host system call, switchguest. To switch address space definitions, switchguest needs only to change the pointer to the current first-level page table. This task is much faster than mmap'ing each virtual page of the incoming guest application process. We modify the guest kernel to use switchguest when context switching from one guest application process to another. We reuse initialized address space definitions to minimize the overhead of creating guest application processes. We take care to prevent the guest-machine process from abusing switchguest by limiting it to 1024 different address spaces and checking all parameters carefully. This optimization added 340 lines of code to the host kernel.
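The guest kernel's side of this optimization might look like the following sketch (ours, not the actual patch); the system call number and the one-argument interface are assumptions based on the description above.

/* Hedged sketch: invoking the new switchguest host system call during a
 * guest context switch.  __NR_switchguest is a hypothetical syscall
 * number chosen for illustration. */
#include <sys/syscall.h>
#include <unistd.h>

#define __NR_switchguest 240            /* hypothetical syscall number */

/* One address space definition per guest application process, up to the
 * 1024 allowed by the host kernel. */
static int guest_asid[1024];

/* Called by the guest kernel when switching from one guest application
 * process to another: instead of munmap'ing the outgoing process and
 * mmap'ing each resident page of the incoming one, a single host system
 * call changes the first-level page table pointer. */
static inline long switch_guest_address_space(int next_process)
{
    return syscall(__NR_switchguest, guest_asid[next_process]);
}

On the host side, the real switchguest implementation validates the identifier and installs the corresponding first-level page table for the guest-machine process.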
[Figure 8 diagram: segment bounds and page protections for the guest-machine process under solution 2, with the user code/data segment spanning [0x0, 0xc0000000) and the guest kernel's pages marked supervisor-only.]

Figure 8: Segment and page protections when running the guest-machine process (solution 2). This solution protects the guest kernel space from guest application processes by marking the guest kernel's pages as accessible only by code running in CPU privilege ring 0-2 and running the guest-machine process in ring 1 when executing guest kernel code. To prevent the guest-machine process from accessing host kernel space, the VMM kernel module shrinks the user code and data segment to span only [0x0, 0xc0000000).

5. Performance results

This section evaluates the performance benefits achieved by each of the optimizations described in Section 4.

We first measure the performance of three important primitives: a null system call, switching between two guest application processes (each with a 64 KB working set), and transferring 10 MB of data using TCP across a 100 Mb/s Ethernet switch. The first two of these microbenchmarks come from the lmbench suite [McVoy96].
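For reference, a null-system-call measurement in the spirit of lmbench's benchmark (this loop is illustrative, not the lmbench code) simply times a system call that does no useful work:

/* Hedged sketch: timing a "null" system call (getppid) from user space. */
#include <stdio.h>
#include <sys/time.h>
#include <unistd.h>

int main(void)
{
    const int iters = 1000000;
    struct timeval start, end;

    gettimeofday(&start, NULL);
    for (int i = 0; i < iters; i++)
        getppid();                      /* system call that does no work */
    gettimeofday(&end, NULL);

    double usec = (end.tv_sec - start.tv_sec) * 1e6 + (end.tv_usec - start.tv_usec);
    printf("%.2f microseconds per system call\n", usec / iters);
    return 0;
}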
We also measure performance on three macrobenchmarks. POV-Ray is a CPU-intensive ray-tracing program. We render the benchmark image from the POV-Ray distribution at quality 8. kernel-build compiles the complete Linux 2.4.18 kernel (make bzImage). SPECweb99 measures web server performance, using the 2.0.36 Apache web server. We configure SPECweb99 with 15 simultaneous connections spread over two clients connected to a 100 Mb/s Ethernet switch. kernel-build and SPECweb99 exercise the virtual machine intensively by making many system calls. They are similar to the I/O-intensive and kernel-intensive workloads used to evaluate Cellular Disco [Govil00]. All workloads start with a warm guest file cache. Each result represents the average of 5 runs. Variance across runs is less than 3%.

All experiments are run on a computer with an AMD Athlon 1800+ CPU, 256 MB of memory, and a Samsung SV4084 IDE disk. The guest kernel is Linux 2.4.18 ported to UMLinux, and the host kernels for UMLinux are all Linux 2.4.18 with different degrees of support for VMMs. All virtual machines are configured with 192 MB of "physical" memory. The virtual hard disk for UMLinux is stored on a raw disk partition on the host to avoid double buffering the virtual disk data in the guest and host file caches and to prevent the virtual machine from benefitting unfairly from the host's file cache. The host and guest file systems have the same versions of all software (based on RedHat 6.2).

We measure baseline performance by running directly on the host operating system (standalone). The host uses the same hardware and software installation as the virtual-machine systems and has access to the full 256 MB of host memory.

We use VMware Workstation 3.1 to illustrate the performance of VMMs that are built directly on the host hardware. We chose VMware Workstation because it executes mostly on host hardware and because it is regarded widely as providing excellent performance. However, note that VMware Workstation may be slower than a Type I VMM that is ideal for the purposes of comparing with UMLinux. First, VMware Workstation issues I/O through the host OS rather than controlling the host I/O devices directly. Second, unlike UMLinux, VMware Workstation can support unmodified guest operating systems, and this capability forces VMware Workstation to do extra work to provide the same interface to the guest OS as the host hardware does. The configuration for VMware Workstation matches that of the other virtual-machine systems, except that VMware Workstation uses the host disk partition's cacheable block device for its virtual disk.
Figures 9 and 10 summarize results from all performance experiments.

The original UMLinux is hundreds of times slower for null system calls and context switches and is not able to saturate the network. UMLinux is 8x as slow as the standalone host on SPECweb99, 18x as slow as the standalone host on kernel-build, and 10% slower than the standalone host on POV-Ray. Because POV-Ray is compute-bound, it does not interact much with the guest kernel and thus incurs little virtualization overhead. The overheads for SPECweb99 and kernel-build are higher because they issue more guest kernel calls, each of which must be trapped by the VMM kernel module and reflected back to the guest kernel by sending a signal.

VMMs that are built directly on the hardware execute much faster than a Type II VMM without host OS support. VMware Workstation 3.1 executes a null system call nearly as fast as the standalone host, can saturate the network, and is within a factor of 5 of the context switch time for a standalone host. VMware Workstation 3.1 incurs an overhead of 6-30% on the intensive macrobenchmarks (SPECweb99 and kernel-build).

Our first optimization (Section 4.1) moves the VMM functionality into the kernel. This improves performance by a factor of about 2-3 on the microbenchmarks, and by a factor of about 2 on the intensive macrobenchmarks.

Our second optimization (Section 4.2) uses segment bounds to eliminate the need to call mmap, munmap, and mprotect when switching between guest kernel mode and guest user mode. Adding this optimization improves performance on null system calls and context switches by another factor of 5 (beyond the performance with just the first optimization) and enables UMLinux to saturate the network. Performance on the two intensive macrobenchmarks improves by a factor of 3-4.

Our final optimization (Section 4.3) maintains multiple address space definitions to speed up context switches between guest application processes. This optimization has little effect on benchmarks with only one main application process, but it has a dramatic effect on benchmarks with more than one main application process. Adding this optimization improves the context switch microbenchmark by a factor of 13 and improves kernel-build by a factor of 2.

With all three host OS optimizations to support VMMs, UMLinux runs all macrobenchmarks well within our performance target of a factor of 2 relative to standalone. POV-Ray incurs 1% overhead; kernel-build incurs 35% overhead; and SPECweb99 incurs 14% overhead. These overheads are comparable to those attained by VMware Workstation 3.1.

The largest remaining source of virtualization overhead for kernel-build is the cost and frequency of handling memory faults. kernel-build creates a large number of guest application processes, each of which maps its executable pages on demand. Each demand-mapped page causes a signal to be delivered to the guest kernel, which must then ask the host kernel to map the new page. In addition, UMLinux currently does not support the ability to issue multiple outstanding I/Os on the host. We plan to update the guest disk driver to take advantage of non-blocking I/O when it becomes available on Linux.

6. Related work

User-Mode Linux is a Type II VMM that is very similar to UMLinux [Dike00]. Our discussion of User-Mode Linux assumes a configuration that protects guest kernel memory from guest application processes (jail mode). The major technical difference between User-Mode Linux and UMLinux is that User-Mode Linux uses a separate host process for each guest application process, while UMLinux runs all guest code in a single host process. Assigning each guest application process to a separate host process speeds up context switches between guest application processes, but it leads to complications such as keeping the shared portion of the guest address spaces consistent and difficult synchronization issues when switching guest application processes [Dike02a].

User-Mode Linux in jail mode is faster than UMLinux (without host OS support) on context switches (157 vs. 2029 microseconds) but slower on system calls (296 vs. 96 microseconds) and network transfers (54 vs. 39 seconds). User-Mode Linux in jail mode is faster on kernel-build (1309 vs. 2294 seconds) and slower on SPECweb99 (200 vs. 172 seconds) than UMLinux without host OS support.

Concurrently with our work on host OS support for VMMs, the author of User-Mode Linux proposed modifying the host OS to support multiple address space definitions for a single host process [Dike02a]. Like the optimization in Section 4.3, this would speed up switches between guest application processes and allow User-Mode Linux to run all guest code in a single host process.
[Figure 9 bar charts: time per null system call (microseconds), time per context switch (microseconds), and time for the network transfer (seconds), for UMLinux with cumulative host OS support (+ VMM kernel module, + segment bounds, + multiple address spaces), VMware Workstation, and a standalone host.]
Figure 9: Microbenchmark results. This figure compares the performance of different virtual-machine monitors on three
microbenchmarks: a null guest system call, context switching between two 64 KB guest application processes, and receiving 10
MB of data over the network. The first four bars represent the performance of UMLinux with increasing support from the host OS.
Each optimization level is cumulative, i.e. it includes all optimizations of the bars to the left. The performance of a standalone
host (no VMM) is shown for reference. Without support from the host OS, UMLinux is much slower than a standalone host.
Adding three extensions to the host OS improves the performance of UMLinux dramatically.
[Figure 10 bar charts: runtimes (seconds) for POV-Ray, kernel-build, and SPECweb99, for UMLinux with cumulative host OS support (+ VMM kernel module, + segment bounds, + multiple address spaces), VMware Workstation, and a standalone host.]
Figure 10: Macrobenchmark results. This figure compares the performance of different virtual-machine monitors on three
macrobenchmarks: the POV-Ray ray tracer, compiling a kernel, and SPECweb99. The first four bars represent the performance
of UMLinux with increasing support from the host OS. Each optimization level is cumulative, i.e. it includes all optimizations
of the bars to the left. The performance of a standalone host (no VMM) is shown for reference. Without support from the host
OS, UMLinux is much slower than a standalone host. Adding three extensions to the host OS allows UMLinux to approach the
speed of a Type I VMM.
Implementation of this optimization is currently underway [Dike02b], though User-Mode Linux still uses two separate host processes, one for the guest kernel and one for all guest application processes. We currently use UMLinux for our CoVirt research project on virtual machines [Chen01] because running all guest code in a single host process is simpler, uses fewer host resources, and simplifies the implementation of our VMM-based replay service (ReVirt) [Dunlap02].

The SUNY Palladium project used a combination of page and segment protections on x86 processors to divide a single address space into separate protection domains [Chiueh99]. Our second solution for protecting the guest kernel space from guest application processes (Section 4.2) uses a similar combination of x86 features. However, the SUNY Palladium project is more complex because it needs to support a more general set of protection domains than UMLinux.

Reinhardt et al. implemented extensions to the CM-5's operating system that enabled a single process to create and switch between multiple address spaces [Reinhardt93]. This capability was added to support the Wisconsin Wind Tunnel's parallel simulation of parallel computers.

7. Conclusions and future work

Virtual-machine monitors that are built on a host operating system are simple and elegant, but they are currently an order of magnitude slower than running outside a virtual machine, and much slower than VMMs that are built directly on the hardware. We examined the sources of overhead for a VMM that runs on a host operating system.

We found that three bottlenecks are responsible for the bulk of the performance overhead. First, the host OS required a separate host user process to control the main guest-machine process, and this generated a large number of host context switches. We eliminated this bottleneck by moving the small amount of code that controlled the guest-machine process into the host kernel. Second, switching between guest kernel and guest user space generated a large number of memory protection operations on the host. We eliminated this bottleneck in two ways. One solution modified the host user segment bounds; the other solution modified the segment bounds and ran the guest-machine process in CPU privilege ring 1. Third, switching between two guest application processes generated a large number of memory mapping operations on the host. We eliminated this bottleneck by allowing a single host process to maintain several address space definitions. In total, 510 lines of code were added to the host kernel to support these three optimizations.

With all three optimizations, performance of a Type II VMM on macrobenchmarks improved to within 14-35% overhead relative to running on a standalone host (no VMM), even on benchmarks that exercised the VMM intensively. The main remaining source of overhead was the large number of guest application processes created in one benchmark (kernel-build) and the accompanying page faults from demand mapping in the executable.

In the future, we plan to reduce the size of the host operating system used to support a VMM. Much of the code in the host OS can be eliminated, because the VMM uses only a small number of system calls and abstractions in the host OS. Reducing the code size of the host OS will help make Type II VMMs a fast and trusted base for future virtual-machine services.

8. Acknowledgments

We are grateful to the researchers at the University of Erlangen-Nürnberg for writing UMLinux and sharing it with us. In particular, Kerstin Buchacker and Volkmar Sieh helped us understand and use UMLinux. Our shepherd Ed Bugnion and the anonymous reviewers helped improve the quality of this paper. This research was supported in part by National Science Foundation grants CCR-0098229 and CCR-0219085 and by Intel Corporation. Samuel King was supported by a National Defense Science and Engineering Graduate Fellowship. Any opinions, findings, conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

9. References

[Barnett02] Ryan C. Barnett. Monitoring VMware Honeypots, September 2002. http://honeypots.sourceforge.net/monitoring_vmware_honeypots.html.

[Boc] http://bochs.sourceforge.net/.

[Bressoud96] Thomas C. Bressoud and Fred B. Schneider. Hypervisor-based fault tolerance. ACM Transactions on Computer Systems, 14(1):80–107, February 1996.

[Buchacker01] Kerstin Buchacker and Volkmar Sieh. Framework for testing the fault-tolerance of systems including OS and network aspects. In Proceedings of the 2001 IEEE Symposium on High Assurance System Engineering (HASE), pages 95–105, October 2001.

[Bugnion97] Edouard Bugnion, Scott Devine, Kinshuk Govil, and Mendel Rosenblum. Disco: Running Commodity Operating Systems on Scalable Multiprocessors. ACM Transactions on Computer Systems, 15(4):412–447, November 1997.

[Chen01] Peter M. Chen and Brian D. Noble. When virtual is better than real. In Proceedings of the 2001 Workshop on Hot Topics in Operating Systems (HotOS), pages 133–138, May 2001.

[Chiueh99] Tzi-cker Chiueh, Ganesh Venkitachalam, and Prashant Pradhan. Integrating Segmentation and Paging Protection for Safe, Efficient and Transparent Software Extensions. In Proceedings of the 1999 Symposium on Operating Systems Principles, December 1999.

[Con01] The Technology of Virtual Machines. Technical report, Connectix Corp., September 2001.

[Dike00] Jeff Dike. A user-mode port of the Linux kernel. In Proceedings of the 2000 Linux Showcase and Conference, October 2000.

[Dike02a] Jeff Dike. Making Linux Safe for Virtual Machines. In Proceedings of the 2002 Ottawa Linux Symposium (OLS), June 2002.

[Dike02b] Jeff Dike. User-Mode Linux Diary, November 2002. http://user-mode-linux.sourceforge.net/diary.html.

[Dunlap02] George W. Dunlap, Samuel T. King, Sukru Cinar, Murtaza Basrai, and Peter M. Chen. ReVirt: Enabling Intrusion Analysis through Virtual-Machine Logging and Replay. In Proceedings of the 2002 Symposium on Operating Systems Design and Implementation (OSDI), pages 211–224, December 2002.

[Goldberg73] R. Goldberg. Architectural Principles for Virtual Computer Systems. PhD thesis, Harvard University, February 1973.

[Goldberg74] Robert P. Goldberg. Survey of Virtual Machine Research. IEEE Computer, pages 34–45, June 1974.

[Golub90] David Golub, Randall Dean, Allessandro Forin, and Richard Rashid. Unix as an Application Program. In Proceedings of the 1990 USENIX Summer Conference, 1990.

[Govil00] Kinshuk Govil, Dan Teodosiu, Yongqiang Huang, and Mendel Rosenblum. Cellular disco: resource management using virtual clusters on shared-memory multiprocessors. ACM Transactions on Computer Systems, 18(3):226–262, August 2000.

[Hoxer02] H. J. Hoxer, K. Buchacker, and V. Sieh. Implementing a User-Mode Linux with Minimal Changes from Original Kernel. In Proceedings of the 2002 International Linux System Technology Conference, pages 72–82, September 2002.

[Karger91] Paul A. Karger, Mary Ellen Zurko, Douglis W. Bonin, Andrew H. Mason, and Clifford E. Kahn. A retrospective on the VAX VMM security kernel. IEEE Transactions on Software Engineering, 17(11), November 1991.

[Magnusson95] Peter Magnusson and B. Werner. Efficient Memory Simulation in SimICS. In Proceedings of the 1995 Annual Simulation Symposium, pages 62–73, April 1995.

[McVoy96] Larry McVoy and Carl Staelin. lmbench: Portable tools for performance analysis. In Proceedings of the Winter 1996 USENIX Conference, January 1996.

[Meushaw00] Robert Meushaw and Donald Simard. NetTop: Commercial Technology in High Assurance Applications. Tech Trend Notes: Preview of Tomorrow's Information Technologies, 9(4), September 2000.

[Nieh00] Jason Nieh and Ozgur Can Leonard. Examining VMware. Dr. Dobb's Journal, August 2000.

[Reinhardt93] Steven K. Reinhardt, Babak Falsafi, and David A. Wood. Kernel Support for the Wisconsin Wind Tunnel. In Proceedings of the 1993 Usenix Symposium on Microkernels and Other Kernel Architectures, pages 73–89, September 1993.

[Robin00] John Scott Robin and Cynthia E. Irvine. Analysis of the Intel Pentium's Ability to Support a Secure Virtual Machine Monitor. In Proceedings of the 2000 USENIX Security Symposium, August 2000.

[Rosenblum95] Mendel Rosenblum, Stephen A. Herrod, Emmett Witchel, and Anoop Gupta. Complete computer system simulation: the SimOS approach. IEEE Parallel & Distributed Technology: Systems & Applications, 3(4):34–43, January 1995.

[Sugerman01] Jeremy Sugerman, Ganesh Venkitachalam, and Beng-Hong Lim. Virtualizing I/O Devices on VMware Workstation's Hosted Virtual Machine Monitor. In Proceedings of the 2001 USENIX Technical Conference, June 2001.

[Waldspurger02] Carl A. Waldspurger. Memory Resource Management in VMware ESX Server. In Proceedings of the 2002 Symposium on Operating Systems Design and Implementation (OSDI), December 2002.

[Whitaker02] Andrew Whitaker, Marianne Shaw, and Steven D. Gribble. Scale and Performance in the Denali Isolation Kernel. In Proceedings of the 2002 Symposium on Operating Systems Design and Implementation (OSDI), December 2002.