Concept, Design, and Implementation of A Slimline Boot Firmware For Linux On Power Architecture
Disclaimer
I hereby declare that I have written the presented work independently, using only the
listed sources and facilities.
Credits
I would like to thank Dr. rer. nat. Otto Wohlmuth, who gave me many instructions, taught
me many useful techniques, and helped me all the time. Thanks to Prof. Dr. Martin Rieger
for his tutorial work and remarks.
I would also like to thank my family for their patience and support, especially my father,
who gave me a taste for computer science, my twin brother, who helped to find appropriate
I2C hardware for testing my ideas, and my sister for proofreading this diploma thesis.
Many thanks to Hartmut Penner, Segher Boessenkool, and Benjamin Herrenschmidt for their
help in understanding firmware basics and concepts, the PowerPC architecture, the magic and
beauty of Forth, and the Linux/PPC64 kernel.
Thanks to all the colleagues at IBM Deutschland Entwicklung GmbH for their help in the
creation of this project.
Contents
1 Introduction
2 Basic Technologies
2.1 Open Firmware
2.2 Hardware
2.2.1 IBM JS20 Blade Server
2.2.2 IBM PowerPC 970
2.2.3 Miscellaneous Devices
3 Firmware Anatomy
3.1 Open Firmware
3.2 Common Hardware Reference Platform
3.3 RISC Platform Architecture
3.4 Apple Firmware
3.5 LinuxBIOS
3.6 OpenBIOS
3.7 Extensible Firmware Interface
8 Conclusions
A Glossary
Bibliography
List of Figures
List of Tables
Listings
7.1 Constant-Doer
7.2 Variable-Doer
7.3 Colon Definition-Doer
7.4 i2c_driver structure used for the I2C Chip Driver
Chapter 1
Introduction
As a consequence of Moore’s Law, computer systems become not only smaller, faster, and
cheaper, but also more complex. The growing volume of software and hardware leads to
increased administration and programming effort. Configuration and debugging take up
more and more time. A big handicap of modern systems is the hardware. Defective hardware
components are mostly not recognized until they fail. The problem is not only that the
whole computer system can break down; other components can be damaged, too. The cause
of a defect can only be found through complex analysis. Clues and information are sparse,
and debug tools are often available only in development environments.
As a result, one of the biggest problems in firmware development is simplifying the whole
software design without curtailing functionality and flexibility. Leading companies like
IBM, Apple, and Intel have addressed the problem and invest enormous research and
development effort in firmware specifications.
is accessed by a Forth based language interface and is described by IEEE standard IEEE-1275.
Intel tries to establish its own BIOS standard. This standard is named the Extensible
Firmware Interface (EFI) and describes a new model for the interface between operating
system and platform firmware. This interface contains platform-related information, plus
boot and run-time service calls that are available to the operating system and its loader.
Intel’s goal is that these components provide a standard environment for booting an
operating system and running pre-boot applications.
IBM uses the RISC Platform Architecture (RPA), which is based on Open Firmware. The
biggest difference from Open Firmware is that the RISC Platform Architecture implements a
hypervisor, which allows the execution of several operating systems on the same machine.
This design is mostly used on big mainframe machines. RPA also implements Run-Time
Abstraction Services (RTAS), which provide hardware-specific functions, including functions
for accessing the real-time clock, non-volatile RAM (NVRAM), restart, shutdown, and PCI
configuration cycles. These functions are implemented under a hardware-independent
synchronous interface.
Apple uses an Open Firmware based concept, but without a hypervisor or the Run-Time
Abstraction Services. Instead, Apple implemented a new and complex software package to get
rid of all drawbacks of such hardware-abstraction concepts.
These and other concepts differ seriously in basic structure and implementation. Every
concept has drawbacks and advantages. To arrive at a “Concept, Design, and Implementation
of a Slimline Boot Firmware for Linux on Power Architecture”, it is necessary to understand
these basics completely. Chapters 2, 3, and 4 give introductions and implementation details
on these technologies.
Chapters 5 and 6 deal with the boot process and the control flow of a slimline boot firmware.
Furthermore, design aspects and implementations are specified and described in more detail.
Chapter 2
Basic Technologies
This chapter describes the basic technologies that are used in existing PowerPC systems.
It is intended as a good starting point for understanding Open Firmware and the hardware
of an IBM JS20 64-bit PowerPC processor-based 2-way blade server.
It was designed to support a variety of different processor Instruction Set Architectures (ISAs)
and different buses, which is why it is used on over a million machines and supported by
several system vendors. For example, provisions for PCI, Futurebus+, VME+D, and SMBus
already exist and can be used for card identification and booting.
Besides this, Open Firmware uses the “plug-in driver” technique to make use of new devices for
booting or message display without modification of the main Open Firmware system ROM.
Each device has its own plug-in driver, normally located in a ROM on the device itself. Such
a driver is realized in FCode and not in machine language. FCode is a machine-independent,
byte-coded “intermediate language” for the Forth programming language; therefore FCode
drivers can be used on different hardware models. Plug-in device cards can use FCode to
report their characteristics to the firmware and the system software. Such characteristics
may include the device name, model, revision level, device type, register locations,
interrupt levels, supported features, and any other identification information that makes
sense for the particular device. System software, like an operating system, can use this
information for automatic configuration. All information is stored in a processor- and
architecture-independent format that can easily be retrieved and decoded.
The main part of Open Firmware is developed in the programming language Forth. Forth was
originally developed in the early 1970s by Charles H. Moore at the National Radio Astronomy
Observatory. It was used for controlling radio telescopes with all associated scientific
instruments and for high-speed data acquisition and graphical analysis. Forth is an
industry-standard interactive programming language and is based on a stack-oriented
“virtual machine” that can be implemented easily and efficiently on any system.
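The processor- and architecture-independent storage of device characteristics described
earlier in this section can be illustrated with a small Python sketch. It assumes the
encoding commonly used for Open Firmware properties (integers as four bytes with the most
significant byte first, strings as ASCII bytes followed by a NUL byte). The function names
and the example card with its values are made up for illustration and are not taken from
any real driver.

```python
import struct

def encode_int(value):
    """Encode an integer as four bytes, most significant byte first."""
    return struct.pack(">I", value & 0xFFFFFFFF)

def encode_string(text):
    """Encode a string as its ASCII bytes followed by a terminating NUL."""
    return text.encode("ascii") + b"\x00"

def decode_int(data):
    """Decode the first four bytes as a big-endian integer."""
    return struct.unpack(">I", data[:4])[0]

def decode_string(data):
    """Decode the bytes up to the first NUL as an ASCII string."""
    return data.split(b"\x00", 1)[0].decode("ascii")

# Hypothetical characteristics reported by a plug-in network card.
properties = {
    "name": encode_string("ethernet"),
    "device_type": encode_string("network"),
    "interrupts": encode_int(5),
}
```

Because the byte order is fixed by the encoding and not by the host processor, the same
property bytes decode to the same values on any machine.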
2.2 Hardware
2.2.1 IBM JS20 Blade Server
The IBM PowerPC 970 is the first 64-bit high-performance RISC processor for mainstream
desktop usage. It can be characterized as “wide and deep”, which means that the PowerPC
970 follows both design philosophies of modern chip manufacturing. In other words, it has an
extremely wide execution core and a 16-stage pipeline. On the other hand, with a maximum
of 2 GHz it does not reach the clock speed of a Pentium 4, but it was designed from the
ground up with multiprocessing in mind. Instead of increasing the clock speed to get higher
performance, this processor is normally used in an SMP system. The L1 cache of the PowerPC
970 is split into an instruction cache (i-cache) and a data cache (d-cache). Its instruction
cache is roughly twice the size of that of its predecessors. This is necessary because the
longer pipeline makes the performance penalty for cache misses much higher. When you combine
the 32 KB d-cache with the sizable 512 KB L2, the 900 MHz DDR frontside bus, and the support
for up to 8 data prefetch streams, it is clear that this chip was designed for floating-point
and SIMD-intensive applications.
Generally, the PC87417 is targeted at a wide range of servers and workstations. It provides
support for serial ports, an IEEE 1284 parallel port, a floppy disk controller, a keyboard
and mouse controller, an LPC bus interface, system wakeup control, a real-time clock, and
general-purpose I/O ports.
The AMD-8111 HyperTransport I/O Hub replaces what is traditionally called the “Southbridge”.
This device integrates storage, connectivity, audio, I/O expansion, security, and system
management functions into a single component.
Figure 2.2: IBM PowerPC 970 Architecture; 64-bit data, 48-bit addresses (4TB), native 32-bit compatibility;
2LSU, 2IU, 2FPU, 2VPU (VALU+VPERM, 128-bits); up to 212 instructions in flight.
This high-speed device provides two independent high-performance PCI-X bus bridges inte-
grated with a high-speed HyperTransport technology tunnel. This tunnel function provides
connection to other HyperTransport technology devices.
Chapter 3
Firmware Anatomy
The aim of this chapter is to describe the structure of existing boot firmware and all
associated mechanisms. It shows more details of a typical Open Firmware implementation and
explains boot firmware that is based on this standard, like the Common Hardware Reference
Platform or the RISC Platform Architecture. Because it is necessary to understand competing
implementations, like Intel’s Extensible Firmware Interface or LinuxBIOS, this chapter
also includes a short overview of existing commercial and open source implementations.
Figure 3.1: Open Firmware
Figure 3.2: Common Hardware Reference Platform
officially came into being in August of 1997. A key benefit of the RPA specification is that
hardware platform developers have degrees of freedom in the implementation below the level
of architected interfaces and therefore have the opportunity to add unique value.
In addition, RPA includes a hypervisor on top of the low-level firmware layer.
This hypervisor owns all system resources and provides an abstraction layer through which
device access and control are arbitrated. Because of this, it is possible to run several
operating systems at the same time on one system.
3.5 LinuxBIOS
LinuxBIOS is an open source replacement for the BIOSes found on x86, AMD64, Alpha, and
PowerPC systems. The LinuxBIOS project was started at the Los Alamos National Lab
(LANL) in September 1999 to get better control during boot time in large cluster environments.
The original idea of LinuxBIOS was to load the Linux kernel from the ROM and build a boot
loader on top. Nowadays, it is better described as: “Bring a computer far enough that
it is possible to boot a Linux kernel”. LinuxBIOS initializes the hardware, sets up all
exception vectors, loads an ELF file, and executes it. In other words, it acts like low-level
firmware with an ELF loader included. Because of the ELF loader, LinuxBIOS can load several
kinds of ELF images (hereafter called payloads), which leads to four main scenarios for how
LinuxBIOS can be used.
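To make the role of the ELF loader more concrete, the following Python sketch shows the
kind of check a loader performs before it jumps into a payload: it validates the ELF
identification bytes and reads the machine type and the entry point from the header. This
is only an illustration based on the standard ELF64 header layout; it is not LinuxBIOS
code, the function names are hypothetical, and only big-endian 64-bit images are handled.

```python
import struct

ELF_MAGIC = b"\x7fELF"
ELFCLASS64 = 2     # e_ident[4]: 64-bit objects
ELFDATA2MSB = 2    # e_ident[5]: big-endian data encoding
EM_PPC64 = 21      # e_machine value for 64-bit PowerPC

def parse_elf64_header(image):
    """Return (e_machine, e_entry) of a big-endian ELF64 image or raise ValueError."""
    if len(image) < 64 or image[:4] != ELF_MAGIC:
        raise ValueError("not an ELF image")
    if image[4] != ELFCLASS64 or image[5] != ELFDATA2MSB:
        raise ValueError("only big-endian ELF64 is handled in this sketch")
    # e_type, e_machine, e_version, e_entry follow the 16 identification bytes.
    e_type, e_machine, e_version, e_entry = struct.unpack_from(">HHIQ", image, 16)
    return e_machine, e_entry

def make_minimal_header(machine, entry):
    """Build a 64-byte header for experiments; all other fields are left at zero."""
    ident = ELF_MAGIC + bytes([ELFCLASS64, ELFDATA2MSB, 1]) + bytes(9)
    fields = struct.pack(">HHIQ", 2, machine, 1, entry)
    return (ident + fields).ljust(64, b"\x00")

machine, entry = parse_elf64_header(make_minimal_header(EM_PPC64, 0x10000))
```

A real loader would additionally walk the program headers, copy each loadable segment to
its target address, and only then transfer control to the entry point.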
Variation A
This was the original concept of LinuxBIOS. LinuxBIOS replaces the normal BIOS code on
the motherboard with the Linux kernel itself, so that the machine boots into Linux within
seconds of being turned on. Nevertheless, this solution is only useful during hardware
bring-up. The problem is that the packaging is inflexible, because every time the kernel
changes it is necessary to rewrite the flash. The next drawback is that, when the flashed
kernel is defective, the complete hardware cannot be used because of the broken firmware or
Linux kernel.
Variation B
The idea of this variation is to use separate kernels for the firmware and the Linux system.
To achieve this, the firmware kernel implements a special system call (kexec, LOBOS, or
2 kernel monte) which can load and execute another Linux kernel. Depending on the
functionality of the firmware kernel, the kernel for the Linux system can be loaded from a
file system on a hard disk or via the network. This solution may solve the inflexible
packaging, but it still has some other problems. One major problem is that firmware which
makes use of this special system call only boots Linux. A second problem is that the system
call needed to load and execute a Linux kernel is not available on all platforms. Apart from
this, the idea of using two separate kernels can be a great solution for machines which only
need to boot Linux as their main operating system.
Variation C
Operating systems like Windows 2000 and BSD need old-style PC-BIOS interrupt support during
the boot sequence. LinuxBIOS implements two additional layers on top of itself to support
this functionality. The first layer is a small wrapper program that transfers information
from LinuxBIOS to Bochs BIOS without requiring modifications to Bochs BIOS. This layer is
named Adhesive Loader (ADLO). ADLO is responsible for making sure the ROMs that make
up Bochs BIOS and the VGA BIOS are stored at the expected addresses. It also performs the
task of copying Bochs BIOS from its original location into shadow RAM. Additionally,
LinuxBIOS stores some tables (e.g. memory map, IRQ routing) in a portable format. The problem
is that this format does not conform to the format in which a PC-BIOS stores them. ADLO
converts these tables to a format understood by Bochs BIOS. Bochs BIOS was written for the
Bochs IA-32 emulation project to emulate an AMI BIOS. The primary job of Bochs BIOS is to set
up the Interrupt Vector Table and supply an entry point for each of its BIOS services. With
these two layers, ADLO and Bochs BIOS, it is possible to boot operating systems which need
PC-BIOS support. This solution is not interesting for PowerPC platforms, because no operating
system on such platforms uses PC-BIOS services.
Variation D
Sometimes a Linux kernel cannot be used to boot another Linux kernel, as it is done in
variation B. The problem is mostly that the Linux kernel is too big to fit into the flash
memory or the BIOS ROM. In such a case LinuxBIOS can boot a boot manager as payload. But this
solution has the problem that every platform has its own boot manager, which is completely
different. Intel machines, for example, use LILO, GRUB, or FILO, and PowerPC platforms use
yaboot as boot manager. FILO is a small boot manager which can load boot images from local
file systems without the help of legacy BIOS services, which makes it attractive for porting
to further platforms. It is also possible to use Etherboot as payload to support booting via
the network. Etherboot is a software package for creating ROM images that can download code
over an Ethernet network to be executed on an x86 computer. Many network adapters have
a socket where a ROM chip can be installed. Etherboot is code that can be put in such a
ROM. Etherboot is normally used for booting diskless PCs. A last option is that
LinuxBIOS loads OpenBIOS. OpenBIOS is an open source project which aims at
a 100% IEEE 1275–1994 compliant boot firmware.
3.6 OpenBIOS
OpenBIOS is a free portable firmware implementation. The goal is to implement a 100% IEEE
1275–1994 (referred to as Open Firmware) compliant firmware. Among its features, Open
Firmware provides an instruction set independent device interface. This can be used to boot
the operating system from expansion cards without native initialization code. It is one goal
of OpenBIOS to work on all common platforms, like x86, Alpha, AMD64, and IPF. Additionally,
OpenBIOS targets the embedded systems sector, where a sane and unified firmware is a crucial
design goal. Open Firmware is found on many servers and workstations, and there are several
commercial implementations from Sun, Apple, IBM, CodeGen, and others. Even though
OpenBIOS has made quite some progress with its several components, there is still a lot of work
to be done to get OpenBIOS booting an operating system. The basic development environment
is functional, but some parts of the device initialization infrastructure are still
incomplete. The OpenBIOS development environment consists of a Forth kernel (a stack-based
virtual machine) and an FCode tokenizer and detokenizer (assembler and disassembler for
Forth bytecode drivers).
The Boot Services provide an interface for devices and system functionality that can be used
during boot time. Device access is abstracted through handles and protocols. During boot,
system resources are owned by the firmware and are controlled through boot services interface
functions. These functions can be characterized as global or handle-based. Runtime Services
are a minimal set of services which ensure an appropriate abstraction of base platform
hardware resources that may be needed by the operating system during its normal operation
after the boot phases. Besides this, EFI implements a Boot Manager and a Virtual Machine. The
Boot Manager is a firmware policy engine that can be configured by modifying architecturally
defined global NVRAM variables and can load EFI drivers and EFI applications. EFI drivers
are EFI Byte Code programs and run in the EFI Byte Code Virtual Machine. This virtual
For Intel the Extensible Firmware Interface is an innovative concept for next generation com-
puters, but the idea of a boot firmware with services for the operating system during boot-
and execution time, with stored platform information and a byte-code driver model is not
completely new. Exactly this behavior was described eight years ago in the IEEE Standard
for Boot (Initialization Configuration) Firmware: Core Requirements and Practices (IEEE
Std. 1275–1994).
Chapter 4
The programming language Forth is the basis of every Open Firmware based boot firmware.
Understanding how a Forth system works, as interpreter and as compiler, is necessary for
the following chapters. This chapter shows all elements of a Forth system and describes the
different implementation strategies. At the moment, four open source implementations exist
that could be used for a slimline boot firmware. To determine which functionality is
necessary and which can be left out, this chapter includes a detailed requirement list and
shows advantages and drawbacks.
Chapter 4. Programming Language “Forth”
The dictionary contains all executable “words” in a Forth system. Forth words are functionally
analogous to subroutines and equivalent to commands in other languages. A word is created by
a colon definition.
Figure 4.1: Structure of a Dictionary Entry (Indirect Threaded Code); the Control Bit controls the type and
the use of the Definition, the Parameter Field can include compiled Addresses which are used
by the Address Interpreter.
Data Stack
Forth implements a cell-wide push-down LIFO (last-in, first-out) data stack. The purpose
of the data stack is to hold numerical operands for Forth commands. Forth includes several
words to manipulate the data stack, for example to swap, duplicate, or delete stack elements.
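The stack manipulation words mentioned above can be sketched in a few lines of Python. The
class and method names are chosen for illustration only; the stack effect comments use the
usual Forth notation ( before -- after ).

```python
class DataStack:
    """Push-down LIFO stack of cells; the top of the stack is the end of the list."""

    def __init__(self):
        self.cells = []

    def push(self, value):
        self.cells.append(value)

    def pop(self):
        if not self.cells:
            raise IndexError("stack underflow")
        return self.cells.pop()

    def dup(self):
        """( a -- a a ) duplicate the top element."""
        a = self.pop()
        self.push(a)
        self.push(a)

    def drop(self):
        """( a -- ) delete the top element."""
        self.pop()

    def swap(self):
        """( a b -- b a ) exchange the two top elements."""
        b = self.pop()
        a = self.pop()
        self.push(b)
        self.push(a)
```

For example, pushing 1 and 2 and then calling swap() leaves 2 below 1, which is the operand
order a following subtraction or division word would see.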
Return Stack
The return stack is implemented like the data stack. This means it is also a cell-wide
push-down LIFO stack. It is normally not manipulated directly, although words such as >R
and R> can move values to and from it temporarily. The main tasks of the return stack are to
hold return addresses, loop parameters, temporary data, and the saved interpreter pointer.
Text Interpreter
Every command typed by a user, read from stored source code on a disk, or evaluated from
a string is executed by the text interpreter. The first step of this interpreter is to parse
the given string. This is done by skipping leading spaces and parsing it with space (ASCII
0x20) as delimiter. Then the dictionary is searched for a definition which matches the current
token received from the parsed string. When a match occurs, the text interpreter performs
the interpretation or compilation behaviors of the definition. If no match is found, the text
interpreter tries to convert the token to a binary number. After successful conversion the
number is placed on the stack, otherwise the word “abort” is executed.
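The steps of the text interpreter described above can be condensed into the following
Python sketch. It covers only the interpretation state (no compilation), models the
dictionary as a plain mapping from names to Python functions, and raises an exception where
a real system would execute “abort”. All names in the sketch are illustrative.

```python
class Abort(Exception):
    """Raised where a Forth system would execute the word abort."""

def interpret(line, dictionary, stack, base=10):
    # Parse with space (ASCII 0x20) as delimiter; empty tokens are skipped spaces.
    for token in line.split(" "):
        if token == "":
            continue
        word = dictionary.get(token)      # search the dictionary first
        if word is not None:
            word(stack)                   # perform the word's behavior
            continue
        try:
            stack.append(int(token, base))   # otherwise try number conversion
        except ValueError:
            raise Abort("unknown word: " + token)

def plus(stack):
    b = stack.pop()
    a = stack.pop()
    stack.append(a + b)

def dup(stack):
    stack.append(stack[-1])

words = {"+": plus, "dup": dup}
stack = []
interpret("  2 3 +  dup", words, stack)
```

Note the order of the two lookups: because the dictionary is searched before number
conversion is attempted, a word may legally be given a name that looks like a number.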
Address Interpreter
The internal engine of a Forth system is referred to as the address interpreter and is
distinct from the text interpreter, which processes source code and user input. The text
interpreter extracts strings separated by spaces and checks whether each word is in the
dictionary. If the word is found in the dictionary, it is executed by the address
interpreter, which processes all addresses compiled in the parameter field of a word
definition by executing the definitions these addresses point to.
The address interpreter has two important properties. First, it is fast, often requiring as
few as one or two machine instructions per address. Second, it makes Forth definitions
extremely compact, as each reference requires only one cell.
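As an illustration of the address interpreter, the following Python sketch models an
indirect-threaded system: memory is a list of cells, the code field of every word holds a
reference to native code (here a Python function), and the parameter field of a colon
definition holds the code field addresses of other words. The class, the function names,
and the example words are assumptions made for this sketch and do not describe any
particular Forth implementation.

```python
class ForthVM:
    """Toy indirect-threaded machine; memory is a list of cells."""

    def __init__(self):
        self.mem = []     # code fields and parameter fields
        self.data = []    # data stack
        self.ret = []     # return stack
        self.ip = None    # interpreter pointer; None means nothing left to run

    def primitive(self, native):
        """Create a word whose code field refers directly to native code."""
        cfa = len(self.mem)
        self.mem.append(native)
        return cfa

    def colon(self, *body):
        """Create a colon definition: code field docol, then compiled addresses."""
        cfa = len(self.mem)
        self.mem.append(docol)
        self.mem.extend(body)
        return cfa

    def execute(self, cfa):
        """Run one word to completion."""
        self.ip = None
        self.mem[cfa](self, cfa)
        while self.ip is not None:
            cfa = self.mem[self.ip]    # fetch the next compiled address
            self.ip += 1
            self.mem[cfa](self, cfa)   # indirect jump through its code field

def docol(vm, cfa):
    vm.ret.append(vm.ip)    # save the caller's interpreter pointer
    vm.ip = cfa + 1         # continue in the parameter field of this word

def do_exit(vm, cfa):
    vm.ip = vm.ret.pop()    # resume the caller

def do_lit(vm, cfa):
    vm.data.append(vm.mem[vm.ip])   # the cell after LIT is an in-line number
    vm.ip += 1

def do_dup(vm, cfa):
    vm.data.append(vm.data[-1])

def do_mul(vm, cfa):
    b = vm.data.pop()
    a = vm.data.pop()
    vm.data.append(a * b)

vm = ForthVM()
EXIT = vm.primitive(do_exit)
LIT = vm.primitive(do_lit)
DUP = vm.primitive(do_dup)
MUL = vm.primitive(do_mul)
SQUARE = vm.colon(DUP, MUL, EXIT)           # : square  dup * ;
SIXTEEN = vm.colon(LIT, 4, SQUARE, EXIT)    # : sixteen  4 square ;
vm.execute(SIXTEEN)
```

The loop in execute() is the whole address interpreter: fetch a cell, advance the
interpreter pointer, jump through the code field. Each reference inside a colon definition
occupies exactly one cell, which is the compactness property mentioned above.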
The primary unit (and almost the only data type) of information in the architecture of a
Forth system is the cell. A cell has the word length of the processor and is also the size of
an address and the size of a single item on a stack. It can be a flag, character, number,
execution token, or an address, which means that Forth systems don’t have compiler services
like type checking, macro preprocessing, or common subexpression elimination. Forth also
provides a basic set of words used to define objects of various kinds. As with other features
of Forth, the set of such commands may be expanded. A word is defined when an entry is
created in the dictionary. CREATE is the basic word that does this; it may be used by
VARIABLE, CONSTANT, and other defining words to perform the initial functions of setting up
the dictionary entry.
CREATE <name> Constructs a dictionary entry for name. Execution of name will return the
address of its data space. No data space is allocated for name, however; this must be
done by subsequent actions such as ALLOT.
: <name> Creates a definition for name, called a colon definition. Enter compilation state
and start compiling the definition. The execution behavior of name will be determined
by the previously defined words that follow, which are compiled into the body of the
definition. name cannot be found in the dictionary until the definition is ended. At
execution time, the stack effects of name depend on its behavior.
VARIABLE <name> Defines a single-cell variable. Execution of name will return the address of
its data space.
DEFER <name> Defines name to be an execution variable. When name is executed, the execution
token stored in name’s data area will be retrieved and the behavior associated with that
token will be performed.
VALUE <name> Defines a single-precision data object name whose initial value is x.
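The behavior of the defining words listed above can be imitated with a short Python sketch.
Data space is modeled as a list of cells, and each defined name is bound to a Python
function that performs the word's execution behavior. The class and its method names are
hypothetical, and the sketch ignores compilation state and colon definitions.

```python
class MiniForth:
    def __init__(self):
        self.data_space = []   # cell-addressed data space
        self.words = {}        # name -> execution behavior
        self.stack = []        # data stack

    def here(self):
        return len(self.data_space)

    def allot(self, cells):
        self.data_space.extend([0] * cells)

    def store(self, addr, x):
        self.data_space[addr] = x

    def fetch(self, addr):
        return self.data_space[addr]

    def create(self, name):
        """CREATE: executing name pushes the address of its data space."""
        addr = self.here()
        self.words[name] = lambda: self.stack.append(addr)
        return addr

    def variable(self, name):
        """VARIABLE: CREATE plus one allotted cell."""
        self.create(name)
        self.allot(1)

    def value(self, name, x):
        """VALUE: executing name pushes the stored value itself."""
        addr = self.here()
        self.allot(1)
        self.store(addr, x)
        self.words[name] = lambda: self.stack.append(self.fetch(addr))

    def defer(self, name):
        """DEFER: executing name runs the behavior currently stored in its cell."""
        addr = self.here()
        self.allot(1)
        self.store(addr, None)

        def run():
            target = self.fetch(addr)
            if target is None:
                raise RuntimeError(name + " has no behavior assigned")
            target()

        self.words[name] = run
        return addr
```

The difference between the three kinds of words shows up at execution time: a variable
leaves an address, a value leaves its contents, and a deferred word leaves whatever the
behavior stored in it produces.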
- Indirect-Threaded Code:
This was the original design, and remains the most common method. Pointers to previously
defined words are compiled into the execution word’s parameter field. The code
field of the execution word contains a pointer to machine code for an address interpreter,
which sequentially executes those definitions by performing indirect jumps through the
instruction pointer, which is used to keep its place. When a definition calls another
definition, the current instruction pointer is pushed onto the return stack; when the called
definition is finished, the saved instruction pointer is popped off the return stack.
- Direct-Threaded Code:
In this model, the code field contains the actual machine code for the address interpreter,
instead of a pointer to it. This is somewhat faster, but typically costs extra bytes for
some classes of words. It is most prevalent on 32-bit systems.
- Subroutine-Threaded Code:
In this model, the compiler places a jump-to-subroutine instruction with the destination
address in-line. On 16-bit systems, this technique costs extra bytes for each compiled
reference. It is often slower than direct-threaded code, but it is an enabling technique
that allows the progression to native code generation.
- Token Threading:
This technique compiles references to other words by using a token, such as an index into a
table, which is more compact than an absolute address. Apart from that, such an
implementation is equivalent to an indirect-threaded model.
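To illustrate the space saving of token threading, the following Python sketch compiles a
definition into one-byte tokens, each of which is an index into a table of words. The
helper names and the three example words are assumptions made for this sketch; nested
colon definitions and a return stack are left out to keep the example short.

```python
def dup(stack):
    stack.append(stack[-1])

def mul(stack):
    b = stack.pop()
    a = stack.pop()
    stack.append(a * b)

def one_plus(stack):
    stack.append(stack.pop() + 1)

# The token of a word is its position in the table.
table = [dup, mul, one_plus]
tokens = {"dup": 0, "*": 1, "1+": 2}

def compile_tokens(names):
    """Compile a list of word names into a byte string of tokens."""
    return bytes(tokens[name] for name in names)

def run_tokens(code, stack):
    """Execute compiled tokens by looking each one up in the table."""
    for token in code:
        table[token](stack)

body = compile_tokens(["dup", "*", "1+"])   # computes n*n + 1
stack = [4]
run_tokens(body, stack)
```

The compiled body occupies three bytes here, whereas three compiled addresses on a 64-bit
system would occupy three cells of eight bytes each. The price is the additional table
lookup for every executed token.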
4.4 Requirements
Various Forth systems exist on the market. They differ in threading, design, implementation,
programming language used, and complexity. To choose an appropriate Forth system for the
prototype implementations, it is necessary to define some requirements first.
1. The Forth system should use indirect threading. Admittedly, indirect threading is less
efficient than direct threading, but it is easier to debug, because in indirect-threaded
implementations the code field can support non-primitives, as is done for variables. Another
reason is that dictionary entries contain no machine code for primitives.
2. C should be used as the programming language for the Forth system. A Forth system could
also easily be implemented in assembler, but assembler code is harder to maintain than
C code, and languages like C++ or Java still mean too much overhead for firmware
development compared with a pure C implementation.
3. The design should be simple, and it must be possible to implement own extensions.
4. It should be a fully ANS Forth compliant implementation and must be distributed under
an open-source license (e.g. GPL or BSD).
Gforth is a fast and portable implementation of the ANS Forth language. It offers some nice
features such as input completion and history, backtraces, a decompiler and support for local
The Portable Forth Environment (PFE) is based on the ANSI Standard for Forth. The PFE
was created by Dirk-Uwe Zoller, who maintained it up to the 0.9.x versions (1993-1995).
Tektronix adopted the PFE package in 1998 and made a number of extensions.
It is now fully multithreaded and features a module system. It is possible to load
additional C objects at runtime to extend the Forth dictionary. It is best suited for
embedded environments, since the terminal driver and the initialization routines can easily
be changed.
Ficl
Paflof
Paflof is a fully ANS Forth compliant Forth system and is portable to nearly every system. It
was created by Segher Boessenkool and is distributed under the BSD license. The current
implementation of the virtual machine is very clean and small. It fits uncompressed into less
than 40k of flash memory. Paflof needs perl to create the initial dictionary and preferably
a C99 compliant compiler which supports the restrict keyword and C++ style comments. It can
also run hosted in the user space of a UNIX style operating system. It is extensible, too:
primitives to read and write processor registers, for example, can easily be implemented.
This makes Paflof an ideal base for a slimline, Open Firmware based boot firmware
implementation.
included. In addition to this Forth has found its way. Forth isn’t a new language. It has
been commercially available for over 25 years and has its own ANSI standard, but it is not
widely used. There are probably fewer than a hundred full-time Forth programmers in the
country. Nevertheless, the programming language Forth isn’t out of date, because of the
following advantages:
1. Forth remains one of the few environments which is totally comprehensible by one person.
This is a big plus for developers who work on safety-critical systems.
2. Forth makes the best of slow microprocessors with little RAM. Embedded systems
mostly include such a processor, without having 16 MB of RAM and hard disk support. In
such scenarios Forth can be an appropriate solution.
4. Forth is an extensible programming language. This means that if the language doesn’t
support some features or capabilities which are necessary, it is easily possible to add
them – not as subroutines, but as a part of the programming language itself.
Because of all these benefits, Forth is not only used in Open Firmware. The NASA Goddard
Space Flight Center uses it for spacecraft flight system controllers, on-board payload
experiment controllers, ground support systems (e.g., communications controllers and data
processing systems), and to test flight and ground systems [1]. Furthermore, Forth is used
in a portable assistive and therapeutic communication device for people with aphasia, which
was developed by the Rehab R&D Center [2], and in a computer-controlled electromechanical
fingerspelling hand that offers deaf-blind individuals access to computers, communication
devices, or person-to-person conversations [3]. These are a few short examples of where
Forth is still in use. It is a programming language which is still alive and quite a good
environment, not only for embedded systems.
[1] http://forth.gsfc.nasa.gov/
[2] http://guide.stanford.edu/Projects/CommlProd.html
[3] http://guide.stanford.edu/TTran/ttralph.html
Chapter 5
Like every program, the Linux kernel must go through a loading and initialization phase
before the real work can be done. While this first phase is quite unspectacular in normal
applications, the kernel, as the central layer, is confronted with some exceptional
problems. The boot phase itself can be divided into three sections:
- Loading of the kernel into RAM and setup of a minimal runtime environment.
- Jump into the platform-dependent machine code of the kernel to do the system-specific
initialization of all elementary functions.
- Jump into the platform-independent initialization code, which completes the initialization
of all subsystems and is followed by a changeover to normal operation.
For firmware development, the first and second phases are important, because in these layers
the kernel communicates with the firmware or processes firmware data. The concentration on
these layers is needed to get a better understanding of Open Firmware, firmware services, and
the firmware concepts that follow later.
Chapter 5. Linux/PPC64 Boot Procedure
processing. The second data structure is the naca (node address communications area). This
data structure is used to hold system wide informations like the number of processors in a
system or a partition, the size of real memory available to the kernel, and cache characteris-
tics. In addition, this data structure also contains one field to point to a data area used by
the hypervisor to transfer system configuration data to the kernel.
The early phases of boot and initialization differ between pSeries and iSeries platforms. For
the implementation of a slimline boot firmware only the pSeries kernel is of interest, because
this kernel is closer to a hypervisor-less implementation than the iSeries kernel. First, the
pSeries kernel is loaded by a bootloader (e.g. Yaboot, or directly by Open Firmware) into a
contiguous block of real memory and gets control with relocation disabled. Initialization code
in the kernel interacts with Open Firmware to accomplish the following tasks:
1. Determine the system configuration (e.g. real memory and device tree).
3. Move secondary processors from spinning in Open Firmware to spinning in a kernel loop.
This initialization code then relocates the kernel to the real address 0x0, creates a kernel
stack and the TOC, builds initial hardware page tables and segment page tables, and initializes
the naca pointers. The naca is always located at a fixed real address (0x4000) in order to
facilitate debugging. Table 5.1 shows the complete physical memory layout. Finally, relocation
is enabled and the common pSeries and iSeries code is executed.
bare metal Linux/PPC64 implementation. The file head.S contains the low-level support
and setup for Linux/PPC64 platforms, including trap and interrupt dispatch. On entry to
this code, the following assumptions are made:
The entry point of the Linux/PPC64 kernel is __start. For pSeries platforms, a branch to the
label __start_initialization_pSeries follows. As its first task, this code fragment saves all
parameters (client interface handler and client program arguments) which were passed from
Open Firmware to the client program. Then 64-bit mode is enabled and a relocation offset is put
in r3. This relocation offset is necessary because the PPC64 Linux kernel is not yet running at
its target address (KERNELBASE), since Open Firmware still occupies the low address region.
The Linux/PPC64 kernel needs to communicate with Open Firmware and obtain information from
it, so during this time it must share one memory space with it. Next, the function prom_init
is executed. This function performs all interaction with the Open Firmware client interface (see
Chapter 5.3 for more information).
After the Open Firmware communication is done, a branch to the label __970_cpu_preinit is
taken, which sets up some critical PowerPC 970 SPRs before the MMU is switched off. Now
the Linux kernel is copied from the address where it is currently running to its target
address at KERNELBASE. This is done in two steps with the functions copy_and_flush and
copy_to_here. This procedure overwrites the Open Firmware exception vectors, and the main
kernel code begins execution. The segment table (stab_initialize) and the hashed page
table (htab_initialize) are initialized to get an initial memory mapping. Both functions need
an initialized systemcfg and naca pointer. The kernel then branches to start_here_common,
which converges execution for all platforms and sets up the systemcfg and paca
pointers. Last of all, setup_system is executed (common boot and setup code), followed by
start_kernel.
With this, the end of the platform-dependent initialization is reached. start_kernel acts
as a dispatcher function and executes platform-dependent and platform-independent code. This
function mainly calls the high-level initialization routines for all subsystems and, as its first
job, prints the Linux startup banner. The boot process is not described any further, because
the kernel has overwritten Open Firmware and is in the high-level initialization phases.
prom_initialize_naca. If the kernel is running on an SMP (symmetric multiprocessing)
machine, some extra handling is needed for the additional processors. Open
Firmware and the complete low-level initialization of the kernel run on only one CPU.
If a machine has two CPUs, the second CPU waits in a slave loop until it is
released at the request of the first one. The Linux kernel takes control of the second CPU,
which is spinning in Open Firmware, with the function prom_hold_cpus. This function executes a
client interface call and tells Open Firmware that the additional CPUs should stop spinning in
Open Firmware and continue execution at the given address. In the case of the
Linux kernel, this address is the location of a second slave loop in kernel space. This means
that the additional CPUs are released from the Open Firmware slave loop and placed into another
slave loop, which is under the control of the Linux kernel. The last job is to copy
the whole device tree (copy_device_tree). After everything is done, the function prom_init
returns and the Open Firmware client interface services can no longer be used.
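The two-stage hand-over of a secondary CPU described above can be illustrated with a small
state model in C. This is only a sketch under assumptions: the names cpu_state, release_flags,
and secondary_poll are hypothetical and are taken neither from the kernel nor from Open
Firmware, and a real slave loop runs on the secondary processor itself with the appropriate
memory barriers.

```c
#include <stddef.h>

/* Where a secondary CPU currently is (hypothetical model). */
enum cpu_state { SPIN_IN_FIRMWARE, SPIN_IN_KERNEL, RUNNING };

/* Flags written by the boot CPU and polled by the secondary CPU. */
struct release_flags {
    volatile int firmware_release; /* set when the kernel asks Open Firmware
                                      to hand the CPU over */
    volatile int kernel_release;   /* set later, when the kernel starts the CPU */
};

/* One polling step of a secondary CPU: it only advances to the next
 * stage when the corresponding flag has been set by the boot CPU. */
static void secondary_poll(enum cpu_state *state, const struct release_flags *flags)
{
    if (*state == SPIN_IN_FIRMWARE && flags->firmware_release)
        *state = SPIN_IN_KERNEL;   /* left the firmware slave loop */
    else if (*state == SPIN_IN_KERNEL && flags->kernel_release)
        *state = RUNNING;          /* left the kernel slave loop */
}
```

In this model the CPU never skips the kernel slave loop: even if both flags are set, two
polling steps are needed to reach the running state.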
Figure 5.2: Call tree for prom_init; API functions like prom_print, prom_print_nl, prom_print_hex,
call_prom, prom_panic, etc. are not included.
Chapter 6
This chapter describes a first concept of a slimline boot firmware which is based on Open
Firmware. In this blueprint, which is open to further development and only partial in design,
the focus is on packages and their functionality. The goal is to show which packages are
needed in the low-level firmware and in Open Firmware. The concept takes into account the
technologies introduced in Chapters 2, 3, and 4 and the Linux/PPC64 boot process illustrated in
Chapter 5. The structure of the slimline prototype firmware stack looks similar to Open
Firmware, see Section 3.1. The first level is the low-level firmware, followed by Open
Firmware and, finally, the operating system on top. Components like RTAS or a hypervisor
are not included or implemented, because they inflate the size and complexity of the whole
firmware stack. To achieve hardware abstraction, the prototype uses “Agnostic Device Drivers”.
This new technology is described in detail in Chapter 7 and avoids synchronous and
high-latency call paths. Furthermore, this technology enables new ways of packaging firmware
code. The prototype uses the Forth engine “Paflof” as the basis and execution environment for
Open Firmware. This Forth engine is described in Section 4.5.
Chapter 6. Slimline Prototype Firmware
low-level firmware does all jobs which are needed to start the basis of Open Firmware, the
Forth engine.
As its main job, the low-level firmware brings the machine into a state consistent enough that
a Forth engine can be loaded and executed. As an additional task, the low-level firmware must
check whether functional impairments exist and must handle these failures properly. The
first task is to make sure that only one CPU continues execution. All remaining CPUs
must be placed in a loop until they are released. Once this is done, the next task initializes
the serial port to print checkpoint and debug information or error codes. The serial port is
normally the only way to print information at such an early stage, because it is easy to handle
and quick to implement. The bootstrap component can now load code into the L2 cache
and execute it from there. This code configures and initializes the memory and tests it for bad
memory regions. If everything went well, it copies the rest of the executable code into memory
and continues execution from there. The low-level firmware now sets up GPIOs and I2C buses
and devices. Finally, it establishes an interface which can encapsulate intellectual property
or system services. This is necessary, for example, to release all looping processors. During
the whole execution time, the low-level firmware uses auxiliary components to read from and
write to the PCI bus, to handle the SPU, to talk to a watchdog, or to read and write processor
registers.
Serial Port
Component: Serial Port
Description:
The serial port package is needed to print and receive information over the serial port.
During development it is necessary, because it is the only way to print error information.
Later on, it is used to print checkpoint information at such an early stage. Furthermore,
it is also used to enter several startup modes. For example, when “v” is pressed during
startup, the firmware enters a special verbose mode and can print more information
or debug output.
Functions:
serial_init
serial_write_byte
serial_write_word
serial_write_long
serial_write_double
serial_write_hex
serial_write_cp
serial_write_ec
serial_write_di
serial_write_nl
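As an illustration of how such output helpers can be layered, the following C sketch builds a
hexadecimal print routine on top of a single byte-output function. The signatures are
assumptions, since the thesis lists only the function names, and the UART is replaced here by
a hypothetical in-memory buffer (uart_log) so that the sketch runs without hardware.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the UART transmit register: bytes are collected in a
 * buffer so that the sketch can run without hardware (assumption). */
static char uart_log[64];
static size_t uart_len;

/* Write one byte to the (simulated) serial port. */
static void serial_write_byte(uint8_t c)
{
    if (uart_len < sizeof(uart_log) - 1) {
        uart_log[uart_len++] = (char)c;
        uart_log[uart_len] = '\0';
    }
}

/* Print a 32-bit value as eight hexadecimal digits, most significant
 * nibble first. */
static void serial_write_hex(uint32_t value)
{
    static const char digits[] = "0123456789abcdef";
    for (int shift = 28; shift >= 0; shift -= 4)
        serial_write_byte((uint8_t)digits[(value >> shift) & 0xf]);
}

/* Emit a line break (assumed here to be CR followed by LF). */
static void serial_write_nl(void)
{
    serial_write_byte('\r');
    serial_write_byte('\n');
}
```

Because every helper funnels through serial_write_byte, only that one function would have to
touch the real hardware registers.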
Bootstrap
Component: Bootstrap
Description:
At startup it is not possible to use the memory, because the hardware is not properly
initialized or configured. The job of the bootstrap code is to copy firmware into the
processor cache. This code initializes and tests the memory. Later on, this code copies
the rest of the firmware from NVRAM to the memory and begins with the execution of
the whole low-level firmware.
Functions:
copy_to_cache
execute_from_cache
copy_to_memory
execute_from_memory
Memory
Component: Memory
Description:
Some helper functions to initialize, read, write, and test the memory are implemented in
this component. Here, it is possible to run different test and error patterns to check whether
some regions of the memory are defective.
Functions:
mem_configure
mem_init
mem_test
write_8
write_16
write_16_le
write_32
write_32_le
write_64
read_8
read_16
read_16_le
read_32
read_32_le
read_64
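The difference between the plain and the little-endian accessors, and the idea of a pattern
test, can be sketched in C as follows. The signatures are assumptions (the thesis gives only
the names), the plain variants are assumed to be big-endian as is natural on PowerPC, and the
pattern test is a deliberately simple stand-in for a real memory test.

```c
#include <stddef.h>
#include <stdint.h>

/* Store a 32-bit value in little-endian byte order. */
static void write_32_le(uint8_t *addr, uint32_t value)
{
    addr[0] = (uint8_t)(value);
    addr[1] = (uint8_t)(value >> 8);
    addr[2] = (uint8_t)(value >> 16);
    addr[3] = (uint8_t)(value >> 24);
}

/* Load a 32-bit value stored in little-endian byte order. */
static uint32_t read_32_le(const uint8_t *addr)
{
    return (uint32_t)addr[0] | ((uint32_t)addr[1] << 8) |
           ((uint32_t)addr[2] << 16) | ((uint32_t)addr[3] << 24);
}

/* Load a 32-bit value stored in big-endian byte order (assumed default). */
static uint32_t read_32(const uint8_t *addr)
{
    return ((uint32_t)addr[0] << 24) | ((uint32_t)addr[1] << 16) |
           ((uint32_t)addr[2] << 8) | (uint32_t)addr[3];
}

/* Write a pattern and its complement to every byte and read both back.
 * Returns the offset of the first failing byte, or len if all bytes pass. */
static size_t mem_test(volatile uint8_t *base, size_t len, uint8_t pattern)
{
    for (size_t i = 0; i < len; i++) {
        base[i] = pattern;
        if (base[i] != pattern)
            return i;
        base[i] = (uint8_t)~pattern;
        if (base[i] != (uint8_t)~pattern)
            return i;
    }
    return len;
}
```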
I/O
Component: I/O
Description:
The I/O package sets up GPIO and rudimentary input and output devices.
Functions:
io_setup
I2C
Component: I2C
Description:
To send and receive messages on the I2C bus, it is necessary to implement some functions.
This package initializes all I2C buses and devices. It also implements core functions to
send and receive messages.
Functions:
i2c_init
i2c_send
i2c_recv
IP Interface
Component: IP Interface
Description:
The IP interface can hide intellectual property from Open Firmware programmers. For
example: Special protocols to talk with the service processors are implemented in this
package.
Exception Handling
Component: Exception Handling
Description:
The exception package includes all exception handlers of the low-level firmware.
PCI
Component: PCI
Description:
To read from and write to the PCI bus, it is necessary to implement functionality which
initializes the PCI bus. This package is also used to read and write bytes or words in big-
and little-endian format. Furthermore, it could include code to walk the PCI bus or
probe all PCI devices.
Functions:
pci_write_8
pci_write_16
pci_write_32
pci_write_64
pci_read_8
pci_read_16
pci_read_32
pci_read_64
SPU
Component: SPU
Description:
The SPU package includes all code which, for example, performs power management via the
service processor. It is necessary to talk to a service processor quite often for different
purposes. The protocol for this communication is stored here and could also be used for
enabling and disabling the watchdog.
Functions:
enable_watchdog
disable_watchdog
reboot
halt
suspend
manage_spu
Open Firmware is not executed directly by the low-level firmware. The low-level firmware
actually loads and executes a small wrapper component. This wrapper copies all needed
exception vectors and the Forth engine to a specified address in memory. If everything
was copied correctly, the wrapper starts the Forth engine. The wrapper
is needed because start addresses or interfaces could differ in every low-level firmware.
After that, the Forth engine begins to execute Forth code which implements most of the Open
Firmware functionality. The first task of this Forth code is to initialize the serial port or the
frame buffer device to provide an output channel. This code can also set the serial port as the
input device. With this option, the Forth engine can be programmed or used interactively over
the serial port to debug errors or to set up Open Firmware environment variables. The device
interface component also includes code to build the device tree, determine the boot mode, and
start the boot process of a client program. The device tree is created in two stages. The
first stage executes code which inserts hard-coded information into the device tree, and the
second stage executes code which inserts information dynamically. For example, to obtain
this information the whole PCI bus can be probed, and every device found can be integrated
into the device tree with its properties. Furthermore, the device interface can execute an FCode
program which resides on the PCI device itself. This code can identify the device or insert
information into the device tree. Next, the device interface can load an ELF image from the
network or from a hard disk. To realize this functionality, additional packages are used, as
shown in Figure 6.2. The last job is to load this ELF image with an ELF loader and execute it.
This ELF image could be the Linux kernel. Now the client program can communicate with Open
Firmware over the client interface to get the device tree, etc. Once this is done, the
client program overwrites Open Firmware and takes complete control of the machine.
Forth Engine
Component: Forth Engine
Description:
This component builds the skeletal structure of Open Firmware – the Forth system.
Without such a package, no Forth code could be interpreted, compiled, and executed.
Serial Port
Component: Serial Port
Description:
Like the serial port component in the low-level firmware, this package is needed to print
and receive information over the serial port. The only difference is that this package
initializes the serial port in a more effective way and sets it as the standard input and output
device.
Functions:
>serial
serial!
serial@
serial-emit
serial-key
serial-init
serial-fini
Frame Buffer
Component: Frame Buffer
Description:
The frame buffer component can be used to print information not only over the serial
port. With this component it is also possible to print information via the graphics
card.
Functions:
>fb
fb!
fb@
fb-emit
fb-key
fb-init
fb-fini
Additional Data:
Device Interface
The device interface allows Open Firmware to identify and use plug-in devices. The interface
is based on a byte-coded programming language known as FCode. The FCode language is
evaluated by an Open Firmware component known as the FCode evaluator. The Open Firmware
device interface specifies the behavior of a firmware system so that, when compliant devices
are added to a computer system whose firmware is compliant, the firmware may determine the
characteristics of those devices and may use them for various purposes, such as text display
and program loading. A standard FCode evaluator provides a defined environment for the
IEEE Std 1275-1994. IEEE Standard for Boot (Initialization Configuration) Firmware,
1994, See esp. Chap. 5, “Device Interface”, p. 45.
User Interface
The user interface allows a person to use Open Firmware services for such purposes as configu-
ration management and debugging of hardware, software, and firmware. The interface consists
of facilities for keyboard input, line editing, display output, and an evaluator (the Forth com-
mand interpreter) for the Forth programming language. It also specifies the behavior of a
firmware system so that a human may interact with it for such purposes as configuration
management, control of the booting process, and the debugging of hardware, client programs,
device drivers, and the firmware itself. A standard command interpreter accepts and executes
commands, typically entered interactively by a human, according to defined command editing,
syntax, and semantic rules. A standard command interpreter is typically a component of the
boot firmware associated with a CPU board. A command group is a set of commands with
defined behaviors, the group as a whole providing some particular capability (for example,
one group of commands is concerned with client program debugging). Each command in the
group may be executed via a standard command interpreter. A standard program is a program,
written in the language defined by the specification of the standard command interpreter in
conjunction with the specification of one or more command groups, that obeys prescribed rules
for program structure and usage. Consequently, its behavior is predictable when executed by
a standard command interpreter. A standard program is typically either entered interactively
by a human, downloaded from some storage device, or stored within the script.
IEEE Std 1275-1994. IEEE Standard for Boot (Initialization Configuration) Firmware,
1994, See esp. Chap. 7, “User Interface”, p. 71.
Client Interface
The client interface allows client programs (programs that have been loaded and executed
under the control of Open Firmware) to make use of services provided by Open Firmware.
The interface consists of a set of software procedures and a mechanism for calling and passing
arguments and results to and from those procedures. The Open Firmware client interface
specifies the behavior of a firmware system so that client programs (programs that are loaded
into and execute from RAM) begin their execution with a predictable machine state and may
use various Open Firmware facilities. The client interface consists of both the specification of
the machine environment that exists when the client program begins execution and the set
of services that Open Firmware provides for the program’s use. Client interface services are
those services that Open Firmware provides to client programs, including device tree access,
memory allocation, mapping, console I/O, mass storage and network I/O, and other services.
IEEE Std 1275-1994. IEEE Standard for Boot (Initialization Configuration) Firmware,
1994, See esp. Chap. 6, “Client Interface”, p. 63.
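The calling mechanism can be sketched in C. In IEEE 1275 a client interface call passes the
address of an argument array whose first cells hold the service name, the number of input
arguments, and the number of return values, followed by the arguments and return cells. The
code below is a simplified model under assumptions: the type names, the fixed array size, the
mock handler, and the phandle value 0x1000 are hypothetical and exist only so that the sketch
can run without firmware.

```c
#include <stdint.h>
#include <string.h>

typedef uintptr_t cell_t; /* one client interface cell (assumed pointer-sized) */

/* Argument array: service name, counts, then inputs followed by results. */
struct ci_args {
    cell_t service;  /* address of the NUL-terminated service name */
    cell_t nargs;    /* number of input arguments */
    cell_t nret;     /* number of return values */
    cell_t cells[4]; /* input arguments, then return cells */
};

typedef int (*ci_handler_t)(struct ci_args *);

/* Mock firmware handler: implements only "finddevice" and knows only
 * the root node "/" (hypothetical phandle 0x1000). */
static int mock_handler(struct ci_args *a)
{
    const char *service = (const char *)a->service;

    if (strcmp(service, "finddevice") == 0 && a->nargs == 1 && a->nret == 1) {
        const char *path = (const char *)a->cells[0];
        a->cells[1] = (strcmp(path, "/") == 0) ? (cell_t)0x1000 : (cell_t)-1;
        return 0;
    }
    return -1; /* unknown service */
}

/* Client-side wrapper: fill the argument array and call the handler. */
static cell_t ci_finddevice(ci_handler_t handler, const char *path)
{
    struct ci_args a;

    a.service = (cell_t)"finddevice";
    a.nargs = 1;
    a.nret = 1;
    a.cells[0] = (cell_t)path;
    a.cells[1] = 0;
    if (handler(&a) != 0)
        return (cell_t)-1;
    return a.cells[1];
}
```

A real client program receives the handler address from Open Firmware at entry and would use
it in place of mock_handler.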
File Systems
Component: File Systems
Description:
To read from and write to different file systems it is necessary to implement this package.
With the file system package it is possible to read a kernel image from Ext2, ReiserFS,
ISO9660, etc. file systems.
Functions:
ext2-open
ext2-close
ext2-read
ext2-seek
iso9660-open
iso9660-close
iso9660-read
iso9660-seek
raiserfs-open
raiserfs-close
raiserfs-read
raiserfs-seek
xfs-open
xfs-close
xfs-read
xfs-seek
Network Protocols
Component: Network Protocols
Description:
To read and write over different network protocols it is necessary to implement this
package. With the network protocol package it is possible to read a kernel image via TFTP,
BOOTP, etc.
Additional Data:
- Bill Croft and John Gilmore. Bootstrap Protocol (BOOTP), RFC 951, September
1986.
- R. Droms. Dynamic Host Configuration Protocol (DHCP), RFC 2131, March 1997.
- J. Postel and J. Reynolds. Telnet Protocol Specification, RFC 854, May 1983.
IDE / ATA
Component: IDE / ATA
Description:
This package implements a driver for IDE hard drives.
USB
Component: USB
Description:
This package implements a driver for USB (OHCI, UHCI, etc.).
Ethernet
Component: Ethernet
Description:
This package implements a driver for an Ethernet card to send and receive packets over the
network.
Functions:
TODO TODO
PCI
Component: PCI
Description:
The PCI package must insert dynamic content into the device tree of Open Firmware.
This content can be obtained by running a stored FCode program which resides on the
device itself or by doing a PCI bus walk, which fetches all information stored in the PCI
configuration space.
Functions:
pci-probe-devices
pci-probe-mf
pci-create-props
pci-class-code2name
pci-class,CCSSPP
pci-class,CCSS
pci-VVVV,DDDD.RR
pci-SSSS,ssss
...
pci-VVVV,DDDD
pci-enable-bridge
pci-mf?
pci-bridge?
pci-device?
>config
Additional Data:
- IEEE Std 1275-1994. PCI Bus Binding to: IEEE Standard for Boot (Initialization
Configuration) Firmware, Rev. 2.1, August 1998.
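The name patterns in the function list above (VVVV,DDDD and CCSSPP) follow the naming scheme
of the PCI bus binding, in which device names are derived from the vendor ID, the device ID,
and the class code read from configuration space. The following C sketch shows how such
strings could be generated. It is an illustration under assumptions: the C function names are
hypothetical, the exact digit formatting should be checked against the binding, and the
class-to-name table is only a small excerpt.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Build a "pciVVVV,DDDD" style name from vendor and device ID
 * (lowercase hex without leading zeros is assumed here). */
static int pci_vendor_device_name(char *buf, size_t len,
                                  uint16_t vendor, uint16_t device)
{
    return snprintf(buf, len, "pci%x,%x", (unsigned)vendor, (unsigned)device);
}

/* Build a "pciclass,CCSSPP" style name from the 24-bit class code
 * (base class, sub-class, programming interface). */
static int pci_class_name(char *buf, size_t len, uint32_t class_code)
{
    return snprintf(buf, len, "pciclass,%06x",
                    (unsigned)(class_code & 0xffffffu));
}

/* Map base class and sub-class to a generic node name.
 * Only a few illustrative entries are listed. */
static const char *pci_class_code2name(uint8_t base, uint8_t sub)
{
    switch (((unsigned)base << 8) | sub) {
    case 0x0100: return "scsi";
    case 0x0101: return "ide";
    case 0x0200: return "ethernet";
    case 0x0c03: return "usb";
    default:     return NULL; /* no generic name known in this excerpt */
    }
}
```

When no generic name is known, a probing routine could fall back to the vendor and device
name, which is always available from configuration space.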
Chapter 7
One main goal of this diploma thesis is to introduce a new hardware abstraction mechanism
which is fast and flexible with regard to packaging. The chapter shows the advantages and
drawbacks of existing technologies and the problems associated with them. As a result, an
executable prototype is introduced with a detailed description of all components and their
functionality. This new approach currently runs on Linux, but could easily be adapted to any
existing or new boot firmware or operating system. Agnostic Device Drivers (ADD) is a
technology by which binary program code can be integrated into the device tree of Open
Firmware and later executed in the kernel of the running operating system. ADDs typically
control devices like I2C and GPIO. Preferably, this code is very similar to Open Firmware code
(FCode) in order to leverage existing tools and experience. In a system with a service
processor (SPU), the functionality of these services can be implemented as a protocol or
wrapper to the SPU itself. Agnosticism is achieved by running this interpreted code directly
in the operating system. This avoids synchronous and high-latency call paths.
7.1 Motivation
At the moment, two hardware abstraction mechanisms exist which are based on Open Firmware.
Run-Time Abstraction Services (RTAS) are specified in the Common Hardware Reference Platform
and in the RISC Platform Architecture (see Sections 3.2 and 3.3 for detailed information). RTAS
is packaged with the firmware code and stored in the NVRAM of the system. The
operating system instantiates RTAS via Open Firmware at boot time. Because of this,
RTAS can only be packaged as firmware code, and changes in RTAS also mean rewriting the
firmware in NVRAM with the new version. RTAS implements several calls which can
later be used by the operating system. These calls mostly deal with power management, reading
from and writing to NVRAM or PCI configuration space, and time management. The problem with
these calls is that RTAS is designed as a synchronous interface. When an operating system
makes an RTAS call, it must wait until the call is completed. Modern computers use a
complicated thermal calibration system with more than twenty sensors, fans, and sometimes
liquid cooling. The algorithms for such systems are quite complex and hard to program.
Implementing this functionality with RTAS is not possible, because of the synchronous interface
and the fact that a thermal calibration system must be called often and repeatedly. This
would slow down the operating system. The other problem is that the algorithm must be
implemented in the kernel of the operating system itself. RTAS can only read the values from
the sensors or switch fans on. The complete policy and logic cannot be implemented with RTAS.
Apple uses its own hardware abstraction concept with an asynchronous interface. This technology
is called Platform Expert and is described in Section 2.1. Unlike RTAS, Platform Expert has no
high-latency call paths. The main issue with Platform Expert is that it uses three different
components. These components are packaged with the firmware code and the operating system.
When a new machine comes out, it is possible that the Platform Expert Data, the
Platform Expert, and the machine-dependent part of the Platform Expert will change. This
means changes in the firmware and the operating system. Like RTAS, Platform Expert has
the drawback that the policy and the logic of driver programs must be integrated into the
operating system. Furthermore, Platform Expert was designed for Mac OS X. This means it
is tightly coupled with Mac OS X and cannot be used in Linux. Table 7.1 shows a comparison
of RTAS and Platform Expert.
The target of Agnostic Device Drivers is to have an interface without high-latency call-paths,
flexible packaging mechanisms, and good porting features for a new operating system.
Chapter 7. Agnostic Device Drivers
The ADD byte-code program is stored in the device tree of Open Firmware. During the boot
phases of the operating system, this byte-code program can be fetched via the client interface
and can be used directly and asynchronously in an evaluator, which runs in the kernel of the
operating system. The structure of the device tree and the functionality of the client interface
are specified in IEEE 1275-1994, Standard for Boot (Initialization Configuration) Firmware,
Core Requirements and Practices.
The key benefit of this behavior is that it gives firmware and hardware vendors the freedom to
implement functions which are later executed by the operating system in an effective and fast
way. The operating system does not have to know all the details of the hardware, so power
handling, I2C, and GPIO tasks can be implemented easily. With this flexible functionality the
complete RTAS interface could be replaced, so that slow, synchronous, high-latency call paths
can be avoided.
1. The byte-code program could be placed in the device tree of Open Firmware. With this
option, the byte-code program is packaged with the firmware code and stored in the
NVRAM of the machine.
2. This option implements an interface to copy byte-code programs to the ADD virtual
machine, which is running in the kernel of the operating system. This transfer is
done during run-time of the operating system. Furthermore, a developer can create and
test byte-code programs without a long development cycle. Repeated reboots
of the machine are not necessary.
Finally, the biggest advantage is the availability of high-level language facilities, which make
it possible to implement the logic or policy directly in the ADD byte-code programs. It is
possible to use control structures (loops, branches, etc.) and defining words in byte-code
programs. Defining words are special functions that create constants, variables, and
subroutines. With these functions the Agnostic Device Driver concept does not have the same
problem as the existing hardware abstraction mechanisms.
typically its own instruction set, which is used for the execution environment. This instruction
set is independent of the architecture of the operating system or the host hardware. Virtual
machines often have their own memory subsystem and control or limit access through the virtual
machine’s native function interface. The design and implementation of a virtual machine is
influenced by factors like size, portability, performance, memory consumption, and security.
At the moment, four main design strategies exist for implementing a virtual machine.
These design strategies are:
- Interpreted:
Embedded devices often use interpreted virtual machines. Such an interpreted virtual
machine is quickly and easily implemented or ported to new hardware. On the other hand, it
has poor performance, because it executes one byte-code at a time. In terms of performance,
this implementation strategy is the worst of all options.
- Just-In-Time:
This implementation takes advantage of knowing the hardware, which makes it more
complex to implement. The performance is far above that of interpreters (with a pause up
front), because immediately before executing a program, this virtual machine
compiles it for the corresponding architecture. A better name for this technology is
“better-late-than-never compiler”.
- Hotspot:
Hotspot works by analyzing code as it runs and finding the hotspots. It halts program
execution to take time and optimize those pieces. A virtual machine that uses hotspot is
best suited for long-running applications. Micro benchmarks on such an
implementation are not representative.
- Hybrid:
Hotspot is the flagship, but other JITs do this to some degree. The JIT compiles only code
that will run a lot and wastes no time on compiling initialization code. This virtual
machine has the best overall performance, but also the most complex design.
The goal for the ADD virtual machine is a small-footprint virtual machine for the
operating system to control resource-constrained devices. It should be easy to understand
and maintain. Besides this, it should be small without sacrificing features needed for
programming drivers for I2C or GPIO devices. Dynamic compilation or other performance
techniques are not necessary, but it should run in the Linux kernel. An interpreted virtual
machine meets all these requirements.
7.5.1 Front-End
The front-end component handles the loading of byte-code into the virtual machine. The ADD
virtual machine supports two ways of loading byte-code. The byte-code
can be loaded from the device tree, which was fetched during the boot process, or later during
the run-time of the operating system. To load byte-code programs from the device tree, the
front-end walks through the complete device tree and searches for ADD byte-code programs.
If a program is found, the front-end checks whether additional properties for the byte-code
program exist in the device tree and passes them to the virtual machine. The virtual machine
can use these properties to set up the environment or to control the program execution. When a
program is loaded during the run-time of the operating system, the program byte-code must be
copied from user space into kernel space. In the case of Linux, this is done by a
character device driver which copies the ADD byte-code into kernel space, so that the virtual
machine can execute it. The advantage of this scenario is that a developer can write and
test ADD programs quickly without having to restart the operating system or the machine with
new firmware code.
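The device tree walk performed by the front-end can be sketched as a recursive search in C.
The data structures and the property name "add-bytecode" are hypothetical and chosen only for
illustration; the thesis does not state under which property name the byte-code is stored,
and a kernel implementation would operate on the kernel's own copy of the device tree.

```c
#include <stddef.h>
#include <string.h>

/* Minimal in-memory device tree model (assumption for illustration). */
struct dt_property {
    const char *name;
    const void *value;
    size_t length;
};

struct dt_node {
    const char *name;
    const struct dt_property *props;
    size_t nprops;
    const struct dt_node *children;
    size_t nchildren;
};

/* Hypothetical name of the property that carries an ADD program. */
#define ADD_PROP_NAME "add-bytecode"

/* Callback invoked for every byte-code program that is found. */
typedef void (*add_found_fn)(const struct dt_node *node,
                             const struct dt_property *prop, void *ctx);

/* Walk the complete tree depth-first and report every ADD program.
 * Returns the number of programs found. */
static size_t add_scan_tree(const struct dt_node *node, add_found_fn cb, void *ctx)
{
    size_t found = 0;

    for (size_t i = 0; i < node->nprops; i++) {
        if (strcmp(node->props[i].name, ADD_PROP_NAME) == 0) {
            if (cb != NULL)
                cb(node, &node->props[i], ctx);
            found++;
        }
    }
    for (size_t i = 0; i < node->nchildren; i++)
        found += add_scan_tree(&node->children[i], cb, ctx);
    return found;
}
```

The callback receives the node as well as the property, so it can look up further properties
of the same node and pass them to the virtual machine together with the program.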
Programs are tested for validity by a byte-code verifier. The byte-code verifier of the virtual
machine reads the header of an ADD program and checks whether it is valid. The header includes
the checksum and the length of the corresponding program. The checksum is calculated by
using two’s complement addition and ignoring overflow. The program length is a quadlet holding
the number of bytes in the program, including the body and the header. These two values
must also be calculated by the byte-code verifier and checked against the values in the
program header.
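A verifier of this kind can be sketched in a few lines of C. The header layout used here is an
assumption made for the example (a 16-bit big-endian checksum over the body, followed by a
32-bit big-endian length); the text above only states that the header contains a checksum and
a length, so the real ADD header may differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed header layout: 2 bytes checksum + 4 bytes length, big-endian. */
#define ADD_HEADER_SIZE 6u

/* Sum all body bytes with two's complement addition; overflow beyond
 * 16 bits is ignored (the sum simply wraps around). */
static uint16_t add_checksum(const uint8_t *body, size_t len)
{
    uint16_t sum = 0;

    for (size_t i = 0; i < len; i++)
        sum = (uint16_t)(sum + body[i]);
    return sum;
}

/* Returns 1 if the length and checksum stored in the header match the
 * values recomputed from the image, otherwise 0. */
static int add_verify(const uint8_t *image, size_t image_len)
{
    if (image_len < ADD_HEADER_SIZE)
        return 0;

    uint16_t stored_sum = (uint16_t)((image[0] << 8) | image[1]);
    uint32_t stored_len = ((uint32_t)image[2] << 24) | ((uint32_t)image[3] << 16) |
                          ((uint32_t)image[4] << 8) | (uint32_t)image[5];

    if (stored_len != image_len) /* length covers header and body */
        return 0;
    return add_checksum(image + ADD_HEADER_SIZE,
                        image_len - ADD_HEADER_SIZE) == stored_sum;
}
```

Checking the length first ensures that the checksum loop never reads beyond the bytes the
header claims to describe.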
In a token-threaded virtual machine, each execution token is an offset into a table of code
fields. The inner interpreter fetches the execution token pointed to by the instruction pointer
and indexes into the token table, where it fetches the code field address of the word. Parameter
fields are used in various ways, depending on the type of the entry in the token table. The
inner interpreter executes the words that are implemented as primitives and handles
control structures, constants, and variables.
One reason for a threaded approach is that all of the altered bindings are conveniently
contained in a single table. Each task can be provided with a save buffer to hold its version of
the altered token table. To perform a context switch, it is sufficient to copy the token table to
the old task’s save buffer and to copy the new task’s save buffer into the token table. If the
source code of the inner interpreter can be modified, the inner interpreter can instead locate
the token table through an active token-table pointer. This eliminates all the copying: a
context switch then only swaps the pointer to the token table rather than the contents of the
table.
The outer interpreter of the virtual machine parses all byte-codes that are not implemented as
primitives. This is necessary for the byte-code programs themselves and for built-in byte-code
functions, which are implemented in the virtual machine as byte-code and not in C or
assembler. When the outer interpreter encounters such a byte-code, it saves the return address
on the return stack and executes the corresponding primitive function via the inner
interpreter. If the corresponding byte-code is not a primitive, it repeats this step until it
reaches a byte-code that is implemented as a primitive.
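The token-threaded dispatch and the pointer-swap context switch can be illustrated with a small C sketch. All names here (struct vm, struct tt_entry, run, do_halt, do_add_param) are hypothetical and chosen for illustration; a single accumulator stands in for the data stack, and the real ADD inner interpreter may be organized differently.

```c
#include <stdint.h>

struct vm;
typedef void (*code_field)(struct vm *vm);

/* One token-table entry: a code field plus a parameter field. */
struct tt_entry {
    code_field cfa;
    intptr_t parameter;
};

struct vm {
    const uint8_t *ip;            /* instruction pointer into the byte-code */
    const struct tt_entry *table; /* active token-table pointer */
    uint8_t current;              /* token currently being executed */
    intptr_t acc;                 /* stand-in for the data stack */
    int running;
};

/* Two toy primitives: stop the loop, or add this entry's parameter. */
static void do_halt(struct vm *vm) { vm->running = 0; }
static void do_add_param(struct vm *vm)
{
    vm->acc += vm->table[vm->current].parameter;
}

/* Inner interpreter: fetch a token, index the table, call the code field. */
static void run(struct vm *vm, const uint8_t *program)
{
    vm->ip = program;
    vm->running = 1;
    while (vm->running) {
        vm->current = *vm->ip++;
        vm->table[vm->current].cfa(vm);
    }
}
```

In this sketch a context switch is a single assignment to vm->table; the same byte-code then resolves its tokens against the other task's bindings without any table contents being copied.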
As a virtual stack machine, the ADD virtual machine implements two stacks: a data stack and a
return stack. The data stack is used to hold numeric operands. When a number is pushed onto
or popped off the stack, the remaining numbers are not moved. Instead, a pointer is adjusted
to indicate the last used position in a stack memory array. The top-of-stack pointer is kept in
a register. The standard token-table in the ADD virtual machine provides words for simple
manipulation of operands on the stack: SWAP, DUP, DROP, 2SWAP, etc. (see Appendix B for all
functions). In general, the data stack is used to pass parameters to colon definitions or to
hand back return codes.
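A minimal C sketch of such a data stack is shown below. The names (struct data_stack, ds_push, ds_swap, and so on) and the fixed size of 32 cells are assumptions for illustration and are not taken from the ADD sources. Only the index of the last used position changes on push and pop; the stored cells never move.

```c
#include <stdint.h>

#define DS_CELLS 32

struct data_stack {
    intptr_t cells[DS_CELLS]; /* stack memory array */
    int top;                  /* index of the last used position, -1 if empty */
};

static void ds_init(struct data_stack *s) { s->top = -1; }

/* Push one cell; returns -1 on overflow. */
static int ds_push(struct data_stack *s, intptr_t v)
{
    if (s->top + 1 >= DS_CELLS)
        return -1;
    s->cells[++s->top] = v;
    return 0;
}

/* Pop one cell into *v; returns -1 on underflow. */
static int ds_pop(struct data_stack *s, intptr_t *v)
{
    if (s->top < 0)
        return -1;
    *v = s->cells[s->top--];
    return 0;
}

/* DUP: duplicate the top cell. */
static int ds_dup(struct data_stack *s)
{
    if (s->top < 0)
        return -1;
    return ds_push(s, s->cells[s->top]);
}

/* DROP: discard the top cell. */
static int ds_drop(struct data_stack *s)
{
    if (s->top < 0)
        return -1;
    s->top--;
    return 0;
}

/* SWAP: exchange the two topmost cells. */
static int ds_swap(struct data_stack *s)
{
    if (s->top < 1)
        return -1;
    intptr_t t = s->cells[s->top];
    s->cells[s->top] = s->cells[s->top - 1];
    s->cells[s->top - 1] = t;
    return 0;
}
```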
The ADD virtual machine also implements a return stack. Like the data stack, the return
stack is a LIFO list. It is mostly used for system functions of the virtual machine, but may
also be accessed directly by an ADD byte-code program. The return stack serves purposes
such as holding return addresses for nested definitions and loop parameters. Because the return
stack has multiple uses, care must be exercised to avoid conflicts when accessing it directly.
7.5.5 Token-Tables
The ADD virtual machine uses three different token tables. Every token-table has its own
byte-code and application range (see Table 7.2).
• Built-In Token-Table:
This table includes all functionality that is hard-wired into the virtual machine.
An ADD program cannot change this table; it may only read from it.
Built-in byte-codes are implemented in C (the implementation language of the
virtual machine) or in the ADD byte-code language itself.
• Vendor Token-Table:
This table includes all functions that interact with the back-end. Functions like reading
from and writing to the I2C bus are implemented in this table. Furthermore, the vendor
token-table includes functions with real-time and performance requirements. Functions in
this table are never implemented in the byte-code language of the virtual machine, but
always in its implementation language.
• Local Token-Table:
When an ADD program defines its own variables, constants, or colon definitions, a new entry
with the corresponding token value is created in this table. An ADD program can read and
write this table, and its contents differ for every ADD program. In a multi-tasking virtual
machine, every thread must have its own local token-table.
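How a byte-code number selects one of the three tables can be sketched as a simple range check. The boundaries 0x600 and 0x800 below are placeholders chosen purely for illustration; the actual ranges are the ones given in Table 7.2. The enum and function names are hypothetical as well.

```c
#include <stdint.h>

enum add_table {
    ADD_TT_BUILTIN,
    ADD_TT_VENDOR,
    ADD_TT_LOCAL,
    ADD_TT_INVALID
};

/* Placeholder boundaries, NOT taken from Table 7.2. */
#define ADD_VENDOR_FIRST 0x600u
#define ADD_LOCAL_FIRST  0x800u
#define ADD_TOKEN_LIMIT  0x1000u /* a two-byte ADD# starts with 0x01..0x0F */

/* Map a byte-code number to the table that holds its entry. */
static enum add_table add_table_for(uint16_t token)
{
    if (token >= ADD_TOKEN_LIMIT)
        return ADD_TT_INVALID;
    if (token >= ADD_LOCAL_FIRST)
        return ADD_TT_LOCAL;
    if (token >= ADD_VENDOR_FIRST)
        return ADD_TT_VENDOR;
    return ADD_TT_BUILTIN;
}

/* Only the local table may be modified by an ADD program. */
static int add_table_writable(enum add_table t)
{
    return t == ADD_TT_LOCAL;
}
```

Keeping the write check separate from the lookup makes it easy to reject any attempt by a program to define a word in the built-in or vendor range.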
• Constant-Doer:
The value of a constant is stored in the parameter field of the token-table entry.
When a constant word is executed, this doer takes the value from the parameter field and
puts it onto the data stack.
/**
 * Doer code to handle constants in ADD byte-code programs.
 */
void
add_do_con(type_w fcode, type_c **ip, struct add_tt_entry *ttp)
{
#ifdef __DEBUG__
    printk("{%s}", ttp[fcode].name);
#endif
    add_check_stack(0, 1);
    {
        cell tmp;
        /* The constant's value is kept in the parameter field. */
        tmp.u = ttp[fcode].parameter;
        add_push(tmp);
    }
    return;
}
Listing 7.1: Constant-Doer
• Variable-Doer:
When a variable is created, the virtual machine allocates the needed memory and stores
the address of this memory region in the parameter field of its token-table entry.
When a variable word is executed, this doer takes the address from the parameter field and
puts it onto the data stack.
/**
 * Doer code to handle variables in ADD byte-code programs.
 */
void
add_do_var(type_w fcode, type_c **ip, struct add_tt_entry *ttp)
{
#ifdef __DEBUG__
    printk("{%s}", ttp[fcode].name);
#endif
    add_check_stack(0, 1);
    {
        cell tmp;
        /* Push the address of this entry's parameter field. */
        tmp.u = (type_l)(&(ttp[fcode].parameter));
        add_push(tmp);
    }
    return;
}
Listing 7.2: Variable-Doer
• Colon Definition-Doer:
For a colon definition, the virtual machine stores the start address of its byte-code
in the code field address (CFA) field of the corresponding token-table entry. The doer
for colon definitions must push the current instruction pointer onto the return stack and
set it to the address stored in the CFA field. After that, the outer interpreter executes
the colon definition until it is completed. Finally, the inner interpreter restores the
instruction pointer from the return stack.
/**
 * Doer code to handle colon definitions in ADD byte-code
 * programs.
 */
void
add_do_col(type_w fcode, type_c **ip, struct add_tt_entry *ttp)
{
#ifdef __DEBUG__
    printk("{%s}", ttp[fcode].name);
#endif
    add_check_stack_r(0, 1);
    {
        cell tmp;
        /* Save the current instruction pointer on the return stack. */
        tmp.u = (type_u) *ip;
        add_push_r(tmp);
    }
    /* Continue at the start of the colon definition. */
    (*ip) = ttp[fcode].cfa;
    return;
}
Listing 7.3: Colon Definition-Doer
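The interplay between entering and leaving a colon definition can be shown with a stand-alone C sketch. It does not use the thesis types: struct mini_vm, colon_enter, and colon_exit are hypothetical names, and the return stack is reduced to an array of saved instruction pointers.

```c
#include <stdint.h>

#define RS_CELLS 16

struct mini_vm {
    const uint8_t *ip;               /* instruction pointer */
    const uint8_t *rstack[RS_CELLS]; /* return stack */
    int depth;                       /* number of saved return addresses */
};

/* Enter a colon definition: save the return address, jump to the body. */
static int colon_enter(struct mini_vm *vm, const uint8_t *body)
{
    if (vm->depth >= RS_CELLS)
        return -1; /* return stack overflow */
    vm->rstack[vm->depth++] = vm->ip;
    vm->ip = body;
    return 0;
}

/* Leave a colon definition: restore the saved instruction pointer. */
static int colon_exit(struct mini_vm *vm)
{
    if (vm->depth == 0)
        return -1; /* return stack underflow */
    vm->ip = vm->rstack[--vm->depth];
    return 0;
}
```

Because each call pushes exactly one address and each exit pops exactly one, nested definitions unwind in the reverse order of their invocation.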
7.5.7 Back-End
Every operating system has different functions or calls to use the hardware. To keep the
design modular, the back-end forms the interface between the virtual machine and the kernel
of the running operating system. All functions that use the back-end are stored in the vendor
token-table, because they are typically implemented as primitives that use kernel-specific
functions. It is also possible to implement custom functionality in the back-end and in the
vendor token-table. The benefit of programming functionality as a primitive in the vendor
token-table, and not in the byte-code language itself, is that the routine can be optimized.
This is necessary when program code has to meet performance or real-time requirements.
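One common way to build such an interface in C is a structure of function pointers that each operating system fills in with its own routines. The sketch below is an assumption about how a back-end could be shaped, not the interface used in the thesis: struct add_backend, the in-memory fake_bus, and vendor_i2c_copy are hypothetical names.

```c
#include <stdint.h>

/* Operations the virtual machine expects from any back-end (hypothetical). */
struct add_backend {
    int (*i2c_write)(void *ctx, uint8_t addr, uint8_t value);
    int (*i2c_read)(void *ctx, uint8_t addr, uint8_t *value);
    void *ctx; /* back-end private state */
};

/* Stand-in back-end: one byte of storage per 7-bit address. */
struct fake_bus {
    uint8_t reg[128];
};

static int fake_write(void *ctx, uint8_t addr, uint8_t value)
{
    struct fake_bus *bus = ctx;
    if (addr >= 128)
        return -1;
    bus->reg[addr] = value;
    return 0;
}

static int fake_read(void *ctx, uint8_t addr, uint8_t *value)
{
    struct fake_bus *bus = ctx;
    if (addr >= 128)
        return -1;
    *value = bus->reg[addr];
    return 0;
}

/* A vendor primitive talks only to the interface, never to a kernel. */
static int vendor_i2c_copy(const struct add_backend *be,
                           uint8_t from, uint8_t to)
{
    uint8_t value;
    if (be->i2c_read(be->ctx, from, &value) != 0)
        return -1;
    return be->i2c_write(be->ctx, to, value);
}
```

Porting the virtual machine to another kernel would then mean supplying a different struct add_backend, while the vendor primitives stay unchanged.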
7.6 Byte-Code
A programmer is greatly influenced by the language in which programs are written;
there is an overwhelming tendency to prefer constructions that are simplest in that
language, rather than those that are best for the machine. By understanding a
machine-oriented language, the programmer will tend to use a much more efficient
method; it is much closer to reality.
The ADD byte-code header data type appears only at the beginning of an ADD program
following one of the functions start0, start1, start2, or start4. It contains information
about the ADD program as a whole. That information is provided for the benefit of external
software that may wish to characterize the ADD program. A standard ADD virtual machine
is permitted to skip and ignore the ADD byte-code header information, or to use it to verify
that the ADD program is intact.
The following byte-code formats are used to encode ADD programs. An ADD program consists
of a sequence of bytes, which are read as byte-code numbers (ADD#). Some ADD#
use additional bytes to represent the byte-code number. Those functions are recognized
during interpretation of the ADD program. Some byte-codes use arguments to control the
interpretation in the virtual machine or the compilation with a tokenizer.
ADD#
The byte values 0x00 and 0x10 ... 0xFF each encode an ADD# with a size of one byte. A first
byte in the range 0x01 ... 0x0F encodes a two-byte ADD#.
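A decoder for this encoding could look like the following C sketch. The function name add_decode_number is hypothetical, and the sketch assumes that the first byte of a two-byte ADD# is the high-order byte; the text does not state the byte order.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one ADD# starting at p.
 * Returns the number of bytes consumed (1 or 2), or 0 if the input
 * ends before the byte-code number is complete. */
static size_t add_decode_number(const uint8_t *p, size_t avail, uint16_t *out)
{
    if (avail == 0)
        return 0;
    if (p[0] >= 0x01 && p[0] <= 0x0F) {
        /* Two-byte form; assumed order: high byte first. */
        if (avail < 2)
            return 0;
        *out = (uint16_t)((p[0] << 8) | p[1]);
        return 2;
    }
    /* One-byte form: 0x00 and 0x10..0xFF. */
    *out = p[0];
    return 1;
}
```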
ADD-num32
ADD-string
ADD-string encodes a text string. The byte value 0x12 encodes a string where the first byte
(count-byte) is the length of the string (0 to 255), not including the count byte. Subsequent
bytes are the bytes of the string.
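Reading such a counted string can be sketched in C as shown below. The function name add_decode_string is hypothetical, and the sketch assumes that p already points at the count byte, that is, past the byte-code that introduces the string.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy a counted string into dst as a NUL-terminated C string.
 * p points at the count byte. Returns the number of bytes consumed
 * (count byte plus string bytes), or 0 if the input is truncated or
 * dst is too small. */
static size_t add_decode_string(const uint8_t *p, size_t avail,
                                char *dst, size_t dst_size)
{
    if (avail < 1)
        return 0;
    size_t len = p[0]; /* 0 to 255, not including the count byte */
    if (avail < 1 + len || dst_size < len + 1)
        return 0;
    memcpy(dst, p + 1, len);
    dst[len] = '\0';
    return 1 + len;
}
```

The return value tells the caller how far to advance the instruction pointer so that interpretation continues with the byte after the string.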
ADD-offset
ADD-offset encodes an 8-bit signed (two’s complement) offset or a 16-bit signed (two’s
complement) offset. An ADD-offset specifies the number of bytes in the ADD program between two
corresponding components of a control flow construct.
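The sign handling of both offset widths can be made explicit with a short C sketch. The function name add_decode_offset and the wide flag are hypothetical, and the 16-bit form is assumed to be stored with the high-order byte first, which the text does not specify.

```c
#include <stdint.h>

/* Decode a signed ADD-offset.
 * wide == 0: one byte, 8-bit two's complement.
 * wide != 0: two bytes, 16-bit two's complement, assumed high byte first. */
static int32_t add_decode_offset(const uint8_t *p, int wide)
{
    int32_t v;
    if (wide) {
        v = (p[0] << 8) | p[1];
        if (v >= 0x8000)
            v -= 0x10000; /* values with the top bit set are negative */
    } else {
        v = p[0];
        if (v >= 0x80)
            v -= 0x100;
    }
    return v;
}
```

A backward branch, for example at the end of a loop, is then simply a negative result added to the instruction pointer.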
Inter-Integrated Circuit (I2C) is a serial computer bus invented by Philips. It is used to con-
nect low-speed peripherals in an embedded system or motherboard. The original system was
created in the early 1980s as a battery control interface, but it was later used as a simple
internal bus system for building control electronics with various Philips chips. I2C uses only
two bi-directional pins, clock and data, both running at +5V and pulled high with resistors.
The bus operates in several modes, the most common being the 100 kbit/s standard mode and
a 10 kbit/s low-speed mode. Clock frequencies down to zero are allowed. Buses of this type
became popular when engineers realized that much of the expense of an integrated circuit re-
sults from the size of the package and the number of pins. A large package has more pins, thus
more assembly steps when manufactured, more area on a printed circuit board, more weight,
and more connections to fail. All of those cost money to make, assemble and test, and can
increase operational expenses (fuel), or decrease convenience (weight is critical in cell-phones,
for example). A particular strength of I2C is that a microcontroller can control a network of
device chips with just two general-purpose I/O pins and software. Over 1000 master and/or
slave devices (depending on the mode used) can co-exist on the same two-line bus.
Although much slower than most bus systems, the low expense is excellent for peripherals
that have to exist, but need not be fast. The bus is often used for built-in-tests, volume,
tone and color balance controls, low-speed analog-to-digital and digital-to-analog controllers,
real-time-clocks, small non-volatile memories (used to preserve user-settable options), control
of clock-generators (for computers that can vary their clock speeds) and integrated circuits
that combine a shift-register and power transistors. Chips can also be added or removed from
the bus while the system is running, which makes I2C ideal for environments requiring hot
swappable components. The basic bus has a seven-bit address space, allowing up to 112 nodes
on one bus (16 of the 128 addresses are reserved). In 1992 the first standardized version was
released, version 1.0. This added a new fast mode at 400 kbit/s and a ten-bit addressing mode
to support up to 1024 nodes. The version 2.0 from 1998 added high-speed mode at 3.4 Mbit/s,
while reducing the voltage and current requirements when run in that mode (thus saving power
as well as being faster). The latest version, 2.1 from 2001, is a minor cleanup of version 2.0. The
System Management Bus or SMBus is similar to the I2C bus, but with differences in clock
frequency range and voltage levels, and an optional extra interrupt-request wire.
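The figure of 112 usable nodes in seven-bit addressing can be checked with a few lines of C. The sketch relies on the reserved ranges of the I2C specification, 0x00 to 0x07 and 0x78 to 0x7F; the function names are hypothetical.

```c
#include <stdint.h>

/* Reserved 7-bit addresses: 0x00..0x07 and 0x78..0x7F (16 in total).
 * addr is expected to be a 7-bit value. */
static int i2c_addr_reserved(uint8_t addr)
{
    return addr <= 0x07 || (addr >= 0x78 && addr <= 0x7F);
}

/* Count the addresses that remain for ordinary devices. */
static int i2c_count_usable(void)
{
    int n = 0;
    for (uint8_t a = 0; a < 128; a++)
        if (!i2c_addr_reserved(a))
            n++;
    return n;
}
```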
I2C is commonly used in embedded systems so different components can communicate. For
example: PC motherboards use I2C to talk to different sensor chips. Those sensors typically
report back fan speeds, processor temperatures and a whole raft of system hardware informa-
tion. The protocol also is used in some RAM chips to report information about the DIMM
itself back to the operating system.
The I2C kernel code is split into a number of logical components: I2C core¹, I2C adapter
drivers, I2C algorithm drivers, and I2C chip drivers.
One main goal of Agnostic Device Driver is to control I2C chip devices. The ADD virtual
machine borrows and uses functionality of the operating system in which it is running. This
means that ADD needs a binding from its own I2C functions to the I2C functions of the
operating system. In the case of Linux, the best option was to implement the ADD virtual
machine as an I2C chip driver. Such an I2C chip driver can have several clients, which control
and talk to the I2C chip devices.
The i2c_driver structure describes an I2C chip driver for the ADD virtual machine. This
structure is defined in the include/linux/i2c.h file. Only the following fields are necessary
to create a working chip driver:
¹ The I2C core component is not a part of this diploma thesis.
• struct module *owner: set to the value THIS_MODULE, which allows proper module
reference counting.
• char name[I2C_NAME_SIZE]: set to a descriptive name of the I2C chip driver. This
value shows up in the sysfs file name created for every I2C chip device.
• unsigned int flags: set to the value I2C_DF_NOTIFY so that the chip driver is
notified of any new I2C devices loaded after this driver is loaded. This field will probably
go away soon, as almost all drivers set it.
• int (*attach_adapter)(struct i2c_adapter *): called whenever a new I2C bus
driver is loaded in the system. This function is described in more detail below.
• int (*detach_client)(struct i2c_client *): called when the i2c_client device
is to be removed from the system. More information about this function is provided
below.
The following code is from the I2C chip driver of the ADD virtual machine. It shows how the
struct i2c_driver structure is set up:
struct i2c_driver add_driver = {
    .owner          = THIS_MODULE,
    .name           = "add",
    .flags          = I2C_DF_NOTIFY,
    .attach_adapter = add_attach_adapter,
    .detach_client  = add_detach_client,
};
Listing 7.4: i2c_driver structure used for the I2C Chip Driver
After the I2C chip driver has been registered in __init add_init(void) by calling
i2c_add_driver with the i2c_driver structure as parameter, the attach_adapter function is
called whenever an I2C bus driver is loaded. Normally, this function checks whether any I2C
devices are present on the I2C bus to which the client driver wants to attach. Almost all I2C
chip drivers call the core I2C function i2c_detect to determine this. The i2c_detect function
takes a function pointer to the chip detection routine of the respective chip driver, which is
called if a responding client is found. The ADD virtual machine cannot use this function for
I2C device detection, because it was designed for sensors and not for use in a virtual machine.
Instead, the attach_adapter function exports the i2c_adapter structure so that the inner
interpreter can use it. Because the i2c_detect function is not usable, the inner interpreter
has to provide this functionality itself. The ADD byte-code function i2c-ping implements it.
The usual way to write an I2C chip driver in the ADD byte-code language is the following:
1. Use i2c-ping to detect whether a responding client with the address addr is attached to
the I2C bus.
2. Create a client with i2c-new which is attached to address addr. The function i2c-new
returns a client-handle. This client-handle should be stored in a variable so that it can
be reused.
4. When the client is no longer needed, the allocated memory can be freed with i2c-delete.
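The life cycle in the steps above can be mirrored in C with stand-in functions. This is only an illustration of the control flow: mock_ping, mock_new, and mock_delete are hypothetical substitutes for the byte-code words i2c-ping, i2c-new, and i2c-delete, and the bus is simulated by an array instead of real hardware.

```c
#include <stdint.h>
#include <stdlib.h>

/* Simulated bus: present[a] is 1 if a device acknowledges address a. */
struct mock_bus {
    uint8_t present[128];
};

/* Client handle as it might be returned by i2c-new. */
struct mock_client {
    uint8_t addr;
};

static int mock_ping(const struct mock_bus *bus, uint8_t addr)
{
    return addr < 128 && bus->present[addr];
}

static struct mock_client *mock_new(uint8_t addr)
{
    struct mock_client *c = malloc(sizeof *c);
    if (c != NULL)
        c->addr = addr;
    return c;
}

static void mock_delete(struct mock_client *c)
{
    free(c);
}

/* Ping, create the client, and release it again.
 * Returns 0 on success, -1 if no device answers or allocation fails. */
static int probe_and_release(const struct mock_bus *bus, uint8_t addr)
{
    if (!mock_ping(bus, addr))
        return -1;
    struct mock_client *client = mock_new(addr);
    if (client == NULL)
        return -1;
    /* The stored client handle would be used here. */
    mock_delete(client);
    return 0;
}
```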
This concept still leaves room for improvement and further opportunities. Applications with
real-time or performance requirements can be implemented in C or assembler and integrated
into the inner interpreter as vendor token-table functions. The only overhead is the few
cycles needed to fetch the byte-code and to look up the corresponding function pointer in the
vendor table. The virtual machine executes only one program, which it receives as a pointer to
the byte-code. If the virtual machine needs to run threads, some extra source code must be
added which saves the current instruction pointer and the pointer to the currently used
token-table. The local token-table must also be saved or protected from being overwritten,
because this table differs for every program. If thread support is integrated, the virtual
machine will also need some options to control scheduling. A byte-code program could use
properties (for the virtual machine) in the device tree to control the scheduling. For
example, these properties can specify that the byte-code program is only executed during
start-up of the operating system to initialize or deactivate hardware components. The
properties can also tell the virtual machine that the byte-code program wants to be executed
every ten seconds with a high or low priority. Writing drivers can take some time, especially
for temperature control algorithms. If the virtual machine is placed not only in the kernel of
the operating system but also in Open Firmware, a firmware programmer can make use of ADD
byte-code programs, too.
Chapter 8
Conclusions
At present, several companies are competing to push their firmware specifications to a
level where they are accepted as a de facto standard and are in common use. This step is
logical, because the firmware forms the interface between the hardware and the operating
system. The company that has the main control of a firmware standard can steer which hardware
and operating systems are chosen or what a motherboard layout looks like. This is one reason
why Intel wants to see the Extensible Firmware Interface on almost every computer system.
A major problem of Open Firmware is that the working group has sunk into hibernation.
Supplements that extend Open Firmware for new hardware do not exist. Companies that use
Open Firmware are drifting apart in implementation and strategy. Technologies like Agnostic
Device Driver can influence the direction, but it is also necessary to reactivate the Open
Firmware working group.
As shown in the last chapter, Agnostic Device Driver is a platform-independent concept and
works well on nearly every operating system and hardware. It is fast and easy to understand.
Nevertheless, an open source firmware implementation is needed which can be used
by everybody who is interested in it. Such an open source firmware implementation should not
include complex software layers. The motto is: “keep it simple, small, and beautiful.” An open
source community wants to have a piece of software which gets the best out of the hardware.
In the future, the role of boot firmware will grow, and it will surely be interesting to see
how things develop. Of course, I will continue to work with the PowerPC architecture and the
Linux/PPC64 kernel, and will certainly keep an eye on Open Firmware.
Appendix A
Glossary
This glossary contains an alphabetical list of terms, phrases, and abbreviations used in this
diploma thesis.
CI Client Interface
DI Device Interface
IU Integer Unit
OF Open Firmware
UI User Interface
Appendix B
Appendix B. ADD Byte-Code Functions
Bibliography
[1] Adam Agnew, Adam Sulmicki, Ronald Minnich, and William Arbaugh. Flexibility in
ROM: A Stackable Open Source BIOS. In Proceedings of the FREENIX Track: 2003
USENIX Annual Technical Conference, pages 115–124, 2003.
[2] Apple Computer, Inc. Technical Note 1061: Fundamentals of Open Firmware, Part I:
The User Interface. Apple Developer Documentation, 2004.
[3] Apple Computer, Inc. Technical Note 1062: Fundamentals of Open Firmware, Part II:
The Device Tree. Apple Developer Documentation, 2004.
[4] Edward K. Conklin and Elizabeth D. Rather. Forth Programmer’s Handbook. FORTH,
Inc., August 2000.
[5] M. Anton Ertl. Threaded Code Variations and Optimizations. In EuroForth 2001 Con-
ference Proceedings, pages 49–55, 2001.
[6] IBM Corporation. PowerPC Architecture Book, Book I: PowerPC User Instruction Set
Architecture, September 2003.
[7] IBM Corporation. PowerPC Architecture Book, Book II: PowerPC Virtual Environment
Architecture, September 2003.
[8] IBM Corporation. PowerPC Architecture Book, Book III: PowerPC Operating Environ-
ment Architecture, September 2003.
[11] IEEE Std 1275-1994. IEEE Standard for Boot (Initialization Configuration) Firmware,
1994.
[12] IEEE Std 1275-1994. PCI Bus Binding to: IEEE Standard for Boot (Initialization Con-
figuration) Firmware, Auf. 1998.
[14] Elizabeth D. Rather, Donald R. Colburn, and Charles H. Moore. The Evolution of Forth.
SIGPLAN Not., pages 177–199, 1993.
[15] Jon Stokes. A Brief Look at the IBM PowerPC 970. Ars Technica!, October 2002.
[16] Jon Stokes. Inside the IBM PowerPC 970, Part I: Design Philosophy and Front End. Ars
Technica!, October 2002.
[17] Jon Stokes. Inside the IBM PowerPC 970, Part II: The Execution Core. Ars Technica!,
May 2003.
[18] Antony Stone. The LinuxBIOS project: Putting Linux on your motherboard. Linux
Magazine, pages 76–80, March 2003.
[19] Sun Microsystems, Inc. Writing FCode 3.x Programs, February 2000.