Programming Embedded Systems - 2nd Edition With C and GNU Development Tools
Foreword
If you mention the word embedded to most people, they'll assume you're talking about reporters in a war
zone. Few dictionaries, including the canonical Oxford English Dictionary, link embedded to
computer systems. Yet embedded systems underlie nearly all of the electronic devices used today, from
cell phones to garage door openers to medical instruments. By now, it's nearly impossible to build
anything electronic without adding at least a small microprocessor and associated software.
Vendors produce some nine billion microprocessors every year. Perhaps 100 or 150 million of those go
into PCs. That's only about one percent of the units shipped. The other 99 percent go into embedded
systems; clearly, this stealth business represents the very fabric of our highly technological society.
And use of these technologies will only increase. Solutions to looming environmental problems will
surely rest on the smarter use of resources enabled by embedded systems. One only has to look at the
network of 32-bit processors in Toyota's hybrid Prius to get a glimpse of the future.
Preface
First figure out why you want the students to learn the subject and what you want them to know, and the
method will result more or less by common sense.
Richard Feynman
Embedded software is in almost every electronic device in use today. There is software hidden away
inside our watches, DVD players, mobile phones, antilock brakes, and even a few toasters. The military
uses embedded software to guide missiles, detect enemy aircraft, and pilot UAVs. Communication
Intended Audience
This is a book about programming embedded systems in C. As such, it assumes that the reader already
has some programming experience and is at least familiar with the syntax of the C language. It also
helps if you have some familiarity with basic data structures, such as linked lists. The book does not
assume that you have a great deal of knowledge about computer hardware, but it does expect that you
are willing to learn a little bit about hardware along the way. This is, after all, a part of the job of an
embedded programmer.
While writing this book, we had two types of readers in mind. The first reader is a beginner, much as
we were once. He has a background in computer science or engineering and a few years of programming
Organization
The book contains 14 chapters and 5 appendixes. The chapters can be divided quite nicely into two
parts. The first part consists of Chapters 1 through 5 and is intended mainly for newcomers to embedded
systems. These chapters should be read in their entirety and in the order that they appear. This will bring
you up to speed quickly and introduce you to the basics of embedded software development. After
completing Chapter 5, you will be ready to develop small pieces of embedded software on your own.
The second part of the book consists of Chapters 6 through 14 and discusses advanced topics that are of
interest to inexperienced and experienced embedded programmers alike. These chapters are mostly self-contained and can be read in any order. In addition, Chapters 6 through 12 contain example programs
that might be useful to you on a future embedded software project.
Chapter 1, Introduction
Explains the field of embedded programming and lays out the parameters of the book, including
the reference hardware used for examples
Chapter 6, Memory
Describes the different types of memory that developers choose for embedded systems and the
issues involved in using each type
Chapter 7, Peripherals
Introduces the notion of a device driver, along with other coding techniques for working with
devices
Chapter 8, Interrupts
Covers this central area of working with peripherals
Italic
Indicates names of files, programs, methods, and options when they appear in the body of a
paragraph. Italic is also used for emphasis and to introduce new terms.
Constant Width
In examples, indicates the contents of files and the output of commands. In regular text, this style
indicates keywords, functions, variable names, classes, objects, parameters, and other code
snippets.
Constant Width Italic
Indicates text to be replaced with user values; for example, a filename on your system. This style
is used in examples only.
This symbol is used to indicate a tip, suggestion, or general note.
Other conventions relate to gender and roles. With respect to gender, we have purposefully used both
"he" and "she" throughout the book. With respect to roles, we have occasionally distinguished between
the tasks of hardware engineers, embedded software engineers, and application programmers. But these
titles refer only to roles played by individual engineers, and it should be noted that it can and often does
happen that a single individual fills more than one of these roles on an embedded-project team.
Chapter 1. Introduction
I think there is a world market for maybe five computers.
Thomas Watson, Chairman of IBM, 1943
There is no reason anyone would want a computer in their home.
Ken Olson, President of Digital Equipment Corporation, 1977
One of the more surprising developments of the last few decades has been the ascendance of computers
to a position of prevalence in human affairs. Today there are more computers in our homes and offices
than there are people who live and work in them. Yet many of these computers are not recognized as
such by their users. In this chapter, we'll explain what embedded systems are and where they are found.
We will also introduce the subject of embedded programming and discuss what makes it a unique form
of software programming. We'll explain why we have selected C as the language for this book and
describe the hardware used in the examples.
Real-time system design is not simply about speed. Deadlines for real-time systems vary; one deadline
might be in a millisecond, while another is an hour away. The main concern for a real-time system is
that there is a guarantee that the hard deadlines of the system are always met. In order to accomplish this,
the system must be predictable.
The architecture of the embedded software, and its interaction with the system hardware, play a key role
in ensuring that real-time systems meet their deadlines. Key software design issues include whether
polling is sufficient or interrupts should be used, and what priorities should be assigned to the various
tasks and interrupts. Additional forethought must go into understanding the worst-case performance
requirements of the specific system activities.
All of the topics and examples presented in this book are applicable to the designers of real-time
systems. The designer of a real-time system must be more diligent in his work. He must guarantee
reliable operation of the software and hardware under all possible conditions. And, to the degree that
human lives depend upon the system's proper execution, this guarantee must be backed by engineering
calculations and descriptive paperwork.
Figure 1-3. (a) Basic embedded software diagram and (b) a more complex embedded software
diagram
Both the basic embedded software diagram in Figure 1-3(a) and the more complex embedded software
diagram in Figure 1-3(b) contain very similar blocks. The hardware block is common in both diagrams.
The device drivers are embedded software modules that contain the functionality to operate the
individual hardware devices. The reason for the device driver software is to remove the need for the
application to know how to control each piece of hardware. Each individual device driver would
typically need to know only how to control its hardware device. For instance, for a microwave oven,
separate device drivers control the keypad, display, temperature probe, and radiation control.
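As a purely hypothetical illustration (the names below do not come from any real product), the keypad driver in such an oven might expose an interface like the following to the application, keeping all register-level details inside the driver itself:

/* keypad.h - hypothetical keypad driver interface. */
#ifndef KEYPAD_H
#define KEYPAD_H

void keypadInit(void);          /* Configure the keypad hardware.     */
int  keypadKeyAvailable(void);  /* Nonzero if a keypress is waiting.  */
char keypadRead(void);          /* Return the next key code.          */

#endif /* KEYPAD_H */

The application calls these functions without knowing which processor pins or registers are involved; only the driver's implementation file needs that knowledge.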
If more functionality is required, it is sometimes necessary to include additional layers in the embedded
software to assist with this added functionality. In this example, the complex diagram includes a real-time operating system (RTOS) and a networking stack. The RTOS can help the programmer separate the
application's functionality into distinct tasks for better organization of the application software and a
more responsive system. We will investigate the use of an RTOS later in this book. The network stack
Processing power
The workload that the main chip can handle. A common way to compare processing power is the
millions of instructions per second (MIPS) rating. If two otherwise similar processors have
ratings of 25 MIPS and 40 MIPS, the latter is said to be the more powerful. However, other
important features of the processor need to be considered. One is the register width, which
typically ranges from 8 to 64 bits. Today's general-purpose computers use 32- and 64-bit
processors exclusively, but embedded systems are still mainly built with less costly 4-, 8-, and
16-bit processors.
Memory
The amount of memory (ROM and RAM) required to hold the executable software and the data
it manipulates. Here the hardware designer must usually make his best estimate up front and be
prepared to increase or decrease the actual amount as the software is being developed. The
amount of memory required can also affect the processor selection. In general, the register width
of a processor establishes the upper limit of the amount of memory it can access (e.g., a 16-bit
address register can address only 64 KB (2^16) memory locations). [*]
The narrower the register width, the more likely it is that the processor employs tricks such as
multiple address spaces to support more memory. There are still embedded systems that do the
job with a few hundred bytes. However, several thousand bytes is a more likely minimum, even
on an 8-bit processor.
Number of units
The expected production run. The trade-off between production cost and development cost is
affected most by the number of units expected to be produced and sold. For example, it rarely
makes sense to develop custom hardware components for a low-volume product.
Power consumption
The amount of power used during operation. This is extremely important, especially for battery-powered portable devices. A common metric used to compare the power requirements of
portable devices is mW/MIPS (milliwatts per MIPS); the greater this value, the more power is
required to get work done. Lower power consumption can also lead to other favorable device
characteristics, such as less heat, smaller batteries, less weight, smaller size, and simpler
mechanical design.
Development cost
The cost of the hardware and software design processes, known as nonrecurring engineering
(NRE). This is a fixed, one-time cost, so on some projects, money is no object (usually for high-volume products), whereas on other projects, this is the only accurate measure of system cost
(for the production of a small number of units).
Lifetime
How long the product is expected to stay in use. The required or expected lifetime affects all
sorts of design decisions, from the selection of hardware components to how much system
development and production is allowed to cost. How long must the system continue to function
(on average)? A month, a year, or a decade?
Reliability
                      Low               Medium                     High
Processor             4- or 8-bit       16-bit                     32- or 64-bit
Memory                < 64 KB           64 KB to 1 MB              > 1 MB
Development cost      < $100,000        $100,000 to $1,000,000     > $1,000,000
Production cost       < $10             $10 to $1,000              > $1,000
Number of units       < 100             100 to 10,000              > 10,000
Power consumption     > 10 mW/MIPS      1 to 10 mW/MIPS            < 1 mW/MIPS
Lifetime                                Years                      Decades
Reliability                                                        Must be fail-proof
For example, Atari and Nintendo have designed some of their systems this way.
Hardware knowledge
The embedded software developer must become intimately familiar with the integrated circuits,
the boards and buses, and the attached devices used in order to write solid embedded software
(also called firmware). Embedded developers shouldn't be afraid to dive into the schematics,
grab an oscilloscope probe, and start poking around the circuit to find out what is going on.
Efficient code
Because embedded systems are typically designed with the least powerful and most cost-effective processor that meets the performance requirements of the system, embedded software
developers must make every line of code count. The ability to write efficient code is a great
quality to possess as a firmware developer.
Peripheral interfaces
At the lowest level, firmware is very specialized, because each component or circuit has its own
activity to perform and, furthermore, its own way of performing that activity. Embedded
developers need to know how to communicate with the different devices or peripherals in order
to have full control of the devices in the system. Reacting to stimuli from external peripherals is
a large part of embedded software development.
For example, in one microwave oven, the firmware might get the data from a temperature sensor
by reading an 8-bit register in an external analog-to-digital converter; in another system, the data
might be extracted by controlling a serial bus that interfaces to the external sensor circuit via a
single wire.
Robust code
In most cases, embedded systems are expected to run for years. This is not a
typical requirement for software applications written for a PC or Mac. Now, there are exceptions.
However, if you had to keep unplugging your microwave in order to get it to heat up your lunch
for the proper amount of time, it would probably be the last time you purchased a product from
that company.
Minimal resources
Reusable software
As we mentioned before, code portability or code reuse, writing software so that it can be
moved from one hardware platform to another, is very useful to aid the transition to new
projects. This cannot always be done; we have seen how individual each embedded system is.
Throughout this book, we will look at basic methods to ensure that your embedded code can be
moved more easily from project to project. So if your next project uses an LCD for which you've
previously developed a driver, you can drop in the old code and save some precious time in the
schedule.
Development tools
The tools you will use throughout your career as an embedded developer will vary from
company to company and often from project to project. This means you will need to learn new
tools as you continue in your career. Typically, these tools are not as powerful or as easy to use
as those used in PC software development.
The debugging tools you might come across could vary from a simple LED to a full-blown in-circuit emulator (ICE). This requires you, as the firmware developer, and the one responsible for
debugging your code, to be very resourceful and have a bag of techniques you can call upon
when the debug environment is lacking. Throughout the book, we will present different "low-level software tools" you can implement with little impact on the hardware design.
we generally expect our compiler to generate the most efficient code possible, whether that
makes the loop counter an 8-, 16-, 32-, or even 64-bit quantity.
As long as the integer is wide enough to hold the maximum value (N, in the example just
shown), we want the processor to be used in the most efficient way. And that's precisely what
the ISO C and C++ standards tell the compiler writer to do: choose the most efficient integer
size that will fulfill the specific request. Because of the variable size of integers on different
processors and the corresponding flexibility of the language standards, the previous code may
result in a 32-bit integer with one compiler but a 16-bit integer with another, possibly even
when the very same processor is targeted.
But in many other programming situations, integer size matters. Embedded programming, in
particular, often involves considerable manipulation of integer data of fixed widths.
In hindsight, it sure would've been nice if the authors of the C standard had defined some
standard names and made compiler providers responsible for providing the appropriate
typedef for each fixed-size integer type in a library header file. Alternatively, the C standard
could have specified that each of the types short, int, and long has a standard width on all
platforms; but that might have had an impact on performance, particularly on 8-bit processors
that must implement 16- and 32-bit additions in multi-instruction sequences.
Interestingly, it turns out the 1999 update to the International Organization for
Standardization's (ISO) C standard (also referred to as C99) did just that. The ISO has finally
put the weight of its standard behind a preferred set of names for signed and unsigned fixed-size integer data types. The newly defined type names are:
8-bit: int8_t, uint8_t
16-bit: int16_t, uint16_t
32-bit: int32_t, uint32_t
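With a C99 toolchain, these names come from the <stdint.h> header, so fixed-width data can be declared portably. A brief sketch follows; the register address below is a made-up placeholder, not a real device:

#include <stdint.h>

uint8_t  status;        /* Always exactly 8 bits wide.  */
int16_t  temperature;   /* Always exactly 16 bits wide. */
uint32_t timestamp;     /* Always exactly 32 bits wide. */

/* A hypothetical 16-bit memory-mapped peripheral register. */
#define DEVICE_CONTROL_REG  (*((uint16_t volatile *)0x10000000))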
The processor on the VIPER-Lite board is the PXA255 XScale processor, which is based on the
ARM v.5TE architecture. The XScale processor was developed by an Intel Corporation embedded
systems division that was sold to Marvell Technology Group in July 2006.
If you have access to the reference hardware, you will be able to work through the examples in the book
as they are presented. Otherwise, you will need to port the example code to an embedded platform that
you do have access to. Toward that end, we have made every effort to make the example programs as
portable as possible. However, the reader should bear in mind that the hardware is different in each
embedded system and that some of the examples might be meaningless on hardware different from the
hardware we have chosen here. For example, it wouldn't make sense to port our flash memory driver to a
board that had no flash memory devices.
Although we will get into some basic details about hardware, the main focus of this book is embedded
software. We recommend that you take a look at Designing Embedded Systems by John Catsoulis
(O'Reilly). John has an extensive background on the subject and does a wonderful job presenting often
difficult material in a very understandable way. It makes a great companion for this book.
For example, imagine that you are a software developer on a design team building a print server. You
have just received an early prototype board from the hardware designers. The purpose of the board is to
share a printer among several computers. The hardware reads data from a network connection and sends
that data to a printer for output. The print server must mediate between the computers and decide which
computer from the network gets to send data to the printer. Status information also flows in the opposite
direction to the computers on the network.
Though the purpose of most systems is self-explanatory, the flow of the data might not be. We often find
that a block diagram is helpful in achieving rapid comprehension. If you are lucky, the documentation
provided with your hardware will contain a block diagram. However, you might also find it useful to
create your own block diagram. That way, you can leave out hardware components that are unrelated to
the basic flow of data through the system.
In order to get a better idea of how the block diagram relates to the actual hardware on the Arcom board
for our print server device, examine Figure 2-2, which shows the diagram overlaid on top of the Arcom
board. This figure gives you a better idea of the ICs involved in the print server device and how the data
is routed through the actual hardware.
Figure 2-2. Block diagram for the print server on Arcom board
We recommend creating a project notebook or binder. Once you've created a block diagram, place it as
the first page in your project notebook. You need it handy so you can refer to it throughout the project.
As you continue working with this piece of hardware, write down everything you learn about it in your
notebook. If you get a useful handout at a meeting, put it into your notebook. Put tabs in there so you
can quickly jump to important information that you refer to all the time. You might also want to keep
notes about the software design and implementation. It is very useful to refer back to your notes to
refresh your memory about why a particular decision was made for the software. A project notebook is
valuable not only while you are developing the software, but also once the project is complete. You will
appreciate the extra effort you put into keeping a notebook in case you need to make changes to your
software, or work with similar hardware, months or years later.
If you still have any big-picture questions after reading the hardware documents, ask a hardware
engineer for some help. If you don't already know the hardware's designer, take a few minutes to
introduce yourself. If you have some time, take him out to lunch, or buy him a beer after work. (You
don't even have to talk about the project the whole time!) We have found that many software engineers
have difficulty communicating with hardware engineers, and vice versa. In embedded systems
development, it is especially important that the hardware and software teams be able to communicate
You may notice that two symbols are shown for the diode component. The symbol on the right is for a
light emitting diode (LED), which we will take a look at shortly.
The symbols for ground and power can also vary from schematic to schematic; two symbols for power
and ground are included in Figure 2-3. In addition to VCC, the reference designator commonly used for
power is VDD. Since many circuits use multiple voltage levels, you may also come across power
symbols that are labeled with the actual voltage, such as +5 or +3.3, instead of VCC or VDD. The power
and ground symbols are typically placed vertically, as shown, whereas the other symbols in Figure 2-3
might show up in any orientation.
A reference designator is a combination of letters, numbers, or both that is used to identify
components on a schematic. Reference designators typically include a number to aid in identifying a
specific part in the schematic. For example, three resistors in a schematic might have reference
designators R4, R21, and R428. The reference designators are normally silkscreened (a painted overlay)
on the circuit board for part identification.
You will also notice that integrated circuit symbols are not included in this figure. That is because IC
schematic representations vary widely. Typically, a hardware engineer needs to create his own
schematic symbol for the ICs used in the design. It is then up to the hardware engineer to use the clearest
method possible to capture and represent the IC's symbol.
The reference designator for ICs can vary as well. Typically, the letter U is used. The Arcom board
schematic, however, uses the reference designator IC.
IC symbols also include a component type or part number used by the manufacturer. The component
type is often helpful when you need to hunt for the datasheets for the parts of a particular design.
Descriptive text might save you the trouble of deciphering the markings and codes on the top of a
specific IC.
Now that we have an introduction to the symbols used in a schematic, let's take a look at a schematic.
Figure 2-4 is a partial schematic of a fictional board. In this figure, we show some of the connections on
the processor.
The italic labels and associated arrows are not part of the original schematic.
These are included to point out particular aspects of the schematic. We wanted to
note this because quite often text notes are included in the schematic for
clarification or layout concerns. One such clarification note in Figure 2-4 is the
label OUTPUT PORT on port PL1.
The processor is the main component in this schematic. The symbol for the processor has a reference
designator IC12, which is located at the top of the symbol on this schematic. The component type of the
processor is PXA255 and is located at the bottom of the symbol.
The processor's pins and their associated pin numbers run along the sides of the symbol. For example,
bit 0 of the data bus is named D0 and is located on pin number 5 of the processor.
You will also notice that some pins, such as P1.1/rts0 pin number 102, have a bar over the pin name.
This indicates that the signal is active low. This means a logic level of 0 will activate the functionality of
Incidentally, "rats nest" is the term used to describe the connection of nets made during layout. Once
you see the initial stage of a layout, you'll understand how this name was derived.
For example, in Figure 2-4, a portion of the connector with the reference designator PL1 is shown.
(Incidentally, connectors and jumpers often use the reference designator J.) Because the net connected to
pin number 23 of the connector PL1 is labeled A2, and the net connected to the processor's pin number
43 is labeled A2, they are connected even though the hardware engineer did not run a line to represent
the A2 net connected from the processor over to PL1.
In order to aid in testing the hardware and software, hardware engineers often include test points. A test
point is a metallic area on the finished board that provides access to a particular signal to aid in the
debugging or monitoring of that signal. One test point, with the reference designator TP11, is shown in
Figure 2-4 on the RESET pin of the processor. With the move to smaller and smaller IC packages and
smaller pins, test points are a necessity for debugging and also aid in production testing. Also, it is
impossible to probe the pins of a ball grid array (BGA) package directly, because all of the pins are
contained under the IC. In this case, a test point helps greatly.
When you take a look at the full set of schematics, you will notice that there is a block at the lower-righthand corner of each page. This is the title block; every schematic we have come across has some
version of this. The title block has useful information about the schematic, such as the date, the
designer's name, the revision, a page number and description of the schematic on that page, and often a
list of changes made.
At this point, we have a solid understanding of the system components that make up our platform and
how they are interconnected. Let's next see how to get to know the hardware.
The clock signal (CLK) is the basis for all operations of the processor and is shown as the top signal in
the timing diagram of Figure 2-5. The processor clock is generally a square wave that sequences all
operations of the processor.
The next group of signals are the address bus, A[0:20], followed by the data bus, D[0:15]. Such buses
are depicted in timing diagrams as shown in Figure 2-5, where a single entry represents the entire range
of signals rather than each signal having its own entry. A bus is typically stable (meaning it contains a
valid address or data) during the period of time when the single line splits into two lines. In hardware
terms, the bus goes from being tristate (single line) to having real information present (dual line), and
then back to being tristate again.
While you are reading about a new board, create a table that shows the name and address range of each
memory device and peripheral that is located in the memory space. This table is called a memory map.
Organize the table so that the lowest address is at the bottom and the highest address is at the top. Each
time you add a device to the memory map, place it in its appropriate location in memory and label the
starting and ending addresses in hexadecimal. After you have finished inserting all of the devices into
the memory map, be sure to label any unused memory regions as such.
If you look back at the block diagram of the Arcom board in Figure 2-1, you will see that there are three
devices attached to the address and data buses. (The PC/104 bus is connected to the address and data
buses through buffers.) These devices are the RAM, ROM, and SMSC Ethernet controller. The RAM is
located at the bottom of the memory address range. The ROM is located toward the top of the range.
Sometimes a single address range, particularly for memory devices, is composed
of multiple ICs. For example, the hardware engineer might use two ROM chips,
each of which has a storage capacity of 1 MB. This gives the processor a total of 2
MB of ROM. Furthermore, the hardware engineer is able to set up the two
individual ROM chips so the processor does not know which chip it is actually
accessing; the division of the two chips is transparent to the processor, and it sees
the memory as one contiguous block.
The memory map in Figure 2-6 shows what the memory devices in Figure 2-1 look like to the processor.
Also included in Figure 2-6 are the processor's internal peripheral registers, labeled PXA255
Peripherals, which are mapped into the processor's memory space. In a sense, this is the processor's
"address book." Just as you maintain a list of names and addresses in your personal life, you must
maintain a similar list for the processor. The memory map contains one entry for each of the memories
For each new board, you should create a C-language header file that describes its most important
features. This file provides an abstract interface to the hardware. In effect, it allows you to refer to the
various devices on the board by name rather than by address. This has the added benefit of making your
application software more portable. If the memory map ever changes (for example, if the 64 MB of
RAM is moved), you need only change the affected lines of the board-specific header file and recompile
your application.
Abstracting the hardware into a file (for smaller projects) or a directory of files (for larger projects) also
allows you to reuse certain portions of your code as you move from project to project. Because the
hardware engineer will likely reuse components in new designs, you too can reuse drivers for these
components in these new designs.
As this chapter progresses, we will show you how to create a header file for the Arcom board; the
following code is the first section of this file. This part of the header file describes the memory map:
/**********************************************************************
 *
 * Memory Map
 *
 **********************************************************************/

/* Base addresses of the board's memory and peripheral devices. (The
 * macro names here are representative; the addresses are those of the
 * SDRAM, the Ethernet controller, and the flash, respectively.) */
#define SDRAM_BASE              (0x00000000)
#define ETHERNET_BASE           (0x08000300)
#define FLASH_BASE              (0x50000000)
The second communication technique uses interrupts. An interrupt is an asynchronous electrical signal
from a peripheral to the processor. Interrupts can be generated from peripherals external or internal to
the processor, as well as by software.
When interrupts are used, the processor issues commands to the peripheral exactly as before, but then
waits for an interrupt to signal completion of the assigned work. While the processor is waiting for the
interrupt to arrive, it is free to continue working on other things. When the interrupt signal is asserted,
the processor finishes its current instruction, temporarily sets aside its current work, and executes a
Are there any special interrupts, sometimes called traps, that are generated within the
processor itself? Must an ISR be written to handle each of these?
How are interrupts enabled and disabled? Globally and individually?
How are interrupts acknowledged or cleared?
In addition to the processor databook, the Internet contains a wealth of information for embedded
software developers. The manufacturer's site is a great place to start. In addition, a search for a particular
processor can yield oodles of useful information from fellow developers, including code snippets giving
you exact details on how to write your software. Several newsgroups are also targeted toward embedded
software development and toward specific processors.
You need to take care to fully understand any licensing issues of the software you
find on the Internet should you decide to use someone else's code. You might have
to get your company's legal department involved in order to avoid any problems.
Another useful tool for understanding the processor is a development board. Once the processor is
selected, you can search for your options for a development board. You need to consider the peripherals
and software tools included on the development board. For example, if your application is going to
include an Ethernet port, it would be a good idea to select a development board that also includes an
Ethernet port. There is typically example software included with the development board as well. If the
project uses a processor that you have not worked with before, the example software can get you up the
learning curve a lot faster. The development board will assist you in getting a jump-start on the
embedded software development.
Another benefit of a development board is that if you are seeing some oddities related to your project's
hardware, you can always go back to the development board (where the hardware should be stable) and
run some tests to see whether the problem is specific to the new design.
/********************************************************
* PXA255 XScale ARM Processor On-Chip Peripherals
********************************************************/
/* Timer Registers */
#define TIMER_0_MATCH_REG       (*((uint32_t volatile *)0x40A00000))
#define TIMER_1_MATCH_REG       (*((uint32_t volatile *)0x40A00004))
#define TIMER_2_MATCH_REG       (*((uint32_t volatile *)0x40A00008))
#define TIMER_3_MATCH_REG       (*((uint32_t volatile *)0x40A0000C))
#define TIMER_COUNT_REG         (*((uint32_t volatile *)0x40A00010))
#define TIMER_STATUS_REG        (*((uint32_t volatile *)0x40A00014))
#define TIMER_INT_ENABLE_REG    (*((uint32_t volatile *)0x40A0001C))
/* GPIO Registers (level, direction, set, clear, and alternate-function) */
#define GPIO_0_LEVEL_REG        (*((uint32_t volatile *)0x40E00000))
#define GPIO_1_LEVEL_REG        (*((uint32_t volatile *)0x40E00004))
#define GPIO_2_LEVEL_REG        (*((uint32_t volatile *)0x40E00008))
#define GPIO_0_DIRECTION_REG    (*((uint32_t volatile *)0x40E0000C))
#define GPIO_1_DIRECTION_REG    (*((uint32_t volatile *)0x40E00010))
#define GPIO_2_DIRECTION_REG    (*((uint32_t volatile *)0x40E00014))
#define GPIO_0_SET_REG          (*((uint32_t volatile *)0x40E00018))
#define GPIO_1_SET_REG          (*((uint32_t volatile *)0x40E0001C))
#define GPIO_2_SET_REG          (*((uint32_t volatile *)0x40E00020))
#define GPIO_0_CLEAR_REG        (*((uint32_t volatile *)0x40E00024))
#define GPIO_1_CLEAR_REG        (*((uint32_t volatile *)0x40E00028))
#define GPIO_2_CLEAR_REG        (*((uint32_t volatile *)0x40E0002C))
#define GPIO_0_FUNC_LO_REG      (*((uint32_t volatile *)0x40E00054))
#define GPIO_0_FUNC_HI_REG      (*((uint32_t volatile *)0x40E00058))
Let's take a look at the earlier code snippet written to use a register definition from the example header
file:
if (bLedEnable == TRUE)
{
    GPIO_0_SET_REG = 0x00400000;
}
This code is a lot easier to read and understand, even without a comment. Defining registers in a header
file, as we have shown in the preceding code, also prevents you or another team member from running
to the databook every other minute to look up a register address.
In order to make the example in Chapter 3 a little easier to understand, we didn't show any of the
initialization code there. However, it is necessary to get the hardware initialization code working before
you can write even simple programs such as Blinking LED. The Arcom board includes a debug monitor
that handles all of the assembly language hardware initialization.
If you are one of the first software engineers to work with a new board, especially a prototype, the hardware might not work as advertised. All processor-based boards require some amount of software testing to confirm the correctness
of the hardware design and the proper functioning of the various peripherals. This
puts you in an awkward position when something is not working properly. How
do you know whether the hardware or your software is causing the problem? If
you happen to be good with hardware or have access to a simulator, you might be
able to construct some experiments to answer this question. Otherwise, you should
probably ask a hardware engineer to join you in the lab for a joint debugging
session.
The hardware initialization should be executed before the startup code described in Chapter 4. The code
described there assumes that the hardware has already been initialized and concerns itself only with
creating a proper runtime environment for high-level language programs. Figure 2-7 provides an
overview of the entire initialization process, from processor reset through hardware initialization and C
startup code to main.
The first stage of the initialization process is the reset code. This is a small piece of assembly language
(usually only two or three instructions) that the processor executes immediately after it is powered on or
reset. The sole purpose of this code is to transfer control to the hardware initialization routine. The first
instruction of the reset code must be placed at a specific location in memory, usually called the reset
address or reset vector, which is specified in the processor databook. The reset address for the PXA255
is 0x00000000.
Most of the actual hardware initialization takes place in the second stage. At this point, we need to
inform the processor about its environment. This is also a good place to initialize the interrupt controller
and other critical peripherals. Less critical hardware devices can be initialized when the associated
device driver is started, usually from within main.
The PXA255 has boot select pins that allow you to specify the type and width of the memory device
from which the processor attempts to execute the initial instructions. The memory device that the
processor boots from typically contains the code to program several internal registers of the PXA255
that must be programmed before any useful work can be done with the processor. These internal
registers are responsible for setting up the memory map and are part of the processor's internal memory
Make sure the processor and ROM are receiving the proper voltage required to operate the parts.
Check to make sure the clock signal is running. The processor won't do anything without a clock.
Verify that the processor is coming out of reset properly. You can check the address a processor
is fetching using a logic analyzer. This will validate that the processor is trying to fetch the first
instruction from the location you expect.
Make sure that a watchdog timer isn't resetting the processor.
Ensure that input pins on the processor are pulled high or low. This is particularly important for
interrupt pins. An input pin in an unknown state (commonly called a floating pin) can wreak all
sorts of havoc for a processor.
The hardware engineer might handle these tasks for you, but don't be afraid to jump right in and look
over the schematics yourself. Or better yet, see whether you can sit in the lab with the hardware engineer
while he performs his initial checkout of the board.
Expect that the initial hardware bring-up will be the hardest part of the project. You will soon see that
once you have a basic program operating that you can fall back on, the work just gets easier and easier,
or at least more similar to other types of computer programming.
Of course, the rate of blink is completely arbitrary. But one of the good things about the 1 Hz rate is
that it's easy to confirm with a stopwatch. Simply start the stopwatch, count off a number of blinks, stop
the stopwatch, and see whether the number of elapsed seconds is the same as the number of blinks
you counted. Need greater accuracy? Simply count off more blinks.
Our first step is to learn how to control the green LED we want to toggle. On the Arcom board, the
green LED is located on the add-on module shown in Figure 3-1. The green LED is labeled "LED2" on
the add-on module. The Arcom board's VIPER-Lite Technical Manual and the VIPER-I/O Technical
Figure 3-1. Arcom board add-on module containing the green LED
LED2 is controlled by the signal OUT2, as described in the LEDs section in the Arcom board's VIPER-I/O Technical Manual. This text also informs us that the signals to the LEDs are inverted; therefore,
when the output is high, the LEDs are off, and vice versa. The general-purpose I/O section of the VIPER
Technical Manual shows that the OUT2 signal is controlled by the processor's GPIO pin 22. Therefore,
we will need to be able to set GPIO pin 22 alternately high and low to get our blinker program to
function properly.
The superstructure of the Blinking LED program is shown next. This part of the program is hardware-independent. However, it relies on the hardware-dependent functions ledInit, ledToggle, and
delay_ms to initialize the GPIO pin controlling the LED, change the state of the LED, and handle the
timing, respectively. These functions are described in the following sections, where we'll really get a
sense of what it's like to do embedded systems programming.
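A minimal sketch of that superstructure follows; the led.h header name is an assumption, and the 500 ms delay gives the 1 Hz blink rate discussed earlier:

#include "led.h"   /* Assumed header declaring ledInit, ledToggle, and delay_ms. */

int main(void)
{
    /* Configure the GPIO pin that controls the green LED. */
    ledInit();

    /* Toggle the LED every 500 ms, forever (a 1 Hz blink rate). */
    while (1)
    {
        ledToggle();
        delay_ms(500);
    }

    return 0;
}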
All of the documentation for the Arcom board is contained on the VIPER-Lite Development Kit CD-ROM. This includes datasheets and user's manuals for the components on the board.
On the PXA255, each port pin can be configured for use by the internal peripheral (called an alternate-function pin) or by the user (called a general-purpose pin). For each GPIO pin, there are several 32-bit
registers. These registers allow for configuration and control of each GPIO pin. The description of the
registers for the GPIO port that contains the pin for the green LED is shown in Table 3-1. These
registers are located within the PXA255 chip.
Name      Type         Address       Purpose
GPLR0     Read-only    0x40E00000    GPIO Pin Level Register
GPDR0     Read/write   0x40E0000C    GPIO Pin Direction Register
GPSR0     Write-only   0x40E00018    GPIO Pin Output Set Register
GPCR0     Write-only   0x40E00024    GPIO Pin Output Clear Register
GAFR0_U   Read/write   0x40E00058    GPIO Alternate Function Register (High)
The PXA255 Processor Developer's Manual states that the configuration of the GPIO pins for the LEDs
is controlled by bits 20 (red), 21 (yellow), and 22 (green) in the 32-bit GPDR0 register. Figure 3-2
shows the location of the bit for GPIO pin 22 in the GPDR0 register; this bit configures the direction of
GPIO pin 22 that controls the green LED.
Most registers within a CPU have a default configuration after reset. This means that before we are able
to control the output on any I/O pins, we need to make sure the pin is configured properly. After reset,
all GPIO pins in the PXA255 are configured as inputs. In addition, they function as general-purpose I/O
pins rather than alternate-function pins.
Although the GPIO pins that control the LEDs are configured as general-purpose
I/O pins upon reset, we need to ensure that the other software that is running did
not change the functionality of these GPIO pins.
It is a good practice always to initialize hardware you are going to use, even if you
think the default behavior is fine.
In our case, we need to configure GPIO pin 22 as an output via bit 22 in the GPDR0 register.
Furthermore, the GPIO pin that controls the green LED must be set to function as a general-purpose I/O
pin via its two-bit alternate-function field (bits 13 and 12) in the GAFR0_U register.
The bitmask for the GPIO pin that controls the green LED on the Arcom board, along with the mask used to select that pin's general-purpose function in the GAFR0_U register, are defined in our program as:
#define LED_GREEN               (0x00400000)
#define PIN22_FUNC_GENERAL      (0xFFFFCFFF)
/**********************************************************************
 *
 * Function:    ledInit
 *
 * Description: Initialize the GPIO pin that controls the LED.
 *
 * Notes:       This function is specific to the Arcom board.
 *
 * Returns:     None.
 *
 **********************************************************************/
void ledInit(void)
{
    /* Turn the GPIO pin voltage off, which will light the LED. This should
     * be done before the pins are configured. */
    GPIO_0_CLEAR_REG = LED_GREEN;

    /* Make sure the LED control pin is set to perform general
     * purpose functions. RedBoot may have changed the pin's operation. */
    GPIO_0_FUNC_HI_REG &= PIN22_FUNC_GENERAL;

    /* Set the LED control pin to operate as output. */
    GPIO_0_DIRECTION_REG |= LED_GREEN;
}
The hardware-specific constant CYCLES_PER_MS represents the number of times the processor can get
through the while loop in a millisecond. To determine this number, we used trial and error. We will see
later how to use a hardware counter to achieve better timing accuracy.
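As a hedged sketch of how these two functions might be written (it assumes the GPIO_0_LEVEL_REG definition following the same pattern as the other GPIO registers in the header file, plus the CYCLES_PER_MS constant just described):

void ledToggle(void)
{
    /* Read the current level of the LED control pin, then drive it to
     * the opposite state using the set and clear registers. */
    if (GPIO_0_LEVEL_REG & LED_GREEN)
        GPIO_0_CLEAR_REG = LED_GREEN;
    else
        GPIO_0_SET_REG = LED_GREEN;
}

void delay_ms(int milliseconds)
{
    /* Busy-wait: burn roughly CYCLES_PER_MS loop iterations per millisecond.
     * The volatile qualifier keeps the compiler from optimizing the loop away. */
    long volatile cycles = milliseconds * CYCLES_PER_MS;

    while (cycles != 0)
        cycles--;
}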
The four functions main, ledInit, ledToggle, and delay_ms do the whole job of the Blinking LED
program. Of course, we still need to talk about how to build and execute this program. We'll examine
those topics in the next two chapters. But first, we have a little something to say about infinite loops and
their role in embedded systems.
Used this way, the term "target platform" is best understood to include not only the hardware but also
the operating system that forms the basic runtime environment for your software. If no operating system
is present, as is sometimes the case in an embedded system, the target platform is simply the processor
on which your program runs.
The process of converting the source code representation of your embedded software into an executable
binary image involves three distinct steps:
1. Each of the source files must be compiled or assembled into an object file.
2. All of the object files that result from the first step must be linked together to produce a single
object file, called the relocatable program.
Each of the steps of the embedded software build process is a transformation performed by software
running on a general-purpose computer. To distinguish this development computer (usually a PC or
Unix workstation) from the target embedded system, it is referred to as the host computer. The compiler,
assembler, linker, and locator run on a host computer rather than on the embedded system itself. Yet,
these tools combine their efforts to produce an executable binary image that will execute properly only
on the target embedded system. This split of responsibilities is shown in Figure 4-2.
In this book, we'll be using the GNU tools (compiler, assembler, linker, and debugger) for our examples.
These tools are extremely popular with embedded software developers because they are freely available
(even the source code is free) and support many of the most popular embedded processors. We will use
features of these specific tools as illustrations for the general concepts discussed. Once understood, these
same basic concepts can be applied to any equivalent development tool. The manuals for all of the GNU
software development tools can be found online at http://www.gnu.org/manual.
4.1.1. Compiling
The job of a compiler is mainly to translate programs written in some human-readable language into an
equivalent set of opcodes for a particular processor. In that sense, an assembler is also a compiler (you
might call it an "assembly language compiler"), but one that performs a much simpler one-to-one
translation from one line of human-readable mnemonics to the equivalent opcode. Everything in this
section applies equally to compilers and assemblers. Together these tools make up the first step of the
embedded software build process.
Of course, each processor has its own unique machine language, so you need to choose a compiler that
produces programs for your specific target processor. In the embedded systems case, this compiler
almost always runs on the host computer. It simply doesn't make sense to execute the compiler on the
embedded system itself. A compiler such as this, one that runs on one computer platform and produces
code for another, is called a cross-compiler. The use of a cross-compiler is one of the defining features
of embedded software development.
The GNU C compiler (gcc) and assembler (as) can be configured as either native compilers or cross-compilers. These tools support an impressive set of host-target combinations. The gcc compiler will run
4.1.2. Linking
All of the object files resulting from the compilation in step one must be combined. The object files
themselves are individually incomplete, most notably in that some of the internal variable and function
references have not yet been resolved. The job of the linker is to combine these object files and, in the
process, to resolve all of the unresolved symbols.
The output of the linker is a new object file that contains all of the code and data from the input object
files and is in the same object file format. It does this by merging the text, data, and bss sections of the
input files. When the linker is finished executing, all of the machine language code from all of the input
object files will be in the text section of the new file, and all of the initialized and uninitialized
variables will reside in the new data and bss sections, respectively.
We are talking only about static linking here. When dynamic linking of libraries is used, the code and
data associated with the library routine are not inserted into the program directly.
Unfortunately, the standard library routines often require some changes before they can be used in an
embedded program. One problem is that the standard libraries provided with most software development
tool suites arrive only in object form. You only rarely have access to the library source code to make the
necessary changes yourself. Thankfully, a company called Cygnus (which is now part of Red Hat)
created a freeware version of the standard C library for use in embedded systems. This package is called
newlib . You need only download the source code for this library from the Web (currently located at
http://sourceware.org/newlib), implement a few target-specific functions, and compile the whole lot. The
library can then be linked with your embedded software to resolve any previously unresolved standard
library calls.
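As a hedged illustration, one such target-specific function is the low-level output routine that the library's printf ultimately calls. The sketch below shows only the general shape of a newlib _write stub; uartSendByte is a hypothetical board-specific routine, and the exact set of stubs your program needs depends on which library calls it makes:

/* Hypothetical board-specific routine that transmits one byte over a serial port. */
extern void uartSendByte(char byte);

/* Minimal newlib output stub: route printf and friends to the serial port. */
int _write(int file, char *ptr, int len)
{
    int n;

    (void)file;   /* All output goes to the same serial port. */

    for (n = 0; n < len; n++)
        uartSendByte(ptr[n]);

    return len;
}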
After merging all of the code and data sections and resolving all of the symbol references, the linker
produces an object file that is a special "relocatable" copy of the program. In other words, the program is
complete except for one thing: no memory addresses have yet been assigned to the code and data
sections within. If you weren't working on an embedded system, you'd be finished building your
software now.
But embedded programmers aren't always finished with the build process at this point. The addresses of
the symbols in the linking process are relative. Even if your embedded system includes an operating
system, you'll probably still need an absolutely located binary image. In fact, if there is an operating
system, the code and data of which it consists are most likely within the relocatable program too. The
Typically, the startup code will also include a few instructions after the call to main. These instructions
will be executed only in the event that the high-level language program exits (i.e., the call to main
returns). Depending on the nature of the embedded system, you might want to use these instructions to
halt the processor, reset the entire system, or transfer control to a debugging tool.
Because the startup code is often not inserted automatically, the programmer must usually assemble it
himself and include the resulting object file among the list of input files to the linker. He might even
need to give the linker a special command-line option to prevent it from inserting the usual startup code.
Working startup code for a variety of target processors can be found in a GNU package called libgloss .
Debug Monitors
In some cases, a debug monitor (or ROM monitor) is the first code executed when the board
powers up. In the case of the Arcom board, there is a debug monitor called RedBoot.
RedBoot, the name of which is an acronym for RedHat's Embedded Debug and Bootstrap
program, is a debug monitor that can be used to download software, perform basic memory
operations, and manage nonvolatile memory. This software on the Arcom board contains the
startup code and performs the tasks listed previously to initialize the hardware to a known
state. Because of this, programs downloaded to run in RAM via RedBoot do not need to be
linked with startup code and should be linked but not located.
After the hardware has been initialized, RedBoot sends out a prompt to a serial port and waits
for input from the user (you) to tell it what to do. RedBoot supports commands to load
software, dump memory, and perform various other tasks. We will take a look at using
RedBoot to load a software program in the next chapter.
4.1.3. Locating
The tool that performs the conversion from relocatable program to executable binary image is called a
locator. It takes responsibility for the easiest step of the build process. In fact, you have to do most of the
work in this step yourself, by providing information about the memory on the target board as input to the
locator. The locator uses this information to assign physical memory addresses to each of the code and
data sections within the relocatable program. It then produces an output file that contains a binary
memory image that can be loaded into the target.
Whether you are writing software for a general-purpose computer or an embedded system, at some point
the sections of your relocatable program must be assigned actual addresses. Sometimes software that is
already in the target does this for you, as RedBoot does on the Arcom board.
In some cases, there is a separate development tool, called a locator, to assign addresses. However, in
the case of the GNU tools, this feature is built into the linker (ld).
The memory information required by the GNU linker can be passed to it in the form of a linker script.
Such scripts are sometimes used to control the exact order of the code and data sections within the
relocatable program. But here, we want to do more than just control the order; we also want to establish
the physical location of each section in memory.
What follows is an example of a linker script for the Arcom board. This linker script file is used to build
the Blinking LED program covered in Chapter 3:
ENTRY (main)

MEMORY
{
    ram : ORIGIN = 0x00400000, LENGTH = 64M
    rom : ORIGIN = 0x60000000, LENGTH = 16M
}

SECTIONS
{
    data :                          /* Initialized data. */
    {
        _DataStart = . ;
        *(.data)
        _DataEnd   = . ;
    } >ram

    bss :                           /* Uninitialized data. */
    {
        _BssStart = . ;
        *(.bss)
        _BssEnd    = . ;
    } >ram

    text :
    {
        *(.text)
    } >ram
}
This script informs the GNU linker's built-in locator about the memory on the target board, which
contains 64 MB of RAM and 16 MB of flash ROM. [] The linker script file instructs the GNU linker to
locate the data, bss, and text sections in RAM starting at address 0x00400000. The first executable
instruction is designated with the ENTRY command, which appears on the first line of the preceding
example. In this case, the entry point is the function main.
[]
There is also a version of the Arcom board that contains 32 MB of flash. If you have this version of
the board, change the rom line in the linker script's MEMORY command to read LENGTH = 32M.
Names in the linker command file that begin with an underscore (e.g., _DataStart) can be referenced
similarly to ordinary variables from within your source code. The linker will use these symbols to
resolve references in the input object files. So, for example, there might be a part of the embedded
software (usually within the startup code) that copies the initial values of the initialized variables from
ROM to the data section in RAM. The start and stop addresses for this operation can be established
symbolically by referring to the addresses as _DataStart and _DataEnd.
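For instance, a startup-code fragment that performs this copy might look like the following sketch; _DataStart and _DataEnd come from the linker script above, while _DataRom (the load address of the initialized data in ROM) is an assumed additional symbol:

extern char _DataRom;     /* Assumed: load address of .data in ROM.                */
extern char _DataStart;   /* Run address of .data in RAM (from the linker script). */
extern char _DataEnd;     /* End of .data in RAM (from the linker script).         */

void copyDataSection(void)
{
    char *src = &_DataRom;
    char *dst = &_DataStart;

    /* Copy the initial values of the initialized variables from ROM to RAM. */
    while (dst < &_DataEnd)
        *dst++ = *src++;
}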
A linker script can also use various commands to direct the linker to perform other operations.
Additional information and options for GNU linker script files can be found at http://www.gnu.org.
The output of this final step of the build process is a binary image containing physical addresses for the
specific embedded system. This executable binary image can be downloaded to the embedded system or
programmed into a memory chip. You'll see how to download and execute such memory images in the
next chapter.
We will next take a look at the individual commands in order to manually perform the three separate
tasks (compiling, linking, and locating) described earlier in this chapter. Then we will learn how to
automate the build procedure with makefiles.
4.2.1. Compile
As we have implemented it, the Blinking LED example consists of two source modules: led.c and
blink.c. The first step in the build process is to compile these two files. The basic structure for the gcc
compiler command is:
arm-elf-gcc [options] file...
-g
To generate debugging info in default format
-c
To compile and assemble but not link
-Wall
To enable most warning messages
-I../include
To look in the directory include for header files
Here are the actual commands for compiling the C source files:
# arm-elf-gcc -g -c -Wall -I../include led.c
# arm-elf-gcc -g -c -Wall -I../include blink.c
We broke up the compilation step into two separate commands, but you can compile the two files with
one command. To use a single command, just put both of the source files after the options. If you
wanted different options for one of the source files, you would need to compile it separately as just
shown. For additional information about compiler options, take a look at http://gcc.gnu.org.
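For instance, the equivalent single command for these two files would look like this:

# arm-elf-gcc -g -c -Wall -I../include led.c blink.c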
Running these commands will be a good way to verify that the tools were set up properly. The result of
each of these commands is the creation of an object file that has the same prefix as the .c file, and the
extension .o. So if all goes well, there will now be two additional filesled.o and blink.oin the
working directory. The compilation procedure is shown in Figure 4-3.
-Map blink.map
    To generate a map file and use the given filename
-N
    To set the text and data sections to be readable and writable
-o blink.exe
    To set the output filename (if this option is not included, ld will use the default output filename a.out)
The actual command for linking and locating is:
# arm-elf-ld -Map blink.map -T viperlite.ld -N -o blink.exe led.o blink.o
The order of the object files determines their placement in memory. Because we are not linking in any
startup code, the order of the object files is irrelevant. If startup code were included, you would want
that object file to be located at the proper address. The linker script file can be used to specify where you
want the startup routine (and other code) to reside in memory. Furthermore, you can also use the linker
script file to specify exact addresses for code or data, should you find it necessary to do so.
As you can see in this command, the two object files, led.o and blink.o, are the last arguments on the
command line for linking. The linker script file, viperlite.ld, is also passed in for locating the data and
code in the Arcom board's memory. The result of this command is the creation of two files, blink.map
and blink.exe, in the working directory. The linking and locating procedure is shown in Figure 4-4.
The .map file gives a complete listing of all code and data addresses for the final software image. If you
have never seen such a map file before, be sure to take a look at this one before reading on. It provides
information similar to the contents of the linker script described earlier. However, these are results rather
than instructions and therefore include the actual lengths of the sections and the names and locations of
the public symbols found in the relocatable program. We'll see later how this file can be used as a
debugging aid.
The build procedure for subsequent chapters in the book generates two executable files: one with debug
information and one without. The executable that contains the debug information includes dbg in its
filename. The debug image should be used with gdb. If an image is downloaded with RedBoot, the
nondebug image should be used.
The command used to strip symbol information is:
# arm-elf-strip --remove-section=.comment blinkdbg.exe -o blink.exe
This removes the section named .comment from the image blinkdbg.exe and creates the new output file
blink.exe.
There might be another time when you need an image file that can be burned into ROM or flash. The
GNU toolset has just what you need for this task. The utility objcopy (object copy) is able to copy the
contents of one object file into another object file. The basic structure for the objcopy utility is:
arm-elf-objcopy [options] input-file [output-file]
For example, let's suppose we want to convert our Blinking LED program from ELF format into an Intel
Hex Format file. (Intel Hex format is an ASCII file format devised by Intel for storing and downloading
binary images.) The command line we use for this is:

# arm-elf-objcopy -O ihex blink.exe blink.hex

This command uses the -O ihex option to generate an Intel Hex Format file. The input file is blink.exe
(the objcopy utility determines the input file type). Finally, the output file is named blink.hex.

If no output filename is given, the strip and objcopy utilities overwrite the original
input file with the generated file.
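If the tool you use to program the flash expects a raw binary image rather than Intel Hex, objcopy can produce that, too; for example, using the standard -O binary output target:

# arm-elf-objcopy -O binary blink.exe blink.bin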
Some of the other GNU tools are useful for providing other information about the image you have built.
For example, the size utility, which is part of the binutils package, lists the section sizes and total size for
a given object file. Here is the command for using the size utility:
# arm-elf-size blink.exe

   text    data     bss     dec     hex filename
    328       0       0     328     148 blink.exe
The top row consists of column headings and shows the sections text, data, and bss. The Blinking
LED program contains 328 bytes in the text section, no bytes in the data section, and no bytes in the
bss section. The dec column shows the total image size in decimal, and the hex column shows it in
hexadecimal (decimal 328 = hexadecimal 0x148). These total sizes are in bytes. The last column,
filename, contains the filename of the object file.
You will notice that the size of the text section, 328 bytes, is much smaller than the approximately 3 KB
file size of our blink.exe. This is because debugging information is also stored in the blink.exe file.
Additional information about the other GNU binutils can be found online at http://www.gnu.org.
We're now ready to download the program to our development board, which we'll do in the next chapter.
To wrap up our discussion of building programs, let's take a quick look at another useful tool in the
build process.
The make utility reads its build instructions from a makefile, which consists of rules with the following structure:

target: prerequisite
	command

The target is what is going to be built, the prerequisite is a file that must exist before the target
can be created, and the command is a shell command used to create the target. There can be multiple
prerequisites on the target line (separated by white space) and/or multiple command lines. But be
sure to put a tab, not spaces, at the beginning of every line containing a command.
Here's a makefile for building our Blinking LED program:

XCC     = arm-elf-gcc
LD      = arm-elf-ld
CFLAGS  = -g -c -Wall \
          -I../include
LDFLAGS = -Map blink.map -T viperlite.ld -N

all: blink.exe

led.o: led.c led.h
	$(XCC) $(CFLAGS) led.c

blink.o: blink.c led.h
	$(XCC) $(CFLAGS) blink.c

blink.exe: blink.o led.o viperlite.ld
	$(LD) $(LDFLAGS) -o $@ led.o blink.o

clean:
	-rm -f blink.exe *.o blink.map
The first four statements in this makefile contain variables for use in the makefile. The variable names
are on the left side of the equal sign. In this makefile, the respective variables do the following:

XCC
    Defines the compiler executable program
LD
    Defines the linker executable program
CFLAGS
    Defines the flags for the compiler
LDFLAGS
    Defines the flags for the linker
Variables in a makefile are used to eliminate some of the duplication of text as well as to ease
portability. In order to use a variable in the code, the syntax $( ) is used with the variable name
enclosed in the parentheses.
Note that if a line in a makefile gets too long, you can continue it on the following line by using the
backslash (\), as shown with the CFLAGS variable.
Now for the build rules. The build targets in this file are all, led.o, blink.o, and blink.exe. Unless
you specify a target when invoking the make utility, it searches for the first target (in this case, the first
target is all) and tries to build it; this, in turn, can lead to it finding and building other targets. The make
utility creates (or re-creates, as the case may be) the target file if it does not exist or if the prerequisite
files are more recent than the target file.
At this point, it might help to look at the makefile from the bottom up. In order for blink.exe to be
created, blink.o and led.o need to be built as shown in the prerequisites. However, since these files
don't exist, the make utility will need to create them first. It will search for ways to create these two files
and will find them listed as targets in the makefile. The make utility can create these files because the
prerequisites (the source files) for these two targets exist.
Because the targets led.o and blink.o are handled similarly, let's focus on just one of them. The
prerequisites for the target led.o are led.c and led.h. As stated above, the command tells the make
utility how to create the target. The first part of the command for led.o is a reference to the variable
XCC, as indicated by the syntax $(XCC), and the next part of the command is a reference to the variable
CFLAGS, as indicated by the syntax $(CFLAGS). The make utility simply replaces variable references with
the text assigned to them in the makefile. The final part of the command is the source file led.c. Strung
together, the expanded command is:

arm-elf-gcc -g -c -Wall -I../include led.c

This is the same command we entered by hand in order to compile the led.c file earlier in this chapter, in
the section "Building the Blinking LED Program." The make utility compiles blink.c in the same way.
At this point, the make utility has all of the prerequisites needed to build blink.exe, which in turn
satisfies the default target all. The command that the make utility executes (the same command we
entered by hand to link and locate the Blinking LED program) to build blink.exe is:
arm-elf-ld -Map blink.map -T viperlite.ld -N -o blink.exe led.o blink.o
You may notice that in this makefile the linker is invoked directly. Instead, gcc could have been used to
invoke the linker indirectly with the following line:
arm-elf-gcc -Wl,-Map,blink.map -T viperlite.ld -N -o blink.exe led.o blink.o
When invoking the linker indirectly, the special option -Wl is used so that gcc passes the request to
generate a linker map file to the linker rather than trying to parse the argument itself. While this simple
Blinking LED program does not need to link using gcc, you should remember that more complex C
programs may need special runtime library support from gcc and will need to be linked in this way.
The last part of the makefile is the target clean. Because it is not a prerequisite of the default target,
its command is not executed during a normal build.
To execute the makefile's build instructions, simply change to the directory that contains the makefile
and enter the command:
# make
The make utility will search the current directory for a file named makefile. If your makefile has a
different name, you can specify that on the command line following the -f option.
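For example, if the makefile were instead named blink.mk (a hypothetical name), the command would be:

# make -f blink.mk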
With the previous command, the make utility will make the first target it finds. You can also specify
targets on the command line for the make utility. For example, because all is the default target in the
preceding makefile, you can just as easily use the following command:
# make all
A target called clean is typically included in a makefile, with commands for removing old object files
and executables, in order to allow you to create a fresh build. The command line for executing the clean
target is:
# make clean
Keep in mind that we've presented a very basic example of the make utility and makefiles for a very
basic project. The make utility has many powerful advanced features that can benefit you on larger and
more complex projects.
It is important to keep the makefile updated as your project changes. Remember to
incorporate new source files and keep your prerequisites up to date. If
prerequisites are not set up properly, you might change a particular source file, but
that source file will not get incorporated into the build. This situation can leave
you scratching your head.
Additional information about the GNU make utility can be found online at http://www.gnu.org as well
as in the book Managing Projects with GNU make, by Robert Mecklenburg (O'Reilly). These resources
will give you a deeper understanding of both the make utility and makefiles and allow you to use their
more powerful features.
As shown in Figure 5-1, the software development cycle begins with design and the first implementation
of the code. After that, there are usually iterations of the build, download and debug, and bug-fixing
stages. Because a lot of time is spent in these three stages, it helps to eliminate any kinks in this process
so that the majority of time can be spent on debugging and testing the software. (This diagram does not
take into account the handling of feature creep inevitably inflicted by the marketing department.)
This is a very basic diagram; other stages that may be necessary include profiling and optimization.
Profiling allows a developer to determine various metrics about a program, such as where the processor
is spending most of its time. Optimization is the process by which the developer tries to eliminate
bottlenecks in software using various techniques, such as implementing time-critical code in assembly
language. Very often, optimization techniques are compiler-, processor-, or system-specific.
5.1.1.1. RedBoot
The Arcom board includes a debug monitor called RedBoot, which is described in the sidebar "Debug
Monitors" in Chapter 4. RedBoot resides in the bootloader flash on the Arcom board and uses the
Page 78
If all connections are properly made, an initialization message is output from the Arcom board's COM1
port once power is applied, along with the RedBoot prompt, which looks like this:
Ethernet eth0: MAC address 00:80:12:1c:89:b6
No IP info for device!
RedBoot(tm) bootstrap and debug environment [ROM]
Non-certified release, version W468 V3I7 - built 10:11:20, Mar 15 2006
Platform: VIPER (XScale PXA255)
Copyright (C) 2000, 2001, 2002, 2003, 2004 Red Hat, Inc.
RAM: 0x00000000-0x04000000, [0x00400000-0x03fd1000] available
FLASH: base 0x60000000, size 0x02000000, 256 blocks of 0x00020000 bytes each.
== Executing boot script in 1.000 seconds - enter ^C to abort
^C
RedBoot>
Because we have not entered an Internet Protocol (IP) address for the Arcom board, RedBoot outputs
the message: No IP info for device! This message can be ignored for now. Another thing to notice is
that we have stopped the boot script from running (and loading Linux) by entering Ctrl-C (shown in the
preceding code as ^C) when RedBoot is started.
This tells RedBoot to load an image using the xmodem protocol as the method. After you press the Enter
key, RedBoot begins to output the character C while waiting for the file to be sent over.
To begin the file transfer using Windows HyperTerminal, select Transfer → Send File... from the menu
(use a similar command if you have a different terminal program). This brings up the Send File dialog
box; select Xmodem for the protocol. Browse to the location of the blink.exe program and select it. Then
click Send. A transfer statistics dialog box will be displayed showing the status of the file transfer.
Once the transfer has successfully completed, RedBoot outputs a message similar to the following:
Entry point: 0x00400110, address range: 0x00000024-0x0040014c
xyzModem - CRC mode, 24(SOH)/0(STX)/0(CAN) packets, 2 retries
This shows the entry point of the programin this case, 0x00400110. If you refer to the map file
generated by the linker during the build process, blink.map, the entry point address should look familiar,
as shown in this portion of the map:
Name             Origin        Length
 .text           0x004000b0    0x9c     blink.o
                 0x00400110             main

The map file shows that the routine main resides at 0x00400110, which is the entry point for execution
of the Blinking LED program. The value 0x9c is the length of the code contributed by the object file blink.o.
After you press Enter, RedBoot hands control of the Arcom board over to the Blinking LED program. If
everything is successful, you should now see the green LED blinking on the add-on board.
You have just successfully completed your first pass through the embedded software development cycle.
As soon as power is applied, a processor will begin to fetch and execute the code that is stored inside the
ROM. However, be aware that each type of processor has its own rules about the location of its first
instruction. For example, when the ARM processor is reset, it begins by fetching and executing
whatever is stored at physical address 0x00000000. This is called the reset address, and the instructions
located there are collectively known as the reset code. In the case of the Arcom development board, the
reset code is part of the RedBoot debug monitor.
If your program doesn't appear to be working, something could be wrong with your reset code. You
must always ensure that the binary image you've loaded into the ROM satisfies the target processor's
reset rules. During product development, we often find it useful to turn on one of the board's LEDs just
after the reset code has been completed. That way, we know at a glance that any new code either does or
doesn't satisfy the processor's most basic requirements.
FLASH addr    Mem addr      Length        Entry point
0x00000000    0x00000000    0x0001F000    0x00000000
0x0001F000    0x00000000    0x00001000    0x00000000
0x00020000    0x00000000    0x01FE0000    0x00000000
The debug monitor resides in ROMhaving been placed there either by you or at the factoryand is
automatically started whenever the target processor is reset. It monitors the communications link to the
host computer and responds to requests from the remote debugger host software. Of course, these
requests and the monitor's responses must conform to some predefined communications protocol and are
typically of a very low-level nature. Examples of requests the host software can make are "read register
x," "modify register y," "read n bytes of memory starting at address z," and "modify the data at address
a." The remote debugger combines sequences of these low-level commands to accomplish complex
debugging tasks such as downloading a program, single-stepping, and setting breakpoints.
It is helpful to build the program being tested to include symbolic debug
information, which we did with the -g option during the compilation step of the
build procedure in Chapter 4. The -g option causes the compiler to place
additional information in the object file for use by the debugger. This debug
information allows the debugger to relate the executable program to the
source code.
One such debugger is the GNU debugger (gdb). Like the other GNU tools, it was originally designed for
use as a native debugger and was later given the ability to perform remote debugging. The gdb debug
monitor that runs on the target hardware is called a gdb stub. Additional information about gdb can be
found online at http://sources.redhat.com/gdb.
The GNU software tools include gdb. The version installed is CLI-based, so there are a few commands
to learn in order to run the debugger properly. There are several GUIs available for gdb, such as Insight
(http://sources.redhat.com/insight) and the Data Display Debugger (http://www.gnu.org/software/ddd).
RedBoot contains a gdb-compatible debug monitor. Therefore, once a host attempts to communicate
with the target using the gdb protocol, RedBoot turns control of the target over to the gdb stub for the
debug session.
Remote debuggers are one of the most commonly used downloading and testing tools during
development of embedded software. This is mainly because of their low cost. Embedded software
developers already have the requisite host computer. In addition, the price of a remote debugger does not
add significantly to the cost of a suite of cross-development tools (compiler, linker, locator, etc.).
However, there are some disadvantages to using a debug monitor, including the inability to debug
startup code. Another disadvantage is that code must execute from RAM. Furthermore, when using a
debug monitor, a communication channel must exist to interface the target to the host computer.
To prepare for the debugging examples, cycle power on the Arcom board and halt the RedBoot boot
script by pressing Ctrl-C. Once the RedBoot initialization message is output, you're ready to start.
Invoke gdb, passing the name of the program to debug as an argument, by using the following
command:
# arm-elf-gdb blink.exe
If you use an executable that does not contain debugging information with
gdb, the following message is output:

(no debugging symbols found)
There should also be a (gdb) prompt waiting for input. Next, issue the command to have gdb connect to
the Arcom board. The following command assumes that the computer's serial port that is connected to
the target board is COM1 (if a different PC serial port is used, modify the command accordingly):
(gdb) target remote /dev/ttyS0
After gdb successfully connects to the target, a response similar to this one will follow:
Remote debugging using /dev/ttyS0
The host computer running the gdb command-line interface is now connected to the gdb stub residing on
the target hardware within RedBoot.
Next download the blink.exe program onto the target with the command:
(gdb) load blink.exe
When program loading completes successfully, a message similar to this one is output from gdb:
Loading section data, size 0x4 lma 0x400000
Loading section text, size 0x148 lma 0x400004
Start address 0x400110, load size 332
Transfer rate: 2656 bits in <1 sec, 166 bytes/write.
gdb commands are not case-sensitive (though symbols are) and can be abbreviated
to the shortest unique string. For example, you can set a breakpoint with any of
these commands:
(gdb) breakpoint ledToggle
(gdb) break ledToggle
(gdb) br ledToggle
(gdb) b ledToggle
After successfully setting the breakpoint, gdb responds with information about the breakpoint as
follows:
Breakpoint 1 at 0x400070: file led.c, line 66.
The response shows the breakpoint number (1 in this case), the address of the breakpoint (0x400070),
the file where the function is located (led.c), and the line number in that file where the breakpoint is set
(66).
If you need to check which breakpoints are set within a program for a gdb session, you can use the
command:
(gdb) info b
Once gdb hits the breakpoint, it will halt execution and output the source code line that is to be executed
next:
Breakpoint 1, ledToggle ( ) at led.c:66
66          if (GPIO_0_LEVEL_REG & LED_GREEN)
The first line of output shows the breakpoint that was hit, along with the description. The second line
shows the line number in the file, 66 in this case, with the source code for that line. The green LED
should be lit now because that is its initial state in the program.
To have gdb show the source code where the current program is stopped, use the list command:
(gdb) l
This will dump, by default, 10 source code lines. To dump the next 10 source code lines, simply enter
the list command again.
To repeat the last command in a gdb session, simply press Enter.
A useful feature of the remote debugger is that it can check symbol values. The command to show the
value of the gChapter variable is:
(gdb) print /x gChapter
The /x option formats the output for hexadecimal. The response from gdb is:
$1 = 0x5
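gdb can also modify a variable while the program is halted. A typical way to do this (a sketch of standard gdb usage, not necessarily the exact commands from the original session) is:

(gdb) set var gChapter = 12
(gdb) print /x gChapter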
Printing the variable again shows that the new value of gChapter is 0xC (12 decimal).

The map file blink.map shows where gChapter resides in memory:

0x00400000    0x4    blink.o
0x00400000           gChapter
Given the address of a symbol, you can use gdb to find out the current value of that symbol
by using the command:
(gdb) x/d 0x400000
The /d modifier displays the value in decimal, and the final argument is the address of memory to read
(here 0x400000). In this case, gdb responds with the following output:

0x400000 <gChapter>:    12
This shows the current value of gChapter as 12 (the value set earlier in the session). This allows you
to peek and poke memory even when symbol information is not available.
Another very useful feature of a remote debugger is the ability to step through lines of source code, an
action commonly called single-stepping. There are several different types of single-step commands, such
as stepping a single machine instruction and stepping a single source code line. The following
command, next, steps a single source code line:
(gdb) n
The source code line to be executed next is output by gdb as shown here:
69          GPIO_0_SET_REG = LED_GREEN;
gdb provides two commands for stepping one line at a time: step and next. The difference between
them is that when you reach the start of a function call, step enters the function and runs the first
statement within the function, whereas next runs the whole function and stops at the statement
following the call.
Now run the program again with the continue command:
(gdb) c
When the breakpoint is hit again, you can see how execution got there by using the backtrace command
(bt), which produces output such as:

#0  ledToggle ( ) at led.c:66
#1  0x00400140 in main ( ) at blink.c:75

The response from gdb shows the most recently executed function (indicated by #0), followed by the
function that called it (indicated by #1), and so on. The preceding response shows that the routine main
called the routine ledToggle.
With gdb you can also view the processor's register values:

(gdb) info registers

This command outputs the current values of the processor's registers, including the program counter, in
hexadecimal format.
You should now have a good understanding of how to use gdb. We've examined the most important
commands, but it may be helpful to play around with some others at this point.
In order to remove the breakpoint set earlier, use the delete command:
(gdb) d
5.3. Emulators
An in-circuit emulator (ICE) provides a lot more functionality than a remote debugger. In addition to
providing the features available with a remote debugger, an ICE allows you to debug startup code and
programs running from ROM, set breakpoints for code running from ROM, and even run tests that
require more RAM than the system contains.
An ICE typically takes the place of, or emulates, the processor on your target board. (Some emulators
come with an adapter that clips over the processor on the target.) The ICE is itself an embedded system,
with its own copy of the target processor, RAM, ROM, and embedded software. In-circuit emulators are
usually pretty expensive. But they are powerful tools, and in a tight spot, nothing else will help you get
the debug job done better.
Like a debug monitor, an emulator uses a remote debugger for its host interface. In some cases, it is even
possible to use the same debugger frontend for both. But because the emulator has its own copy of the
target processor, it is possible to monitor and control the state of the processor in real time. This allows
the emulator to support such powerful debug features as hardware breakpoints and real-time tracing.
Additional information about in-circuit emulators can be found in the November 2001 Embedded
Systems Programming article "Introduction to In-Circuit Emulators," which can be found online at
http://www.embedded.com.
With a debug monitor, you can set breakpoints in your program. However, these software breakpoints
are restricted to instruction fetches, the equivalent of the command "stop execution if this instruction is
about to be fetched." Emulators, by contrast, also support hardware breakpoints. Hardware breakpoints
allow you to stop execution in response to a wider variety of events: not only instruction fetches, but
also interrupts and reads and writes of memory. For example, you might set a hardware breakpoint on
the event "address bus = 0x2034FF00 and data bus = 0x20310000."
Another useful feature of an in-circuit emulator is real-time tracing. Typically, an emulator incorporates
a large block of special-purpose RAM that is dedicated to storing information about each processor
cycle executed. This feature allows you to see in exactly what order things happened, so it can help you
answer questions such as, "Did the timer interrupt occur before or after the variable bar became 12?"
5.4.1. Simulators
A simulator is a completely host-based program that simulates (hence the catchy name) the functionality
and instruction set of the target processor. The user interface is usually the same as or similar to that of
the remote debugger. In fact, it might be possible to use one debugger host for the simulator target as
well, as shown in Figure 5-3. Although simulators have many disadvantages, they are quite valuable in
the earlier stages of a project when there is not yet any actual hardware for the programmers to
experiment with. If you cannot get your hands on a development board, a simulator is the best tool for
getting a jump-start on the software development.
By far, the biggest disadvantage of a simulator is that it simulates only the processor. Embedded systems
frequently contain one or more important peripherals. Interaction with these devices can sometimes be
imitated with simulator scripts or other workarounds, but such workarounds are often more trouble to
create than the simulation is worth. So you probably won't do too much with the simulator once the
actual embedded hardware is available.
One of the most primitive debug techniques available is the use of an LED as an indicator of success or
failure. The basic idea is to slowly walk the LED enable code through the larger program. In other
words, first begin with the LED enable code at the reset address. If the LED turns on, you can edit the
program, moving the LED enable code to just after the next execution milestone, and then rebuild and
test. Because this technique gives you very little information about the state of the processor, it is most
appropriate for very simple, linearly executed programs such as the startup code. But if you don't have
access to a remote debugger or any of the other debug tools, this type of debugging might be your only
choice.
If an LED is not present on your hardware platform, you can still use this debug technique with an I/O
signal and an oscilloscope. In this case, set the I/O signal to a specific level once you reach a particular
execution milestone. Using the oscilloscope, you can then probe that I/O pin to determine whether the
code has set it appropriately. If so, you know that the code executed successfully up to that point, and
you can now move the I/O signal code to the next milestone.
The method of using an I/O signal and an oscilloscope can also be used as a basic performance
measurement tool. An I/O pin can be used to measure how long a program is spending in a given
routine, or how long it takes to execute a particular fragment of code. This can show potential
bottlenecks in the program.
For example, to precisely measure the length of time spent in the delay_ms routine (when passed in a
parameter of 1), we could set an I/O pin high when we enter the routine and then set the same I/O pin
low before exiting. We could then attach an oscilloscope lead to this I/O pin to measure the amount of
time that the I/O pin is high, which is the time spent in the delay_ms routine. The oscilloscope screen
should look similar to the image in Figure 5-5.
Figure 5-5. Using I/O signals for debug and performance measurements
As shown in Figure 5-5, channel 1 (CH 1) is the probe that captured the signal. The dotted horizontal
lines indicate voltage increments (in this case, 2 volts per division); the dotted vertical lines indicate
time increments (in this case, 500 microseconds per division). The "T" at the top of the screen, along
with the arrow, indicate when the oscilloscope triggered on the falling edge of the I/O pin. The I/O
signal goes from (approximately) 0 to 3.3 volts.
Incidentally, this test shows that the actual delay_ms routine that is supposed to delay for 1 millisecond
is a little off, because the time from setting the I/O pin high to setting the I/O pin low is a bit longer than
two divisions.
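In code, the instrumentation for such a measurement can be as simple as bracketing the code under test with calls that drive the chosen pin. As a sketch, reusing the LED control functions from the Blinking LED program (on real hardware you would normally use a spare I/O pin rather than the LED):

ledOn(LED_GREEN);     /* Pin goes high: rising edge on the scope.  */
delay_ms(1);          /* The code being measured.                  */
ledOff(LED_GREEN);    /* Pin goes low: falling edge on the scope.  */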
If I/O pins are available and several inputs are supported by the oscilloscope, you can use multiple I/O
signals simultaneously in order to get a snapshot of the entire system. In more complex systems, you
can move the I/O set calls around in the various routines and measure how each routine is performing.
Finding Pin 1
Before (carefully) probing around the circuit board, let's learn how to identify particular pins
on an IC. Figure 5-6 shows several common methods used to identify pin 1 on an IC. As
shown in this figure, a square pad is often used for pin 1, whereas the other pads are typically
circles. This is typically the case when the IC is in a dual inline package (DIP). Another
indicator of pin 1 is a silkscreened circle next to pin 1.
IC manufacturers also indicate pin 1 by putting either a circular indentation next to it or an arc
indentation on the top of the IC, in which case pin 1 is located to the left of this indentation.
On some smaller chips, the pin 1 side is chamfered (a groove is cut). The other pin numbers
almost always increase as you move counterclockwise from pin 1 around the chip.
A combination of these may be used in some cases. You might want to take a look at the
Arcom board to get a better idea of what some of these pin-1 indicators look like.
5.4.3. Lint
A lint program is a tool for statically checking source code for portability problems and common coding
syntax errors, such as ignored return values and type inconsistencies. A compiler provides some of this
error checking, but a lint program verifies these areas of a program much more carefully and therefore
aids in the development of more robust software.
Setting up lint is similar to setting up a compiler, where different options are passed into the program to
control the type of output produced. In fact, you can augment your build procedure to include a lint
check that sends its output to a file for you to review at a later time.
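For example, a makefile target along the following lines could run a lint pass and capture the results in a file. This is a sketch that assumes the splint checker; substitute whichever lint program and options you actually use:

lint:
	splint -weak -I../include led.c blink.c > lint.out 2>&1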
A good introduction to using lint is the article "Introduction to Lint" from the archives of Embedded
Systems Programming. This article can be found online at http://www.embedded.com. For additional
information, pick up Checking C Programs with Lint, by Ian F. Darwin (O'Reilly).
Subversion (http://subversion.tigris.org)
A follow-on to CVS that solves some of CVS's problems for large projects and is gaining
adherents. Version Control with Subversion, by Ben Collins-Sussman, Brian W. Fitzpatrick, and
C. Michael Pilato (O'Reilly) covers Subversion version control software.
Chapter 6. Memory
Tyrell: If we give them a past, we create a cushion for their emotions and, consequently, we can control them better.
Deckard: Memories. You're talking about memories.
the movie Blade Runner
In this chapter, you will learn everything you need to know about memory in embedded systems. In
particular, you will learn about the types of memory you are likely to encounter, how to test memory
devices to see whether they are working properly, and how to use flash memory.
DRAM Controllers
If your embedded system includes DRAM, there is probably a DRAM controller onboard (or
on-chip) as well. The PXA255 has a DRAM controller on-chip. The DRAM controller is an
extra piece of hardware placed between the processor and the memory chips. Its main purpose
is to perform the refresh operations required to keep your data alive in the DRAM. However,
it cannot do this properly without some help from you.
One of the first things your software must do is initialize the DRAM controller. If you do not
have any other RAM in the system, you must do this before creating the stack or heap,
because those areas of memory would then be located in the DRAM. This initialization code
is usually written in assembly language and placed within the hardware-initialization module.
Almost all DRAM controllers require a short initialization sequence that consists of one or
more setup commands. The setup commands tell the controller about the hardware interface
to the DRAM and how frequently the data there must be refreshed. To determine the
initialization sequence for your particular system, consult the designer of the board or read the
databooks that describe the DRAM and DRAM controller. If the DRAM in your system does
not appear to be working properly, it could be that the DRAM controller either is not
initialized or has been initialized incorrectly.
Memory type   Volatile?   Writable?               Erase/rewrite size   Erase/rewrite cycles   Relative cost               Relative speed
SRAM          Yes         Yes                     Byte                 Unlimited              Expensive                   Fast
DRAM          Yes         Yes                     Byte                 Unlimited              Moderate                    Moderate
Masked ROM    No          No                      N/A                  N/A                    Inexpensive (in quantity)   Slow
PROM          No          Once, with programmer   N/A                  N/A                    Moderate                    Slow
EPROM         No          Yes, with programmer    Entire chip          Limited (see specs)    Moderate                    Slow
EEPROM        No          Yes                     Byte                 Limited (see specs)    Expensive                   Moderate to read, slow to write
Flash         No          Yes                     Sector               Limited (see specs)    Moderate                    Fast to read, slow to write
NVRAM         No          Yes                     Byte                 None                   Expensive                   Fast
Big-endian
Means that the most significant byte of any multibyte data field is stored at the lowest memory
address, which is also the address of the larger field
Little-endian
Means that the least significant byte of any multibyte data field is stored at the lowest memory
address, which is also the address of the larger field
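As a quick illustration (not from the original text), consider the 32-bit value 0x12345678 stored at address 0x1000:

uint32_t value = 0x12345678;

/* Big-endian layout in memory:    0x1000: 12   0x1001: 34   0x1002: 56   0x1003: 78 */
/* Little-endian layout in memory: 0x1000: 78   0x1001: 56   0x1002: 34   0x1003: 12 */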
The origin of the odd terms big-endian and little-endian can be traced to the 1726 book Gulliver's
Travels, by Jonathan Swift. In one part of the story, resistance to an imperial edict to break soft-boiled
eggs on the "little end" escalates to civil war. The plot is a satire of England's King Henry VIII's break
with the Catholic Church. A few hundred years later, in 1981, Danny Cohen applied the terms and the
satire to our current situation in IEEE Computer (vol. 14, no. 10).
htons( )
Reorder the bytes of a 16-bit unsigned value from processor order to network order. The macro
name can be read "host to network short."
htonl( )
Reorder the bytes of a 32-bit unsigned value from processor order to network order. The macro
name can be read "host to network long."
ntohs( )
Reorder the bytes of a 16-bit unsigned value from network order to processor order. The macro
name can be read "network to host short."
ntohl( )
Reorder the bytes of a 32-bit unsigned value from network order to processor order. The macro
name can be read "network to host long."
Following is an example of the implementation of these macros. We will take a look at the left shift (<<)
and right shift (>>) operators in Chapter 7.
#if defined(BIG_ENDIAN) && !defined(LITTLE_ENDIAN)

#define htons(A)    (A)
#define htonl(A)    (A)
#define ntohs(A)    (A)
#define ntohl(A)    (A)

#elif defined(LITTLE_ENDIAN) && !defined(BIG_ENDIAN)

#define htons(A)    ((((uint16_t)(A) & 0xff00) >> 8) | \
                     (((uint16_t)(A) & 0x00ff) << 8))
#define htonl(A)    ((((uint32_t)(A) & 0xff000000) >> 24) | \
                     (((uint32_t)(A) & 0x00ff0000) >> 8)  | \
                     (((uint32_t)(A) & 0x0000ff00) << 8)  | \
                     (((uint32_t)(A) & 0x000000ff) << 24))
#define ntohs       htons
#define ntohl       htonl

#else

#error Either BIG_ENDIAN or LITTLE_ENDIAN must be #defined, but not both.

#endif
If the processor on which the TCP/IP stack is to be run is itself also big-endian, each of the four macros
will be defined to do nothing, and there will be no runtime performance impact. If, however, the
processor is little-endian, the macros will reorder the bytes appropriately. These macros are routinely
called when building and parsing network packets and when socket connections are created.
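For example, a TCP port number being placed into a packet header might be converted like this (a small usage sketch, not code from an actual stack):

uint16_t port     = 80;            /* Value in the processor's native byte order. */
uint16_t wirePort = htons(port);   /* Same value in network (big-endian) order.   */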
Problems with the wiring between the processor and memory device
Missing memory chips
Improperly inserted memory chips
These are the problems that a good memory test algorithm should be able to detect. Such a test should
also be able to detect catastrophic memory failures without specifically looking for them. So let's discuss
circuit board problems in more detail.
Address signal
Data signal
Control signal
The address and data signals select the memory location and transfer the data, respectively. The control
signals tell the memory device whether the processor wants to read or write the location and precisely
when the data will be transferred. Unfortunately, one or more of these wires could be improperly routed
or damaged in such a way that it is either shorted (i.e., connected to another wire on the board) or open
(not connected to anything). Shorting is often caused by a bit of solder splash, whereas an open wire
could be caused by a broken trace. Both cases are illustrated in Figure 6-3.
Problems with the electrical connections to the processor will cause the memory device to behave
incorrectly. Data might be corrupted when it's stored, stored at the wrong address, or not stored at all.
Each of these symptoms can be explained by wiring problems on the data, address, and control signals,
respectively.
If the problem is with a data signal, several data bits might appear to be "stuck together" (i.e., two or
more bits always contain the same value, regardless of the data transmitted). Similarly, a data bit might
be either "stuck high" (always 1) or "stuck low" (always 0). These problems can be detected by writing a
sequence of data values designed to test that each data pin can be set to 0 and 1, independently of all the
others.
If an address signal has a wiring problem, the contents of two memory locations might appear to
overlap. In other words, data written to one address will instead overwrite the contents of another
address. This happens because an address bit that is shorted or open causes the memory device to see an
address different from the one selected by the processor.
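A simplified sketch of an address bus test follows. It matches the interface of the memtestAddressBus routine called by the test program later in this chapter, but it checks only for address bits stuck high; the complete routine would also check for bits stuck low or shorted together, and the details here are assumptions rather than the book's exact listing. It assumes numBytes is a power of two.

int memtestAddressBus(datum *pBaseAddress, uint32_t numBytes, datum **ppFailAddr)
{
    uint32_t addressMask = (numBytes / sizeof(datum)) - 1;
    uint32_t offset;

    datum pattern     = (datum) 0xAAAAAAAA;
    datum antipattern = (datum) 0x55555555;

    *ppFailAddr = NULL;

    /* Write the default pattern at each power-of-two offset. */
    for (offset = 1; (offset & addressMask) != 0; offset <<= 1)
        pBaseAddress[offset] = pattern;
    pBaseAddress[0] = pattern;

    /* Check for address bits stuck high: disturbing offset 0 must not
     * change the contents of any power-of-two offset. */
    pBaseAddress[0] = antipattern;

    for (offset = 1; (offset & addressMask) != 0; offset <<= 1)
    {
        if (pBaseAddress[offset] != pattern)
        {
            *ppFailAddr = &pBaseAddress[offset];
            return 0;
        }
    }

    pBaseAddress[0] = pattern;

    return 1;
}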
Another possibility is that one of the control signals is shorted or open. Although it is theoretically
possible to develop specific tests for control signal problems, it is not possible to describe a general test
that covers all platforms. The operation of many control signals is specific to either the processor or the
memory architecture. Fortunately, if there is a problem with a control signal, the memory probably won't
work at all, and this will be detected by other memory tests. If you suspect a problem with a control
signal, it is best to seek the advice of the board's designer before constructing a specific test.
Table 6-2. Consecutive data values for an 8-bit walking 1's test
00000001
00000010
00000100
00001000
00010000
00100000
01000000
10000000
Because we are testing only the data bus at this point, all of the data values can be written to the same
address. Any address within the memory device will do. However, if the data bus splits as it makes its
way to more than one memory chip, you will need to repeat the data bus test at an address within each chip.
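As a concrete illustration of the walking 1's technique, here is a minimal sketch of a data bus test. It is written to match the memtestDataBus interface used by the test program later in this chapter, but the exact prototype and details are assumptions rather than the book's exact listing; datum is the integer type matching the width of the data bus.

int memtestDataBus(volatile datum *pAddress, datum **ppFailAddr)
{
    datum pattern;

    *ppFailAddr = NULL;

    /* Walk a single 1 bit across the data bus, using one fixed address. */
    for (pattern = 1; pattern != 0; pattern <<= 1)
    {
        *pAddress = pattern;            /* Write the test pattern.   */

        if (*pAddress != pattern)       /* Read it back and compare. */
        {
            *ppFailAddr = (datum *) pAddress;
            return 0;
        }
    }

    return 1;
}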
Offset   Binary value   Inverted value
0x00     00000001       11111110
0x01     00000010       11111101
0x02     00000011       11111100
0x03     00000100       11111011
...      ...            ...
0xFE     11111111       00000000
0xFF     00000000       11111111
The function memtestDevice implements just such a two-pass increment/decrement test. It accepts three
parameters from the caller. The first parameter is the starting address, the second is the number of bytes
to be tested, and the third is used to return the address of the failure, if one occurs. The first two
parameters give the user maximum control over which areas of memory are overwritten. The function
returns 1 on success, and the parameter ppFailAddr is set to NULL. Otherwise, 0 is returned and the first
address that contains an incorrect data value is returned in the parameter ppFailAddr.
/**********************************************************************
 *
 * Function:    memtestDevice
 *
 * Description: Test the integrity of a physical memory device by
 *              performing an increment/decrement test over the
 *              entire region. In the process, every storage bit
 *              in the device is tested as a zero and a one. The
 *              base address and the size of the region are
 *              selected by the caller.
 *
 * Notes:
 *
 * Returns:     0 if the test fails. The failure address is returned
 *              in the parameter ppFailAddr.
 *              1 if the test succeeds. The parameter ppFailAddr is
 *              set to NULL.
 *
 **********************************************************************/
int memtestDevice(datum *pBaseAddress, uint32_t numBytes, datum **ppFailAddr)
{
    uint32_t offset;
    uint32_t numWords = numBytes / sizeof(datum);
    datum    pattern;

    *ppFailAddr = NULL;

    /* Fill memory with a known pattern. */
    for (pattern = 1, offset = 0; offset < numWords; pattern++, offset++)
        pBaseAddress[offset] = pattern;

    /* Check each location and invert it for the second pass. */
    for (pattern = 1, offset = 0; offset < numWords; pattern++, offset++)
    {
        if (pBaseAddress[offset] != pattern)
        {
            *ppFailAddr = &pBaseAddress[offset];
            return 0;
        }

        pBaseAddress[offset] = ~pattern;
    }

    /* Check each location for the inverted pattern and zero it. */
    for (pattern = 1, offset = 0; offset < numWords; pattern++, offset++)
    {
        if (pBaseAddress[offset] != ~pattern)
        {
            *ppFailAddr = &pBaseAddress[offset];
            return 0;
        }

        pBaseAddress[offset] = 0;
    }

    return 1;
}
The test program below uses these routines to check a 64 KB block of memory starting at address 0x00500000:

#define BASE_ADDRESS    (datum *)(0x00500000)
#define NUM_BYTES       (0x10000)
/**********************************************************************
 *
 * Function:    main
 *
 * Description: Test a 64 KB block of DRAM.
 *
 * Notes:
 *
 * Returns:     0 on failure.
 *              1 on success.
 *
 **********************************************************************/
int main(void)
{
    datum *pFailAddr;

    /* Configure the LED control pins. */
    ledInit( );

    /* Make sure all LEDs are off before we start the memory test. */
    ledOff(LED_GREEN | LED_YELLOW | LED_RED);

    if ((memtestDataBus(BASE_ADDRESS, &pFailAddr) != 1) ||
        (memtestAddressBus(BASE_ADDRESS, NUM_BYTES, &pFailAddr) != 1) ||
        (memtestDevice(BASE_ADDRESS, NUM_BYTES, &pFailAddr) != 1))
    {
        ledOn(LED_RED);
        return 0;
    }
    else
    {
        ledOn(LED_GREEN);
        return 1;
    }
}
Unfortunately, it is not always possible to write memory tests in a high-level language. For example, C
requires the use of a stack. But a stack itself requires working memory. This might be reasonable in a
system that has more than one memory device. For example, you might create a stack in an area of
RAM that is already known to be working, while testing another memory device. In a common situation,
a small SRAM could be tested from assembly and the stack could be created in this SRAM afterward.
Then a larger block of DRAM could be tested using a test algorithm implemented in a high-level
language, such as the one just shown. If you cannot assume enough working RAM for the stack and data
needs of the test program, you will need to rewrite these memory test routines entirely in assembly
language.
It might be possible to use the processor cache for the stack. Or if the processor
uses a link register, and variables are kept in registers, it may still be possible to
write tests in C without needing a stack.
Another option is to run the memory test program from an in-circuit emulator. In this case, you could
choose to place the stack in an area of the emulator's own internal memory. By moving the emulator's
internal memory around in the target memory map, you could systematically test each memory device
on the target.
Running an emulator before you are assured that your hardware is working entails
risk. If there is a physical (electrical/bus) fault in your system, the fault could
destroy your expensive ICE.
You also need to be careful that the processor's cache does not fool you into thinking that the memory
tests falsely succeeded. For example, imagine that the processor stores the data that you intended to
write out to a particular memory location in its cache. When you read that memory location back, the
processor provides the cached value. In this case, you get a valid result regardless of whether there is an
actual memory error. It is best to run the memory tests with the cache (at least the data cache) disabled.
The need for memory testing is perhaps most apparent during product development, when the reliability
of the hardware and its design are still unproved. However, memory is one of the most critical resources
in any embedded system, so it might also be desirable to include a memory test in the final release of
your software. In that case, the memory test and other hardware confidence tests should be run each time
the system is powered on or reset. Together, this initial test suite forms a set of hardware diagnostics. If
one or more of the diagnostics fail, a repair technician can be called in to diagnose the problem and
repair or replace the faulty hardware.
6.5.1. Checksums
How can we tell whether the data or program stored in a nonvolatile memory device is still valid? One
of the easiest ways is to compute a checksum of the data when it is known to be valid (prior to
programming the ROM, for example). Then, each time you want to confirm the validity of the data, you
can recompute the checksum and compare the result to the value computed earlier. If the two checksums
match, the data is assumed to be valid.
The divisor is simply a binary representation of the coefficients of the generator polynomial, each of
which is either 0 or 1. To make this even more confusing, the highest-order coefficient of the generator
polynomial (always a 1) is left out of the binary representation. For example, the polynomial in CRC16
has four nonzero coefficients. But the corresponding binary representation has only three 1s in it (bits
15, 2, and 0).
Parameter               CRC16                    CRC32
Checksum size (width)   16 bits                  32 bits
Generator polynomial    x^16 + x^15 + x^2 + 1    x^32 + x^26 + x^23 + x^22 + x^16 + x^12 + x^11 + x^10 + x^8 + x^7 + x^5 + x^4 + x^2 + x^1 + 1
Divisor                 0x8005                   0x04C11DB7
Initial remainder       0x0000                   0xFFFFFFFF
Final XOR value         0x0000                   0xFFFFFFFF
The following code can be used to compute any CRC formula that has a similar set of parameters. To
make this as easy as possible, we have defined all of the CRC parameters as constants. To select the
CRC parameters according to the desired standard, define one (and only one) of the macros CRC16 or
CRC32.
/* The CRC parameters. Currently configured for CRC16. */
#define CRC_NAME            "CRC16"
#define POLYNOMIAL          0x8005
#define INITIAL_REMAINDER   0x0000
#define FINAL_XOR_VALUE     0x0000
#define REFLECT_DATA        TRUE
#define REFLECT_REMAINDER   TRUE
#define CHECK_VALUE         0xBB3D    /* Expected CRC16 of the standard test message "123456789". */

#define WIDTH               (8 * sizeof(crc_t))
#define TOPBIT              (1 << (WIDTH - 1))
The function crcCompute can be called over and over from your application to compute and verify CRC
checksums.
/*********************************************************************
 *
 * Function:    crcCompute
 *
 * Description: Compute the CRC of a given message.
 *
 * Notes:
 *
 * Returns:     The CRC of the message.
 *
 *********************************************************************/
crc_t crcCompute(uint8_t const message[], uint32_t numBytes)
{
    crc_t    remainder = INITIAL_REMAINDER;
    uint32_t byte;
    int      nBit;

    /* Perform modulo-2 division, a byte at a time. */
    for (byte = 0; byte < numBytes; byte++)
    {
        /* Bring the next byte into the remainder. */
        remainder ^= (REFLECT_DATA(message[byte]) << (WIDTH - 8));

        /* Perform modulo-2 division, a bit at a time. */
        for (nBit = 8; nBit > 0; nBit--)
        {
            /* Try to divide the current data bit. */
            if (remainder & TOPBIT)
                remainder = (remainder << 1) ^ POLYNOMIAL;
            else
                remainder = (remainder << 1);
        }
    }

    /* The final remainder is the CRC result. */
    return (REFLECT_REMAINDER(remainder) ^ FINAL_XOR_VALUE);
}
A function named crcFast that uses a lookup table to compute a CRC more efficiently is included on
this book's web site (http://www.oreilly.com/catalog/embsys2). Precomputing the remainders for all 256
possible input bytes and storing them in a lookup table eliminates the inner bit-by-bit loop and speeds
up the computation considerably.
Limit downtime
The upgrade should take place during scheduled downtime. Since the unit will probably not
be able to function at its full capacity during the upgrade, you need to make sure that the unit is
not performing a critical task. The customer will have to dictate the most convenient time.
Power failure
How will the unit recover should power be removed (intentionally or otherwise) while the
upgrade is taking place? If only a few bytes of the application image have been programmed into
flash when the power is removed, you need a way to determine that an error occurred and
prevent that code from executing. A solution may be to include a loader (similar to a debug
monitor) that cannot be erased because it resides in protected flash sectors. One of the boot tasks
for the loader is to check the flash memory for a valid application image (i.e., for a valid
checksum). If a valid image is not present, the loader needs to know how to get a valid image
onto the board, via serial port, network, or some other means.
Another solution for power failures may be to include a flash memory device that is large enough
to store two application images: the current image and the old image. When new firmware is
available, the old image is overwritten with the new software; the current image is left alone.
Only after the image has been programmed properly and verified does it become the current
image. This technique ensures that the unit always has a valid application image to execute
should something bad happen during the upgrade procedure.
Security
If security of the image is an issue, you may need to find an algorithm to digitally sign and/or
encrypt the new software. The validation and decryption of the software would then be
performed prior to programming the new software into the flash memory.
Chapter 7. Peripherals
Each pizza glides into a slot like a circuit board into a computer, clicks into place as the smart box
interfaces with the onboard system of the car. The address of the customer is communicated to the car,
which computes and projects the optimal route on a heads-up display.
Neal Stephenson, Snow Crash
In addition to the processor and memory, most embedded systems contain a handful of other hardware
devices. Some of these devices are specific to each embedded system's application domain, while
others, such as timers/counters and serial ports, are useful in a wide variety of systems. The most
commonly used devices are often included within the same chip as the processor and are called internal,
or on-chip, peripherals. Hardware devices that reside outside the processor chip are, therefore, said to be
external peripherals. In this chapter, we'll discuss the most common software issues that arise when
interfacing to a peripheral of either type.
/* Set GPIO pin 0 high. */
*pGpio0Set = 1;                 /* First write. */

delay_ms(1000);

/* Set GPIO pin 1 high. */
*pGpio0Set = 2;                 /* Second write. */
If the volatile keyword was not used to declare the variable pGpio0Set, the optimizer would be
permitted to change the operation of the code. For example, the compiler might remove the setting of
pGpio0Set to 1 in the previous code because the compiler can't see any purpose to this setting. If the
compiler intervened in this manner, the GPIO pins would not operate as the software developer
intended. So the volatile keyword instructs the optimization phase of the compilation to leave every
change to a variable in place and to assume that the variable's contents cannot be predicted by earlier
states.
It would be wrong to interpret the declaration statement of pGpio0Set to mean that the pointer itself is
volatile. In fact, the value of the variable pGpio0Set will remain 0x40E00018 for the duration of the
program (unless it is changed somewhere else, of course). The data that is pointed to, rather, is subject to
change without notice. This is a very subtle point, and thinking about it too much may confuse you. Just
remember that the location of a register is fixed, though its contents might not be. And if you use the
volatile keyword, the compiler will assume the same.
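A sketch of the declaration under discussion might look like this (the register address 0x40E00018 is the one cited above; the exact type used in the original listing is an assumption):

uint32_t volatile *pGpio0Set = (uint32_t volatile *) 0x40E00018;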
In this case, we'll imagine that the value in the timer status register, which the variable pTimerStatus
points to, is 0x4C; the & operator performs an AND operation with 0x08. The operation looks like
this:
0 1 0 0 1 1 0 0 (0x4C)
AND (&)
0 0 0 0 1 0 0 0 (0x08)
=======================
0 0 0 0 1 0 0 0 (0x08)
Because the proper bit is set in the register, the code enters the if statement.

A bit can be set with the OR operator. For example, ORing the value 0x10 into the register sets bit 4,
resulting in:

        0 1 0 0 1 1 0 0 (0x4C)
OR (|)
        0 0 0 1 0 0 0 0 (0x10)
=======================
        0 1 0 1 1 1 0 0 (0x5C)
For this operation, the inverse of 0x04 equals 0xFB. The &= operator sets bit 2 of the timer status register
to 0, while leaving all other bits unchanged. The operation looks like this:
0 1 0 1 1 1 0 0 (0x5C)
AND (&)
NOT (~) 1 1 1 1 1 0 1 1 (0xFB)
=================
0 1 0 1 1 0 0 0 (0x58)
Note that all bits in the register remain the same except for the bit we want to clear.
A bit can be toggled with the XOR operator. For example, XORing the value 0x80 toggles bit 7 of the
register, resulting in:

        0 1 0 1 1 0 0 0 (0x58)
XOR (^)
        1 0 0 0 0 0 0 0 (0x80)
=======================
        1 1 0 1 1 0 0 0 (0xD8)
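Putting the three read-modify-write operations together, here is a summary sketch using the values walked through above (pTimerStatus is assumed to be declared as a pointer to the volatile timer status register):

*pTimerStatus |= 0x10;     /* OR: set bit 4.                     */
*pTimerStatus &= ~0x04;    /* AND with the inverse: clear bit 2. */
*pTimerStatus ^= 0x80;     /* XOR: toggle bit 7.                 */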
Assume the 8-bit unsigned integer bitCount contains the value 0xAC and is shifted right by 1 bit:

bitCount >>= 1;

        1 0 1 0 1 1 0 0 (0xAC)
>> by 1
=======================
        0 1 0 1 0 1 1 0 (0x56)
In this case, a 0 is shifted in from the left. However, the C standard also allows the most significant bit to
be repeated when the variable is signed. We recommend you use unsigned integers for variables on
which you perform bit operations so that you will not have to worry about the different results on
different compilers.
Assume the value of the 8-bit unsigned integer bitCount is again 0xAC and is shifted left by 2 bits:
bitCount <<= 2;
        1 0 1 0 1 1 0 0 (0xAC)
<< by 2
=======================
        1 0 1 1 0 0 0 0 (0xB0)
One reason to use a shift is if you want to perform an operation on each bit of a register in turn; you can
create a bitmask (discussed in the next section) with 1 bit set or clear and shift it so you can operate on
the individual bits of the register.
7.1.1.6. Bitmasks
A bitmask is a constant often used along with bitwise operators to manipulate one or more bits in a
larger integer field. A bitmask is a constant binary pattern, such as the 16-bit hexadecimal literal
0x00FF, that can be used to mask specific bits. Bitmasks can be used with bitwise operators in order to
set, test, clear, and toggle bits. Following are example bitmasks for the timer status register:
#define TIMER_COMPLETE   (0x08)
#define TIMER_ENABLE     (0xC0)
The bitmasks TIMER_COMPLETE and TIMER_ENABLE are descriptive names that correspond to specific bits
in a peripheral's register. Using a symbolic (e.g., #define) bitmask allows you to write code that is more
descriptive and almost self-commented. By replacing hexadecimal literals with words, the definition
makes it easier for you (or someone else) to understand the code at a later time. Here is an example of a
bitwise operation involving a bitmask:
if (*pTimerStatus & TIMER_COMPLETE)
{
/* Do something here... */
}
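The remaining operations follow the same patterns shown earlier in this chapter; here is a brief sketch using the masks defined above:

*pTimerStatus |= TIMER_ENABLE;       /* Set the enable bits.       */
*pTimerStatus &= ~TIMER_COMPLETE;    /* Clear the completion flag. */
*pTimerStatus ^= TIMER_ENABLE;       /* Toggle the enable bits.    */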
Bitmask Macros
Here is a handy macro that will help you avoid typos in long hexadecimal literals:
#define BIT(X)   (1 << (X))

To define a specific register bit in a bitmask, such as bit 22, use the macro as follows:

#define TIMER_STATUS   BIT(22)
7.1.1.7. Bitfields
A bitfield is a field of one or more bits within a larger integer value. Bitfields are useful for bit
manipulations and are supported within a struct by C language compilers.
struct
{
    uint8_t bit0   : 1;
    uint8_t bit1   : 1;
    uint8_t bit2   : 1;
    uint8_t bit3   : 1;
    uint8_t nibble : 4;
} foo;
Bits within a bitfield can be individually set, tested, cleared, and toggled without affecting the state of
the other bits outside the bitfield.
To test bits using the bitfield, use code such as the following:
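An individual flag such as bit0 can be checked directly; this is a minimal sketch using the struct declared above:

if (foo.bit0)
{
    /* Do something here... */
}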
Here's how to test a wider field (such as two bits) using a bitfield:
if (foo.nibble == 0x03)
{
/* Do other stuff. */
}
And use code such as the following to set multiple bits in a bitfield:
foo.nibble = 0xC;
There are some issues you must be aware of should you decide to use bitfields. Bitfields are not
portable; some compilers start from the least significant bit, while others start from the most significant
bit. In some cases, the compiler may require enclosing the bitfield within a union; doing this makes the
bitfield code portable across ANSI C compilers.
In the following example, we use a union to contain the bitfield. In addition to making the bitfield code
portable, the union provides wider register access.
union
{
    uint8_t byte;

    struct
    {
        uint8_t bit0   : 1;
        uint8_t bit1   : 1;
        uint8_t bit2   : 1;
        uint8_t bit3   : 1;
        uint8_t nibble : 4;
    } bits;
} foo;
Instead of accessing only individual bits, the register can be written to as a whole. For example, the
bitfield union, along with bitmasks, can be useful when initializing a register, as shown here:
foo.byte = (TIMER_COMPLETE | TIMER_ENABLE);
Bitmasks are more efficient than bitfields in certain instances. Specifically, a bitmask is usually a better
way to initialize several bits in a register. For example, the following code initializes the timer status
register by setting the two bits denoted by the macros and clearing all others:
*pTimerStatus = (TIMER_COMPLETE | TIMER_ENABLE);
Setting and clearing bits using a bitfield is no faster than using a bitmask; with some compilers, it can be
slower to use a bitfield. One benefit of using bitfields is that individual bitfields may be declared
volatile or const. This is useful when a register is writeable but contains a few read-only bits.
Unique Registers
Some registers (or bits within a register) can be read-only or write-only. For write-only
registers, read-modify-write operations (such as |=, &=, and ^=) cannot be used. In this case, a
shadow copy of the register's contents should be held in a variable in RAM to maintain the
current state of the write-only register. An example of a write-only register using a shadow
copy timerRegValue follows:
/* Initialize timer write-only register. */
timerRegValue = TIMER_INTERRUPT;
*pTimerReg = timerRegValue;
After the shadow copy and timer register have been initialized, subsequent writes to the
register are performed by first modifying the shadow copy timerRegValue and then writing
the new value to the register. For example:
timerRegValue |= TIMER_ENABLE;
*pTimerReg = timerRegValue;
typedef struct
{
    uint16_t count;         /* Offset 0 */
    uint16_t maxCount;      /* Offset 2 */
    uint16_t _reserved1;    /* Offset 4 */
    uint16_t control;       /* Offset 6 */
} volatile timer_t;
Note that the individual fields of a struct, as well as the entire struct, can be
declared volatile.
When you use a struct overlay to access registers, the compiler constructs the actual memory-mapped
I/O addresses. The members of the timer_t struct defined in the previous example have the address
offsets shown in Table 7-1.
Register       Offset
count          0x00
maxCount       0x02
_reserved1     0x04
control        0x06
It is very important to be careful when creating a struct overlay to ensure that the sizes and addresses of
the underlying peripheral's registers map correctly.
The bitwise operators shown earlier to test, set, clear, and toggle bits can also be used with a struct
overlay. The following code shows how to access the timer peripheral's registers using the struct
overlay. Here's the code for testing bits:
if (pTimer->control & 0x08)
{
/* Do something here... */
}
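Setting, clearing, and toggling bits through the overlay look much the same; a short sketch using the same control register:

pTimer->control |= 0x10;     /* Set bit 4.    */
pTimer->control &= ~0x04;    /* Clear bit 2.  */
pTimer->control ^= 0x80;     /* Toggle bit 7. */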
These two calls closely resemble the way all flash chips work in regard to reads and writes. An erase
operation can be performed only on an entire sector. Once erased, individual bytes or words can be
rewritten. But the interfaces here hide the specific features of the flash device and its functions from
higher software levels, as desired.
Device drivers for embedded systems are quite different from their PC counterparts. In a general-purpose computer, the core of the operating system is distinct from the device drivers, which are often
written by people other than the application developers. The operating system offers an interface that
drivers must adhere to, while the rest of the system and applications depend on drivers doing so. For
example, Microsoft's operating systems impose strict requirements on the software interface to a
network card. The device driver for a particular network card must conform to this software interface,
regardless of the features and capabilities of the underlying hardware. Application programs that want to
use the network card are forced to use the networking API provided by the operating system and don't
have direct access to the card itself. In this case, the goal of hiding the hardware completely is easily
met.
By contrast, the application software in an embedded system can easily access the hardware. In fact,
because all of the software is generally linked together into a single binary image, little distinction is
made between the application software, operating system, and device drivers. Drawing these lines and
enforcing hardware access restrictions are purely the responsibilities of the software developers. Both
are design decisions that the developers must consciously make. In other words, the implementers of
embedded software can more easily cheat on the software design than can their nonembedded peers.
The benefits of good device driver design are threefold:
Because of the modularity, the structure of the overall software is easier to understand. In
addition, it is easier to add or modify features of the overall application as it evolves and
matures, even in deployed units.
Because there is only one module that ever interacts directly with the peripheral's registers, the
state of the hardware device can be more accurately tracked.
Software changes that result from hardware changes are localized to the device driver, thereby
making the software more portable.
Each of these benefits can and will help to reduce the total number of bugs in your embedded software
and enhance the reusability of your code across systems. But you have to be willing to put in a bit of
extra effort up front, at design time, in order to realize the savings.
Because the device driver contains the code to operate the hardware, the application software does not
need to be complicated by these details. For example, looking back at the Blinking LED program, the
file led.c is the LED device driver. This file contains all of the knowledge about how to initialize and
operate the LED on the Arcom board. The LED device driver provides an API consisting of ledInit
and ledToggle. The application in blink.c uses this API to toggle the LED. The application does not
need to concern itself with the operation of GPIO registers in the PXA255 in order to get the LED to
perform a certain task.
The philosophy of hiding all hardware specifics and interactions within the device driver usually
consists of the five components in the following list. To make driver implementation as simple and
incremental as possible, these elements should be developed in the order they are presented.
1. An interface to the control and status registers.

   For a memory-mapped I/O device, the most common case, the first step in the driver development
   process is to create a representation of the memory-mapped registers of your device. This usually
   involves studying the databook for the peripheral and creating a table of the control and status
   registers and their offsets. The method for representing the control and status registers can be
   whatever style you feel comfortable implementing.

2. Variables to track the current state of the physical (and logical) devices.

   The second step in the driver development process is to figure out what state variables you will
   need. For example, we'll probably need to define variables to remind us whether the hardware
   has been initialized. Write-only registers are also good candidates for state variables.
As shown in Figure 7-2, the PXA255 UART connects to the RS-232 Transceiver, which then connects
to the COM1 DB-9 connector on the Arcom board. The transceiver converts the voltage level that the
Arcom board's processor uses to RS-232 voltage levels. This allows the Arcom board's UART to
communicate with a PC via its serial port.
The next step is to understand how the particular peripheral works. What ICs need to be programmed in
order to control the peripheral? For this serial driver, we only need to focus on the UART peripheral
registers in the processor. The information about these registers is contained in the processor's
documentation.
For information about the PXA255 processor UARTs, check the PXA255 Processor Developer's
Manual; specifically, Section 10: UARTs. Information about interrupts is contained in Section 4.2:
Interrupt Controller. While reading this documentation, the goal is to get an understanding of several
different concepts, including:
The register structure for controlling the peripheral; that is, how to set up communications and
how to get data into and out of the peripheral
The addresses of the control and status registers
The method that will be used for the peripheral's operation (namely, polling or interrupts)
If using interrupts, what conditions can cause interrupts, how the software driver is informed
when an interrupt occurs, and how the interrupt is acknowledged
Get a firm grasp on what the device driver will need to do to get the peripheral to perform its task within
the system. Once these initial steps are complete, you can move on to the task of writing the device
driver software.
Page 141
The variable pSerialPort is used to access the UART registers at address 0x40100000 and is defined
as:
uart_t *pSerialPort = (uart_t *)(0x40100000);
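The uart_t type is a struct overlay of the UART registers. A minimal sketch is shown below; only the two members used by the driver functions that follow are named, and the 32-bit register spacing and the 0x14 offset of the line status register are assumptions to check against the PXA255 documentation:

typedef struct
{
    uint32_t data;           /* Receive/transmit buffer (offset 0x00) */
    uint32_t _reserved[4];   /* Registers not used by this driver     */
    uint32_t uartStatus;     /* Line status register    (offset 0x14) */
} volatile uart_t;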
#define TRANSMITTER_EMPTY   (0x40)
/**********************************************************************
 *
 * Function:    serialPutChar
 *
 * Description: Send a character via the serial port.
 *
 * Notes:       This function is specific to the Arcom board.
 *
 * Returns:     None.
 *
 **********************************************************************/
void serialPutChar(char outputChar)
{
/* Wait until the transmitter is ready for the next character. */
while ((pSerialPort->uartStatus & TRANSMITTER_EMPTY) == 0)
;
/* Send the character via the serial port. */
pSerialPort->data = outputChar;
}
The serial device driver API function serialGetChar waits until a character is received and then reads
the character from the serial port. To determine whether a character has been received, the data ready bit
is checked in the UART status register. The character received is returned to the calling function. Here is
the serialGetChar function:
#define DATA_READY   (0x01)

/**********************************************************************
 *
 * Function:    serialGetChar
 *
 * Description: Get a character from the serial port.
 *
 * Notes:       This function is specific to the Arcom board.
 *
 * Returns:     The character received from the serial port.
 *
 **********************************************************************/
char serialGetChar(void)
{
/* Wait for the next character to arrive. */
while ((pSerialPort->uartStatus & DATA_READY) == 0)
;
return pSerialPort->data;
}
Because this serial device driver does not use interrupts, the final step in the device driver
philosophy, implementing device driver interrupt service routines, is skipped.

The following main function exercises the serial device driver. If a q is entered in the
PC's terminal program, the program exits; otherwise, the loop continues and checks for another
incoming character.
#include "serial.h"
/**********************************************************************
 *
 * Function:    main
 *
 * Description: Exercise the serial device driver.
 *
 * Notes:
 *
 * Returns:     This routine contains an infinite loop, which can
 *              be exited by entering q.
 *
 **********************************************************************/
int main(void)
{
char rcvChar = 0;
/* Configure the UART for the serial driver. */
serialInit( );
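    /* A minimal remainder of the loop, consistent with the description above:
     * echo each received character and exit when q is entered. */
    while (rcvChar != 'q')
    {
        /* Wait for an incoming character. */
        rcvChar = serialGetChar( );

        /* Echo the character back out the serial port. */
        serialPutChar(rcvChar);
    }

    return 0;
}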
Selectable configuration
You can change serialInit to take a parameter that allows the calling function to specify the
initial communication parameters, such as baud rate, for the serial port.
Error checking
It is important for the device driver to do adequate error checking. Another enhancement would
be to define error codes (such as parameter error, hardware error, etc.) for the device driver API.
The functions in the device driver would then use these error codes to return status from the
attempted operation. This allows the higher-level software to take note of failures and/or retry.
FIFO usage
Typically, UARTs contain FIFOs for the data received and transmitted. Using these FIFOs adds
buffering to both the receive and transmit channels, making the UART driver more robust.
Interrupts
Implementing UART interrupts for reception and transmission is usually better than using
polling. For example, in the function serialGetChar, using interrupts would eliminate the need
for the driver to sit in a loop waiting for an incoming character. The application software is thus
able to perform other work while waiting for data to be received.
Interrupt priorities
If interrupts are used for the device drivers in a system, you need to determine and set
appropriate priority levels.
Complete requirements
You need to be aware of the requirements of the various peripherals in the system. You don't
want to design and implement your software in a manner that unknowingly handicaps the system's
ability to meet those requirements.
Resource usage
It is important to understand what resources are necessary for each device driver. For example,
imagine designing an Ethernet device driver in a system with a very limited amount of memory.
This limitation would affect the buffering scheme implemented in the Ethernet driver. The driver
might accommodate the storage of only a few incoming packets, which would affect the
throughput of the Ethernet interface.
Resource sharing
Beware of possible situations where multiple device drivers need to access common hardware
(such as I/O pins) or common memory. This can make it difficult to track down bugs if the
sharing scheme is not thoroughly thought out ahead of time. We will take a look at mechanisms
for resource sharing in Chapters 8 and 10.
Chapter 8. Interrupts
And, as Miss [Florence] Nightingale was so vehemently to complain, "women never have an half
hour... that they can call their own," she was always interrupted.
Virginia Woolf, A Room of One's Own
In this chapter, we'll take a look at interrupts, a sophisticated way of managing relationships with
external devices. Interrupts are an important aspect of embedded software development and one that
programmers need to study carefully in order to create efficient applications. The start of this chapter
gives an introduction to interrupts and different characteristics associated with them. It is important to
understand what happens when an interrupt event occurs and how the interrupt is processed. Although
the implementation of interrupts is processor-specific, much of the material in this chapter applies to all
processors. Finally, we will expand on the Blinking LED example by using an interrupt found in
practically all processors.
8.1. Overview
An interrupt can be used to signal the processor for all sorts of events; for example, because data has
arrived and can be read, a user flipped a switch, or a specific amount of time has elapsed.
Figure 8-1(a) shows peripherals connected directly to the interrupt pins of the processor. In this case, the
processor contains an interrupt controller on-chip to manage and process incoming interrupts. The
PXA255 has an internal interrupt controller.
Figure 8-1(b) shows the two peripherals connected to an external interrupt controller device. An
interrupt controller multiplexes several input interrupts into a single output interrupt. The controller also
allows control over these individual input interrupts for disabling them, prioritizing them, and showing
which are active.
Because many embedded processors contain peripherals on-chip, the interrupts from these peripherals
are also routed to the interrupt controller within the main processor. Sometimes more interrupts are
required in a system than there are interrupt pins in the processor. For these situations, peripherals can
share an interrupt. The software must then determine which device caused the interrupt.
Interrupts can be either maskable or nonmaskable. Maskable interrupts can be disabled and enabled by
software. Nonmaskable interrupts (NMI) are critical interrupts, such as a power failure or reset, that
cannot be disabled by software.
A complete listing of the interrupts in your system can be constructed from information in the reference
manuals for your processor and board. For example, the interrupts supported by the PXA255 processor
are detailed in the PXA255 Processor Developer's Manual. A partial list of the supported interrupts for
the PXA255 is shown in Table 8-1. We will take a look at what the interrupt number means shortly.
Interrupt number    Interrupt source
8                   GPIO Pin 0
9                   GPIO Pin 1
11                  USB
26                  Timer 0
27                  Timer 1
28                  Timer 2
8.1.1. Priorities
Because interrupts are asynchronous events, there must be a way for the processor to determine which
interrupt to deal with first should multiple interrupts happen simultaneously. The processor defines an
interrupt priority for all of the interrupts and exceptions it supports. The interrupt priorities are found in
the processor's documentation.
For example, the ARM processor supports six types of interrupts and exceptions. [*] The priorities of
these interrupts and exceptions are shown in Table 8-2. This table is contained in the XScale
Microarchitecture User's Manual.
[*]
ARMv6 also includes an imprecise abort exception priority between IRQ and prefetch abort.
Priority       Exception/interrupt source
1 (highest)    Reset
2              Data abort
3              FIQ
4              IRQ
5              Prefetch abort
6 (lowest)     Undefined instruction, Software interrupt
Typically, when an interrupt occurs, a processor disables all interrupts at the same- or lower-priority
levels. If multiple interrupts are waiting to be processed or are pending, the priority associated with the
interrupts determines the order in which they are executed.
The interrupt priority is set by hardware design, by software, or, in some cases, by a combination of the
two. If we look back at the wiring diagram in Figure 8-1(a), we see that the processor has four interrupt
pins (INT0 through INT3). For this example, we'll assume the processor gives INT0 the highest priority,
followed by INT1, INT2, and INT3. The hardware designer must wire the interrupt pins so that the
correct interrupt priorities are assigned to the various peripheral interrupts. In this case, the interrupt
from Peripheral A has the highest priority.
Some interrupt controllers allow the priorities of interrupts to be set in software. In this case, the
interrupt controller typically has registers that set the priorities of the various interrupts.
In Figure 8-2, the level-sensitive interrupt is active high. The time when the interrupt is active is shown
in the signal diagram, which is the time when the signal is at the higher voltage. The interrupt signal on
the bottom of Figure 8-2 is a rising edge-sensitive interrupt. It is active when the signal transitions from
low to high, is held high for a certain minimum time (typically two or three processor clocks), and then
it returns to low again.
There are issues related to both types of interrupts. For example, an edge-sensitive interrupt can be
missed if a subsequent interrupt occurs before the initial interrupt is serviced. Conversely, a level-sensitive interrupt constantly interrupts the processor as long as the interrupt signal is asserted.
Most peripherals assert their interrupt until it is acknowledged. Some processors, such as the Intel386
EX, contain registers that can be programmed to support either level-sensitive or edge-sensitive
interrupts on individual interrupts. Thus the sensitivity selection affects detection of new interrupts on
that signal.
Acknowledging an interrupt tells the interrupting device that the processor has received the interrupt and
queued it for processing. The method of acknowledging an interrupt can vary from reading an interrupt
controller register to clearing an interrupt pending bit. Once the interrupt is acknowledged, the
peripheral will deassert the interrupt signal. Some processors have an interrupt acknowledge signal that
takes care of this automatically in hardware.
The ICMR is located at address 0x40D00004. Figure 8-3 shows the Interrupt Mask (IM) for each of the
interrupts supported in the PXA255. Setting the corresponding bit to 1 in the ICMR allows that
particular source to generate interrupts; setting the corresponding bit to 0 masks that interrupt source.
For example, imagine that the interrupt pin from a peripheral is connected to GPIO pin 0 on the PXA255
processor. Table 8-1 shows that the GPIO pin 0 interrupt source is assigned to interrupt number 8.
Therefore, to enable the GPIO pin 0 interrupt, bit 8 of the ICMR is set to 1. If an interrupt occurs when
the GPIO pin 0 interrupt is enabled, it is routed to the interrupt controller for processing. To mask the
GPIO pin 0 interrupt source, set bit 8 of the ICMR to 0. If an interrupt occurs while the source is
disabled, the interrupt is ignored.
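A brief sketch of what this looks like in code follows; ICMR_REG and GPIO0_INT are illustrative names, with the address and bit number taken from the discussion above:

#define ICMR_REG    (*((uint32_t volatile *)0x40D00004))
#define GPIO0_INT   (8)

/* Enable the GPIO pin 0 interrupt source. */
ICMR_REG |= (1 << GPIO0_INT);

/* Mask the GPIO pin 0 interrupt source. */
ICMR_REG &= ~(1 << GPIO0_INT);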
Each processor typically has a global interrupt enable/disable bit in one of its registers. The PXA255 has
two bits in the Current Program Status Register (CPSR) that globally disable all interrupts.
It is important to remember to reenable interrupts in your software after you have
disabled them. This is a common problem that can lead to unexplained behavior in
the system. If interrupts are disabled at the entry to a function, ensure that all
software paths that exit the routine reenable interrupts.
You generally cannot access the global interrupt flags directly using the C language. In this case, you
need to write assembly code to enable and disable global interrupts. Some compiler libraries, such as
those for the x86 family of processors, contain functions to handle global interrupt enabling (with the
enable function) and global interrupt disabling (with the disable function).
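With the GNU tools used in this book, one way to do this for the ARM IRQ bit is a pair of small inline-assembly helpers. This is a sketch, not the board support package's own routines, and it assumes the code runs in a privileged ARM mode:

#include <stdint.h>

/* Clear the I bit (bit 7) in the CPSR to enable IRQs globally. */
static inline void irqEnable(void)
{
    uint32_t cpsr;

    asm volatile ("mrs %0, cpsr" : "=r" (cpsr));
    cpsr &= ~0x80;
    asm volatile ("msr cpsr_c, %0" : : "r" (cpsr));
}

/* Set the I bit (bit 7) in the CPSR to disable IRQs globally. */
static inline void irqDisable(void)
{
    uint32_t cpsr;

    asm volatile ("mrs %0, cpsr" : "=r" (cpsr));
    cpsr |= 0x80;
    asm volatile ("msr cpsr_c, %0" : : "r" (cpsr));
}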
Exception/interrupt      Address
Reset                    0x00000000
Undefined instruction    0x00000004
Software interrupt       0x00000008
Prefetch abort           0x0000000C
Data abort               0x00000010
IRQ                      0x00000018
FIQ                      0x0000001C
The addresses in Table 8-3 are locations in memory used by the ARM processor to execute the ISR for a
particular interrupt. Information about the interrupt vector table is contained in the documentation about
the processor. Because the addresses in the ARM interrupt vector table are 32 bits apart, the code
installed in the interrupt vector table is a jump to the real ISR.
It is critical for the programmer to install an ISR for all interrupts, even the interrupts that are not used in
the system. If an ISR is not installed for a particular interrupt and the interrupt occurs, the execution of
the program can become undefined (commonly called "going off into the weeds").
Interrupt number    Interrupt source
8                   Ethernet
11                  USB
21                  Serial Port 2
22                  Serial Port 1
26                  Timer 0
27                  Timer 1
28                  Timer 2
This table is similar to the interrupt list for the PXA255 processor shown in Table 8-1. However, this
table shows the interrupt sources that are specific to the Arcom board.
Once again, our goal is to translate the information in the table into a form that is useful for the
programmer. The interrupt map table should go into your project notebook for future reference. After
constructing an interrupt map such as the one in Table 8-4, you should add a third section to the board-specific header file. Each line of the interrupt map becomes a single #define within the file, as shown
here:
/**********************************************************************
* Interrupt Map
**********************************************************************/
#define ETHERNET_INT    (8)
#define USB_INT         (11)
#define SERIAL2_INT     (21)
#define SERIAL1_INT     (22)
#define TIMER0_INT      (26)
#define TIMER1_INT      (27)
#define TIMER2_INT      (28)
The documentation for the compiler shows whether the interrupt keyword is supported. If the
compiler does not support this keyword, a compiler-specific #pragma may be required to declare an ISR.
The GNU compiler gcc uses a third approach, involving the compiler-specific keyword __attribute__,
which takes options as arguments, as shown here:

void interruptServiceRoutine( ) __attribute__ ((interrupt ("IRQ")));
Some processors, such as certain Microchip PICs, can have only one ISR for all interrupts. This ISR
must determine the source of the interrupt by checking each potential interrupt source. In this case, it is a
good idea to check the most important interrupt first. The technique the ISR uses to determine the
source of the interrupt is hardware-specific.
When designing your software, it is typically best to include the ISR for a
particular device in the driver for that peripheral. This keeps all the device-specific
code for a particular peripheral isolated in a single module.
Figure 8-4 is a graphical representation of the interrupt process. For this example, we will assume the
Ethernet network interface controller generates the interrupt, although this process is relevant for any
interrupt.
In Figure 8-4, the processor is executing the main program when an interrupt occurs from the Ethernet
network interface controller. The processor finishes the instruction in progress before halting execution
of the main program. (Some processors allow interruption of long instructions so that interrupts are not
delayed for extended periods of time.)
The processor next looks up the address of the ISR for the Ethernet network interface controller,
interruptEthernetISR, in the interrupt vector table, and the processor jumps to this function. The
interruptEthernetISR function saves the processor context to the processor's stack.
The ISR then clears the interrupt. Once complete, the ISR restores the context and returns. The main
program continues its execution from the point at which it was interrupted. Most processors have a
special "return from interrupt" instruction for exiting the ISR.
One important concept associated with interrupts is latency. Interrupt latency is the amount of time from
when an interrupt occurs to when the processor begins executing the associated interrupt service routine.
Interrupt latency is a metric used by some engineers to evaluate processors and is very important in
real-time systems. Disabling interrupts increases interrupt latency in an embedded system, because an
interrupt that arrives while interrupts are disabled cannot be serviced until they are reenabled.
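As a concrete setting for what follows, picture a global counter shared by the main loop and the serial receive ISR; this sketch uses the names referenced in the text:

int gIndex = 0;    /* Number of received characters waiting to be processed. */

void serialReceiveIsr(void)
{
    /* Store the newly received character (not shown), then bump the count. */
    gIndex++;
}

int main(void)
{
    while (1)
    {
        if (gIndex)
        {
            /* Process a received character from the memory buffer. */
            gIndex--;
        }
    }
}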
This line of code results in assembly-language instructions that do something like this:
LOAD gIndex into a register;
DECREMENT the register value;
STORE the register value back into gIndex;
The first step is to read the value of gIndex, 3, from its location in RAM into a processor register. Next,
the register value is decremented, resulting in a value of 2.
Now suppose a serial port receive interrupt occurs before the new value of gIndex is stored in the
memory. The processor stops executing main and executes the serial port ISR, serialReceiveIsr. The
ISR increments gIndex to a value of 4.
The processor resumes execution of main after the ISR exits. At this point, main executes the line of
code that stores the register value, 2, back into the variable gIndex. Now gIndex has a value of 2, as if
the latest interrupt never occurred to increment gIndex.
The decrement code in the main program is called a critical section. A critical section is a part of a
program that must be executed in order and atomically, or without interruption. A line of C code (even
as trivial as increment or decrement) is not necessarily atomic, as we've seen in this example.
So, how is this problem corrected? Because an interrupt can occur at any time, the only way to make
such a guarantee is to disable interrupts during the critical section. In order to protect the critical section
in the previous example code, interrupts are disabled before the critical section executes and then
enabled after, as shown here:
int main(void)
{
    while (1)
    {
        interruptDisable( );

        if (gIndex)
        {
            /* Process receive character in memory buffer. */
            gIndex--;
        }

        interruptEnable( );
    }
}
In embedded systems, and especially real-time systems, it is important to keep interrupts enabled as
much as possible to avoid hindering the responsiveness of the system. Try to minimize the number of
critical sections and the length of critical section code.
The safest solution is to save the state of the interrupt enable flag, disable
interrupts, execute the critical section, and then restore the state of the interrupt
enable flag. Enabling interrupts at the end without ensuring that they were enabled
at the outset of the critical section is risky.
Race conditions can also occur when the resource shared between an ISR and the main program is a
peripheral or one of the peripheral's registers. For example, suppose a main program and an ISR use the
same peripheral register. The main program reads a register and stores the value. At this point, an ISR
executes and modifies the value in that same register. When the main program resumes and updates the
peripheral register, the ISR's value is overwritten and lost.
Critical sections are also an issue when using a real-time operating system (RTOS), because the tasks
may then also share resources such as global variables or peripheral registers. We will look at this in
Chapter 10.
On the PXA255, the timer count register (OSCR) contains a count that is incremented on rising edges of
the timer clock, which operates at 3.6864 MHz. In other words, each time the clock signal goes from
low to high, the OSCR is incremented by one.
The timer match register (OSMRn, where n is the timer number) contains the timer values for the four
different timers. After every rising edge of the timer clock, the processor compares the value in the
OSMRn to the OSCR. If there is a match, an interrupt is generated and the corresponding bit is set in the
timer status register (OSSR). The timer interrupt enable register (OIER) determines which interrupts are
enabled for the four different timers.
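The timer driver later in this chapter refers to these registers through macros. The sketch below shows plausible definitions; the addresses follow the PXA255 OS timer register map and, along with the macro names, should be verified against the developer's manual and the actual driver header:

#define TIMER_0_MATCH_REG     (*((uint32_t volatile *)0x40A00000))   /* OSMR0 */
#define TIMER_COUNT_REG       (*((uint32_t volatile *)0x40A00010))   /* OSCR  */
#define TIMER_STATUS_REG      (*((uint32_t volatile *)0x40A00014))   /* OSSR  */
#define TIMER_INT_ENABLE_REG  (*((uint32_t volatile *)0x40A0001C))   /* OIER  */

#define TIMER_0_MATCH         (0x01)    /* Timer 0 match bit in the OSSR.  */
#define TIMER_0_INTEN         (0x01)    /* Timer 0 enable bit in the OIER. */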
Watchdog Timers
Another type of timer frequently mentioned in reference to embedded systems is a watchdog
timer. A watchdog timer is a special hardware fail-safe mechanism that intervenes if the
software stops functioning properly. The watchdog timer is periodically reset (sometimes
called "kicking the dog") by software. If the software crashes or hangs, the watchdog timer
soon expires, causing the entire system to be reset automatically.
The inclusion and use of a watchdog timer is a common way to deal with unexpected
software hangs or crashes that may occur after the system is deployed. For example, suppose
your company's new product will travel into space. No matter how much testing you do
before deployment, the possibility remains that there are undiscovered bugs lurking in the
software and that one or more of these is capable of hanging the system altogether. If the
software hangs, you won't be able to communicate with the system, so you can't issue a reset
command remotely. Instead, you must build an automatic recovery mechanism into the
system. And that's where the watchdog timer comes in.
One important implementation detail to remember when using a watchdog timer is that you
should always implement the code that handles resetting the watchdog timer in the main
processing loop. Never implement the watchdog timer reset in an ISR. The reason is that in an
embedded system, the main processing loop can hang while the interrupts and ISRs continue
to function. In this case, the watchdog timer would never be able to reset the system and thus
allow the software to recover.
The main routine for the Blinking LED implementation that uses a timer instead of a delay loop is very
similar to the main routine discussed in Chapter 3. This part of the code is hardware-independent. The
main function starts with initialization of the LED control port with the ledInit function. An
initialization routine for the timer device driver, timerInit, is called to initialize and start the timer
hardware.
The infinite loop in main is empty in this case because there is no other processing needed for this
version of the Blinking LED program. All of the processing happens in the background with the timer
interrupt, but the infinite loop is still needed in order to keep the program running. Notice here that the
delay_ms function has been removed:
#include "led.h"
#include "timer.h"
/**********************************************************************
 *
 * Function:    main
 *
 * Description: Blink the green LED once a second.
 *
 * Notes:
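 *
 * Returns:     This routine contains an infinite loop.
 *
 **********************************************************************/
int main(void)
{
    /* A minimal body consistent with the description above: initialize the
     * LED and timer drivers, then idle while the timer interrupt does the work. */
    ledInit( );

    timerInit( );

    while (1)
        ;

    return 0;
}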
The timerInit routine initializes the registers needed for the timer device driver and then enables the
timer interrupt. The global state variable bInitialized is used to ensure the timer registers are only
configured once.
The first step to configure the timer is to clear any pending interrupts. This is done by writing bit 0
(defined by the bitmask TIMER_0_MATCH) to the timer status (OSSR) register (defined by the macro
TIMER_STATUS_REG).
For the next step, we need to calculate the interrupt interval. The PXA255 Processor Developer's
Manual states that the timers are incremented by a 3.6864 MHz clock. To toggle the LED every 500 ms,
the following equation is used to determine the value for the timer match register:
Timer Match Register Value = Timer clock x Timer interval
For our example, the calculation is:
Timer Match Register Value = 3,686,400 Hz x 0.5 seconds
= 1,843,200
= 0x001C2000
The macro TIMER_INTERVAL_500MS is set to the interval value 0x001C2000. The PXA255 Processor
Developer's Manual describes the algorithm for setting up a timer as follows:
1. Read the current count value in the timer count register (OSCR).
2. Add the interval offset to the current count value. This value corresponds to the amount of time
before the next time-out.
3. Program the new interval value into the timer match register (OSMR0).
For the final step in initializing the timer, set the initialization state variable to TRUE, as shown here:

#define TIMER_INTERVAL_500MS   (0x001C2000)

/**********************************************************************
 *
 * Function:    timerInit
 *
 * Description: Initialize and start the timer.
 *
 * Notes:       This function is specific to the Arcom board.
 *              Ensure an ISR has been installed for the timer prior
 *              to calling this routine.
 *
 * Returns:     None.
 *
 **********************************************************************/
void timerInit(void)
{
static int bInitialized = FALSE;
/* Initialize the timer only once. */
if (bInitialized == FALSE)
{
/* Acknowledge any outstanding timer interrupts. */
TIMER_STATUS_REG = TIMER_0_MATCH;
/* Initialize the timer interval. */
TIMER_0_MATCH_REG = (TIMER_COUNT_REG + TIMER_INTERVAL_500MS);
/* Enable the timer interrupt in the timer peripheral. */
TIMER_INT_ENABLE_REG |= TIMER_0_INTEN;
/* Enable the timer interrupt in the interrupt controller. */
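        /* Remaining steps, sketched here: unmask timer 0 (interrupt 26) in the
         * interrupt controller (ICMR_REG is the illustrative macro from the
         * earlier interrupt-mask sketch), then record that setup is done. */
        ICMR_REG |= (1 << TIMER0_INT);

        bInitialized = TRUE;
    }
}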
Prior to entering the ISR, the processor context is saved. Use the following function declaration so that
the GNU compiler includes the code for the context save (and restore at the end of the ISR):
void timerInterrupt( ) __attribute__ ((interrupt ("IRQ")));
Next, the ISR acknowledges the timer 0 interrupt. The processor documentation states that
acknowledging a timer interrupt is accomplished by writing a 1 to the timer 0 bit (defined by the bitmask
TIMER_0_MATCH) of the 32-bit OSSR (defined by the macro TIMER_STATUS). See Figure 8-6 for details
of the OSSR and timer 0 match status bit 0 (T0MS).
Next, the LED state changes with a call to the function ledToggle. Then the timer match value is
updated for the next timer interrupt interval. To update the timer 0 match register, first read the current
timer count, add the timer interval, and then write this result into the timer match register
(TIMER_0_MATCH_REG).
Finally, the processor context is restored and the ISR returns:
#include "led.h"
/**********************************************************************
 *
 * Function:    timerInterrupt
 *
 * Description: Timer 0 interrupt service routine.
 *
 * Notes:       This function is specific to the Arcom board.
 *
 * Returns:     None.
 *
 **********************************************************************/
void timerInterrupt(void)
{
/* Acknowledge the timer 0 interrupt. */
TIMER_STATUS = TIMER_0_MATCH;
/* Change the state of the green LED. */
ledToggle( );
/* Set the new timer interval. */
TIMER_0_MATCH_REG = (TIMER_COUNT_REG + TIMER_INTERVAL_500MS);
}
Time Sharing
In some embedded systems, you will have more tasks that need to occur at a specific interval
than there are timers to use for each task. Or the processor you must use will only have a
single timer. Don't worry, there are ways around this predicament: you can share a timer
among several tasks.
For example, imagine that you have the following tasks that must occur in your embedded
system:
Read a temperature sensor every 5 ms.
Write a character out a serial port every 12 ms.
Toggle an I/O pin every 100 ms.
Furthermore, in this example, the processor has only a single timer to use. What are you to
do?
The timer interval is set to the greatest common factor (in this case, 1 ms) of the desired
times. Next, the ISR counts the number of timer intervals that have elapsed. Once the
appropriate number of intervals has occurred for the specific task, a flag is set for that job to
be performed.
There are several ways to implement this type of timer sharing. One way is to have a separate
static counter variable for each interval of which you need to keep track. The following is a
code snippet that would be used in the timer ISR to handle the three different intervals for the
jobs listed:
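/* One static counter per interval (a sketch; in practice these would be
 * declared where the timer ISR can see them, for example at file scope). */
static uint32_t timer1Count = 0;
static uint32_t timer2Count = 0;
static uint32_t timer3Count = 0;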
timer1Count++;
timer2Count++;
timer3Count++;
/* Set the flag to read the temperature sensor every 5 ms
and then reset the counter. */
if (timer1Count >= 5)
{
gbReadTemperatureSensor = TRUE;
timer1Count = 0;
}
/* Set the flag to write the character out the serial port
every 12 ms and then reset the counter. */
if (timer2Count >= 12)
{
gbWriteSerialCharacter = TRUE;
timer2Count = 0;
}
/* Set the flag to toggle the I/O pin every 100 ms and
then reset the counter. */
if (timer3Count >= 100)
{
gbTogglePin = TRUE;
timer3Count = 0;
}
The main program would then regularly check whether any global flag is set, perform the
necessary action, and reset the flag. As discussed earlier in this chapter, care must be taken to
avoid a race condition since the global flags are shared between the ISR and the main
program.
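For example, the main loop's side of that handshake might look something like the following sketch, in which readTemperatureSensor is a placeholder for whatever the job actually does:

if (gbReadTemperatureSensor)
{
    gbReadTemperatureSensor = FALSE;

    /* Placeholder for the real work. */
    readTemperatureSensor( );
}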
Interrupt blocked
Interrupts can be blocked at several points. Ensure that the specific interrupt is enabled both in
the interrupt controller and at the source peripheral device. Make sure that global interrupts are
enabled in the processor.
ISR installation
Verify that the ISR is installed in the interrupt vector table properly and for the correct interrupt
vector. Understand the mapping of interrupts for the processor. Using the LED debug technique
mentioned in Chapter 5 can be a valuable tool for tracing the execution path when an interrupt
occurs.
Processor context
Make sure the processor context is saved and restored properly in the ISR. Registers can be
trampled by an ISR that will eventually wreak havoc on your main program.
In addition to the Monitor and Control program's processing loop and the CLI module, three device
drivers are shown in the figure. These control one of the Arcom board's LEDs, a buzzer, and a serial
port. This layered design approach allows the drivers to be changed when the program is ported to a new
hardware platform, with the application software remaining unchanged.
The main function that follows contains the primary processing loop for the Monitor and Control
program, which includes functionality that we have explored in previous chapters (such as sending
characters to and receiving characters from a serial port). Additional functionality includes a driver for
the Arcom board's buzzer, the ability to assemble incoming characters into a command, and the ability
to process commands. See the online example code for details about the buzzer driver. Because the
Monitor and Control program accepts user input, a prompt (>) is output when the program is waiting for
the user to enter a command.
#include "serial.h"
#include "buzzer.h"
#include "led.h"
#include "cli.h"
/**********************************************************************
 *
 * Function:    main
 *
 * Description: Monitor and control command-line interface program.
 *
 * Notes:
 *
 * Returns:     This routine contains an infinite loop.
 *
 **********************************************************************/
int main(void)
{
char rcvChar;
int bCommandReady = FALSE;
/* Configure the green LED control pin. */
ledInit( );
#define MAX_COMMAND_LEN     (10)    /* Longest command the CLI accepts.     */
#define COMMAND_TABLE_LEN   (4)     /* Number of entries in gCommandTable.  */

typedef struct
{
    char const * name;
    void (*function)(void);
} command_t;

command_t const gCommandTable[COMMAND_TABLE_LEN] =
{
    {"HELP",   commandsHelp,},
    {"LED",    commandsLed, },
    {"BUZZER", commandsBuzzer, },
    {NULL,     NULL }
};
Incoming characters are assembled into the command buffer, gCommandBuffer, which reserves one
byte for the string-terminating character. Certain characters are not stored in the command buffer,
including line feeds (\n), tabs (\t), and spaces.
Notice that the incoming characters are converted to uppercase (with the macro TO_UPPER) prior to
insertion into the buffer. This makes the command-line interface a bit more user-friendly; the user does
not have to remember to use capitalization to enter valid commands.
The local static variable idx keeps track of the characters inserted into the command buffer. As new
characters are stored in the buffer, the index is incremented. If too many characters are received for a
command, the index is set to zero and TRUE is returned to start processing the command.
#define TO_UPPER(x)   (((x) >= 'a' && (x) <= 'z') ? ((x) - ('a' - 'A')) : (x))

    /* The completed command has been received. Replace the final carriage
     * return character with a NULL character to help with processing the
     * command. */
    if (nextChar == '\r')
    {
        gCommandBuffer[idx] = '\0';
        idx = 0;

        return TRUE;
    }
Once a completed command is assembled, the cliProcessCommand function, which follows, is called
from the main processing loop. This function loops through the command table searching for a matching
command name.
The variable idx is initialized to zero to start searching at the beginning of the command table and then
keeps track of the command currently being checked. The function strcmp is used to compare the user
command with the commands in the table. If the command is found, the flag bCommandFound is set to
TRUE. This causes the search loop to exit and the associated function to execute. If the command is not
found, an error message is sent out the serial port.
#include <string.h>
/**********************************************************************
 *
 * Function:    cliProcessCommand
 *
 * Description: Look up the command in the command table. If the
 *              command is found, call the command's function. If the
 *              command is not found, output an error message.
 *
 * Notes:
 *
 * Returns:     None.
 *
 **********************************************************************/
void cliProcessCommand(void)
{
int bCommandFound = FALSE;
int idx;
    /* Search for the command in the table until a match is found or
     * the end of the table is reached; a match breaks out of the loop. */
    for (idx = 0;
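         gCommandTable[idx].name != NULL; idx++)
    {
        if (strcmp(gCommandTable[idx].name, gCommandBuffer) == 0)
        {
            /* Command found: remember it and stop searching. */
            bCommandFound = TRUE;
            break;
        }
    }

    /* Sketch of the remainder: run the matching command's function, or send
     * an error message out the serial port (the exact text is illustrative). */
    if (bCommandFound == TRUE)
        (*gCommandTable[idx].function)( );
    else
        serialPutStr("Command not found.\r\n");
}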
All functions in the command table are contained in one file; this keeps the entry point for all commands
in a single location. The commandsLed function toggles the green LED, as shown in the following code.
This function uses the same ledToggle function covered in Chapter 3.
#include "led.h"
/**********************************************************************
 *
 * Function:    commandsLed
 *
 * Description: Toggle the green LED command function.
 *
 * Notes:
 *
 * Returns:     None.
 *
 **********************************************************************/
void commandsLed(void)
{
ledToggle( );
}
The commandsBuzzer function toggles the buzzer on the Arcom board add-on module by calling the
function buzzerToggle, as shown here:
#include "buzzer.h"
/**********************************************************************
 *
 * Function:    commandsBuzzer
 *
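 * Description: Toggle the buzzer command function.
 *
 * Notes:
 *
 * Returns:     None.
 *
 **********************************************************************/
void commandsBuzzer(void)
{
    /* A minimal body consistent with the description above. */
    buzzerToggle( );
}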
The help command function, commandsHelp, loops through the gCommandTable and sends the command
name out the serial port. This gives the user a listing of all commands supported by the command-line
interface.
#include "cli.h"
#include "serial.h"
/**********************************************************************
 *
 * Function:    commandsHelp
 *
 * Description: Help command function.
 *
 * Notes:
 *
 * Returns:     None.
 *
 **********************************************************************/
void commandsHelp(void)
{
int idx;
/* Loop through each command in the table and send out the command
* name to the serial port. */
for (idx = 0; gCommandTable[idx].name != NULL; idx++)
{
serialPutStr(gCommandTable[idx].name);
serialPutStr("\\r\\n");
}
}
The Monitor and Control program gives you a baseline of functionality for the development of a useful
command-line interface. The functionality of the CLI can be extended by adding new commands or
enabling users to input parameters for commands. To accommodate input parameters, the command_t
structure can be expanded to contain the maximum and minimum values. When parsing the command,
you will need to parse the input parameters and validate the parameter ranges with those contained in the
command table.
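One way to sketch such an extension follows; the field names are illustrative:

typedef struct
{
    char const * name;              /* Command name the user types.            */
    void (*function)(int param);    /* Handler, now taking one parameter.      */
    int minValue;                   /* Smallest legal value for the parameter. */
    int maxValue;                   /* Largest legal value for the parameter.  */
} command_t;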
Tasks provide a key software abstraction that makes the design and implementation of embedded
software easier, and the resulting source code simpler to understand and maintain. By breaking the
larger program into smaller pieces, the programmer can more easily concentrate her energy and talents
on the unique features of the system under development.
Strictly speaking, an operating system is not a required component of any computer system, embedded
or otherwise. It is always possible to perform the same functions from within the application program
itself. Indeed, all of the examples so far in this book have done just that. There is one path of
First-in-first-out (FIFO)
This scheduling (also called cooperative multitasking) allows each task to run until it is finished,
and only after that is the next task started.
Priority
This algorithm is typically used in real-time operating systems. Each task is assigned a priority
that is used to determine when the task executes once it is ready. Priority scheduling can be
either preemptive or nonpreemptive. Preemptive means that any running task can be interrupted
by the operating system if a task of higher priority becomes ready.
Real-time executive
Each task is assigned a unique timeslot in a periodically repeating pattern. A real-time executive
is more static than an operating system and assumes that the programmer knows how long each
task takes. If the executive knows the task can complete its work in the allotted time, its
deadlines can all be met. This won't work if you need to create and exit tasks on the fly or run
tasks at irregular intervals.
Resource reservation
When a task is created, it states its requirements in terms of deadlines, processor usage, and other
relevant resources. The operating system should admit the new task only if the system can
guarantee those resources without making any other (already admitted) tasks fail.
The scheduling algorithm you choose depends on your goals. Different algorithms yield different
results. Let's suppose you're given 10 jobs, and each will take a day to finish. In 10 days, you will have
all of them done. But what if one or more has a deadline? If the ninth task given to you has a deadline in
three days, doing the tasks in the order you receive them will cause you to miss that deadline.
The purpose of a real-time scheduling algorithm is to ensure that critical timing constraints, such as
deadlines and response time, are met. When necessary, decisions are made that favor the most critical
timing constraints, even at the cost of violating others.
To demonstrate the importance of a scheduling algorithm, consider a system with only two tasks, which
we'll call Task 1 and Task 2. Assume these are both periodic tasks with periods T1 and T2, and for each
task, the deadline is the beginning of its next cycle. Task 1 has T1 = 50 ms and a worst-case execution
time of C1 = 25 ms. Task 2 has T2 = 100 ms and C2 = 40 ms.
Does the processor have enough time to handle both tasks? We can start answering this question by
looking at how much of the processor time each task needs in its worst caseits utilization. The
utilization of task i is:
Ui = Ci/Ti
Thus, if U1 = 50 percent and U2 = 40 percent, the total requested utilization is:
U = U1 + U2 = 90 percent
It seems logical that if utilization is less than 100 percent, there should be enough available CPU time to
execute both tasks.
Let's consider a static priority scheduling algorithm, where task priorities are unique and cannot change
at runtime. With two tasks, there are only two possibilities:
The two cases are shown in Figure 10-1. In Case 1, both tasks meet their respective deadlines. In Case 2,
however, Task 1 misses a deadline, despite 10 percent idle time, because Task 2's higher priority
required it to be scheduled first. This illustrates the importance of priority assignment.
Real-time systems sometimes require a way to share the processor that allows the most important tasks
to grab control of the processor as soon as they need it. Therefore, most real-time operating systems
utilize a priority-based scheduling algorithm that supports preemption. This is a fancy way of saying that
at any given moment, the task that is currently using the processor is guaranteed to be the highest-priority task that is ready to do so. Lower-priority tasks must wait until higher-priority tasks are finished
using the processor before resuming their work. The scheduler detects such conditions through interrupts
or other events caused by the software; these events are called scheduling points.
Figure 10-2 shows a scenario of two tasks running at different priority levels.
There are two tasks in Figure 10-2: Task A and Task B. In this scenario, Task B has higher priority than
Task A. At the start, Task A is running. When Task B is ready to run, the scheduler preempts Task A
and allows Task B to run. Task B runs until it has completed its work. At this point, control of the
processor returns to Task A, which continues its work from where it left off.
When a priority-based scheduling algorithm is used, it is also necessary to have a backup scheduling
policy for tasks with the same priority. A common scheduling algorithm for tasks with the same priority
is round robin. In Figure 10-3, we show a scenario in which three tasks are running in order to
demonstrate round robin scheduling with a time slice. In this example, Tasks B and C are at the same
priority, which is higher than that of Task A.
Task creation
When a task is created, the scheduler is called to select the next task to be run. If the currently
executing task still has the highest priority of all the ready tasks, it will be allowed to continue
using the processor. Otherwise, the highest-priority ready task will be executed next.
Task deletion
As in task creation, the scheduler is called during task deletion. However, in the case of task
deletion, if the currently running task is being deleted, a new task is always selected by virtue of
the fact that the old one no longer exists.
Clock tick
The clock tick is the heartbeat of the system. It is a periodic event that is triggered by a timer
interrupt (similar to the one we have already discussed). The clock tick provides an opportunity
to awaken tasks that are waiting for a certain length of time to pass before taking their next
action.
Task block
When the running task makes a system call that blocks, that task is no longer able to use the
processor. Thus, the scheduler is invoked to select a new task to run.
Task unblock
A task might be waiting for some event to take place, such as for data to be received. Once the
event occurs, the blocked task becomes ready to execute and thus is immediately eligible to be
considered for execution.
Every lock of the scheduler must have an unlock counterpartotherwise, the system will stop running.
The following is an example of locking and unlocking the scheduler:
os_scheduler_lock( );

/* Perform task operations. */

os_scheduler_unlock( );
The operating system keeps track of the scheduler lock state with a variable. This variable is
incremented when calls are made to lock the scheduler and decremented when unlock calls are made.
The scheduler knows it can run when the lock state variable is set to 0.
10.3. Tasks
Different types of tasks can run in an operating system. For example, a task can be periodic, where it
exits after its work is complete. The task can then be restarted when there is more work to be done.
Typically, however, a task runs forever, similar to the infinite loop discussed in Chapter 3. Each task
also has its own stack that is typically allocated statically.
A transition between the ready and running states occurs whenever the operating system selects a new
task to run during a scheduling point. The task that was previously running leaves the running state, and
the new task (selected from the queue of tasks in the ready state) is promoted to running. Once it is
running, a task will leave that state only if it terminates, if a higher-priority task becomes ready, or if it
needs to wait for some event external to itself to occur before continuing. In the latter case, the task is
said to block, or wait, until that event occurs. A task can block by waiting for another task or for an I/O
device, or it can block by sleeping (waiting for a specific time period to elapse).
When the task blocks, it enters the waiting state, and the operating system selects one of the ready tasks
to be run. So, although there may be any number of tasks in each of the ready and waiting states, there
will always be exactly one task in the running state at any time.
It is important to note that only the scheduler can promote a task to the running state. Newly created
tasks and tasks that are finished waiting for their external event are placed into the ready state first. The
scheduler will then include these new ready tasks in its future decision-making.
In order to keep track of the tasks, the operating system typically implements queues for each of the
waiting and ready states. The ready queue is often sorted by priority so that the highest-priority task is at
the head of the queue. The scheduler can then easily pick the highest-priority task to run next.
Processor overload occurs when high-priority tasks monopolize the processor and are always
running or ready to run.
Low-priority tasks are always at the end of a priority-based event queue and, therefore, may be
permanently blocked from executing.
A task may be prevented from running by a bug in another task; for example, one task fails to
signal when it is supposed to.
As shown in Figure 10-7, the first part initializes any task-specific variables or other resources used by
the task. Then an infinite loop is used to perform the task work. First, the task waits for some type of
event to occur. At this point, the task is blocked. It is in the waiting state and is put on the waiting queue.
There are various types of events: a signal from another task, a signal from an ISR, and the expiration of
a timer set by the application. Once one of these events occurs, the task is in the ready state, and the
scheduler can run the task when processor time becomes available.
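The classic pattern looks something like the sketch below, in which each task takes the two mutexes in the opposite order; the os_mutex_* calls are illustrative rather than a particular RTOS's API:

void task1(void)
{
    while (1)
    {
        os_mutex_wait(&mutexA);     /* Task 1 takes A first...               */
        os_mutex_wait(&mutexB);     /* ...and blocks here if Task 2 holds B. */

        /* Use the shared resources. */

        os_mutex_release(&mutexB);
        os_mutex_release(&mutexA);
    }
}

void task2(void)
{
    while (1)
    {
        os_mutex_wait(&mutexB);     /* Task 2 takes B first...               */
        os_mutex_wait(&mutexA);     /* ...and blocks here if Task 1 holds A. */

        /* Use the shared resources. */

        os_mutex_release(&mutexA);
        os_mutex_release(&mutexB);
    }
}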
These tasks may run without problems for a long time, but eventually one task may be preempted in
between the wait calls, and the other task will run. In this case, Task 1 needs mutex B to be released by
Task 2, while Task 2 needs mutex A to be released by Task 1. Neither of these events will ever happen.
When a deadlock occurs, it essentially brings both tasks to a halt, though other tasks might continue to
run. The only way to end the deadlock is to reboot the entire system, and even that won't prevent it from
happening again.
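In outline, the failing pattern looks like this; mutexTake and mutexRelease stand in for whatever mutex calls your operating system provides:

/* Task 1 acquires mutex A, then mutex B. */
void task1(void *param)
{
    for (;;)
    {
        mutexTake(&mutexA);
        mutexTake(&mutexB);        /* Deadlocks here if Task 2 already holds B. */

        useSharedResources();

        mutexRelease(&mutexB);
        mutexRelease(&mutexA);
    }
}

/* Task 2 acquires the same mutexes in the opposite order. */
void task2(void *param)
{
    for (;;)
    {
        mutexTake(&mutexB);
        mutexTake(&mutexA);        /* Deadlocks here if Task 1 already holds A. */

        useSharedResources();

        mutexRelease(&mutexA);
        mutexRelease(&mutexB);
    }
}

Having every task acquire shared mutexes in one agreed-upon order removes this particular failure mode.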
Priority inversion occurs whenever a higher-priority task is blocked, waiting for access to a shared
resource that is currently in use by a lower-priority task. This might not sound like too big of a deal (after all, the
mutex is just doing its job of arbitrating access to the shared resource) because the higher-priority task
is written with the knowledge that at times the lower-priority task will be using the resource they share.
However, consider what happens when there is a third task with a priority level somewhere between
those two. This situation is illustrated in Figure 10-8.
In Figure 10-8, there are three tasks: Task H (high priority), Task M (medium priority), and Task L (low
priority). Task L becomes ready first and, shortly thereafter, takes the mutex. Now, when Task H
becomes ready, it must block until Task L is done with their shared resource. The problem is that Task
M, which does not even require access to that resource, gets to preempt Task L and run, even though doing so
delays Task H from using the processor. Once Task M completes, Task L runs until it releases the
mutex. Finally, at this point Task H gets its chance to run. This example shows how the task
priorities can be violated because of the mutex sharing between Task H and Task L.
Several solutions to this problem have been developed. One of the most widely used solutions is called
priority inheritance. This technique mandates that a lower-priority task inherit the priority of any higher-priority task that is waiting on a resource they share. This priority change should take place as soon as
the higher-priority task begins to wait; it should end when the resource is released. This requires help
from the operating system. If we apply this to the preceding example, the priority of Task L is increased
to that of Task H as soon as Task H begins waiting for the mutex. Once Task L releases the mutex, its
priority is set to what it was before. Task L cannot be preempted by Task M until it releases the mutex,
and Task H cannot be delayed unnecessarily.
Another solution is called priority ceilings. In this case, a priority value is associated with each resource;
the scheduler then transfers that priority to any task that accesses the resource. The priority assigned to
the resource is the priority of its highest-priority user, plus one. Once a task finishes with the resource,
its priority returns to normal. One disadvantage of using priority ceilings is that the priority level for
tasks using the mutex must be known ahead of time so the proper ceiling value can be set. This means
the operating system cannot do the job automatically for you. Another disadvantage is that if the ceiling
value is set too high, other unrelated tasks with priority levels below the ceiling can be locked out from
running while the resource is held.
Event flags
These allow a task to wait for multiple events to occur before unblocking. Either all of the events
must occur (the events are ANDed together) or any of the events may occur (the events are ORed
together).
Spinlocks
These are similar to a mutex and are typically used in symmetric multiprocessing (SMP)
systems. Like a mutex, a spinlock is a binary flag that a task attempts to claim. If the flag is not
set, the task is able to obtain the spinlock. If the flag is set, the task will spin in a loop, constantly
checking to see when the flag is not set. This might seem wasteful (and it can be); however, it is
assumed that the spinlock is only held for a very short period of time, and this CPU must wait for
software running on the other CPU to progress first anyway.
Interrupt priority
Interrupts have the highest priority in a system, even higher than that of the highest-priority operating system
task. Interrupts are not scheduled; the ISR executes outside of the operating system's scheduler.
Disabling interrupts
Because the operating system code must guarantee its data structures' integrity when they are
accessed, the operating system disables interrupts during operations that alter internal operating
system data, such as the ready list. This increases the interrupt latency. The responsiveness of the
operating system comes at the price of longer interrupt latency.
Interrupt stack
Some operating systems have a separate stack space for the execution of ISRs. This is important
because, if ISRs run on the same stack as regular tasks, every task's stack must
accommodate the worst-case interrupt nesting scenario. Such oversized stacks increase the RAM
requirements across all tasks.
Signaling tasks
Because ISRs execute outside of the scheduler, they are not allowed to make any operating
system calls that can block. For example, an ISR cannot wait for a semaphore, though it can
signal one.
Some operating systems use a split interrupt handling scheme, where the interrupt processing is divided
into two parts. The first part is an ISR that handles the bare minimum processing of the interrupt. The
idea is to keep the ISR short and quick.
The second part is handled by a DSR. The DSR handles the more extensive processing of the interrupt
event. It runs when task scheduling is allowed; however, the DSR still has a higher priority than any task
in the system. The DSR is able to signal a task to perform work triggered by the interrupt event.
For example, in the print server device, an interrupt might be used to handle incoming data from the
computers on the Ethernet network. The Ethernet controller would interrupt the processor when a packet
is received. Using the split interrupt handling scheme, the ISR would handle the minimal initial work:
determining the interrupt event, masking further Ethernet interrupts, and acknowledging the interrupt.
The ISR would then tell the operating system to run the DSR, which would then handle the low-level
data packet processing before passing the data on to a task for further processing.
Processor support
The processor is typically the first choice in the hardware design on a project. Most RTOSes
support the popular processors (or at least processor families) used in embedded systems. If the
processor used on your project is not supported, you need to determine whether porting the
RTOS to that processor is an option or if it is necessary to choose a different RTOS. Porting an
RTOS is not always trivial.
Real-time characteristics
We have already covered the real-time characteristics of an RTOS, which include interrupt
latency, context switch time, and the execution time of each system call. These are technical
criteria that are inherent to the system and cannot be changed.
Budget constraints
RTOSes span the cost spectrum from open source and royalty free to tens of thousands of dollars
per developer seat plus royalties for each unit shipped. You need to understand what your costs
are in both cases. Open source might mean no upfront costs, but there might be costs associated
with getting support when needed. You also need to understand the licensing details of the RTOS
you choose.
Memory usage
Clearly, in an embedded environment, memory constraints are a frequent concern. A few
RTOSes can be scaled to fit the smallest of embedded systems, for example, by removing
features to create a smaller footprint. Others require a minimum set of resources comparable to a
low-end PC. It is important to keep in mind the potential need to change an RTOS in the future,
when memory is not as plentiful or costs need to be reduced.
Technical support
This may include a number of incidents or a period of phone support. Some RTOSes require you
to pay an annual fee to maintain a service contract. For open source RTOSes, an open forum or
mailing list might be provided. If more specialized support is needed, you'll have to search
around to see what is available. Popular open source RTOSes have companies dedicated to
providing support.
Tool compatibility
Make sure the RTOS works with the assembler, compiler, linker, and debugger you have already
obtained. If the RTOS does not support tools that you or your team are familiar with, the learning
curve will take more time.
No matter which RTOS you choose, our advice is to get the source code if you can. The reason for this
is that if you can't get support when you need it (say, at 1 A.M. for a deadline coming at 8 A.M., or if the
operating system vendor stops supporting the product), you'll be glad to be able to find and fix the
problem yourself. Some proprietary RTOSes provide only object code. Find out what is provided before
you make your final decision.
With such a wide variety of operating systems and features to choose from, it can be difficult to decide
which is the best for your project. Try putting your processor, real-time performance, and budgetary
requirements first. These are criteria that you probably cannot change, so you can use them to narrow
the possible choices to a dozen or fewer products. Then you can focus on the more detailed technical
information.
At this point, many developers make their decision based on compatibility with existing cross-compilers, debuggers, and other development tools. But it's really up to you to decide what additional
features are most important for your project. No matter what you decide, the basic kernel and task
mechanisms will be pretty much the same as those described in this chapter. The differences will most
likely be in the additional features each operating system provides.
11.1. Introduction
We have decided to use two operating systems for the examples of operating system use: eCos and
Linux. Why did we choose these two? Both are open source, royalty-free, feature-rich, and growing in
popularity. Learning how to program using them will probably enhance your ability to be productive
with embedded systems. In addition, both operating systems are compatible with the free GNU software
development tools. And both are up and running on the Arcom board.
eCos was developed specifically for use in real-time embedded systems, whereas Linux was developed
for use on PCs and then subsequently ported to various processors used in embedded systems. Some
embedded Linux distributions can require a significant amount of resources (mainly memory and
processing power).
The eCos POSIX API is currently compatible with the 1003.1-1996 version of the standard and
includes elements from the 1003.1-2001 version.
The instructions for setting up the eCos build environment and building the example eCos applications
are covered in Appendix D. Additional information about eCos can be found online at
http://ecos.sourceware.org as well as in the book Embedded Software Development with eCos, by
Anthony Massa (Prentice Hall PTR).
In order to keep the examples in this chapter shorter and easier to read, we don't
bother to check the return values from operating system function calls (although
many eCos system calls do not return a value). In general, it is a good idea to
validate all return codes. This provides feedback about potential problems and
allows you, as the developer, to make decisions in the software based on failed
calls to the operating system. Basically, it makes your code more robust and,
hopefully, less buggy.
The example program provides two variables to eCos to allow it to track the task. The variable
ledTaskObj stores information about the task, such as its current state; ledTaskHdl is a unique value
assigned to the task.
#define TICKS_PER_SECOND       (100)

#define LED_TASK_STACK_SIZE    (4096)
#define LED_TASK_PRIORITY      (12)
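The task object, handle, and stack referred to above are declared elsewhere in the program; a minimal sketch of those declarations, using the real eCos types and the constants just shown, is:

cyg_thread    ledTaskObj;                          /* Thread object: holds the task's state. */
cyg_handle_t  ledTaskHdl;                          /* Handle used to refer to the task. */
unsigned char ledTaskStack[LED_TASK_STACK_SIZE];   /* Statically allocated task stack. */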
Next we show the code for performing the toggle. We have attempted to reuse code from the original
Blinking LED example. The ledInit and ledToggle LED driver functions remain unchanged from the
code described in Chapter 3.
The task blinkLedTask immediately enters an infinite loop. The infinite loop is used to keep the task
continually running and blinking the LED. The first routine called in the infinite loop is
cyg_thread_delay. This is an eCos function that suspends a task until a specified number of clock ticks
have elapsed. The parameter passed into the delay routine determines how long to suspend the task and
is based on the system clock used in eCos. At this point, the blinkLedTask is blocked and put in the
waiting state by the eCos scheduler.
Once the timer expires, the eCos scheduler puts the blinkLedTask into the ready queue. If no other
higher-priority tasks are ready to execute (which is the situation in this case), the scheduler runs the
blinkLedTask; the task continues executing from the point at which it was blocked.
Next, ledToggle is called in order to change the state of the LED. When ledToggle completes and
returns, cyg_thread_delay is called to delay for another 500 ms. The blinkLedTask is placed back in
the waiting state until the time elapses again.
#include <cyg/kernel/kapi.h>

#include "led.h"

/**********************************************************************
 *
 * Function:    blinkLedTask
 *
 * Description: This task handles toggling the green LED at a
 *              constant interval.
 *
 * Notes:
 *
 * Returns:     None.
 *
 **********************************************************************/
void blinkLedTask(cyg_addrword_t data)
{
    while (1)
    {
        /* Delay for 500 milliseconds. */
        cyg_thread_delay(TICKS_PER_SECOND / 2);

        /* Change the state of the green LED. */
        ledToggle();
    }
}
Following is the code to create and start the LED task. The first thing to notice is that instead of the
function main, eCos programs use a function called cyg_user_start.
The first job of cyg_user_start is to initialize the LED by calling the function ledInit. Next, the
blinkLedTask task is created. In eCos, tasks created during initialization (when the scheduler is not
running) are initially suspended. To allow the scheduler to run the task, cyg_thread_resume is called.
Additionally, the scheduler does not run until cyg_user_start exits; then the eCos scheduler takes
over.
/**********************************************************************
 *
 * Function:    cyg_user_start
 *
 * Description: Main routine for the eCos Blinking LED program. This
 *              function creates the LED task.
 *
 * Notes:       This routine invokes the scheduler upon exit.
 *
 * Returns:     None.
 *
 **********************************************************************/
void cyg_user_start(void)
{
/* Configure the green LED control pin. */
ledInit( );
/* Create the LED task. */
cyg_thread_create(LED_TASK_PRIORITY,
blinkLedTask,
(cyg_addrword_t)0,
"LED Task",
(void *)ledTaskStack,
LED_TASK_STACK_SIZE,
&ledTaskHdl,
&ledTaskObj);
/* Notify the scheduler to start running the task. */
cyg_thread_resume(ledTaskHdl);
}
The previous example demonstrates how to create, resume, and delay a task in eCos. Other task
operations include deleting tasks (which can sometimes occur by returning from the task function),
yielding to other tasks in the system, and other mechanisms for suspending/resuming tasks.
You may notice that the function diag_printf is called at the end of cyg_user_start. This is eCos's
lightweight version of printf.
#include <cyg/kernel/kapi.h>
#include <cyg/infra/diag.h>
cyg_mutex_t sharedVariableMutex;
int32_t gSharedVariable = 0;
The incrementTask function first delays for three seconds. After the delay, the function tries to take the
mutex by calling cyg_mutex_lock, passing in the mutex it wishes to acquire. If the mutex is available,
cyg_mutex_lock returns and the task can proceed. If the mutex is not available, the task blocks at this
point (and is placed in the waiting state by the scheduler) and waits for the mutex to be released.
Once the incrementTask task obtains the mutex, the shared variable gSharedVariable is incremented
and its value is output. The mutex is then released by calling cyg_mutex_unlock, again passing in the
mutex to release as a parameter. Unlike the cyg_mutex_lock function, the unlock function never
blocks, although it may cause a reschedule.
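Based on that description, a minimal sketch of the increment task might look like the following; the diag_printf message text is our own, and the shared declarations come from the listing above:

#include <cyg/kernel/kapi.h>
#include <cyg/infra/diag.h>

extern cyg_mutex_t sharedVariableMutex;
extern int32_t gSharedVariable;

void incrementTask(cyg_addrword_t data)
{
    while (1)
    {
        /* Delay for 3 seconds. */
        cyg_thread_delay(TICKS_PER_SECOND * 3);

        /* Wait for the mutex before touching the shared variable. */
        cyg_mutex_lock(&sharedVariableMutex);

        gSharedVariable++;

        diag_printf("Increment Task: shared variable value is %d\n", gSharedVariable);

        /* Release the mutex so the decrement task can use it. */
        cyg_mutex_unlock(&sharedVariableMutex);
    }
}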
The decrementTask function is similar to the previous increment task. First, the task delays for seven
seconds. Then the task waits to acquire the sharedVariableMutex. Once the task gets the mutex, it
decrements the gSharedVariable value and outputs its value. Finally, the task releases the mutex.
/**********************************************************************
 *
 * Function:    decrementTask
 *
 * Description: This task decrements a shared variable.
 *
 * Notes:
 *
 * Returns:     None.
 *
 **********************************************************************/
void decrementTask(cyg_addrword_t data)
{
while (1)
{
/* Delay for 7 seconds. */
cyg_thread_delay(TICKS_PER_SECOND * 7);
When the SW0 button is pressed, the producer task signals the consumer task using a semaphore. The
consumer, consumerTask, waits for the semaphore signal from the producer task. Upon receiving the
signal, the consumer task outputs a message and toggles the green LED.
The main function, cyg_user_start, starts by initializing the LED by calling ledInit. Next, the
semaphore is initialized with a call to cyg_semaphore_init. The initial value of the semaphore,
semButton, is set to zero so that the consumer task that is waiting does not execute until the semaphore
is signaled by the producer task. Lastly, the two tasks are created and resumed, as in the prior example,
and then a message is output signifying the start of the program.
#include <cyg/kernel/kapi.h>
#include <cyg/infra/diag.h>
#include "led.h"
cyg_sem_t semButton;
The producerTask contains an infinite loop that first delays for 10 ms and then checks to see whether
the SW0 button has been pressed, by calling the function buttonDebounce (we will take a closer look at
this function in a moment). The delay interval was selected in order to ensure the task is responsive
when the button is pressed. For additional information about selecting sampling intervals, read the
sidebar later in this chapter titled "Switch Debouncing."
Now let's take a look at the function buttonDebounce. The debounce code is from the June 2004
Embedded Systems Programming article "My Favorite Software Debouncers," which can be found
online at http://www.embedded.com.
The debounce function is called in the producer task every 10 ms to determine whether the SW0 button
has been pressed. As shown in Figure 11-1, button SW0 is located on the add-on module. The Arcom
board's VIPER-Lite Technical Manual and the VIPER-I/O Technical Manual describe how the add-on
module's buttons are connected to the processor. The add-on module schematics, which are found in the
VIPER-I/O Technical Manual, can be used to trace the connection from the button back to the processor.
The button SW0 is read from the signal IN0, as shown in the switches section in the VIPER-I/O
Technical Manual. According to the VIPER-Lite Technical Manual, the IN0 signal value is retrieved by
The consumerTask contains a simple infinite loop: wait for the semaphore to be signaled, then print a
message once the signal is received. The consumer task waits for the semaphore signal by calling
cyg_semaphore_wait, which blocks the task if the value of the semaphore is equal to 0.
Once the semaphore signal is received, the consumer task outputs a message that the button was pressed
and toggles the green LED by calling ledToggle. After the LED is toggled, the consumer task reverts to
waiting for another semaphore signal.
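A minimal sketch of that consumer task, with an assumed message string, might look like this:

#include <cyg/kernel/kapi.h>
#include <cyg/infra/diag.h>

#include "led.h"

extern cyg_sem_t semButton;

void consumerTask(cyg_addrword_t data)
{
    while (1)
    {
        /* Block until the producer task signals the semaphore. */
        cyg_semaphore_wait(&semButton);

        diag_printf("Button SW0 was pressed.\n");

        /* Change the state of the green LED. */
        ledToggle();
    }
}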
We cover another example using semaphores in the section "eCos Interrupt Handling" later in this
chapter.
The consumerTask, shown in the code that follows, contains an infinite loop that waits for an incoming
message and then prints the message once it's received. The consumer task waits for an incoming
message from the mailbox, blocking until a message is available.
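A consumer built on the eCos mailbox API can be sketched as follows; the mailbox handle and the assumption that the messages are strings are ours:

#include <cyg/kernel/kapi.h>
#include <cyg/infra/diag.h>

extern cyg_handle_t mailboxHdl;   /* Created elsewhere with cyg_mbox_create. */

void mailboxConsumerTask(cyg_addrword_t data)
{
    char *message;

    while (1)
    {
        /* Block until a message is placed in the mailbox. */
        message = (char *)cyg_mbox_get(mailboxHdl);

        diag_printf("Received message: %s\n", message);
    }
}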
Most operating systems, including eCos, have additional API functions for various synchronization and
message passing operations. For example, eCos includes the functions cyg_mbox_tryput and
cyg_mbox_tryget that return false if they are unsuccessful at putting or getting a message in the
mailbox, instead of blocking the task. These functions can be used to do polling (that is, to check
whether a mailbox is available, go off and do other tasks if it is not, and then return and try again).
There are also functions for which you pass in a timeout value that blocks for a set amount of time while
the function attempts to perform the specified operation. Additional information about the eCos RTOS
APIs can be found in the eCos Reference online at http://ecos.sourceware.org/docs-latest.
Switch Debouncing
When pressed or released, any mechanical input, such as a switch or button, will bounce open
and closed briefly before settling. Processors are so fast that they can detect this rapid
succession of opens and closes; thus, it's hard to know whether any individual read of a
switch's position is accurate. This is a case where the processor sees the trees but not the
forest. Debouncing is a technique used to smooth out the samples and make sure the
processor doesn't get confused by the bouncing.
You can debounce inputs by inserting extra hardware or software to filter out this noise.
Software debounce routines function more or less by testing the input regularly at a
predetermined interval and making decisions only after a succession of reads that are the
same. Writing debounce code is straightforward; selecting the sampling interval and deciding
when the input has settled is more difficult, and specific to each input device and application.
For additional information on sampling intervals, refer to the July 2002 Embedded Systems
Programming article "How to Choose A Sensible Sampling Rate," which can be found online
at http://www.embedded.com.
eCos provides the functionality of saving and restoring the processor's context when an interrupt occurs
so that the ISR does not need to perform these operations. Thus we reduce the workload of the timer
interrupt handler, timerIsr, to three critical operations that must be carried out before interrupts are
enabled again.
The first operation masks the timer interrupt, with a call to cyg_interrupt_mask passing in the
interrupt vector timerInterruptVector, until the DSR is run. This blocks the ISR from being called
again until the current interrupt has been processed.
The second operation performed in the ISR is to acknowledge the interrupt. The interrupt must be
acknowledged in the processor's interrupt controller and timer peripheral. The interrupt is acknowledged
in the interrupt controller using the eCos function cyg_interrupt_acknowledge and passing in the
interrupt vector timerInterruptVector. The interrupt is acknowledged in the timer peripheral by
writing the TIMER_1_MATCH (0x02) bit to the timer status register.
The third operation, performed when returning, is to inform the operating system that the timer interrupt
has been handled (with the macro CYG_ISR_HANDLED) and that the DSR needs to be run (with the macro
CYG_ISR_CALL_DSR). Notifying eCos that the interrupt has been handled prevents it from calling any
other ISRs to handle the same interrupt.
The following timerIsr function shows these operations:
#include <cyg/hal/hal_intr.h>
/**********************************************************************
 *
 * Function:    timerIsr
 *
 * Description: Interrupt service routine for the timer interrupt.
 *
 * Notes:
 *
 * Returns:     Bitmask to inform the operating system that the
 *              interrupt has been handled and to schedule the
 *              deferred service routine.
 *
 **********************************************************************/
uint32_t timerIsr(cyg_vector_t vector, cyg_addrword_t data)
{
/* Block the timer interrupt from occurring until the DSR runs. */
cyg_interrupt_mask(timerInterruptVector);
/* Acknowledge the interrupt in the interrupt controller and the
* timer peripheral. */
cyg_interrupt_acknowledge(timerInterruptVector);
TIMER_STATUS_REG = TIMER_1_MATCH;
/* Inform the operating system that the interrupt is handled by this
* ISR and that the DSR needs to run. */
return (CYG_ISR_HANDLED | CYG_ISR_CALL_DSR);
}
The DSR function, timerDsr, which is shown next, is scheduled to be run by eCos once the ISR
completes. The DSR signals the LED task using the semaphore ledToggleSemaphore with the function
call cyg_semaphore_post.
Next, the new timer interval is programmed into the timer match register, which is set to expire 500 ms
from the current timer count. Finally, before exiting, the DSR unmasks the timer interrupt in the
operating system, by calling cyg_interrupt_unmask and passing in the interrupt vector, which
reenables the handling of incoming timer interrupts.
/**********************************************************************
 *
 * Function:    timerDsr
 *
 * Description: Deferred service routine for the timer interrupt.
 *
 * Notes:
 *
 * Returns:     None.
 *
 **********************************************************************/
void timerDsr(cyg_vector_t vector, cyg_ucount32 count, cyg_addrword_t data)
{
The blinkLedTask contains an infinite loop that waits for the semaphore ledToggleSemaphore to be
signaled by calling cyg_semaphore_wait. When the semaphore is signaled by the timer DSR, the task
calls ledToggle to change the state of the LED.
/**********************************************************************
 *
 * Function:    blinkLedTask
 *
 * Description: This task handles toggling the LED when it is
 *              signaled from the timer interrupt handler.
 *
 * Notes:
 *
 * Returns:     None.
 *
 **********************************************************************/
void blinkLedTask(cyg_addrword_t data)
{
while (1)
{
/* Wait for the signal that it is time to toggle the LED. */
cyg_semaphore_wait(&ledToggleSemaphore);
/* Change the state of the green LED. */
ledToggle( );
}
}
This concludes our brief introduction to the eCos operating system and its API. Hopefully, these few
examples have clarified some of the points made elsewhere in the book. These are valuable
programming techniques used frequently in embedded systems.
12.1. Introduction
The embedded Linux examples demonstrate certain basic APIs for various operations. Additional APIs
exist that offer other functionality. You should research the additional APIs on your own to determine
whether there are other, better ways to perform the operations necessary for your particular embedded
system.
One aspect of Linux you need to be familiar with is its thread model. The Linux API conforms to the
key POSIX standard in the space, POSIX 1003.1c, commonly called the pthreads standard. POSIX
leaves many of the implementation details up to the operating system implementer. A good source of
information on pthreads is the book Pthreads Programming, by Bradford Nichols, Dick Buttlar, and
Jacqueline Farrell (O'Reilly).
The version of embedded Linux used on the Arcom board is a standard kernel tree (version 2.6) with
additional ARM and XScale support from the ARM Linux Project at http://www.arm.linux.org.uk.
A plethora of books about Linux and embedded Linux are available. Some good resources include
Understanding the Linux Kernel, by Daniel P. Bovet and Marco Cesati (O'Reilly), Linux Device
Drivers, by Alessandro Rubini and Jonathan Corbet (O'Reilly), and Building Embedded Linux Systems,
by Karim Yaghmour (O'Reilly).
The instructions for configuring the embedded Linux build environment and building the example Linux
applications are detailed in Appendix E. Additional information about using embedded Linux on the
Arcom board can be found in the Arcom Embedded Linux Technical Manual and the VIPER-Lite
Technical Manual.
In order to keep the examples in this chapter shorter and easier to read, we don't
check the return values from function calls. In general, it is a good idea to validate
all return codes. This provides feedback about potential problems and allows you,
as the developer, to make decisions in the software based on failed calls.
Basically, it makes your code more robust and, hopefully, less buggy.
Additional task operations are supported in Linux. These operations include terminating tasks,
modifying task attributes, yielding to other tasks in the system, and suspending/resuming tasks.
The task incrementTask, shown following, includes an infinite loop that starts by delaying for three
seconds by calling sleep, which suspends the task for a specified number of seconds.
Once the delay time elapses, the increment task resumes from where it left off. The task then calls
pthread_mutex_lock and passes in the sharedVariableMutex in order to take the mutex and access
the shared variable. If the mutex is available, it is locked, and the increment task proceeds to increment
the shared variable. If the mutex is not available, the increment task blocks until it can acquire the
mutex.
After incrementing the shared variable and outputting a message, the mutex is released with a call to
pthread_mutex_unlock. The mutex unlock function never blocks.
#include <stdio.h>
#include <unistd.h>
#include <pthread.h>

/**********************************************************************
 *
 * Function:    incrementTask
 *
 * Description: This task increments a shared variable.
 *
 * Notes:
 *
 * Returns:     None.
 *
 **********************************************************************/
void incrementTask(void *param)
{
while (1)
{
/* Delay for 3 seconds. */
sleep(3);
/* Wait for the mutex before accessing the shared variable. */
pthread_mutex_lock(&sharedVariableMutex);
gSharedVariable++;
printf("Increment Task: shared variable value is %d\n",
gSharedVariable);
/* Release the mutex for the other task to use. */
pthread_mutex_unlock(&sharedVariableMutex);
}
}
The task decrementTask is similar to the increment task. In its infinite loop, the task first suspends for
seven seconds, then waits to acquire the sharedVariableMutex. After taking the mutex, the task
decrements the value of gSharedVariable, outputs a message, and then releases the mutex, as shown
here:
/**********************************************************************
 *
 * Function:    decrementTask
 *
 * Description: This task decrements a shared variable.
 *
 **********************************************************************/
The Linux pthread API supports additional mutex functions that provide other functionality. For
example, the function pthread_mutex_trylock can be used to attempt to get a mutex. If the mutex is
available, the task acquires the mutex; if the mutex is not available, the task can proceed with other work
without waiting for the mutex to be freed up.
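For example, a task can poll for the mutex and fall back to other work when it is busy:

#include <pthread.h>

extern pthread_mutex_t sharedVariableMutex;

void pollingWork(void)
{
    if (pthread_mutex_trylock(&sharedVariableMutex) == 0)
    {
        /* Got the mutex: access the shared data, then release the mutex. */
        pthread_mutex_unlock(&sharedVariableMutex);
    }
    else
    {
        /* Mutex busy: do other work and try again later. */
    }
}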
#include <stdio.h>
#include <pthread.h>
#include <semaphore.h>

#include "led.h"
sem_t semButton;
/**********************************************************************
 *
 * Function:    main
 *
 * Description: Main routine for the Linux semaphore example. This
 *              function creates the semaphore and then the producer
 *              and consumer tasks.
 *
 * Notes:
 *
 * Returns:     0.
 *
 **********************************************************************/
int main(void)
{
/* Configure the green LED control pin. */
ledInit( );
/* Create the semaphore for this process only and with an initial
* value of zero. */
sem_init(&semButton, 0, 0);
/* Create the producer and consumer tasks using the default task
* attributes. Do not pass in any parameters to the tasks. */
pthread_create(&producerTaskObj, NULL, (void *)producerTask, NULL);
pthread_create(&consumerTaskObj, NULL, (void *)consumerTask, NULL);
printf("Linux semaphore example - press button SW0.\\n");
/* Allow the tasks to run. */
pthread_join(producerTaskObj, NULL);
pthread_join(consumerTaskObj, NULL);
return 0;
}
The producerTask function shown next contains an infinite loop that first delays for 10 ms and then
checks to see whether the SW0 button has been pressed, by calling the function buttonDebounce. For
additional information about selecting sampling intervals and button debouncing, read the sidebar
"Switch Debouncing" in Chapter 11.
When the SW0 button is pressed, the producerTask function signals the semButton semaphore by
calling sem_post. This increments the semaphore value and wakes the consumer task. The producer task
then returns to monitoring the SW0 button.
The following consumerTask function contains an infinite loop that waits for the semaphore to be
signaled by calling sem_wait. The wait function blocks the task if the value of the semaphore is 0.
Once the semaphore signal is received, the consumer task outputs a message and toggles the green LED
by calling ledToggle. After toggling the LED, the consumer task returns to waiting for another
semaphore signal.
/**********************************************************************
 *
 * Function:    consumerTask
 *
 * Description: This task waits for the semaphore signal from the
 *              producer task. Once the signal is received, the task
 *              outputs a message and toggles the green LED.
 *
 * Notes:
 *
 * Returns:     None.
 *
 **********************************************************************/
The main routine demonstrates how to create the message queue. First, the LED is initialized by calling
ledInit. Then the message queue is created by calling mq_open. The first parameter specifies the name
of the queue as message queue. The second parameter, which is the OR of the file status flags and
access modes, specifies the following:
O_CREAT
Create the message queue if it does not exist.
O_EXCL
Used with O_CREAT to create and open a message queue if a queue of the same name does not
already exist. If a queue does exist with the same name, the message queue is not opened.
O_RDWR
Open for read and write access.
After the message queue is created successfully, the tasks are created as shown previously. The
techniques we've just discussed are shown in the following main function:
#include <pthread.h>
#include <mqueue.h>
#include "led.h"
int8_t messageQueuePath[] = "message queue";
/**********************************************************************
 *
 * Function:    main
 *
 * Description: Main routine for the Linux message queue program. This
 *              function creates the message queue and the producer
 *              and consumer tasks.
 *
 * Notes:
 *
 * Returns:     0.
 *
 **********************************************************************/
int main(void)
{
mqd_t messageQueueDescr;
/* Configure the green LED control pin. */
ledInit( );
/* Create the message queue for sending information between tasks. */
messageQueueDescr = mq_open(messageQueuePath, (O_CREAT | O_EXCL | O_RDWR));
/* Create the producer task using the default task attributes. Do not
* pass in any parameters to the task. */
pthread_create(&producerTaskObj, NULL, (void *)producerTask, NULL);
/* Create the consumer task using the default task attributes. Do not
* pass in any parameters to the task. */
pthread_create(&consumerTaskObj, NULL, (void *)consumerTask, NULL);
/* Allow the tasks to run. */
pthread_join(producerTaskObj, NULL);
pthread_join(consumerTaskObj, NULL);
return 0;
}
Prior to entering its infinite loop, the producerTask starts by initializing the variable that keeps track of
the number of button presses, buttonPressCount, to zero. Then the producer task opens the message
queue, whose name is specified by messageQueuePath, which was created in the main function. The
queue is opened with write-only permission, specified by the second parameter flag O_WRONLY, because
the producer sends only messages. Because the flag O_NONBLOCK is not specified in the second parameter
to mq_open, if a message cannot be inserted into the queue, the producer task blocks. The function
mq_open returns a message queue descriptor that is used in subsequent accesses to the message queue.
In the infinite loop, the producer task first delays for 10 ms by calling usleep. The delay interval
selected ensures the task is responsive to button presses. Next, the function buttonDebounce is called to
determine if the SW0 button has been pressed.
Each time the SW0 button is pressed, buttonPressCount is incremented and the value is sent to the
waiting consumer task, using the message queue. To accommodate the message queue send function, the
union msgbuf_t is used to contain the message. This union consists of a 32-bit count and a 4-byte buffer
array. The message is sent using the function mq_send, with the message queue descriptor,
messageQueueDescr, in the first parameter, the button press count in the second parameter, the size of
the message in the third parameter, and the priority of the message in the last parameter.
Unlike the eCos message queue implementation, messages under Linux have a priority (which is
specified in the last parameter passed to mq_send). You can use this parameter to insert higher-priority
messages at the front of the message queue, to be read by the receiving task.
After the message is successfully sent, the loop returns to calling the delay function. Here is the
producerTask function:
#include <unistd.h>
#include "button.h"
typedef union
{
uint32_t count;
uint8_t buf[4];
} msgbuf_t;
/**********************************************************************
 *
 * Function:    producerTask
 *
 * Description: This task monitors button SW0. Once pressed, the button
 *              is debounced and a message is sent to the waiting
 *              consumer task.
 *
 * Notes:
 *
 * Returns:     None.
 *
 **********************************************************************/
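Pulling those pieces together, a minimal sketch of the producer loop might look like the following; we assume that buttonDebounce returns a nonzero value once per debounced press and that <mqueue.h> and <fcntl.h> are included along with the declarations shown earlier:

void producerTask(void *param)
{
    uint32_t buttonPressCount = 0;
    mqd_t    messageQueueDescr;
    msgbuf_t msg;

    /* Open the existing queue with write-only permission. */
    messageQueueDescr = mq_open((const char *)messageQueuePath, O_WRONLY);

    while (1)
    {
        /* Delay for 10 ms between button samples. */
        usleep(10000);

        if (buttonDebounce())
        {
            /* Send the updated press count to the waiting consumer task. */
            msg.count = ++buttonPressCount;

            mq_send(messageQueueDescr, (const char *)msg.buf, sizeof(msg.buf), 0);
        }
    }
}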
The task consumerTask in the following function begins like the producer task by opening the message
queue with a call to mq_open. But the queue is opened with read-only permission by the second
parameter flag, O_RDONLY, because the consumer task receives only messages. Since the flag
O_NONBLOCK isn't specified in the second parameter to mq_open, if no message is available in the queue
when the consumer calls the message queue receive function, it blocks until a message is present. The
function mq_open returns a message queue descriptor used in subsequent accesses to the message queue.
The consumer task then enters an infinite loop where it waits for a message by calling mq_receive.
Once a message is available, it is copied [*] (by Linux) into the rcvMsg variable. The consumerTask then
outputs the message and toggles the green LED. The task then returns to waiting for the next message.
[*]
There may be an implementation of the message queue routine that does not use a copy.
#include <stdio.h>
Linux includes numerous other API functions that offer additional functionality for the mechanisms
covered previously, as well as other types of synchronization mechanisms. Other mechanisms include
condition variables and reader-writer locks. Condition variables allow multiple tasks to wait until a
specific event occurs or until the variable reaches a specific value. Reader-writer locks allow multiple
tasks to read data concurrently, whereas any task writing data has exclusive access.
Interrupt handling in Linux is more complex than that found in RTOSes. For this reason, we have
omitted an interrupt example using Linux. For a better understanding of Linux interrupt handling, take a
look at Linux Device Drivers, by Alessandro Rubini and Jonathan Corbet (O'Reilly).
The two I2C bus signals are serial data (SDA) and serial clock (SCL). The master on the bus initiates all
transfers; other devices on the bus are called slaves. In Figure 13-1, the microprocessor is the master and
the other devices are the slaves. Both master and slaves can receive and transmit data on the bus.
The master initiates transactions on the bus and controls the clock signal. Because of this, a slave device
needs a way of holding off the master during a transaction. When a slave holds off the master device to
perform flow control on the incoming data, it is called clock stretching. During this time the slave keeps
the clock line pulled low until it is ready to continue with the transaction. It is important that all master
devices support this feature. Figure 13-2 shows the format of an I2C bus transaction. All data on the bus is
communicated most significant bit (MSB) first.
An I2C bus data transaction begins by the master initiating a start condition. A start condition occurs
when the master causes a high-to-low transition on the data line while the clock line is held high.
Next, the 7-bit unique address of the device is sent out by the master device. Each device on the bus
checks this address with its own to determine whether the master is communicating with it. I2C slave
devices come with a predefined device address. The lower bits of this address are sometimes
configurable in hardware.
Then the master outputs the read or write bit. If the bit is high, the transaction is a read, where data goes
from the slave to the master device. If the bit is low, the transaction is a write from the master to the
slave device.
The SPI bus includes 3 + N signals, where N is the number of slaves on the bus. In Figure 13-3, there
are two slaves, so the SPI bus requires five signals. These signals are serial clock (SCLK), data signal
Master Out Slave In (MOSI), data signal Master In Slave Out (MISO), and Slave Select (SS1 and SS2).
The logic blocks within an FPGA can be as small and simple as the macrocells in a PLD (a so-called
fine-grained architecture) or larger and more complex (a coarse-grained architecture). However, the
logic blocks in an FPGA are never as large as an entire PLD, as are the logic blocks of a CPLD.
Remember that the logic blocks of a CPLD contain multiple macrocells. But the logic blocks in an
FPGA are generally nothing more than a couple of logic gates or a look-up table and a flip-flop.
Because of all the extra flip-flops, the architecture of an FPGA is much more flexible than that of a
CPLD. This makes FPGAs better in register-heavy and pipelined applications. They are also often used
in place of a processor-plus-software solution, particularly where the processing of input data streams
must be performed at a very fast pace. In addition, FPGAs are usually denser (more gates in a given
area) than their CPLD cousins, so they are the de facto choice for larger logic designs.
In Figure 13-7, we show a simple circuit that could be driven using PWM. In this figure, a 9 V battery
powers an incandescent lightbulb. If the switch connecting the battery and lamp is closed for 50 ms, the
bulb receives the full 9 V during that interval. If we then open the switch for the next 50 ms, the bulb
receives 0 V. If we repeat this cycle 10 times a second, the bulb will be lit as though it were connected to
a 4.5 V battery (50 percent of 9 V). We say that the duty cycle is 50 percent and the modulating
frequency is 10 Hz. (Note that we're not advocating you actually power a lightbulb this way; we just
think this is an easy-to-understand example.)
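The underlying arithmetic is just the supply voltage scaled by the duty cycle; a small helper makes the relationship explicit:

/* Average (effective) output voltage of a PWM signal: the supply voltage
 * multiplied by the duty cycle (on-time as a fraction of the period). */
float pwmAverageVoltage(float supplyVolts, float onTimeMs, float periodMs)
{
    return (supplyVolts * (onTimeMs / periodMs));
}

For the example above, pwmAverageVoltage(9.0, 50.0, 100.0) yields 4.5 V.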
UDP is a connectionless, unreliable protocol and has no mechanism for the retransmission of
packets that are lost. There can be disastrous consequences if crucial information about the
health of the system is lost and the sender has no way of knowing it was not received. SNMP
also often requires the use of costly and complex network management software on the client
side.
One of the keys in deciding which stack will best fit your device is to determine the resource
requirements the software needs in order to operate. The amount of data the device transmits and
lwIP (http://savannah.nongnu.org/projects/lwip)
This " lightweight IP" stack is a simplified but full-scale TCP/IP implementation. lwIP was
designed to be run in a multithreaded system with applications executing in concurrent threads,
but it can also be implemented on a system with no operating system. In addition to the standard
TCP/IP protocol support, lwIP also includes Internet Control Message Protocol (ICMP),
Dynamic Host Configuration Protocol (DHCP), Address Resolution Protocol (ARP), and UDP.
It supports multiple local network interfaces. lwIP has a flexible configuration that allows it to be
easily used in a wide variety of devices and scaled to fit different resource requirements.
OpenTCP (http://www.opentcp.org)
Tailored to 8- and 16-bit microcontrollers, OpenTCP incorporates the ICMP, DHCP, Bootstrap
Protocol (BOOTP), ARP, and UDP. This package also includes several applications, such as a
Trivial File Transfer Protocol (TFTP) server, a Post Office Protocol Version 3 (POP3) client to
retrieve email, Simple Mail Transfer Protocol (SMTP) support to send email, and a Hypertext
Transfer Protocol (HTTP) server for web-based device management.
uC/IP (http://ucip.sourceforge.net)
uC/IP (pronounced mew-kip)[*] is designed for microcontrollers and based on BSD network
software. Protocol support includes ICMP and Point-to-Point Protocol (PPP).
[*]
The u at the front of uC/IP and uIP is a crude but common way to represent the Greek letter
mu.
uIP (http://www.sics.se/~adam/uip)
This "micro IP" stack is designed to incorporate only the minimal set of components necessary
for a full TCP/IP stack solution. There is support for only a single network interface. Application
examples included with uIP are SMTP for sending email, a Telnet server and client, an HTTP
server and web client, and Domain Name System (DNS) resolution.
Each network stack has been ported to various processors and microcontrollers. The device driver
support for the network interface varies from stack to stack. It is a good idea to review the license for the
network stack you decide to use, to make sure it does not place undesirable limitations or requirements
on your product.
In addition, some operating systems include or have network stacks ported to them. The operating
systems covered earlier in this book, eCos and embedded Linux (see Chapters 11 and 12), both offer
networking support modules. eCos includes the OpenBSD, FreeBSD, and lwIP network stacks as well
as application-layer support for many of the extended features discussed previously. Embedded Linux,
having been developed for a desktop PC environment, offers extensive network support.
If a network stack is included or already exists in your device, several embedded web servers are
available to incorporate web-based control. One such open source solution is the GoAhead WebServer
(http://www.goahead.com).
Embedding a networking stack is no longer a daunting task that requires an enormous amount of
resources. The solutions listed previously can quickly be leveraged and integrated to bring networking
features to any embedded system. Tailoring one of the network stack solutions to the specific
characteristics of a device ensures that the system will operate at its optimal level and best utilize system
resources.
Inline functions
In C99, the keyword inline can be added to any function declaration. This keyword asks the
compiler to replace all calls to the indicated function with copies of the code that is inside. This
eliminates the runtime overhead associated with the function call and is most effective when the
function is used frequently but contains only a few lines of code.
Inline functions provide a perfect example of how execution speed and code size are sometimes
inversely linked. The repetitive addition of the inline code will increase the size of your program
in direct proportion to the number of times the function is called. And, obviously, the larger the
function, the more significant the size increase will be. However, you will lose the overhead of
setting up the stack frame if parameters are passed into the function. The resulting program runs
faster but requires more code memory.
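For example, a tiny, frequently called helper is a natural candidate; whether the compiler actually inlines it is up to the implementation:

#include <stdint.h>

/* Small accessor called from many places: a good candidate for inlining. */
static inline uint8_t isLedOn(uint32_t pinLevel)
{
    return ((pinLevel != 0) ? 1 : 0);
}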
Table lookups
A switch statement is one common programming technique that should be used with care. Each
test and jump that makes up the machine language implementation uses up valuable processor
time simply deciding what work should be done next. To speed things up, try to put the
individual cases in order by their relative frequency of occurrence. In other words, put the most
likely cases first and the least likely cases last. This will reduce the average execution time,
though it will not improve at all upon the worst-case time.
If there is a lot of work to be done within each case, it might be more efficient to replace the
entire switch statement with a table of pointers to functions. For example, the following block
of code is a candidate for this improvement:
enum NodeType {NODE_A, NODE_B, NODE_C};

switch (getNodeType())
{
    case NODE_A:
        .
        .
    case NODE_B:
        .
        .
    case NODE_C:
        .
        .
}
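A sketch of the function-pointer replacement follows; nodeAHandler and its siblings stand in for the work done in each case:

enum NodeType {NODE_A, NODE_B, NODE_C};

extern enum NodeType getNodeType(void);
extern void nodeAHandler(void);
extern void nodeBHandler(void);
extern void nodeCHandler(void);

/* One handler per node type, in enum order. */
static void (* const nodeHandlers[])(void) =
{
    nodeAHandler,    /* NODE_A */
    nodeBHandler,    /* NODE_B */
    nodeCHandler     /* NODE_C */
};

void dispatchNode(void)
{
    /* The entire switch statement collapses to a single indexed call. */
    nodeHandlers[getNodeType()]();
}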
Hand-coded assembly
Some software modules are best written in assembly language. This gives the programmer an
opportunity to make them as efficient as possible. Though most C compilers produce much
better machine code than the average programmer, a skilled and experienced assembly
programmer might do better work than the compiler for a given function.
For example, on one of our past projects, a digital filtering algorithm was implemented in C and
targeted to a TI TMS320C30 DSP. The compiler was unable to take advantage of a special
instruction that performed exactly the mathematical operations needed. By manually replacing
one for loop of the C program with inline assembly instructions that did the same thing, overall
computation time decreased by more than a factor of 10.
Register variables
The keyword register can be used when declaring local variables. This asks the compiler to
place the variable into a general-purpose register rather than on the stack. Used judiciously, this
technique provides hints to the compiler about the most frequently accessed variables and will
somewhat enhance the performance of the function. The more frequently the function is called,
the more likely it is that such a change will improve the code's performance. But some compilers
ignore the register keyword.
Global variables
Polling
ISRs are often used to improve a program's responsiveness. However, there are some rare cases
in which the overhead associated with the interrupts actually causes inefficiency. These are cases
in which the average time between interrupts is of the same order of magnitude as the interrupt
latency. In such cases, it might be better to use polling to communicate with the hardware device.
But this too can lead to a less modular software design.
Fixed-point arithmetic
Unless your target platform features a floating-point processor, you'll pay a very large penalty for
manipulating float data in your program. The compiler-supplied floating-point library contains
a set of software subroutines that emulate the floating-point instructions. Many of these functions
take a long time to execute relative to their integer counterparts and also might not be reentrant.
If you are using floating-point for only a few calculations, it might be better to implement the
calculations themselves using fixed-point arithmetic. For example, two fractional bits
representing a value of 0.00, 0.25, 0.50, or 0.75 are easily stored in any integer by merely
multiplying the real value by 4 (e.g., << 2). Addition and subtraction can be accomplished via the
integer instruction set, as long as both values have the same imaginary binary point.
Multiplication and division can be accomplished similarly, if the other number is a whole
integer.
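As a small worked example, two fractional bits give a Q2 format with a resolution of 0.25; the helper names here are ours:

#include <stdint.h>

#define Q2_SHIFT   (2)    /* Two fractional bits: 1 unit represents 0.25. */

/* Convert a whole number to Q2 format (multiply by 4). */
int32_t q2FromWhole(int32_t whole)
{
    return (whole << Q2_SHIFT);
}

/* Addition works directly on the integer ALU, as long as both
 * operands share the same binary point. */
int32_t q2Add(int32_t a, int32_t b)
{
    return (a + b);
}

/* Multiplying two Q2 values produces a Q4 result; shift right by
 * two bits to bring the product back to Q2. */
int32_t q2Multiply(int32_t a, int32_t b)
{
    return ((a * b) >> Q2_SHIFT);
}

For instance, 0.75 is stored as the integer 3, and q2Add(q2FromWhole(1), 3) yields 7, which represents 1.75.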
It is theoretically possible to perform any floating-point calculation with fixed-point arithmetic.
(After all, that's how the floating-point software library does it, right?) Your biggest advantage is
that you probably don't need to implement the entire IEEE 754 standard just to perform one or
two calculations. If you do need that kind of complete functionality, stick with the compiler's
floating-point library and look for other ways to speed up your program.
Variable size
Loop unrolling
In some cases, repetitive loop code can be optimized by performing loop unrolling. In loop
unrolling, the loop overhead at the start and end of a loop is eliminated. Here's an example of a
for loop:
for (idx = 0; idx < 5; idx++)
{
value[idx] = incomingData[idx];
}
The same loop, unrolled, becomes:

value[0] = incomingData[0];
value[1] = incomingData[1];
value[2] = incomingData[2];
value[3] = incomingData[3];
value[4] = incomingData[4];
Some compilers offer loop unrolling as an optimization; in other cases, it might be better for the
developer to code it. It is helpful to check the assembly output from the compiler to see whether
efficiency has actually been improved.
The amount of unrolling that you, or the compiler, choose to do must balance the gain in speed
versus the increased size of the code. Loop unrolling increases code size, another situation
where you must trade code size for speed. Also, loop unrolling can be used only when the
number of iterations through the loop is fixed. One example of an optimized implementation of
In addition to these techniques for reducing code size, several of the ones described in the prior section
could be helpful, specifically table lookups, hand-coded assembly, register variables, and global
variables. Of these techniques, the use of hand-coded assembly usually yields the largest decrease in
code size.
But what if pControl and pData are actually pointers to memory-mapped device registers? In that case,
the peripheral device would not receive the DISABLE command before the byte of data was written. This
To make matters worse, debugging an optimized program is challenging, to say the least. With the
compiler's optimization enabled, the correlation between a line of source code and the set of processor
instructions that implements that line is much weaker. Those particular instructions might have moved
or been split up, or two similar code blocks might now share a common implementation. In fact, some
lines of the high-level language program might have been removed from the program altogether (as they
were in the previous example)! As a result, you might be unable to set a breakpoint on a particular line
of the program or examine the value of a variable of interest.
Of course, you probably want to leave a little extra space on the stack, in case your testing didn't last
long enough or did not accurately reflect all possible runtime scenarios. Never forget that a stack
overflow is a potentially fatal event for your software and should be avoided at all costs.
Be especially conscious of stack space if you are using a real-time operating system. Preemptive
operating systems create a separate stack for each task. These stacks are used for function calls and ISRs
that occur within the context of a task. You can determine the amount of memory required for each task
stack in the manner previously described. You might also try to reduce the number of tasks or switch to
an operating system that has a distinct "interrupt stack" for execution of all ISRs. The latter method can
significantly reduce the stack size requirement of each task.
The size of the heap is limited to the amount of RAM left over after all of the global data and stack
space has been allocated. If the heap is too small, your program will not be able to allocate dynamic
memory when it is needed, so always be sure to compare the result of malloc with NULL before
dereferencing the memory you tried to allocate. If you've tried all of these suggestions and your program
is still requiring too much memory, you might have no choice but to eliminate the heap altogether. This
isn't entirely bad in the case of embedded systems, which frequently allocate all memory needed by the
system at initialization time.
Note that many embedded programmers avoid the use of malloc, and thus the need for a heap,
altogether. But the key benefit of dynamic memory allocation is that you don't need to spend RAM to
keep variables around that are only used briefly in the program. This is a way to reduce total memory
utilization.
Turbo mode
The processing core runs at the peak frequency. Minimizing external memory accesses would be
worthwhile in this mode, because the processor would have to wait for the external memory.
Run mode
The processor core runs at its normal frequency. This is the normal or default operating mode.
Idle mode
The processor core is not clocked, but the other peripheral components operate as normal.
Sleep mode
This is the lowest power state for the processor.
Understanding the details of these modes and how to get into and out of them is key. For example, the
PXA255 can conserve power by entering and exiting idle mode several times a second, because the
processor reactivates quickly and resumes exactly where it left off. In sleep mode, however, the processor
state is not maintained, so exiting sleep mode may require a complete system reboot.
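In practice, those idle-mode savings come from a very simple software pattern: whenever there is
nothing to do, ask the processor to stop its core clock until the next interrupt. The sketch below is
generic; workIsPending and cpuEnterIdle are hypothetical routines, and on the PXA255 the latter would
be implemented with the processor's power manager.

extern int  workIsPending(void);    /* hypothetical: is any task or ISR work queued? */
extern void cpuEnterIdle(void);     /* hypothetical: halt the core clock until an interrupt */
extern void doWork(void);

/* Background loop: the core sleeps between bursts of work, possibly many
 * times per second, with no visible effect on responsiveness. */
void idleLoop(void)
{
    for (;;)
    {
        if (workIsPending())
            doWork();
        else
            cpuEnterIdle();     /* execution resumes here after the wakeup interrupt */
    }
}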
Operating the processor in different modes can save quite a bit of power. The power consumption for the
PXA255 processor (running at 200 MHz) in normal run mode is typically 178 mW. While in idle mode,
the PXA255 typically consumes 63 mW.
There are several issues to consider when planning the power management software design. First, you
must ensure that each task is able to get enough cycles to perform its assigned work. If a system doubles
Page 264
Page 265
[Figure: PXA255 power consumption versus core clock frequency. Consumption rises from
approximately 178 mW at 200 MHz to roughly 283 mW and 411 mW at the higher clock frequencies.]
As the software designer, you need to understand what happens during the frequency change
sequence: what to do if an interrupt occurs during the frequency change, and what needs to be
reconfigured (such as DRAM refresh cycles) for the new frequency.
You will need comprehensive knowledge of all software operation in the system if you decide to alter
the processor frequency on the fly. For example, it can be tricky to know when to lower the clock speed
when a multitasking RTOS is used.
In other cases, particular peripherals can be completely disabled when they are not in use. For example,
if a particular peripheral module is not used in the PXA255 processor, the clock to that unit can be
disabled using the Clock Enable Register (CKEN).
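A sketch of what such clock gating looks like in code follows; the register address and bit position are
illustrative only, so take the real values from the PXA255 documentation.

#include <stdint.h>

/* Illustrative values -- consult the processor manual for the actual CKEN
 * address and the bit assigned to each peripheral unit. */
#define CKEN_REG_ADDR       0x41300004
#define CKEN_I2C_BIT        (1u << 14)

static volatile uint32_t * const pCKEN = (volatile uint32_t *) CKEN_REG_ADDR;

void i2cClockDisable(void)
{
    *pCKEN &= ~CKEN_I2C_BIT;    /* stop the clock to the unused unit */
}

void i2cClockEnable(void)
{
    *pCKEN |= CKEN_I2C_BIT;     /* restore the clock before using the unit again */
}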
Page 266
Embedded C++
You might be wondering why the creators of the C++ language included so many features
that are expensive in terms of execution time and code size. You are not alone; people around
the world have wondered the same thing, especially the users of C++ for embedded
programming. Many of these expensive features are recent additions that are neither strictly
necessary nor part of the original C++ specification. These features have been added one by
one as part of the ongoing "standardization" process.
In 1996, a group of Japanese processor vendors joined together to define a subset of the C++
language and libraries that is better suited for embedded software development. They call
their industry standard Embedded C++ (EC++). EC++ generated a great deal of initial interest
and excitement within the embedded community.
A proper subset of the draft C++ standard, EC++ omits pretty much anything that can be left
out without limiting the expressiveness of the underlying language. This includes not only
expensive features such as multiple inheritance, virtual base classes, runtime type
identification, and exception handling, but also some of the newest additions such as
templates, namespaces, and new-style casts. What's left is a simpler version of C++ that is
still object-oriented and a superset of C, but has significantly less runtime overhead and
smaller runtime libraries.
A number of commercial C++ compilers support the EC++ standard as an option. Several
others allow you to manually disable individual language features, thus enabling you to
emulate EC++ (or create your very own flavor of the C++ language).
Of course, not every feature of C++ is expensive. In fact, the earliest C++ compilers used a technology
called C-front to turn C++ programs into C, which was then fed into a standard C compiler. That this is
possible at all shows that many C++ features translate directly into ordinary C constructs and therefore
carry no inherent runtime penalty.
Page 267
Moreover, we want to make clear that there is no penalty for compiling an original C program with a
C++ compiler.
Default parameter values are also penalty-free. The compiler simply inserts code to pass the default
value whenever the function is called without an argument in that position. Similarly, function name
overloading involves only a compile-time code modification. Functions with the same names but
different parameters are each assigned unique names during the compilation process. The compiler alters
the function name each time it appears in your program, and the linker matches them up appropriately.
Operator overloading is another feature that might be used in embedded systems. Whenever the
compiler sees such an operator, it simply replaces it with the appropriate function call. So in the C++
code listing that follows, the last two lines are equivalent, and the performance penalty is easily
understood:
Complex a, b, c;
c = operator+(a, b);
c = a + b;
Constructors and destructors have a slight penalty. These special methods are guaranteed to be called
each time an object of the type is created or goes out of scope, respectively. However, this small amount
of overhead is a reasonable price to pay for fewer bugs. Constructors eliminate an entire class of C
programming errors having to do with uninitialized data structures. This feature has also proved useful
for hiding the awkward initialization sequences associated with some classes.
Virtual functions also have a reasonable cost/benefit ratio. Without going into too much detail about
what virtual functions are, let's just say that polymorphism would be impossible without them. And
without polymorphism, C++ would not be a true object-oriented language. The only significant cost of
virtual functions is one additional memory lookup before a virtual function can be called. Ordinary
function and method calls are not affected.
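To see where that one extra lookup comes from, note that a virtual call behaves roughly like a call
through a per-object table of function pointers, which can be modeled in C along the following lines.
The names and layout here are purely illustrative; real compilers generate a hidden table (the "vtable")
automatically.

#include <stdio.h>

struct Shape;

/* Hand-rolled stand-in for a class with one virtual function. */
struct ShapeVTable
{
    void (*draw)(const struct Shape *self);     /* one slot per virtual function */
};

struct Shape
{
    const struct ShapeVTable *vtable;           /* hidden pointer the compiler adds */
};

static void circleDraw(const struct Shape *self)
{
    (void) self;
    puts("drawing a circle");
}

static const struct ShapeVTable circleVTable = { circleDraw };

void drawAny(const struct Shape *shape)
{
    /* The cost of a virtual call: one extra load (shape->vtable->draw)
     * before an otherwise ordinary indirect function call. */
    shape->vtable->draw(shape);
}

int main(void)
{
    struct Shape circle = { &circleVTable };

    drawAny(&circle);
    return 0;
}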
The features of C++ that are typically too expensive for embedded systems are templates, exceptions,
and runtime type identification. All three of these negatively impact code size, and exceptions and
runtime type identification also increase execution time. Before deciding whether to use these features,
you might want to do some experiments to see how they will affect the size and speed of your own
application.
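One simple way to run such an experiment, assuming the GNU toolchain used throughout this book, is
to build the application twice, with and without the feature in question, and compare the section sizes
reported by the binutils size utility. The output file name below is hypothetical:
# arm-elf-size myapp.elf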
Page 268
The Windows CE operating system can be specified instead when ordering the VIPER-Lite board.
There is an additional cost for the VIPER-Lite with the Windows CE operating system. Examples in the
book target the embedded Linux operating system version of the board.
JTAG port for system debugging
Page 269
Page 270
The CD-ROM that comes with the Arcom development kit includes all VIPER-Lite manuals and
reference documents, datasheets for all components on the board, source code for RedBoot, embedded
Linux packages with source code, and binary images for RedBoot and embedded Linux.
The software development tools for the Arcom board are located on the book's web site. We built these
tools ourselves for the ARM processor by following the instructions shown in Appendix C. The software
tools include the GNU C compiler (gcc), assembler (as), linker (ld), and debugger (gdb). We encourage
you to investigate the other GNU tools included in the development kit. All programs in this book were
built using the tools contained on the book's web site.
For readers of this book, the VIPER-Lite development kit is available at a special discount price of $295
(plus shipping). Use one of the following order codes when contacting Arcom, depending on the
operating system you want:
VIPER-Lite Embedded Linux Development Kit
VIPER-Lite Windows CE Development Kit
Page 271
Page 272
The Linux example code in Chapter 12 has not been built and tested using a host
computer running Windows. It is common to use a Linux host system for
developing embedded Linux applications.
Building applications for Linux using a Windows host is beyond the scope of this
book. It involves the use of the Cygwin free software toolset, a somewhat more
involved procedure than the one described in this chapter.
1. Now select the directory from which to install the Cygwin tools. In this case, we select "Install
from Local Directory" and then click Next.
2. In the next dialog box, select the location on your hard drive where you want the Cygwin tools to
be installed. Leave the default as C:\cygwin. (If you want to choose an alternate destination,
change the drive and directory location accordingly.) In the Install For selection box, select All
Users, and for the Default Text File Type, select DOS. Then click Next.
3. Tell the Cygwin setup where the local files that you want to install reside. Browse to the cygwin
temporary directory, where you unzipped the Cygwin install files, and then click Next. This will
cause the Cygwin setup program to inventory the available tools and display the available list.
Page 273
You should notice that the GNU development tools are installed under C:\cygwin\opt\gnutools. If you
look under the arm-elf\bin directory, you should see the GNU tool executable files (such as gcc, as, gdb,
and ld), each prefixed with arm-elf. This prefix describes the processor for which the tools are built,
arm, and the object file format, elf (which stands for "executable and linkable format").
To test that you installed the tools correctly and set up the path properly, open a Cygwin bash shell and
enter the command:
# arm-elf-gcc -v
Page 274
# cd /opt
4. The Linux version of the ARM-based GNU tools is located in the file linuxhost.tar.gz. Copy this
file to the /opt directory. Next, decompress the file on your hard drive using the command:
5.
6.
7. Finally, set the path to the GNU tools location in your bash shell profile. This ensures the path is
set correctly each time the bash shell is started. Edit the bash profile file named
$HOME/.bash_profile (where $HOME is specific to your environment). Add the following to the
last line in this file:
8. PATH=/opt/gnutools/bin:$PATH ; export PATH
You should notice that the GNU development tools are installed under /opt/gnutools. The executable
files, such as arm-elf-gcc, are contained under the /opt/gnutools/bin directory. The prepended name
arm-elf describes the processor for which the tools are built, arm, and the object file format, elf.
Page 275
Approximately 1.2 GB of disk space is needed for the GNU tools source code and build output
directories. These instructions are adapted from the web page "Building a toolchain for use with eCos,"
found online at http://ecos.sourceware.org.
Another popular site for building cross development tools is found online at http://kegel.com/crosstool.
This site contains instructions and numerous scripts for building various GNU cross development
toolchains.
Page 276
# mkdir -p /src
# cd /src
11. Normally, you would apply any patches needed at this point; however, we have already applied
the necessary patches to the source files. There is a patch utility to aid in applying patches.
Page 277
12. The resulting output is contained in the file configure.out. If there are any problems configuring
the tools, refer to this file.
13. Build and install the GNU binutils (this step may take an especially long time):
14. # make -w all install 2>&1 | tee make.out
15.
16. The resulting output is contained in the file make.out. If there are any problems building the
tools, refer to this file.
17. Ensure that the binutils are at the head of the PATH:
18. # PATH=/opt/gnutools/arm-elf/bin:$PATH ; export PATH
19.
31. Build and install gcc (this step may take an especially long time):
32. # make -w all install 2>&1 | tee make.out
33.
41. Build and install gdb (this step may take an especially long time):
42. # make -w all install 2>&1 | tee make.out
43.
Following the successful building and installation of the GNU tools, the associated build tree (located
under /tmp/build) may be deleted to save space if necessary. The toolchain executable files directory
(/opt/gnutools/arm-elf/bin) should be added to the head of your PATH.
Page 279
# mkdir -p /opt/ecos
# cd /opt/ecos
9. The directory that contains the eCos source code should now be available under /opt/ecos/ecos-redboot-viper-v3i7.
10. Set up the environment variables. Edit the $HOME/.bash_profile file (where $HOME is specific
to your environment) and add the following lines:
11.
PATH=/opt/ecos/ecos-redboot-viper-v3i7/tools:$PATH ;
export PATH
12.
13.
14.
15.
16. Close the current bash environment and open a new one. This allows the changes just made to
the environment to become effective.
Page 280
# mkdir -p /opt/ProgEmbSys/chapter11/ecos
# cd /opt/ProgEmbSys/chapter11/ecos
5. Create a new configuration for the Arcom board using the eCos default template by entering
the following command:
6.
7.
14. If you encounter an error, make sure the path is set up correctly, as previously shown. After
successfully building an eCos library, you should see the following message:
15. build finished
16. You should have various directories under /opt/ProgEmbSys/chapter11/ecos, including the
directory install/lib. The lib directory contains the eCos operating system archive files that get
linked with eCos applications.
The eCos makefiles contain a variable, ECOS_INSTALL_DIR, which is set to the location of the eCos
install directory (/opt/ProgEmbSys/chapter11/ecos/install in this case). If the eCos install directory
location changes, this variable must also be changed.
Page 281
The Linux distribution running on the host system must be compatible with the Linux Standard Base
(LSB) version 1.3 (see http://www.linuxbase.org for more information). We used Fedora Core 5
(http://fedora.redhat.com) on our host development system.
You will need permission to become superuser (root) to perform the installation
procedure successfully.
The following commands are executed from a terminal window with the Arcom board development
CD-ROM inserted in the drive.
1. Mount the CD-ROM where the Arcom board development CD-ROM is located using the
following command:
2.
3.
Page 282
# perl /mnt/install
7. At this point, the installation program outputs a message similar to the following:
Page 283
65. Once the installation is successfully completed, the following message will be output:
Installation complete.
You should add '/opt/arcom/bin' to your PATH. You can do this for the
current login session with the following command:
export PATH=/opt/arcom/bin:$PATH
or you can modify the path for all login sessions for this user by adding
the same statement to '$HOME/.bash_profile' or for all users by adding it
to '/etc/profile'.
74. In order to ensure the path is set correctly each time you open a terminal shell, edit the
$HOME/.bash_profile file (where $HOME is specific to your environment) by adding the
following line:
75.
76.
The directory /opt/arcom should be populated with various files. The executable files, such as
arm-linux-gcc, are contained under the /opt/arcom/bin directory.
To test that you installed the tools correctly and set up the path properly, close the existing terminal
window. Open a new terminal window (to ensure the path is set properly) and enter the command:
# arm-linux-gcc -v
Page 284
# cd /opt/ProgEmbSys/chapter12/blink
4. Then build the example code using the makefile with the following command:
5.
6.
# make
7. This should produce two executable files named blink and blinkdbg.
Page 285
Ensure you have the Arcom board's Ethernet board connected to the main board.
Then connect an Ethernet cable between the Arcom board and your computer
(either directly or via a hub). The instructions for connecting the Ethernet board
are shown in the Arcom VIPER Technical Manual and the VIPER-I/O Technical
Manual.
The following instructions also assume that a Dynamic Host Configuration
Protocol (DHCP) server is present on your network. This allows the Arcom board
to obtain a dynamic Internet Protocol (IP) address. If you do not have a DHCP
server on your network, refer to the Arcom Embedded Linux Technical Manual
section on statically configuring a network interface.
Power up the Arcom board and allow RedBoot to run the Linux boot script. You should see output
similar to the following once Linux begins its boot process:
RedBoot> clock -l 27 -m 2 -n 10
mem:99.532MHz run:199.065MHz turbo:199.065MHz. cccr 0x141 (L=27 M=1 N=1.0)
RedBoot> mount -t jffs2 -f filesystem
RedBoot> load -r -b %{FREEMEMLO} %{kernel}
Using default protocol (file)
Raw file loaded 0x00400000-0x004d4c3f, assumed entry at 0x00400000
RedBoot> exec -c %{cmdline}
Using base address 0x00400000 and length 0x000d4c40
Uncompressing Linux............
After Linux has successfully booted, you should see the Arcom board's login prompt:
viper login:
You can then download and run the Linux examples by following these steps:
1. At the board's login prompt, enter root for the login name and arcom for the password. The
Arcom board should then output a command prompt:
2.
root@viper root#
3. Next, download the program from the host computer to the Arcom board. For instance, to
download the blink example, open a terminal window on the host Linux system and enter the
following commands:
4.
5.
6.
# cd /opt/ProgEmbSys/chapter12/blink
# scp blink 192.168.0.4:/tmp/blink
Page 286
root@192.168.0.4's password:
10. Enter the password arcom. The download will take place and the terminal should show output
similar to the following:
11. blink                     100% 3620     3.5KB/s   00:00
12. To execute the downloaded program, enter the following at the Arcom board's prompt:
13. root@viper root# /tmp/blink
14.
15. If the program downloaded properly, the green LED should be toggling.
The Linux examples take control of the Arcom board and are intended to run forever. To terminate a
specific example, press Ctrl-C at the console; the Arcom board should abort the program and return to
the VIPER-Lite prompt.
4. The Arcom board will then wait for you to connect your host to the target, indicating that it is
waiting by outputting the following message:
5.
6.
Page 287
# arm-linux-gdb blinkdbg
10. You should then see the familiar gdb prompt (as we covered in Chapter 5), similar to the
following:
11.
12.
13.
14.
21. Connect the host to the target by entering the following command at the host's gdb prompt:
22. (gdb) target remote 192.168.0.4:9000
23.
24. You need to change the IP address (192.168.0.4) to the IP address appropriate for your Arcom
board.
25. Upon successful connection, the Arcom board outputs the following message (the following IP
address may be different for your host):
26. Remote debugging from host 192.168.0.3
You are now ready to start debugging! For additional information about debugging with gdb, refer to
Chapter 5.
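From this point on, the session works just like the gdb sessions in Chapter 5; for example, a typical first
step is to set a breakpoint and let the program run (main is used here only as an example):
(gdb) break main
(gdb) continue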
Page 288