ECE Embedded Systems Lecture Notes
ON
EMBEDDED SYSTEMS
(Autonomous-IARE-R16)
Mr. S Lakshmanachari
(Assistant Professor)
----------------------------------------------------------------------------------------------------------------
SYLLABUS:
Definition of embedded system, embedded systems vs. general computing systems, history of
embedded systems, complex systems and microprocessor, classification, major application
areas, the embedded system design process, characteristics and quality attributes of
embedded systems, formalisms for system design, design examples.
--------------------------------------------------------------------------------------------------------------------------
INTRODUCTION:
System Definition:
A way of working, organizing or performing one or many tasks according to a fixed set
of rules, program or plan.
Also an arrangement in which all units assemble and work together according to a
program or plan.
Examples of Systems:
Time display system – A watch
Automatic cloth washing system – A washing machine
“An embedded system is a system that has software embedded into computer-hardware,
which makes a system dedicated for an application (s) or specific part of an application
or product or part of a larger system.”
(Or)
An embedded system is one that has dedicated purpose software embedded in computer
hardware.
(Or)
It is a dedicated computer-based system for an application(s) or product. It may be an
independent system or a part of a large system. Its software usually embeds into a
ROM (Read Only Memory) or flash memory.
(Or)
It is any device that includes a programmable computer but is not itself intended to be a
general-purpose computer.
MICROPROCESSOR:
Microprocessor is a multipurpose, programmable device that accepts digital data as
input, processes it according to instructions stored in its memory, and provides
results as output.
or
A microprocessor is a multipurpose, programmable, clock-driven, register-based
electronic device that reads binary instructions from a storage device called memory,
accepts binary data as input, processes the data according to those instructions, and
provides the result as output.
MICROCONTROLLER:
A microcontroller is a highly integrated single-chip computer that contains a CPU,
RAM, ROM/flash, I/O ports, timers and other peripherals on one chip, designed for
dedicated control applications.
HISTORY OF EMBEDDED SYSTEMS:
Whirlwind, a computer designed at MIT in the late 1940s and early 1950s, was the
first computer designed to support real-time operation and was originally conceived
as a mechanism for controlling an aircraft simulator. It was extremely large
physically compared to today's computers (e.g., it contained over 4,000 vacuum
tubes).
Very-large-scale integration (VLSI) is the process of creating an integrated circuit
(IC) by combining thousands of transistors into a single chip. VLSI began in the
1970s, and it has allowed us to put a complete CPU, a microprocessor, on a single
chip since then, although those early CPUs were very simple.
In 1971, the first microprocessor, the Intel 4004, invented by Ted Hoff, was
designed for an embedded application, namely a calculator. The calculator was
not a general-purpose computer; it merely provided basic arithmetic functions.
The HP-35 was the first handheld calculator to perform transcendental
functions. It was introduced in 1972, and it used several chips to implement the
CPU rather than a single-chip microprocessor.
Automobile designers started making use of the microprocessor soon after single-
chip CPUs became available. The most important and sophisticated use of
microprocessors in automobiles was to control the engine: determining when
spark plugs fire, controlling the fuel/air mixture, and so on.
Microprocessors are usually classified according to their word length; for example,
an 8-bit microcontroller suits low-cost, low-performance applications, while a 32-bit
RISC processor targets high-performance applications.
CHARACTERISTICS OF EMBEDDED COMPUTING APPLICATIONS:
a. Complex Algorithms
b. User Interface
c. Real Time
d. Multirate
e. Manufacturing Cost
f. Power
Complex algorithms: The operations performed by the microprocessor may be
very sophisticated. For example, the microprocessor that controls an
automobile engine must perform complicated filtering functions to optimize the
performance of the car while minimizing pollution and fuel utilization.
User interface: Microprocessors are frequently used to control complex user
interfaces that may include multiple menus and many options. The moving
maps in Global Positioning System (GPS) navigation are good examples of
sophisticated user interfaces.
Real time: Many embedded computing systems have to perform in real time; if the
data is not ready by a certain deadline, the system may fail. In some cases a missed
deadline is unsafe, while in others it merely degrades the quality of the result.
Multirate: Many embedded computing systems have several real-time activities
going on at the same time and must operate at different rates; multimedia
applications, for example, must handle audio and video streams that run at
different sampling rates.
Manufacturing cost: The total cost of building the system is very important in
many cases. Manufacturing cost is determined by many factors, including the
type of microprocessor used, the amount of memory required, and the types of
I/O devices.
Power and energy: Power consumption directly affects the cost of the hardware,
since a larger power supply may be necessary. Energy consumption affects battery
life, which is important in many applications, as well as heat dissipation, which
can be important even in desktop applications.
There are many ways to design a digital system: custom logic, field-programmable
gate arrays (FPGAs), and so on.
Why use microprocessors? There are two answers:
o Microprocessors are a very efficient way to implement digital systems.
o Microprocessors make it easier to design families of products that can be
built to provide various feature sets at different price points and can be
extended to provide new features to keep up with rapidly changing
markets.
1. Requirements:
Clearly, before we design a system, we must know what we are designing. The initial
stages of the design process capture this information for use in creating the architecture
and components. We generally proceed in two phases:
1. First, we gather an informal description from the customers, known as
requirements;
2. Second, we refine the requirements into a specification that contains enough
information to begin designing the system architecture.
Separating out requirements analysis and specification is often necessary
because of the large gap between what the customers can describe about the
system they want and what the architects need to design the system.
Requirements may be functional or nonfunctional.
Performance: The speed of the system is often a major consideration both for the
usability of the system and for its ultimate cost. As we have noted, performance may
be a combination of soft performance metrics such as approximate time to perform a
user-level function and hard deadlines by which a particular operation must be
completed.
Cost: The target cost or purchase price for the system is almost always a
consideration. Cost typically has two major components:
Manufacturing cost includes the cost of components and assembly
Nonrecurring engineering (NRE) costs include the personnel and other
costs of designing the system.
Physical size and weight: The physical aspects of the final system can vary greatly
depending upon the application. An industrial control system for an assembly line
may be designed to fit into a standard-size rack with no strict limitations on
weight. A handheld device typically has tight requirements on both size and
weight that can ripple through the entire system design.
■ Name: This is simple but helpful. The name given to the project should suggest the
purpose of the machine.
■ Purpose: This should be a brief one- or two-line description of what the system is
supposed to do. If you can’t describe the essence of your system in one or two lines,
chances are that you don’t understand it well enough.
■ Inputs and outputs: These two entries are more complex than they seem. The inputs
and outputs to the system encompass a wealth of detail:
— Types of data: Analog electronic signals? Digital data? Mechanical inputs?
— Data characteristics: Periodically arriving data, such as digital audio samples?
How many bits per data element?
— Types of I/O devices: Buttons? Analog/digital converters? Video displays?
■ Functions: This is a more detailed description of what the system does. A good way
to approach this is to work from the inputs to the outputs: When the system receives an
input, what does it do? How do user interface inputs affect these functions? How do
different functions interact?
■ Performance: Many embedded computing systems spend at least some of their time
controlling physical devices or processing data coming from the physical world. In most
of these cases, the computations must be performed within a certain time.
■ Manufacturing cost: This includes primarily the cost of the hardware components.
Even if you don’t know exactly how much you can afford to spend on system
components, you should have some idea of the eventual cost range. Cost has a
substantial influence on architecture.
■ Power: Similarly, you may have only a rough idea of how much power the system
can consume, but a little information can go a long way. Typically, the most important
decision is whether the machine will be battery powered or plugged into the wall.
Battery-powered machines must be much more careful about how they spend energy.
■ Physical size and weight: You should give some indication of the physical size of the
system; this helps guide architectural decisions.
After writing the requirements, you should check them for internal consistency. To
practice the capture of system requirements, Example 1.1 creates the requirements
for a GPS moving map system.
Example 1.1
Requirements analysis of a GPS moving map
The moving map is a handheld device that displays for the user a map of the terrain
around the user’s current position; the map display changes as the user and the map
device change position. The moving map obtains its position from the GPS, a
satellite-based navigation system. The moving map display might look something
like the following figure.
What requirements might we have for our GPS moving map? Here is an initial list:
■ Functionality: This system is designed for highway driving and similar uses. The
system should show major roads and other landmarks available in standard
topographic databases.
■ User interface: The screen should have at least 400 × 600 pixel resolution. The
device should be controlled by no more than three buttons. A menu system should
pop up on the screen when buttons are pressed, allowing the user to make selections
to control the system.
■ Performance: The map should scroll smoothly. Upon power-up, a display should
take no more than one second to appear, and the system should be able to verify its
position and display the current map within 15 sec.
■ Cost: The selling cost of the unit should be no more than $100.
■ Physical size and weight: The device should fit comfortably in the palm of the hand.
■ Power consumption: The device should run for at least eight hours on four batteries.
Requirements form for GPS moving map system:
2. Specification:
The specification is more precise—it serves as the contract between the customer and
the architects.
The specification must be carefully written so that it accurately reflects the customer's
requirements and so that it can be clearly followed during design.
An unclear specification leads to different types of problems.
If the behaviour of some feature in a particular situation is unclear from the
specification, the designer may implement the wrong functionality.
If global characteristics of the specification are wrong or incomplete, the overall
system architecture derived from the specification may be inadequate to meet the
needs of implementation.
A specification of the GPS system would include several components:
o Data received from the GPS satellite constellation.
o Map data
o User interface.
o Operations that must be performed to satisfy customer requests.
o Background actions required to keep the system running, such as operating
the GPS receiver.
3. Architecture Design:
The architecture is a plan for the overall structure of the system that will be used
later to design the components that make up the architecture.
To understand what an architectural description is, let’s look at sample
architecture for the moving map of Example 1.1.
Figure 1.3 shows a sample system architecture in the form of a block
diagram that shows major operations and data flows among them.
One major operation is to search the topographic database and to render (i.e., draw)
the results for the display. We have chosen to separate those functions so that we can
potentially do them in parallel: performing rendering separately from searching the
database may help us update the screen more fluidly.
For more implementation details we should refine that system block
diagram into two block diagrams:
o Hardware block diagram (Hardware architecture)
o Software block diagram(Software architecture)
These two more refined block diagrams are shown in Figure 1.4
The hardware block diagram clearly shows that we have one central CPU
surrounded by memory and I/O devices.
We have chosen to use two memories:
o A frame buffer for the pixels to be displayed
o A separate program/data memory for general use by the CPU
The software block diagram fairly closely follows the system block diagram.
We have added a timer to control when we read the buttons on the user interface
and render data onto the screen.
Architectural descriptions must be designed to satisfy both functional and
nonfunctional requirements.
Not only must all the required functions be present, but we must meet cost, speed,
power and other nonfunctional constraints.
Starting out with system architecture and refining that to hardware and software
architectures is one good way to ensure that we meet all specifications:
We can concentrate on the functional elements in the system block diagram, and
then consider the nonfunctional constraints when creating the hardware and
software architectures.
4. Designing Hardware and Software Components:
The architectural description tells us what components we need. The component
design effort builds those components in conformance to the architecture and
specification; the components in general include both hardware modules (such as
boards and custom logic) and software modules, some off-the-shelf and some
designed from scratch.
5. System Integration:
Putting the hardware and software components together yields a complete working
system.
Bugs are typically found during system integration, and good planning can help
us to find the bugs quickly.
If we debug only a few modules at a time, we are more likely to uncover the
simple bugs and to be able to recognize them easily.
System integration is difficult because it usually uncovers problems. It is often hard to
observe the system in sufficient detail to determine exactly what is wrong— the
debugging facilities for embedded systems are usually much more limited than what
you would find on desktop systems. As a result, determining why things do not work
correctly and how they can be fixed is a challenge in itself.
4. FORMALISMS FOR SYSTEM DESIGN
Structural Description:
The class has the name that we saw used in the d1 object, since d1 is an instance of
class Display. The Display class defines the pixels attribute seen in the object.
A class defines both the interface for a particular type of object and that object's
implementation.
There are several types of relationships that can exist between objects and classes:
o Association occurs between objects that communicate with each
other but have no ownership relationship between them.
o Aggregation describes a complex object made of smaller objects.
o Composition is a type of aggregation in which the owner does not
allow access to the component objects.
o Generalization allows us to define one class in terms of another
Derived class: A class defined in terms of another (base) class through generalization;
the derived class inherits the attributes and operations of its base class.
Link: A connection between objects; a link is an instance of an association, just as an
object is an instance of a class.
Behavioral Description:
We have to specify the behavior of the system as well as its structure. One way to
specify the behavior of an operation is a state machine.
Fig 1.10 shows UML states; the transition between two states is shown by an arrow.
These state machines will not rely on the operation of a clock, as in hardware;
rather, changes from one state to another are triggered by the occurrence of events.
An event is some type of action. Events are divided into two categories. They are:
o External events: The event may originate outside the system, such as a user
pressing a button.
o Internal events: It may also originate inside, such as when one routine
finishes its computation and passes the result on to another routine.
We will concentrate on the following three types of events defined by UML, as
illustrated in Figure 1.11 (signal and call events, and time-out events):
o A signal is an asynchronous occurrence. It is defined in UML by an object
that is labeled as a <<signal>>. The object in the diagram serves as a
declaration of the event's existence. Because it is an object, a signal may have
parameters that are passed to the signal's receiver.
o A call event follows the model of a procedure call in a programming
language.
o A time-out event causes the machine to leave a state after a certain amount
of time. The label tm (time-value) on the edge gives the amount of time
after which the transition occurs. A time-out is generally implemented with
an external timer.
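As a rough illustration (not part of the original notes; the state and event names here are hypothetical), the event-driven behavior described above can be sketched in C as a switch statement over the current state:

#include <stdio.h>

/* Hypothetical states and events for a simple event-driven state machine */
typedef enum { IDLE, RUNNING } state_t;
typedef enum { EV_BUTTON, EV_DONE, EV_TIMEOUT } event_t;

/* Return the next state for a given (state, event) pair */
state_t step(state_t s, event_t e)
{
    switch (s) {
    case IDLE:
        if (e == EV_BUTTON) return RUNNING;   /* external (signal) event */
        break;
    case RUNNING:
        if (e == EV_DONE)    return IDLE;     /* internal event */
        if (e == EV_TIMEOUT) return IDLE;     /* time-out event, tm(time-value) */
        break;
    }
    return s;   /* no matching transition: stay in the current state */
}

int main(void)
{
    state_t s = IDLE;
    s = step(s, EV_BUTTON);    /* IDLE -> RUNNING */
    s = step(s, EV_TIMEOUT);   /* RUNNING -> IDLE */
    printf("final state = %d\n", s);
    return 0;
}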
Unconditional and conditional transitions: A transition may be unconditional, taken
whenever its triggering event occurs, or conditional, taken only when a guard
condition on the event is also true.
The sequence diagram for this processing includes three objects shown at the top of the diagram. Extending
below each object is its lifeline, a dashed line that shows how long the object is
alive. In this case, all the objects remain alive for the entire sequence, but in other
cases objects may be created or destroyed during processing.
The boxes along the lifelines show the focus of control in the sequence, that is,
when the object is actively processing.
In this case, the mouse object is active only long enough to create the mouse_click
event. The display object remains in play longer; it in turn uses call events to
invoke the menu object twice: once to determine which menu item was selected
and again to actually execute the menu call.
The find region ( ) call is internal to the display object, so it does not
appear as an event in the diagram.
Fig: A UML sequence diagram for a typical sequence of train control commands
The focus-of-control bars show that both the console and the receiver run
continuously. The packets can be sent at any time; there is no global clock
controlling when the console sends and the train receives, so we do not have to worry
about detecting collisions among the packets.
The set-inertia message is sent infrequently. Most of the messages are speed
commands. When a train receives a speed command, it speeds up or slows down
smoothly at a rate determined by the set-inertia command.
An emergency stop command may be received, which causes the train receiver to
immediately shut down the train motor.
We can model the commands in UML with a two-level class hierarchy, as shown in
Fig 1.16. Here we have one base class, command, and three subclasses derived from
it: set-speed, set-inertia and Estop, one for each specific type of command.
We now need to model the train control system itself. There are clearly two major
subsystems: the control-box and the train board component. Each of these
subsystems has its own internal structure.
Figure 1.17 shows the relationship between the console and the receiver (ignoring the
role of the track):
The console and receiver are each represented by objects: the console sends a
sequence of packets to the train receiver, as illustrated by the arrow. The notation
on the arrow provides both the type of message sent and its sequence in a flow of
messages; we have numbered the arrow's messages 1..n.
Let's break down the console and receiver into three major components each.
The console needs to perform three functions:
o Console:
Read the state of the front panel
Format messages
Transmit messages
The train receiver must also perform three major functions:
o Train receiver:
Receive message
Interpret message
Control the train
The UML class diagram is shown in Figure 1.18 below.
Panel: Describes the console front panel, which contains analog knobs and
interface hardware to interface to the digital parts of the system.
Formatter: Knows how to read the panel knobs and creates the bit stream for a
message.
Transmitter: Sends the message along the track.
Knobs*: Describes the actual analog knobs, buttons, and levers on the control
panel.
Sender*: Describes the analog electronics that send bits along the track.
Receiver: Knows how to turn the analog signal on the track into digital form.
Controller: Interprets received commands and figures out how to control the
motor.
Motor interface: Generates the analog signals required to control the motor.
Having developed a conceptual specification that defines the basic classes, let's
refine it to create a more detailed specification. We won't make a complete
specification, but we will add details to the classes; we can now fill in the details of
the conceptual specification. Sketching out the spec first helps us understand the
basic relationships in the system.
We need to define the analog components in a little more detail, because their
characteristics will strongly influence the formatter and controller. Figure 1.19 shows
a little more detail than Figure 1.18; it includes the attributes and behaviors of these
classes.
The panel has three knobs: train number (which train is currently being
controlled), speed (which can be positive or negative), and inertia. It also has one
button for emergency-stop.
The Sender and Detector classes are relatively simple: They simply put out and
pick up a bit, respectively.
To understand the Pulser class, let’s consider how we actually control the train
motor’s speed. As shown in Figure 1.20, the speed of electric motors is commonly
controlled using pulse-width modulation: Power is applied in a pulse for a fraction
of some fixed interval, with the fraction of the time that power is applied
determining the speed.
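As a rough sketch of this idea in C (not from the notes; the motor pin, the 100-unit period and the busy-wait delay are assumptions), a software PWM loop could look like this:

#include <reg51.h>

sbit MOTOR = P1^0;   /* assumed motor drive pin */

static void delay_units(unsigned char n)
{
    unsigned char i, j;
    for (i = 0; i < n; i++)
        for (j = 0; j < 100; j++)
            ;   /* crude busy-wait: one 'unit' of delay */
}

/* Drive one PWM period: power on for 'duty' units out of 100 */
void pwm_period(unsigned char duty)
{
    MOTOR = 1;                /* power applied for a fraction of the interval */
    delay_units(duty);
    MOTOR = 0;                /* power off for the rest of the interval */
    delay_units(100 - duty);
}

Calling pwm_period(25) repeatedly applies power 25% of the time (slow speed); pwm_period(75) applies it 75% of the time (high speed).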
Figure 1.21 shows the classes for the panel and motor interfaces. These classes
form the software interfaces to their respective physical devices.
The Panel class defines a behavior for each of the controls on the panel;
The new-settings behavior uses the set-knobs behavior of the Knobs* class to
change the knobs settings whenever the train number setting is changed.
The Motor-interface defines an attribute for speed that can be set by other classes.
The Transmitter and Receiver classes are shown in Figure 1.22. They provide the
software interface to the physical devices that send and receive bits along the track.
The Transmitter provides a distinct behavior for each type of message that can
be sent; it internally takes care of formatting the message.
The Receiver class provides a read-cmd behavior to read a message off the tracks.
The Formatter class is shown in Figure 1.23. The formatter holds the current
control settings for all of the trains.
The send-command method is a utility function that serves as the interface to the
transmitter.
The operate function performs the basic actions for the object.
The panel-active behavior returns true whenever the panel's values do not correspond
to the current values.
The role of the formatter during the panel’s operation is illustrated by the
sequence diagram of Figure 1.24.
The figure shows two changes to the knob settings: first to the throttle, inertia,
or emergency stop; then to the train number.
The panel is called periodically by the formatter to determine if any control
settings have changed. If a setting has changed for the current train, the formatter
decides to send a command, issuing a send- command behavior to cause the
transmitter to send the bits.
Because transmission is serial, it takes a noticeable amount of time for the
transmitter to finish a command; in the meantime, the formatter continues to
check the panel’s control settings.
If the train number has changed, the formatter must cause the knob settings
to be reset to the proper values for the new train.
The state diagram for a very simple version of the operate behavior of the
Formatter class is shown in Figure 1.25.
This behavior watches the panel for activity: If the train number changes, it
updates the panel display; otherwise, it causes the required message to be
sent.
Figure 1.26 shows a state diagram for the panel-active behavior.
-------------------------------------------------------------------------------------------------------
SYLLABUS:
Here is the last version of the 64-word packet checksum routine we studied in Section 5.2.
This shows how the compiler treats a loop with an incrementing count, i++.
checksum_v5
MOV r2,r0 ; r2 = data
MOV r0,#0 ; sum = 0
MOV r1,#0 ;i=0
checksum_v5_loop
LDR r3,[r2],#4 ; r3 = *(data++)
ADD r1,r1,#1 ; i++
CMP r1,#0x40 ; compare i, 64
ADD r0,r3,r0 ; sum += r3
BCC checksum_v5_loop ; if (i<64) goto loop
MOV pc,r14 ; return sum
An ADD to increment i
A compare to check if i is less than 64
A conditional branch to continue the loop if i < 64
This is not efficient. On the ARM, a loop should only use two instructions:
A subtract to decrement the loop counter, which also sets the condition code flags on
the result
A conditional branch instruction
The key point is that the loop counter should count down to zero rather than counting up
to some arbitrary limit. Then the comparison with zero is free since the result is stored in the
condition flags. Since we are no longer using i as an array index, there is no problem in
counting down rather than up.
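The C source for this down-counting version was omitted from these notes; a reconstruction consistent with the assembly that follows is:

int checksum_v7(int *data, unsigned int N)
{
    int sum = 0;

    for (; N != 0; N--)      /* count down to zero instead of up to 64 */
    {
        sum += *(data++);
    }
    return sum;
}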
This compiles to
checksum_v7
MOV r2,#0 ; sum = 0
CMP r1,#0 ; compare N, 0
BEQ checksum_v7_end ; if (N==0) goto end
checksum_v7_loop
LDR r3,[r0],#4 ; r3 = *(data++)
SUBS r1,r1,#1 ; N-- and set flags
ADD r2,r3,r2 ; sum += r3
BNE checksum_v7_loop ; if (N!=0) goto loop
checksum_v7_end
MOV r0,r2 ; r0 = sum
MOV pc,r14 ; return r0
Notice that the compiler checks that N is nonzero on entry to the function. Often this check is
unnecessary since you know that the array won’t be empty. In this case a do-while loop gives
better performance and code density than a for loop.
We call these instructions the loop overhead. On ARM7 or ARM9 processors the subtract
takes one cycle and the branch three cycles, giving an overhead of four cycles per loop.
You can save some of these cycles by unrolling a loop—repeating the loop body several
times, and reducing the number of loop iterations by the same proportion. For example, let’s
unroll our packet checksum example four times.
EXAMPLE 4
The following code unrolls our packet checksum loop four times. We assume that the
number of words in the packet, N, is a multiple of four.
int checksum_v9(int *data, unsigned int N)
{
int sum = 0;
do
{
sum += *(data++);
sum += *(data++);
sum += *(data++);
sum += *(data++);
N -= 4;
} while (N!=0);
return sum;
}
This compiles to
checksum_v9
MOV r2,#0 ; sum = 0
checksum_v9_loop
LDR r3,[r0],#4 ; r3 = *(data++)
SUBS r1,r1,#4 ; N -= 4 & set flags
ADD r2,r3,r2 ; sum += r3
LDR r3,[r0],#4 ; r3 = *(data++)
ADD r2,r3,r2 ; sum += r3
LDR r3,[r0],#4 ; r3 = *(data++)
ADD r2,r3,r2 ; sum += r3
LDR r3,[r0],#4 ; r3 = *(data++)
ADD r2,r3,r2 ; sum += r3
BNE checksum_v9_loop ; if (N!=0) goto loop
MOV r0,r2 ; r0 = sum
MOV pc,r14 ; return r0
We have reduced the loop overhead from 4N cycles to (4N)/4 = N cycles. On the ARM7TDMI,
this accelerates the loop from 8 cycles per accumulate to 20/4 = 5 cycles per accumulate,
nearly doubling the speed! For the ARM9TDMI, which has a faster load instruction, the
benefit is even higher. ■
There are two questions you need to ask when unrolling a loop:
■ How many times should I unroll the loop?
■ What if the number of loop iterations is not a multiple of the unroll amount? For example,
what if N is not a multiple of four in checksum_v9?
To start with the first question, only unroll loops that are important for the overall
performance of the application. Otherwise unrolling will increase the code size with little
performance benefit. Unrolling may even reduce performance by evicting more important
code from the cache.
Suppose the loop is important, for example, 30% of the entire application. Suppose you
unroll the loop until it is 0.5 KB in code size (128 instructions). Then the loop overhead is at
most 4 cycles compared to a loop body of around 128 cycles. The loop overhead cost is
4/128, roughly 3%. Recalling that the loop is 30% of the entire application, overall the loop
overhead is only 1%. Unrolling the code further gains little extra performance, but has a
significant impact on the cache contents. It is usually not worth unrolling further when the
gain is less than 1%.
For the second question, try to arrange it so that array sizes are multiples of your unroll
amount. If this isn’t possible, then you must add extra code to take care of the leftover cases.
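For instance (a sketch, not from the original text), the leftover words can be handled by a second, non-unrolled loop after the unrolled one:

/* Unrolled checksum that tolerates N not being a multiple of four */
int checksum_v10(int *data, unsigned int N)
{
    int sum = 0;

    for (; N >= 4; N -= 4)   /* main loop, unrolled four times */
    {
        sum += *(data++);
        sum += *(data++);
        sum += *(data++);
        sum += *(data++);
    }
    for (; N != 0; N--)      /* leftover 0 to 3 words */
    {
        sum += *(data++);
    }
    return sum;
}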
2.2 REGISTER ALLOCATION
The compiler attempts to allocate a processor register to each local variable you use in a C
function. It will try to use the same register for different local variables if the use of the
variables do not overlap. When there are more local variables than available registers, the
compiler stores the excess variables on the processor stack. These variables are called spilled
or swapped out variables since they are written out to memory (in a similar way virtual
memory is swapped out to disk). Spilled variables are slow to access compared to variables
allocated to registers.
To implement a function efficiently, you need to minimize the number of spilled
variables and ensure that the most frequently used variables are allocated to registers.
First let's look at the number of processor registers the ARM C compilers have available
for allocating variables. Table 2.1 shows the standard register names and usage when
following the ARM-Thumb procedure call standard (ATPCS), which is used in code
generated by C compilers.
Provided the compiler is not using software stack checking or a frame pointer, the C
compiler can use registers r0 to r12 and r14 to hold variables. It must save the callee-saved
registers r4 to r11 and r14 on the stack if it uses these registers.
In theory, the C compiler can assign 14 variables to registers without spillage. In practice,
some compilers use a fixed register such as r12 for intermediate scratch working and do not
assign variables to this register. Also, complex expressions require intermediate working
registers to evaluate. Therefore, to ensure good assignment to registers, you should try to limit
the internal loop of functions to using at most 12 local variables.
If the compiler does need to swap out variables, then it chooses which variables to swap out
based on frequency of use. A variable used inside a loop counts multiple times. You can guide
the compiler as to which variables are important by ensuring these variables are used within
the innermost loop.
The register keyword in C hints that a compiler should allocate the given variable to a
register. However, different compilers treat this keyword in different ways, and different
architectures have a different number of available registers (for example, Thumb and ARM).
Therefore we recommend that you avoid using register and rely on the compiler’s normal
register allocation routine.
Table 2.1 C compiler register usage (ATPCS).
Register   Alternate name   ATPCS register usage
r0-r3      a1-a4            Argument registers: hold the first four function arguments and the return value; otherwise scratch
r4-r11     v1-v8            Register variables: must be preserved across a function call
r12        ip               Intra-procedure-call scratch register
r13        sp               Stack pointer
r14        lr               Link register (holds the return address; usable as a variable once saved)
r15        pc               Program counter
■ Try to limit the number of local variables in the internal loop of functions to 12. The
compiler should be able to allocate these to ARM registers.
■ You can guide the compiler as to which variables are important by ensuring these
variables are used within the innermost loop.
The next example illustrates the benefits of using a structure pointer. First we show a
typical routine to insert N bytes from array data into a queue. We implement the queue using a
cyclic buffer with start address Q_start(inclusive) and end address Q_end(exclusive).
char *queue_bytes_v1(
char *Q_start, /* Queue buffer start address */
char *Q_end, /* Queue buffer end address */
char *Q_ptr, /* Current queue pointer position */
char *data, /* Data to insert into the queue */
unsigned int N) /* Number of bytes to insert */
{
do
{
*(Q_ptr++) = *(data++);
if (Q_ptr == Q_end)
{
Q_ptr = Q_start;
}
} while (--N); return Q_ptr;
}
This compiles to
queue_bytes_v1
STR r14,[r13,#-4]! ; save lr on the stack
LDR r12,[r13,#4] ; r12 = N
queue_v1_loop
LDRB r14,[r3],#1 ; r14 = *(data++)
STRB r14,[r2],#1 ; *(Q_ptr++) = r14
CMP r2,r1 ; if (Q_ptr == Q_end)
MOVEQ r2,r0 ; {Q_ptr = Q_start;}
SUBS r12,r12,#1 ; --N and set flags
BNE queue_v1_loop ; if (N!=0) goto loop
MOV r0,r2 ; r0 = Q_ptr
LDR pc,[r13],#4 ; return r0
Compare this with a more structured approach using three function arguments.
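The structured version itself is not reproduced in these notes; a sketch of what it might look like, bundling the queue state into a structure so that only three arguments are needed:

typedef struct {
    char *Q_start;   /* queue buffer start address (inclusive) */
    char *Q_end;     /* queue buffer end address (exclusive) */
    char *Q_ptr;     /* current queue pointer position */
} Queue;

void queue_bytes_v2(Queue *queue, char *data, unsigned int N)
{
    char *Q_ptr = queue->Q_ptr;   /* work on locals so they can live in registers */
    char *Q_end = queue->Q_end;

    do
    {
        *(Q_ptr++) = *(data++);
        if (Q_ptr == Q_end)
        {
            Q_ptr = queue->Q_start;
        }
    } while (--N);
    queue->Q_ptr = Q_ptr;         /* write the updated pointer back once */
}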
POINTER ALIASING:
Two pointers are said to alias when they point to the same address. If you write to one
pointer, it will affect the value you read from the other pointer.
In a function, the compiler often doesn’t know which pointers can alias and which pointers
can’t. The compiler must be very pessimistic and assume that any write to a pointer may
affect the value read from any other pointer, which can significantly reduce code efficiency.
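A small illustration (a sketch in the spirit of the text, not an example from it): because *step may alias *timer1, the compiler must reload it for the second addition:

void update_timers(int *timer1, int *timer2, int *step)
{
    *timer1 += *step;
    *timer2 += *step;   /* the compiler must reload *step from memory here,
                           because the write to *timer1 might have changed it
                           if timer1 and step point to the same address */
}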
Whenever the conventional 'C' language and its extensions are used for programming
embedded systems, it is referred to as 'Embedded C' programming. Programming in
'Embedded C' is quite different from conventional desktop application development using 'C'
language for a particular OS platform.
Desktop computers contain working memory in the range of megabytes (nowadays
gigabytes) and storage memory in the range of gigabytes. For a desktop application developer,
the resources available are surplus in quantity; they can be very lavish in their usage of
RAM and ROM, and no restrictions are imposed at all. This is not the case for embedded
application developers.
Almost all embedded systems are limited in both storage and working memory resources.
Embedded application developers should be aware of this fact and should develop
applications in the best possible way which optimizes the code memory and working memory
usage as well as performance.
In other words, the hands of an embedded application developer are always tied up in the
memory usage context.
Identifiers are user-defined names and labels. Identifiers can contain letters of the English
alphabet (both upper and lower case) and numbers. The starting character of an identifier
should be a letter. The only special character allowed in an identifier is the underscore ( _ ).
Ex: Root, _getchar, _sin, x_1, x1, If
Data Types:
Data type represents the type of data held by a variable. Typical sizes and ranges for an
8-bit 8051 compiler such as Keil C51 are tabulated below.
Data type        Bits   Range
bit              1      0 to 1
signed char      8      -128 to +127
unsigned char    8      0 to 255
signed int       16     -32768 to +32767
unsigned int     16     0 to 65535
signed long      32     -2147483648 to +2147483647
unsigned long    32     0 to 4294967295
float            32     ±1.175494E-38 to ±3.402823E+38
Arithmetic and Relational Operations:
Logical Operations:
Logical operations are usually performed for decision making and program control transfer.
Looping Instructions:
Looping instructions are used for executing a particular block of code repeatedly till a
condition is met, or for waiting till an event is fired.
Embedded programming often uses looping instructions for checking the status of certain
I/O ports, registers, etc., and also for producing delays. Certain devices allow write/read
operations to and from some of their registers only when the device is ready, and device
readiness is normally indicated by a status register or by setting/clearing certain bits of
status registers.
Hence the program should keep on reading the status register till the device-ready
indication comes; the reading operation forms a loop. The looping instructions supported
by 'C' are listed below, followed by a polling example.
Looping Instructions:
//while statement
while (expression)
{
Body of while loop
}
//do while statement
do
{
Body of do loop
} while (expression);
//for loop
for (initialization; test for condition; update variable)
{
Body of for loop
}
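Returning to the status-register polling described above, here is a minimal sketch (the ready flag, its wiring to P1.0 and its active level are assumptions):

#include <reg51.h>

sbit DEV_READY = P1^0;   /* assumed: device-ready flag wired to P1.0 */

void wait_for_device(void)
{
    DEV_READY = 1;            /* write '1' first so the pin can be read as an input */
    while (DEV_READY == 0)
        ;                     /* busy-wait until the device signals ready */
}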
The elements of an array are accessed by using the array index or subscript.
The index of the first element is '0'. For the above example the first element is accessed by
arr[0], second element by arr[1], and so on. In the above example, the array starts at memory
location 0x8000 (arbitrary value taken for illustration) and the address of the first element is
0x8000.
The 'address of' operator (&) returns the address of the memory location where the variable is
stored. Hence &arr[0] will return 0x8000 and &arr[1] will return 0x8001, etc. The name of
the array itself with no index (subscript) always returns the address of the first element. If we
examine the first element arr[0] of the above array, we can see that the variable arr[0] is
allocated a memory location 0x8000 and the contents of that memory location holds the value
for arr[0].
Pointers:
A pointer is a flexible but at the same time dangerous feature, capable of creating damage
that leads to firmware crashes if not used properly.
A pointer holds a memory address and provides a memory-based technique for variable
access and modification. Pointers are very helpful in
1. Accessing and modifying variables
2. Increasing speed of execution
3. Accessing contents within a block of memory
4. Passing variables to functions by eliminating the use of a local copy of variables
5. Dynamic memory allocation.
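A brief sketch of points 1 and 4 above (illustrative only):

/* Passing a variable by pointer avoids a local copy and lets the
   function modify the caller's variable directly. */
void increment(unsigned char *value)
{
    (*value)++;                  /* modify the variable through its address */
}

void example(void)
{
    unsigned char count = 10;
    unsigned char *p = &count;   /* p holds the address of count */

    *p = 20;                     /* access and modify count through p */
    increment(&count);           /* count is now 21 */
}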
Some real-life examples of embedded systems include ticketing machines, vending
machines, temperature-controlling units in air conditioners, etc. Microcontrollers are
nothing without a program in them.
One of the important parts of making an embedded system is loading the software/program
we develop into the microcontroller. Usually it is called “burning the software” into the controller.
Before “burning a program” into a controller, we must do certain prerequisite operations with
the program. This includes writing the program in assembly language or C language in a text
editor like notepad, compiling the program in a compiler and finally generating the hex code
from the compiled program. Earlier, people used different applications for all these
three tasks: writing was done in a text editor like Notepad/WordPad, compiling was done
using a separate program (probably a dedicated compiler for a particular controller like
the 8051), and converting the assembly code to hex code was done using yet another
program. It takes a lot of time and work to do all these separately, especially when the
task involves lots of error debugging and reworking of the source code.
Keil MicroVision is free software which solves many of the pain points for an embedded
program developer. It is an integrated development environment (IDE) which integrates a
text editor to write programs and a compiler, and it will convert your source code to hex
files too.
Here is a simple guide to start working with Keil uVision, which can be used for
Writing programs in C/C++ or Assembly language
Compiling and Assembling Programs
Debugging program
Creating Hex and Axf file
Testing your program without real hardware available (simulator mode)
This is a simple guide to Keil uVision 4, though it is also applicable to previous versions.
These are the simple steps to get your innings off the mark!
Step 1: After opening Keil uV4, go to the Project tab and
create a new uVision project.
Now select a new folder and give a name to the project.
Step 2: After creating the project, select your device model, e.g., NXP LPC2148.
[You can change it later from the project window.]
Step 3: Now your project is created, and a message window will appear asking to add the
startup file of your device; click Yes and it will be added to your project folder.
Step 4: Now go to File, create a new file, and save it with a .c extension if you will write
the program in C language, or with .asm for assembly language.
i.e., Led.c
Step 5: Now write your program and save it again. You can try the example given at the end
of this tutorial.
Step 6: After that, on the left you will see the Project window. [If it's not there, go to the
View tab and click on Project Window.]
Now come to the Project window, right-click on the target, and click on 'Options for Target'.
Here you can also change your device.
Click the Output tab and check 'Create HEX File' if you want to generate a hex file.
Now click OK to save the changes.
Step 7: Now expand the target and you will see the source group.
Right-click on the group, click on 'Add files to source group', and add the program file which
you have written in C/assembly.
Step 8: Now click on Build Target. You can find it under the Project tab or in the toolbar;
it can also be done by pressing the F7 key.
Step 9: You can see the status of your program in the Build Output window.
[If it's not there, go to View and click on Build Output Window.]
As we saw in Chapter 3, control of the 8051 ports is carried out using 8-bit latches (SFRs).
We can send some data to Port 1 as follows:
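The code itself is missing from these notes; a minimal sketch of what it would look like:

#include <reg51.h>

void write_port_example(void)
{
    unsigned char port_data = 0x0F;   /* value to drive on the pins */
    P1 = port_data;                   /* write 00001111 to Port 1 */
}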
After the 8051 microcontroller is reset, the port latches all have the value 0xFF (11111111 in
binary): that is, all the port-pin latches are set to values of ‘1’. It is tempting to assume that
writing data to the port is therefore unnecessary, and that we can get away with the following
version:
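Again the listing is absent; the 'shortcut' version presumably read the port without first writing 1s to it, something like:

#include <reg51.h>

void read_port_shortcut(void)
{
    unsigned char port_data;
    port_data = P1;   /* read Port 1, relying on the reset value (0xFF) of the latches */
}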
The problem with this code is that, in simple test programs, it works: this can lull the
developer into a false sense of security. If, at a later date, someone modifies the program to
include a routine for writing to all or part of the same port, this code will not generally work
as required.
In most cases, initialization functions are used to set the port pins to a known state at the start
of the program. Where this is not possible, it is safer to always write ‘1’ to any port pin
before reading from it.
;Toggle all bits of Port 2 continuously.
MOV A,#55H
BACK: MOV P2,A
ACALL DELAY
CPL A ;complement(inv) reg.A
SJMP BACK
We might also have input and output devices connected to the other pins on Port 1.
These pins may be used by totally different parts of the same system, and the code to
access them may be produced by other team members, or other companies.
It is therefore essential that we are able to read-from or write-to individual port pins
without altering the values of other pins on the same port.
Below is a simple example that illustrates how we can read from Pin 1.1 and write
to Pin 2.1 without disrupting any other pins on this (or any other) port.
#include<reg51.h>
sbit Led = P2^1; //pin connected to the LED
sbit Switch = P1^1; //pin connected to the switch
void main(void)
{
Led = 0; //configuring as output pin
Switch = 1; //Configuring as input pin
while(1) //Continuous monitor the status of the switch.
{
if(Switch == 0)
{
Led =1; //Led On
}
else
{
Led =0; //Led Off
}
}
}
SWITCH BOUNCE:
In an ideal world, this change in voltage obtained by connecting a switch to the port pin of an
8051 microcontroller would take the form illustrated in Figure 4.8 (top). In practice, all
mechanical switch contacts bounce (that is, turn on and off, repeatedly, for a short period of
time) after the switch is closed or opened. As a result, the actual input waveform looks more
like that shown in Figure 4.8 (bottom). Usually, switches bounce for less than 20 ms;
however, large mechanical switches exhibit bounce behaviour for 50 ms or more.
When you turn on the lights in your home or office with a mechanical switch, the switches
will bounce. As far as humans are concerned, this bounce is imperceptible.
However, as far as the microcontroller is concerned, each 'bounce' is equivalent to one press
and release of an 'ideal' switch. Without appropriate software design, this can give rise to a
number of problems, not least the reading of multiple switch presses where only one occurred.
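A common software fix (a sketch; the 50 ms figure and the crude delay helper are assumptions based on the bounce times quoted above) is to read the switch twice, separated by a delay longer than the bounce period:

#include <reg51.h>

sbit Switch = P1^1;   /* assumed switch pin, as in the earlier example */

static void delay_ms(unsigned int ms)   /* crude software delay, timing approximate */
{
    unsigned int i, j;
    for (i = 0; i < ms; i++)
        for (j = 0; j < 120; j++)
            ;
}

/* Return 1 only if the switch reads 'pressed' on two samples 50 ms apart */
unsigned char switch_pressed(void)
{
    Switch = 1;               /* write '1' so the pin can be read as an input */
    if (Switch == 0)          /* first sample: could be a bounce */
    {
        delay_ms(50);         /* wait out the bounce period */
        if (Switch == 0)      /* second sample confirms a real press */
            return 1;
    }
    return 0;
}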
APPLICATIONS:
Program:
#include<reg51.h> // special function register declarations
sbit LED = P2^0; // Defining LED pin
void Delay(void); // Function prototype declaration
void main (void)
{
while(1) // infinite loop
{
LED = 0; // LED ON
Delay();
LED = 1; // LED OFF
Delay();
}
}
void Delay(void)
{
int j;
int i;
for(i=0;i<10;i++)
{
for(j=0;j<10000;j++)
{
}
}
}
Program:
#include<REG51.H>
#define LEDPORT P1
void delay(unsigned int);
void main(void)
{
LEDPORT =0x00;
while(1)
{
LEDPORT = 0X00;
delay(250);
LEDPORT = 0xff;
delay(250);
}
}
void delay(unsigned int itime)
{
unsigned int i,j;
for(i=0;i<itime;i++)
{
for(j=0;j<250;j++);
}
}
4X4 MATRIX KEYPAD INTERFACING WITH 8051 MICROCONTROLLER:
Keypads/keyboards are widely used input devices in various electronics and embedded
projects. They are used to take inputs in the form of numbers and alphabets and feed them
into the system for further processing. In this tutorial we are going to interface a 4x4 matrix
keypad/keyboard with the 8051 microcontroller.
Before we interface the keypad with the microcontroller, first we need to understand how it
works. A matrix keypad consists of a set of push buttons which are interconnected. In our
case we are using a 4X4 matrix keypad, in which there are 4 push buttons in each of four
rows, and the terminals of the push buttons are connected according to the diagram. In each
row, one terminal of all 4 push buttons is connected together, and the other terminals of the
4 push buttons represent each of the 4 columns; the same goes for each row. So we get 8
terminals to connect to a microcontroller.
Now the question is how to get the location of the pressed button. I am going to explain this
in the steps below, and also want you to look at the code:
1. First we set all the rows to logic level 0 and all the columns to logic level 1.
2. Whenever we press a button, the column and row corresponding to that button get shorted,
which pulls the corresponding column to logic level 0, because that column becomes
connected (shorted) to the row, which is at logic level 0. So we get the column number. See
the main() function.
3. Now we need to find the row number, so we have created four functions, one corresponding
to each column. For example, if any button of column one is pressed, we call the function
row_finder1() to find the row number.
4. In the row_finder1() function, we reverse the logic levels, so that now all the rows are 1 and
the columns are 0. The row of the pressed button should now be 0, because it has become
connected (shorted) to the column whose button is pressed, and all the columns are at logic 0.
So we scan all rows for a 0.
5. Whenever we find a row at logic 0, that is the row of the pressed button. So now we have
the column number (from step 2) and the row number, and we can print the number of that
button using the lcd_data function.
The same procedure follows for every button press, and we use while(1) to continuously
check whether a button is pressed or not.
Code:
#include<reg51.h>
#define display_port P2 //Data pins connected to port 2 on microcontroller
sbit rs = P3^0; //RS pin connected to pin 2 of port 3
sbit rw = P3^1; // RW pin connected to pin 3 of port 3
sbit e = P3^2; //E pin connected to pin 4 of port 3
// Row and column pins (assumed wired to Port 1 as in the original circuit):
sbit C4 = P1^0;
sbit C3 = P1^1;
sbit C2 = P1^2;
sbit C1 = P1^3;
sbit R4 = P1^4;
sbit R3 = P1^5;
sbit R2 = P1^6;
sbit R1 = P1^7;
// lcd_init(), lcd_data() and msdelay() are as in the LCD interfacing section (omitted here).
void row_finder1() //Function for finding the row for column 1
{
R1=R2=R3=R4=1;
C1=C2=C3=C4=0;
if(R1==0)
lcd_data('7');
if(R2==0)
lcd_data('4');
if(R3==0)
lcd_data('1');
if(R4==0)
lcd_data('N');
}
void row_finder2() //Function for finding the row for column 2
{
R1=R2=R3=R4=1;
C1=C2=C3=C4=0;
if(R1==0)
lcd_data('8');
if(R2==0)
lcd_data('5');
if(R3==0)
lcd_data('2');
if(R4==0)
lcd_data('0');
}
void row_finder3() //Function for finding the row for column 3
{
R1=R2=R3=R4=1;
C1=C2=C3=C4=0;
if(R1==0)
lcd_data('9');
if(R2==0)
lcd_data('6');
if(R3==0)
lcd_data('3');
if(R4==0)
lcd_data('=');
}
void row_finder4() //Function for finding the row for column 4
{
R1=R2=R3=R4=1;
C1=C2=C3=C4=0;
if(R1==0)
lcd_data('%');
if(R2==0)
lcd_data('*');
if(R3==0)
lcd_data('-');
if(R4==0)
lcd_data('+');
}
void main()
{
lcd_init();
while(1)
{
msdelay(30);
C1=C2=C3=C4=1;
R1=R2=R3=R4=0;
if(C1==0)
row_finder1();
else if(C2==0)
row_finder2();
else if(C3==0)
row_finder3();
else if(C4==0)
row_finder4();
}
}
This is how to interface a seven-segment LED display to an 8051 microcontroller. The
7-segment LED display is very popular, and it can display digits from 0 to 9 and quite a few
characters. Knowledge of how to interface a seven-segment display to a microcontroller is
essential in designing embedded systems. Seven-segment displays are of two types: common
cathode and common anode.
In the common cathode type, the cathodes of all LEDs are tied together to a single terminal,
which is usually labeled 'com', and the anodes of all LEDs are left alone as individual pins
labeled a, b, c, d, e, f, g and h (or dot).
In the common anode type, the anodes of all LEDs are tied together as a single terminal and
the cathodes are left alone as individual pins.
Program:
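The program listing is missing from these notes; a minimal common-cathode sketch (the segment wiring a to g on P1.0 to P1.6 is an assumption) that counts 0 to 9 would be:

#include <reg51.h>

/* Common-cathode segment codes for digits 0-9 (a on P1.0 ... g on P1.6) */
unsigned char code SEG_CODE[10] =
    {0x3F, 0x06, 0x5B, 0x4F, 0x66, 0x6D, 0x7D, 0x07, 0x7F, 0x6F};

static void delay(void)
{
    unsigned int i;
    for (i = 0; i < 50000; i++)
        ;
}

void main(void)
{
    unsigned char digit;
    while (1)
    {
        for (digit = 0; digit < 10; digit++)
        {
            P1 = SEG_CODE[digit];   /* drive the segments for this digit */
            delay();
        }
    }
}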
In this section, we will have a brief discussion on how to interface a 16×2 LCD module to the
P89V51RD2, which is an 8051-family microcontroller. We use an LCD display for displaying
messages in a more interactive way, to operate the system or display error messages, etc.
Interfacing a 16×2 LCD with an 8051 microcontroller is easy if you understand the working
of the LCD. A 16×2 Liquid Crystal Display will display 32 characters at a time in two rows
(16 characters in one row). Each character in the display is of size 5×7 pixels.
Pin No   Name      Function
1        VSS       Ground pin
2        VCC       +5 V power supply pin
3        VEE       Contrast adjustment; commonly attached to a potentiometer
4        RS        Register select: this pin must be high when writing display data
                   (characters) to the LCD, and low during the initialization sequence
                   and other commands
5        R/W       Read/write select: R/W = 1 to read data from the LCD,
                   R/W = 0 to write data to the LCD
6        E         Enable: latches the data on a high-to-low pulse
7-14     DB0-DB7   Data pins for giving data (display characters or command data)
                   which is meant to be displayed
Program:
#include<reg51.h>
sbit rs=P3^0;
sbit rw=P3^1;
sbit en=P3^2;
void lcdcmd(unsigned char);
void lcddat (unsigned char);
void delay();
void main()
{
P2=0x00;
while(1)
{
lcdcmd(0x38);
delay();
lcdcmd(0x01);
delay();
lcdcmd(0x10);
delay();
lcdcmd(0x0c);
delay();
lcdcmd(0x81);
delay();
lcddat('I');
delay();
lcddat('A');
delay();
lcddat('R');
delay();
lcddat('E');
delay();
}
}
void lcdcmd(unsigned char val)
{
P2=val;
rs=0;
rw=0;
en=1;
delay();
en=0;
}
void lcddat(unsigned char val)
{
P2=val;
rs=1;
rw=0;
en=1;
delay();
en=0;
}
void delay()
{unsigned int i;
for(i=0;i<6000;i++);
}
This section will show how to interface a DAC (digital-to-analog converter) to the 8051.
Then we demonstrate how to generate a sine wave on the scope using the DAC.
The number of data bit inputs decides the resolution of the DAC, since the number of analog
output levels is equal to 2^n, where n is the number of data bit inputs. Therefore, an 8-input
DAC such as the DAC0808 provides 256 discrete voltage (or current) levels of output.
Similarly, a 12-bit DAC provides 4096 discrete voltage levels. There are also 16-bit DACs,
but they are more expensive.
Iout = Iref (D7/2 + D6/4 + D5/8 + D4/16 + D3/32 + D2/64 + D1/128 + D0/256)
where D0 is the LSB, D7 is the MSB of the inputs, and Iref is the input current that must be
applied to pin 14. The Iref current is generally set to 2.0 mA. The figure shows the generation
of the current reference (setting Iref = 2 mA) by using the standard 5-V power supply and 1K
and 1.5K-ohm standard resistors. Some DACs also use a zener diode (LM336), which
overcomes any fluctuation associated with the power supply voltage.
Ideally we connect the output pin Iout to a resistor, convert this current to voltage, and monitor
the output on the scope. In real life, however, this can cause inaccuracy, since the input
resistance of the load where it is connected will also affect the output voltage. For this reason,
the Iout current output is isolated by connecting it to an op-amp such as the 741, with
Rf = 5K ohms as the feedback resistor. Assuming that R = 5K ohms, by changing the binary
input, the output voltage changes.
Vout of DAC for various angles is calculated and shown in Table 13-7. See Example 13-5 for
verification of the calculations.
Angle (deg)   sin θ     Vout = 5 V + (5 V × sin θ)   Value sent to DAC (Vout × 25.6)
0             0         5                            128
30            0.5       7.5                          192
60            0.866     9.33                         238
90            1.0       10                           255
120           0.866     9.33                         238
150           0.5       7.5                          192
180           0         5                            128
210           -0.5      2.5                          64
240           -0.866    0.669                        17
270           -1.0      0                            0
300           -0.866    0.669                        17
330           -0.5      2.5                          64
360           0         5                            128
Program:
#include <reg51.h>
sfr DACDATA = 0x90; /* DAC data bus on Port 1 (address 0x90) */
void main ()
{
unsigned char WAVEVALUE [12]={128,192,238,255, 238,192,128,64, 17,0,17,64} ;
unsigned char x;
while (1)
{
for(x=0;x<12;x++)
{
DACDATA = WAVEVALUE[x];
}
}
}
Figure: Angle vs. Voltage Magnitude for Sine Wave
The advantage of interrupts is that the microcontroller can serve many devices (not all at the
same time, of course); each device can get the attention of the microcontroller based on the
priority assigned to it.
The polling method cannot assign priority since it checks all devices in a round robin fashion.
• The bit sequence of the IE (Interrupt Enable) register and the meanings of its bits are
shown in the following figure.
EA (IE.7): When EA = 0, all interrupts are disabled and none will be acknowledged;
when EA = 1, each interrupt source is enabled or disabled individually by its own enable bit.
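For example (a short sketch, not from the notes), enabling the serial interrupt and external interrupt 0 in C:

#include <reg51.h>

void enable_interrupts(void)
{
    IE = 0x91;   /* 1001 0001: EA = 1, ES (serial) = 1, EX0 (external INT0) = 1 */
}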
Since IBM PC/compatible computers are so widely used to communicate with 8051-based
systems, serial communications of the 8051 with the COM port of the PC will be
emphasized. To allow data transfer between the PC and an 8051 system without any error,
we must make sure that the baud rate of the 8051 system matches the baud rate of the PC's
COM port.
The 8051 transfers and receives data serially at many different baud rates. Serial
communication of the 8051 is established with the PC through the COM port, and the baud
rate of the 8051 system must match the baud rate of the PC's COM port or of whatever
system is interfaced. The baud rate in the 8051 is programmable; this is done with the
help of Timer 1. When used for the serial port, the frequency of the timer tick is determined
by (XTAL/12)/32, and 1 bit is transmitted for each timer period (the time duration from
timer start to timer expiry).
The relationship between the crystal frequency and the baud rate in the 8051 is that the 8051
divides the crystal frequency by 12 to get the machine cycle frequency, as shown in Figure 1.
Here the oscillator is XTAL = 11.0592 MHz, so the machine cycle frequency is 921.6 kHz.
The 8051's UART divides the machine cycle frequency of 921.6 kHz by 32 once more before
it is used by Timer 1 to set the baud rate: 921.6 kHz divided by 32 gives 28,800 Hz. Timer 1
must be programmed in mode 2, that is, 8-bit auto-reload.
In serial communication, if data is transferred at a baud rate of 9600 and the XTAL used is
11.0592 MHz, the following steps are used to find the TH1 value to be loaded.
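The worked steps, using the figures derived above:
Machine cycle frequency = 11.0592 MHz / 12 = 921.6 kHz
Timer 1 input for the serial port = 921.6 kHz / 32 = 28,800 Hz
28,800 / 9600 = 3, so Timer 1 must divide by 3: TH1 = -3, i.e., 256 - 3 = 253 = 0xFD.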
Baud rate is selected by timer1 and when Timer 1 is used to set the baud rate it must be
programmed in mode 2 that is 8-bit, auto-reload. To get baud rates compatible with the PC,
we must load TH1 with the values shown in Table 1.
Table 1: Timer 1 TH1 register values for different baud rates (XTAL = 11.0592 MHz, SMOD = 0)
Baud rate   TH1 (decimal)   TH1 (hex)
9600        -3              FD
4800        -6              FA
2400        -12             F4
1200        -24             E8
The fifth bit is TB8, which is used by modes 2 and 3 for the transmission of bit 8 (the ninth
data bit). When mode 1 is used, the TB8 bit should be cleared. The sixth bit, RB8, is used by
modes 2 and 3 for the reception of bit 8; in mode 1 it stores the stop bit. The seventh bit is TI,
the transmit interrupt flag. When the 8051 finishes the transfer of the 8-bit character, it sets TI
to '1' to indicate that it is ready to transfer the next character. The TI flag is raised at the
beginning of the stop bit. The last bit is RI, the receive interrupt flag. When the 8051 receives
a character, the UART removes the start bit and stop bit and puts the 8-bit character in SBUF.
RI is then set to '1' to indicate that a new byte is ready to be picked up from SBUF. RI is
raised halfway through the stop bit.
Program to receive bytes of data serially, and put them in P2, set the baud rate at 9600,
8-bit data, and 1 stop bit:
MOV TMOD, #20H ; timer 1,mode 2(auto reload)
MOV TH1, #-3 ; 9600 baud rate
MOV SCON, #50H ; 8-bit, 1 stop, REN enabled
SETB TR1 ; start timer 1
HERE: JNB RI, HERE ; wait for char to come in
MOV A, SBUF ; saving incoming byte in A
MOV P2, A ; send to port 2
CLR RI ; get ready to receive next byte
SJMP HERE ; keep getting data
Importance of the RI flag bit:
The UART receives the start bit, and the next bit is the first bit of the character about to be
received. When the last bit is received, a byte is formed and placed in SBUF. When the stop
bit is received, the UART makes RI = 1, indicating that an entire character byte has been
received and can be read before it is overwritten by the next data. When RI = 1, the received
byte is in the SBUF register; copy the SBUF contents to a safe place. After the SBUF
contents are copied, the RI flag bit must be cleared to 0.
Increasing the baud rate:
PCON:
It is an 8-bit register. When the 8051 is powered up, the SMOD bit (PCON.7) is zero. By
setting SMOD, the baud rate can be doubled. If SMOD = 0 (which is its value on reset), the
baud rate is 1/64 the oscillator frequency; if SMOD = 1, the baud rate is 1/32 the oscillator
frequency.
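As a worked example using the figures above: with TH1 = -3 and SMOD = 0, the baud rate is 28,800 / 3 = 9600. Setting SMOD = 1 changes the Timer 1 input from 921.6 kHz / 32 to 921.6 kHz / 16 = 57,600 Hz, so the same TH1 value gives 57,600 / 3 = 19,200 baud, double the rate.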
The Operating System acts as a bridge between the user applications/tasks and the
underlying system resources through a set of system functionalities and services
[Figure 1: The architecture of an operating system. User applications access kernel services
(memory management, process management, time management, file system management and
I/O system management) through an Application Programming Interface (API); a device
driver interface connects the kernel to the underlying hardware.]
The Kernel:
It is responsible for managing the system resources and the communication among
the hardware and other system services
Kernel acts as the abstraction layer between system resources and user
applications
For a general purpose OS, the kernel contains different services like
Process Management
Primary Memory Management
File System management
I/O System (Device) Management
Secondary Storage Management
Protection
Time management
Interrupt Handling
Kernel Space and User Space:
The memory space at which the kernel code is located is known as ‘Kernel Space’
All user applications are loaded to a specific area of primary memory and this
memory area is referred as ‘User Space’
The partitioning of memory into kernel and user space is purely Operating System
dependent
An operating system with virtual memory support loads the user applications into their
corresponding virtual memory space with a demand paging technique. Most operating
systems keep the kernel application code in main memory; it is not swapped out into the
secondary memory.
Monolithic Kernel:
All kernel modules run within the same memory space under a single kernel thread
The tight internal integration of kernel modules in monolithic kernel
architecture allows the effective utilization of the low-level features of the
underlying system
The major drawback of monolithic kernel is that any error or failure in any one
of the kernel modules leads to the crashing of the entire kernel application
Microkernel:
The microkernel design incorporates only the essential set of Operating System services into the kernel.
The rest of the Operating System services are implemented in programs known as 'Servers', which run in user space.
The kernel design is highly modular and provides OS-neutral abstraction.
Memory management, process management, timer systems and interrupt handlers are examples of essential services which form part of the microkernel.
Examples of microkernels: the QNX and MINIX 3 kernels.
Benefits of Microkernel:
Robustness: If a problem is encountered in any of the services running as a 'server', the server can be reconfigured and re-started without the need for re-starting the entire OS.
Configurability: Any service which runs as a 'server' application can be changed without the need to restart the whole system.
Types of Operating Systems:
1. General Purpose Operating System (GPOS):
i. The kernel is more generalized and contains all the required services to execute generic applications
ii. May inject random delays into application software and thus cause slow responsiveness of an application at unexpected times
iii. Personal Computer/Desktop system is a typical example of a system where GPOSs are deployed
iv. Windows XP, MS-DOS, etc. are examples of General Purpose Operating Systems
2. Real-Time Operating System (RTOS):
i. Operating Systems which are deployed in embedded systems demanding real-time response
ii. Deterministic in execution behavior; consumes only a known amount of time for kernel operations
iii. Windows CE, QNX, VxWorks, MicroC/OS-II, etc. are examples of Real-Time Operating Systems (RTOS)
The Real Time Kernel: The kernel of a Real Time Operating System is referred to as the Real Time kernel. In contrast to the conventional OS kernel, the Real Time kernel is highly specialized and it contains only the minimal set of services required for running the user applications/tasks. The basic functions of a Real Time kernel are:
a) Task/Process management
b) Task/Process scheduling
c) Task/Process synchronization
d) Error/Exception handling
e) Memory Management
f) Interrupt handling
g) Time management
Real Time Kernel Task/Process Management: Deals with setting up the memory space
for the tasks, loading the task’s code into the memory space, allocating system resources,
setting up a Task Control Block (TCB) for the task and task/process termination/deletion.
A Task Control Block (TCB) is used for holding the information corresponding to a task.
TCB usually contains the following set of information
Task Type: Indicates the type of the task: hard real time, soft real time or background task.
Task Priority: The priority number assigned to the task (e.g. Task priority = 1).
Task Context Pointer: Pointer used for saving and retrieving the context of the task.
Task Memory Pointers: Pointers to the code memory, data memory and stack memory of the task.
Task Pointers: Pointers to other TCBs (TCBs for preceding, next and waiting tasks).
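A TCB is typically realized as a plain data structure inside the kernel. The following is a minimal sketch in C of what such a structure might look like; the field names and types are illustrative assumptions, not the layout of any particular RTOS.

#include <stdint.h>

typedef enum { TASK_HARD_RT, TASK_SOFT_RT, TASK_BACKGROUND } task_type_t;

typedef struct tcb {
    task_type_t  type;       /* hard real time, soft real time or background */
    uint8_t      priority;   /* task priority (0 = highest, OS dependent)    */
    void        *context;    /* pointer to the saved register context        */
    void        *code_mem;   /* pointer to the task's code memory            */
    void        *data_mem;   /* pointer to the task's data memory            */
    void        *stack_mem;  /* pointer to the task's stack memory           */
    struct tcb  *prev;       /* TCB of the preceding task                    */
    struct tcb  *next;       /* TCB of the next task                         */
    struct tcb  *waiting;    /* TCBs of tasks waiting on this task           */
} tcb_t;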
Task/Process Scheduling: Deals with sharing the CPU among various tasks/processes. A
kernel application called ‘Scheduler’ handles the task scheduling. Scheduler is nothing
but an algorithm implementation, which performs the efficient and optimal scheduling of
tasks to provide a deterministic behavior.
Memory Management:
Since predictable timing and deterministic behavior are the primary focus of an RTOS, the RTOS achieves this by compromising the effectiveness of memory allocation
An RTOS generally uses a 'block' based memory allocation technique, instead of the usual dynamic memory allocation techniques used by a GPOS
The RTOS kernel uses fixed-size blocks of dynamic memory, and a block is allocated to a task on a need basis. The free blocks are kept in a 'Free Buffer Queue'.
Most of the RTOS kernels allow tasks to access any of the memory blocks without
any memory protection to achieve predictable timing and avoid the timing
overheads
RTOS kernels assume that the whole design is proven correct and protection is
unnecessary. Some commercial RTOS kernels allow memory protection as
optional and the kernel enters a fail-safe mode when an illegal memory access
occurs
A few RTOS kernels implement Virtual Memory concept for memory allocation if
the system supports secondary memory storage (like HDD and FLASH memory).
In the ‘block’ based memory allocation, a block of fixed memory is always
allocated for tasks on need basis and it is taken as a unit. Hence, there will not be
any memory fragmentation issues.
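As an illustration of 'block' based allocation, here is a minimal sketch in C of a fixed-size block allocator built on a free queue; the block size, pool size and names are assumptions for illustration, not the allocator of any specific RTOS kernel.

#include <stddef.h>

#define BLOCK_SIZE   64          /* fixed size of every block (assumption) */
#define BLOCK_COUNT  32          /* number of blocks in the pool           */

static unsigned char pool[BLOCK_COUNT][BLOCK_SIZE];
static void *free_queue[BLOCK_COUNT];  /* the 'Free Buffer Queue'          */
static int   free_top = -1;

/* Put every block of the pool into the free queue once at start-up. */
void pool_init(void) {
    for (int i = 0; i < BLOCK_COUNT; i++)
        free_queue[++free_top] = pool[i];
}

/* Allocation is O(1) and never fragments: a whole block or nothing. */
void *block_alloc(void) {
    return (free_top >= 0) ? free_queue[free_top--] : NULL;
}

/* Freeing simply returns the block to the free queue. */
void block_free(void *blk) {
    free_queue[++free_top] = blk;
}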
Interrupt Handling:
For synchronous interrupts, the interrupt handler runs in the same context of the
interrupting task.
The interrupts generated by external devices (by asserting the interrupt line of the processor/controller to which the interrupt line of the device is connected), timer overflow interrupts, serial data reception/transmission interrupts, etc. are examples of asynchronous interrupts.
For asynchronous interrupts, the interrupt handler is usually written as a separate task (depending on the OS kernel implementation) and it runs in a different context. Hence, a context switch happens while handling asynchronous interrupts.
Priority levels can be assigned to the interrupts, and each interrupt can be enabled or disabled individually.
Time Management:
Accurate time management is essential for providing precise time reference for all
applications
The ‘Timer tick’ is taken as the timing reference by the kernel. The ‘Timer tick’
interval may vary depending on the hardware timer. Usually the ‘Timer tick’ varies
in the microseconds range
The time parameters for tasks are expressed as the multiples of the ‘Timer tick’
The System time is updated based on the ‘Timer tick’
If the System time register is 32 bits wide and the 'Timer tick' interval is 1 microsecond, the System time register will overflow (reset) in
2^32 * 10^-6 / (24 * 60 * 60) days = 0.0497 days = 1.19 hours
If the 'Timer tick' interval is 1 millisecond, the System time register will reset in
2^32 * 10^-3 / (24 * 60 * 60) days = 49.7 days
The ‘Timer tick’ interrupt is handled by the ‘Timer Interrupt’ handler of kernel. The
‘Timer tick’ interrupt can be utilized for implementing the following actions.
Increment the System time register by one. Generate timing error and reset the
System time register if the timer tick count is greater than the maximum range
available for System time register
Update the timers implemented in kernel (Increment or decrement the timer registers
for each timer depending on the count direction setting for each register. Increment
registers with count direction setting = ‘count up’ and decrement registers with count
direction setting = ‘count down’)
Invoke the scheduler and schedule the tasks again based on the scheduling algorithm
Delete all the terminated tasks and their associated data structures (TCBs)
Load the context for the first task in the ready queue. Due to the re-scheduling, the ready task might be a new one, different from the task which was pre-empted by the 'Timer tick' interrupt
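Putting the actions listed above together, a 'Timer tick' interrupt handler might look roughly like the following C sketch; all function and variable names here are illustrative assumptions (stubbed out for self-containment), not the API of any particular kernel.

#include <stdint.h>

#define MAX_SYSTEM_TIME 0xFFFFFFFFu

static volatile uint32_t system_time;  /* system time, in 'Timer tick' units */

static void update_kernel_timers(void)       { /* count kernel timers up/down (stub)   */ }
static void reap_terminated_tasks(void)      { /* delete finished tasks and TCBs (stub) */ }
static void run_scheduler(void)              { /* re-schedule per the policy (stub)     */ }
static void load_context_of_ready_task(void) { /* dispatch the selected task (stub)     */ }

void timer_tick_isr(void)
{
    if (system_time == MAX_SYSTEM_TIME)
        system_time = 0;            /* range exceeded: reset the system time */
    else
        system_time++;              /* increment the system time register    */

    update_kernel_timers();         /* update the timers implemented in kernel */
    reap_terminated_tasks();        /* remove terminated tasks and their TCBs  */
    run_scheduler();                /* invoke the scheduler                    */
    load_context_of_ready_task();   /* load context of the first ready task    */
}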
Hard Real-time System:
A Real Time Operating Systems which strictly adheres to the timing constraints
for a task.
A Hard Real Time system must meet the deadlines for a task without any slippage
Missing any deadline may produce catastrophic results for Hard Real Time Systems, including permanent data loss and irrecoverable damage to the system/users
As a rule of thumb, Hard Real Time Systems do not implement the virtual memory model for handling the memory. This eliminates the delay in swapping the code corresponding to a task in and out of the primary memory
The presence of a Human In The Loop (HITL) for tasks introduces unexpected delays in task execution. Most Hard Real Time Systems are automatic and do not contain a 'human in the loop'
Soft Real-time System:
Missing deadlines for tasks is acceptable if the frequency of deadline missing is within the compliance limit of the Quality of Service (QoS)
Soft Real Time systems most often have a 'human in the loop' (HITL)
Automatic Teller Machine (ATM) is a typical example of Soft Real Time System. If
the ATM takes a few seconds more than the ideal operation time, nothing fatal
happens.
An audio video play back system is another example of Soft Real Time system. No
potential damage arises if a sample comes late by fraction of a second, for play
back.
In the Operating System context, a task is defined as the program in execution and
the related information maintained by the Operating system for the program
Concurrent execution is achieved through the sharing of CPU among the processes.
A process mimics a processor in properties and holds a set of registers, process status,
a Program Counter (PC) to point to the next executable instruction of the process, a
stack for holding the local variables associated with the process and the code
corresponding to the process
A process, which inherits all the properties of the CPU, can be considered as a virtual processor, awaiting its turn to have its properties switched into the physical processor.
Figure: Structure of a Process (Stack (Stack Pointer), Working Registers, Status Registers, Program Counter (PC))
When the process gets its turn, its registers and Program counter register
becomes mapped to the physical registers of the CPU
Memory organization of Processes:
The memory occupied by the process is segregated into three regions namely; Stack
memory, Data memory and Code memory.
The Stack memory holds all temporary data such as variables local to the process
The Data memory holds the global and static data of the process
The Code memory contains the program code (instructions) corresponding to the process
The cycle through which a process changes its state from 'newly created' to 'execution completed' is known as the 'Process Life Cycle'. The various states through which a process traverses during the Process Life Cycle indicate the current status of the process with respect to time and also provide information on what it is allowed to do next
Created State: The state at which a process is being created is referred as ‘Created
State’. The Operating System recognizes a process in the ‘Created State’ but no
resources are allocated to the process
Ready State: The state, where a process is incepted into the memory and awaiting
the processor time for execution, is known as ‘Ready State’. At this stage, the process
is placed in the ‘Ready list’ queue maintained by the OS
Running State: The state wherein the source code instructions corresponding to the process are being executed is called 'Running State'. Running state is the state at which the process execution happens
When a process changes its state from Ready to running or from running to
blocked or terminated or from blocked to running, the CPU allocation for the
process may also change
Threads
Thread vs. Process:
Thread: Thread is a single unit of execution and is part of a process.
Process: Process is a program in execution and contains one or more threads.
Thread: A thread does not have its own data memory and heap memory. It shares the data memory and heap memory with other threads of the same process.
Process: A process has its own code memory, data memory and stack memory.
Thread: A thread cannot live independently; it lives within the process.
Process: A process contains at least one thread.
Thread: There can be multiple threads in a process. The first thread (main thread) calls the main function and occupies the start of the stack memory of the process.
Process: Threads within a process share the code, data and heap memory. Each thread holds a separate memory area for its stack (shares the total stack memory of the process).
Thread: Threads are very inexpensive to create.
Process: Processes are very expensive to create; it involves many OS overheads.
Thread: Context switching is inexpensive and fast.
Process: Context switching is complex, involves a lot of OS overhead and is comparatively slower.
Thread: If a thread expires, its stack is reclaimed by the process.
Process: If a process dies, the resources allocated to it are reclaimed by the OS and all the associated threads of the process also die.
Advantages of Threads:
1. Better memory utilization: Multiple threads of the same process share the address space for data memory. This also reduces the complexity of inter-thread communication, since variables can be shared across the threads.
2. Speeds up the execution of the process: The process is split into different threads; when one thread enters a wait state, the CPU can be utilized by other threads of the process that do not require the event which the other thread is waiting for.
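To make the sharing of data memory between threads concrete, here is a minimal sketch using POSIX threads (pthreads); the document's later examples use Windows threads, so treat this as an illustrative analogue rather than the API used in the text.

#include <stdio.h>
#include <pthread.h>

int shared_counter = 0;              /* data memory shared by both threads  */
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void *worker(void *arg) {
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);   /* serialize access to the shared data */
        shared_counter++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);          /* wait for both threads to finish     */
    pthread_join(t2, NULL);
    printf("shared_counter = %d\n", shared_counter);  /* prints 200000      */
    return 0;
}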
Multiprocessor systems possess multiple CPUs and can execute multiple processes
simultaneously
The ability of the Operating System to have multiple programs in memory, which are ready for execution, is referred to as multiprogramming
Multitasking refers to the ability of an operating system to hold multiple processes in
memory and switch the processor (CPU) from executing one process to another
process
Context switching refers to the switching of the execution context from one task to another
During a context switch, the context of the task to be executed is retrieved from the saved context list. This is known as Context retrieval.
Figure: Context switching between processes over time
• Non-preemptive Multitasking: The process/task which is currently given the CPU time is allowed to execute until it terminates (enters the 'Completed' state) or enters the 'Blocked/Wait' state, waiting for an I/O. Co-operative and non-preemptive multitasking differ in their behavior when they are in the 'Blocked/Wait' state. In co-operative multitasking, the currently executing process/task need not relinquish the CPU when it enters the 'Blocked/Wait' state, waiting for an I/O, a shared resource access or an event to occur, whereas in non-preemptive multitasking the currently executing task relinquishes the CPU when it waits for an I/O.
Task Scheduling:
In a multitasking system, there should be some mechanism in place to share the CPU
among the different tasks and to decide which process/task is to be executed at a
given point of time
Scheduling policies form the guidelines for determining which task is to be executed when
The scheduling policies are implemented in an algorithm and it is run by the kernel as a service
Depending on the scheduling policy, the process scheduling decision may take place when a process switches its state to 'Ready', 'Running', 'Blocked/Wait' or 'Completed'
Non-preemptive scheduling – First Come First Served (FCFS)/FIFO Scheduling:
Allocates CPU time to the processes based on the order in which they enter the 'Ready' queue
Drawbacks:
Favors monopolization of the CPU by a process. A process which does not contain any I/O operation continues its execution until it finishes its task
In general, FCFS favors CPU bound processes and I/O bound processes may have to
wait until the completion of CPU bound process, if the currently executing process is
a CPU bound process. This leads to poor device utilization.
The average waiting time is not minimal for FCFS scheduling algorithm
EXAMPLE: Three processes with process IDs P1, P2, P3 with estimated completion time
10, 5, 7 milliseconds respectively enters the ready queue together in the order P1, P2, P3.
Calculate the waiting time and Turn around Time (TAT) for each process and the Average
waiting time and Turn Around Time (Assuming there is no I/O waiting for the processes).
The order in which the processes are executed (with start and end times in ms) is:
P1 (0 - 10), P2 (10 - 15), P3 (15 - 22)
Assuming the CPU is readily available at the time of arrival of P1, P1 starts executing without any waiting in the 'Ready' queue. Hence the waiting time for P1 is zero.
Waiting Time for P2 = 10 ms (P2 starts executing after completing P1)
Waiting Time for P3 = 15 ms (P3 starts executing after completing P1 and P2)
Average waiting time = (Waiting time for all processes) / No. of Processes
= (0 + 10 + 15)/3 = 25/3
= 8.33 milliseconds
Turn Around Time (TAT) for P1 = 10 ms, for P2 = 15 ms, for P3 = 22 ms (Time spent in Ready Queue + Execution Time)
Average Turn around Time= (Turn around Time for all processes) / No. of Processes
= (Turn Around Time for (P1+P2+P3)) / 3
= (10+15+22)/3 = 47/3
= 15.66 milliseconds
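The waiting time and TAT arithmetic above generalizes to any number of processes. Here is a small illustrative C sketch, with the burst times of this example hard-coded, that computes the FCFS waiting and turnaround times:

#include <stdio.h>

int main(void) {
    int burst[] = {10, 5, 7};        /* execution times of P1, P2, P3 (ms)   */
    int n = 3, elapsed = 0;
    double total_wait = 0, total_tat = 0;

    for (int i = 0; i < n; i++) {
        int wait = elapsed;          /* FCFS: wait = sum of earlier bursts   */
        int tat  = wait + burst[i];  /* TAT = waiting time + execution time  */
        printf("P%d: waiting = %d ms, TAT = %d ms\n", i + 1, wait, tat);
        total_wait += wait;
        total_tat  += tat;
        elapsed    += burst[i];
    }
    printf("Average waiting = %.2f ms, average TAT = %.2f ms\n",
           total_wait / n, total_tat / n);  /* 8.33 ms and 15.67 ms here     */
    return 0;
}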
Non-preemptive scheduling – Last Come First Served (LCFS)/LIFO Scheduling:
Allocates CPU time to the processes in the reverse order of their entry into the 'Ready' queue
LCFS scheduling is also known as Last In First Out (LIFO), where the process which is put last into the 'Ready' queue is serviced first
Drawbacks:
Favors monopolization of the CPU by a process. A process which does not contain any I/O operation continues its execution until it finishes its task
In general, LCFS favors CPU bound processes and I/O bound processes may have to
wait until the completion of CPU bound process, if the currently executing process is
a CPU bound process. This leads to poor device utilization.
The average waiting time is not minimal for LCFS scheduling algorithm
EXAMPLE: Three processes with process IDs P1, P2, P3 with estimated completion time
10, 5, 7 milliseconds respectively enters the ready queue together in the order P1, P2, P3
(Assume only P1 is present in the ‘Ready’ queue when the scheduler picks it up, and P2, P3
entered ‘Ready’ queue after that). Now a new process P4 with estimated completion time 6ms
enters the ‘Ready’ queue after 5ms of scheduling P1. Calculate the waiting time and Turn
around Time (TAT) for each process and the Average waiting time and Turn around Time
(Assuming there is no I/O waiting for the processes).Assume all the processes contain only
CPU operation and no I/O operations are involved.
Solution: Initially there is only P1 available in the Ready queue and the scheduling sequence
will be P1, P3, P2. P4 enters the queue during the execution of P1 and becomes the last
process entered the ‘Ready’ queue. Now the order of execution changes to P1, P4, P3, and P2
as given below.
P1 (0 - 10), P4 (10 - 16), P3 (16 - 23), P2 (23 - 28)
The waiting times for all the processes are given as
Waiting Time for P1 = 0 ms (P1 starts executing first)
Waiting Time for P4 = 5 ms (P4 starts executing after completing P1. But P4 arrived after
5ms of execution of P1. Hence its waiting time = Execution start time
– Arrival Time = 10-5 = 5)
Waiting Time for P3 = 16 ms (P3 starts executing after completing P1 and P4)
Waiting Time for P2 = 23 ms (P2 starts executing after completing P1, P4 and P3)
Average waiting time = (Waiting time for all processes) / No. of Processes
= (0 + 5 + 16 + 23)/4 = 44/4
= 11 milliseconds
Turn around Time (TAT) for P1 = 10 ms (Time spent in Ready Queue + Execution Time)
Turn around Time (TAT) for P4 = 11 ms (Time spent in Ready Queue + Execution Time = 5 + 6)
Turn around Time (TAT) for P3 = 23 ms (Time spent in Ready Queue + Execution Time)
Turn Around Time (TAT) for P2 = 28 ms (Time spent in Ready Queue + Execution Time)
Average Turn Around Time = (Turn Around Time for all processes) / No. of Processes
= (Turn Around Time for (P1+P4+P3+P2)) / 4
= (10+11+23+28)/4 = 72/4
= 18 milliseconds
Non-preemptive scheduling – Shortest Job First (SJF) Scheduling.
Allocates CPU time to the processes based on the execution completion time for tasks
The average waiting time for a given set of processes is minimal in SJF scheduling
Drawbacks:
A process whose estimated execution completion time is high may not get a chance to execute if more and more processes with the least estimated execution time enter the 'Ready' queue before the process with the longest estimated execution time starts its execution
May lead to the ‘Starvation’ of processes with high estimated completion time
Difficult to know in advance the next shortest process in the ‘Ready’ queue for
scheduling since new processes with different estimated execution time keep entering
the ‘Ready’ queue at any point of time.
Non-preemptive scheduling – Priority based Scheduling:
The priority of a task is expressed in different ways, like a priority number, the time required to complete the execution, etc.
In number based priority assignment the priority is a number ranging from 0 to the
maximum priority supported by the OS. The maximum level of priority is OS
dependent.
Windows CE supports 256 levels of priority (0 to 255 priority numbers, with 0 being
the highest priority)
The priority is assigned to the task on creating it. It can also be changed dynamically
(If the Operating System supports this feature)
The non-preemptive priority based scheduler sorts the ‘Ready’ queue based on the
priority and picks the process with the highest level of priority for execution.
EXAMPLE: Three processes with process IDs P1, P2, P3 with estimated completion time
10, 5, 7 milliseconds and priorities 0, 3, 2 (0- highest priority, 3 lowest priority) respectively
enters the ready queue together. Calculate the waiting time and Turn Around Time (TAT) for
each process and the Average waiting time and Turn Around Time (Assuming there is no I/O
waiting for the processes) in priority based scheduling algorithm.
Solution: The scheduler sorts the ‘Ready’ queue based on the priority and schedules the
process with the highest priority (P1 with priority number 0) first and the next high priority
process (P3 with priority number 2) as second and so on. The order in which the processes
are scheduled for execution is represented as
P1 (0 - 10), P3 (10 - 17), P2 (17 - 22)
The waiting times for all the processes are given as
Waiting Time for P1 = 0 ms (P1 starts executing first)
Waiting Time for P3 = 10 ms (P3 starts executing after completing P1)
Waiting Time for P2 = 17 ms (P2 starts executing after completing P1 and P3)
Average waiting time = (Waiting time for all processes) / No. of Processes
= (Waiting time for (P1+P3+P2)) / 3
= (0+10+17)/3 = 27/3
= 9 milliseconds
Turn Around Time (TAT) for P1 = 10 ms (Time spent in Ready Queue + Execution Time)
Turn Around Time (TAT) for P3 = 17 ms (Time spent in Ready Queue + Execution Time)
Turn Around Time (TAT) for P2 = 22 ms (Time spent in Ready Queue + Execution Time)
Average Turn Around Time = (Turn Around Time for all processes) / No. of Processes
= (10+17+22)/3 = 49/3
= 16.33 milliseconds
Drawbacks:
Similar to SJF scheduling, non-preemptive priority based scheduling may lead to the 'Starvation' of low priority processes if more and more high priority processes keep entering the 'Ready' queue. The technique of gradually raising the priority of processes which are waiting in the 'Ready' queue as time progresses, to prevent 'Starvation', is known as 'Aging'.
Preemptive scheduling:
Every task in the ‘Ready’ queue gets a chance to execute. When and how often each
process gets a chance to execute (gets the CPU time) is dependent on the type of
preemptive scheduling algorithm used for scheduling the processes
The scheduler can preempt (stop temporarily) the currently executing task/process
and select another task from the ‘Ready’ queue for execution
When to pre-empt a task and which task is to be picked up from the ‘Ready’ queue for
execution after preempting the current task is purely dependent on the scheduling
algorithm
A task which is preempted by the scheduler is moved to the 'Ready' queue. The act of moving a 'Running' process/task into the 'Ready' queue by the scheduler, without the process requesting it, is known as 'Preemption'
Preemptive scheduling – Shortest Job First (SJF)/Shortest Remaining Time (SRT) Scheduling:
The non-preemptive SJF scheduling algorithm sorts the 'Ready' queue only after the current process completes execution or enters a wait state, whereas the preemptive SJF scheduling algorithm sorts the 'Ready' queue when a new process enters it and checks whether the execution time of the new process is shorter than the remaining estimated execution time of the currently executing process
If the execution time of the new process is less, the currently executing process is
preempted and the new process is scheduled for execution
Preemptive SJF scheduling is also known as Shortest Remaining Time (SRT) scheduling: it always compares the remaining execution time of a new process entering the 'Ready' queue with the remaining time for completion of the currently executing process, and schedules the process with the shortest remaining time for execution.
EXAMPLE: Three processes with process IDs P1, P2, P3 with estimated completion time
10, 5, 7 milliseconds respectively enters the ready queue together. A new process P4 with
estimated completion time 2ms enters the ‘Ready’ queue after 2ms. Assume all the processes
contain only CPU operation and no I/O operations are involved.
Solution: At the beginning, there are only three processes (P1, P2 and P3) available in the
‘Ready’ queue and the SRT scheduler picks up the process with the Shortest remaining time
for execution completion (In this example P2 with remaining time 5ms) for scheduling. Now
process P4 with estimated execution completion time 2ms enters the ‘Ready’ queue after 2ms
of start of execution of P2. The processes are re-scheduled for execution in the following
order
P2 (0 - 2), P4 (2 - 4), P2 (4 - 7), P3 (7 - 14), P1 (14 - 24)
Waiting Time for P2 = 0 ms + (4 -2) ms = 2ms (P2 starts executing first and is
interrupted by P4 and has to wait till the completion of P4 to
get the next CPU slot)
Waiting Time for P4 = 0 ms (P4 starts executing by preempting P2 since the
execution time for completion of P4 (2ms) is less than that
of the Remaining time for execution completion of P2
(Here it is 3ms))
Waiting Time for P3 = 7 ms (P3 starts executing after completing P4 and P2)
Waiting Time for P1 = 14 ms (P1 starts executing after completing P4, P2 and P3)
Average waiting time = (Waiting time for all the processes) / No. of Processes
= (0 + 2 + 7 + 14)/4 = 23/4
= 5.75 milliseconds
Turn around Time (TAT) for P2 = 7 ms (Time spent in Ready Queue + Execution Time)
Turn Around Time (TAT) for P4 = 2 ms (Time spent in Ready Queue + Execution Time
= (Execution Start Time – Arrival Time) + Estimated Execution Time = (2-2) + 2)
Turn around Time (TAT) for P3 = 14 ms (Time spent in Ready Queue + Execution Time)
Turn around Time (TAT) for P1 = 24 ms (Time spent in Ready Queue +Execution Time)
Average Turn around Time = (Turn around Time for all the processes) / No. of Processes
= (7+2+14+24)/4 = 47/4
= 11.75 milliseconds
Preemptive scheduling – Round Robin (RR) Scheduling:
The term Round Robin is very popular among sports and games activities. You might have heard about a 'Round Robin' league or 'Knock out' league associated with a football or cricket tournament. In the 'Round Robin' league, each team in a group gets an equal chance to play against the rest of the teams in the same group, whereas in the 'Knock out' league the losing team in a match moves out of the tournament.
In Round Robin scheduling, each process in the 'Ready' queue is executed for a pre-defined
time slot.
The execution starts with picking up the first process in the 'Ready' queue. It is executed for
a pre-defined time and when the pre-defined time elapses or the process completes (before
the pre-defined time slice), the next process in the 'Ready' queue is selected for execution.
This is repeated for all the processes in the 'Ready' queue. Once each process in the 'Ready'
queue is executed for the pre-defined time period, the scheduler comes back and picks the
first process in the 'Ready' queue again for execution.
The sequence is repeated. This reveals that the Round Robin scheduling is similar to the
FCFS scheduling and the only difference is that a time slice based preemption is added to
switch the execution between the processes in the `Ready' queue.
Figure: Round Robin Scheduling
EXAMPLE: Three processes with process IDs P1, P2, P3 with estimated completion time 6,
4, 2 milliseconds respectively, enters the ready queue together in the order P1, P2, P3.
Calculate the waiting time and Turn Around Time (TAT) for each process and the Average
waiting time and Turn Around Time (Assuming there is no I/O waiting for the processes) in
RR algorithm with Time slice= 2ms.
Solution: The scheduler sorts the ‘Ready’ queue based on the FCFS policy and picks up the
first process P1 from the ‘Ready’ queue and executes it for the time slice 2ms. When the time
slice is expired, P1 is preempted and P2 is scheduled for execution. The Time slice expires
after 2ms of execution of P2. Now P2 is preempted and P3 is picked up for execution. P3
completes its execution within the time slice and the scheduler picks P1 again for execution
for the next time slice. This procedure is repeated till all the processes are serviced. The order
in which the processes are scheduled for execution is represented as
P1 (0 - 2), P2 (2 - 4), P3 (4 - 6), P1 (6 - 8), P2 (8 - 10), P1 (10 - 12)
The waiting time for all the processes are given as
Waiting Time for P1 = 0 + (6-2) + (10-8) = 0+4+2= 6ms (P1 starts executing first
and waits for two time slices to get execution back and
again 1 time slice for getting CPU time)
Waiting Time for P2 = (2-0) + (8-4) = 2+4 = 6ms (P2 starts executing after P1
executes for 1 time slice and waits for two time
slices to get the CPU time)
Waiting Time for P3 = (4 -0) = 4ms (P3 starts executing after completing the first time
slices for P1 and P2 and completes its execution in a single time slice.)
Average waiting time = (Waiting time for all the processes) / No. of Processes
= (6+6+4)/3 = 16/3
= 5.33 milliseconds
Turn around Time (TAT) for P1 = 12 ms (Time spent in Ready Queue + Execution Time)
Turn around Time (TAT) for P2 = 10 ms (Time spent in Ready Queue + Execution Time)
Turn around Time (TAT) for P3 = 6 ms (Time spent in Ready Queue + Execution Time)
Average Turn around Time = (Turn around Time for all the processes) / No. of Processes
= (12+10+6)/3 = 28/3
= 9.33 milliseconds.
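The round robin bookkeeping above can also be checked programmatically. A small illustrative C sketch that simulates RR with a 2 ms time slice (burst times hard-coded from the example) and prints each process's TAT and waiting time:

#include <stdio.h>

int main(void) {
    int burst[]     = {6, 4, 2};       /* total bursts of P1, P2, P3 (ms)    */
    int remaining[] = {6, 4, 2};
    int finish[3]   = {0};
    int n = 3, slice = 2, t = 0, done = 0;

    while (done < n) {
        for (int i = 0; i < n; i++) {  /* cycle through the 'Ready' queue    */
            if (remaining[i] == 0) continue;
            int run = remaining[i] < slice ? remaining[i] : slice;
            t += run;                  /* execute for one time slice         */
            remaining[i] -= run;
            if (remaining[i] == 0) { finish[i] = t; done++; }
        }
    }
    for (int i = 0; i < n; i++)        /* arrival time is 0 for all, so      */
        printf("P%d: TAT = %d ms, waiting = %d ms\n",   /* TAT = finish time */
               i + 1, finish[i], finish[i] - burst[i]);
    return 0;
}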
Preemptive scheduling – Priority based Scheduling
Same as that of the non-preemptive priority based scheduling except for the switching
of execution between tasks
In preemptive priority based scheduling, any high priority process entering the
‘Ready’ queue is immediately scheduled for execution whereas in the non-preemptive
scheduling any high priority process entering the ‘Ready’ queue is scheduled only
after the currently executing process completes its execution or only when it
voluntarily releases the CPU
EXAMPLE: Three processes with process IDs P1, P2, P3 with estimated completion time
10, 5, 7 milliseconds and priorities 1, 3, 2 (0- highest priority, 3 lowest priority) respectively
enters the ready queue together. A new process P4 with estimated completion time 6ms and
priority 0 enters the ‘Ready’ queue after 5ms of start of execution of P1. Assume all the
processes contain only CPU operation and no I/O operations are involved.
Solution: At the beginning, there are only three processes (P1, P2 and P3) available in the
‘Ready’ queue and the scheduler picks up the process with the highest priority (In this
example P1 with priority 1) for scheduling. Now process P4 with estimated execution
completion time 6ms and priority 0 enters the ‘Ready’ queue after 5ms of start of execution of
P1. The processes are re-scheduled for execution in the following order
P1 (0 - 5), P4 (5 - 11), P1 (11 - 16), P3 (16 - 23), P2 (23 - 28)
The waiting time for all the processes are given as
Waiting Time for P1 = 0 + (11-5) = 0+6 = 6 ms (P1 starts executing first, gets preempted by P4 after 5 ms and again gets the CPU time after the completion of P4)
Waiting Time for P4 = 0 ms (P4 starts executing immediately on entering the 'Ready' queue, by preempting P1)
Waiting Time for P3 = 16 ms (P3 starts executing after completing P1 and P4)
Waiting Time for P2 = 23 ms (P2 starts executing after completing P1, P4 and P3)
Average waiting time = (Waiting time for all the processes) / No. of Processes
= (6 + 0 + 16 + 23)/4 = 45/4
= 11.25 milliseconds
Turn Around Time (TAT) for P1 = 16 ms (Time spent in Ready Queue + Execution Time)
Turn Around Time (TAT) for P4 = 6 ms (Time spent in Ready Queue + Execution Time = (Execution Start Time – Arrival Time) + Estimated Execution Time = (5-5) + 6 = 0 + 6)
Turn Around Time (TAT) for P3 = 23 ms (Time spent in Ready Queue + Execution Time)
Turn Around Time (TAT) for P2 = 28 ms (Time spent in Ready Queue + Execution Time)
Average Turn Around Time = (Turn Around Time for all the processes) / No. of Processes
= (16+6+23+28)/4 = 73/4
= 18.25 milliseconds
How to choose an RTOS:
The decision of choosing an RTOS for an embedded design is based on two types of requirements:
1. Functional requirements
2. Non-functional requirements
1. Functional Requirements:
1. Processor Support:
It is not necessary that every RTOS supports all kinds of processor architectures; it is essential to ensure that the RTOS supports the processor used in the design.
2. Memory Requirements:
The RTOS requires ROM memory for holding the OS files and it is
normally stored in a non-volatile memory like FLASH.
3. Real-Time Capabilities:
It is not mandatory that the OS for every embedded system needs to be Real-Time, and not all embedded OSs are 'Real-Time' in behavior.
The kernel of the OS may disable interrupts while executing certain services and it
may lead to interrupt latency.
For an embedded system whose response requirements are high, this latency should
be minimal.
It is very useful if the OS supports modularization, wherein the developer can choose the essential modules and re-compile the OS image for functioning.
7. Support for Networking and Communication:
The OS kernel may provide stack implementation and driver support for a bunch of
communication interfaces and networking.
Ensure that the OS under consideration provides support for all the interfaces required
by the embedded product.
Certain OS’s include the run time libraries required for running applications written in
languages like JAVA and C++.
The OS may include these components as built-in component, if not; check the
availability of the same from a third party.
2. Non-Functional Requirements:
1. Custom Developed or Off-the-Shelf:
It may be possible to build the required features by customizing an open source OS. The decision on which to select is purely dependent on the development cost, licensing fees for the OS, development time and availability of skilled resources.
2. Cost:
The total cost for developing or buying the OS and maintaining it in terms of
commercial product and custom build needs to be evaluated before taking a decision on
the selection of OS.
Certain OS’s may be superior in performance, but the availability of tools for
supporting the development may be limited.
4. Ease of Use:
How easy it is to use a commercial RTOS is another important feature that needs to be
considered in the RTOS selection.
5. After Sales:
For a commercial embedded RTOS, after sales in the form of e-mail, on-call services
etc. for bug fixes, critical patch updates and support for production issues etc. should be
analyzed thoroughly.
3.2 TASK COMMUNICATION:
In a multitasking system, multiple tasks/processes run concurrently (in pseudo parallelism) and each process may or may not interact with the others. Based on the degree of interaction, the processes running on an OS are classified as,
1. Co-operating Processes: In the co-operating interaction model one process requires the
inputs from other processes to complete its execution.
2. Competing Processes: The competing processes do not share anything among themselves
but they share the system resources. The competing processes compete for the system
resources such as file, display device, etc.
Co-operation through Sharing: The co-operating processes exchange data through some shared resources.
Co-operation through Communication: No data is shared between the processes. But they
communicate for synchronization.
The mechanism through which processes/tasks communicate with each other is known as "Inter Process/Task Communication (IPC)". Inter Process Communication is essential for process co-ordination. The various types of Inter Process Communication (IPC) mechanisms adopted by processes are kernel (Operating System) dependent. Some of the important IPC mechanisms adopted by various kernels are explained below.
3.2.1.1 Pipes:
'Pipe' is a section of the shared memory used by processes for communicating. Pipes follow
the client-server architecture. A process which creates a pipe is known as a pipe server and a
process which connects to a pipe is known as pipe client. A pipe can be considered as a
conduit for information flow and has two conceptual ends. It can be unidirectional, allowing
information flow in one direction or bidirectional allowing bi-directional information flow. A
unidirectional pipe allows the process connecting at one end of the pipe to write to the pipe
and the process connected at the other end of the pipe to read the data, whereas a bi-
directional pipe allows both reading and writing at one end. The unidirectional pipe can be visualized as a conduit with a 'write' end for one process and a 'read' end for the other.
Anonymous Pipes: The anonymous pipes are unnamed, unidirectional pipes used for data transfer between two processes.
Named Pipes: Named pipe is a named, unidirectional or bi-directional pipe for data
exchange between processes. Like anonymous pipes, the process which creates the named
pipe is known as pipe server. A process which connects to the named pipe is known as pipe
client.
With named pipes, any process can act as both client and server allowing point-to-point
communication. Named pipes can be used for communicating between processes running on
the same machine or between processes running on different machines connected to a
network.
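As a concrete illustration of a unidirectional pipe, here is a minimal sketch using the POSIX pipe() call; this is a UNIX analogue of the Windows pipe concept described above, not the Windows API itself.

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void) {
    int fd[2];                    /* fd[0] = read end, fd[1] = write end */
    char buf[32];

    if (pipe(fd) == -1) return 1; /* create the unidirectional pipe      */

    if (fork() == 0) {            /* child process acts as the reader    */
        close(fd[1]);             /* close the unused write end          */
        read(fd[0], buf, sizeof(buf));
        printf("child read: %s\n", buf);
        close(fd[0]);
        return 0;
    }
    close(fd[0]);                 /* parent: close the unused read end   */
    write(fd[1], "hello", 6);     /* write 5 characters plus the NUL     */
    close(fd[1]);
    wait(NULL);                   /* wait for the child to finish        */
    return 0;
}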
3.2.2 Message Passing:
The important message passing based IPC mechanisms are:
1. Message Queue
2. Mailbox
3. Signaling
3.2.2.1 Message Queue: Usually the process which wants to talk to another process posts the message to a First-In-First-Out (FIFO) queue called a 'Message queue', which stores the messages temporarily in a system defined memory object, to pass it to the desired process. Messages are sent and received through the send (Name of the process to which the message is to be sent, message) and receive (Name of the process from which the message is to be received, message) methods. The messages are exchanged through a message queue.
The implementation of the message queue, send and receive methods are OS kernel
dependent. The Windows XP OS kernel maintains a single system message queue and one
process/thread (Process and threads are used interchangeably here, since thread is the basic
unit of process in windows) specific message queue. A thread which wants to communicate
with another thread posts the message to the system message queue. The kernel picks up the
message from the system message queue one at a time and examines the message for finding
the destination thread and then posts the message to the message queue of the corresponding
thread. For posting a message to a thread's message queue, the kernel fills a message structure
MSG and copies it to the message queue of the thread. The message structure MSG contains
the handle of the process/thread for which the message is intended, the message parameters,
the time at which the message is posted, etc. A thread can simply post a message to another
thread and can continue its operation or it may wait for a response from the thread to which
the message is posted. The messaging mechanism is classified into synchronous and
asynchronous based on the behaviour of the message posting thread. In asynchronous
messaging, the message posting thread just posts the message to the queue and it will not wait
for an acceptance (return) from the thread to which the message is posted, whereas in
synchronous messaging, the thread which posts a message enters waiting state and waits for
the message result from the thread to which the message is posted. The thread which invoked
the send message becomes blocked and the scheduler will not pick it up for scheduling. The
PostMessage (HWND hWnd, UINT Msg, WPARAM wParam, LPARAM lParam) or PostThreadMessage (DWORD idThread, UINT Msg, WPARAM wParam, LPARAM lParam) API is used by a thread in Windows for posting a message to its own message queue or to the message queue of another thread.
The PostMessage API does not always guarantee the posting of messages to message queue.
The PostMessage API will not post a message to the message queue when the message queue
is full. Hence it is recommended to check the return value of PostMessage API to confirm the
posting of message. The SendMessage (HWND hWnd, UINT Msg, WPARAM wParam, LPARAM lParam) API call sends a message to the thread specified by the handle hWnd and
waits for the callee thread to process the message. The thread which calls the SendMessage
API enters waiting state and waits for the message result from the thread to which the
message is posted. The thread which invoked the SendMessage API call becomes blocked
and the scheduler will not pick it up for scheduling.
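The queue-based asynchronous messaging described above can also be sketched outside Windows; the following uses POSIX message queues, a different API from the Win32 one in the text, with the queue name "/demo_mq" as an illustrative assumption (link with -lrt on Linux).

#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <mqueue.h>

int main(void) {
    struct mq_attr attr = { .mq_flags = 0, .mq_maxmsg = 10,
                            .mq_msgsize = 64, .mq_curmsgs = 0 };
    char buf[64];

    /* Create (or open) the queue; communicating processes open it by name. */
    mqd_t mq = mq_open("/demo_mq", O_CREAT | O_RDWR, 0644, &attr);
    if (mq == (mqd_t)-1) return 1;

    mq_send(mq, "hello", strlen("hello") + 1, 0);  /* asynchronous post     */
    mq_receive(mq, buf, sizeof(buf), NULL);        /* pick up the message   */
    printf("received: %s\n", buf);

    mq_close(mq);
    mq_unlink("/demo_mq");                         /* remove the queue      */
    return 0;
}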
3.2.2.2 Mailbox:
The thread which creates the mailbox is known as the 'mailbox server' and the threads which subscribe to the mailbox are known as 'mailbox clients'. The mailbox server posts messages to the mailbox and notifies the clients which are subscribed to the mailbox. The clients read the message from the mailbox on receiving the notification.
Figure: Concept of mailbox based indirect messaging for IPC.
The mailbox creation, subscription, message reading and writing are achieved through OS
kernel provided API calls. Mailbox and message queue are the same in functionality; the only difference is in the number of messages supported by them. Both of them are used for passing data in the form of message(s) from a task to another task(s).
Mailbox is used for exchanging a single message between two tasks or between an Interrupt Service Routine (ISR) and a task. A mailbox is associated with a pointer pointing to the mailbox and a wait list to hold the tasks waiting for a message to appear in the mailbox. The implementation of the mailbox is OS kernel dependent. MicroC/OS-II implements the mailbox as a mechanism for inter-task communication.
3.2.2.3 Signaling:
Signaling is a primitive way of communication between processes/threads; signals are used for asynchronous notifications, where one process/thread fires a signal indicating the occurrence of a scenario which another process/thread is waiting for.
3.2.3 Remote Procedure Call (RPC) and Sockets:
On the security front, RPC employs authentication mechanisms to protect the systems against vulnerabilities. The client applications (processes) should authenticate themselves with the server for getting access. Authentication mechanisms like IDs, public-key cryptography, etc.
are used by the client for authentication. Without authentication, any client can access the
remote procedure. This may lead to potential security risks.
Sockets are used for RPC communication. The socket is a logical endpoint in a two-way
communication link between two applications running on a network. A port number is
associated with a socket so that the network layer of the communication channel can deliver
the data to the designated application. Sockets are of different types, namely, Internet sockets
(INET), UNIX sockets, etc. The INET socket works on internet communication protocols; TCP/IP and UDP (User Datagram Protocol) are the communication protocols used by INET sockets. INET sockets are classified into:
1. Stream sockets
2. Datagram sockets
Stream sockets are connection-oriented; they use TCP to establish a reliable connection. On the other hand, Datagram sockets rely on UDP for establishing a connection. The UDP connection is unreliable when compared to TCP.
uses a socket at the client-side and a socket at the server-side. A port number is assigned to
both of these sockets. The client and server should be aware of the port number associated
with the socket. In order to start the communication, the client needs to send a connection
request to the server at the specified port number.
The client should be aware of the name of the server along with its port number. The server
always listens to the specified port number on the network. Upon receiving a connection
request from the client, based on the success of authentication, the server grants the
connection request and a communication channel is established between the client and server.
The client uses the hostname and port number of the server for sending requests and the
server uses the client's name and port number for sending responses.
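A minimal sketch of the client side of this handshake, using a POSIX INET stream (TCP) socket; the server address 127.0.0.1 and port 8080 are illustrative assumptions.

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <arpa/inet.h>

int main(void) {
    /* Create an INET stream socket: the logical communication endpoint. */
    int sock = socket(AF_INET, SOCK_STREAM, 0);
    if (sock < 0) return 1;

    struct sockaddr_in server = {0};
    server.sin_family = AF_INET;
    server.sin_port   = htons(8080);                   /* assumed port    */
    inet_pton(AF_INET, "127.0.0.1", &server.sin_addr); /* assumed host    */

    /* Send a connection request to the server at the specified port. */
    if (connect(sock, (struct sockaddr *)&server, sizeof(server)) < 0) {
        close(sock);
        return 1;
    }
    write(sock, "request", strlen("request"));         /* send a request  */
    close(sock);
    return 0;
}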
3.3 TASK SYNCHRONISATION:
In a multitasking environment, multiple processes run concurrently (in pseudo parallelism)
and share the system resources. Apart from this, each process has its own boundary wall and
they communicate with each other with different IPC mechanisms including shared memory
and variables. Imagine a situation where two processes try to access display hardware
connected to the system or two processes try to access a shared memory area where one
process tries to write to a memory location while the other process is trying to read from it. What could be the result in these scenarios? Obviously, unexpected results. How can these issues be addressed? The solution is to make each process aware of the access of a shared resource, either directly or indirectly. The act of making processes aware of the access of shared resources by each process, to avoid conflicts, is known as 'Task/Process Synchronization'. Various synchronization issues may arise in a multitasking environment if processes are not synchronized properly.
The following sections describe the major task communication/ synchronization issues
observed in multitasking and the commonly adopted synchronization techniques to overcome
these issues.
3.3.1.1 Racing:
Consider two processes, Process A and Process B, both containing the program statement counter++; where counter is a shared variable. At the processor instruction level, the value of the variable counter is loaded to the Accumulator register (EAX register). The memory variable counter is represented using a pointer; the base pointer register (EBP register) is used for pointing to the memory variable counter. After loading the contents of the variable counter to the Accumulator, the Accumulator content is incremented by one using the add instruction. Finally the content of the Accumulator is loaded to the memory location which represents the variable counter. Translating counter++; into machine instructions:
mov eax, dword ptr [ebp-4]
add eax, 1
mov dword ptr [ebp-4], eax
Imagine a situation where a process switch (context switch) happens from Process A to Process B while Process A is executing the counter++; statement. Process A accomplishes the counter++; statement through three different low-level instructions. Now imagine that the process switch happened at the point where Process A executed the low-level instruction mov eax, dword ptr [ebp-4] and is about to execute the next instruction add eax, 1.
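The race is easy to reproduce with two threads instead of two processes. A minimal illustrative sketch using POSIX threads, with no synchronization on purpose:

#include <stdio.h>
#include <pthread.h>

volatile int counter = 0;   /* shared variable, deliberately unprotected    */

void *increment(void *arg) {
    for (int i = 0; i < 1000000; i++)
        counter++;          /* three machine instructions: load, add, store */
    return NULL;
}

int main(void) {
    pthread_t a, b;
    pthread_create(&a, NULL, increment, NULL);
    pthread_create(&b, NULL, increment, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    /* Expected 2000000, but context switches between the load and the
       store make increments disappear, so the value is usually smaller. */
    printf("counter = %d\n", counter);
    return 0;
}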
3.3.1.2 Deadlock:
A race condition produces incorrect results, whereas a deadlock condition creates a situation where none of the processes are able to make any progress in their execution, resulting in a set of deadlocked processes. This is a situation very similar to a traffic jam at a junction.
In its simplest form 'deadlock' is the condition in which a process is waiting for a resource
held by another process which is waiting for a resource held by the first process.
Mutual Exclusion: The criteria that only one process can hold a resource at a time; other processes requesting the resource must wait until it is released.
Hold and Wait: The condition in which a process holds a shared resource by acquiring the lock controlling the shared access, while waiting for additional resources held by other processes.
No Resource Preemption: The criteria that operating system cannot take back a resource
from a process which is currently holding it and the resource can only be released voluntarily
by the process holding it.
Circular Wait: A process is waiting for a resource which is currently held by another process, which in turn is waiting for a resource held by the first process. In general, there exists a set of waiting processes P0, P1, ..., Pn, with P0 waiting for a resource held by P1, P1 waiting for a resource held by P2, ..., and Pn waiting for a resource held by P0. This forms a circular wait queue.
Deadlock Handling: A smart OS may foresee the deadlock condition and act proactively to avoid such a situation. Now, if a deadlock occurs, how does the OS respond to it? The reaction to a deadlock condition by the OS is non-uniform. The OS may adopt any of the following techniques to detect and prevent deadlock conditions.
(i) Ignore Deadlocks: Always assume that the system design is deadlock free. This is acceptable for the reason that the cost of removing a deadlock is large compared to the chance of a deadlock happening. UNIX is an example of an OS following this principle. A life critical system cannot pretend that it is deadlock free for any reason.
(ii) Detect and Recover: This approach suggests the detection of a deadlock situation and recovery from it. This is similar to the deadlock condition that may arise at a traffic junction. When vehicles from different directions compete to cross the junction, a deadlock (traffic jam) condition results. Once a deadlock (traffic jam) has happened at the junction, the only solution is to back up the vehicles from one direction and allow the vehicles from the opposite direction to cross the junction. If the traffic is too high, lots of vehicles may have to be backed up to resolve the traffic jam. This technique is also known as the 'back up cars' technique.
Operating systems keep a resource graph in their memory. The resource graph is updated on
each resource request and release.
Avoid Deadlocks: Deadlock is avoided by the careful resource allocation techniques by the
Operating System. It is similar to the traffic light mechanism at junctions to avoid the traffic
jams.
Prevent Deadlocks: Prevent the deadlock condition by negating one of the four conditions
favoring the deadlock situation.
• Ensure that a process does not hold any other resources when it requests a resource. This
can be achieved by implementing the following set of rules/guidelines in allocating resources
to processes.
1. A process must request all its required resource and the resources should be allocated
before the process begins its execution.
2. Grant resource allocation requests from processes only if the process does not hold a
resource currently.
• Ensure that resource preemption (resource releasing) is possible at operating system level.
This can be achieved by implementing the following set of rules/guidelines in resources
allocation and releasing.
1. Release all the resources currently held by a process if a request made by the process for a new resource cannot be fulfilled immediately.
2. Add the resources which are preempted (released) to a resource list describing the
resources which the process requires to complete its execution.
3. Reschedule the process for execution only when the process gets its old resources
and the new resource which is requested by the process.
Imposing these criteria may introduce negative impacts like low resource utilization and starvation of processes.
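One practical way to negate the 'circular wait' condition is to impose a global ordering on locks, so every task acquires them in the same order. A minimal illustrative sketch with POSIX threads:

#include <pthread.h>

pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;  /* resource A */
pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;  /* resource B */

/* Every task takes lock_a before lock_b, so a circular wait (one task
   holding A waiting for B while another holds B waiting for A) can
   never form. */
void use_both_resources(void) {
    pthread_mutex_lock(&lock_a);
    pthread_mutex_lock(&lock_b);
    /* ... critical section using both resources ... */
    pthread_mutex_unlock(&lock_b);
    pthread_mutex_unlock(&lock_a);
}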
Livelock: The Livelock condition is similar to the deadlock condition except that a process in
livelock condition changes its state with time. While in deadlock a process enters in wait state
for a resource and continues in that state forever without making any progress in the
execution, in a livelock condition a process always does something but is unable to make any
progress in the execution completion. The livelock condition is better explained with a real world example: two people attempting to cross each other in a narrow corridor. Both persons move towards the same side of the corridor to allow the opposite person to cross. Since the corridor is narrow, neither of them is able to cross. Here both persons perform some action, but still they are unable to achieve their target, crossing each other. The livelock scenario is made clearer in a later section of this chapter, 'The Dining Philosophers' Problem'.
Starvation: In the multitasking context, starvation is the condition in which a process does not get the resources required to continue its execution for a long time. As time progresses, the process starves for resources. Starvation may arise due to various conditions, like being a byproduct of the preventive measures of deadlock, or scheduling policies favoring high priority tasks and tasks with the shortest execution time, etc.
The Dining Philosophers' Problem: Five philosophers are sitting around a round table for dining. Each philosopher has a plate of spaghetti in front of him/her, and a fork is placed between each pair of adjacent philosophers. A philosopher requires two forks (the ones on his/her left and right) to eat the spaghetti, and between eating, the philosophers engage in brainstorming. Let's analyze the various scenarios that may occur in this situation.
Scenario 1: All the philosophers engage in brainstorming together and try to eat together. Each philosopher picks up the left fork and is unable to proceed, since two forks are required for eating the spaghetti present in the plate. Philosopher 1 thinks that Philosopher 2, sitting to the right of him/her, will put the fork down and waits for it. Philosopher 2 thinks that Philosopher 3, sitting to the right of him/her, will put the fork down and waits for it, and so on. This forms a circular chain of un-granted requests. If the philosophers continue in this state, waiting for the fork from the philosopher sitting to the right of each, they will not make any progress in eating, and this will result in starvation of the philosophers and deadlock.
Scenario 2: All the philosophers start brainstorming together. One of the philosophers is hungry and he/she picks up the left fork. When the philosopher is about to pick up the right fork, the philosopher sitting to his/her right also becomes hungry and tries to grab the left fork, which is the right fork of the neighboring philosopher who is trying to lift it, resulting in a 'Race condition'.
Scenario 3: All the philosophers engage in brainstorming together and try to eat together. Each philosopher picks up the left fork and is unable to proceed, since two forks are required for eating the spaghetti present in the plate. Each of them anticipates that the adjacently sitting philosopher will put his/her fork down, waits for a fixed duration, and after this puts his/her own fork down. Each of them again tries to lift the fork after a fixed duration of time. Since all philosophers are trying to lift the forks at the same time, none of them will be able to grab two forks. This condition leads to livelock and starvation of the philosophers, where each philosopher tries to do something but is unable to make any progress in achieving the target.
Solution: We need to find alternative solutions to avoid the deadlock, livelock, racing and starvation conditions that may arise due to the concurrent access of forks by the philosophers. This situation can be handled in many ways, by allocating the forks using different allocation techniques, including Round Robin allocation, FIFO allocation, etc.
But the requirement is that the solution should be optimal, avoiding deadlock and starvation of the philosophers and allowing the maximum number of philosophers to eat at a time. One solution that we could think of is:
• Imposing rules on accessing the forks by the philosophers, like: a philosopher should put down the fork he/she already has in hand (left fork) after waiting for a fixed duration for the second fork (right fork), and should wait for a fixed time before making the next attempt.
This solution works fine to some extent, but if all the philosophers try to lift the forks at the same time, a livelock situation results.
Another solution, which gives maximum concurrency, is that each philosopher acquires a semaphore (mutex) before picking up any fork. When a philosopher feels hungry, he/she checks whether the philosophers sitting to the left and right are already using the forks, by checking the state of the associated semaphores. If the forks are in use by the neighboring philosophers, the philosopher waits till the forks are available. A philosopher, when finished eating, puts the forks down and informs the philosophers sitting to his/her left and right who are hungry (waiting for the forks), by signaling the semaphores associated with the forks.
Figure: The 'Real Problems' in the 'Dining Philosophers problem' (a) Starvation
and Deadlock (b) Racing (c) Livelock and Starvation
We will discuss semaphores and mutexes in a later section of this chapter. In the operating system context, the dining philosophers represent the processes and the forks represent the resources. The dining philosophers' problem is an analogy of processes competing for shared resources and the different problems, like racing, deadlock, starvation and livelock, arising from the competition.
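A compact sketch of the mutex-per-fork idea using POSIX threads. This is illustrative only; instead of the neighbor-checking scheme described above, it uses the standard trick of reversing the acquisition order for the last philosopher, which breaks the circular wait.

#include <stdio.h>
#include <pthread.h>

#define N 5
pthread_mutex_t fork_lock[N];        /* one mutex ('semaphore') per fork */

void *philosopher(void *arg) {
    int i = *(int *)arg;
    int first = i, second = (i + 1) % N;
    if (i == N - 1) { first = (i + 1) % N; second = i; }  /* break the cycle */

    pthread_mutex_lock(&fork_lock[first]);
    pthread_mutex_lock(&fork_lock[second]);
    printf("philosopher %d eats\n", i);                   /* critical section */
    pthread_mutex_unlock(&fork_lock[second]);
    pthread_mutex_unlock(&fork_lock[first]);
    return NULL;
}

int main(void) {
    pthread_t t[N];
    int id[N];
    for (int i = 0; i < N; i++) pthread_mutex_init(&fork_lock[i], NULL);
    for (int i = 0; i < N; i++) { id[i] = i; pthread_create(&t[i], NULL, philosopher, &id[i]); }
    for (int i = 0; i < N; i++) pthread_join(t[i], NULL);
    return 0;
}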
The Producer-Consumer Problem: The producer-consumer problem is a common data sharing problem where two processes/threads concurrently access a shared buffer: the 'producer thread' puts data into the buffer and the 'consumer thread' takes data out of it. Two problem scenarios arise:
1. 'Producer thread' is scheduled more frequently than the 'consumer thread': There are chances of the 'producer thread' overwriting the data in the buffer (buffer overrun). This leads to inaccurate data.
2. 'Consumer thread' is scheduled more frequently than the 'producer thread': There are chances of the 'consumer thread' reading the old data in the buffer again (buffer under-run). This will also lead to inaccurate data.
The output of the above program when executed on a Windows XP machine is shown in the figure below. The output shows that the consumer thread runs faster than the producer thread and most often leads to buffer under-run and thereby inaccurate data.
Note
It should be noted that the scheduling of the threads 'producer_thread' and 'consumer_thread' is OS kernel scheduling policy dependent, and you may not get the same output every time you run this piece of code on Windows XP.
The producer-consumer problem can be solved in various ways. One simple solution is 'sleep and wake-up'. The 'sleep and wake-up' approach can be implemented with various process synchronization techniques like semaphores, mutexes, monitors, etc. We will discuss it in a later section of this chapter.
Figure: Output of win32 program illustrating producer-consumer problem
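A minimal sketch of the 'sleep and wake-up' fix using counting semaphores (POSIX; illustrative, with a single-slot buffer for brevity):

#include <stdio.h>
#include <pthread.h>
#include <semaphore.h>

int buffer;                     /* single-slot shared buffer                  */
sem_t empty, full;              /* counting semaphores: free and filled slots */

void *producer(void *arg) {
    for (int i = 1; i <= 5; i++) {
        sem_wait(&empty);       /* sleep until the buffer has room   */
        buffer = i;             /* produce an item                   */
        sem_post(&full);        /* wake up the consumer              */
    }
    return NULL;
}

void *consumer(void *arg) {
    for (int i = 1; i <= 5; i++) {
        sem_wait(&full);        /* sleep until an item is available  */
        printf("consumed %d\n", buffer);
        sem_post(&empty);       /* wake up the producer              */
    }
    return NULL;
}

int main(void) {
    pthread_t p, c;
    sem_init(&empty, 0, 1);     /* one empty slot initially          */
    sem_init(&full, 0, 0);      /* no filled slot initially          */
    pthread_create(&p, NULL, producer, NULL);
    pthread_create(&c, NULL, consumer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}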
Imagine a situation where Process C is ready and is picked up for execution by the scheduler
and 'Process C' tries to access the shared variable 'X'. 'Process C' acquires the 'Semaphore S'
to indicate the other processes that it is accessing the shared variable 'X'. Immediately after
'Process C' acquires the 'Semaphore S', 'Process B' enters the 'Ready' state. Since 'Process B'
is of higher priority compared to 'Process C', 'Process C' is preempted, and 'Process B' starts
executing. Now imagine 'Process A' enters the 'Ready' state at this stage. Since 'Process A' is
of higher priority than 'Process B', 'Process B' is preempted, and 'Process A' is scheduled for
execution. 'Process A' involves accessing of shared variable 'X' which is currently being
accessed by 'Process C'. Since 'Process C' acquired the semaphore for signaling the access of
the shared variable 'X', 'Process A' will not be able to access it. Thus 'Process A' is put into
blocked state (This condition is called Pending on resource). Now 'Process B' gets the CPU
and it continues its execution until it relinquishes the CPU voluntarily or enters a wait state or
preempted by another high priority task. The highest priority process 'Process A' has to wait
till 'Process C' gets a chance to execute and release the semaphore. This produces unwanted
delay in the execution of the high priority task which is supposed to be executed immediately
when it became 'Ready'. Priority inversion may be sporadic in nature but can lead to potential
damage as a result of missing critical deadlines. Literally speaking, priority inversion 'inverts'
the priority of a high priority task with that of a low priority task. A proper workaround
mechanism should be adopted for handling the priority inversion problem. The commonly
adopted priority inversion workarounds are:
Priority Inheritance: A low-priority task that is currently accessing (by holding the lock) a
shared resource requested by a high-priority task temporarily 'inherits' the priority of that
high-priority task, from the moment the high-priority task raises the request. Boosting the
priority of the low priority task to that of the task requesting the shared resource prevents the
low priority task from being preempted by other tasks whose priorities lie between the two,
and thereby reduces the delay the high priority task spends waiting for the resource. The
priority of the low priority task, which was temporarily boosted, is brought back to its
original value when it releases the shared resource. Implementing the priority inheritance
workaround in the priority inversion problem discussed for the Process A, Process B and
Process C example changes the execution sequence as shown in the figure.
Figure: Handling Priority Inversion problem with priority Inheritance.
Priority inheritance is only a workaround; it does not eliminate the time the high
priority task spends waiting for the low priority task to release the resource. It merely helps
the low priority task to continue its execution and release the shared resource as soon as
possible. The moment the low priority task releases the shared resource, the high priority
task kicks the low priority task out and grabs the CPU - a true form of selfishness. Priority
inheritance handles priority inversion at the cost of run-time overhead at the scheduler: it
imposes the overhead of checking the priorities of all tasks which try to access shared
resources and of adjusting their priorities dynamically.
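As an illustration, many POSIX-compliant RTOSes let the application request priority inheritance when a mutex is created. A minimal sketch, assuming a POSIX threads implementation that supports the priority protocol options (the mutex name here is hypothetical):

    #include <pthread.h>

    pthread_mutex_t xSharedLock;   /* protects the shared variable 'X' */

    void create_inheritance_mutex(void)
    {
        pthread_mutexattr_t attr;
        pthread_mutexattr_init(&attr);
        /* A task holding xSharedLock temporarily inherits the priority of
           the highest-priority task blocked on it. */
        pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
        pthread_mutex_init(&xSharedLock, &attr);
        pthread_mutexattr_destroy(&attr);
    }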
Priority Ceiling: In 'Priority Ceiling', a priority is associated with each shared resource. The
priority associated with a resource is the priority of the highest priority task which uses that
shared resource; this priority level is called the 'ceiling priority'. Whenever a task accesses a
shared resource, the scheduler elevates the priority of the task to the ceiling priority of the
resource. If the task accessing the shared resource is a low priority task, its priority is thus
temporarily boosted to that of the highest priority task sharing the resource. This eliminates
the preemption of the task by medium priority tasks, which is what leads to priority
inversion. The priority of the task is brought back to its original level once the task finishes
accessing the shared resource. 'Priority Ceiling' brings the added advantage of sharing
resources without the need for synchronization techniques like locks: since the priority of a
task accessing a shared resource is boosted to the highest priority among the tasks sharing
that resource, concurrent access to the shared resource is automatically prevented. Another
advantage of the 'Priority Ceiling' technique is that all the overheads are at compile time
instead of at run time. Implementing the 'priority ceiling' workaround in the priority
inversion problem discussed for the Process A, Process B and Process C example changes
the execution sequence as shown in the figure.
Figure: Handling Priority Inversion problem with priority Ceiling.
The biggest drawback of 'Priority Ceiling' is that it may produce hidden priority inversion:
the priority of a task is always elevated, regardless of whether any other task currently wants
the shared resource. This unnecessary priority elevation boosts the priority of a low priority
task to that of the highest priority task sharing the resource, and tasks with priorities higher
than that of the low priority task are not allowed to preempt it while it is accessing the shared
resource. This always gives the low priority task the luxury of running at high priority when
accessing shared resources.
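POSIX likewise exposes a priority-ceiling protocol on mutexes. A minimal sketch, assuming the implementation supports PTHREAD_PRIO_PROTECT (the ceiling value here is purely illustrative):

    #include <pthread.h>

    pthread_mutex_t xSharedLock;

    void create_ceiling_mutex(void)
    {
        pthread_mutexattr_t attr;
        pthread_mutexattr_init(&attr);
        /* Any task that locks xSharedLock runs at the ceiling priority
           while it holds the lock. */
        pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_PROTECT);
        pthread_mutexattr_setprioceiling(&attr, 10);  /* illustrative ceiling */
        pthread_mutex_init(&xSharedLock, &attr);
        pthread_mutexattr_destroy(&attr);
    }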
3.3.2 Task Synchronization Techniques
So far we have discussed the various task/process synchronization issues encountered in
multitasking systems due to concurrent resource access. Now let's discuss the various
techniques used for synchronizing concurrent access in multitasking environments.
Process/Task synchronization is essential for avoiding conflicts in resource access (racing,
deadlock, starvation, livelock, etc.) in a multitasking environment, for ensuring proper
sequencing of operations across processes, and for communicating between processes.
The code memory area which holds the program instructions (piece of code) for accessing a
shared resource (like shared memory, shared variables, etc.) is known as the 'critical section'.
In order to synchronize the access to shared resources, the access to the critical section
should be exclusive. Exclusive access to critical section code is provided through mutual
exclusion mechanisms. Let us have a look at why mutual exclusion is important in
concurrent access. Consider two processes, Process A and Process B, running on a
multitasking system. Process A is currently running and it enters its critical section. Before
Process A completes its operation in the critical section, the scheduler preempts Process A
and schedules Process B for execution (Process B is of higher priority compared to Process
A). Process B also contains access to the critical section which is already in use by Process
A. If Process B continues its execution and enters the critical section, a race condition
results. A mutual exclusion policy enforces mutually exclusive access to critical sections.
Mutual exclusion can be enforced in different ways. Mutual exclusion blocks a process;
based on the behaviour of the blocked process, mutual exclusion methods can be classified
into two categories: busy waiting, and sleep and wake-up. In the following section we will
discuss them in detail.
3.3.2.1 Mutual Exclusion through Busy Waiting/Spin Lock: 'Busy waiting' is the simplest
method for enforcing mutual exclusion. The following code snippet illustrates how 'Busy
waiting' enforces mutual exclusion.
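A minimal sketch of such a snippet (the original listing is assumed here; bFlag is true while some process/thread is inside its critical section):

    #include <stdbool.h>

    volatile bool bFlag = false;    /* lock variable: true while some
                                       process/thread is in its critical section */

    void access_shared_resource(void)
    {
        while (bFlag == true)
            ;                       /* busy wait (spin) till the lock is free */
        bFlag = true;               /* acquire the lock                       */
        /* ... critical section: access the shared resource ... */
        bFlag = false;              /* release the lock                       */
    }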
The 'busy waiting' technique uses a lock variable for implementing mutual exclusion. Each
process/thread checks this lock variable before entering the critical section. The lock is set to
'1' by a process/thread if the process/thread is already in its critical section; otherwise the
lock is set to '0'. The major challenge in implementing lock-variable based synchronization
is the non-availability of a single atomic instruction which combines the reading, comparing
and setting of the lock variable. Most often the three different operations related to the lock,
viz. reading the lock variable, checking its present value, and setting it, are achieved with
multiple low-level instructions. The low-level implementation of these operations is
dependent on the underlying processor instruction set and the (cross) compiler in use. The
low-level implementation of the 'busy waiting' code snippet discussed above was examined
under the Windows XP operating system running on an Intel Centrino Duo processor, with
the code compiled using the Microsoft Visual Studio 6.0 compiler.
The assembly language instructions reveal that the two high-level statements
(while(bFlag==true); and bFlag=true;), corresponding to the operations of reading the lock
variable, checking its present value and setting it, are implemented at the processor level
using six low-level instructions. Imagine a situation where 'Process 1' reads the lock
variable, tests it, finds that the lock is available, and is about to set the lock for acquiring the
critical section. But just before 'Process 1' sets the lock variable, 'Process 2' preempts
'Process 1' and starts executing. 'Process 2' contains critical section code and it tests the lock
variable for its availability. Since 'Process 1' was unable to set the lock variable, its state is
still '0', and 'Process 2' sets it and acquires the critical section. Now the scheduler preempts
'Process 2' and schedules 'Process 1' before 'Process 2' leaves the critical section.
Remember, 'Process 1' was preempted at a point just before setting the lock variable
('Process 1' had already tested the lock variable just before it was preempted and found that
the lock was available). Now 'Process 1' sets the lock variable and enters the critical section.
This violates the mutual exclusion policy and may produce unpredicted results.
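The race can be closed only if the test and the set happen as one indivisible operation. A minimal sketch using the Win32 InterlockedExchange primitive, which atomically swaps a value into a variable and returns the old value (the lock variable and function names are assumptions for illustration):

    #include <windows.h>

    static volatile LONG lock = 0;          /* 0: free, 1: held */

    void enter_critical_section(void)
    {
        /* Atomically set lock to 1 and test its previous value in one
           indivisible step; if the previous value was already 1, another
           thread holds the lock, so keep spinning. */
        while (InterlockedExchange(&lock, 1) == 1)
            ;
    }

    void leave_critical_section(void)
    {
        InterlockedExchange(&lock, 0);      /* release the lock */
    }

Because the read-modify-write is a single atomic operation, a preemption between the test and the set can no longer let two processes into the critical section at once.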
Device Driver
A device driver is a piece of software that acts as a bridge between the operating system and
the hardware. In an operating system based product architecture, the user applications talk to
the Operating System kernel for all necessary information exchange including
communication with the hardware peripherals. The architecture of the OS kernel will not
allow direct device access from the user application. All device related accesses should flow
through the OS kernel, and the OS kernel routes them to the concerned hardware peripheral. The OS
provides interfaces in the form of Application Programming Interfaces (APIs) for accessing
the hardware. The device driver abstracts the hardware from the user applications. The topology
of user application and hardware interaction in an RTOS based system is depicted in the figure below.
Device drivers are responsible for initiating and managing the communication with
the hardware peripherals. They are responsible for establishing the connectivity, initializing
the hardware (setting up various registers of the hardware device) and transferring data. An
embedded product may contain different types of hardware components like Wi-Fi module,
File systems, Storage device interface, etc. The initialization of these devices and the
protocols required for communicating with them may differ. All these requirements are
implemented by drivers, and a single driver cannot satisfy them all. Hence each hardware
device (more specifically, each class of hardware) requires a unique driver component.
Figure: User applications, OS kernel, device drivers and hardware in an RTOS based system.
Certain drivers come as part of the OS kernel and certain drivers need to be installed
on the fly. For example, the program storage memory for an embedded product, say NAND
flash memory, requires a NAND flash driver to read and write data from/to it. This driver
should come as part of the OS kernel image. Certainly, the OS will not contain the drivers for
all devices and peripherals under the sun; it contains only the necessary drivers to
communicate with the onboard devices (Hardware devices which are part of the platform)
and for certain set of devices supporting standard protocols and device class (Say USB Mass
storage device or HID devices like Mouse/keyboard). If an external device, whose driver
software is not available with OS kernel image, is connected to the embedded device (Say a
medical device with custom USB class implementation is connected to the USB port of the
embedded product), the OS prompts the user to install its driver manually. Device drivers
which are part of the OS image are known as 'Built-in drivers' or 'On-board drivers'. These
drivers are loaded by the OS at the time of booting the device and are always kept in the
RAM. Drivers which need to be installed for accessing a device are known as 'Installable
drivers'. These drivers are loaded by the OS on a need basis. Whenever the device is
connected, the OS loads the corresponding driver to memory. When the device is removed,
the driver is unloaded from memory. The Operating system maintains a record of the drivers
corresponding to each hardware.
It is very essential to know the hardware interfacing details, like the memory address
assigned to the device, the interrupt used, etc., of the on-board peripherals when writing a
driver for a peripheral; these details vary with the hardware design of the product. Some real-time operating
systems like 'Windows CE' support a layered architecture for the driver which separates out
the low level implementation from the OS specific interface. The low level implementation
part is generally known as Platform Dependent Device (PDD) layer. The OS specific
interface part is known as Model Device Driver (MDD) or Logical Device Driver (LDD). For
a standard driver, for a specific operating system, the MDD/LDD always remains the same
and only the PDD part needs to be modified according to the target hardware for a particular
class of devices.
Most of the time, the hardware developer provides the implementation for all on board
devices for a specific OS along with the platform. The drivers are normally shipped in the
form of Board Support Package. The Board Support Package contains low level driver
implementations for the onboard peripherals and OEM Adaptation Layer (OAL) for
accessing the various chip level functionalities and a bootloader for loading the operating
system. The OAL facilitates communication between the Operating System (OS) and the
target device and includes code to handle interrupts, timers, power management, bus
abstraction; generic I/O control codes (IOCTLs), etc. The driver files are usually in the form
of a DLL file. Drivers can run in either user space or kernel space: drivers which run in user
space are known as user mode drivers, and drivers which run in kernel space are known as
kernel mode drivers. User mode drivers are safer than kernel mode drivers; if an error or
exception occurs in a user mode driver, it won't affect the services of the kernel, whereas an
exception in a kernel mode driver may lead to a kernel crash.
The way a device driver is written and the way interrupts are handled in it are operating
system and target hardware specific. However, regardless of the OS type, a device driver
implements the following:
1. Device (hardware) initialization and interrupt configuration
2. Interrupt handling and processing
3. Client interfacing (interfacing with user applications)
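As an illustrative sketch (the names are hypothetical, not any particular OS's API), a kernel typically invokes a driver through a fixed table of entry points covering exactly these responsibilities:

    /* Hypothetical driver interface: the kernel calls the driver only
       through these entry points. */
    typedef struct device_driver {
        int  (*init)(void);                               /* initialize hardware,
                                                             configure interrupts   */
        void (*isr)(void);                                /* interrupt handling     */
        int  (*read)(void *buf, unsigned int len);        /* client interface: read */
        int  (*write)(const void *buf, unsigned int len); /* client interface: write*/
    } device_driver_t;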
Host and target machines, linker/locators for embedded software, getting embedded software
into the target system; Debugging techniques: Testing on host machine, using laboratory
tools, an example system.
----------------------------------------------------------------------------------------------------------------
• Host:
– A computer system on which all the programming tools run.
– Where the embedded software is developed, compiled, tested, debugged and
optimized prior to its translation into the target device's instruction set.
• Target:
– After the program is written, compiled, assembled and linked, it is moved to the target.
– After development, the code is cross-compiled, cross-assembled, linked into the
target processor instruction set, and located into the target.
Cross Compilers:
• A cross compiler is a compiler that runs on the host system and produces the binary
instructions that will be understood by your target microprocessor.
• Most desktop systems used as hosts come with compilers, assemblers and linkers
that run on the host and build programs for the host. These tools are called native tools.
• If we write C/C++ source code that compiles under the native compiler and runs on
the host, we can compile the same source code with the cross compiler and make it
run on the target as well.
• That may not be possible in all cases: there is no problem with if, switch and loop
statements under either compiler, but there may be errors with respect to the following:
In function declarations
The sizes of data types may differ between host and target
Data structures may be laid out differently on the two machines
The ability to access 16-bit and 32-bit entities differs between the two machines
Sometimes the cross compiler may report an error or warning that the native
compiler does not.
The figure shows the process of building software for an embedded system.
As you can see in the figure, the output files from each tool become the input files for
the next; because of this, the tools must be compatible with each other.
A set of tools that is compatible in this way is called a tool chain. Tool chains are
available that run on various hosts and build programs for various targets.
• Linker: combines the object files produced by the compiler/assembler into a single image.
• Locator: produces the target machine code (into which the locator glues the RTOS),
and the combined code gets copied into the target ROM; the record of where
everything was placed is called a map.
The linking process is shown below:
• The native linker creates a file on the disk drive of the host system that is
read by a part of the operating system called the loader whenever the user
requests to run the program.
• The loader finds memory into which to load the program and copies the
program from the disk into the memory.
• Address Resolution:
• The figure above shows the process of building application software with native tools.
One problem the tool chain must solve is that many microprocessor instructions
contain the addresses of their operands.
• In the figure, the MOVE instruction in ABBOTT.C that loads the value of the variable
idunno into register R1 must contain the address of that variable. Similarly, the CALL
instruction must contain the address of whosonfirst(). The process of solving this
problem is called address resolution.
• When abbott.c is compiled, the compiler does not know the addresses of idunno or
whosonfirst(); it just compiles both separately and leaves them as object files for the linker.
• The linker decides how the address of idunno must be patched into the instructions
that use it, and likewise for the whosonfirst() call instruction. When the linker puts the
two object files together, it figures out how idunno and whosonfirst() relate for
execution and places them in the executable file.
• After the loader copies the program into memory, it knows exactly where idunno and
whosonfirst() are in memory. This whole process is called address resolution.
In most embedded systems there is no loader; when the locator is done, its output is simply
copied to the target. Therefore the locator must know where the program will reside and fix
up all memory references itself. Locators have mechanisms that allow you to tell them where
the program will be in the target system, and locators use any number of different output file
formats. The tools you use to load your program into the target must understand whatever
file format your locator produces.
Another issue that locators must resolve in the embedded environment is that some parts of
the program need to end up in ROM and some parts need to end up in RAM. For example,
whosonfirst() must end up in ROM, so that it is retained even when the power is off; the
variable idunno must be in RAM, since its data may change. This issue does not arise with
application programming, because the loader copies the entire program into RAM.
Most tool chains deal with this problem by dividing the program into segments.
Each segment is a piece of the program that the locator can place in memory
independently of the other segments.
Segments solve other problems too: for example, when the processor powers on, the
embedded system programmer must ensure that the first instruction is at a particular
place, and segments make this possible.
The linker/locator reshuffles these segments, placing the start-up code (the Z.asm
module in the example) where the processor begins its execution, the code segment in
ROM and the data segment in RAM. Most compilers automatically divide a module
into two or more segments: the instructions (code), uninitialized data, initialized data
and constant strings. Cross-assemblers also allow you to specify the segment or
segments into which the output from the assembler should be placed. The locator then
places the segments in memory. The following two lines of instructions tell one
commercial locator how to build the program.
The -Z at the beginning of each line indicates that the line is a list of segments.
We can specify the address ranges of RAM and ROM, and the locator will warn you
if the program does not fit within those ranges.
We can specify the address at which a segment is to end; the locator will then place
the segment below that address, which is useful for stack memory.
We can assign segments to a group, and then tell the locator where the group
goes, instead of dealing with the individual segments.
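The next paragraphs refer to a code fragment along these lines (a sketch: ifreq and setfreq() follow the example's names, and the initial value is purely illustrative):

    /* ifreq has an initial value, so that value must survive power-off
       (ROM); but setfreq() changes ifreq at run time, so the variable
       itself must live in RAM. */
    int ifreq = 50;                 /* illustrative initial value */

    void setfreq(int new_freq)
    {
        ifreq = new_freq;           /* changed frequently at run time */
    }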
Consider where the variable ifreq must be stored. In the code above, on the one hand,
ifreq's initial value must reside in ROM (the only memory that retains data while the
power is off); on the other hand, ifreq must be in RAM, because setfreq() changes it
frequently.
The only solution to the problem is to store the variable in RAM, store the initial value
in ROM, and copy the initial value into the variable at startup. On a desktop system the
loader ensures that each initialized variable has the correct initial value when it loads the
program; but there is no loader in an embedded system, so the application must itself
arrange for initial values to be copied into variables.
The way the locator deals with this is to create a 'shadow' segment in ROM that contains
all of the initial values, a segment that is copied to the real initialized-data segment at
startup. When an embedded system is powered on, the contents of the RAM are garbage;
they become zeros only if some start-up code in the embedded system sets them to zero.
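A minimal sketch of such start-up code, assuming hypothetical segment-boundary symbols that a real tool chain would define in its locator configuration:

    #include <string.h>

    extern char _data_rom[];    /* start of the shadow copy in ROM    */
    extern char _data_start[];  /* start of initialized data in RAM   */
    extern char _data_end[];    /* end of initialized data in RAM     */
    extern char _bss_start[];   /* start of uninitialized data in RAM */
    extern char _bss_end[];     /* end of uninitialized data in RAM   */

    void crt0_init_memory(void)
    {
        /* Copy initial values from the ROM shadow segment into RAM. */
        memcpy(_data_start, _data_rom, (size_t)(_data_end - _data_start));
        /* Zero the uninitialized-data segment so it holds no garbage. */
        memset(_bss_start, 0, (size_t)(_bss_end - _bss_start));
    }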
Locator Maps:
• Most locators will create an output file, called a map, that lists where the
locator placed each of the segments in memory.
RAM is faster than ROM and other kinds of memory like flash. Fast (RISC)
microprocessors execute programs more rapidly from RAM than from ROM, but the
programs must still be stored in ROM; they are therefore copied into RAM when the
system starts up. The start-up code runs directly (and slowly) from ROM and copies the
rest of the code into RAM for fast execution. The code may also be compressed before it
is stored in ROM, with the start-up code decompressing it as it copies it to RAM.
The locator must support all of this: it must build a program that is stored at one
collection of addresses (in ROM) but executes at another collection of addresses (in RAM).
Getting embedded software into the target system:
• The locator builds a file that is an image of the target software. There
are a few ways of getting the embedded software file into the target
system:
– PROM programmers
– ROM emulators
– In-circuit emulators
– Flash
– Monitors
PROM Programmers:
The classic way to get the software from the locator output file into the target system is
by creating the software in ROM or PROM.
Creating a ROM is appropriate only when software development has been completed,
since the cost to build ROMs is quite high. Putting the program into a PROM requires a
device called a PROM programmer.
PROMs are appropriate if the software is small enough and if you plan to make changes
to the software while debugging. To support this, place the PROM in a socket on the
target rather than soldering it directly into the circuit (see the figure below). When you
find a bug, you can remove the PROM containing the software with the bug from the
target and put it into the eraser (if it is an erasable PROM) or into the waste basket, then
program a new PROM with the bug-fixed software and put that PROM in the socket. A
small, inexpensive tool called a chip puller is needed to remove the PROM from the
socket; the PROM can be inserted into the socket with no tool other than your thumb
(see the figure). If the PROM programmer and the locator are from different vendors, it
is up to you to make them compatible.
Fig : Schematic edge view of socket
ROM Emulators:
Another mechanism used to get software into the target is the ROM emulator, a device that
replaces the ROM in the target system. It just looks like a ROM, as shown in the figure; the
ROM emulator consists of a large box of electronics and a serial port or a network
connection through which it can be connected to your host. Software running on your host
can send files created by the locator to the ROM emulator. Ensure that the ROM emulator
understands the file format which the locator creates.
If we want to debug the software, then we can use overlay memory which is a common
feature of in-circuit emulators. In-circuit emulator is a mechanism to get software into target
for debugging purposes.
Flash:
If your target stores its program in flash memory, one option you always have is to
place the flash memory in a socket and treat it like an EPROM. However, if the target
has a serial port, a network connection, or some other mechanism for communicating
with the outside world, flash memories open up another possibility: you can write a
piece of software to receive new programs from your host across the communication
link and write them into the flash memory. Although this may seem difficult to set up,
it has clear advantages.
The reasons for downloading new programs from the host:
You can load new software into your system for debugging without pulling the chip
out of its socket and replacing it.
Downloading new software is faster than removing the chip, programming it and
returning it to the socket.
Customers can load new versions of the software onto your product.
Monitors:
A monitor is a program that resides in the target ROM and knows how to load new
programs onto the system. A typical monitor allows you to send the software across a
serial port, stores it in the target RAM, and then runs it. Monitors often also offer a few
debugging services, such as setting breakpoints and displaying memory and register
values. You can write your own monitor program.
DEBUGGING TECHNIQUES
Introduction:
While developing embedded system software, the developer will write code with lots of
bugs in it. The testing and quality assurance process may reduce the number of bugs by
some factor, but the only way to ship a product with fewer bugs is to write software with
fewer bugs in the first place. The world is extremely intolerant of buggy embedded
systems, so testing and debugging play a very important role in the embedded software
development process.
Test early and often: this saves time and money, and early testing gives an idea of how
many bugs you have and hence how much trouble you are in.
BUT: the target system may not be available early in the process, or the hardware may be
buggy and unstable, because the hardware engineers are still working on it.
Exercise all exceptional cases: even though we hope they will never happen, exercise
them and learn how the code behaves.
BUT: it is impossible to exercise all the code on the target. For example, a laser printer
may have code to deal with the situation that arises when the user presses one of the
buttons just as the paper jams; but to test this case in real life, we would have to make the
paper jam and then press the button within a millisecond, which is not easy to do.
Develop reusable, repeatable tests:
It is frustrating to see a bug once but not be able to find it; to make it happen again, we
need repeatable tests.
Example: in the bar code scanner, a bug that makes the scanner show the previous scan
results will be difficult to find and fix unless it can be reproduced.
Similarly, a system like the telegraph may 'seem to work' in the network environment,
but it is not easy to know exactly what it sends and receives; it is therefore valuable to
store what it is sending and receiving.
BUT: it is difficult to always keep track of the results you got, because embedded
systems do not have a disk drive.
Conclusion: don't test on the target, because it is difficult to achieve these goals by testing
software on the target system. The alternative is to test your code on the host system.
The following figure shows the basic method for testing the embedded software on the
development host. The left hand side of the figure shows the target system and the right
hand side shows how the test will be conducted on the host. The hardware independent code
on the two sides of the figure is compiled from the same source.
Conclusion: using this technique you can design a clean interface between the hardware-
independent software and the rest of the code.
Calling Interrupt Routines from scaffold code:
Tasks are executed based on the occurrence of interrupts. Therefore, to make the system
do anything in the test environment, the test scaffold must execute the interrupt routines.
Interrupt routines have two parts: one that deals with the hardware (the hardware-
dependent part) and one that deals with the rest of the system (the hardware-independent
part).
One interrupt routine your test scaffold should call is the timer interrupt routine. In most
embedded systems the passage of time, signaled by the timer interrupt, initiates at least
some of the activity. You could have the passage of time on your host system call the
timer interrupt routine automatically, so that time goes by in your test system without the
test scaffold software participating; but that causes your test scaffold to lose control of
the timer interrupt routine. Therefore your test scaffold must call the timer interrupt
routine directly.
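A minimal sketch of this idea (the routine names are hypothetical; the hardware-independent timer routine is the one the scaffold drives):

    /* Hardware-independent part of the timer interrupt routine,
       provided by the code under test. */
    void vTimerInterruptHandler(void);

    /* Scaffold helper: simulate the passage of 'ticks' timer periods
       by calling the timer interrupt routine directly. */
    static void scaffold_advance_time(int ticks)
    {
        int i;
        for (i = 0; i < ticks; i++)
            vTimerInterruptHandler();
    }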
A useful test scaffold calls the various interrupt routines in a certain sequence and with
certain data; better still, it reads a script from the keyboard or from a file and then makes
calls as directed by the script. The script language need not be elaborate; a simple one
will do.
# Frame arrives
# Dst Src Ctrl
mr 56 ab
# Backoff timeout expires
kt0
# Timeout expires again
kt0
# Some time passes
kn 2
kn 2
# Another beacon frame arrives
Each command in this script file causes the test scaffold to call one of the interrupt
routines in the hardware-independent part.
In response to the kt0 command, the test scaffold calls one of the timer interrupt routines.
In response to the command kn followed by a number, the test scaffold calls a different
timer interrupt routine the indicated number of times. The command mr causes the test
scaffold to write the given data into memory.
Features of script files:
The commands are simple two- or three-letter commands, so the parser can be written
quickly.
Comments are allowed; comments in a script file indicate what is being tested, indicate
what results you expect, give version control information, etc.
Data can be entered in ASCII or in hexadecimal.
Targets that have their radios turned off, or tuned to different frequencies, do not receive
the frame. The scaffold simulates the interference that prevents one or more stations from
receiving the data. In this way the scaffold tests whether the various pieces of software
communicate properly with each other (see the figure above).
Voltmeters:
A voltmeter measures the voltage difference between two points. The most common use
of a voltmeter is to determine whether or not the chips in the circuit have power. A system
can suffer power failure for any number of reasons: broken leads, incorrect wiring, etc.
The usual way to use a voltmeter is to turn on the power, put one of the meter probes on a
pin that should be attached to VCC and the other on a pin that should be attached to
ground. If the voltmeter does not indicate the correct voltage, there is a hardware problem
to fix.
Ohmmeters:
An ohmmeter measures the resistance between two points; its most common use is to
check whether two things are connected. If one of the address signals from the
microprocessor is not connected to the RAM, turn the circuit off and then put the two
probes on the two points to be tested: if the ohmmeter reads 0 ohms, there is no resistance
between the two probes and the two points on the circuit are therefore connected. The
product commonly known as a multimeter functions as both a voltmeter and an ohmmeter.
Oscilloscopes:
An oscilloscope is a device that graphs voltage versus time: time and voltage are graphed
on the horizontal and vertical axes respectively. It is an analog device that shows the exact
voltage of a signal, not just whether it is low or high.
Features of the oscilloscope:
You can monitor one or two signals simultaneously.
You can adjust the time and voltage scales over a fairly wide range.
You can adjust the vertical level on the oscilloscope screen that corresponds to ground.
With the use of a trigger, the oscilloscope starts graphing only on a condition; for
example, we can tell the oscilloscope to start graphing when the signal reaches 4.25 volts
and is rising.
Oscilloscopes are extremely useful for hardware engineers, but software engineers use
them for the following purposes:
1. As a voltmeter: if the voltage on a signal never changes, the oscilloscope displays a
horizontal line whose location on the screen tells you the voltage of the signal.
2. If the line on the oscilloscope display for the clock signal is flat, no clock signal is
reaching the microprocessor and it is not executing any instructions.
3. To see whether a signal is changing as expected.
4. To observe a digital signal's transitions between VCC and ground; transitions that
are not clean indicate a hardware bug.
Logic Analyzers:
This tool is similar to the oscilloscope in that it captures signals and graphs them on its
screen, but it differs from the oscilloscope in several fundamental ways:
A logic analyzer can track many signals simultaneously.
The logic analyzer knows only two voltages, VCC and ground. If the voltage is in
between VCC and ground, the logic analyzer will report it as VCC or ground, not as
an exact voltage.
All logic analyzers are storage devices: they capture signals first and display them
later.
Logic analyzers have much more complex triggering mechanisms than oscilloscopes.
Logic analyzers operate in a state mode as well as a timing mode.
Example: after the data transmission finishes, we can attach the logic analyzer to the RTS
signal to find out whether the software lowers RTS at the right time, too early, or too late.
We can also attach the logic analyzer to the ENABLE/CLK and DATA signals of an
EEPROM to find out whether it works correctly (see the figure).
The figure shows a typical logic analyzer. Logic analyzers have display screens similar to
those of oscilloscopes. Most logic analyzers present menus on the screen and give you a
keyboard to enter choices; some have a mouse as well as network connections so they can
be controlled from workstations. Logic analyzers may include hard disks and diskettes.
Since a logic analyzer can attach to many signals simultaneously, one or more ribbon
cables typically attach it to the system under test.
Logic Analyzer in State Mode:
In timing mode, the logic analyzer is self-clocked: it captures data without reference to
any events on the circuit. In state mode, it captures data only when some particular event,
called a clock, occurs in the system. In this mode the logic analyzer can see what
instructions the microprocessor fetched and what data it read from and wrote to its
memory and I/O devices. To see what instructions the microprocessor fetched, you
connect the logic analyzer probes to the address and data signals of the system and to the
RE signal on the ROM. Whenever the RE signal is asserted, the logic analyzer captures
the address and data signals; the captured data, which is valid while RE is asserted, is
called a trace. State-mode analyzers present a text display, with the state of the signals
shown in rows, as in the figure below.
The logic analyzer in state mode is extremely useful to the software engineer:
1. Trigger the logic analyzer if the processor fetches from an address where there is no
memory.
2. Trigger the logic analyzer if the processor writes an invalid value to a particular
address in RAM.
3. Trigger the logic analyzer when the processor fetches the first instruction of an ISR,
to confirm that the ISR is executed.
4. If you have a bug that happens only rarely, leave the processor and analyzer running
overnight and check the results in the morning.
5. Use the filter to limit what is captured.
In-circuit emulators:
The in-circuit emulator, also called an emulator or ICE, replaces the processor in the
target system. The ICE appears to the rest of the system as the processor: it connects to
all of the signals and drives them. It can perform debugging: you can set breakpoints
and, after a breakpoint is hit, examine the contents of memory and registers, see the
source code, and resume execution. Emulators are extremely useful: they have the
power of a debugger and can act as a logic analyzer. Advantages of logic analyzers over
emulators:
Logic analyzers have better trace filters and more sophisticated triggering
mechanisms.
Logic analyzers will also run in timing mode.
Logic analyzers will work with any microprocessor.
With a logic analyzer you can hook up as many or as few connections as you like;
with an emulator you must connect all of the signals.
Emulators are more invasive than logic analyzers.
(See the figure above.) Monitors are extraordinarily valuable: they give you a debugging
interface without any hardware modifications.
Disadvantages of Monitors:
The target hardware must have a communication port through which the debugging
kernel communicates with the host program, and you need to write the
communication hardware driver to get the monitor working.
At some point you have to remove the debugging kernel from your target system
and try to run the software without it.
Most monitors are incapable of capturing traces like those of logic analyzers and
emulators.
Once a breakpoint is hit, stopping the execution can badly disrupt real-time
operations.
Other Monitors:
Two other mechanisms are used to construct monitors; they differ from the normal
monitor in how they interact with the target. The first interfaces with the target through a
ROM emulator: it downloads programs to the target and allows the host program to set
breakpoints and use various other debugging techniques.
UNIT V
----------------------------------------------------------------------------------------------------------------
1. Von Neumann architecture:
The memory holds both data and instructions, and can be read or written when given
an address. A computer whose memory holds both data and instructions is known as a
von Neumann machine.
The CPU has several internal registers that store values used internally. One of those
registers is the program counter (PC), which holds the address in memory of an
instruction.
The CPU fetches the instruction from memory, decodes the instruction, and executes
it.
The program counter does not directly determine what the machine does next, but
only indirectly by pointing to an instruction in memory.
2. Harvard architecture:
Advantage:
The separation of program and data memories provides higher performance for digital
signal processing.
For streaming data, the Harvard architecture provides:
• greater memory bandwidth;
• more predictable bandwidth.
The two are compared below:
Conventional processors (used in PCs and servers, and in embedded systems with only
control functions):
- There is no exclusive multiplier.
- No barrel shifter is present.
- Programs can be optimized to a smaller size.
DSPs and other processors found in the latest embedded systems (mobile communication,
audio, speech and image processing systems):
- A MAC (multiply-accumulate) unit is present.
- A barrel shifter helps in shifting and rotating operations on the data.
- Programs tend to grow big in size.
RISC vs CISC:
RISC:
- RISC stands for Reduced Instruction Set Computer.
- Software plays the major role in RISC processors.
- RISC processors use a single clock cycle to execute an instruction.
- Intermediate registers are used for data manipulation (load-store architecture).
CISC:
- CISC stands for Complex Instruction Set Computer.
- Hardware plays the major role in CISC processors.
- CISC processors use multiple clock cycles to execute an instruction.
- Memory-to-memory access is used for data manipulation.
II. ARM(Advanced RISC Machine) Processor:
ARM instructions are written one per line, starting after the first column.
Comments begin with a semicolon and continue to the end of the line.
A label, which gives a name to a memory location, comes at the beginning of the line,
starting in the first column.
Here is an example:
LDR r0,[r8]       ; a comment
label ADD r4,r0,r1
Data Operations in ARM:
ARM is a load-store architecture—data operands must first be loaded into the CPU
and then stored back to main memory to save the results
1. Arithmetic instructions
2. Logical instructions
3. Shift and rotate instructions
4. Comparison instructions
5. Move instructions
6. Load/store instructions
Instruction examples:
ADD r0,r1,r2
This instruction sets register r0 to the sum of the values stored in r1 and r2.
ADD r0,r1,#2 (immediate operands are allowed in addition)
RSB r0,r1,r2 sets r0 to r2-r1 (reverse subtract).
Bit clear: BIC r0,r1,r2 sets r0 to r1 AND NOT r2.
Multiplication:
No immediate operand is allowed in multiplication.
The two source operands must be different registers.
MLA: the MLA instruction performs a multiply-accumulate operation, particularly useful
in matrix operations and signal processing.
MLA r0,r1,r2,r3 sets r0 to the value r1 x r2 + r3.
Shift operations:
Logical shifts (LSL, LSR)
Arithmetic shifts (ASL, ASR)
A left shift moves bits up toward the most significant bits; a right shift moves bits down
toward the least significant bit of the word.
The LSL and LSR modifiers perform left and right logical shifts, filling the least
significant bits of the operand with zeroes.
The arithmetic shift left is equivalent to an LSL, but ASR copies the sign bit: if the sign
is 0, a 0 is copied into the vacated bits, while if the sign is 1, a 1 is copied.
Rotate operations: (ROR, RRX)
The rotate modifiers always rotate right, moving the bits that fall off the least-
significant bit up to the most-significant bit in the word.
The RRX modifier performs a 33-bit rotate, with the CPSR’s C bit being inserted
above the sign bit of the word; this allows the carry bit to be included in the rotation
Compare instructions: (CMP, CMN)
A compare instruction modifies the flag values (negative flag, zero flag, carry flag,
overflow flag).
CMP r0,r1 computes r0 - r1, sets the status bits, and throws away the result of the
subtraction.
CMN uses an addition to set the status bits.
TST performs a bit-wise AND on the operands, while TEQ performs an exclusive OR;
both set the status bits and discard the result.
ARM Register indirect addressing:
Expression: x = (a+b)-c
program:
ADR r4,a        ; get address for a
LDR r0,[r4]     ; get value of a
ADR r4,b        ; get address for b
LDR r1,[r4]     ; get value of b
ADD r3,r0,r1    ; compute a+b
ADR r4,c        ; get address for c
LDR r2,[r4]     ; get value of c
SUB r3,r3,r2    ; complete computation of x
ADR r4,x        ; get address for x
STR r3,[r4]     ; store value of x
Expression: y = a*(b+c)
program:
ADR r4,b        ; get address for b
LDR r0,[r4]     ; get value of b
ADR r4,c        ; get address for c
LDR r1,[r4]     ; get value of c
ADD r2,r0,r1    ; compute partial result b+c
ADR r4,a        ; get address for a
LDR r0,[r4]     ; get value of a
MUL r3,r2,r0    ; compute final value for y
ADR r4,y        ; get address for y
STR r3,[r4]     ; store value of y
For instance, LDR r0,[r1,#16] loads r0 with the value stored at location r1+16 (r1 is the
base address, 16 is the offset).
Two indexed variants are available:
• auto-indexing
• post-indexing
Auto-indexing updates the base register: LDR r0,[r1,#16]! first adds 16 to the value of r1,
and then uses that new value as the address. The ! operator causes the base register to be
updated with the computed address so that it can be used again later.
Post-indexing does not perform the offset calculation until after the fetch has been
performed: LDR r0,[r1],#16 loads r0 with the value stored at the memory location whose
address is given by r1, and then adds 16 to r1 and sets r1 to the new value.
Branch Instructions
1. Conditional branches (e.g., BGE: B is branch, GE is the condition)
2. Unconditional branch (B)
The branch instruction B #100 will add 400 to the current PC value, since the branch
offset is counted in 32-bit words (4 bytes each).
Example of flow-of-control programs:
Branch and Link instruction (BL) for implementing functions, subroutines or
procedures:
**Note : for more programs, refer class notes.
14. High-speed floating point capability
15. Extended floating point
16. The SHARC supports floating point, extended floating point and non-floating
(fixed) point formats.
17. No additional clock cycles are needed for floating point computations.
18. Data are automatically truncated and zero-padded when moved between 32-bit
memory and the internal registers.
The programming model gives the register details. The following registers are used in
SHARC processors for various purposes:
Figure: SHARC programming model, showing the register file R0-R15 (sixteen general
purpose registers) alongside the 32-bit status registers ASTAT, STKY and MODE1.
Status registers:
ASTAT: arithmetic status.
STKY: sticky.
MODE1: mode control 1.
The STKY register is a sticky version of the ASTAT register: the STKY bits are set
along with the ASTAT register bits, but are not cleared until cleared by an instruction.
The SHARC performs saturation arithmetic on fixed point values; saturation mode is
controlled by the ALUSAT bit in the MODE1 register.
All ALU operations set the AZ (zero), AN (negative), AV (overflow), AC (fixed-point
carry) and AI (floating-point invalid) bits in ASTAT.
Data Address Generators(DAG)
DAG1 registers
I0 M0 L0 B0
I1 M1 L1 B1
I2 M2 L2 B2
I3 M3 L3 B3
I4 M4 L4 B4
I5 M5 L5 B5
I6 M6 L6 B6
I7 M7 L7 B7
DAG2 registers
I8 M8 L8 B8
I9 M9 L9 B9
I10 M10 L10 B10
I11 M11 L11 B11
I12 M12 L12 B12
I13 M13 L13 B13
I14 M14 L14 B14
I15 M15 L15 B15
Bus Architecture:
Addressing modes:
1. Immediate (direct) addressing
2. Absolute address
3. Post-modify with update
4. Base-plus-offset mode
5. Circular buffers
6. Bit-reversal addressing
1. The simplest addressing mode provides an immediate value that can represent the
address itself.
Example: R0 = DM(0x200000);
2. An absolute address has the entire address in the instruction. This is space-inefficient,
since the address occupies more space in the instruction.
3. The post-modify with update mode allows the program to sweep through a range of
addresses. It uses an I (index) register and a modifier: the I register supplies the address
value, and the modifier (an M register value or an immediate value) updates the I register
after the access.
For a load: R0 = DM(I3,M1);
For a store: DM(I3,M1) = R0;
4. In the base-plus-offset mode, the address is computed as I+M, where I is the base and
M is the modifier or offset.
If I0 = 0x2000000 and M0 = 4, then the value for R0 is loaded from 0x2000004.
5. Circular buffers use the L and B registers. The L register is set to a positive, non-zero
buffer length at the start; the B register stores the base address, and the I register is
initialized with that same base address. If the I register is used in post-modify mode, the
incremented value is compared with the sum of the B and L registers; when the end of the
buffer is reached, the I register wraps around to the base.
6. The bit-reversal addressing mode is used in the Fast Fourier Transform (FFT). Bit
reversal can be performed only on I0 and I8 and is controlled by the BR0 and BR8 bits in
the MODE1 register.
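The post-modify and circular-buffer behaviour (mode 5 above) can be summarized in C as a sketch (i, m, b and l stand for the I, M, B and L register values; a positive modifier is assumed):

    /* Next index for post-modify addressing with a circular buffer:
       I = I + M, wrapping around once the index passes B + L. */
    unsigned int dag_post_modify(unsigned int i, unsigned int m,
                                 unsigned int b, unsigned int l)
    {
        unsigned int next = i + m;   /* the old I is used for the access,
                                        then updated (post-modify)        */
        if (next >= b + l)           /* passed the end of the buffer?     */
            next -= l;               /* wrap around toward the base       */
        return next;
    }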
BASIC addressing:
Immediate value:
R0 = DM(0x20000000);
Direct load:
R0 = DM(_a); ! load the value stored at address _a
Direct store:
DM(_a) = R0; ! store R0 at address _a
expression:
x = (a + b) - c;
program:
R0 = DM(_a); ! Load a
R1 = DM(_b); ! Load b
R3 = R0 + R1;
R2 = DM(_c); ! Load c
R3 = R3-R2;
expression :
y = a*(b+c);
program:
R1 = DM(_b); ! Load b
R2 = DM(_c); ! Load c
R2 = R1 + R2;
R0 = DM(_a); ! Load a
R2 = R2*R0;
SHARC jump:
JUMP foo
The jump address can be specified in three ways:
• direct
• indirect
• PC-relative
• The SHARC is a modified Harvard architecture.
– On-chip memory (>1 Gbit) is evenly split between program memory (PM) and data
memory (DM).
– Program memory can be used to store some data.
– This allows data to be fetched from both memories in parallel.
The SHARC ALU operations are summarized in the figure.
UNIT V - part II
Contents:
I. Bus protocols
II. I2C bus
III. CAN bus
IV. Internet-enabled systems
V. Design example: elevator controller
I. BUS PROTOCOLS:
For data communication between the different peripheral components of a system, standards
such as the following are used:
VME
PCI
ISA, etc.
For distributed embedded applications, the following interconnection network protocols are
available:
I2C
CAN, etc.
II. I2C BUS:
The I2C bus uses only two lines: the serial data line (SDL) for data, and the serial clock
line (SCL), which indicates when valid data are on the data line.
The basic electrical interface of I2C to the bus is shown in Figure
A pull-up resistor keeps the default state of the signal high, and transistors are used
in each bus device to pull down the signal when a 0 is to be transmitted.
The open collector/open drain circuitry allows a slave device to stretch a clock signal
during a read from a slave.
The master is responsible for generating the SCL clock, but the slave can stretch the
low period of the clock
The I2C bus is designed as a multimaster bus—any one of several different devices
may act as the master at various times.
As a result, there is no global master to generate the clock signal on SCL. Instead, a
master drives both SCL and SDL when it is sending data. When the bus is idle, both
SCL and SDL remain high.
When two devices try to drive either SCL or SDL to different values, the open-collector/
open-drain circuitry prevents errors: the line simply settles to the low (0) state.
Addresses of devices:
A device address is 7 bits in the standard I2C definition (the extended I2C allows 10-bit
addresses). The address 0000000 is used to signal a general call or bus broadcast, which
can be used to signal all devices simultaneously. A bus transaction is comprised of a
series of 1-byte transmissions: an address byte followed by one or more data bytes.
Data-push programming:
I2C encourages a data-push programming style. When a master wants to write to a slave,
it transmits the slave's address followed by the data. Since a slave cannot initiate a
transfer, the master must send a read request with the slave's address and let the slave
transmit the data. Therefore, an address transmission includes the 7-bit address and 1 bit
for data direction: 0 for writing from the master to the slave, and 1 for reading from the
slave to the master.
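A small sketch of how that address byte is assembled (the function name is hypothetical):

    /* Build the I2C address byte: the 7-bit device address occupies the
       upper bits and the data-direction bit the LSB (0 = write, 1 = read). */
    unsigned char i2c_address_byte(unsigned char addr7, int is_read)
    {
        return (unsigned char)((addr7 << 1) | (is_read ? 1u : 0u));
    }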
Figure: write and read bus transactions. A transaction ends with a stop signal, issued by
setting both SCL and SDL high.
Figure: The I2C interface on a microcontroller.
III. CAN BUS:
The CAN bus was designed for automotive electronics and was first used in production cars
in 1991. The CAN bus uses bit-serial transmission; it runs at rates up to 1 Mbit/s over a
twisted pair connection up to 40 m long. An optical link can also be used. The bus protocol
supports multiple masters on the bus. Each node in the CAN bus has its own electrical
drivers and receivers that connect the node to the bus in wired-AND fashion.
In CAN terminology, a logical 1 on the bus is called recessive and a logical 0 is
dominant.
The driving circuits on the bus cause the bus to be pulled down to 0 if any node on the
bus pulls the bus down (making 0 dominant over 1).
When all nodes are transmitting 1s, the bus is said to be in the recessive state; when a
node transmits a 0, the bus is in the dominant state. Data are sent on the network in
packets known as data frames.
A data frame starts with a dominant (0) start-of-frame bit and ends with an end-of-frame
field of seven recessive (1) bits. (There are at least three bit fields between data frames.)
The first field in the packet contains the packet's destination address and is known as
the arbitration field; the destination identifier is 11 bits long.
The trailing remote transmission request (RTR) bit is set to 1 (recessive) if the frame is
used to request data from the device specified by the identifier; when RTR = 0
(dominant), the frame carries data for the destination identifier.
The control field provides an identifier extension bit and a 4-bit length code for the
data field, with a 1 in between. The data field holds from 0 to 8 bytes (0 to 64 bits) of
data, depending on the length given in the control field.
A cyclic redundancy check (CRC) is sent after the data field for error detection.
The acknowledge field is used to signal whether the frame was correctly received: the
sender puts a recessive (1) bit in the ACK slot of the acknowledge field, and any
receiver that detects the frame without error overwrites it with a dominant (0) value.
If the sender still sees a recessive (1) bit in the ACK slot, no node received the frame
correctly and the sender knows it must retransmit. The ACK slot is followed by a
single-bit delimiter, followed by the end-of-frame field.
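The fields described above can be sketched as a C structure (classical CAN with an 11-bit identifier; this is a descriptive layout, not the on-wire bit packing):

    /* Sketch of the logical contents of a classical CAN data frame. */
    typedef struct can_frame {
        unsigned short identifier;   /* 11-bit destination identifier       */
        unsigned char  rtr;          /* remote transmission request bit     */
        unsigned char  dlc;          /* 4-bit data length code: 0..8 bytes  */
        unsigned char  data[8];      /* up to 8 data bytes                  */
        unsigned short crc;          /* 15-bit cyclic redundancy check      */
        unsigned char  ack;          /* acknowledge slot                    */
    } can_frame_t;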
Since CAN is a bus, it does not need network-layer services to establish end-to-end
connections. The protocol control block is responsible for determining when to send
messages, when a message must be resent due to arbitration losses, and when a
message should be received.
IP Protocol:
IP is an internetworking standard: an Internet packet will travel over several different
networks from source to destination. IP allows data to flow seamlessly through these
networks from one end user to another.
When node A wants to send data to node B, the application's data pass down through
several layers of the protocol stack to the IP layer. IP creates packets for routing to the
destination, which are then sent to the data link and physical layers. A node that
transmits data among different types of networks is known as a router.
IP Packet Format:
The maximum total length of the header and data payload is 65,535 bytes.
An Internet address is a number (32 bits in early versions of IP, 128 bits in IPv6). The
IP address is typically written in the form xxx.xx.xx.xx.
IP does not guarantee delivery, and packets that do arrive may come out of order; this is
referred to as best-effort routing. Since routes for data may change quickly, with
subsequent packets being routed along very different paths with different delays, the
real-time performance of IP can be hard to predict.
Relationships between IP and higher-level Internet services:
Using IP as the foundation, TCP is used to provide the File Transfer Protocol (FTP) for
batch file transfers, the Hypertext Transport Protocol (HTTP) for World Wide Web service,
the Simple Mail Transfer Protocol (SMTP) for email, and Telnet for virtual terminals. A
separate transport protocol, the User Datagram Protocol (UDP), is the basis for the network
management services provided by the Simple Network Management Protocol (SNMP).