Digital System Design — Use of Microcontroller
RIVER PUBLISHERS SERIES IN SIGNAL,
IMAGE & SPEECH PROCESSING
Volume 2
The field of interest is the theory and application of filtering, coding, transmitting, estimating, detecting, analyzing, recognizing, synthesizing, recording, and reproducing signals by digital or analog devices or techniques. The term “signal” includes audio, video, speech, image, communication, geophysical, sonar, radar, medical, musical, and other signals.
• Signal Processing
• Image Processing
• Speech Processing
Aalborg
Published, sold and distributed by:
River Publishers
PO box 1657
Algade 42
9000 Aalborg
Denmark
Tel.: +4536953197
EISBN: 978-87-93102-29-3
ISBN: 978-87-92329-40-0
© 2010 River Publishers
All rights reserved. No part of this publication may be reproduced, stored in a retrieval
system, or transmitted in any form or by any means, mechanical, photocopying,
recording or otherwise, without prior written permission of the publishers.
Preface
Electronic circuit design is not a new activity; there have always been good
designers who create good electronic circuits. For a long time, designers used
discrete components to build first analogue and then digital systems. The main
components for many years were: resistors, capacitors, inductors, transistors
and so on. The primary concern of the designer was functionality; however, once functionality had been met, the designer’s goal was then to enhance performance.
Embedded system design is not just a new label for an old activity; embed-
ded system designers today face new challenges. They are asked to produce
increasingly complex systems using the latest technologies, but these technolo-
gies are changing faster than ever. They are asked to produce better quality
designs with a shorter time-to-market. They are asked to implement functionality but, more importantly, to satisfy numerous other constraints. It is not enough, for example, to have a mobile phone that can communicate; it must also be small, lightweight, reasonably priced, consume minimal power and secure the contents of messages while communicating. To communicate is the functionality, while the size, weight, power consumption and the use of encryption for secure data exchange are constraints that must be achieved for an acceptable mobile handset. To achieve the current goals of design, the designer must be aware of such design constraints and, more importantly, the factors that have a direct effect on them.
This book consists of three parts. The first part comprises Chapters 1–3.
Chapters 1 and 2 cover the important concepts of designing under constraints.
In Chapter 1, we introduce the reader to a good amount of needed knowledge about design constraints and design metrics, while in Chapter 2, we give the reader a complete idea of the design flow process, the challenges he will face and, more importantly, how he is to select one of the many options available to him.
In Chapter 3, we introduce the reader to the rest of the book, i.e. the other two parts. We use it to familiarize the reader with terminology such as microprocessor, microcontroller, microprocessor-based and microcontroller-based systems, and the organization and building blocks of the microprocessor and microcontroller. We briefly introduce the reader to the main building blocks of the microcontroller that will be the subject of the rest of the book: timers, counters, the watchdog timer, ADC, DAC, serial communication, memory, the need for programming, etc.
Part 2 comprises Chapters 4 and 5, which cover the programming of microcontrollers. The two chapters give the reader a very clear idea of instructions and the instruction set, instruction cycles, addressing modes, assembly language, how to write a program, etc. For demonstration we use the instruction sets of the Atmel AVR series and the Intel 8051 family.
The remaining Chapters 6–9 together form Part 3 of the book. Each chapter deals with one subsystem that can be considered, from the embedded system design point of view, as a single-purpose processor. Timers and counters, for example, if implemented as IC chips, are single-purpose processors, or subsystems; similarly for the ADC, DAC, UART, CAN, I2C, etc. To make our book useful for a wider class of readers, we introduce each item as if it were a discrete subsystem and then we discuss its implementation as part of
the resources of a microcontroller. The use of a microcontroller to implement
the functionality of each subsystem and how to use it in many possible appli-
cations is then given. This explains the main reason behind calling our book
“Digital System Design” and not “Embedded System Design”; the use of the
microcontroller as the processor in the system. In Chapter 6, we discuss the
memory system; Chapter 7 considers timer/counters, while Chapter 8 treats
the main components of any Data Acquisition System. Finally Chapter 9 con-
siders the communication between microcontrollers and the outside world.
Each chapter in this part can be considered as a complete unit that covers a
topic. The reader can read them in any order he prefers.
List of Abbreviations
ACK Acknowledge
ADC Analogue-to-Digital Converter
ALU Arithmetic and Logic Unit
AU Arithmetic Unit
ASIC Application Specific Integrated Circuit
ASIP Application-Specific Instruction-set Processor
BCD Binary Coded Decimal
CAD Computer Aided Design
CAN Controller Area Network
CISC Complex Instruction Set Computer
CMOS Complementary Metal Oxide Semiconductor
CPI Cycles per Instruction
CPLD Complex PLD
CPU Central Processing Unit
CRC Cyclic Redundancy Check
CSMA Carrier-Sense Multiple Access
CSMA/CA Carrier-Sense Multiple Access with Collision Avoidance
DAC Digital-to-Analogue Converter
DAS Data Acquisition System
DMA Direct Memory Access
DRAM Dynamic Random Access Memory
DRO Destructive Read Out
DSP Digital Signal Processor
EA Effective Address
EDA Electronic Design Automation
EEPROM Electrically Erasable Programmable Read Only Memory
Processor Design Metrics

1.1 Introduction
While trying to satisfy some constraints, the designers may face difficul-
ties from constraints that require trade-offs. Such constraints compete with
each other; improving one leads to worsening another. It is difficult to find an optimum solution for such competing constraints. The best example of such
trade-offs is between performance, size and power constraints. It is impossible
to simultaneously design for high performance, low power consumption and
small size. These three constraints cannot be optimised simultaneously; reduc-
ing the size causes performance to suffer; improving performance increases
power dissipation. The designer must find schemes that help to satisfy some
metrics without degrading others. Design may thus be said to be a matter of
optimising constraints while maintaining full system functionality (the system
requirements and specifications).
It is expected, then, that more than one design can fulfill the required functionality. The real challenge facing any designer is not merely to design for functionality but to find the implementation that simultaneously optimises a large number of design metrics.
During the development phases of a design, many challenges arise that require a large number of decisions. Some of these decisions require
knowledge of the most suitable way of approaching the solution; others require
knowledge of the available IC technologies in the market and so on. Out of
the many decisions, there are three fundamental decisions that have a direct
effect on the way of optimising the design metrics. These decisions are related
to the technologies the designer will use while developing the system. These
technologies are:
• Design technology: Which design flow and which top-down model can
we use to speed up the design process?
• Processor technology: Which processor technology (software, hardware, a combination of hardware and software, etc.) can we apply in order to implement each functional block?
• IC technology: Which type of IC technology (VLSI, PLD, FPGA, etc.) is suitable to implement the chosen processor?
The selection of the proper technology at each stage determines the effi-
ciency with which the design can be converted from concept to architecture,
to logic and memory, to circuit, and ultimately to physical layout.
The goal (the function) in this example is to purchase groceries that cost
$20.00. The constraints are:
• There is enough food to serve four people, and
• The meal must include items from the four basic groups.
This is not a single-answer problem; it has many answers depending on the prices of the individual items, the appetites of the visitors, etc. In real life everyone faces such design problems; they are in fact more common than analysis problems. To solve this design problem we must use analysis: we need multiplication and addition as tools to get possible answers to the design problem. Solving a design problem requires the ability to analyse, and it normally proceeds by trial and error until an acceptable solution is achieved.
Figure 1.1 Design metric competition — improving one may worsen others. (The competing metrics shown are performance, size, power and NRE cost.)
Time-to-market and ease of use are some of the metrics that affect the cost and price. Sometimes a combined cost-performance metric may be more important than cost and performance separately.
3. Power Consumption Metrics: Metrics of this group measure the power
consumption of the system. These metrics are gaining importance in
many fields as battery powered mobile systems become prevalent and
energy conservation becomes more significant.
4. System Effectiveness Metrics: In many applications such as military
applications, how effective the system is in implementing its target is
more important than cost. Reliability, Maintainability, Serviceability,
design adequacy and flexibility are related to the metrics of this group.
5. Others: These are metrics that include those that may guide the designer
to select from many off-the-shelf components that can do the job. Ease
of use, software support, safety and the availability of second source
suppliers are some of the metrics of this group.
Definitions:
Latency or response time: The time between the start and the end of a task’s execution. For example, producing one car takes 4 hours.
Throughput: The number of tasks that can be processed per unit time. For
example, an assembly line may be able to produce 6 cars per day.
The main concern in the two cases, throughput and response time, is time.
The computer that performs the same amount of work in the least time is the
fastest. If we are speaking of a single task, then we are speaking of response
time, while if we are speaking of executing many tasks, then we are speaking
about throughput. The latency metric is directly related to the execution time
while throughput measures the rate of implementing a given task. We can
expect many metrics measuring throughput based on the definition of the
task. The task may be an instruction, as in the case of MIPS (see 1.3.2.2), or it may be floating-point operations, as in the case of MFLOPS (see 1.3.2.3), or any
other task. Besides execution time and rate metrics, there is a wide variety
of more specialised metrics used as indices of computer system performance.
Unfortunately, as we shall see later, many of these metrics are often used but
interpreted incorrectly.
There are many different metrics that have been used to describe the per-
formance of a computer system. Some of these metrics are commonly used
throughout the field, such as MIPS and MFLOPS (which are defined later in
this chapter), whereas others are introduced by manufacturers and/or designers
for new situations as they are needed. Experience has shown that not all of
these metrics are ‘good’ in the sense that sometimes using a particular metric
out of context can lead to erroneous or misleading conclusions. Consequently,
it is useful to understand the characteristics of a ‘good’ performance metric.
This understanding will help when deciding which of the existing perfor-
mance metrics to use for a particular situation and when developing a new
performance metric.
A performance metric that satisfies all of the following requirements is
generally useful to a performance analyst in allowing accurate and detailed
comparisons of different measurements. These criteria have been developed by
Many measures have been devised in an attempt to create standard and easy-to-
use measures of computer performance. One consequence has been that simple
metrics, valid only in a limited context, have been heavily misused such that
using them normally results in misleading conclusions, distorted results and
incorrect interpretations. Clock rate, MIPS and MFLOPS are the best examples
of such simple performance metrics; using any of them results in misleading
and sometimes incorrect conclusions. These three metrics belong to the same
family of performance metrics that measure performance by calculating the
rate of occurrence of an event. In Section 1.3.2.8 we give an example that
highlights the danger of using the wrong metric (mainly the use of means-
based metrics or using the rate as a measure) to reach a conclusion about
computer performance. In most cases it is better to use metrics that use the
execution time as a base for measuring the performance.
Consider, for example, calculating MIPS for two computers with the same instruction set, where one of them has special hardware to execute floating-point operations while the other machine uses software routines to execute the floating-point operations. The floating-
point hardware needs more clock cycles to implement one floating point
operation compared with the number of clock cycles needed to implement
an integer operation. This increases the average value of the CPI (cycles per
instruction) of the machine which in turn, according to equation (1.1), results
in a lower MIPS rating. On the other hand, the software routines that were needed to execute a floating-point operation consisted of many simple instructions; these are now replaced by a single hardware instruction that executes much faster. Hence, the inclusion of floating-point hardware will
result in a machine that has a lower MIPS rating but can do more work, thus
highlighting the drawback of MIPS as a metric. Example 1.4 further illustrates
this effect.
tinkering. For example, many compiler developers have used these bench-
marks as practice programmes, thereby tuning their optimisations to the
characteristics of this collection of applications. As a result, the execution
times of the collection of programmes in the SPEC suite can be quite sensitive
to the particular selection of optimisation flags chosen when the programme is
compiled. Also, the selection of specific programmes that comprise the SPEC
suite is determined by a committee of representatives from the manufacturers
within the cooperative. This committee is subject to numerous outside pres-
sures since each manufacturer has a strong interest in advocating application
programmes that will perform well on their machines. Thus, while SPEC is
a significant step in the right direction towards defining a good performance
metric, it still falls short of the goal.
1.3.2.5 Comments
As mentioned before, any performance metric must be reliable. The majority of the above-mentioned metrics are not reliable. The main reason that makes them unreliable is that they measure what was done, whether or not it was useful. Such metrics are called means-based metrics. The use of such metrics may lead to wrong conclusions concerning the performance of the system.
To avoid such problems, we must use metrics that are based on the def-
inition of performance, i.e. the execution time. Such metrics are ends-based
metrics and measure what is actually accomplished. The difference between
the two classes of performance metrics is highlighted in Section 1.3.2.8.
CPU time = CPU clock cycles for a programme × Clock cycle time

For any processor the clock cycle time (or clock rate) is known, and it is possible to measure the CPU clock cycles. CPU time can also be expressed in terms of the number of instructions executed (called the instruction count, IC) and the average number of clock cycles per instruction (CPI):
Where:
CPI = (CPU clock cycles for a programme)/IC
Equation (1.4) can be written as:

CPU time = Seconds/Programme = (Instructions/Programme) × (Cycles/Instruction) × (Seconds/Cycle)   (1.5)
In the general case, executing the programme means the use of different
instruction types each of which has its own frequency of occurrence and CPI.
Example 1.1
When running a specific programme on a given computer, we measured the following parameters:
• Total instruction count (IC): 30 000 000 instructions
• Average CPI for the programme: 1.92 cycles/instruction
• CPU clock rate: 350 MHz.
Calculate the execution time for this programme.
Solution:
We use equation (1.5) to calculate the execution time:
CPU time = (Seconds/ Programme)
= (Instructions/ Programme) × (Cycles/Instruction)
× (Seconds/ Cycle)
CPU time = Instruction count × CPI × Clock cycle time
= 30,000,000 × 1.92 × (1/(350 × 10⁶))
= 0.1646 seconds
assembly line B is 2. Then we also can say that A is 2 times faster than B and
B is 2 times slower than A.
Another technique for comparing performance is to express the perfor-
mance of a system as a percent change relative to the performance of another
system. Such a measure is called relative change. If, for example, the throughput of system A is R1, and that of system B is R2, the relative change of system B with respect to A, denoted Δ2,1 (that is, using system A as the base), is then defined to be:

Relative change of system B w.r.t. system A = Δ2,1 = (R2 − R1)/R1
Typically, the value of Δ2,1 is multiplied by 100 to express the relative change as a percentage with respect to a given basis system. This definition of relative change will produce a positive value if system B is faster than system A, whereas a negative value indicates that the basis system is faster.
An example of how to apply these two normalization techniques is shown in Table 1.1, where the speedup and relative change of the systems are found using system 1 as the basis. From the raw execution times, we can see that
system 4 is the fastest, followed by systems 2, 1, and 3, respectively. However,
the speedup values give us a more precise indication of exactly how much
faster one system is than the other. For instance, system 2 has a speedup
of 1.33 compared with system 1 or, equivalently, it is 33% faster. System 4
has a speedup ratio of 2.29 compared with system 1 (or it is 129% faster).
We also see that system 3 is actually 11% slower than system 1, giving it a
slowdown factor of 0.89.
Normally, we use the speedup ratio and the relative change to compare the
overall performance of two systems. In many cases it is required to measure
how much the overall performance of any system can be improved due to
changes in only a single component of the system. Amdahl’s law can be
used, in such cases, to get the impact of improving a certain feature on the
performance of the system.
Table 1.1 An example of calculating speedup and relative change using system 1 as the basis.

System X   Execution time Tx (s)   Speedup Sx,1   Relative change Δx,1 (%)
1          480                     1              0
2          360                     1.33           +33
3          540                     0.89           −11
4          210                     2.29           +129
Amdahl’s Law
The performance gain that can be obtained by improving some portion of
a computer can be calculated using Amdahl’s Law. Amdahl’s Law states:
“The performance improvement to be gained from using some faster mode
of execution is limited by the fraction of the time the faster mode can be used”.
Amdahl’s law can be expressed mathematically by the equation:

Ttotal/Tenhanced = Ttotal/[(Ttotal − Tcomponent) + (Tcomponent/n)]
Where:
Ttotal = system metric prior to enhancement,
Tenhanced = system metric after enhancement,
Tcomponent = contribution of the component to be improved to the system metric,
n = the amount of the enhancement.
Example 1.2 Use of Amdahl’s law
Consider a system with the following characteristics: The task to be anal-
ysed and improved currently executes in 100 time units, and the goal is to
reduce execution time to 80 time units. The component that is to be enhanced in
the task is currently using 40 time units. Calculate the amount of enhancement
needed.
Solution:
Using the above equation, we get:

100/80 = 100/[(100 − 40) + (40/n)]

From this we get n = 2. This means that to achieve the required task, the component to be enhanced must be improved by a speedup ratio of 2; in our case its execution time must go from 40 time units to 20 time units.
Table 1.2 Instruction usage before and after compiler optimization.

(a) Before optimization
Instruction type     Frequency   Clock Cycle Count
Arithmetic & Logic   43%         1
Loads                21%         2
Stores               12%         2
Branches             24%         2

(b) After optimization
Instruction type     Frequency   Clock Cycle Count
Arithmetic & Logic   27.4%       1
Loads                26.8%       2
Stores               15.3%       2
Branches             30.5%       2
can lead to wrong conclusions. The use of ends-based metrics, which measure what is actually accomplished, is more accurate and leads to correct decisions. The reason is that ends-based metrics use the execution time to measure the performance.
To understand the difference between rate metrics and execution-time-based metrics, we consider here two examples.
Example 1.3 Effect of Compiler Variations on MIPS Rating and Perfor-
mance:
Table 1.2 shows the instruction usage before and after optimizing the compiler of a load-store machine. The system clock runs at 500 MHz.
This development results in reducing the instruction count IC as follows:
ICoptimised = 0.785 ICunoptimised
Calculate the MIPS rating before and after optimization and also the
speedup ratio. Comment on the results.
Solution: From the tables we can get:
CPIunoptimised = 0.43 × 1 + 0.21 × 2 + 0.12 × 2 + 0.24 × 2 = 1.57
CPIoptimised = 0.274 × 1 + 0.268 × 2 + 0.153 × 2 + 0.305 × 2 = 1.726
Calculation of MIPS:
From equation (1.1), we get:

MIPSunoptimised = (500 × 10⁶)/(1.57 × 10⁶) = 318.5
MIPSoptimised = (500 × 10⁶)/(1.726 × 10⁶) = 289.7
Calculation of the speedup ratio:
CPU time = IC × CPI × clock cycle time. With a 500 MHz clock the cycle time is 2 ns, so:

CPU time(unoptimised) = IC × 1.57 × 2 ns = 3.14 × 10⁻⁹ × IC seconds
CPU time(optimised) = 0.785 × IC × 1.726 × 2 ns ≈ 2.72 × 10⁻⁹ × IC seconds
Discussion:
Taking MIPS as the metric:
MIPS before optimization = 318.5
MIPS after optimization = 289.7
Conclusion 1:
The optimised code has a lower MIPS rating than the unoptimised code: 289.7 versus 318.5.
When taking speedup as the performance metric:
The speedup of the optimised code relative to the unoptimised code = 3.14/2.72 = 1.15
Conclusion 2:
If we take the speedup ratio as a measure of performance, then the system after optimizing the compiler is 1.15 times faster than before optimization.
Conclusion 3:
We reach completely different conclusions when using two different types of metrics. We reduced the total CPU time from 3.14 × 10⁻⁹ × IC seconds to 2.72 × 10⁻⁹ × IC seconds, but MIPS indicates that the unoptimised code is better. This is a misleading conclusion.
Example 1.4
The programme given in Figure 1.2 calculates the vector dot-product. The programme executes N floating-point addition and N floating-point multiplication operations, for a total of 2N floating-point operations. This programme may be modified
to take the form given in Figure 1.3. For each version calculate:
• the execution time,
• the MFLOPS,
s = 0;
for (i = 1; i < N; i++)
    s = s + x[i] * y[i];
Figure 1.2 A vector dot-product programme example.
s = 0;
for (i = 1; i < N; i++)
    if (x[i] != 0 && y[i] != 0)
        s = s + x[i] * y[i];
Figure 1.3 Vector dot-product programme modified to calculate only nonzero elements.
Programme 1
The total time required to execute this programme is Ttotal1 = N(Tadd + Tmult) cycles, so the execution rate is:

R1 = 2N/[N(Tadd + Tmult)] = 2/(Tadd + Tmult) FLOPS/cycle
Programme 2
This version of the programme takes into consideration the fact that there
is no need to perform the addition or multiplication operations for elements
whose value is zero. In such a case it may be possible to reduce the total execu-
tion time if many elements of the two vectors are zero. The programme given
in Figure 1.3 performs the floating-point operations only for those nonzero
elements. If the conditional if statement requires Tif cycles to execute, the total time required to execute this programme is

Ttotal2 = N[Tif + f(Tadd + Tmult)] cycles

where f is the fraction of N for which both x[i] and y[i] are nonzero.
The corresponding execution rate is:

R2 = 2Nf/[N(Tif + f(Tadd + Tmult))] = 2f/(Tif + f(Tadd + Tmult)) FLOPS/cycle
For the designer to select one of the two options, he can consider practical
values for the parameters Tadd , Tmult , Tif and the fraction f . For example,
if Tif = 4 cycles, Tadd = 5 cycles, Tmult = 10 cycles, f = 10%, and the
processor’s clock rate is 250 MHz (i.e. one cycle is 4 ns), he gets:
Ttotal1 = 60N ns and
Ttotal2 = N[4 + 0.1(5 + 10)] × 4 ns = 22N ns.
The speedup of programme 2 relative to programme 1 is then found to be:
S2,1 = 60N/22N = 2.73
The relative change Δ2,1 = (60N − 22N)/22N = 172.7%
Calculating the execution rates realized by each programme with these
assumptions produces:
R1 = 2/(60 ns) = 33 MFLOPS and
R2 = 2(0.1)/(22 ns) = 9.09 MFLOPS.
Thus, even though we have reduced the total execution time from Ttotal1
= 60N ns to Ttotal2 = 22N ns, the means-based metric (MFLOPS) shows
that programme 2 is 72% slower than programme 1. The ends-based metric
(execution time), however, shows that programme 2 is actually 172.7% faster
than programme 1.
We reach completely different conclusions when using these two different
types of metrics because the means-based metric unfairly gives programme 1
credit for all of the useless operations of multiplying and adding zero. This
example highlights the danger of using the wrong metric to reach a conclusion
about computer-system performance.
The performance measures mentioned above can be used to measure or compare the performance of working machines. The result has no effect on the design since it is already completed, implemented and working. The designer can do nothing if the measured performance does not fulfill the original requirement. The designer thus needs a performance measure that
can be used during the design stages and before the implementation. Such per-
formance measures can guide the designer while selecting between software
or hardware implementation, which algorithm to use, which IC technology
that will help fulfill the requirement etc. The importance of having such a
performance measure is related directly to the nature of the design cycle.
The design cycle, to be covered in Chapter 2, has different levels of
abstractions. The designer refines the system while moving from one abstrac-
tion level to the next one. The list of requirements (functional and non-functional) can be considered as the highest level of abstraction, in which we use natural language (like English) to describe the system. The designer refines
this level of abstraction to get the next level; the system level of abstraction.
The process of refining the system continues to the lowest abstraction level
which is the implementation. The implementation level consists of machine code for general-purpose processors and a gate-level netlist for single-purpose processors. For the final product to fulfill the design metrics, e.g. performance,
it is important for the designer to analyse and optimise these metrics at each
level of abstraction.
On the highest level of abstraction we normally use algorithms to convert
the functional requirements (describing the behaviour of the system) into struc-
tural requirements. In this stage the designer has no idea about the processor
technology (software or hardware) to use to implement each functional block
or about the IC technology needed. On such a level of abstraction the absolute values of the metrics, e.g. how many microseconds difference in the execution time or how many microwatts of power consumption, are not relevant. As a
matter of fact at this level of abstraction it is practically impossible to define
exact values for execution time or power consumption since neither the pro-
cessor technology nor the IC technology has been chosen. On such a level of
abstraction we focus on the relative performance of different algorithms and
designs. The algorithms and designs at this level are completely independent
of the hardware or physical implementation used in the latter stages.
Example 1.5
In this example we analyse a simple algorithm that accepts as input an array of n integers and returns the sum of the elements of the array. The algorithm is given in the second column of Table 1.3.
Example 1.6
Suppose we are given an English short story TEXT, and suppose we want
to search through TEXT for the first occurrence of a given 3-letter word W. If
W is the 3-letter word “the” then it is likely that W occurs near the beginning
of TEXT, so f (n) will be small. On the other hand, if W is the 3-letter word
“zoo,” then W may not appear in TEXT at all, so f (n) will be large.
The above discussion leads us to the question of finding the complexity
function f (n) for certain cases. The two cases one usually investigates in
complexity theory are as follows:
1. Worst case: the maximum value of f (n) for any possible input
2. Average case: the expected value of f (n)
Sometimes we also consider the minimum possible value of f (n), called the
best case.
The analysis of the average case assumes a certain probabilistic distribution
for the input data; one such assumption might be that all possible permutations
of an input data set are equally likely. The average case uses the following
concept from probability theory. Suppose the numbers n1, n2, . . . , nk occur with respective probabilities p1, p2, . . . , pk. Then the expectation or average value E is given by

E = n1·p1 + n2·p2 + · · · + nk·pk
Solution
The complexity of the search algorithm is given by the number C of
comparisons between ITEM and DATA[K]. We seek C(n) for the worst case
and the average case.
Algorithm: (Linear Search) A linear array DATA with N elements and a specific ITEM of information are given. This algorithm finds the location LOC of ITEM in the array DATA or sets LOC := 0.
1. [Initialize] Set K := 1 and LOC := 0.
2. Repeat Steps 3 and 4 while LOC = 0 and K ≤ N.
3. If ITEM = DATA[K], then: Set LOC := K.
4. Set K := K + 1. [Increments counter.]
[End of Step 2 loop.]
5. [Successful?]
If LOC = 0, then:
Write: ITEM is not in the array DATA.
Else:
Write: LOC is the location of ITEM.
[End of If structure]
6. Exit.
Worst Case
Clearly the worst case in this example occurs when ITEM is the last
element in the array DATA or is not there at all. In either situation, we have
C(n) = n
Accordingly, C(n) = n is the worst-case complexity of the linear search
algorithm.
Average Case
Here we assume that ITEM does appear in DATA, and that it is equally
likely to occur at any position in the array. Accordingly, the number of comparisons can be any of the numbers 1, 2, 3, . . . , n, and each number occurs with probability p = 1/n. Then
C(n) = 1 · (1/n) + 2 · (1/n) + · · · + n · (1/n)
= (1 + 2 + · · · + n) · (1/n)
= [n(n + 1)/2] · (1/n) = (n + 1)/2
This agrees with our intuitive feeling that the average number of compar-
isons needed to find the location of ITEM is approximately equal to half the
number of elements in the DATA list.
Remark: The complexity of the average case of an algorithm is usually
much more complicated to analyse than that of the worst case. Moreover,
the probabilistic distribution that one assumes for the average case may not
actually apply to real situations. Accordingly, in many cases and unless other-
wise stated or implied, the complexity of an algorithm shall mean the function
which gives the running time of the worst case in terms of the input size. This
is not too strong an assumption, since the complexity of the average case for
many algorithms is proportional to the worst case.
Order of Growth: Big O Notation (O Notation)
In the last analysis we ended with some expressions that take, in general,
the form of a polynomial of n, e.g.
C(n) = an² + bn + c
where a, b, and c are some constants, which depend on the statement cost
ci . It is possible to make some more simplifications on such expressions to
get what is known as the “rate of growth” or “order of growth”. It is actually
the rate of growth or the order of growth, of the running time that interests
the designer. To get the rate of growth, we consider only the first term of the
polynomial (e.g. an²). The rest of the formula, the lower order terms (e.g.
bn + c), are relatively insignificant for large n. The order of growth also ignores
the constant coefficient of the leading term, since constant factors are less
significant than the rate of growth in determining computational efficiency for
large inputs. We end with measuring the rate of growth by comparing f (n)
with some standard function, such as:
log₂ n, n, n log₂ n, n², n³, 2ⁿ
The rates of growth for these standard functions are indicated in Table 1.4,
which gives their approximate values for certain values of n. Observe that
the functions are listed in the order of their rates of growth: the logarithmic
function log₂ n grows most slowly, the exponential function 2ⁿ grows most
rapidly, and the polynomial functions nᶜ grow according to the exponent c.
One way to compare the function f (n) with these standard functions is to use
the functional O notation defined as follows:
Suppose f (n) and g(n) are functions defined on the positive integers with
the property that f (n) is bounded by some multiple of g(n) for almost all n.
That is, suppose there exist a positive integer n0 and a positive number M such
that, for all n > n₀, we have
|f (n)| ≤ M|g(n)|
In that case we write f (n) = O(g(n)), which is read “f (n) is of order g(n).”
For example, any polynomial P (n) of degree m can be written using
O-notation as P (n) = O(nᵐ); e.g.,
8n³ − 576n² + 832n − 248 = O(n³)
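This particular bound can be checked numerically. The pair M = 9, n₀ = 100 below is one valid (not the tightest) choice of witnesses:

```python
def f(n):
    """The example polynomial: 8n^3 - 576n^2 + 832n - 248."""
    return 8 * n**3 - 576 * n**2 + 832 * n - 248

M, n0 = 9, 100   # one witness pair for |f(n)| <= M * n^3; not the smallest
assert all(abs(f(n)) <= M * n**3 for n in range(n0 + 1, 5000))
```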
To indicate the convenience of this notation, we give the complexity of
certain well known searching and sorting algorithms in Table 1.5.
1.4.1 Time-to-Market
Time-to-market and product lifetime (market window): Each product has a
lifetime cycle which, in general, consists of a start up period beginning when
the product is first released to market, a maturity period (market-window)
where sales reach a peak and an end-of-life period where the product sales
decline until it goes out of the market (see Figure 1.5). The fast growth of
microelectronic industries and the new fields of applications that use micro-
electronic components reflect directly on the lifetime of the products. Some
of the effects are:
• Life cycle: Until recently, products had a life cycle of 3 to 5 years; the
lifetime of microelectronic products is now often only one year.
• The start up period has been shortened and the product reaches maturity
faster.
• Higher and narrower peak.
• The product reaches its end of life cycle faster.
This means that we are now in an era that is characterised by short market-
windows. Missing this short market window means a great loss in sales. It is
reported that in some microelectronic industries a one-day delay in introducing
the product to the market can cause a loss of one million dollars. One way for
the industry to respond to these trends is to move faster towards programmable
technologies.
Loss of revenue due to delayed entry: Let’s investigate the loss of revenue
that can occur due to a delayed entry of a product in the market. We’ll use a
simplified model of revenue that is shown in Figure 1.6(b). This model assumes
that the market peaks midway through the product life-cycle (at time W) and
that it peaks at the same time even for a delayed entry. The revenue
for an on-time market entry is the area of the triangle labelled On-time, and
the area of the triangle labelled Delayed represents the revenue for delayed
entry product. The difference between the areas of the two triangles represents
the loss in the revenue due to delayed entry. The percentage revenue loss can
[Figure 1.6 (a) On-time versus delayed market entry; (b) simplified revenue
model: revenue ($) versus time (months), rising to the peak revenue at W and
falling to zero at 2W, with a delayed entry at time D.]
be calculated as follows:
Percentage loss = {[(area of on-time triangle) − (area of delayed triangle)]/
[area of on-time triangle]} × 100%
If we consider the lifecycle of Figure 1.6 (assuming the rising edge of each
triangle has an angle of 45°), the on-time triangle has base 2W and height W,
so its area is W², while the delayed triangle has base 2W − D and height
W − D, so its area is (2W − D)(W − D)/2. Substituting gives:
Percentage loss = [D(3W − D)/2W²] × 100%
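Under this triangle model (unit slope, on-time peak height W, delayed entry at D still peaking at W), the areas give a percentage loss of D(3W − D)/(2W²) × 100%. A small sketch, with an illustrative one-year window:

```python
def revenue_loss_percent(D, W):
    """Percentage of revenue lost by entering the market D time units late,
    for a product lifetime of 2W with the market peak at time W."""
    on_time = 0.5 * (2 * W) * W            # on-time triangle: base 2W, height W
    delayed = 0.5 * (2 * W - D) * (W - D)  # delayed triangle: same rising slope
    return (on_time - delayed) / on_time * 100

# Agrees with the closed form D(3W - D) / (2 W^2) * 100
W, D = 52, 4   # one-year market window (in weeks), four weeks late
assert abs(revenue_loss_percent(D, W)
           - D * (3 * W - D) / (2 * W**2) * 100) < 1e-9
```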
The product cost is related to the fixed and variable costs by the equation:
If a product is made from parts (units), then the total cost for any part is
Total part cost = fixed part cost + variable cost per part
× volume of parts (1.7)
The selling price Stotal of a single part, an integrated circuit in our case, may
be given by
Stotal = Ctotal /(1 − m) (1.8)
where
• Ctotal is the manufacturing cost of a single IC to the vendor
• m is the desired profit margin
The profit margin has to be selected to ensure a gross profit after overhead and
the cost of sales (marketing and sales costs) have been considered. Normally
a profit model representing the profit flow during the product lifetime is used
to calculate the profit margin m.
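Equations (1.7) and (1.8) combine into a simple per-unit pricing sketch; the figures below are made up for illustration:

```python
def unit_cost(fixed_cost, variable_cost_per_part, volume):
    """Per-unit manufacturing cost: Equation (1.7) amortised over the volume."""
    return fixed_cost / volume + variable_cost_per_part

def selling_price(c_total, margin):
    """Equation (1.8): Stotal = Ctotal / (1 - m)."""
    return c_total / (1 - margin)

# e.g. $100,000 of fixed cost over 10,000 units, $5 variable cost, 20% margin
c = unit_cost(100_000, 5.0, 10_000)          # $15 per unit
assert c == 15.0
assert abs(selling_price(c, 0.20) - 18.75) < 1e-9
```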
Each term in the cost equations (1.6) and (1.7) has an effect on the final
unit price (Ctotal and Stotal ) and also plays an important role in selecting
the best technology that can be used to implement the product to get the best
performance with minimum unit price. We start by understanding the meaning
of fixed and variable costs.
These costs are amortized over the total number of ICs sold. Ftotal , the
total non-recurring cost, is given by
The NRE costs can be amortized over the lifetime volume of the product.
Alternatively, they can be viewed as an investment for which there is a required
rate of return. For instance, if $1 M is invested in NRE for a chip, then $10 M
has to be generated for a rate of return of 10.
vendor. As a guide the per annum costs might break down as follows (these
figures are in US dollars for engineers in the USA around 2004):
Salary: $50–$100 K
Overhead: $10–$30 K
Computer: $10 K
CAD tools (digital front end): $10 K
CAD tools (analogue): $100 K
CAD tools (digital back end): $1 M
The cost of the back-end tools clearly must be shared over the group
designing the chips. Increasing the productivity of the members of the design
team is an effective way of reducing the engineering costs.
Example 1.8
You are starting a company to commercialise your brilliant research idea.
Estimate the cost to prototype a mixed-signal chip. Assume you have seven
digital designers (each of salary $70 K and costs an overhead of $30 K), three
analogue designers (each of salary $100 K and costs an overhead of $30 K),
and five support personnel (each of salary $40 K and an overhead of $20 K)
and that the prototype takes two fabrication runs and two years.
Solution:
[Figure 1.7 Breakdown of prototyping costs for a mixed-signal IC. The legible
chart segments are: salary 26%, overhead 11%, computer 4%, tools 9%.]
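The personnel portion of the estimate can be tallied directly from the figures in the problem statement (the cost of the two fabrication runs is not given, so it is left out as a separate unknown term):

```python
# (salary + overhead) in $K per person-year, from the problem statement
digital  = 7 * (70 + 30)    # 7 digital designers
analogue = 3 * (100 + 30)   # 3 analogue designers
support  = 5 * (40 + 20)    # 5 support personnel

per_year  = digital + analogue + support    # $K per year
two_years = 2 * per_year                    # the prototype takes two years

assert (digital, analogue, support) == (700, 390, 300)
assert two_years == 2780    # about $2.78 M, plus the two fabrication runs
```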
be written. This is valid even for application-specific ICs that are not
sold outside the company that developed them. From time to time, application
notes describing how to use the system or the IC may be needed. In addition,
specific application support may have to be provided to help particular users.
This is especially true when the product is an Application Specific IC (ASIC).
In such systems the designer usually becomes the main player who knows
everything about the circuit.
Of course, every product designed should have accompanying documen-
tation that explains what it is and how to use it. This even applies to chips
designed in the academic environment because the time between design sub-
mission and fabricated chip can be quite large and can tax even the best
memory.
CBIC. Suppose we have the following (imaginary) fixed and variable costs
(the meaning of fixed and variable costs will be discussed later):
• FPGA: $21,800 (fixed) and $39 (variable)
• MGA: $86,000 (fixed) and $10 (variable)
• CBIC $146,000 (fixed) and $8 (variable)
Ignoring all other design metrics, like time-to-market, the best technology
choice will depend on the number of units we plan to produce (more accurately,
the volume of parts). In Figure 1.8, for each of the three technologies, we plot
total cost versus the number of parts sold. From the plot we can calculate the
following break-even volumes:
• FPGA/MGA @ 2000 parts
• FPGA/CBIC @ 4000 parts
• MGA/CBIC @ 20,000 parts
The above results mean that, of the three technologies, FPGA technology
yields the lowest total cost for low volumes, namely for volumes between 1
and 2000. MGA technology yields the lowest total cost for volumes between
2000 and 20,000 and CBIC technology yields the lowest cost for volumes
above 20,000.
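The break-even volume between two technologies is the volume at which their total costs are equal, F1 + v1·V = F2 + v2·V, i.e. V = (F2 − F1)/(v1 − v2). Applying this to the imaginary costs above reproduces the quoted figures approximately (the exact arithmetic is shown in the comments):

```python
def break_even(fixed1, var1, fixed2, var2):
    """Volume at which two technologies have equal total cost."""
    return (fixed2 - fixed1) / (var1 - var2)

fpga = (21_800, 39)   # (fixed cost, variable cost per part)
mga  = (86_000, 10)
cbic = (146_000, 8)

assert round(break_even(*fpga, *mga))  == 2214    # text quotes ~2000
assert round(break_even(*fpga, *cbic)) == 4006    # text quotes ~4000
assert round(break_even(*mga,  *cbic)) == 30000   # text quotes ~20,000
```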
Recently, the rapid rise in costs associated with the use of fine design rules
has become an issue at semiconductor manufacturing plants. Similarly, the
cost of producing the first sample of a new chip, the “first silicon”, has been
increasing every year. Manufacturers noted that the cost for the “first silicon”
in the latest 90 nm process is six times that of the earlier 0.35 µm process.
To handle these increasing costs, industry has concentrated on producing
dynamically programmable devices, which allow the LSI functions to be
determined after the chip is produced, rather than requiring end users to
create high-cost custom LSIs for their applications. As a matter of fact, the
trend towards increased performance and reduced cost in programmable
devices (reconfigurable LSIs) is progressing steadily, driven by the fact that
the difference in cost between FPGA and ASIC solutions continues to
increase every year.
1.5 Power Design Metrics
In the early days of computers, there was little concern about how much
power a processor used. At that time the number of computers was small
and the workload demanded of them was modest. Now with
hundreds of millions of computers plus the possibility that one organization
may have thousands, things have changed completely; power consumption
has become a major concern. An additional change that has driven the power
issue is the massive change to mobility. Mobile systems are everywhere
today (e.g. mobile communication, mobile computing, medical implants,
deep space applications, etc.) almost exclusively using batteries as the power
source. The limited and costly power offered by batteries has placed another
constraint on the design of such mobile systems: how to reduce the power
consumption.
In general, the power usage and the voltage support of a processor are
important for the following reasons:
Newer processors strive to add additional features, integrate more and more
peripherals and to run at faster speeds, all of which tend to increase power
consumption.
There are three conditions that have to be considered when planning for a
microcontroller’s power consumption in an application.
1. The “intrinsic power,” which is the power required just to run the
microcontroller.
2. The “I/O drive power,” which takes into account the power consumed
when the microcontroller is sinking/sourcing current to external I/O
devices.
3. The power consumed when the microcontroller is in “sleep” or “standby”
mode and is waiting with clocks on or off for a specific external event.
Taking these three conditions into account when planning an application
can extend its operating life from just a few hours to literally
several months.
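A duty-cycle-weighted average of the three contributions gives a quick battery-life estimate. The current figures below are invented for illustration, not taken from this chapter:

```python
def battery_life_hours(capacity_mah, active_ma, io_ma, sleep_ma, active_fraction):
    """Battery life from a duty-cycle-weighted average current.
    active_ma : intrinsic (run-mode) current
    io_ma     : extra current while sinking/sourcing I/O
    sleep_ma  : standby current while waiting for an external event
    """
    avg_ma = active_fraction * (active_ma + io_ma) \
             + (1 - active_fraction) * sleep_ma
    return capacity_mah / avg_ma

# Always on: a couple of days. Awake only 1% of the time: months.
always_on   = battery_life_hours(1000, 20, 5, 0.01, 1.0)    # 40 hours
duty_cycled = battery_life_hours(1000, 20, 5, 0.01, 0.01)
assert round(always_on) == 40
assert duty_cycled > 24 * 90    # more than three months
```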
The addition of variable clocks, sleep mode and low supply voltage has
tended to shift the power issue away from the CPU toward peripheral devices
and other components. Due to the increase in integration, the IC now represents
a complete system: many ICs now include on chip numerous power-
consuming peripherals alongside embedded cores. This means that when
speaking of power consumption, we must consider the power consumption of
the entire system. Overall power consumption differs considerably depending
on the system design and the degree of integration. Increasingly the processor
core is only a small part of the entire system.
It is very important for a design group to measure the performance and the
capabilities of the product they just finished and to determine that it is capable
of accomplishing its mission. There are a number of metrics that indicate the
overall performance of a system. The most popular are reliability, availability
and system effectiveness. Reliability (as will be explained later) is the proba-
bility of continued successful operation, whereas availability (see later) is the
probability that the system is operational and available when it is needed. Sys-
tem effectiveness is the overall capability of a system to accomplish its mission
and is determined by calculating the product of reliability and availability.
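As defined here, system effectiveness is simply the product of the two probabilities; for example:

```python
reliability   = 0.98   # probability of continued successful operation
availability  = 0.95   # probability the system is operational when needed
effectiveness = reliability * availability   # overall mission capability
assert abs(effectiveness - 0.931) < 1e-9
```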
The concept of system effectiveness was introduced in the late 1950s and
early 1960s to describe the overall capability of a system for accomplishing its
intended mission. The mission (to perform some intended function) was often
referred to as the ultimate output of any system. Various system effectiveness
definitions have been presented. The definition in ARINC is:
From the above simple discussion, it is clear that any model for system
effectiveness must include many different attributes. Many of the design met-
rics given in Section 1.2 represent some of the attributes. Besides the
reliability, maintainability, serviceability, availability and design adequacy
mentioned in Section 1.2, they include:
• Repairability: the probability that a failed system will be restored to
operable condition in a given active repair time
• Capability: a measure of the ability of an item to achieve mission
objectives given the conditions during the mission
• Dependability: a measure of the item operating condition at one or more
points during the mission, including the effects of reliability, maintain-
ability and survivability, given the item conditions at the start of the
mission
• Human performance: Human performance and environmental effects
are attributes used in some system effectiveness models.
In the following, we discuss some of the attributes (also design metrics)
used in many of the system effectiveness models.
1.6.1.1 Reliability
Reliability is an attribute of any component (software, hardware, or a network,
for example) that consistently performs according to its specifications. It has
long been considered one of three related attributes that must be considered
when making, buying or using any product or component. Reliability, avail-
ability and maintainability (RAM for short) are considered to be important
aspects to design into any system. (Note: sometimes the term serviceability
is used instead of maintainability. In this case RAS is used instead of RAM).
Quantitatively, reliability can be defined as the probability that the system
will perform its intended function over the stated duration of time in the
specified environment for its usage. In theory, a reliable product is totally free
of technical errors; in practice, however, vendors frequently express a product’s
reliability quotient as a percentage.
Software bugs, instruction sensitivity and problems arising from the limited
durability of EEPROM and Flash memories (the nature of the EEPROM
architecture limits the number of updates that may be reliably performed on
a single location; this limit is called the durability of the memory) are some
of the possible reasons for the failure of embedded systems.
Reliability of a system depends on the number of devices and connections
used to build the system. As the number of devices and the number of in-
terconnections increases, the chance of system unreliability becomes greater,
since the reliability of any system depends on the reliability of its components.
The relationship between part reliability and system reliability depends
on the way the components are connected; one such arrangement, the parallel
configuration, is shown in Figure 1.10. There are n paths connecting input
to output, and the system fails only if all the n components fail. This is
sometimes called a redundant configuration. The word “redundant” is used
only when the system configuration is
deliberately changed to produce additional parallel paths in order to improve
the system reliability. Thus, a parallel system may occur as a result of the
basic system structure or may be produced by using redundancy specifically
to enhance the reliability of the system.
In a parallel configuration consisting of n components, the system is suc-
cessful if any one of the n components is successful. Thus, the reliability of a
parallel system is the probability of the union of the n events A1, A2, . . . , An,
which can be written as:
Rp = ρ(A1 ∪ A2 ∪ · · · ∪ An)
   = 1 − ρ(Ā1 ∩ Ā2 ∩ · · · ∩ Ān)
   = 1 − ∏(i=1..n) ρ(Āi) = 1 − ∏(i=1..n) {1 − ρ(Ai)}
The last equality holds since the component failures are independent.
Therefore, the reliability of a parallel system is:
Rp = 1 − ∏(i=1..n) (1 − ri)     (1.11)
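Equation (1.11) is easy to evaluate. The sketch below also includes the usual series-configuration formula Rs = r1·r2···rn, assumed here to be the Equation (1.10) cited next (its derivation precedes this excerpt):

```python
from math import prod

def series_reliability(r):
    """Assumed form of Eq. (1.10): a series system works only if every part works."""
    return prod(r)

def parallel_reliability(r):
    """Eq. (1.11): a parallel system fails only if every path fails."""
    return 1 - prod(1 - ri for ri in r)

parts = [0.9, 0.9, 0.9]
assert abs(series_reliability(parts) - 0.729) < 1e-9    # 0.9^3
assert abs(parallel_reliability(parts) - 0.999) < 1e-9  # 1 - 0.1^3
```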
Equations (1.10) and (1.11) show how the configuration of the components
affects the system reliability. In addition, it is possible to recognize two distinct
and viable approaches to enhance system reliability; one on the level of the
components and the second on the level of the overall system organization.
1. Component technology: The first approach is based on component tech-
nology; i.e., manufacturing capable of producing components with the
highest possible reliability, followed by parts screening, quality control,
pretesting to remove early failures (infant mortality) etc.
2. System organization: The second approach is based on the organization
of the system itself i.e., fault-tolerant architectures that make use of pro-
tective redundancy to mask or remove the effects of failure, and thereby
provide greater overall system reliability than would be possible by the
use of the same components in a simplex or non-redundant configuration.
When designing for reliability, the primary goal of the designer is to find
the best way to increase system reliability. Accepted principles for doing this
include:
1. to keep the system as simple as is compatible with performance
requirements;
2. to increase the reliability of the components in the system;
3. to use parallel redundancy for the less reliable components;
4. to use standby redundancy (hot standby) components that can be switched
in when failures occur;
5. to use maintenance repair where failed components are replaced but not
automatically switched in;
6. to use preventive maintenance such that components are replaced by new
ones whenever they fail, or at some fixed time interval, whichever comes
first;
7. to use better arrangement for exchangeable components;
8. to use large safety factors or management programmes for product
improvement.
1.6.1.2 Maintainability
A qualitative definition of maintainability M is given by Goldman and Slattery
(1979) as:
“…The characteristics (both qualitative and quantitative) of material design
and installation which make it possible to meet operational objectives with
a minimum expenditure of maintenance effort (manpower, personnel skill,
test equipment, technical data, and maintenance support facilities) under
operational environmental conditions in which scheduled and unscheduled
maintenances will be performed”
1.7 Summary of the Chapter
In this chapter we discussed the design metrics
that have a direct effect on the performance and the efficiency of a system. We
studied several measures of performance and compared them. These measures
can start at a high level with big O analysis and then move to a lower level
of detail when examining the performance of the system. We introduced
several metrics for assessing embedded/digital system performance and some
ways of reducing power consumption, including power management.
2.1 Introduction
In Chapter 1 we defined the design of a system as “the task of defining the
functionality of a system and converting that functionality into a physical
implementation, while satisfying certain constrained design metrics and op-
timizing other design metrics”. We also mentioned that the functionality of
modern microelectronics and embedded systems is becoming more and more
complex, and that creating a physical implementation that satisfies the constraints
is also very difficult. The reasons for this design complexity come from the
hundreds of options that designers can use to design and implement the
target system. The design is not a one-step process; it is a multi-step operation.
At each step there are challenges for the designer that require some decisions.
Some of the challenging questions that the designer will face are: which IC
technology is he going to use? Is the software solution better than the hardware
one?
56 A System Approach to Digital System Design
[Design-flow figure: user’s need → requirement analysis → requirement
definition → specifications (functional specifications) → functional design
(system architecture) → target system solution.]
The user’s need is studied to determine what the user wants or needs the
system to do. It may be necessary for the designer at this stage to talk with
everyone involved, directly and sometimes indirectly, with the system under
design (i.e. all the
stakeholders); to listen to different opinions; to see how the design might affect
the people who have to work with it and what they are expecting from the new
system. It is important for the designer to take his time to look at different
views of the problem, to look at it from both the inside and the outside. This
will end with a list of requirements. These requirements are defined from the
user’s point of view, and without any reference to possible solutions. Some
non-technical matters that may also be part of the requirements analysis are the
determination of the overall schedule and budget. The results of the analysis
are documented in a requirements definition.
2.2 System Design Flow 59
2.2.2 Specifications
Based on the requirements definition, the essential functions of, and opera-
tional constraints for, the system are determined. Functional specifications
define system features that can be directly tested. In addition to required
functions, goals and objectives are also specified. What the system should do
is determined, but not how it will be done. The user interface to the system is
also defined. The desired functions and user interface are clearly documented
in a functional specification.
The functional specifications outline the functional requirements of the
entire system as supplied by the user. They may be altered during the de-
sign process to accommodate additional detailed specifications. However, the
specification should be complete and formalized before the final stages of
the design process take place, and should be considered to be unalterable.
Sometimes it may be absolutely necessary to make changes to the specification
but this should not be a normal requirement.
[Hardware design flow figure: functional specifications → hardware design
(processors, buses and connections) → implementation (processor selection:
microprocessor/microcontroller, ASIP, or custom processor (ASIC)) →
prototype and testing → solution.]
will be used, how fast they are and so on. This enables the test procedure that
will be used on the completed prototype to be identified, and the test parameters
determined. Testing should always be performed to the initial specification,
and not just to verify that the hardware is working.
each functional block and which IC technology we are going to use for imple-
mentation. For example, the microprocessor can be selected as the execution
unit based upon an optimization of various parameters. These might include
speed of execution, data bus size, and so on, which can be considered to be
the technical considerations. These are relatively easy to decide unless the
product being designed is a very special one requiring, for example, a very
high speed of execution and/or a very high accuracy. In such cases we have to
use a custom (single-purpose) processor.
However, there are non-technical parameters which influence these deci-
sions and often these parameters are considered first, and can be termed the
product production considerations. These include the non-ideal parameters
related to cost. If an investment has previously been made in a particular type
of processor development aid, then unless there is some extremely important
overriding technical reason, such as those given above, that processor
will be used in all new products.
b) IC Technology selection
This stage starts by selecting the IC technologies (VLSI, ASIC, PLD,
FPGA, etc.) that will be used in implementation. The microprocessor, memory,
and peripheral ICs are chosen and their interconnection to implement the sub-
functions is specified. Timing analysis is performed. Detailed analogue and
digital circuit design is carried out. Schematic diagrams and timing diagrams
documents are implemented. A hardware prototype is then constructed.
The selection of the processor technology and the IC technology during
the implementation stage is one of the main topics of “embedded system
design”. The reader can find more about this in any book on embedded
systems.
We note here that hardware design and implementation may involve design
at any or all of the following levels:
• component
• printed circuit board
• system
Some systems are designed entirely at the component level. At this level
of design, you must select and specify the interconnection of discrete ICs.
This step is typical for small embedded systems consisting of only one or a
few printed circuit boards.
The hardware for other systems may be implemented at the board level.
At this level, one can select commercially available printed circuit boards
that provide the required subsystems. These boards are combined by plugging
them into a backplane that carries the system bus connections. The backplane
and bus signals may correspond to one of numerous bus standards. These
bus standards are independent of a specific microprocessor. If a subsystem
function is not commercially available, then it is necessary for you to design,
from the component level, a board that provides the required function and is
compatible with the system bus.
At the system level, hardware is purchased as a complete, or essentially
complete, system; if the complete system hardware cannot be purchased, it
may still be necessary to design one or more special function circuit boards,
compatible with the system bus, to complete the system.
Successful hardware design at the component and board levels requires
that you have a thorough understanding of logic and circuit design principles
and an understanding of the function and operation of commonly used ICs.
The design of a board for use at the system level requires an understanding of
bus interface design and the details of the particular system bus used.
In Section 2.4 we shall discuss “IC Technology” in detail and intro-
duce the design constraints that guide the designer in selecting the suitable
IC technology for implementing his design.
either a functional equivalent or a test ASIC, so that faults and errors can be
detected. The ASIC can then be made with a high degree of confidence that it
will work first time.
Physical prototyping considers the electrical and mechanical connec-
tions. Whether the circuit should be soldered together on a standard printed
circuit board or wire wrapped is considered here. Wire wrap is easier to
construct, quicker and easier to make changes to, but the sockets are expensive
and the method has a maximum useful frequency of 8–10 MHz before signals
become corrupted. It is also a method which cannot be used for the final
product as once the design is finalized and no more changes are to be made, it
takes a long time to produce a completely connected circuit, when compared
to using a PCB.
Alternatively a custom printed circuit board (PCB) could be produced
if high frequency operation is required, although the cost involved means
that it would usually only be used for the production version, and not just
for functional testing. This would mean producing any ASICs before final
prototype testing which would require parts of the product to be developed
in isolation, converted into ASICs, tested, and then placed into the complete
product for final prototyping and testing. Finally the product has to be tested
against the original specification, as that is the measure of the success of the
technical design. Commercial considerations involved in the marketing of
products are not considered here. At this stage testing will revert to the black
box approach, see Figure 2.4, where the product is considered only in terms
of its inputs and outputs. Following this stage of testing, there will be further
product testing when the software and hardware are combined into the final
complete product.
Software design has much the same approach as hardware design in that the
functional software description is derived from the system specification. This
is then followed by the implementation stage and finally the prototype and
testing stages.
2.2.5.2 Implementation
Once the functional design has been completed then the implementation
can be performed. This involves the selection of the appropriate assembler
and/or compiler(s). A microprocessor will generally have only one assembly
language so that most assemblers will be similar. However, as there are a
variety of High Level Languages (HLL), a decision has to be made as to which
HLL to choose, and then which compiler to select and from which company.
Although nominally all compilers for a particular HLL are the same, there
are considerable differences in speed of compilation, the size and speed of
execution of the code produced, and other factors, such as the ability to link
easily to assembly language programmes. Some of these decisions are also
similar to those used for selecting the microprocessor and its development
aids, and are based on commercial as well as technical considerations.
Each HLL has its own advantages: some are good for programmes
involving large amounts of calculation, some are good for block-structured
programmes, and others (e.g. C and C++) are good for maintaining close
control of the hardware of the computer and for producing compact,
fast-executing programmes.
A large programme, which has several of these characteristics, could be
written in several HLLs as well as in a low-level language (LLL), i.e. mixed
assembly and HLL code.
This would enable the most efficient language to be used in different parts of
the programme in order to produce the optimum solution. However this would
be unusual and normally the most appropriate HLL would be chosen for the
complete programme.
[Design-flow figure (continued): system integration → system validation →
solution: operation, evaluation and maintenance.]
is the same, but that the hardware of the design is adapted for production. This
may involve designing a PCB or using programmable logic arrays (PLA) rather
than discrete logic and using surface mount components so that automated
assembly techniques can be employed and so on.
The pre-production prototype is then tested against the specifications again
to ensure that it is still functionally correct. Finally the production version is
made which should be identical to the pre-production version, but will be
made in the numbers required. It is only when large numbers are being made
that difficulties in the production version may occur, due to variations in the
production processes themselves. It is the job of the production engineer to
ensure that these difficulties are overcome.
(Figure: the system specifications are split into hardware specifications and software specifications/requirements; the hardware path covers design, hardware implementation and IC technology, the software path covers the processor, code and programming, each guided by the appropriate design and engineering technology.)
The selection of the proper technology at each stage determines the efficiency
with which the design can be converted from concept to architecture, to logic
and memory, to circuit, and ultimately to physical layout. The efficiency of
accomplishing any task or subtask is measured, as mentioned before, by the
design metrics. The designer's decision to select one among many available
technologies is guided by the design metrics: which metrics he has to fulfill
and which he can optimize.
In the following sections the various technologies the designer uses during
the different design phases are considered.
Design technology deals with the sequence of steps the designer follows
to convert the desired system functionality (or system requirements) into
an implementation. During the design and implementation process, it is
not enough to optimize the design metrics; it is important also to do so
quickly. This has a direct effect on productivity and time-to-market. Any
delay in the time-to-market reduces the revenue and may cause the manufacturer
to incur losses rather than the expected profit. This explains why improving
design technology, to enhance productivity and reach the market in time,
has been a main concern of software and hardware engineers for some
decades now.
Different tools and design approaches are available to the reader in many
embedded systems books. One of the main design tools that helps the designer
of a complex digital system, and which will be briefly introduced here, is the
concept of "design partitioning". Two general design approaches are introduced
briefly here: the structured design approach, and the multiple description domains
(Y-chart) approach. The first approach uses the concepts of hierarchy,
regularity, modularity and locality to handle the complexity of any system. The
Y-chart, or behavioral/structural/physical domains description, describes
the system in three domains: behavior, structure and physical.
2.4 Design Technology 73
The first common technique that can be used to design any complex system is
the use of multiple views to provide differing perspectives. Each view contains
an abstraction of the essential artifact, which is useful in aggregating only the
information relevant to a particular facet of the design. In microelectronics, for
example, a VLSI circuit can be viewed physically as a collection of polygons
on different layers of a chip, structurally as a collection of logic gates, or
behaviorally as a set of operational restrictions in a hardware-description
language. A good designer is one who can switch between the different views
when building his circuit, because each view has its own advantages
and merits that can help the design process. The concept of multiple views
is not confined to microelectronic system design; it is in use in many
other disciplines. For example, a mechanical engineer will use the multiple-views
concept for better understanding of the project under design. Another
example is the use of this concept in the civil engineering field. We cannot
say that the set of floor-plans produced by an architect is enough to build
the house. More accurately, these floor-plans reflect the architect's view of
the project. It is also necessary to know how the civil engineer sees the
project (the design details of the structure), the view of the
plumbers (plumbing plans), of the electrical engineers (electrical wiring plans), etc.
In many cases it is possible to derive some views from others,
but it is more useful to consider the different views separately; the different
ways of thinking then serve as a form of "double check" for the designer.
To make the concept of views more understandable in the case of designing a
digital system, Gajski developed the Y-chart shown in Figure 2.8. The radial
lines on the Y-chart represent three distinct design domains ("view" is replaced by
"domain" in the Gajski Y-chart): behavioral, structural, and physical. These
domains can be used to describe the design of almost any artifact and thus
form a very general taxonomy for describing the design process. Within each
domain there are a number of levels of design abstraction that start at a very
high level and descend eventually to the individual elements that need to be
aggregated to yield the top-level function (in the case of chip design, the
transistors).
The behavioral domain describes what a particular system does or what
we wish to accomplish with a system. For instance, at the highest level we
might state that we desire to build a chip that can generate audio tones of spe-
cific frequencies (i.e., a touch-tone generator for a telephone). This behavior
which in turn allows the required behavior. At high levels this might consist
of an engineering drawing showing how to put together the keyboard, tone
generator chip, battery, and speaker in the associated housing. At the top chip
level, this might consist of a floor plan, and at lower levels, the actual geometry
of individual transistors.
Using the Y-chart, the design process can be viewed as making transfor-
mations from one domain to another while maintaining the equivalency of the
domains. Behavioral descriptions are transformed to structural descriptions,
which in turn are transformed to physical descriptions. These transformations
can be manual or automatic. In either case, it is normal design practice to verify
the transformation of one domain to the other by some checking process that
compares the pre- and post-transformation design. This ensures that the design
intent is carried across the domain boundaries. Hierarchically specifying each
domain at successively detailed levels of abstraction allows us to design very
large systems.
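The behavioral/structural distinction can be made concrete with a one-bit full adder. The minimal C sketch below states, in the behavioral version, only what is computed, while the structural version builds the same function from gate primitives; checking that the two agree is exactly the kind of cross-domain verification the text describes.

```c
#include <assert.h>

/* Behavioral domain: WHAT the adder does (pure arithmetic).
   Result is a 2-bit value: (carry << 1) | sum. */
static int full_adder_behavioral(int a, int b, int cin) {
    return a + b + cin;
}

/* Structural domain: HOW it is built from gates. */
static int xor2(int a, int b) { return a ^ b; }
static int and2(int a, int b) { return a & b; }
static int or2 (int a, int b) { return a | b; }

static int full_adder_structural(int a, int b, int cin) {
    int s1    = xor2(a, b);
    int sum   = xor2(s1, cin);
    int carry = or2(and2(a, b), and2(s1, cin));
    return (carry << 1) | sum;
}
```

Verifying the two descriptions against each other for all input combinations mirrors the pre- and post-transformation comparison performed when moving between domains.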
The Y-chart is rearranged in Table 2.1 to show the tools that may be used
at each abstraction level in the different domains.
We need to describe the design in one way or another at each and every
stage. The representations and notations used for describing the design are
extremely important because they affect how designs can be viewed and
refined. The most appropriate description for a design depends on which level
of design the designer is working at and on the type of system he is designing.
As an example, as shown in Table 2.1, top-level specification could be in
natural language (as in many formal specifications), in equations form (such as
Table 2.2 Comparison between using hardware description languages and block diagrams.

Advantages of schematic capture:
• Good for multiple data flow
• Gives an overview picture
• Relates better to hardware
• No need to be good in computing
• High information density
• Back annotations possible
• Mixed analogue/digital possible

Advantages of hardware description languages:
• Flexible, and can be described through parameters
• Excellent for optimisation and synthesis
• Direct mapping to algorithms
• The best technique for designing datapaths
• Readily interfaced to an optimiser
• Easy to handle and transmit (electronically)
actions and functions into self-contained blocks with defined interfaces to
the other blocks. Each block is then subjected to a top-down design which
considers the actions required of each block. These actions are then broken
down into smaller sub-actions, which are effectively simpler blocks with simpler
interfaces. These simpler blocks are then broken down into even simpler
blocks. This process is repeated until the blocks being produced contain only
single actions which can be implemented directly. As the blocks and the
interfaces between all the blocks are completely defined, by implementing
the lowest level of blocks the product is also implemented completely.
This approach is taken from a well-known software strategy: to
handle a large software programme, split it into smaller and smaller sections until
we reach simple functions and simple subroutines that can be easily interfaced.
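As a toy illustration of this decomposition, the C sketch below breaks a hypothetical temperature-alarm function into single-action blocks with defined interfaces. The function names, the scaling formula and the limit values are all invented for illustration.

```c
#include <assert.h>

/* Lowest-level blocks: each contains a single action. */
static int read_raw(int sample) {           /* stand-in for an ADC read */
    return sample;
}
static int raw_to_celsius(int raw) {        /* assumed scaling, for illustration */
    return raw / 4 - 10;
}
static int over_limit(int celsius, int limit) {
    return celsius > limit;
}

/* Top-level block: composed purely of the simpler blocks below it,
   through their defined interfaces. */
static int temperature_alarm(int sample, int limit) {
    return over_limit(raw_to_celsius(read_raw(sample)), limit);
}
```

Implementing the three lowest-level blocks implements the product: the top-level function contains no logic of its own beyond composing them.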
The idea behind this approach is the fact that any system can be more
easily understood at the top level by viewing units as black boxes with well-
defined interfaces and functions rather than looking at each individual basic
component, e.g. transistor. Consider for example a radio system: a radio may
have hundreds of parts in it, but they are more easily viewed when divided into
groups such as the tuner, amplifier, power supply, and speaker. Each group
can then be viewed in subgroups; for example, by dividing the amplifier into
its first stage and its second stage. The bottom of the structural hierarchy is
reached when all the parts are basic physical components such as transistors,
resistors, and capacitors. This hierarchical composition enables a designer to
visualize an entire related aspect of some object without the confusing detail
of subparts and without the unrelated generality of superparts. For example,
the hierarchical view of an automobile fuel-supply system is best visualized
without the distracting detail of the inner workings of the fuel pump and also
without the unrelated generality of the car’s acceleration requirements and
maximum speed.
Use of the hierarchical structure approach is the best way to design
any moderately complex system. Top-down and bottom-up
are the most common hierarchical organizations. In top-down design a circuit is
designed with successively more detail; in bottom-up design a circuit is viewed with
successively less detail.
(Figure: examples of hierarchical decomposition: a multi-digit counter/display chain built from 7490 decade counters feeding 7447A code converters; a subtractor decomposed, level by level, into add, negate (complement and increment), sign and multiplexer blocks that select the bigger value; and a microprocessor built up from 1-bit to 8-bit units.)
into submodules until we reach a stage where we cannot divide further, it is very
possible to end up with a huge number of different submodules. The design and
implementation of such a large number of submodules represents a new, complex
problem. This indicates that using the concept of hierarchy alone does not solve
such a complex problem; other design concepts and design approaches are
also needed to further simplify the design process. Normally the concepts of
regularity, modularity and locality are used together with hierarchy. The reader
can get more information on this subject from any embedded systems book.
In Section 2.6 we will discuss the topic "processor technology": the type of
processor that the designer is going to use to implement the data execution
and manipulation unit in his design. A wide class of designers uses standard
off-the-shelf components. Three of the main standard off-the-shelf components
are the microprocessor, the microcontroller and the digital signal processor (DSP).
The use of these components results in a programmable digital system; the system
functionality can be changed by reprogramming without the need, in most
cases, for major hardware changes. The use of a microprocessor/microcontroller
as the core building block of the digital system results in what is called a
"microprocessor-based system" or "microcontroller-based system". Using
such a technique helps the designer to achieve a short time-to-market, lower NRE cost,
and more flexibility for the system.
Because of the wide spread of such digital systems, we consider in
this section how to use the structured top-down approach for designing such
systems; that is, designing a programmable digital system.
Designing a microprocessor/microcontroller-based system starts, as for any
other system, by creating the specifications of the system under design. These
can be obtained by considering what actions are required of the system and
then detailing all of them and their side-effects completely. The specifications
are then implemented. If there are any errors in the specifications, they will be
implemented in the end product (the system), so it is very important to ensure
that the specification is error-free. One method of producing the specifications
is to consider what actions and functions the system should perform. Some
will be directly defined by the nature of the system; others will have to be
deduced.
Microprocessor/microcontroller-based systems, like any programmable system,
have several basic and common functions that they are normally required
to perform:
Because of these common features of the two systems, it is expected that the
operational block diagram, and in many cases, the functional block diagram,
on the higher level of abstraction will be the same for them.
the interface:
• The functions inside the boundary are only allowed to communicate with
functions outside the boundary through defined interfaces.
An interface can be considered to be the connection between dissimilar
systems and in this context it will be a combination of hardware and software.
The use of interfaces enables the functions inside the boundary to be
invisible to those outside so that the block functions can be specified in terms
of the interfaces. A block may then be replaced at any time with an alternative
block, provided the interface is maintained.
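This replaceability can be sketched minimally in C by treating the interface as a function-pointer type: any block matching the signature can be plugged in. The block names and scale factors below are hypothetical.

```c
#include <assert.h>

/* The defined interface: any scaling block must match this signature.
   (Names and factors are invented for illustration.) */
typedef int (*scale_fn)(int raw);

static int scale_v1(int raw) { return raw * 2; }           /* original block  */
static int scale_v2(int raw) { return (raw * 205) / 100; } /* replacement block */

/* The rest of the system depends only on the interface, so either
   block can be substituted without changing this caller. */
static int process(scale_fn scale, int raw) {
    return scale(raw);
}
```

Because `process` sees only the interface, swapping `scale_v1` for `scale_v2` requires no change anywhere else, which is the point made in the text.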
If the system under design deals with a lot of input data coming from
different input units, then there is a possibility of errors in the data
entered. To avoid the effects of such a situation, it is recommended to use
interfaces that contain error-detection procedures, to ensure that errors are
not transmitted throughout the entire system and that detected errors are handled
in a specified and controlled manner. Testing can then be reduced to testing at
the interfaces, to ensure that if correct data enters then correct results are
produced, and that incorrect data is detected and prevented from propagating
throughout the entire system. If a system is found to be producing incorrect
results, it is only necessary to trace the erroneous data at the interfaces until the
block which has an error-free input but a faulty output is found. The error
has then been isolated within that particular block.
If the functions inside the boundary do not generate errors and the designer
still discovers the presence of errors, then the errors must have entered via the
interfaces. Therefore, as a general rule, to minimize the possibility of erroneous input,
the number of interfaces per block in Figure 2.16 should be minimized. In order
(Figure: top-level block diagram of the microcomputer or microcontroller system)
which identifies each operational block and the interfaces each requires. In
this example, each of the blocks has a single interface to the operating system,
except for the operating system itself, which has three interfaces corresponding to
the three blocks. Each block is complete in itself, with the defined interfaces
indicating the how, when and why of data and information communication
with the other blocks. The interfaces are the only means of transferring data
and information and can be hardware, software, or a combination of the two.
The number of levels in the top-down hierarchy defines the number of
interfaces required. This, in turn, defines the type of interface used: software,
hardware, or a combination.
The top-down approach is then used with each of the four blocks. In the
following we consider, as an example, the user interface block (the
rest will be considered in the Case Study). The user interface performs two
basic operations: input of data and commands from the user, which forms one
sub-function, and output of data and results, as a second sub-function. The user
interface, accordingly, can be divided into a number of sub-functions depending
on the number and types of the input and output devices connected to the
system. In Figure 2.19 we divided the user interface into four sub-functions
which make up its operations: the main sub-function is the link to the
operating system, plus three interfaces to specific I/O devices.
(Figure 2.19: the user interface divided into an interface to the operating system and interfaces to one input device and two output devices, each device interface linking through the operating system.)
In this way, the blocks can communicate in a structured way with the higher
levels. It is clear that in Figure 2.19 we assumed one input unit and two output
units. As there are many forms of output, such as visual display units (VDUs),
graphics displays, serial communication links and parallel printer ports and
printers, each of which needs to be handled in a different way, a block is
required for each one. Similarly for the input devices: if there are several
input devices, each one will require a separate sub-functional block. Input
devices can be sensors (in the case of data-acquisition systems), keyboards, etc. To
determine exactly the number and types of input and output devices, and what
actions each block has to perform, a detailed specification has to be available.
Effect of the number of top-down levels on the performance
In the top-down approach, if a sub-block has to communicate with a higher
level, this has to be performed via the interface link defined at the same level.
This achieves the error checking and correct formatting of data required, but
unfortunately it slows down the communication process, as each
additional interface takes a finite time to execute. This is one factor which has to
be optimized for a particular design. The use of well-structured communication
interfaces produces error-free, robust and reliable links, but with the drawback
that each level introduces additional delay.
(Figure 2.20: the design process across levels, slower per level moving down from the top level. The hardware content of the interfaces increases towards the lower levels (decreased complexity), while the software content increases towards the higher levels (increased complexity).)
As a general rule, the more levels the top-down design introduces, the
slower the communication process is. This fact is very important when the
designer is designing for high performance. It is very possible, in cases where
performance is the main design constraint (e.g. in real-time systems), that the
designer avoids using a programmable system as a solution and turns to other
approaches that satisfy the performance constraint. In many cases, it is
possible to improve the performance by using hardware, which is much faster,
to implement the interfaces at the lower levels. This is illustrated in Figure 2.20,
which also indicates that the higher levels are implemented mainly in software,
which is correspondingly slower. However, the data at the higher levels is more
significant and there is less of it, and this tends to reduce the impact of slow
high-level communication interfaces.
The reason the low-level functions are implemented in hardware is that
their interfaces are simpler and more easily implemented in hardware. As
more levels are created in the design process, the additional delay in the
communication links increases significantly, as seen in Figure 2.21. This only applies
to products containing both hardware and software, such as microcontrollers
and microcomputers.
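The cumulative cost of crossing interfaces can be pictured with a deliberately simple latency model; all the numbers are illustrative, not measured values.

```c
#include <assert.h>

/* Simple latency model: each hierarchy level adds a fixed interface
   overhead on top of the actual work. Values are illustrative only. */
static int response_time_us(int levels, int per_level_us, int work_us) {
    return work_us + levels * per_level_us;
}
```

With, say, 50 microseconds of overhead per interface, going from three levels to six levels adds 150 microseconds to every end-to-end transaction even though the useful work is unchanged, which is the trade-off the text describes.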
To determine the optimum combination of hardware/software interfaces,
the response time of the system is considered. This is known as the real-time
response: a system can be said to have a real-time response when
there is no user-perceivable delay from the initiation of an action to the
response of the system with the desired data and/or results. This makes the
assessment of real-time response subjective, and different users may
well view the same system in different ways. A user is not necessarily a
person but can be other computers and hardware systems. In general, good
requirement engineering solves such situations; good requirement engineering
means unambiguous requirements. Accordingly, the initial requirements for
the system will define exactly what the client means by real time and give
quantified values for the performance.
Figure 2.22 illustrates how the acceptable response time (the performance)
represents the limit to be placed on the interface execution time and,
accordingly, on the number of design levels.
The design level required is mainly determined by the complexity of
the product, as any product has to be reduced to similarly simple blocks. The
more complex a product, the more levels are required, assuming comparable
complexity at each level. Some variation in the number of levels is possible
by increasing the number of blocks per level to achieve the individual block
simplicity. This reduces the number of levels needed, but it also reduces the
advantage of using top-down design, and an alternative design process may be
required.
Once the design level and the interface execution times have been determined, a range of solutions from mostly hardware to mostly software can then be implemented.
If the response time cannot be achieved, it may be necessary to implement
a special interface linking a low level functional block with a block much
higher up in the design, as illustrated in Figure 2.23. However, this way is not
2.5 IC-Technology; Implementation Technology 95
built, leaving the rest of it, together with the upper layers (the connection
layers), to be finished according to the application; a gate-array ASIC comes with
arrays of gates already formed during the fabrication phase. In a standard-cell ASIC
technology, logic-level cells (forming the middle layers) are ready on-chip,
and the task of the user is to arrange and connect the cells in a way suitable
for the implementation of his application.
Programmable Logic Devices (PLD): In programmable logic device
(PLD) technology, all layers already exist; the layers implement a programmable
circuit. Programming, in this case, means creating or destroying
connections between the wires that connect the gates. This takes place by either
blowing a fuse or setting a bit in a programmable switch. PLDs have two main
types: programmable logic arrays (PLA), which consist of a programmable array of
AND gates and a programmable array of OR gates, and programmable array
logic (PAL), which consists of just one programmable array. During the past decade
a complex form of PLD known as the field-programmable gate array (FPGA)
has become very popular among embedded system designers. FPGAs offer
more general connectivity among blocks of logic, rather than just arrays of
logic as with PALs and PLAs.
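The sum-of-products idea behind a PLA can be sketched in C, with the "programming" reduced to choosing, for each product term, which inputs appear true, complemented, or not at all. Here the two product terms of XOR are hard-wired as an example; this is a behavioral model, not a description of any real device.

```c
#include <assert.h>

/* A tiny two-input PLA model. Each product term is "programmed" by
   flags saying whether each input participates and in which polarity,
   modelling the programmable AND array. */
static int term(int a, int b, int use_a, int a_true, int use_b, int b_true) {
    int r = 1;
    if (use_a) r &= (a_true ? a : !a);
    if (use_b) r &= (b_true ? b : !b);
    return r;
}

/* Programmable OR array: sum the selected product terms.
   XOR = (a AND NOT b) OR (NOT a AND b). */
static int pla_xor(int a, int b) {
    return term(a, b, 1, 1, 1, 0) | term(a, b, 1, 0, 1, 1);
}
```

Changing the flag values "reprogrammes" the array to a different sum-of-products function, which is the software analogue of blowing fuses or setting switch bits.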
In the next subsection a brief overview of PLD technology is given. For ASIC
design we refer the reader to any VLSI textbook for the details of VLSI
technology.
Often, the cost, speed, or power dissipation of a microprocessor may not meet
the system goals, and the designer has to find an alternative solution. In such
cases, going to full-custom IC technology to implement the required
task is normally not a feasible solution, because of the increased design time and the
very high NRE cost. In many such cases, the solution is to use programmable
IC technology. A variety of programmable ICs, known as programmable logic
devices (PLD), are available nowadays; they can give a solution that is more
efficient than a general-purpose microprocessor and, at the same time, faster to
develop than a dedicated IC, without needing much investment. In contrast to
the dedicated IC, which comes with fixed-function logic determined at the time
of manufacture that cannot be changed, the PLD is supplied to the user with
no logic function programmed at all. It is up to the designer to make the PLD
perform in whatever way a design requires; only those functions required
by the design need be programmed. The PLD offers the digital circuit designer
the possibility of changing the design function even after it has been built. A PLD
can be programmed, erased, and reprogrammed many times, allowing easier
prototyping and design modification.
In programmable logic device (PLD) technology, the word "programming"
has a lower-level meaning than in software programming. The programming that
takes place may consist of creating or destroying connections between the wires
that connect gates, either by blowing a fuse or by setting a bit in a programmable
switch. PLDs can be programmed from a personal computer (PC) or
workstation running special software.
PLD is a generic term. There is a wide variety of PLD types, including
PAL (programmable array logic), GAL (generic array logic), EPLD (erasable
PLD or electrically PLD), CPLD (complex PLD), FPGA (field-programmable
gate array) as well as several others.
A CPLD normally means a device based on a programmable sum-of-products
(SOP) array, with several programmable sections that are connected
internally. In effect, a CPLD is several interconnected PLDs on a single chip.
This structure is not apparent to the user.
Processors

y(n) = Σ_{k=0}^{9} b_k x(n − k)        (a)
The three types of processors that can implement this functionality are
shown in Figure 2.27b, Figure 2.27c, and Figure 2.27d, respectively.
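On a GPP this functionality is simply a software loop. A minimal C sketch of the 10-tap FIR sum above follows; the coefficient values and the convention of treating samples before x(0) as zero are illustrative choices, not part of the original specification.

```c
#include <assert.h>

#define TAPS 10  /* matches the sum k = 0 .. 9 above */

/* Direct-form FIR filter: y(n) = sum over k of b[k] * x(n - k).
   Samples before x[0] are treated as zero (an assumed convention). */
static int fir(const int b[TAPS], const int x[], int n) {
    int y = 0;
    for (int k = 0; k < TAPS; k++)
        if (n - k >= 0)
            y += b[k] * x[n - k];
    return y;
}
```

With all coefficients set to 1 the filter degenerates to a running sum, which makes the behaviour easy to check by hand.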
• NRE costs: NRE costs are low. The NRE needed to develop the GPP
itself may be very high, but the manufacturer produces it in large
quantities and distributes the NRE cost equally among all the units
produced, so the share of NRE per processor is expected to be very small.
On the other hand, the cost of writing the required programme counts as
NRE of the system under design, but this cost is very small compared to
the cost of designing dedicated hardware.
• Flexibility: Flexibility is high. It is very easy to change the functionality
of the system by just writing a new programme, without the need for any
extra hardware.
• Unit cost: If the customer who ordered the system under design will use
only one unit of it, then using a GPP will reduce the cost: buying a
ready-made processor is much cheaper than designing one. On the other hand,
if the customer who ordered the system is planning to produce many of
them, it is very possible that the cost will be high. For large quantities it may
be cheaper to design your own processor than to use a GPP.
• Size: May be large. Using a GPP normally means that we are using extra
components not needed by the application.
• Power consumption: May be large. Again, the use of unnecessary
components increases the power consumption.
• Performance: A GPP may improve the performance if we are using it in
computation-intensive applications. In other applications, the use of a
GPP may degrade the performance, because using a GPP means
that the designer is selecting the software option for solving his problem.
(Figure 2.30: processor and IC technologies for implementing a digital circuit. Processor technology: general-purpose processor, ASIP, single-purpose processor. IC technology: TTL logic, CMOS logic and various microprocessors; full-custom circuits; semi-custom circuits. Full-custom ASIC chips carry very high risk, are expensive, need long design time, and suit high volume and high performance; semi-custom circuits are a good compromise between risk, cost, performance and design time.)
Keeping this fact in mind, Figure 2.30 summarizes the possible processor
and IC technologies that may be used to implement any digital circuit. The
figure gives the main features, advantages and limitations of each technology.
3.1 Introduction
In Chapter 2 we classified processors into three classes: the General Purpose
Processor (GPP), the Application-Specific Instruction-Set Processor (ASIP), and the
Single (Special) Purpose Processor (SPP). The microprocessor is designed to
be a GPP: it is designed to handle massive amounts of data of any type
(e.g. bytes, words, integers, reals, complex numbers, etc.) and it is possible to use it in any
application: just programme it. This explains why the microprocessor is the
brain of any general-purpose computing system, whether it is a microcomputer,
minicomputer, mainframe or even supercomputer. The microcontroller, on
the other hand, is one of the popular forms of ASIP; it is a processor optimized
to work in control-oriented applications.
The goal of this chapter is to introduce the microprocessor and microcontroller
to the reader: the structure, the main building blocks, the differences
and the capabilities. This chapter is necessary to give the reader an idea of the
rest of the book: the microcontroller resources, capabilities and applications.
Figure 3.1 depicts the overall architecture of a typical general purpose
embedded system that is traditionally known as microcomputer system. The
components shown in the figure are: the Input/Output unit, the memory unit,
114 Introduction to Microprocessors and Microcontrollers
(Figure 3.1: a typical microcomputer system. The CPU (microprocessor), containing the arithmetic/logic unit (ALU), the control unit and the bus control logic, is connected over the system bus to the timing circuitry, the memory unit (memory modules with their interface), and the input/output subsystem, whose interfaces connect I/O devices and mass storage devices to the bus.)
the arithmetic/logic unit (ALU), the control unit, timing circuitry, bus control
logic, and the system bus. The arithmetic and logic unit together with the control
unit form the central processing unit (CPU) of the microcomputer. When
the CPU is implemented as a single IC it is called a microprocessor unit (MPU).
The MPU is a general-purpose processor; its main purpose is to decode the
instructions given to it and use them to control the activity within the system.
It also performs all arithmetic and logical operations.
The timing circuitry, or clock, generates one or more trains of evenly spaced
pulses needed to synchronize the activity within the microprocessor and the
bus control logic. The pulse trains output by the timing circuitry normally have
the same frequency but are offset in time, i.e., are of different phases. Microprocessors
require from one to four phases, with earlier processors requiring more
than one phase. For many of the recent microprocessors, the timing circuitry,
except for the oscillator, is included in the same integrated circuit.
The memory is used to store both the data and the programmes that are
currently being used. It is normally broken into several modules, with each module
containing several thousand (or several hundred thousand) locations.
Each location may contain part or all of a datum or instruction and is associated
with an identifier called a memory address (or simply address). The CPU does
its work by successively inputting, or fetching, instructions from memory and
carrying out the tasks dictated by them.
The I/O subsystem may consist of a variety of devices, e.g. mass stor-
age units, analogue to digital converters (A/D), digital to analogue (D/A)
converters, etc.
The system bus is a set of conductors used to connect the CPU to its
memory and I/O devices. It is over these conductors, which may be wires in a
cable or lines on a printed circuit (PC) board that all information must travel.
The bus specifications determine exactly how information is transmitted over
the bus. Normally, the bus conductors are separated into three groups:
• The data lines for transmitting the information.
• The address lines, which indicate where the information is to come from
or is to be placed.
• The control lines, which regulate the activity on the bus.
The signals on the bus must be coordinated with the signals created by the
various components connected to the bus. The circuitry needed to connect the
bus to a device is called an interface, and the bus control logic is the interface
to the CPU.
Memory interfaces consist primarily of the logic needed to decode the
address of the memory location being accessed and to buffer the data onto or
off of the bus, and of the circuitry to perform memory reads or writes. I/O
interfaces may be quite simple or very complex. All I/O interfaces must be
capable of buffering data onto and/or off the system bus, receiving commands
from the CPU, and transmitting status information from their associated devices
to the CPU. In addition, those connected to mass storage units must be
capable of communicating directly with memory, and this requires them to
have the ability to control the system bus. The communication between an I/O
interface and the data bus is accomplished through registers which are referred
to as I/O ports.
The architecture of a computer defines what the processor can achieve, i.e. it is a specification of "what" the processor will do. It defines the functional capabilities of the computer, and it is also the interface between hardware and software. It is directly related to the instruction set, the addressing modes and the types of data that the computer can handle.
Computer architecture answers the question of "what" the processor will do, but does not answer the question of "how" to do it. Computer "microarchitecture" answers the "how" question. While the architecture is directly related to designing for functionality, the "how" is related to satisfying the design constraints, e.g. performance, power consumption, etc. The designer, for example, may use pipelining, super-pipelining, cache memory with different levels, fast adders, etc. to get better performance. It is possible to have more than one microarchitecture, all of them answering the same question of how to implement a given architecture. The microarchitecture is not visible to the user, but he can feel its effect: a user may execute his programme on two processors with different microarchitectures; the two will produce the same result, but one of them will execute the programme faster than the other.
The Intel x86 family is the best example of the difference between architecture (functionality) and microarchitecture (how to implement the functionality). The Intel 8086, 80286, 80386, 80486 and Pentium I up to Pentium IV represent different microarchitectures of the same architecture. The improvement in the performance of the x86 family members is entirely due to advances in the internal organization (the microarchitecture). The internal organization of the processor has, in general, a large impact on its performance, i.e. how fast the processor executes a programme (or instruction).
Many users and many authors use the two terms "architecture" and "microarchitecture" interchangeably. To avoid any misunderstanding, we prefer to use the terms "organization" or "structure" in place of the word "microarchitecture".
The internal organization (or microarchitecture) of any processor is broken down into different functional units, each of which implements a specific function. The flow of data between the different blocks takes place under the supervision of the control unit to guarantee that the instructions are executed correctly. The way of executing the instructions may change from machine to machine, but the organization of all of them contains the same basic functional units.
To make use of the functional capabilities of the processor, it must be supported by a minimum amount of hardware that facilitates its communication with the external world. The processor and the supporting hardware together form what we call the computer. Figure 3.1 represents the general organization of many computers. In Figure 3.1, the CPU administers all activity in the system and performs all operations on the data as described in the instruction set.
3.2 The Microprocessor 117
Figure 3.2 Internal organization of the CPU: internal buses, arithmetic unit, controller, clock/control unit, working registers, programme counter (PC), processor status register (PSR), stack pointer (SP), and I/O control logic.
The registers of the CPU are normally divided into the following groups: the working register group, the address register group and the special register group.
A POP operation removes the top element of the stack and deposits it into a register (normally the accumulator). Stacks are necessary to support interrupt and subroutine levels. They can be implemented in two essential ways: through hardware and through software.
A hardware stack is implemented directly on the microprocessor chip by a set of internal registers. If N registers are dedicated to stack operation, N is called the depth of the stack. The advantage of a hardware stack is the high
speed inherent to the use of internal registers, while the disadvantage is the
limitation imposed on the depth of the stack. Whenever the N registers are
full, the internal stack is full. So that the stack can continue to be used after it is
full, all of the N registers must be copied into the memory. This process gives
rise to another problem: one must be able to tell when the stack is empty or
full. Most of the manufacturers of the early microprocessors simply forgot to
incorporate these details into their designs, and most of the processors did not
provide a "stack-full" or a "stack-empty" flag (e.g. the Z80). Unfortunately, on these early microprocessors one can be merrily pushing items onto the stack only to have them fall through the last word, without any indication to the programmer that something might be wrong. Naturally, this is a programming
error; the programmer should know better. In practice, a flag would solve
this problem. Conversely, it is possible for the programmer to keep pulling
elements out of a hardware stack forever. For this reason, it is advisable to
have a “stack-empty” indicator.
The alternative to a hardware stack is a software stack. To provide for
unlimited growth, the stack is implemented in the read/write memory of the
system, i.e. the RAM. The base of the stack is arbitrarily selected by the
programmer. The top of the stack is contained and updated automatically
within the stack pointer (SP) register. Each time a PUSH operation is executed,
the SP is incremented or decremented, depending on the convention used (i.e.,
depending on whether the memory “grows” or “shrinks” from bottom to top).
Similarly, each time a POP is done, the stack pointer is immediately updated.
In practice, the SP usually points to the word above the last element of the stack; the goal is to provide the fastest possible PUSH operation. With this convention, the stack pointer can be used directly when saving a word on the stack, without having to wait until it is incremented.
In the case of microcontrollers, e.g. the Intel 8031/8051 and the Atmel AVR, the system uses an area within the internal SRAM as the system stack. In the case of the Intel 8051, the stack is kept in the internal RAM and is limited to addresses accessible by indirect addressing. These are the first 128 bytes on the 8031/8051 or the full 256 bytes of on-chip RAM on the 8032/8052. The programmer can reinitialize the SP with the beginning of the stack or he can let it retain its
default value upon system reset. The reset value of 07H maintains compatibility with the 8051's predecessor, the 8048, and results in the first stack write storing data in location 08H.
In other cases, such as the CodeVisionAVR C language compiler, the compiler implements two stacks: the system stack starts at the top of the SRAM area and is used to store return addresses and processor control information; the data stack starts below the system stack and works its way down through memory, in a manner similar to the system stack, and is used to store temporary data, such as the local variables used in a function.
In microcontrollers, as in any microprocessor, the stack is accessed explicitly by the PUSH and POP instructions, and implicitly during operations such as subroutine calls, function calls and interrupt servicing.
124 Introduction to Microprocessors and Microcontrollers
In the first case, the stack can be used in the evaluation of arithmetic expressions (see Example 3.2).
Use of stack
Some of the applications of the stack in a computer are:
(a) To store the return address: managing procedure calls and returns.
One of the important developments in programming languages is the use of procedures. A procedure is a complete programme incorporated within the main programme. It is used to achieve economy and, more importantly, regularity. A procedure allows the programmer to use the same piece of code many times, which saves effort and minimizes the storage area needed to store the programme. Modularity is achieved by dividing a complex programme into smaller programmes. A procedure call makes the programme branch from the current location to the procedure, execute it, and then return from the procedure to the place from which it was called. Figure 3.5 illustrates the use of a procedure. Actually, as shown in Figure 3.5, it is possible to call a procedure within a procedure;
(a) Nested procedure calls: the main programme calls P1, and P1 calls P2; (b) Stack contents in the initial state, after Call P1 (return address 2101 pushed), after Call P2 (return address 2601 pushed), and after the returns from P2 and from P1.
Figure 3.5 Use of stack to store return addresses.
this is called nesting of procedures. In such cases each procedure has its own return address. For proper operation, the return address of each procedure must be stored safely so that it can be used when the execution of the procedure completes. From Figure 3.5, it is clear that the order of returning from the nested procedures is the opposite of the order of calling them; in other words, they follow the last-in-first-out (LIFO) mechanism. This can be accomplished by using a stack: pushing the return addresses in the order of calling the procedures and popping them back in the reverse order after completing the procedures.
(b) To execute arithmetic operations.
Example 3.2 explains such an application. In general, to use the stack to perform arithmetic operations we have to convert the usual "infix" notation that we are familiar with into what is called "reverse Polish" or "postfix" notation. For example, the expression (A + B) ∗ C in infix notation takes the form AB+C∗ in reverse Polish notation. This subject is outside the scope of this book, which is why Example 3.2 shows how to use the stack to execute an expression written in the normal, i.e. infix, notation.
Example 3.1 PUSH and POP operations
This example describes the PUSH A and POP A instructions, where the contents of the accumulator are transferred to or read from the top of the stack (TOS). In the PUSH operation illustrated in Figure 3.6a, the contents of the accumulator are transferred to the top of the stack.
Figure 3.6 PUSH A instruction: (a) before the PUSH, SP = 1001; (b) after the PUSH, SP = 1002.
The contents of the stack pointer (SP), in our case 1001, point to the address of the first available location in the stack. After the PUSH is executed, the SP is automatically incremented to the value 1002 (see Figure 3.6b) and points to the new "first available location".
Conversely, POP A fetches the top element of the stack (see Figure 3.7)
and loads it into the accumulator. The initial value of SP was 1002. It is auto-
matically decremented to 1001 prior to the memory fetch. The final contents
of the stack appear in Figure 3.7b.
Example 3.2 Use of stack for expression evaluation
Consider the following programme fragment. Show the contents of the stack at the end of executing each step in the programme and give an expression that describes the value of the result Z (TOS denotes the top of stack).
PUSH V ; TOS ← V
PUSH W ; TOS ← W
ADD ; TOS ←(W + V)
PUSH V ; TOS ← V
PUSH X ; TOS ← X
ADD ; TOS ← (X + V)
MUL ; TOS ← (X + V) ∗ (W + V)
PUSH Y ; TOS ← Y
ADD ; TOS ← Y + (X + V) ∗ (W + V)
POP Z ; Z ← Y + (X + V) ∗ (W + V)
Figure 3.7 POP A instruction: (a) the SP has been decremented to 1001 prior to the POP; (b) the final contents of the stack.
Stack contents (TOS listed first) after each instruction:
PUSH V: V
PUSH W: W, V
ADD: (W+V)
PUSH V: V, (W+V)
PUSH X: X, V, (W+V)
ADD: (X+V), (W+V)
MUL: (X+V)∗(W+V)
PUSH Y: Y, (X+V)∗(W+V)
ADD: (X+V)∗(W+V)+Y
POP Z: empty; Z = Y + (X+V)∗(W+V)
Figure 3.8 Use of the stack.
Answer: The contents of the stack during the execution of each instruction of this programme are shown in Figure 3.8. The programme calculates the value of the expression:
Z = Y + (X + V) ∗ (W + V)
Two source operands (Source #1 and Source #2) enter the ALU; control lines select the operation; the result goes to the destination, and status outputs go to the status register.
Figure 3.9 Arithmetic logic unit block diagram.
The ALU has input lines for the two source operands, control lines to select the operation, and lines for the output of status indications (these originate in the logic circuitry associated with the ALU and terminate at the status register). One of the differences between the different available microprocessors is the source of the two operands required for execution and the destination of the result. In an accumulator-based processor (mainly a CISC structure), the accumulator represents one source, with another word as the second source; the accumulator in such a structure is also the destination of the result. In RISC processors, all the general-purpose registers are operational registers performing the function of the accumulator; the sources and the destination in RISC processors are working registers. If one (or both) of the operands is in the memory, RISC structures use special instructions to bring that operand from the memory to one of the registers before operating on it. Other processors allow the user to define the two different sources and the destination.
To improve the performance of the CPU, it is normal to divide the ALU into two units: an arithmetic unit (AU) and a logic unit (LU). To achieve further improvement in performance, some processors use:
The control unit directs the operations carried out by the processor. It receives information from the other units and in turn transmits information back to each of them. When it is time for some input information to be read into the memory (this is defined by the current instruction in execution), the control unit stimulates the input device appropriately and sets up a path from the input device to the proper place in the arithmetic unit (normally the accumulator) or the memory unit. Similarly, when an arithmetic operation is to be performed, the control unit must arrange for the appropriate operands to be transferred from their source locations to the arithmetic unit, for the proper operation to be executed, and for the result to go back to the defined destination.
The functions performed by the control unit are defined mainly by the instruction set of the processor. The internal structure (the organization) of the CPU also has a great influence on these functions.
In its simplest form, Figure 3.2, the control unit contains:
(Figure elements: programme counter with increment and parallel-load control; control store; ROM and RAM with their address and control information; buffer circuitry connecting the CPU to the external data, address and control buses.)
Figure 3.11 Basic microprocessor organization (CPU).
The buses that form the interface between the internal blocks of the CPU are called "internal buses", and those used to interface the processor to the memory and I/O devices are called "external buses": the data bus, the address bus and the control bus. The microprocessor creates the "external bus system".
The use of bus systems allows the user to connect new devices easily. In the case of computer systems that use a common bus, it is easy to move peripherals between them. The bus system has another advantage: it is a low-cost system, since a single set of wires is shared in multiple ways. On the other hand, the limited bandwidth of the bus creates a communication bottleneck. For the internal bus, such limitations directly affect the performance of the system, while the bandwidth limitations of the external bus limit the maximum I/O throughput. Many processors, to improve the overall performance of the system, use an internal bus that is wider than the external bus (normally twice as wide).
System buses can be classified into:
• CPU-memory buses: This class is normally short, high speed and matched to the memory system to maximize memory-CPU bandwidth.
• I/O buses: Buses of this class are characterized by greater length, can accept many types of I/O devices with a wide range of data bandwidths, and normally follow bus standards.
Traditionally the buses are classified into three functional groups:
Data bus:
This is the set of wires that carries data being transferred between the CPU and the external devices connected to it, e.g. memory units and the different types of I/O devices. The number of lines that form the data bus is referred to as the width of the data bus. The data bus width has a direct effect on the overall performance of the system, because the majority of the move operations are between the CPU and the external memory (RAM or ROM). The width of the data bus defines the amount of data that the bus can carry per second. In some applications, this limit causes a form of bottleneck: if the system has a CPU with tremendous computational power (it can handle a huge amount of data per second) and at the same time a memory system with huge capacity, then the amount of data moving between the CPU and the memory per second is expected to be huge, and the bottleneck happens when the bandwidth of the data bus cannot support that amount of data. In such cases the data bus puts a limitation on the number of devices that can be connected to the bus system; adding more devices means more data per second has to move between the CPU and the devices.
It is important to remember here that when we say "32-bit processor" we do not mean that the data bus width is 32 bits. A 32-bit processor is one whose working registers are each 32 bits wide; as mentioned before, it means that the processor operates on operands of width 32 bits.
The data bus is bidirectional; data may travel in either direction depending
on whether a read or write operation is intended.
Note: We used the term “data” in its general sense: the “information” that
travels on the data bus may be the instructions of the programme, an address
appended to an instruction, or data used by the programme.
Address bus:
The signals on the address bus represent the address of a location in the memory that is the source of an operand or the destination of a result. The address can also be that of an instruction or of an I/O port. The address bus width defines the memory space: the maximum number of memory locations that can be addressed. An n-bit address bus means that the memory space contains 2^n locations. When we say that our computer has a memory of 1 gigabyte (2^30 bytes), this does not necessarily mean that the address bus is 30 bits wide; it means that the internal RAM occupies an area of 1 gigabyte in the memory map (see Chapter 6). The address bus is unidirectional; address information is always supplied by the CPU (or sometimes by the direct memory access (DMA) circuitry in addition to the CPU), and the bus carries it to the different memory storage areas.
Control bus:
The control bus is used to control the access to and the use of the data and
address lines. Because the data and address lines are shared by all components,
there must be a means of controlling their use. Each combination of the control
signals has a specific role in the orderly control of system activity. As a rule,
control signals are timing signals supplied by the CPU to synchronize the
movement of information on the address and data buses. The control bus
provides, at least, four functions:
• Memory synchronization
• Input-output synchronization
• CPU (microprocessor) scheduling — interrupt and direct memory access
(DMA)
• Utilities, such as clock and reset.
3.3 Microcontrollers
The microcontroller is an Application-Specific Instruction-set Processor (ASIP) that can serve as a compromise between the general-purpose processor (the microprocessor) and the single-purpose processor (the application-specific integrated circuit, ASIC). An ASIP is a programmable processor optimized
for a particular class of applications having common characteristics, such
as embedded control, digital-signal processing, or telecommunications. The
designer of such a processor can optimize the datapath for the application
class, perhaps adding special functional units for common operations and
eliminating other infrequently used units. The microcontroller is a processor
optimized for embedded control applications.
Microcontrollers are used in a wide range of electronic systems, such as:
• Engine management systems in automobiles
• Keyboard of a PC
• Electronic measurement instruments (e.g., digital multimeters, frequency
synthesisers, and oscilloscopes)
• Printers
• Mobile and satellite phones, video phones
• Televisions, radios, stereo systems, CD players, tape recording equipment
• Hearing aids, medical testing systems
• Security alarm systems, fire alarm systems, and building services systems
• Smart ovens/dishwashers, washers and dryers
• Factory control
• Many, many more
To understand how the architecture of the microcontroller is optimized for all the above-mentioned applications, we start by considering the general block diagram of the microcomputer given in Figure 3.1. This general block diagram can be simplified to that given in Figure 3.12. The microprocessor cannot work alone; it works only with a minimum of supporting hardware. The CPU and the supporting hardware are connected together by the external bus system to form a general-purpose microprocessor-based system, which we normally call a "microcomputer". The minimum supporting hardware includes an external clock, RAM, ROM and support for external I/O peripherals, besides the external bus system. Beyond this minimum, the microprocessor-based system may include an interrupt control circuit, a serial interface, a parallel interface, and timers.
The microcontroller is a microcomputer on-chip. It integrates many of
the components of a microprocessor-based system (the components inside
the dotted line of Figure 3.12) onto a single chip. It is a complete computer system optimized for hardware control that encapsulates the entire processor, memory, and all of the I/O peripherals on a single chip. It needs only power and clocking to start working. Being on the same chip enhances the processing speed, because internal I/O peripherals take less time to read or write than external devices do.
Most microcontrollers, besides memory and I/O peripherals, combine on the same chip:
For any microcontroller to be able to do its work, it must have some unavoidable sections (resources) on the chip, irrespective of the type or the manufacturer of the microcontroller. These resources include: the CPU, the I/O unit (ports), the memory unit, serial communication, a timer, a watchdog timer, reset and brownout detectors, an oscillator and, in many cases, an analogue-to-digital converter. Besides these hardware resources, any microcontroller needs software (programme) resources. In the following we briefly introduce these resources; the rest of the book discusses them in detail and shows how to use them.
3.3.1.1 CPU
The CPU is the 'computer' part of the microcontroller. Its main function is to run the programmes supplied by the designer. It does this by using the memory, some registers, and the programme memory. Each manufacturer uses one microprocessor as the base of its microcontroller devices. The ALU, in each case, is set up according to the requirements of the instruction set being executed by the timing and control block. For example, Intel uses the basic 8085 microprocessor core in the Intel 8051 microcontroller; similarly, Motorola uses the basic 6800 microprocessor core in the M68HC11 microcontroller devices.
When the capacity of the on-chip memory units is not enough, additional external memory units are used with the microcontroller. The memory devices directly accessible by the CPU consist of semiconductor ICs called RAM and ROM.
The architecture of the microcontroller defines whether the data and the programmes are going to be stored in the same memory or in two separate memory units. In CISC structures, one memory (with one memory map) is used to store both the data and the programmes. To allow faster access and increased capacity, RISC structures use the Harvard model: the data and the programmes are separated from each other, and two memory units, each with its own memory map, are required. The two memory units are:
• Programme Memory: The programme memory stores the instructions
that form the programme. To accommodate large programmes, the pro-
gramme memory may be partitioned as internal programme memory and
external programme memory in some controllers.
• Data Memory: The data memory is used by the controller to store data.
The CPU uses RAM to store variables as well as stack. The stack is used
by the CPU to store return addresses from where to resume execution
after it has completed a subroutine or an interrupt call.
Microcontrollers come with several different types of memory. The amount
of memory on a single chip varies quite a bit by manufacturer.
Memory unit is the subject of Chapter 6.
Ports
The ports of a microcontroller play the same role as the external buses of a microprocessor. The main difference between ports and buses is that many devices can be connected to the same bus, but only one device can be connected to each port. In other words, a port is a bus to which we can connect only one device (an input or output device). A bus is a group of wires; similarly, a port is a group of pins on the microcontroller. The buses start at registers inside the microprocessor; likewise, each port is represented by a data register inside the microcontroller. The buses are the interface between the CPU and the external world; similarly, the ports, represented by their registers, are the interface (the connection) between the CPU and the outside world. Ports can be input ports, output ports or bidirectional ports. To define the direction, each port has a "data direction" register associated with its data register. This allows each pin to be set individually as an input or an output before data is read from or written to the port data register. When working with ports, it is first necessary to choose, based on the type of the peripheral, which port to work with, and then to send data to, or take it from, that port. As far as the programme's interaction with a port is concerned, the port acts like a memory location. If a binary code is presented to the input pins of the microcontroller by an external device (for instance, a set of switches), the data is latched into the register allocated to that port when it is read. This input data can then be moved (copied), using the proper instruction, into another register for processing. If a port register is initialized for output, the data moved to that register is immediately available at the pins of the chip. It can then be displayed, for example, on a set of LEDs.
Ports can also be "parallel" or "serial", "synchronous" or "asynchronous", and "duplex" or "half-duplex".
as well as internal), and many other applications. The timer is the subject of
Chapter 7.
The brownout detector is a circuit that monitors the power supply voltage; if there is a momentary drop in voltage, it resets the processor so that the drop in voltage does not corrupt register and memory contents, which could lead to faulty operation of the controller. A reset can also be caused by a clock monitor detector if the clock slows down below a certain threshold frequency due to a fault.
3.3.1.10 Programme
The above sections have focused on computer systems hardware with only a
passing mention of the programmes, or software that makes them work. The
relative emphasis placed on hardware versus software has shifted dramatically
in recent years. Whereas the early days of computing witnessed the materials,
manufacturing, and maintenance costs of computer hardware far surpassing
the software costs, today, with mass-produced LSI (large scale integrated)
chips, hardware costs are less dominant. It is the labor-intensive job of writing,
documenting, maintaining, updating, and distributing software that constitutes
the bulk of the expense in automating a process using computers.
Assembly language and programming using assembly language are the
subject of Chapters 4 and 5.
(Figure: typical on-chip resources of a microcontroller, including the microprocessor core, RAM, ROM/EEPROM, serial interface, A/D converter, PWM and analogue I/O.)
Software kernel:
This term is applied to the set of programmes in an operating system which implement the most primitive functions of the system. A typical kernel contains programmes for four types of functions:
• Process management: routines for switching processors among processes; for scheduling; for sending messages or timing signals among processes; and for creating and removing processes.
In some systems the kernel is larger and provides for more than these classes of functions; in others it is smaller.
The actual types, number and size of the supporting hardware and software define the capabilities of the system (what we simply call a computer system), and they are used to classify the system as a microcomputer, minicomputer, mainframe computer, or supercomputer. All microprocessor-based systems have a high RAM-to-ROM ratio, with user programmes executing in a relatively large RAM space and hardware interfacing routines executing in a small ROM space.
The microcontroller, on the other hand, is the heart of any microcontroller-based system (see Figure 3.13b), which we can fairly call an "embedded system" (this does not mean that microprocessors or single-purpose processors cannot be used to build embedded systems). Such systems are single-function (they execute a specific programme repeatedly), are tightly constrained in their design metrics (for example, small size, low power consumption, etc.), and they must respond in real time. In such cases, the supporting units are peripheral devices (e.g., stepper motors, display units, sensors, actuators, etc.). The type and the number of the peripherals depend
on the application. External RAM and/or EPROM may be needed in some
applications. Users of such products are quite often unaware of the existence of
microcontrollers: to them, the internal components are but an inconsequential
detail of design. Consider as examples microwave ovens, programmable
thermostats, electronic scales, digital cameras, and even cars. The electronics
within each of these products typically incorporates a microcontroller
interfacing to push buttons, switches, lights, and alarms on a front panel; yet
user operation mimics that of the electromechanical predecessors, with the
exception of some added features. The microcontroller is invisible to the user.
Unlike computer systems, microcontroller-based systems have a high ROM-to-RAM ratio. The control programme, which may be relatively large, is stored in ROM, while RAM is used only for temporary storage.
3.4 Microprocessor-Based and Microcontroller-Based Systems 145
Definition:
RISC, or Reduced Instruction Set Computer, is a type of microprocessor
architecture that utilizes a small, highly-optimized set of instructions, rather
than a more specialized set of instructions often found in other types of
architectures.
Current AVRs offer, in general, the following wide range of features (taken from the AVR data sheets):
Programme Execution
Atmel's AVRs have a single-level pipeline design: the next machine instruction is fetched as the current one is executing. Most instructions take just one or two clock cycles, making AVRs relatively fast among eight-bit microcontrollers. The AVR family of processors was designed for the efficient execution of compiled C code.
The AVR instruction set is more orthogonal than those of most eight-bit microcontrollers; however, it is not completely regular:
• Pointer registers X, Y, and Z have addressing capabilities that are different
from each other.
• Register locations R0 to R15 have different addressing capabilities than
register locations R16 to R31.
• I/O ports 0 to 31 have different addressing capabilities than I/O ports 32
to 63.
• CLR affects flags, while SER does not, even though they are complementary
instructions. CLR sets all bits to zero and SER sets them to one. (Note,
though, that neither CLR nor SER is a native instruction. Instead, CLR is
syntactic sugar for [produces the same machine code as] EOR R,R, while SER
is syntactic sugar for LDI R,$FF. Math operations such as EOR modify flags,
while moves/loads/stores/branches such as LDI do not.)
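The CLR/SER distinction can be sketched outside assembly. The following Python sketch (not AVR code; the function and flag names are illustrative) models 8-bit register semantics: EOR is an ALU operation that updates the zero flag, while LDI is a load that leaves flags untouched.

```python
# Sketch (not AVR code): 8-bit semantics of the CLR/SER pseudo-instructions.
# CLR Rd assembles to EOR Rd,Rd (an ALU op, so it updates the Z flag);
# SER Rd assembles to LDI Rd,$FF (a load, so the flags are untouched).

def eor(rd, rr, flags):
    """EOR: exclusive-OR, updates the zero flag like any ALU operation."""
    result = (rd ^ rr) & 0xFF
    flags["Z"] = (result == 0)
    return result

def ldi(value, flags):
    """LDI: load immediate, leaves the flags unchanged."""
    return value & 0xFF

flags = {"Z": False}
r16 = 0xA5
r16 = eor(r16, r16, flags)   # CLR r16: register becomes 0x00, Z flag set
print(r16, flags["Z"])
r16 = ldi(0xFF, flags)       # SER r16: register becomes 0xFF, Z untouched
print(r16, flags["Z"])
```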
Speed: The AVR line can normally support clock speeds from 0 to 16 MHz,
with some devices reaching 20 MHz. Lower-powered operation usually requires
a reduced clock speed. All AVRs feature an on-chip oscillator, removing
the need for external clocks or resonator circuitry. Because many operations
on the AVR are single cycle, the AVR can achieve up to 1 MIPS per MHz.
AVR Groups: AVRs are generally divided into three broad groups:
• tinyAVRs
◦ 1–8 kB programme memory
◦ 8–20 pin package
◦ Limited peripheral set
• megaAVRs
◦ 4–256 kB programme memory
◦ 28–100 pin package
◦ Extended instruction set (Multiply instructions and instructions for
handling larger programme memories)
◦ Extensive peripheral set
• Application specific AVRs
◦ megaAVRs with special features not found on other members of
the AVR family, such as an LCD controller, USB controller, advanced
PWM, etc.
• Idle mode: The Idle mode stops the CPU while allowing the SRAM,
Timer/Counters, SPI port, and Interrupt system to continue functioning.
• Power-down mode: This mode saves the Register contents but freezes
the Oscillator, disabling all other chip functions until the next interrupt
or hardware reset.
access external memory using Port 0 and Port 2, which act as multiplexed data
and address lines. Some of Port 1 and Port 3 pins also have a dual purpose,
providing connections to timers, serial port and interrupts. The 8031 was
a version of 8051 without internal ROM, the application programme being
stored in a separate EPROM.
The 8051 architecture has a very useful Boolean processing unit that allows
the user to address certain ranges of memory as bits as well as bytes. It is a
unique and often useful feature.
Another great plus for the 8051 family is the large number of chips
available. Many different manufacturers produce 8051-based parts, and it is
not difficult to find information on the web. There are chips with almost every
conceivable set of peripherals on board; some are readily available, some are
not. The user can get, for example, chips such as the 8052BASIC, which has
an on-board BASIC interpreter. The 8051 has one of the largest varieties of
derivatives of any microcontroller family.
The Intel 8051 is packaged in a 40-pin chip. It has 17 address lines,
including the PSEN line (programme store enable), an 8-bit data bus, and
an 8-bit arithmetic unit.
In addition to the 8-bit data bus, there are also three other 8-bit ports for
data exchange with external peripherals. It has Harvard architecture in which
the programme memory address space is separate from the data memory space.
Figure 3.16 shows the block diagram of the 8051. Port 0 functions as the normal
I/O bus, with ports 1 to 3 available for data transfers to other components of
the system in I/O-intensive applications.
The lower 4 kbytes of programme ROM are implemented on the processor
chip with the lower 128 bytes of data RAM. In addition, 21 special function
registers in data memory space also appear on the chip.
Although we shall not explore timer chips or chips that convert between
parallel and serial words until a later chapter, we shall mention here that
these functions are also implemented on the processor chip. Normally, a
separate timer chip or parallel / serial converter chip (universal asynchronous
receiver/transmitter or UART) must be used. The two timers on the 8051 can
be programmed to count the number of machine cycles beginning at some
specified point in a programme or to count external events represented by the
assertion of either of two input pins. When counting machine cycles, accurate
time periods can be established, as these cycles are referenced to the master
clock.
The 21 special function registers (SFRs) relate to several different
functions of the 8051. Several registers are used in connection with the
timer/counters and the serial data port. An accumulator and a B register are
available for data manipulation. A stack pointer, a data pointer, and a pro-
gramme status word register are also included in this group of special function
registers. Interrupt enable, interrupt disable, and priority information are also
controlled from registers in this group. Another important set of registers is
associated with the ports, which contain information relating to the definition
of function and the direction of the port pins.
Many instrumentation applications use the 8051 as a stand-alone chip.
With sufficient on-chip programme memory, minimum data memory systems
can be implemented without peripheral chips. The internal timer and serial
port allow this device to solve a wide variety of instrumentation problems.
The accumulator, or A register, is used for the microcontroller’s normal
arithmetic and logic operations. Only unsigned, binary, integer arithmetic is
performed by the arithmetic unit. In two-operand operations, such as add, add-
with-carry, and subtract-with-borrow, the A register holds the first operand and
receives the result of the operation. The multiply operation causes the 1-byte
contents of register A to be multiplied by the 1-byte contents of register B. The
result is a 16-bit number with the most significant byte in register B and the
least significant byte in register A. The divide operation divides the contents
of register A by the contents of register B. The integer quotient is placed in
register A, with a 1-byte remainder in register B.
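The arithmetic of the 8051 multiply and divide operations just described can be sketched as follows (a Python model, not 8051 code; A and B are treated as unsigned bytes).

```python
# Sketch (not 8051 code): the arithmetic of the 8051 MUL AB and DIV AB
# instructions, modelling registers A and B as unsigned bytes.

def mul_ab(a, b):
    """MUL AB: 16-bit product, low byte to A, high byte to B."""
    product = a * b
    return product & 0xFF, (product >> 8) & 0xFF   # new A, new B

def div_ab(a, b):
    """DIV AB: integer quotient to A, remainder to B (b must be non-zero)."""
    return a // b, a % b                            # new A, new B

a, b = mul_ab(200, 100)    # 20000 = 0x4E20 -> A = 0x20, B = 0x4E
print(hex(a), hex(b))
a, b = div_ab(200, 7)      # 200 = 7*28 + 4 -> A = 28, B = 4
print(a, b)
```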
Figure 3.18 Memory maps: (a) internal data RAM registers and (b) special-function registers.
bank can then be addressed with only 3 bits. This approach minimizes the
number of bits to the extent, in some cases, that an 8-bit instruction word can
carry both the op code and the register address. A single instruction can be
used to set the proper bits of the PSW, thereby selecting the desired bank. All
register references now apply to the selected bank, until a new bank is selected
by modifying the PSW.
The stack can be located anywhere within the internal data RAM and is
thus limited in depth to the number of consecutive unused locations of internal
data RAM.
In addition to allowing any of the 128 byte locations of RAM to be ac-
cessed, the internal data RAM allows certain locations to be addressed as bits.
The 16 byte locations from 32 through 47 contain a total of 128 bits. These
bits are numbered from 0, starting with the LSB of location 32, and proceed
to number 127, which is the MSB of location 47. Whenever a bit instruction
is used, the bit's direct address is specified in an 8-bit code.
The 20 SFRs occupy 21 of the remaining 128 addresses (DPTR occupies two
bytes). Figure 3.19(b) shows the locations of the SFRs. If any location between
128 and 255 other than those of the SFRs is read, unknown data will be
accessed. The SFRs can be accessed directly, but in many cases the instruction
word implies a specific register. For example, add instructions imply the use
of the accumulator (register A), and the multiply instruction implies the use
of registers A and B.
Every SFR located at an address that is a multiple of 8 is bit addressable,
which means that bytes 128, 136, 144, 152, 160, 168, 176, 184, 208, 224, and
240 are bit addressable. These bits are numbered from 128 to 255, beginning
with the LSB of location 128 and proceeding through the MSB of location 240.
The SFRs are listed in Figure 3.20.
The registers that are both byte- and bit- addressable are A, B, PSW, P0,
P1, P2, P3, IP, IE, TCON, and SCON.
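The two bit-address ranges described above follow a simple decoding rule, sketched below in Python (a model of the 8051 convention, not 8051 code): addresses 0 to 127 fall in internal RAM bytes 32 to 47, and addresses 128 to 255 fall in the bit-addressable SFRs, whose byte address is the bit address with the low three bits cleared.

```python
# Sketch: decoding an 8051 bit address into (byte location, bit position).
# 0-127 map onto internal RAM bytes 32-47; 128-255 map onto the
# bit-addressable SFRs (byte addresses that are multiples of 8).

def decode_bit_address(n):
    if not 0 <= n <= 255:
        raise ValueError("bit address must fit in 8 bits")
    if n < 128:                      # internal RAM bits
        return 32 + n // 8, n % 8
    return n & 0xF8, n & 0x07        # SFR bits

print(decode_bit_address(0))     # (32, 0): LSB of location 32
print(decode_bit_address(127))   # (47, 7): MSB of location 47
print(decode_bit_address(224))   # (224, 0): LSB of the accumulator
```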
Figure 3.19 Bit addresses: (a) internal data RAM and (b) SFR group.
The memory map of Figure 3.18 shows an external data memory space
consisting of 64 kbytes. For accesses to addresses under 256 of external RAM,
the RAM address can be contained in R0 or R1 of the internal RAM bank.
For addresses requiring 16 bits, this value is placed in the DPTR register. The
MOVX instruction drives the address bus with either the 8-bit address
contained in R0 or R1 or the 16-bit address contained in DPTR, depending on
the instruction word.
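The address selection performed by MOVX can be sketched as follows (a Python model of the rule just described, not 8051 code; the mode strings are illustrative).

```python
# Sketch: how MOVX forms the external-RAM address. The 8-bit forms take
# the address from R0 or R1; the 16-bit form takes it from DPTR.

def movx_address(mode, r0=0, r1=0, dptr=0):
    """Return the address driven on the bus for a MOVX access."""
    if mode == "@R0":
        return r0 & 0xFF          # 8-bit address: reaches only 0-255
    if mode == "@R1":
        return r1 & 0xFF
    if mode == "@DPTR":
        return dptr & 0xFFFF      # 16-bit address: full 64-kbyte space
    raise ValueError("unknown MOVX form")

print(movx_address("@R0", r0=0x42))          # 0x42 on the bus
print(movx_address("@DPTR", dptr=0x1234))    # 0x1234 on the bus
```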
A or ACC Accumulator
B B register
PSW Programme status word
SP Stack pointer
DPTR Data pointer (2 bytes)
P0 Port 0
P1 Port 1
P2 Port 2
P3 Port 3
IP Interrupt priority
IE Interrupt enable
TMOD Timer/counter mode
TCON Timer/counter control
TH0 Timer/counter 0 (high byte)
TL0 Timer/counter 0 (low byte)
TH1 Timer/counter 1 (high byte)
TL1 Timer/counter 1 (low byte)
SCON Serial control
SBUF Serial data buffer
PCON Power control
4.1 Introduction
The assembly language of any processor comprises a set of (single-line)
instructions. Each instruction represents a single operation that can be
performed by the processor (whether a microprocessor, microcontroller, or
special-purpose processor). In a broader sense, an “instruction” may be any
representation of an element of an executable programme, such as a bytecode.
The collection of instructions that can be performed by a given processor is
called its “instruction set”. An assembly language programme is a sequence
of instructions selected to achieve a required task. The assembly programme
must be converted to object code in the form of machine code (machine
language) by means of an assembler programme. The system stores (or the
programmer downloads) the machine code in the memory of the system for
execution.
Most instruction sets include operations related to:
• Data Movement (Data Transfer)
– Set a register (a temporary “scratchpad” location in the CPU itself)
to a fixed constant value.
– Move data from a memory location to a register, or vice versa. This
is done to obtain the data to perform a computation on it later, or to
store the result of a computation.
– Read and write data from hardware devices.
• Input/Output:
This operation issues commands to I/O modules. If the system uses
memory-mapped I/O, the instruction determines the memory-mapped address.
• Graphics, multimedia and DSP:
These are special instructions optimized to speed up the processing of
the data in the field of graphics, multimedia and DSP. For example, multi-
media instructions are used to accelerate the processing of different forms of
multimedia data. These instructions implement the ISA concept of subword
parallelism, also called packed parallelism or microSIMD parallelism.
• Saving many registers on the stack at once.
• Moving large blocks of memory.
• Performing an atomic test-and-set instruction.
Table 4.1 lists common types of instruction in each category. Depending
on the nature of a particular instruction, it may assemble to between one and
three (or more) bytes of machine code.
MOV 4Bh, 26h
In this instruction, MOV represents the op-code part and indicates a move
operation, i.e. it is used to move data from one location (source) to another
location (destination). The rest of the instruction represents the specifier part;
in this case it gives the addresses of the two operands, one stored at address
26h (source address) and the other at address 4Bh (destination address). Note
that the op-code is separated from the first operand by a space, and the
operands are separated by a comma.
MOV DS, AX
where the value in register operand ‘AX’ is to be moved into register ‘DS’.
Depending on the instruction, there may be zero, one, two, or more operands.
through in the processor, is called the execution path of the instruction, or the
instruction cycle. The number of stages in the execution path is an architectural
feature that can be changed according to the intended exploitation of
instruction-level parallelism. In its simplest form, the execution path consists
of two phases:
• Fetch: In this phase the processor reads (fetches) instructions from memory,
one at a time, and copies each into the control unit.
• Execute: This is the phase in which the system executes the instruction.
Normally, the instruction cycle is called fetch-decode-execute. In our simple
form, the decoding is part of the execute cycle. Programme execution
consists of repeating the process of instruction fetch and instruction execution.
The instruction fetch as well as the instruction execution may involve several
operations and depends on the nature of the instruction (e.g. depends on the
addressing mode, operation code etc.).
The processing required for a single instruction is called an instruction
cycle. Using the simplified two-step description given previously, the instruc-
tion cycle is shown in Figure 4.2. The two steps are referred to as the fetch
cycle and the execute cycle. Programme execution halts only if the machine is
turned off, some sort of error occurs, or a programme instruction that halts
the computer is encountered.
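The repeating fetch/execute loop can be sketched as a toy machine in Python (the opcodes LOAD, ADD, and HALT are invented for illustration; real instruction sets are far richer).

```python
# Sketch: the two-phase instruction cycle. Each pass of the loop fetches
# the instruction addressed by the PC, increments the PC, then executes.

def run(program):
    pc, acc, halted = 0, 0, False
    while not halted:
        ir = program[pc]          # fetch: copy the instruction into the IR
        pc += 1                   # PC now points at the next instruction
        op, operand = ir          # execute: decode and act
        if op == "LOAD":
            acc = operand
        elif op == "ADD":
            acc += operand
        elif op == "HALT":        # the halting instruction stops the loop
            halted = True
    return acc

result = run([("LOAD", 5), ("ADD", 7), ("HALT", 0)])
print(result)   # 12
```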
At the beginning of each instruction cycle, the processor fetches an instruc-
tion from memory. The programme counter (PC) normally holds the address of
the instruction to be fetched next. Unless told otherwise, the processor always
increments the PC after each instruction fetch so that it will fetch the next
instruction in sequence. The fetched instruction is loaded into a register in the
processor known as the instruction register (IR). The operation code part of the
instruction specifies the action the processor is to take. The processor interprets
the instruction (by using instruction decoder) and performs the required action.
The action, as mentioned before, can involve a combination of data transfer,
4.2.4 Labels
Many of the jump and call instructions take an operand which is actually a
number representing a destination address. However, in most programming
situations it is more convenient to replace this by a label, which is a name
given to an address. During the assembly process the appropriate number is
calculated and put into the object code. The use of labels enables changes to
be made to the programme without recalculating the numerical values.
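The label mechanism can be sketched as a minimal two-pass assembler in Python (an illustration, not a real assembler; the instruction names and the one-word-per-instruction assumption are invented).

```python
# Sketch: pass 1 records the address of every label; pass 2 substitutes
# each label operand with the numeric address it names.

def assemble(lines):
    # Pass 1: build the symbol table (label -> address).
    symbols, address = {}, 0
    for label, _ in lines:
        if label:
            symbols[label] = address
        address += 1              # assume one word per instruction
    # Pass 2: replace label operands with their numeric addresses.
    code = []
    for _, (op, operand) in lines:
        code.append((op, symbols.get(operand, operand)))
    return code

program = [
    (None,    ("JMP", "start")),
    ("start", ("LOAD", 5)),
    (None,    ("JMP", "start")),
]
print(assemble(program))   # [('JMP', 1), ('LOAD', 5), ('JMP', 1)]
```

If an instruction is inserted before `start`, reassembling recalculates the numeric address automatically, which is exactly the convenience the text describes.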
[Figure: simplified CPU block diagram. The instruction register, programme
counter, instruction decoder, controller, read-only memory, ALU, output
register, and memory data register are connected over the control, data, and
address buses, with clock and reset inputs; the controller generates the IR_LD,
IR_OE, PC_INC, PC_OE, ROM_OE, ALU_OE, OR_LD, and MDR_LD signals.]
Register transfers in this datapath are gated by Output Enable (OE) and Load
(LD) signals generated by the control unit (the controller).
A processor’s instruction set specifies what transfers between registers
(working registers, memory registers, and I/O registers) and what transfor-
mations on data in the processor’s registers are possible with the processor’s
architecture.
The action of an instruction can be described by identifying the data trans-
fers needed to do the requested work. The specification of this work is done
by a register transfer language (RTL) (also called register transfer notation
RTN). RTL descriptions specify the order of register transfers and arithmetic
action required to carry out the work of an instruction. This information can
then be utilized by the designer of the control system to identify the order of
activation of control lines to actually cause the desired transfers. This fits
with the strategy of splitting any processor into two sections: datapath and
controller. The RTL specifications can be used to design the datapath section
which includes identifying the data storage spaces (registers and memory), the
arithmetic/logic capabilities and the interconnections required to guarantee the
flow of the data between the different parts. The design of the control section
is then carried out so that the timing requirements of the system are met.
A register transfer language can become as simple or as complex as needed
to specify the transfers required in the system. Since we will be using an RTL
in this chapter to describe the actions of the instructions, we describe here a
few of the basic operations of RTL.
The basic operation is the transfer of the contents of one register to another:
[Rd] ← [Rs]
This specifies that the contents of register Rs (source register) are transferred
to register Rd (destination register). For example, [MAR] ← [PC] means that
the contents of the programme counter (PC) are transferred to the memory
address register (MAR). If the data paths of the system are rich enough to
allow multiple operations in the same time period, these can be represented
by specifically linking the transfer together:
[PC] ←[PC] + 1
[IR] ← [MBR]
This identifies that in the same time period the value of the PC is incremented
and the contents of the MBR are transferred to the IR. Normally, all of the
information is involved in the transfer. However, if a subset of the information
is to be transferred, then the specific bits are identified by the use of pointed
brackets:
ALU ← IR<3:0>
specifies that bits 3 to 0 of the instruction register are directed to the ALU. As
another example, MAR ← IR <adr> means the address part of the instruction
is directed to the MAR.
Similarly, locations of memory or a set of registers are specified with
square brackets:
M[(MAR)] ← REG[Rs]
indicating that the contents of register Rs in a general register set (REG) are
transferred to the location in memory identified by the memory address register.
The same action may be described as:
M[(MAR)] ← [Rs]
For operations that are conditional in nature, we normally include an “If”
facility patterned after the C language if construct:
If (carry == 1) [PC] ← [PC] + a
Else [PC] ← [PC] + 1
Identifying that if the carry is equal to 1, the programme counter is adjusted
by a factor “a”; otherwise the programme counter is incremented.
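Two of the RTL constructs above can be made executable for experimentation (a Python sketch; the helper names are invented).

```python
# Sketch: the bit selection ALU <- IR<3:0> and the conditional adjustment
# of the PC, as plain functions.

def bits(value, high, low):
    """Select bits <high:low> of a register value, as in IR<3:0>."""
    mask = (1 << (high - low + 1)) - 1
    return (value >> low) & mask

def next_pc(pc, carry, a):
    """If (carry == 1) [PC] <- [PC] + a, else [PC] <- [PC] + 1."""
    return pc + a if carry == 1 else pc + 1

print(bits(0b10110101, 3, 0))   # 5, the low nibble 0101
print(next_pc(100, 1, 8))       # 108: carry set, PC adjusted by a = 8
print(next_pc(100, 0, 8))       # 101: carry clear, PC incremented
```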
Table 4.2 gives a summary of the RTL notations that are used in explaining
the instruction cycle.
example, consider the add instruction ADD X. Addition needs two operands.
As we shall see later, the instruction ADD X is a single-address instruction.
It gives the address of one of the two operands in the memory and considers
implicitly that the second operand is the contents of the accumulator. Accord-
ingly, the instruction ADD X means, add the contents of memory location
X to the value stored currently in the accumulator and store the result in
the accumulator. We will further assume that the address X is adequately
contained in the instruction itself (direct address mode), thus no additional
information beyond the instruction will be required. With those assumptions,
a set of data transfers that will perform the work of the addition instruction
follows (Figure 4.4).
The fetch cycle of this instruction is identical to most of the fetch cycles of
other instructions: get the instruction and bring it into the instruction register
(IR), then increment the programme counter to prepare for the next instruction
in the programme. Figure 4.5 provides the block diagram of that part of the
CPU which is involved in the fetch part of the instruction cycle, i.e. it shows
the address paths and the paths needed to read an instruction from memory.
The real work begins in step 5, where the address of the operand is trans-
ferred to the MAR. The intended operand of the instruction, the value stored
at location X, is then transferred (step 6) to the MBR. Since the address is
contained in the instruction, the value of X needed for step 5 can come from
either the instruction register or the MBR. Finally the value is added (step 7)
to the value currently in the accumulator, and the result left there. Figure 4.6
shows the address and data paths during the fetch and execute phases of the
instruction.
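The fetch and execute steps of ADD X can be traced as explicit register transfers on a toy machine; the Python sketch below uses made-up addresses and contents, and the step numbering follows the description above.

```python
# Sketch: the instruction cycle of the single-address ADD X instruction,
# written as register transfers (MAR, MBR, PC, IR, AC are toy registers).

def add_x_cycle(memory, pc, acc):
    # Fetch phase
    mar = pc                      # [MAR] <- [PC]
    mbr = memory[mar]             # [MBR] <- M[(MAR)]
    pc = pc + 1                   # [PC]  <- [PC] + 1
    ir = mbr                      # [IR]  <- [MBR]
    # Execute phase: IR holds ("ADD", X) with the direct address X
    op, x = ir
    mar = x                       # step 5: [MAR] <- IR<adr>
    mbr = memory[mar]             # step 6: [MBR] <- M[(MAR)]
    acc = acc + mbr               # step 7: [AC]  <- [AC] + [MBR]
    return pc, acc

memory = {0: ("ADD", 10), 10: 42}   # instruction at 0, operand at X = 10
print(add_x_cycle(memory, pc=0, acc=8))   # (1, 50)
```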
[Figure: the programme counter drives the memory address paths through the
incrementer (step 3: [PC] ← [PC] + Inc); memory returns the instruction
(op-code and address fields) over the data paths, under the control signals.]
Figure 4.5 Address and data paths during the fetch phase.
The above example shows that the fetch cycle shown in Figure 4.2, consists
of more than one step (four steps in the example) and similarly the execute
cycle (three steps in the example). Figure 4.7, accordingly, gives a more re-
alistic description for the instruction cycle. The fetch cycle is approximately
the same for most of the instructions, while the steps included in the execute
cycle change from one instruction to the other. The changes depend on the
number of addresses included in the instruction and also on the address mode
(see later for addressing modes).
Figure 4.7 is given in the form of a state diagram. The states included
represent all the steps given in Figure 4.6, but in a more general form. For
any given instruction cycle, some states may be skipped and others may be visited
more than once. For example, if the instruction contains more than one operand, the
path between operand address calculation and operand fetch must be visited a
number of times equal to the number of operands required by the instruction.
The states can be described as follows:
• Instruction address calculation: determine the address of the next in-
struction to be executed. Usually, this involves adding a fixed number to
the address of the previous instruction. For example, if each instruction
is 16-bits long and memory is organized into 16-bit words, then add 1
to the previous address. If, instead, memory is organized as individually
addressable 8-bit bytes, then add 2 to the previous address. Considering
Figure 4.4, this means that Inc of step 4 is 1 or 2 respectively.
Figure 4.6 The CPU address and data paths during the fetch and execute phases.
• Instruction fetch: read instruction from its memory location into the
processor.
• Instruction operation decoding: analyze instruction to determine type
of operation to be performed and operand(s) to be used.
• Operand address calculation: if the operation involves reference to an
operand in memory or variable via I/O, then determine the address of the
operand.
• Operand fetch: fetch the operand from memory or read it from I/O.
• Data operation: perform the operation indicated in the instruction.
• Operand store: write the result into memory or out to I/O.
The instruction cycle of Figure 4.7 deals with instructions required to
work in machines where work is defined as arithmetic or logic operations.
Such actions, i.e. calculation of values, cover only one type of operation that
computers must provide. In addition to computing, a machine must be able
to make decisions, transfer information, and control devices. The instruction
cycle in this case, a case where instructions are oriented toward input/output
operations and used to control the programme flow, will be considered at a later
stage. Table 4.3 summarizes the instruction cycle steps of some instructions.
Example 4.1
The instruction ADD adds the top of stack to the next element of the stack
and stores the result at the top of stack.
1-operand (“one address machines”), often called “accumulator machines”:
This class includes most early computers, with the implied address being
a CPU register known as the accumulator (AC). The accumulator contains one
of the operands and is used to store the result, see Figure 4.9. The 1-address
instruction takes the general form:
Op dst
In RTL:
[AC] ← [AC] OP [dst]
2-operand
In this case one address does double duty as both a source operand and the
destination. The instruction takes the general form:
OP dst, src
In RTL:
[dst] ← [dst] OP [src]
Example 4.3 For a RISC machine (requiring explicit memory loads), the
instructions would be:
3-operand
In this case we have two source operands and one destination operand.
Some CISC machines fall into this category; in such a case the source and
destination locations can be in memory or any register. The instruction takes
the general form:
OP dst, src1, src2
In RTL:
[dst] ← [src1] OP [src2]
Three-address instruction formats are not common, because they require
a relatively long instruction format to hold the three address references.
3-operand RISC
Most RISC machines fall into this category, because it allows better reuse
of data. In a typical three-operand RISC machine, all three operands must
be registers, so explicit load/store instructions are needed (see Figure 4.11).
An instruction set with 32 registers requires 15 bits to encode three register
operands, so this scheme is typically limited to instruction sets with 32-bit
instructions or longer.
4 or more operands
Some CISC machines permit a variety of addressing modes that allow more
than three operands (registers or memory accesses), such as the VAX “POLY”
polynomial evaluation instruction. In the case of a 4-address format (as an
example), the fourth operand is either the address of the next instruction (an
obsolete technique) or an extra operand the instruction requires (e.g. integer
division produces both a quotient and a remainder).
Many of the modern processors (especially CISC processors) have many
instruction formats. They can handle 3-, 2- and 1-address format. Table 4.4
summarizes the interpretations to be placed on instructions with zero, one, two
and three addresses. In each case in the table, it is assumed that the address of
the next instruction is implicit and that one operation with two source operands
and one result operand is to be performed. In Table 4.4 we used AC to represent
accumulator, TOS to represent the top of the stack, (TOS-1) the contents of
second element of stack, and OP as the operation to be performed.
Example 4.4
This example helps the reader to compare the different types of instructions
and to understand the use of RTL. Consider the case of using 3-, 2-, 1-, and
0-address instructions to evaluate the expression
X = (A + B) ∗ (C + D)
Instruction in
Assembly
Language RTL ;Comment
MOV R1, A [R1]←[A] ; Move the contents of memory
; location A to register R1
ADD R1, B [R1]←[B]+[R1] ; Add the contents of memory
; location B to the contents of register
; R1 and store the result at R1
MOV R2, C [R2]←[C] ; Move the contents of memory
; location C to register R2
ADD R2, D [R2]←[D]+[R2] ; Add the contents of memory
; location D to the contents of register R2
; and store the result at R2
MUL R2, R1 [R2]←[R1]×[R2] ; Multiply the contents of R1 by the
; contents of R2 and store the result at R2
MOV X, R2 [X]←[R2] ; Move the contents of register R2
; to memory location X.
Note: Intel MCS 51 family instructions are used.
From this example, it is clear that with 2-address instructions, one address
must do double duty as both an operand and a result. The instruction ADD
R1,B thus carries out the calculation A + B and stores the result in R1. The
2-address format reduces the space requirement, but also introduces some
awkwardness. To avoid altering the value of an operand, a MOV instruction is
used to move one of the values (C) to a temporary location before performing
the next operation. The sample programme expands to six instructions.
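The six 2-address instructions of the example can be traced in a short sketch. The Python below mirrors each instruction with its RTL; the sample values A=2, B=3, C=4, D=5 are made up, giving X = (2+3)×(4+5) = 45.

```python
# Sketch: executing the six 2-address instructions of Example 4.4,
# with memory locations and registers modelled as dictionaries.

mem = {"A": 2, "B": 3, "C": 4, "D": 5}
reg = {}

reg["R1"] = mem["A"]               # MOV R1, A   [R1] <- [A]
reg["R1"] = mem["B"] + reg["R1"]   # ADD R1, B   [R1] <- [B] + [R1]
reg["R2"] = mem["C"]               # MOV R2, C   [R2] <- [C]
reg["R2"] = mem["D"] + reg["R2"]   # ADD R2, D   [R2] <- [D] + [R2]
reg["R2"] = reg["R1"] * reg["R2"]  # MUL R2, R1  [R2] <- [R1] x [R2]
mem["X"] = reg["R2"]               # MOV X, R2   [X]  <- [R2]

print(mem["X"])   # 45
```

Note how R1 in the second instruction is both an operand and the result, which is the "double duty" the text describes.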
The following are some of the AVR instructions that use 2-operand format.
Instruction in
Assembly
Language RTL ;Comment
ADD Rd, Rr [Rd]←[Rd]+[Rr] ; add the contents of register Rd
; to that of register Rr and store
; the result at register Rd
AND Rd, Rr [Rd]←[Rd] AND [Rr] ; AND the contents of register Rd
; and the contents of register Rr
; and store the result at Rd.
SUB Rd, Rr [Rd]←[Rd]−[Rr] ; subtract the contents of register Rr
; from that of register Rd and
; store the result at register Rd.
PUSH C
PUSH D
ADD
MUL
POP X
The contents of the stack during the execution of this programme are shown
before in Example 2.2.
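The stack (0-address) evaluation can also be sketched. The fragment quoted above begins at PUSH C, so the opening pushes of A and B and their ADD are reconstructed here as an assumption; the sample values A=2, B=3, C=4, D=5 are made up.

```python
# Sketch: a zero-address (stack) machine evaluating X = (A + B) * (C + D).
# Only POP needs an address; ADD and MUL work implicitly on the stack top.

mem = {"A": 2, "B": 3, "C": 4, "D": 5}
stack = []

def push(loc): stack.append(mem[loc])                  # PUSH loc
def pop(loc):  mem[loc] = stack.pop()                  # POP loc
def add():     stack.append(stack.pop() + stack.pop()) # ADD (TOS + next)
def mul():     stack.append(stack.pop() * stack.pop()) # MUL (TOS * next)

push("A"); push("B"); add()    # stack: [A+B]           (reconstructed part)
push("C"); push("D"); add()    # stack: [A+B, C+D]      (quoted fragment)
mul()                          # stack: [(A+B)*(C+D)]
pop("X")

print(mem["X"])   # 45
```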
Operand = [EA];
In the case of computers, the need for different addressing modes arises from
the fact that the address field or fields in a typical instruction format are
relatively small. Any programmer would like to be able to reference
a large range of locations in main memory or, for some systems, virtual
memory. To achieve this objective, a variety of addressing techniques has been
Figure 4.12 Various addressing modes (immediate, register based, register
direct, indirect, and register indirect; the paths run between the instruction,
the registers, and memory).
employed. They all involve some trade-off between address range and/or ad-
dressing flexibility, on the one hand, and number of memory references and/or
complexity of address calculation, on the other. Effective-address calculations
establish the paths to operands of instructions. Operands can be accessed by
immediate, direct, indirect, or register-based addressing, and by combinations
of these. The different forms of addressing are illustrated in Figure 4.12. In
this section, we examine the most common addressing techniques:
• Immediate
• Direct or Absolute
• Memory Indirect
• Register indirect
• Displacement
• Stack
While explaining the addressing modes we will take our examples, in the
majority of cases, from the instruction set of the AVR microcontroller.
The AVR enhanced RISC microcontroller supports powerful
and efficient addressing modes for access to the Programme memory (Flash)
and Data memory (SRAM, Register file, I/O Memory, and Extended I/O
Memory).
Before beginning the discussion, we have to mention that all processor
(microprocessor/microcontroller) architectures provide more than one of
these addressing modes. Several ways are used to let the control unit
determine which addressing mode is being used in a particular instruction.
One approach is to allocate one or more bits in the instruction format as
a “mode field”. The value of the mode field determines which addressing
mode is to be used.
In our text, as in many other texts, we use the term “effective address”.
In a system without virtual memory, the effective address (EA) means either a
main memory address or a register. In a virtual memory system, the effective
address is a virtual address or a register. The actual mapping to a physical
address is a function of the paging mechanism and is invisible to the
programmer. Microcontroller-based systems do not use the concept of virtual
memory.
In our discussions we use the following notations:
A = Contents of an address field in the instruction.
R = Contents of an address field in the instruction that refers to a register.
[X] = Contents of memory location X or register X.
EA = Effective address.
OP = the word OP means the operation code part of the instruction word.
To simplify, not all figures show the exact location of the addressing bits.
To generalize, the abstract terms RAMEND and FLASHEND have been used
to represent the highest location in data and programme space, respectively.
Operand = A
Op-Code Operand
Figure 4.13 Immediate addressing mode.
The sizes allowed for immediate data vary by processor and often by
instruction (with some instructions having specific implied sizes). In some
microprocessors' assembly languages (e.g. the Motorola 68K series, Intel 8085,
etc.), the symbol # precedes the operand to indicate immediate addressing.
In other microprocessors/microcontrollers the instruction mnemonic contains
the letter “I” to indicate the immediate mode. For example, in the ATMEL
AT90S8515 series, the instruction AND is a logical AND, while ANDI is a
logical AND with immediate. The four instructions given in Figure 4.14(a)
demonstrate how absolute and immediate addressing modes are represented
in Motorola 68K assembly language and in RTL, respectively. Figure 4.14(b)
demonstrates the case of the Atmel AVR microcontroller and Figure 4.14(c)
the case of the Intel 8051.
All the numbers in Figure 4.14 are represented in base 10 (decimal), but any
base can be used. The symbol #, when used, is not part of the instruction;
it is a message to the assembler telling it to select the op-code for "move data"
that uses the immediate addressing mode. This is exactly like using the
symbols $, H, % or B to indicate the base used to represent the numbers.
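The distinction between immediate and direct addressing can be made concrete with a toy memory model (Python, purely illustrative; the address and contents are invented):

```python
memory = {50: 7}            # toy data memory: address -> contents

def load_immediate(a):      # immediate mode: the operand is the value A itself
    return a

def load_direct(a):         # direct mode: EA = A, the operand is M[A]
    return memory[a]

print(load_immediate(50))   # 50 - the literal value carried in the instruction
print(load_direct(50))      # 7  - the contents of memory location 50
```

The same field value, 50, yields two different operands depending on the mode the assembler selected.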
4.6 Immediate Addressing Mode
Note: AVR can deal with a full 16-bit word in the immediate mode. As
an example consider the following:
adiw r25:24,1 ; Add 1 to r25:r24
adiw ZH:ZL,63 ; Add 63 to the Z-pointer
; (r31:r30)
Example 4.9
The instruction SBIW
EA = A
[Figures: instruction-word formats for direct addressing — direct I/O addressing (op-code, register d, I/O address), direct register addressing (op-code, registers r and d), and direct data (SRAM) addressing, where a 16-bit SRAM address follows the op-code. In each case the instruction itself supplies the effective address of the operand.]
Assembly Language   RTL                     Description
LD Rd, X            [Rd] ← M[[X]]           Loads one byte indirect from SRAM to register. The SRAM location is pointed to by the contents of the X pointer register.
ICALL               PC<15-0> ← Z<15-0>      Indirect call of a subroutine pointed to by the Z pointer register.
IJMP                PC<15-0> ← Z<15-0>      Indirect jump to the address pointed to by the Z pointer register.
In the case of AVR, as Figure 4.21 shows, for indirect data addressing the
operand address is the contents of the X-, Y- or Z-register. The source or
destination register of that operation (5 bits) is contained in the instruction
word. An example of indirect SRAM addressing is the instruction:
ld r4, Z
4.8 Indirect Addressing Mode
Figure 4.20 Data indirect addressing: the X-, Y- or Z-register holds the address of the operand in the data space ($0000 to $FFFF).
Figure 4.21 Indirect addressing mode using index register.
When executing this instruction, register r4 will be loaded with the contents
of the SRAM location defined by the contents of the Z-register.
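The effect of such a register-indirect load can be mimicked with a toy model (Python; the addresses and contents are invented for illustration):

```python
sram = {0x0060: 0xAB}            # toy SRAM: address -> byte
regs = {"r4": 0, "Z": 0x0060}    # register file including the Z pointer

def ld_indirect(dest, pointer):
    # EA = contents of the pointer register; load M[EA] into dest
    regs[dest] = sram[regs[pointer]]

ld_indirect("r4", "Z")
print(hex(regs["r4"]))  # 0xab
```

The instruction word never contains the address 0x0060; only the choice of pointer register and destination register is encoded.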
Many processors support more than one variation on address register indirect
addressing. The possible variations are:
ATMEL AT90S8515 series and ATMega series, for example, support the first
two modes and the last mode. In the following we consider these variations
on indirect register addressing. We treat address register indirect with
displacement as one of the possible forms of displacement addressing.
EA = [Y] − 1
The original contents of the X-, Y-, or Z-register are replaced by the decre-
mented value after that operation. The source or destination register (5 bits)
of that operation is contained in the instruction word. Figure 4.22 shows this
addressing mode.
This mode is useful for a loop where the same or similar operations are
performed on consecutive locations in memory. This addressing mode can be
combined with a complementary post-increment mode for stack and queue
operations.
An example of indirect SRAM addressing with pre-decrement is the
instruction:
ld r4, -Y
When executing this instruction, the Y-register is first decremented by 1, and
register r4 is then loaded with the contents of the SRAM location defined by
the decremented value. After the operation, the Y-register retains the
decremented value.
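The pre-decrement variant, and the complementary post-increment, can be sketched in the same toy style (Python; the addresses and contents are invented):

```python
sram = {0x00FF: 0x22, 0x0100: 0x11}
regs = {"r4": 0, "Y": 0x0100}

def ld_pre_decrement(dest, pointer):
    regs[pointer] -= 1                # EA = [Y] - 1; Y keeps the new value
    regs[dest] = sram[regs[pointer]]

def ld_post_increment(dest, pointer):
    regs[dest] = sram[regs[pointer]]  # EA = [Y] ...
    regs[pointer] += 1                # ... then Y is incremented

ld_pre_decrement("r4", "Y")
print(hex(regs["r4"]), hex(regs["Y"]))  # 0x22 0xff
ld_post_increment("r4", "Y")
print(hex(regs["Y"]))                   # 0x100 - back where it started
```

This pairing is exactly what makes the two modes usable as push and pop for a software stack.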
Note:
1. In all cases: 0 ≤ r ≤ 31 and PC ← PC + 1
2. Since r26 and r27 represent the low byte and the high byte of X pointer, the
result of the following combinations is undefined:
ST X+, r26
ST X+, r27
ST -X, r26
ST -X, r27
Example 4.17
[Figure: displacement addressing — the effective address is the sum of the contents of a designated register and the displacement (address field A) held in the instruction.]
EA = [Y- or Z-register] + q
Register r4 will be loaded with the contents of the SRAM location whose
address is the contents of the Y-register plus 2 (e.g. by the instruction
ldd r4, Y+2); the Y-register itself is unchanged.
4.11 Programme Memory Addressing
Figure 4.25 AVR register indirect with displacement: the referenced register contains a
memory address and the address field contains the displacement.
(16 bits) pointer register in the register file. Memory is limited to 4 K words.
The 15 MSBs of the Z-register select the word address (0 − 2 K/4 K); the LSB
selects the low byte if cleared (LSB = 0) or the high byte if set (LSB = 1). Figure 4.27
shows addressing programme memory with the instruction lpm. The Branch/Jump
and Call instructions also access the programme memory; this group
of instructions will be discussed with the Branch and Call instructions.
Not all variants of the LPM instruction are available in all devices. Refer
to the device specific instruction set summary. The LPM instruction is not
implemented at all in the AT90S1200 device.
The result of these combinations is undefined:
LPM r30, Z+
LPM r31, Z+
Example 4.20
ldi ZH, high(Table_1<<1) ; Initialize Z-pointer
ldi ZL, low(Table_1<<1)
lpm r16, Z ; Load constant from
; programme
; Memory pointed to by Z
; (r31:r30)
...
Table_1:
.dw 0x5876 ; 0x76 is addressed when
; ZLSB = 0
; 0x58 is addressed when
; ZLSB = 1
...
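The word/byte selection that Example 4.20 relies on can be checked numerically (Python; the word address is invented, the contents mirror the example):

```python
flash = {0x0040: 0x5876}   # toy programme memory: one word at word address 0x0040

def z_value(word_addr, lsb):
    # Z = (word address << 1) | byte select, as built from Table_1 << 1
    return (word_addr << 1) | lsb

def lpm(word_addr, lsb):
    word = flash[word_addr]            # the 15 MSBs of Z select the word ...
    return word & 0xFF if lsb == 0 else (word >> 8) & 0xFF  # ... the LSB the byte

print(hex(z_value(0x0040, 1)))  # 0x81
print(hex(lpm(0x0040, 0)))      # 0x76 (ZLSB = 0: low byte)
print(hex(lpm(0x0040, 1)))      # 0x58 (ZLSB = 1: high byte)
```

This is why the label is shifted left by one before being loaded into Z: labels count words, while lpm consumes byte addresses.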
Example 4.21
Example 4.22
a. Case of Jump/Branch
The simplest instructions to deal with are those that change the programme
flow without any side effects. As we have indicated before, the assumed
address for the next instruction to execute identifies the location immediately
following the current instruction. That is, normal programme behavior calls
for the programme counter to be incremented from one instruction to the next.
When the next instruction to execute is not the next one in memory, the
programme counter must be modified accordingly: it must be changed to
identify the appropriate instruction to be fetched next. In the terminology
used by many manufacturers, a programme counter change that uses a direct
addressing mechanism is called a "jump", and a programme counter change
that identifies its target address as an offset from the current location
(PC relative) is a "branch".
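The two mechanisms amount to two different ways of computing the next PC (a generic sketch, not tied to any particular instruction set; the addresses are invented):

```python
def jump(pc, target):    # direct addressing: the new PC is the target itself
    return target

def branch(pc, offset):  # PC relative: the new PC is an offset from here
    return pc + offset

print(hex(jump(0x0100, 0x0200)))   # 0x200, independent of the current PC
print(hex(branch(0x0100, -0x10)))  # 0xf0, relative to the current PC
```

Because a branch target moves with the code, PC-relative code can be relocated in memory without patching its branch offsets.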
The jump/branch instruction is very straightforward: the target address
is identified, and the programme counter is changed accordingly. The target
address can be specified by combinations of the various addressing modes that
we have already identified. The system operation changes somewhat when
the branch is made conditional. In this situation, the contents of the PC at the
completion of the branch instruction are dependent upon some system status
condition or on some comparison identified by the instruction. The conditions
may include the status bits contained within the status register of the machine.
In all the cases, if the proper conditions are satisfied, the PC contents are
modified to allow the programme to continue at an address identified by the
instruction. If the conditions are not satisfied for modifying the programme
flow, then the programme counter is incremented in the normal fashion and
execution of the programme continues with the next instruction in the normal
order of execution. These programme counter modification mechanisms are
demonstrated by the following example.
Normally jumps can use any of the appropriate addressing modes to
identify the target address. The target address is identified, by whatever
push a new return address onto the stack, recursive routines can be utilized as
well. For these reasons, the stack method for establishing subroutine linkage
is used by many systems.
The instructions discussed thus far have included mechanisms for doing data
transformations (arithmetic and logical instructions), mechanisms for passing
information (moves, etc.), and mechanisms for controlling the actions
(programme control instructions). One of the areas not covered yet is the transfer
of information to and from an external device (peripheral). This is generally
called input/output processing (I/O) and normally involves more than the transfer
of data. Additional requirements include such things as testing of conditions
and initiating action in an external device. Some of the I/O programming is
in response to an external event signaling the processor that a device needs
to be serviced. This signaling process is called an interrupt, and the processor
responds to the interrupt in a predetermined fashion. If the signaling results
from conditions detected internal to the processor (e.g. some form of overflow
signal), we sometimes call it a trap instead of an interrupt.
I/O processing has evolved from the very simple capabilities of the first
machines to sophisticated mechanisms used in some machines available today.
In its simplest form, I/O transfers data to or from an addressed device under the
direct control of the processor. This concept is called programmed I/O. This
concept uses specific instructions for input and output. Some of the
microprocessors that use this concept include an additional control signal to indicate
that the address appearing on the address lines is to identify an I/O device
address, rather than a memory address. However, another technique, called
memory mapped I/O, is perhaps more widely utilized. With this method, I/O
devices are assigned specific locations in the address space of the processor,
and any access to that address actually results in an I/O transfer of some
kind. The memory mapped I/O scheme has the advantage that no special I/O
instructions are required in the instruction set, and this reduces the complexity
of the instruction decode mechanism. In addition, devices attached to the
processor need not decode special signals to differentiate between memory
and I/O requests. However, the fact that I/O instructions are included in an
instruction set does not prevent the use of memory mapped I/O techniques in
a system. The user of the system can decide which technique would be most
appropriate for the goals of that particular implementation.
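The idea of memory-mapped I/O can be modelled by letting one address in the map belong to a device rather than to storage (a toy sketch; the address and the device are invented):

```python
UART_TX = 0xF000          # hypothetical device register inside the memory map
ram = {}
uart_output = []

def store(addr, value):
    if addr == UART_TX:               # an access here is really an I/O transfer
        uart_output.append(chr(value))
    else:                             # any other address is ordinary memory
        ram[addr] = value

store(0x0100, 42)   # ordinary memory write
store(UART_TX, 72)  # 'H' goes to the device, not to memory
store(UART_TX, 105) # 'i'
print("".join(uart_output))  # Hi
```

In hardware the `if` is the address decoder: the same store instruction serves both memory and devices, which is exactly why no special I/O instructions are needed.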
Concerning the responsibility for signaling the processor that some peripheral
device needs service, and also the method by which information can be trans-
ferred between a processor and a peripheral device, we can recognize three
techniques:
• Polling
• Interrupt, and
• Direct Memory Access (DMA).
In polling the machine will poll the I/O device until either the device has
information for the system, or the I/O device can accept data from the system.
The polling is done by reading the status register of the I/O device. When
the status register indicates that transfers can occur, they are performed by
writing/reading the appropriate memory location.
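The polling loop itself can be sketched as follows (a simulated device with a ready flag; the class and its members are invented for illustration):

```python
class Device:
    def __init__(self, data):
        self._data = data
        self._ticks = 0

    @property
    def status_ready(self):
        self._ticks += 1          # the device becomes ready after a few polls
        return self._ticks >= 3

    def read_data(self):
        return self._data

dev = Device(0x5A)
polls = 0
while not dev.status_ready:       # busy-wait: keep reading the status register
    polls += 1                    # the processor can do nothing else here
value = dev.read_data()           # transfer once the status allows it
print(polls, hex(value))  # 2 0x5a
```

The wasted iterations in the `while` loop are the inefficiency discussed next.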
The polling mechanism is extremely inefficient in many circumstances. For
example, if the processor issues a command to a tape drive to seek a particular
file on a tape, a very long time will pass between issuing the request and having
the device respond with the desired results. With the polling technique, the
capabilities of the system are not available for anything else during the seek
time. Therefore, it is more efficient to have the I/O device send a signal to the
system when the action of a command (in this case the seek action) has been
4.14 I/O and Interrupts
Figure 4.32 Transfer of control via interrupt.
The processor now proceeds to the fetch cycle and fetches the first in-
struction in the interrupt handler programme, which will service the interrupt
(this is why it is normally called the interrupt service routine, ISR). The inter-
rupt handler programme is generally part of the operating system (in the case
of microcontroller-based systems, the ISRs are part of the contents of the
programme memory). Typically, this programme determines the nature of
the interrupt and performs whatever actions are needed.
With polling (or programmed I/O) and interrupt I/O, the processor is
responsible for extracting data from main memory for output and storing
data in main memory for input. In some cases, it is desirable to allow I/O
exchanges to occur directly with memory. In such a case, the processor grants
to an I/O module the authority to read from or write to memory, so that the
I/O-memory transfer can occur without tying up the processor. During such
transfer, the I/O module issues read or write commands to memory, relieving
the processor of responsibility for the exchange. This operation is known as
direct memory access (DMA).
This must be handled by the application programme. The Stack Pointer uses
a pre-increment scheme during RETI.
Example 4.25
...
extint: push r0 ; Save r0 on the Stack
...
pop r0 ; Restore r0
reti ; Return and enable
; interrupts
4.1 What are the relative advantages and disadvantages of one-address, two-
address, and three-address instruction formats?
4.2 What is a register-to-register architecture and why is such architecture
also called a load and store computer?
4.3 What are the three fundamental addressing modes? Are they all
necessary? What is the minimum number of addressing modes required?
4.4 What does the RTL expression [100] ← [50] + 2 mean?
4.5 What does the RTL expression [100] ← [50 + 2] + 2 mean?
4.6 What is the operand?
4.7 In the context of an instruction register, what is the meaning of a field?
4.8 What is a literal operand?
4.9 What is the difference between a dedicated and a general-purpose
register?
4.10 Why is the programme counter a pointer and not a counter?
4.11 Some machines have a one-address format, some a two-address format,
and some a three-address format. What are the relative merits of each
of these instruction formats?
4.12 Describe the action of the following assembly language instructions
in RTL. That is, translate the assembly language syntax of the 8051
processor instruction into the RTL notation that defines the action of the
instruction:
(i) MOVE 3000,4000 (ii) MOVE #12,(A0)
(iii) MOVE #4000,5000 (iv) ADD (A3), 1234
4.13 Give examples (if available) of valid Intel 8051, Motorola 68K and AVR
instructions that use:
(i) Register-to-register addressing
(ii) Register-to-memory addressing
(iii) Memory-to-register addressing
(iv) Memory-to-memory.
4.14 An address field in an instruction contains decimal value 14. Where is
the corresponding operand located for:
(i) Immediate addressing? (ii) Direct addressing?
(iii) Indirect addressing? (iv) Register addressing?
(v) Register indirect addressing?
4.15 Let the address stored in the programme counter be designated by the
symbol X1. The instruction stored in X1 has an address part (operand
reference) X2. The operand needed to execute the instruction is stored
in the memory word with address X3. An index register contains the
value X4. What is the relationship between these various quantities if
the addressing mode of the instruction is (a) direct; (b) indirect; (c) PC
relative; (d) indexed?
5
Machine Language and Assembly Language
5.1 Introduction
numbers into your computer’s memory and run them under MS-DOS, you
would see a dollar sign placed in the lower right hand corner of your screen,
since that is what these numbers tell the computer to do.
The numbers given above, accordingly, represent a language that is
understandable by the computer (but difficult for a human to remember).
It is actually one form of what we call “Low-level Programming Language”.
In computer science, a low-level programming language is a language that
provides little or no abstraction from a computer’s instruction set architecture.
The word “low” refers to the small or nonexistent amount of abstraction
between the language and machine language; because of this, low-level lan-
guages are sometimes described as being “close to the hardware.” It deals
directly with registers, memory addresses, stacks and other components that
are directly related to the hardware of the computer.
Low-level languages that are made up of numbers are normally called
the "object code" or "machine language" programme. On the other hand,
the programme written using alphanumeric characters (more
understandable for the programmer) is called the "source code".
Low-level languages are directly understandable by the computer and,
accordingly, do not need a compiler or interpreter to run; the processor for which the
language was written is able to run the code without using either of these.
High-Level Language
;**************************************************
; Directive Test
;**************************************************
.DEVICE AT90S8515 ;Prohibits use of non-
;implemented
;instructions
.NOLIST ;Disable listfile
;generation
.INCLUDE "8515def.inc" ;The included files will
;not be shown
;in the listfile
.LIST ;Enable listfile
;generation
rjmp RESET ;Reset Handle
;**************************************************
.EQU tab_size = 10 ; Set tab_size to 10
.DEF temp = R16 ; Names R16 as temp
.SET io_offset = 0x23 ; Set io_offset to 0x23
;.SET porta=io_offset+2 ; Set porta to io_offset
; +2 (commented
; because defined in
; 8515def.inc)
.DSEG ; Start data segment
Label: .BYTE tab_size ; Reserve tab_size bytes
; in SRAM
.ESEG ; Start EEPROM segment
Eeconst: .DB 0xAA, 0x55 ; Defines constants
.CSEG
RESET: ser temp ; Initializes temp (R16)
; with $FF
out porta, temp ; Write contents of
; temp to Port A
ldi temp, 0x00 ; Load address to EEPROM
; address register
out EEAR, temp ;
5.2 Directives: Pseudo-Instructions
5.2.1 Macros
Macros enable the user to build a virtual instruction set from normal Assembler
instructions. You can think of a macro as a procedure at the Assembler
instruction level.
The MACRO directive tells the Assembler that this is the start of a macro
and takes the macro name as its parameter. When the name of the macro
is written later in the programme, the macro definition is expanded at the
place where it is used. A macro can take up to 10 parameters, referred to as
@0–@9 within the macro definition. When a macro call is issued, the
parameters are given as a comma-separated list. The macro definition is
terminated by an .ENDMACRO directive.
A simple example will show the definition and use of a macro:
...........
.MACRO SUBI16 ; Start macro
; definition
subi @1, low(@2) ; Subtract low
; byte
sbci @0, high(@2) ; Subtract high
; byte
.ENDMACRO ; End macro
; definition
000001 e112 RESET: ldi r17, 0x12
000002 e304 ldi r16, 0x34
000003 + SUBI16 r17, r16, 0x1234 ; Sub. 0x1234
; from r17:r16
...........
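What the two expanded instructions accomplish can be verified with ordinary byte arithmetic (Python here, used only to check the low/high-byte logic of the macro):

```python
def subi16(reg_pair, k):
    """Simulate SUBI16: subtract the 16-bit constant k from the pair high:low."""
    high, low = reg_pair
    value = ((high << 8) | low) - k     # what subi + sbci achieve together
    value &= 0xFFFF                     # 16-bit wrap-around
    return (value >> 8, value & 0xFF)   # new high byte, new low byte

# ldi r17,0x12 / ldi r16,0x34 then SUBI16 r17,r16,0x1234 leaves zero:
print(subi16((0x12, 0x34), 0x1234))  # (0, 0)
```

The sbci step is what carries the borrow from the low byte into the high byte, just as the macro's second instruction does.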
The status bar indicates whether execution is targeted at the AVR In-Circuit
Emulator or the built-in AVR Simulator.
The user has full control of the status of each part of the microcontroller
using many separate windows:
• Register window displays the contents of the register file
• Watch window displays the values of defined symbols (in C programmes)
• Message window displays messages to the user
• Processor window displays information such as the Programme Counter,
Stack Pointer, Status Register, and Cycle Counter
• Memory windows show programme, data, I/O and EEPROM
• Peripheral windows show 8-bit timer, I/O ports, and EEPROM registers
AVR Studio supports all available types of AVR microcontroller. The
simulation with AVR Studio itself is more comfortable than simulation with
the AVR Simulator, although the handling is quite similar.
[Flowchart: assemble the programme; if there are assembly errors, correct them and reassemble; if there are runtime errors, return to development; otherwise development is completed.]
that the remainder of the line is a comment, and will therefore be ignored by
the assembler.
The label identifies a particular stage in the programme, and as mentioned
before is a more convenient identifier than a memory address which may
change as the programme is developed. A label at the beginning of a line must
be followed by a colon (:), but when used as the operand of a jump or call
instruction there is no colon.
Example 5.1
The comment is added to make clear the nature of the actions being
programmed. Comments must begin with a semicolon (;). Entire lines may be
comment lines by beginning them with the semicolon. Except in very short
programmes, all the major stages and subroutines of a programme should be
commented and any complex operations explained. Although the assembler
does not care about the layout of an instruction on a line, it improves readability
if the four elements are indented in a consistent manner.
;***********************************************************
; File Name: *
; Title: *
; Author: *
; Date: *
; Version: *
; File saved as: *
; Target MCU: *
; Clock frequency: *
;*******************************************************************
; Programme Function:__________________________
;*********************************************************************************
; ********* Directives:
.device at90s1200 ; Device is AT90S1200
.nolist
.include "at90s1200.inc"
.list
;*********************************************************************************
; Declarations:
.def temp = r16
;*********************************************************************************
; Start of Programme
rjmp Init ; first line executed
;*********************************************************************************
Init: ldi temp, 0bxxxxxxxx ; Sets up inputs and outputs on PortB
out DDRB, temp ;
ldi temp, 0bxxxxxxxx ; Sets up inputs and outputs on PortD
out DDRD, temp ;
ldi temp, 0bxxxxxxxx ; Sets pull-ups for inputs of PortB
out PortB, temp ; and the initial states for
; the outputs
ldi temp, 0bxxxxxxxx ; Sets pull-ups for inputs of PortD
out PortD, temp ; and the initial states for
; the outputs
;********************************************************************************
; Main body of programme:
Start:
<Write your programme here>
rjmp Start ;loops back to Start
;********************************************************************************
values of some control registers that will let the system work as required
by the specifications given to the designer. In the majority of cases
the frequency used is one of the parameters needed to calculate the initial
values.
• The “version” and “date” are needed especially if the programme has
more than one version.
• 'Target MCU:' refers to the particular microcontroller the programme
is written for. We used AVR here as an example, but it represents in general
the microcontroller used.
5.3 Design of an Assembly Language Programme
.device at90s1200
.include "at90s1200def.inc"
• After the declarations, we have the first line executed by the chip on
power-up or reset. In this line I suggest jumping to a section called Init
which sets up all the initial settings of the AVR. This uses the rjmp
instruction:
rjmp Init ;
This stands for relative jump. In other words it makes the chip jump to a
section of the programme which you have labeled Init.
• The first part of the Init section defines the input and output pins, i.e. it
sets which pins are going to act as inputs, and which as outputs. This is
done using the Data Direction I/O registers: DDRB and DDRD. Each bit
in these registers corresponds to a pin on the chip. For example, bit 4 of
DDRB corresponds to pin PB4, and bit 2 of DDRD corresponds to pin
PD2. As we shall see later, setting the relevant DDRx bit high makes the
pin an output, and making the bit low makes the pin an input.
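The DDRx values described above are just bit masks, and can be built up explicitly (Python; the choice of pins is an invented example):

```python
def make_ddr(output_pins):
    """Build a DDRx byte: a 1 bit makes the pin an output, a 0 bit an input."""
    value = 0
    for pin in output_pins:
        value |= 1 << pin   # set the bit corresponding to each output pin
    return value

# PB0 and PB4 as outputs, everything else as inputs:
ddrb = make_ddr([0, 4])
print(f"0b{ddrb:08b}")  # 0b00010001
```

Writing this value to DDRB with an ldi/out pair is what the Init section's first two lines do.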
If we configure a pin as an input, we then have the option of selecting
whether the input has a built-in pull-up resistor or not. This may save us the
trouble of having to include an external resistor. In order to enable the pull-
ups, make the relevant bit in PORTx high; if you do not want them,
make sure you disable them by making the relevant bit in PORTx low. For
the outputs, we want to begin with the outputs in some sort of start state (e.g.
all off), and so for the output pins, make the relevant bits in PORTx high
or low depending on how you wish them to start. An example should clear
things up.
Solution:
The required programme is given as Programme A. In this case we need
only one output to which we are going to connect the LED. The programme
assumes that the LED is connected to PB0.
Programme A — LED on
;**************************************************
; File Name: *
; Date: *
; Author: *
; Version: 1.0 *
; File saved as: LEDon.asm *
; Target MCU: AT90S1200 *
; Frequency: 4MHz *
;**************************************************
; Programme Function: Turns an LED on
;**************************************************
; Directives
.device at90s1200
.nolist
.include "l200def.inc"
.list
;**************************************************
; Declarations:
.def temp = r16
;**************************************************
; Start of Programme
rjmp Init ; first line executed
;**************************************************
Init: ser temp ; PB0 - output. This
; instruction
; moves 0b11111111
; to r16
out DDRB, temp ;
out DDRD, temp ; PD0-7 all N/C
clr temp ; all Port B outputs off
out PortB, temp ;
out PortD, temp ; all Port D N/C
;**************************************************
Start:
sbi PortB, 0 ; turns on LED
rjmp Start ; loops back to Start
;**************************************************
Example 5.3
Write a programme which represents the following: an input pin is con-
nected to a button and an output pin is connected to an LED. It is required to
write a programme that turns the LED on when the button is pressed, and turns
it off when it is released.
Solution:
The required programme is given as Programme B. In this programme we
assume the AT90S1200 microcontroller is used. The same programme can be
used with the AT90S8515 after changing the corresponding directives.
Programme B — Push Button
;**************************************************
; File Name: *
; Date: *
; Author: *
; Version: 1.0 *
; File saved as: PushA.asm *
; Target MCU: AT90S1200 *
; Clock frequency: 4MHz *
;**************************************************
; Programme Function: Turns an LED on when a button
; is pressed
;**************************************************
; Directives
.device at90s1200
.nolist
.include "l200def.inc"
.list
;**************************************************
; Declarations:
.def temp = r16
;**************************************************
;Start of Programme
rjmp Init ; first line executed
;**************************************************
Init: ser temp ; PB0 - output, rest
; N/C
out DDRB, temp ;
ldi temp, 0b11111110 ; PD0 - input, rest
; N/C
out DDRD, temp ;
clr temp ; all Port B outputs
; off
out PortB,temp
ldi temp, 0b00000001 ; PD0 - pull-up, rest
; N/C
out PortD,temp ;
;**************************************************
Start:
sbis PinD, 0 ; tests push button
rjmp LEDoff ; goes to LEDoff
sbi PortB, 0 ; turns on LED
rjmp Start ; loops back to Start
LEDoff:
cbi PortB, 0 ; turns off LED
rjmp Start ; loops back to start
;**************************************************
rjmp RESET
;Reset handle
reti ;External Interrupt0
;handle
reti ;Overflow0 Interrupt
;handle
reti ;Analogue Comparator
;interrupt handle
;***** Main **********************************
RESET: ser temp
out DDRB, temp ;PORTB = all outputs
sec
ldi temp, $FF
out PORTB,temp ;Set bit pattern
loop: sbic PIND, 2 ;Wait until key is
;pressed
rjmp loop
rol temp ;Rotate bit pattern
out PORTB,temp ;Output rotated bit
;pattern
wait: sbis PIND, 2 ;Wait until key is
;unpressed
rjmp wait
rjmp loop ;Repeat forever
;**************************************************
two examples:
Example 5.5
Write a programme that copies the contents of an area within the on-chip
memory (between addresses 25h and 45h) to another area within the same
memory (addresses 55h to 75h). The on-chip memory is called the internal
RAM and in the case of AVR is called the internal SRAM.
Solution:
The programme in this case is as follows:
Possible Solution 1: Case of Intel 8051
Example 5.6
In this example it is required to copy the block of data in the on-chip
memory in the range 25h to 45h to the external memory (RAM/SRAM)
addresses starting at 6000h.
In the case of AVR, an access to external SRAM occurs with the same in-
structions as for the internal data SRAM access. When internal data SRAM
is accessed, the read and write strobe pins (/RD and /WR) are inactive during
the whole access cycle. The external data SRAM physical address locations
corresponding to the internal data SRAM addresses cannot be reached by the
CPU. External SRAM is enabled by setting the SRE bit in the MCUCR control
register. More details are given in Chapter 6 on accessing external SRAM.
The programme in this case will be as that given in the previous example
with one change at the first line. The first line:
START LDI R28, $55 ; Set Y low byte to the value $55
is replaced now with:
START LDI YH, $60
LDI YL, $00
It is also possible to replace it (in this case) with:
START LDI YH, $60
CLR YL
Example 4.8
Write a programme to give the value of any of the first 16 prime numbers
using a look-up table with values 1, 2, 3, 5, 7, 11,…. If a number (between
1 and 16) is put into the accumulator A, then the programme should put the
prime number at that position on the list into A. (For example, 6 would return
the sixth prime number, which is 11). Two possible methods of programming
CLR C
SUBB A, #30h ; A=18h
CLR C
ADD A, #10101011b ; Add -55h; A = -3Dh
CLR C
SUBB A, #10011011b ; Subtract -65h; A = +28h
Possible solution:
; Addition of 16-bit numbers
MOV A, 30h
ADD A, 40h ; Add low bytes
MOV 50h, A ; Store the low-byte result
MOV A,31h
ADDC A, 41h ; Add high bytes + carry
MOV 51h, A ; Store the high-byte result
; Subtraction of 16-bit numbers
MOV A, 30h
CLR C
SUBB A, 40h ; Subtract low bytes
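The byte-wise addition in the listing above can be checked numerically (Python, mirroring what ADD followed by ADDC achieves; the operand values are invented):

```python
def add16(low1, high1, low2, high2):
    # ADD: low bytes, producing a carry; ADDC: high bytes plus that carry
    low_sum = low1 + low2
    carry = low_sum >> 8                      # carry out of the low bytes
    high_sum = (high1 + high2 + carry) & 0xFF
    return high_sum, low_sum & 0xFF

# 0x01FF + 0x0001 = 0x0200: the carry out of the low bytes
# propagates into the high byte
print(add16(0xFF, 0x01, 0x01, 0x00))  # (2, 0)
```

Without the carry added into the high bytes, the result would wrongly be 0x0100.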
A → PB0
B → PB1
C → PB2
D → PB3
Output → PB4
The programme that implements the equation using the AVR instruction set
is the subject of Question 5.4.
Example 5.11
If the oscillator clock frequency is 12 MHz, design a subroutine that will
run for exactly 100 ms before returning. Use the Intel 8051 instruction set.
Solution
The method used here is to design a programme with a set of nested loops.
Each loop will decrement a register once each time it is executed. Initially
the values of data required in the registers are not known. Therefore values
called XX, YY,… are used in writing the programme, and a time budget is
done to calculate the total time in terms of the unknown variables. This is done
using the “cycles” column in the Instruction Set sheets of the microcontroller.
Then the time is adjusted to the required length by putting suitable values into
the registers. In this case two loops will be used, but if these are insufficient
more can be added. The flowchart for decrementing the loops is shown in
Figure 5.7.
The assembly language programme corresponding to the flowchart is as
follows.
In practice, the label LOOP2 is not required, since the $ means jump to
the same address. But it is included here to help clarify the operation of the
programme.
The next step is to calculate the values of XX and YY, representing the
number of repetitions of the outer and inner loops respectively. This is
achieved by preparing a time budget from the programme. The
time budget takes into consideration (1) the number of cycles per instruction,
(2) the number of times each instruction is executed and (3) the total cycles
used by that instruction. This is shown in Table 5.3.
Ignoring the first and last instructions, which are only executed once, the
total number of cycles taken by the subroutine is
12 ∗ XX + 24 ∗ XX ∗ YY + 24 ∗ XX
The required time for the subroutine is 100 ms, which for a clock frequency
of 12 MHz requires 12 × 10^5 cycles.
Therefore
12 ∗ XX + 24 ∗ XX ∗ YY + 24 ∗ XX = 12 × 10^5
3 ∗ XX + 2 ∗ XX ∗ YY = 10^5
Choosing a value for XX, say 255 (FFh):
3 ∗ 255 + 2 ∗ 255 ∗ YY = 10^5
YY = [10^5 − 3 × 255]/[2 × 255] = 194.58
Obviously the fractional part of the number will not be stored in the
microprocessor memory and the value of YY should be treated as 195, which
in hexadecimal is C3h. The complete subroutine can now be written as follows:
START: MOV R0, #0FFh
LOOP1: MOV R1, #0C3h
DJNZ R1, $
DJNZ R0, LOOP1
RET
If the value of YY had come out to be very small, the rounding error
would have been significant, and it would have been necessary to reduce XX
accordingly. In the present case both XX and YY are large enough for the
rounding error to be ignored.
If the value of YY is larger than FFh, it means that the required delay
cannot be achieved with two loops, and the method of the next section will
have to be employed.
the programme, the required delay time and the clock frequency. The following
formula can be used in our case.
;**************************************************
; File Name: delay.asm
; Title: Software delay
; Date:
; Version: 1.0
; Target MCU AT90S1200
; Description:
; Software delay with counted loops
;**************************************************
;********Directives *******************************
.device at90S1200 ; Device is AT90S1200
.notlist
.include "1200def.inc"
.list
.def temp = r16
.def dly = r17
.def delent = r18 ;Loop counter
;********Interrupt vector table
rjmp RESET
;Reset handle
reti ;External Interrupt0
;handle
reti ;Overflow0 Interrupt
;handle
reti ;Analogue Comparator
;Interrupt handle
;*************Subroutines *************************
DELAY: ;Delay subroutine,
;cycles = 1026*dly+4
loop1: dec delent ;Decrement inner
;loop counter
nop
brne loop1 ;Inner loop runs 256
;times, 4 cycles each
dec dly ;Decrement outer counter
brne loop1
ret ;Return to caller
;********main *************************************
RESET: ser temp ;Set all bits of temp
Out DDRB, temp ;PORTB = all
;outputs
Out PORTB, temp
Ldi dly, $FF ;Initialize delay
;of about 65 ms
Loop: cbi PORTB, PB0 ;Clear PB0
Ldi dly, $FF ;Initialize delay
;of about 65ms
rcall DELAY ;Number of cycles
;= 1026*dly+4
Sbi PORTB, PB0
Ldi dly, $FF ;Initialize delay
;of about 65ms
rcall DELAY ;Number of cycles
;= 1026*dly+4
rjmp loop ;Repeat forever
;**************************************************
5.7 Write a programme that counts the number of times a push button is
pressed.
6
System Memory
6.1 Introduction
[Figure: block diagram of a computer system — input and output units, logic, and a memory holding instructions and data.]
As the centre of the computer system, the memory has a significant effect
on many of the design metrics discussed in Chapter 1:
• Performance: The use of slow memory (i.e. memory with a large access
time) degrades the overall performance of the system. Slow memory may
create “memory bottlenecks”, which cause the processor to work below its
performance capabilities.
• Software Support: The capacity of the memory limits the use of the com-
puter; less memory capacity limits the programmes and the applications
that can run on it.
• Reliability and Stability: The design of the memory defines the probability
of finding one or more bits in error. A bad memory means a high probability
of bit errors, and hence a less reliable system. It is very important to
use high-quality, error-free memories to ensure smooth running of the
system.
• Upgradability: Because of the importance of the memory capacity, as
mentioned before, it is important during system design to use some
form of universal memory. This allows the user, when the need arises, to
upgrade the current memory if the motherboard allows, or to replace it
with a newer one.
The selection of the memory unit(s) for a given application is a series
of tradeoffs, since a number of different factors must be considered. These
include: memory size, datapath width (bus width and word width), access
time, the write ability and the organization method. The speed of the mem-
ory depends on many factors, including technology of implementation and
organizational method.
This chapter familiarizes the reader with many aspects related to the
memory and memory systems. The memory systems of AVR and Intel
microcontrollers are used as case studies.
6.2 Memory Classification 251
• Read mostly memory: This is some form of ROM that can be repro-
grammed, out of system, from time to time to change its contents.
Erasable PROM (EPROM) and Electrically Erasable PROM (EEPROM)
are some examples.
4. Volatility: Refers to the ability of memory to hold its stored bits after
those bits have been written:
• Nonvolatile Memory: Is the memory that retains its contents when
the system is switched off or when there is a power failure. All
currently used forms of magnetic storage and optical disk storage
and also the ROMs are nonvolatile.
• Nonvolatile memories may be split into two types:
– Erasable, e.g., EPROM, EEPROM and magnetic memories.
– Nonerasable: this is the memory range that will essentially
never lose its bits (as long as the memory chip is not damaged),
e.g. ROM and PROM.
• Volatile Memory: The data in such class of memory stays there only
as long as the memory is connected to the power.
5. Destructive or Nondestructive Read Out: This parameter describes
the effect of the READ operation on the information stored. In ferrite
core memories, the read operation destroys the stored information under
consideration, and must therefore be followed by a rewrite operation.
Such memory is called Destructive Read Out (DRO) memory. Most
semiconductor memories, as well as magnetic drums, disks and tapes,
are nondestructive: the read operation does not affect the
stored information. Dynamic RAM, accordingly, is a destructive read-out
storage; the value read into the semiconductor latches (buffers) of the
sense amplifiers must be written back to the cell.
6. Static or Dynamic:
• Static RAM (SRAM): is semiconductor memory which holds its
contents so long as there is a power supply. Data can be changed
by over-writing it with new data. The data is stored in the form of a
state of a flip-flop.
• Dynamic RAM (DRAM): DRAMs store data in the form of an
electric charge on the inter-electrode capacitance of a field ef-
fect transistor. This capacitor is not ideal and gradually leaks
the charge stored on it. Because of this discharging effect, the
cell can lose the data stored in it. To avoid that,
dynamic memories must be refreshed regularly. Refreshing is used
6.3 Memory Response Time 253
[Figure (memory classification tree): ROM — PROM, EPROM, EEPROM, Flash; semiconductor RAM — SRAM, DRAM; magnetic — ferrite core, floppy, hard disk, tape, bubble; optical — CD-ROM, rewritable CD.]
Figure 6.3 Memory response time (measured from the arrival of the address).
In case of random-access memory, the access time is the sum of the two
components, while in case of non-random-access (e.g. sequential access),
the latency represents the access time.
• Memory Cycle Time: This is measured by the minimum period between
two successive requests of the memory. Compared with access time,
memory cycle time takes into consideration the time required for the
signals to settle on the data bus; “settling time”. Many memory types
have identical access time and cycle time.
The “read access time” and “write access time” are almost the same for the
majority of semiconductor memories, but not for other memories.
Concerning the response time, normal memories (that use addresses for
identifying the locations of the data stored within the memory) are classified
into random access, sequential address and direct address.
Any memory with a random read cycle also has a random write cycle; such
memory is generally called random access memory (RAM). We must note here that the
term RAM has no relation to the write ability of the memory; it refers to
the access time. Unfortunately, this term is often used, wrongly, to describe the
read/write ability of the memory. As a matter of fact, ROM is also RAM if we consider
the correct definition of the term “random access”.
The semiconductor memories are, generally, random access memories.
As mentioned before, the access time means the time from when the instruction
decoder supplies the system with the address of the needed location at the
memory until the desired information is found but not read. In sequential
access memories data is arranged in the form of units, called records, each
6.4 Semiconductor Memory 257
with its unique address. The address of the record is stored as part of the
contents of the record. The record contains normally more than one word. The
addresses are used as a way to separate the records from each other at the same
time the system uses them for retrieving the records. The sequential access
memory has one read/write mechanism; in case of hard disk this mechanism
is called head assembly. Accessing any record must follow a specific linear
sequence; any time the system needs to access record n, the head assembly has
to start from an initial address passing and rejecting any intermediate address
till it reaches the required address. In some systems the initial address is that of
the first record (the parking place of the head assembly in the case of a hard disk);
other systems start from the current location. This makes the access time a
function of the physical location of the data on the medium with reference to
the default position from which the read/write mechanism starts its movement.
In direct address, individual blocks or records have a unique address based
on physical location. As in the case of sequential access, this system uses one
mechanism for read/write. Access is accomplished by moving directly to the
general vicinity, plus sequential searching, counting, or waiting to reach the final
location. Disk units are direct access.
Semiconductor memory
Without semiconductor memory, microcomputers
and microprocessors would have been forced to use the slow, bulky, and expensive ferrite
core memory of the 1960s and 1970s mainframes.
Semiconductor memories fall into two main categories, as shown in Fig-
ure 6.4: table memories and function memories. With table memories, an
address A is defined in the range
0 ≤ A ≤ n = 2^N − 1
This expression defines the address space of a memory chip that has an address
width of N bits. Data can be stored at each of the 2^N addresses. If the data word
width of the memory chip is m bits, then it has a storage capacity of K = m · 2^N
bits. It is possible to extend the used address space and the word width by
using several memory chips (see Section 6.6.1). The extension in the address
space is limited by the width of the address bus of the system. For an address bus of
width Nb, the memory space available to the system is 2^Nb locations. This will allow
any tables such as truth tables, computer programmes, results or numbers to
be stored. In general the table memories are used to store information (data,
instructions,…).
Read-only memory is a non-volatile memory that can be read from, but not
written to, by a processor in an embedded or general-purpose system. The
mechanism that is used for setting the bits in the memory is called program-
ming, not writing. For traditional types of ROM, such programming takes
place off-line at the factory, when the memory is not actively serving as a
memory in a system.
Read-only Memory (ROM) cells can be built with only one transistor
per bit of storage. A ROM array is commonly implemented as a single-ended
NOR array using any of the known NOR gate structures, including the pseudo-
nMOS and the footless dynamic NOR gate. Figure 6.5 shows a 4-word by 6-bit
ROM using pseudo-nMOS pull-ups with the following contents:
Word 0 : 010101
Word 1 : 011001
Word 2 : 100101
Word 3 : 101010
It is easy to see the two main reasons that read-only memory is used in
such and similar applications to fulfill certain functions within the PC or the
microcontroller:
• Permanence: The values stored in ROM are always there, whether the
power is on or not. A ROM can be removed from the PC, stored for an
indefinite period of time, and then replaced, and the data it contains will
still be there. For this reason, it is called non-volatile storage. A hard
disk is also non-volatile, for the same reason, but regular semiconductor
RAM is not.
• Security: The fact that ROM cannot easily be modified provides a mea-
sure of security against accidental (or malicious) changes to its contents.
You are not going to find viruses infecting true ROMs, for example;
it’s just not possible. (It’s technically possible with erasable EPROMs,
though in practice never seen.)
While the whole point of a ROM, as mentioned before, is supposed to be
that the contents cannot be changed, there are times when being able to change
the contents of a ROM can be very useful. There are several ROM variants
that can be changed under certain circumstances; these can be thought of as
"mostly read-only memory". PROM, EPROM, EEPROM, and flash memory
are some of the modified forms of the ROM. In terms of write ability, the
latter two have such a high degree of write ability that calling them read-only
memory is not really accurate. In terms of storage permanence, all ROMs have
high storage permanence, and in fact, all are nonvolatile.
One-time programmable (OTP) ROMs have very high storage permanence, since their stored
bits won’t change unless someone reconnects the device to a programmer and blows more fuses.
Because of their high storage permanence, OTP ROMs are commonly used
in final products, versus other PROMs, which are more susceptible to having
their contents inadvertently modified from radiation, maliciousness, or just
the mere passage of many years.
PROMs are useful for companies that make their own ROMs from
software they write. When the company changes its code, it can
create a new PROM with the new code, without the need for expensive
equipment.
The read operation addresses the cell in exactly the same way, but in
this case the data input is disabled and is not allowed to drive the bit lines.
Information from the bistable cell is passed over the bit line through an
amplifier to become the data output.
Figure 6.7 DRAM cell: (a) the cell (word line, bit line, storage capacitor), (b) read operation (bitline precharged to VDD/2, giving a sensed swing ΔV).
first precharged to VDD/2. When the wordline rises, the capacitor shares its
charge with the bitline, causing a voltage change ΔV that can be sensed (see
Figure 6.7(b)). The read disturbs the cell contents at x, so the cell must be
rewritten after each read. On a write, the bitline is driven high or low and
the voltage is forced onto the capacitor. Some DRAMs drive the wordline to
VDDP = VDD + Vt to avoid a degraded level when writing a ‘1’.
The DRAM capacitor Ccell must be as physically small as possible to
achieve good density. However, the bitline is contacted to many DRAM
cells and has a relatively large capacitance Cbit . Therefore, the cell capac-
itance is typically much smaller than the bitline capacitance. According to the
charge-sharing equation, the voltage swing on the bitline during readout is
ΔV = (VDD / 2) · Ccell / (Ccell + Cbit)
This equation shows that a large cell capacitance is important to provide
a reasonable voltage swing. This large capacitance is also necessary to retain
the contents of the cell for an acceptably long time and to minimize soft errors.
For designers, 30 fF (30 × 10^−15 F) is a typical target. In modern technology, the
most compact way to build such a high capacitance is to use three-dimensional
capacitor structures.
Dynamic RAM Refresh Cycle
In the case of DRAM, each cell needs only one transistor. The cell’s transistor
is nothing more than a gate. All cell gates in an entire row are opened by
assertion of the row select line. Each cell capacitor drives a bit-line sense
amplifier that also includes a bistable storage cell. With this arrangement, the
assertion of a row select line transfers and stores data from all cells of the
row to the corresponding sense amplifiers. A particular sense amplifier can be
gated to the output by assertion of the appropriate column select line.
There are two problems with the dynamic cell that are not encountered in
the static cell. The first is that the sense operation draws current from the cell
and modifies the capacitor’s charge. The second problem is capacitor leakage.
Even if the gating transistor remains off, the capacitor tends to lose charge.
Assuming that the target value of 30 fF for the cell capacitance is achieved,
we can see that even when using a very large leakage resistance to ground, the
time constant will be low. In order for the cell to maintain enough charge to
result in accurately sensed data, a refresh operation must be performed before
significant charge is lost. A common refresh time for practical DRAMs is 2 ms.
The refresh operation must restore all lost charge on the capacitors every 2 ms
or less.
6.5 Interfacing Memory to Processor 269
Table 6.1 The Address Decoder and R/W determine the type of bus activity for each cycle.
Select  R/W  Function  Rationale
0       0    OFF       Because the address is incorrect
0       1    OFF       Because the address is incorrect
1       0    WRITE     Data flows from the processor to the external device (can be external memory)
1       1    READ      Data flows from the device to the processor
If the address bus width is 32 bits (lines A0–A31), the memory space will be 2^32 locations.
Microprocessor systems (also microcontroller systems) often have memory
components that are smaller than 2^32. Moreover, they normally use more than
one type of memory at the same time to form the memory system: read/write,
ROM, EEPROM and memory-mapped peripherals. A memory map is a
graphical representation that describes the way such different types of memories
are organized in the memory space. The memory map shows the address
allocation of each memory component and the address of any other devices
connected to the address bus.
Figure 6.11 gives an example for a 16-bit wide address bus. The 16-bit
address bus gives a 2^16 address space, which means that a maximum of 64 K
different storage words can be addressed when using this processor.
The user can use all the available space or part of it, and can use different types
of storage devices, e.g. register files, RAM, and ROM. In the case of Figure 6.11,
four different memory chips, each of 16 K words, are used by the system. The
RAM has been given the first 48 K addresses (from 0000 to BFFF) and the ROM
the highest 16 K (i.e. addresses C000 to FFFF). Each area of memory can also
be subdivided into smaller pages of 256 addresses each. Page zero occupies
addresses 0 to 255 in decimal, or 0000 to 00FF in hex.
In a memory-mapped I/O system, the peripherals are treated as memory
locations.
Example 6.6
Assume a processor with 8-bit word and with a 16-bit address bus. Draw
the memory space and the memory map, if the processor is connected to the
$C000–$FFFF  ROM (16 K)
$8000–$BFFF  RAM (16 K)
$4000–$7FFF  RAM (16 K)
$0000–$3FFF  RAM (16 K)
Figure 6.11 An example of a memory map.
$0000–$0FFF  RAM
$1000–$4FFF  Unused
$5000        Input device
$5001        Output device
$5002–$BFFF  Unused
$C000–$FFFF  ROM
memory built-in:
32 working registers       $0000 to $001F
64 I/O registers           $0020 to $005F
256 (512) internal SRAM    $0060 to $015F ($025F)
Show the memory maps of the AVR microcontroller.
Solution
The AVR, like any RISC structure, has two memory maps: a programme
memory map and a data memory map. As a matter of fact, the majority of
AVR microcontrollers have a third map: the data EEPROM memory map. The data
EEPROM is a 4 K × 8-bit memory. Figure 6.13 shows the complete memory maps
of the ATmega8515/AT90S8515. The external SRAM space in the data memory
map is normally unused, unless the designer needs more storage space than
that made available by the internal SRAM. In such cases the designer can
connect an external SRAM of capacity up to 64 K words.
[Figure 6.13 (residue): data memory map, addresses $0000–$01FF, 8 bits wide.]
[Figure 6.14 (residue): four slave devices with chip-select inputs /CS1–/CS4 sharing the data bus.]
The main function of the address decoder is to monitor the N address lines from
the bus master (the processor) and determine whether or not the slave has been
selected to communicate in the current cycle. Each slave has its own address
decoder, uniquely designed to select the addresses intended for the device.
Care must be taken to avoid having two devices driving the data bus at the
same time. It is required to select exactly one slave device during every cycle.
Consider the situation illustrated by Figure 6.14, which represents the
connections of the components mentioned in Example 6.6 to the address bus of
the processor. The ROM address input pins are connected to 14 of the address
bus lines coming from the master, and the RAM is connected to 12 of these
lines. This means that there are 2^12 locations common between the ROM
and the RAM. Whenever one of these common locations is addressed in the
RAM, the corresponding location in the ROM is also addressed. The data outputs
of the ROM and RAM (also the other I/O devices) are connected to the system
bus. Since all the data outputs of the different devices are connected together,
the data bus drivers in the different components must have tri-state outputs.
That is, only one of the components may put data on the system data bus.
To achieve that, all the components connected to the buses have a chip-
select signal (/CS1, /CS2, /CS3, and /CS4). Whenever the chip-select input of
a component is driven low (active), that device takes part in the memory access and
puts data on the data bus (if R/W = 1) or receives data from it (if R/W = 0).
When the chip-select is not active (i.e. in the high state) the corresponding data bus
drivers are turned off, stopping the device from putting any data on the data bus.
Special circuits
called “chip-select decoders” are used to generate the chip-select signals. The
decoder is designed such that there is no conflict between the different devices; the
address decoders should not select two devices simultaneously. In other words,
no two select lines should be active simultaneously.
This guarantees that, for each address location within the
memory space, only the chip-select of the memory component that holds the
needed data is active, while the rest are inactive. In this way the memory map
of our system contains four disjoint memory components.
The address decoder is usually located in each slave interface. There are
several different strategies for decoding, which may be divided into four
groups: full address decoding, partial address decoding, block address de-
coding and minimal-cost address decoding. Only the first strategy is discussed
here. But before discussing the decoding strategies we have to introduce the
use of address/data multiplexing.
Utilizing the limited number of pins on the microcontroller
Not all microcontrollers/microprocessors have enough pins to let
each pin have a single function. To utilize the limited number of pins on the
microcontroller, use is made of address/data multiplexing, in which some pins
have dual functions. For example, the Intel 8051, the Motorola 6811, the
AVR and MC68HC912B32 use a multiplexed address/data bus. The 8051, the
AVR and 6811 (where the address bus width N = 16 and the word length is 8 bits)
use the same eight wires AD7-AD0 for the low address (A7-A0) in the first half
of each cycle and for data (D7-D0) in the second half. The address strobe
output (AS) in case of 6811 and the Address Latch Enable (ALE) in case of
8051/8085/8088/8086/AVR microcontrollers and microprocessors are used to
capture first half of the address into an external latch, 74HC375 in Figure 6.15.
In this way, the entire 16-bit address is available during the second half cycle.
In the MC68HC912B32, the same 16 wires, AD15-AD0, are used for the
address (A15-A0) during the first half of each cycle and for the data (D15-D0)
during the second half of each cycle. We use the rising edge of the E clock
to capture the address into an external latch (two 74HC374s), as shown in
6.5 Interfacing Memory to Processor 281
[Figure 6.15 (residue): a bus master in the 6811 style — multiplexed AD7–AD0 lines, with AS/LE latching the low address A7–A0 into an external latch, A15–A8 driven directly, and address decoders producing the RAM, I/O and ROM select signals.]
Figure 6.15 The 6811 (also 8051/8085/8088/8086) multiplexes the low address.
[Figure 6.16 (residue): the MC68HC912B32 with the demultiplexed address A15–A0 feeding an address decoder that produces the RAM select, as shown in]
Figure 6.16. In this way, the entire 16-bit address is available during the second
half of the cycle.
Some processors, e.g. the MC68HC812A4, have enough pins to
let the address pins be completely separated from the data pins and accordingly
run in non-multiplexed mode.
Whether multiplexing 8 pins or 16 pins or separating the address pins from
data pins, the address decoders should not select two devices simultaneously.
In other words, no two select lines should be active at the same time. In
this way it is guaranteed that, for each address that appears on the address bus of
the system, only one location will respond. All the microprocessor’s address
lines are used to access each physical memory location, either by specifying
a given memory device or by specifying an address within it.
Example 6.8: Design a fully decoded positive logic Select signal for a 1 K
RAM at 4000H - 43FFH
Solution:
Step 1: Write in binary the start address and the last address of the space
occupied by the chip. Compare the two and then write the specified address
using 0, 1 and X, according to the following rules:
a. There are 16 symbols, one for each of the address bits A15, A14, …, A0
b. 0 means the address bit must be 0 for this device
c. 1 means the address bit must be 1 for this device
d. X means the address bit can be 0 or 1 for this device
e. All the Xs (if any) are located on the right-hand side of the expression
f. With n representing the number of Xs, the size of the memory in bytes is
2^n
g. Let I be the unsigned binary integer formed from the 16 − n 0s and 1s;
then the beginning address of the device is I × 2^n
In this example:
Beginning address $4000 = 0100 0000 0000 0000
Last address $43FF = 0100 0011 1111 1111
The needed address = 0100 00XX XXXX XXXX
From this result:
n = 10
size of the chip = 2^10 = 1024 bytes
I = 010000 (binary) = 16 (decimal) = $4000/$0400
Step 2: Write the equation using all 0s and 1s. A 0 translates into the
complement of the address bit, and a 1 translates directly into the address bit.
For our example,
Select = /A15 · A14 · /A13 · /A12 · /A11 · /A10
Step 3: Build circuit using real TTL gates. The result is given in Figure 6.17.
Example 6.9: Design a fully decoded select signal for an I/O device at $5500
in negative logic
Solution:
Step 1: address = 0101 0101 0000 0000
Step 2: The negative logic output can be created simply by inverting the
output of a positive logic design.
Select = NOT(/A15 · A14 · /A13 · A12 · /A11 · A10 · /A9 · A8 · /A7 · /A6 · /A5 · /A4 · /A3 · /A2 · /A1 · /A0)
Step 3: Build the circuit; see Figure 6.18.
Example 6.10: Build a fully decoded positive logic address decoder for a 20 K
RAM with an address range from $0000 to $4FFF.
Start with
0000, 0000, 0000, 0000 Range $0000 to $0000
Start over
Example 6.11: Build a fully decoded negative logic address decoder for a
32 K RAM with an address range of $2000 to $9FFF.
Even though the memory size is a power of 2, the size 32,768 does not
evenly divide the starting address 8192. We break the $2000 to $9FFF irregular
address range into three regular address ranges.
Figure 6.19 A 1 K × 4 RAM.
The CS (or chip enable, CE) signal is one of the control signals
that govern the operation of the memory. The number of control signals
affects the read/write timing diagram. RAM ICs typically have three
control inputs. In addition to a chip select (/CS), the IC has a write enable
(/WE) input, and an output enable (/OE) input. (The write enable signal is
also known as read/write R/W enable). Less common are RAMs with only
two control inputs, /CS and /WE.
When writing to the RAM, care must be taken that the correct address
is present before assertion of the write enable. If the write enable is asserted
before the desired address is present, the incoming data may write over existing
data as the address changes.
Writing to a RAM IC that has three control inputs involves the following
steps:
As mentioned before there are two modes of writing: /WE control and
/CS control. The waveforms of Figure 6.20 summarize the important timing
specifications for these two modes of writing data to the RAM. This device
has common input/output pins.
In mode 1, the chip select line is always asserted before assertion of
the write enable and becomes deasserted at the same time or after WE is
deasserted. Thus, the write enable controls the entry of data. This mode in-
cludes the case of a continuously selected chip. The /WE line can be asserted
simultaneously with the application of a valid address because the address
setup time, tAS, is specified as 0 ns. The /WE line must remain asserted for
35 ns, as indicated by the write pulse width parameter tWP. Input data may be
changed after /WE is asserted, but the valid input data to be written must be
stable for a minimum of 20 ns. This figure corresponds to the data-valid-to-end-
of-write time tDW. Note that while /WE is low, the data output lines do not
contain valid data. If the chip select /CS remains low after /WE goes high, the
Figure 6.20 Memory write waveforms: (a) for device continuously selected, (b) for address
valid before or coincident with chip select.
The memory section of the Atmel RISC AVR processors is based on the
Harvard model, in which various portions of the memory are separated to
allow faster access and increased capacity. The CPU has a separate interface
6.6 AVR Memory System 289
for the FLASH code (programme) memory section, the data memory section,
and the EEPROM memory section, if one is present. The data memory of the
AVR processor typically contains three separate areas of read/write (R/W)
memory.
• The register file: This is the lowest section and contains 32 registers. All
the 32 registers are general-purpose working registers.
• The I/O registers: This is the next 64 registers following the register file.
• The internal and external SRAM: This represents the memory area
following the I/O registers.
The thirty-two general-purpose registers are used by the processor for temporary storage
while it is executing, and they can even be used for the storage of global
variables. The sixty-four I/O registers are used as the interface to the I/O
devices and peripherals on board the microcontroller. The internal SRAM
is used as a general variable-storage area and also for the processor stack.
MyTable:
.DW 0x1248,0x2458,0x3344,0x4466,0x5666 ; The table values, organized as words
.DW 0x5789,0x679A,0x22AB,0x9AB8,0x33AC ; the rest of the table values
Read5: LDI ZH, HIGH(MyTable*2) ; Address of table to pointer Z
LDI ZL,LOW(MyTable*2) ; multiplied by 2 for bytewise access
ADIW ZL,10 ; Point to fifth value in table
LPM ; Read least significant byte from
; programme memory
MOV R24,R0 ; Copy LSB to 16-bit register
ADIW ZL,1 ; Point to MSB in programme memory
LPM ; Read MSB of table value
MOV R25,R0 ; Copy MSB to 16-bit register
Register  Addr  Special function
R0        $00
R1        $01
…
R14       $0E
R15       $0F
R16       $10
R17       $11
…
R26       $1A   X-register low byte
R27       $1B   X-register high byte
R28       $1C   Y-register low byte
R29       $1D   Y-register high byte
R30       $1E   Z-register low byte
R31       $1F   Z-register high byte
(Each register is 8 bits wide, bits 7 to 0; all are general-purpose working registers.)
The AVR register file is divided into two parts of 16 registers each: R0 to
R15 and R16 to R31. All instructions that operate on the registers have direct
access to them and need one cycle for execution. The exceptions are the
instructions that use the immediate addressing mode and the instructions that use a
constant mask to set or clear bits of a register. Such instructions allow the
programmer to specify a constant as an operand, and they must use registers
between R16 and R31. These instructions are: LDI, ANDI, ORI, SBCI, SUBI,
CPI, SBR, SER, and CBR. The general SBC, SUB, CP, AND and OR and
all other operations between two registers or on a single register apply to the
entire register file.
Some of the general-purpose registers have additional special functions.
Registers R0 and R26 through R31 have additional functions. R0 is used in the
instruction LPM (load programme memory), while R26 through R31 are used
as pointer registers. The instruction LPM loads the contents of the programme
memory location pointed to by the contents of register Z into register R0.
Pointer Registers
The register pairs R26:R27, R28:R29 and R30:R31 have an additional
special function. Because of the importance of this extra function, each pair
has an extra name in the assembler; X, Y and Z respectively, Figure 6.23. Each
pair forms a 16-bit register that can be used as a pointer pointing to a location
within the address space of the SRAM. To point to a location in programme
memory we must use the Z register. The lower register of each pair stores the
lower byte of the address and the upper register stores the higher byte of the
address. In the Y pointer, for example, the lower register (R28) is named YL and
the upper register (R29) is called YH. The names XL, XH, YL, YH, ZL and ZH are defined in the
standard header file for the chip. To load a 16-bit address to Y pointer we use
the following statements:
.EQU Address = RAMEND ; RAMEND is the highest 16-bit address
; in SRAM
LDI YH, HIGH(Address) ; Load the higher byte of the address
LDI YL, LOW(Address) ; Load the lower byte of the address
Two special instructions are provided to access the SRAM using
pointers: LD (LoaD) and ST (STore). LD reads the contents of a memory
location and ST writes into the location.
There is only one command for the read access to the programme memory.
It is defined for the pointer pair Z and it is named LPM (Load from Programme
Memory), see Chapter 4.
It is very common also to use the pointers to access tables stored at the
programme memory. The following is an example of a table with 10 different
16-bit values stored at the programme memory and the programme is written
to read the fifth table value to R25:R24.
#include <90s8535.h>
void main(void)
{
DDRA = 0xFF; //all bits of Port A are output
DDRB = 0x00; //all bits of Port B are input
while (1)
{
PORTA = PINB; //read port B and write to A
}
}
Figure 6.24 Port initialization using register names.
294 System Memory
This programme reads the pins of Port B using the PINB I/O register and
writes the results to the output latches of Port A using the PORTA register.
The C compiler gets the addresses to use with these register names from
the #include header file in the first line of the programme, in this example,
90s8535.h.
In summary, the C language programmer uses the I/O registers as the
interface to the I/O hardware devices within the microcontroller. Subsequent
sections of this chapter describe the use and function of each of the I/O
peripherals in the microcontroller and their associated I/O registers.
Two instructions, IN and OUT, are used to access the I/O registers; they
transfer data between the 32 working registers and the I/O registers.
I/O registers within the address range 0x00–0x1F are bit-accessible. The
two instructions SBI and CBI can be used to directly access any bit in any such
I/O register.
In the following we consider two of the I/O registers (the status
register and the stack pointer); the rest will be considered when discussing
the hardware resources of the microcontroller.
1. Global Interrupt Enable (I)
The I-bit is cleared by hardware after an interrupt has occurred and is set
by the RETI instruction to enable subsequent interrupts.
2. Bit Copy Storage (T)
Bit Copy Storage (Bit 6 – T), used with instructions BLD (bit load) and
BST (bit store) for loading and storing bits from one register to another. The
two instructions BLD and BST use the T-bit as source and destination for
the operated bit. A bit from a register file can be copied into T by the BST
instruction and a bit in T can be copied into a bit in a register in the register
file by the BLD instruction.
Example 6.12
bst r1, 2 ; Store bit 2 of register r1 in the T-flag
bld r0, 4 ; Load the T-flag into bit 4 of r0
clt ; Clear the T-flag
The state of the processor, i.e. the results of the arithmetic and logical oper-
ations, has no automatic effect on the T-bit. Its value is defined explicitly by
the instructions BSET 6, BCLR 6 (T is bit 6 in the SREG), BST, BLD, SET and CLT.
3. Half Carry Flag (H)
It indicates half carry (carry from bit 3 to bit 4) in some arithmetic in-
structions. This bit is used, in most of microcontrollers and microprocessors,
during binary coded decimal (BCD) operations. BCD is a notation often used
in business applications requiring exact results without the round-off errors
caused by the usual two’s complement notation. BCD uses a 4-bit code to
represent each decimal digit. In order to provide data compaction, a standard
8-bit word will contain two BCD digits placed side by side. When performing
arithmetic on bytes, an addition might generate a carry from bit 3 into bit 4,
i.e., from the first BCD digit into the second BCD digit. This carry is normally
undesirable and must be detected. The H-bit performs this role. The H-bit is the
carry from bit 3 into bit 4. The disadvantages of BCD are that it is inefficient in
its use of memory space and that it is somewhat slow in performing arithmetic
calculation. Here is an example:
                BCD
  0101 1000      58
+ 0000 1001      09
  ---------
  0110 0001      61H
The carry from bit 3 to bit 4 in this example sets the H flag to 1, indicating
that if BCD is the notation used, a decimal adjust instruction must be used to
bring the final result to the correct BCD value (67H).
Example 6.13:
What are the values of the Half Carry flag H and the content of r21 after
execution of the following instruction sequence?
LDI r21, $07
LDI r22, $09
ADD r21, r22
Solution:
The binary addition that takes place in the third instruction is illustrated
below:
00000111 (r21 = 07H)
+ 00001001 (r22 = 09H)
00010000 (result in r21 = 10H)
The half carry bit H = 1, because there was a carry from bit 3 to bit 4.
In the case of the microcontrollers and microprocessors that use BCD
notation, the half carry flag H is also set if the result in the lower nibble is in
the range $0A–$0F. In such cases the add instruction must be followed by a
decimal adjust instruction (e.g., DA A in the case of the Intel 8051) to bring
results greater than 9 back into range.
Many instructions automatically change the value of the Half Carry Flag
H. Besides that, it is possible to clear or to set the H-bit explicitly through the
instructions CLH and SEH respectively. Also, the instruction BCLR 5 can
be used to clear the H-bit (H is bit 5 in the SREG).
Example 6.14:
Add r0, r1 ; Add the contents of r1 to that of r0 and store the result
; at r0
brhs hset ; Branch if half carry flag is set
......
hset: nop ; Branch destination (Do nothing)
As a signed number, $8C represents −116, which is clearly not the correct
result of 140; therefore, the V bit is set. In this example, the carry flag C is zero
while, at the same time, there is a carry from bit 6 into bit 7; as a result,
V = 0 ⊕ 1 = 1.
Example 6.15:
What are the state of the overflow flag and the contents of r21 after the
execution of the following instruction sequence?
LDI r21, $FF
LDI r20, $0F
ADD r21, r20
Answer:
V = 0, contents of r21 = $0E
Discussion:
The register r21 is initialised with $FF, which as a signed number equals
(−1) in decimal. The register r20 is initialised with $0F, which equals 15. The result
of the addition is (15) + (−1) = 14 = $0E. While adding the two binary
numbers, there is a carry from bit 6 to bit 7 and also the normal carry
(C = 1). As mentioned before, the overflow bit V is the exclusive OR of
the carry bit C and the carry generated from bit 6 into bit 7. Accordingly,
V = 1 ⊕ 1 = 0.
Many instructions automatically affect the value of the V-bit. It is possible
also to clear or set the V-bit explicitly by using the instructions CLV, SEV and
BCLR 3 (V is bit 3 in the SREG).
6. Negative Flag (N)
The negative flag N (bit 2 in the status register) indicates a negative result
after the different arithmetic and logic operations. The N-bit is directly con-
nected to bit position 7 of the result. Recall that in two’s complement notation,
a 1 in bit position 7 indicates a negative number, hence the name of the status
bit.
7. Zero Flag (Z)
It indicates a zero result after an arithmetic or logical operation. The Z-bit
is set to 1 whenever the result of an operation is 0. It is used by arithmetic
instructions to determine whether or not a result is 0, and by logic operations
such as a compare. The latter implements a logical XOR between the word
being tested and the pattern to which it is being compared. Whenever the
comparison succeeds, i.e. the two patterns match, the Z-bit is set to 1.
8. Carry Flag (C)
It indicates a carry in arithmetic or logical operation. It is the overflow of
the 8-bit result. However, the word “overflow” has a specific meaning that will
be explained below. As an example, if the following two binary numbers are
added:
Binary HEX
11101101 ED
+ 10000000 80
= 1 01101101 1 6D
(Carry)
the result generates a carry (i.e., a ninth bit). The 1 generated by this addition is
stored by the ALU in the C-bit, where it can be tested. Special instructions, such
as “ADD with carry” can be used to add the carry automatically to the result
of the next addition. A test can also be performed by the programmer, using
a conditional-branch instruction, to determine whether or not some action
should be undertaken.
6.6 AVR Memory System 301
The carry bit performs a second, independent function besides that
mentioned above: it is used as a spill-out during the shift and rotate operations.
When used as a spill-out, the carry again behaves as a ninth bit of the result,
which justifies the merging of these two functions into the same bit.
Example 6.16:
ror r15 ; Rotate right the contents of register r15
The execution of this instruction shifts the contents of register r15
one bit to the right, shifts the carry flag C into bit 7 of r15, and puts
the old bit 0 of r15 into the carry flag.
• Displacement addressing: The address has two address fields one, at least,
is explicit. The contents A of one of them is used directly, the contents
of the other field refers to a register whose contents are to be added to A
to produce the effective address of the required location.
The indirect and displacement addressing modes allow the user to build
ring buffers for interim storage of constant values or calculated tables. (Note:
The reader is referred to Chapter 4 for the details of the addressing modes.)
The CPU can access the contents of any of the file registers but it cannot
directly access the contents of any location of the SRAM. To operate on the
contents of any of the locations of the SRAM, a register is usually used as
interim storage. Special load instructions must be used to read a location of
the SRAM and load it into the intermediate register (the LDI, LD, LDD and LDS
instructions). Similarly, store instructions (the ST, STD and STS instructions)
are used to store the contents of the intermediate register into a given
location in the SRAM. The load and store instructions contain the location
in the SRAM and the name of the intermediate register. The X, Y and Z registers
are normally used as pointers to the SRAM location. The following
example shows how to copy a value in SRAM to register R2, add it to the
contents of register R3 keeping the result in R3, and finally write the
result back to the SRAM.
LDS R2, 0x0075 ; Load register R2 with the
; contents of
; SRAM location 0x0075
ADD R3, R2 ; Add the contents of registers R2
; and R3
; and store the result at R3
STS 0x0075, R3 ; Copy the contents of register R3
; at SRAM
; location 0x0075.
Stack Pointer:
The stack pointer is a 16-bit pointer, accessible like a port. Two of the I/O
registers are used to form the stack pointer (SP): SPL at I/O address $3D
and SPH at I/O address $3E. Figure 6.25 shows the SP of the
ATmega8515/AT90S4414/8515.
The stack pointer keeps track of the address of the top of the stack (TOS),
representing the next available location to store data onto the stack. It works
as follows:
1. When new data (or an address) is ready to be placed onto the stack, the
processor uses the SP to find where in memory to place the data or the
address. For example, if the SP contains the value 0x300, the data will
be placed in SRAM memory location 0x300.
2. The SP is then decremented by one (i.e. to location 0x2FF) when data
is pushed onto the stack with the PUSH instruction, or decremented by
two (to location 0x2FE) when an address is pushed onto the stack with
subroutine calls and interrupts. After that the processor begins to execute
the next instruction. In this manner, the SP always contains the location
of the next memory cell of the stack that is available to be used.
At some later point, the processor may want to retrieve the last piece of data
(or address) pushed on to the stack. The retrieval process works as follows:
1. When the processor is ready to retrieve or pop data off the stack, it
reads the SP and immediately increments its contents: by one if data is
being popped with the POP instruction, and by two if an address is being
popped with the RET instruction on return from a subroutine or the RETI
instruction on return from an interrupt. The new contents of SP
point now to the top of the stack (TOS).
2. The processor uses the address in the SP to pop or retrieve the data from
the stack. Since the data has been retrieved from this location, its address
is left in the SP as the next available location on the stack.
To construct the stack the programmer must initialize the stack pointer
by loading it with the highest available SRAM address. This is shown in the
following code, where we assume that the stack grows downwards, i.e.
towards lower addresses:
.DEF temp = R16 ; Give register R16 the name temp
LDI temp, HIGH(RAMEND) ; Load the upper byte of the address
OUT SPH, temp ; to the high byte of the stack pointer
LDI temp, LOW(RAMEND) ; Load the lower byte of the address
OUT SPL, temp ; to the low byte of the stack pointer
The first line is for the assembler to confirm the name selected for register
R16, the next two lines are used to load the upper byte to stack pointer and the
last two lines to load the lower byte of the address to the stack pointer. The
programme starts the stack at the end of the memory space of the processor.
The RAM end is normally referred to by RAMEND, which is specific for each
processor type. RAMEND is defined in the INCLUDE file for the processor
type. For example, in the case of the AVR8515, the file 8515def.inc has the line:
.equ RAMEND = $25F ; Last memory address in SRAM
The RCALL statement makes the system jump to a label that exists
somewhere in the programme code.
Internal SRAM
This is available on most of the AVR processors except the baseline pro-
cessors such as the AT90S1200. The amount of SRAM varies from 128
bytes to 4 Kbytes. The SRAM is used for the stack as well as for storing variables.
Data is usually stored in the internal SRAM starting at the bottom of
the SRAM, and the processor stack (or stacks) start at the top of memory
and utilize memory from the top down. As data is pushed onto the stack, the
stack uses progressively more memory in the SRAM area, starting at the top of
SRAM and working downward. At the same time, the SRAM memory is being
used for the storage of variables, starting at the bottom of memory and working
upward. A microcontroller has limited SRAM space, so it is important to make
sure that neither the stack grows down nor the data grows up far enough
to interfere with the other. Overwriting the stack with data, or overwriting data
with the stack, will cause unpredictable results in your programme.
Note: The CodeVisionAVR C language compiler actually implements two
stacks: The system stack starts at the top of the SRAM area and is used to
store return addresses and processor control information. The data stack starts
below the system stack and works its way down through memory, in a manner
similar to the system stack, and is used to store temporary data, such as local
variables used in a function.
External SRAM
This is possible only on the larger processors of the AVR family. Those
processors that have external data and memory access ports (as AT90S8515
and ATmega8515), can use any available external SRAM the user may decide
to implement.
To access the external SRAM we use the same instructions as for the
internal SRAM. When internal data SRAM is accessed, the read
and write strobe pins (/RD and /WR) are inactive during the whole access
cycle. The external data SRAM physical address locations corresponding to
the internal data SRAM addresses cannot be reached by the CPU. To enable
the use of the external SRAM we must set the SRE bit in the memory control
register- MCUCR.
Because of the limited number of available pins on the chip, many pins
have more than one function: alternative functions. The Port A and Port C pins
have alternative functions related to the operation of the external data SRAM. The
alternative functions of Port A and Port C pins are activated when we enable
the external memory interface XMEM (eXternal MEMory). Enabling XMEM allows the use of all
the address space outside the internal SRAM by using Port A and Port C.
Figure 6.27 External data SRAM memory cycle without wait state.
The external data SRAM is enabled by setting the external SRAM enable
bit SRE located in the MCUCR control register. Enabling the external memory
(XMEM) interface overrides the setting of the data direction registers
of Port A and Port C (the ports dedicated to the interface in the case of the
ATmega). The XMEM interface will auto-detect whether an access is internal
or external. If the access is external, the XMEM interface will output the address,
data, and control signals on the ports according to Figure 6.27 (this figure
shows the waveforms without wait states). At the falling edge of ALE
there will be a valid address on Port A (AD7:0). ALE is low during a data
transfer. When the XMEM interface is enabled, an internal access will also
cause activity on the address, data, and ALE ports, but the /RD and /WR strobes
will not toggle during internal access.
When the SRE bit is cleared (zero), the external data SRAM is disabled,
and the normal pin and data direction settings are used.
Figure 6.28 sketches how to connect an external SRAM to the AVR using
8 latches (an octal latch) which are transparent when G is high.
By default, the external SRAM access is a three-cycle scheme as depicted in
Figure 6.27. When one extra wait state is needed in the access cycle, set the
SRW bit (to one) in the MCUCR register. The resulting access scheme is shown
in Figure 6.28. In both cases, note that Port A is the data bus in one cycle only.
As soon as the data access finishes, Port A becomes a low-order address bus
again.
Summary of “How to access External SRAM”
• The same instructions are used to access the internal and the external
SRAM. The /RD and /WR signals are only active when we access the
external SRAM.
Bit            7     6      5    4    3      2      1      0
               SRE   SRW10  SE   SM1  ISC11  ISC10  ISC01  ISC00
Access         R/W   R/W    R/W  R/W  R/W    R/W    R/W    R/W
Initial Value  0     0      0    0    0      0      0      0
Figure 6.29 MCUCR register.
Figure 6.31 ALU execution consisting of register fetch, execute, and write back.
The pointer register holding the SRAM address is one of the X, Y
or Z register pairs.
• The first clock cycle is needed to access the register file and to operate
upon the pointer register (the SRAM access instructions allow pre/post-
address increment operation on the pointer register).
• At the end of the first cycle, the ALU performs the address calculation, and
the resulting address is then used to access the SRAM location and write into it (or
read from it into the destination register), as shown in Figure 6.32.
Both EPROM and Flash memories are available for internal storage of pro-
grammes in MCS-51 microcontrollers. Table 6.4 shows some of the devices
that can be used as standalone microcontrollers.
When downloading a programme into the microcontroller, the two types of
memory are programmed in very similar ways. The main difference between
the types lies in the method of erasing the memory prior to reprogramming.
The EPROM is erased by exposure to ultra-violet light for a period up to
30 minutes. The Flash memory is electrically-erased by a combination of
programming signals.
The basic circuit for programming the 4 kbyte versions of both types of
device is shown in Figure 6.33.
To write a programme into the EPROM/Flash, the programming-unit puts
the addresses of the programme bytes onto Port 1 and onto lines 0 to 3 of
Port 2. The bytes of the programme code are put onto Port 0. The data is
transferred according to the bit settings shown in Tables 6.5 (EPROM) and
6.6 (Flash). The 8052 has a larger memory and therefore also uses pin P2.4 as
an address line.
Figure 6.33 Circuit connections for programming the 8751AH and 8951.
Table 6.5 Bit settings for programming the 8751 EPROM device.
Action RST /PSEN ALE /EA P2.6 P2.7 P3.6 P3.7
Programme code 1 0 50 ms low pulse Vpp 0 1 X X
Verify code 1 0 1 1 0 0 X X
Security lock 1 0 50 ms low pulse Vpp 1 1 X X
Table 6.6 Bit settings for programming the 8951 Flash device.
Action RST /PSEN ALE /EA P2.6 P2.7 P3.6 P3.7
Programme code 1 0 50 ms low pulse Vpp 0 1 1 1
Verify code 1 0 1 1 0 0 1 1
Security lock 1 0 50 ms low pulse Vpp 1 1 1 1
Erase chip 1 0 10 ms low pulse Vpp 1 0 0 0
The required values of Vpp on the /EA pin vary considerably. The EPROMs
require 21 volts, while the Flash memory can be programmed with either
5 or 12 volts depending on the version of the chip. The security lock can be
set to prevent unauthorized people from reading or changing the code in the
programme memory.
When /CE is low the device is turned on, and when it is high the data lines change to
high-impedance mode. The /OE (Output Enable) pin toggles the device between
programming and operating mode. In normal use it is connected to ground.
When the programme is stored entirely in an external EPROM, the /EA
pin on the microcontroller must be grounded. However, when the /EA pin
is connected to +5 V, the addresses in the programme between 0000h and
0FFFh (the lowest 4096 addresses) will be fetched from the internal EPROM,
and addresses above 0FFFh will be loaded from an external EPROM. (In the
case of the BH versions of the chip, this limit is 1FFFh.)
Expansion of code memory
An external EPROM can be added to the 8751 circuit by using the 16 pins
of ports P0 and P2 as the data and address buses. The circuit configuration is
shown in Figure 6.35.
The 74LS373 data latch has the connections shown in Figure 6.36.
The EN (Enable) input effectively closes all the switches when it is high, and
transfers the binary values D0 to D7 to the memory units. This is called the
“transparent” state, in which the outputs are the same as the inputs. When EN
goes low, the outputs Q0 to Q7 retain the values in the memory units until new
values are received. The /OC (Output Control) pin is usually grounded.
Figure 6.36 The 74LS373 data latch and its equivalent switching circuit.
The timing diagram for expanded code memory
The timing diagram for the 8051 connected to an external EPROM is shown
in Figure 6.37. Because address/data multiplexing is used, every machine
cycle is divided into six state cycles, each comprising two clock cycles. For
one and a half state cycles (3 clock cycles) the data output on P0 is the
low byte of the address. During the next three clock cycles, the instruction is
read from the EPROM pins D0–D7 into P0. The ALE and /PSEN lines are
low twice in each machine cycle, which allows two fetches from the EPROM
during that period. Thus it is possible to fetch an opcode and its operand from
the EPROM in one machine cycle.
The microcontroller reads the data from the EPROM at the mid-point of
State 1 and State 4, just before /PSEN disables reading.
Figure 6.39 8031 chip with 64 kbyte EPROM and 32 kbyte RAM.
bus. As seen in Figure 6.39, the address and data lines are wired in par-
allel, but the use of the /PSEN, /WR and /RD lines ensures that only the
correct bytes are transferred between the microcontroller and the EPROM or
RAM. Although this configuration may be used with the EPROM or Flash
versions of the MCS-51, it is most commonly used with the ROMless 8031
as shown.
6.8 Summary of the Chapter 319
Figure 6.40 Address decoding to give specific addresses for 8255A registers.
Figure 6.41 Using an FPGA to decode addresses for several 8255A chips.
324 Timers, Counters and Watchdog Timer
7.1.1 Counters
Two commonly used types of counters are binary counters and linear-feedback
shift registers. An N-bit binary counter sequences through 2^N outputs in
binary order. It has a minimum cycle time that increases with N. An N-
bit linear-feedback shift register sequences through up to 2^N − 1 outputs in
pseudo-random order. It has a short minimum cycle time independent of N,
so it is useful for extremely fast counters as well as pseudo-random number
generation.
In general, divide-by-M counters (M < 2^N) can be built using an ordinary
N-bit counter and circuitry to reset the counter upon reaching M. M can be a
programmable input if an equality comparator is used.
reset, the registers must be initialized to a nonzero value (e.g., all 1’s). The
pattern of outputs for the LFSR is shown in Table 7.1.
This LFSR is an example of a maximal-length shift register because its
output sequences through all 2^N − 1 combinations (excluding all 0’s). The
inputs fed to the XOR are called the tap sequence and are often specified with
a characteristic polynomial. For example, the 3-bit LFSR of Figure 7.3 has the
characteristic polynomial 1 + x^2 + x^3 because the taps come after the second
and third registers.
The output Y follows the 7-bit sequence [1110010]. This is an example of
a pseudo-random bit sequence (PRBS) because it is spectrally random. LFSRs
are used for high-speed counters and pseudo-random number generators. The
pseudo-random sequences are handy for built-in self-test and bit-error-rate
testing in communications links. They are also used in many spread-spectrum
communications systems such as GPS and CDMA where their correlation
properties make other users look like uncorrelated noise.
Table 7.2 lists characteristic polynomials for some commonly used
maximal-length LFSRs. For certain lengths N, more than two taps may be
required. For many values of N, there are multiple polynomials resulting in
different maximal-length LFSRs. Observe that the cycle time is set by the
register and XOR delays, independent of N.
7.1 Introduction to Timers and Counters 327
Example 7.1
Use Table 7.2 to sketch an 8-bit linear-feedback shift register. How long
is the pseudo-random bit sequence that it produces?
Solution:
Figure 7.4 shows an 8-bit LFSR using the four taps after the first, sixth,
seventh, and eighth bits, as given in Table 7.2. It produces a sequence of
2^8 − 1 = 255 bits before repeating.
7.1.2 Timers
A timer, as mentioned before, is a series of divide-by-2 flip-flops that receive
an input signal as a clocking source. The clock is applied to the first flip-flop,
which divides the clock frequency by 2. The output of the first flip-flop clocks
the second flip-flop, which also divides by 2, and so on. Since each successive
stage divides by 2, a timer with n stages divides the input clock frequency
by 2^n. The output of the last stage clocks a timer overflow flip-flop, or flag,
which is tested by software or generates an interrupt. The binary value in the
timer flip-flops can be thought of as a “count” of clock pulses since the timer
was started. This ‘count’ can be converted to time by multiplying it by the
period of the input clock.
A prescaler divides the clock before it reaches the timer (producing, for
example, CK/8 or CK/64; in counter mode the clock may instead come from
an external pin such as T0). The output of the prescaler may
be the same as the input signal, or it may have half the frequency (double
the period), one-fourth the frequency, one-eighth the frequency, etc. Thus,
a prescaler can be used to extend a timer’s range, by reducing the timer’s
resolution. For example, consider a timer with a resolution of 10 ns and a
range of 65,535 × 10 nanoseconds = 655.35 microseconds. If the prescaler
of such a timer is configured to divide the clock frequency by eight, then the
timer will have a resolution of 80 ns and a range of 65,535 × 80 nanoseconds
= 5.24 milliseconds. Figure 7.6 shows the prescaler configuration.
Timers, as mentioned before, are the devices most commonly used in embedded
and digital systems. The following list shows some of the possible uses of
timer devices.
1. Real Time Clocks: a real time clock is a clock which, once the system
starts, does not stop, cannot be reset, and whose count value cannot be
reloaded.
2. Initiating an event after a preset delay time.
3. Initiating an event (or chain of events) after a comparison(s) between the
preset time(s) with counted value(s). Preset time is loaded in a Compare
Register.
4. Capturing the count value at the timer on an event. The information of
time (instant of event) is thus stored at the capture register.
5. Finding the time interval between two events. Time is captured at each
event and the intervals are thus found out.
6. Wait for a message from a queue or mailbox or semaphore for a preset
time using RTOS (Real Time Operating System). There is a predefined
waiting period before RTOS lets a task run.
7. Watchdog timer. It resets the system after a defined time.
8. Baud or Bit Rate Control for serial communication on a line or network.
Timer timeout interrupts define the time of each bit or each baud.
9. Input pulse counting when using a timer, which is ticked by an external
non-periodic event rather than the clock input. The timer in this case is
acting in the “counter mode”.
10. Scheduling of various tasks.
11. Time slicing of various tasks. RTOS (Real Time Operating System)
switches after preset time-delay from one running task to the next. Each
task can therefore run in a predefined slot of time.
12. Time division multiplexing (TDM). Timer device is used for multiplexing
the input from a number of channels. Each channel input is allocated a
distinct and fixed-time slot to get a TDM output.
When the loop completes, we know that the desired time has passed. This implementation of
a timer on a dedicated general-purpose processor is obviously quite inefficient
in terms of size. One could alternatively incorporate the timer functionality
into a main programme, but the timer functionality then occupies much of
the program’s run time, leaving little time for other computations. Thus, the
benefit of assigning timer functionality to a special-purpose processor becomes
evident.
A PIT usually consists of three 16-bit counters which can be made to count
down from a programmed value to zero. Most microcontrollers contain two
16-bit counters and one 8-bit counter, with at least one of them being an up/down counter.
The rate at which counting down (or up) occurs can be fixed by using the
microprocessor system clock or some other external clock. A block diagram
of a PIT is shown in Figure 7.7, with two of the timers using the system
clock to control decrementing and one using an alternative clock, which provides
even greater flexibility when using the PIT for producing programmable
timer delays and programmable frequency signals.
The general programmer’s model of the PIT consists of a command register
for selecting the type of operation required from each timer and several timer
count registers, usually 16 bits, into which is stored the initial count value.
The general programmer’s model of a PIT is shown in Figure 7.8.
(Figure 7.7 shows the PIT block diagram: the system clock feeding the PIT,
with input and output lines for Timers 1, 2 and 3. Figure 7.8 shows the
programmer's model: a decrementing clock driving Timer 0, a 16-bit counter,
with its timer output.)
The smallest count value that can be loaded into a timer count register is
02H to allow the highest programmed frequency to be obtained, as the output
is activated half way through the count. On decrementing from 02H to 01H
the timer output is activated and at the next decrementing clock pulse the
count is decremented to 00H and the timer output is returned to a logic 0.
The timer count register is then automatically reloaded with the initial count
value, in this instance 02H, and the process starts again. Loading a value of
FFFFH gives the largest count possible and produces the lowest programmed
frequency. The rate at which each counter is decremented is determined by
the decrement clock, which has an input to each individual timer and will be
provided by the hardware of the microcontroller.
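The square-wave behaviour just described can be sketched as a small host-side simulation in C. This is a hypothetical behavioural model (the names `pit_channel`, `pit_clock` and `pit_period` are invented for illustration), not a register-accurate PIT:

```c
#include <assert.h>
#include <stdint.h>

/* Behavioural model of one PIT channel in square-wave mode: the output
 * goes high when the count decrements to half the programmed value and
 * returns low when the count reaches zero, at which point the initial
 * count value is automatically reloaded. */
typedef struct {
    uint16_t reload; /* programmed count value            */
    uint16_t count;  /* current counter contents          */
    int out;         /* logic level of the timer output   */
} pit_channel;

/* Apply one decrementing clock pulse to the channel. */
static void pit_clock(pit_channel *ch) {
    ch->count--;
    if (ch->count == ch->reload / 2)
        ch->out = 1;            /* halfway: output activated */
    if (ch->count == 0) {
        ch->out = 0;            /* end of count: output low  */
        ch->count = ch->reload; /* automatic reload          */
    }
}

/* Measure the output period, in clock pulses, for a given count value. */
unsigned long pit_period(uint16_t reload) {
    pit_channel ch = { reload, reload, 0 };
    unsigned long t = 0, first = 0;
    int prev = 0, edges = 0;
    while (edges < 2) {
        pit_clock(&ch);
        t++;
        if (ch.out && !prev) {  /* rising edge of the output */
            edges++;
            if (edges == 1)
                first = t;
        }
        prev = ch.out;
    }
    return t - first;
}
```

With the smallest allowed count of 02H the period is two clock pulses, i.e. the programmed frequency is half the decrementing clock frequency, consistent with the calculation that follows.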
Example of programmable one-shot.
Assuming a hardware decrementing clock frequency of 1.19318 MHz,
which is the frequency used in the IBM PC, the highest possible programmable
frequency output produced by the timer is:
Decrementing clock frequency/Count value = 1.19318/2
= 0.59659 MHz
The lowest possible programmable frequency output from the timer is obtained
with the largest count value, which for a 16-bit register is 65 536 (= 2^16).
This produces a programmed frequency of 1.19318/65536 = 18.206 Hz.
In the one-shot mode, the programmable delay can be calculated from the
following formula:
Programmed delay = Count value/Decrementing clock frequency
336 Timers, Counters and Watchdog Timer
Therefore the shortest possible delay is obtained when the count value =
01H which, for the same decrementing clock frequency used above, produces
a programmed delay of 1/(1.19318 × 10^6) = 0.838 µs.
The longest programmed delay is when the largest count value of 65 536
is used. The programmed delay is then 65 536/(1.19318 × 10^6) = 54.9 ms.
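These delay and frequency relations can be checked with a small host-side C helper; the 1.19318 MHz value is the IBM PC decrementing clock quoted above, and the function names are invented for this sketch:

```c
#include <assert.h>
#include <math.h>

/* The two relations used in the text: programmed frequency = f_clk/count
 * and programmed delay = count/f_clk, for the IBM PC's 1.19318 MHz
 * decrementing clock. */
#define F_CLK 1.19318e6 /* decrementing clock frequency, Hz */

double programmed_frequency_hz(unsigned long count) { return F_CLK / count; }
double programmed_delay_s(unsigned long count)      { return count / F_CLK; }
```

For example, a count of 2 gives 0.59659 MHz, and the largest count of 65 536 gives 18.206 Hz, matching the figures above.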
functions of the PIT, which are normally covered by Timers 0, 1, and 2 of
other microcontrollers. The following sections discuss the counter unit,
and the timer prescaler and input selector, the features that are common to
all of the timers. The compare circuit, which is common to Timer 1 and
Timer 2, is also described. The most common uses for each timer/counter are
discussed, although many timers have more functions than are covered in
this text. For any specific microcontroller, it is necessary to check the
specifications to determine all of the functions possible with its
timer/counters.
7.4 TIMER 0
Timer 0 is typically an 8-bit timer, but this varies with the specific
processor type. It is capable of the usual timer/counter functions but is
most often used to create a time base, or tick, for the programme. Like any
timer/counter, Timer 0 has a control register: timer/counter control
register 0, TCCR0. TCCR0 controls the function of Timer 0 by selecting the
clock source applied to it. Figure 7.13 shows the bit definitions for TCCR0.
The five most significant bits are (normally) reserved. The ATmega8515 uses
them to control the additional modes added to Timer 0 to cover the modes of
Timer 2 of other microcontrollers (this will be discussed later).
Timer 0 can be used to provide a highly accurate timing event by using a
simple programme. A programme tick, like the tick of a clock, is discussed
here. The overall scheme is that a number is selected and loaded into the timer.
The timer counts from this number up to 255 and rolls over. Whenever it rolls
over, it creates an interrupt (or a tick). The interrupt service routine reloads
the same number into the timer, executes any time-critical activities that may
be required, and then returns to the programme. The cycle then repeats, with
the counter counting up from the loaded number to 255 and rolling over to
create another interrupt. The interrupt therefore occurs on a regular basis,
once every time the period elapses. The number loaded into the counter
determines the length of the period: the lower the number, the longer it
takes the timer to reach 255 and roll over, and the longer the period of the
tick will be.
Example 7.2
Write a programme that toggles the state of an LED every 0.5 seconds.
Assume that the LED is attached to the most significant bit of port A as shown
in Figure 7.14 and the microcontroller uses a 4 MHz clock.
Solution:
In this example we are using Timer 0 as a timer tick. The first necessary task
is to determine the number that is loaded into the timer each time the interrupt
occurs. In this example, we want the LED to toggle every 0.5 seconds. One
obvious solution would be for the interrupt to occur every 0.5 seconds, in other
words the timer has to generate a square wave of a period of 0.5 seconds. Each
cycle of the square wave is a timer tick. In the case of Timer 0, the slowest
setting of the clock prescaler is system clock/1024.
4 MHz/1024 = 3.906 kHz, which has a period of 1/3.906 kHz = 256 µs
This shows that every 256 microseconds another clock pulse will be ap-
plied to Timer 0. Timer 0 is an 8-bit timer/counter, and so it can count up to
256 such periods before it rolls over. So the total time that is possible with this
hardware is as follows:
256 ∗ 256µs = 65.536 ms
Sixty-five milliseconds are not sufficient to time 500 millisecond events.
In order to accomplish longer periods, a global counter variable is placed in
the interrupt service routine of the tick. So, for instance, if the tick occurs
every 50 milliseconds, in this example we would want the counter to count up
to 10 before toggling the LED (10 ∗ 50 ms = 500 ms).
The choice of a reload number is up to the programmer. Usually it is
desirable that the reload number produce an even and easy-to-use time delay
period, such as 1 millisecond or 10 milliseconds. In the case of the 4 MHz
clock shown in the example, using the divide-by-eight prescaler will give a
clock period of 2 microseconds for each clock pulse applied to the counter.
Given a 2-microsecond clock applied to an 8-bit counter, the maximum time
possible using the counter alone would be as follows:
2µs ∗ 256 counts = 512µs
512 microseconds is not a very even period of time to work with, but using
250 counts would give a timeout period of 500 microseconds. Therefore the
required reload value is:
256 − 250 = 6
Rollover occurs at count number 256 and it is desirable for the counter to
count 250 counts before it rolls over, and therefore the reload number in this
case is 6. This means that the interrupt service routine will be executed once
every 500 microseconds (2 µs/clock cycle ∗ 250 clock cycles/interrupt cycle
= 500 µs/interrupt cycle). A global variable will be used to count to 1000 to
produce the entire 500 milliseconds time period (500µs ∗ 1000 = 500 ms).
The entire programme is shown in Figure 7.15.
Figure 7.15 illustrates all of the concepts discussed above. The timer/
counter register itself, timer counter 0 (TCCNT0), is reloaded with the value 6
each time the ISR executes so that it can once again count up 250 steps to reach
256 and roll over.
A global variable called “timecount” is used to keep track of the number
of times that the interrupt service routine is executed by incrementing it each
time the ISR is executed. “timecount" is both incremented and checked inside
//**********************************************************
#include <ATmega8515.h>
unsigned int timecount = 0; //global time counter
the expression of the if statement. When “timecount” reaches 1000, the most
significant bit of port A is toggled, and “timecount” is reset to zero to
count up for the next 500-millisecond period. This tick could also handle,
in the same manner, any other event that occurs on a multiple of the
500-microsecond tick.
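The tick logic of Example 7.2 can also be modelled as host-testable C. This is a hypothetical simulation of the Figure 7.15 scheme, not the CodeVisionAVR listing itself; `clock_pulse` stands in for one prescaled (2 µs) clock edge:

```c
#include <assert.h>

/* Host-side model of the Example 7.2 tick: an 8-bit counter is reloaded
 * with 6 on every overflow, and the overflow ISR toggles the LED after
 * 1000 ticks of 500 us each. */
#define RELOAD 6        /* reload value: 256 - 250 */
#define TICKS_GOAL 1000 /* 1000 * 500 us = 500 ms  */

static unsigned tcnt0 = RELOAD; /* simulated TCNT0     */
static unsigned timecount = 0;  /* global tick counter */
static unsigned porta = 0;      /* simulated port A    */

/* One prescaled (2 us) clock pulse; returns 1 when the LED toggles. */
static int clock_pulse(void) {
    if (++tcnt0 == 256) {  /* overflow: the ISR runs         */
        tcnt0 = RELOAD;    /* reload for the next 250 counts */
        if (++timecount == TICKS_GOAL) {
            timecount = 0;
            porta ^= 0x80; /* toggle MSB of port A           */
            return 1;
        }
    }
    return 0;
}

/* Count prescaled clock pulses between successive LED toggles. */
unsigned long pulses_per_toggle(void) {
    unsigned long n = 1;
    while (!clock_pulse())
        n++;
    return n;
}
```

Each toggle takes 250 counts × 1000 overflows = 250 000 pulses, which at 2 µs per pulse is exactly the required 500 ms.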
7.5 Timer 1
The 16-bit timer, typically Timer 1, is a much more versatile and complex
peripheral than Timer 0. In addition to the usual timer/counter functions,
Timer 1 contains
a 16-bit input capture register (ICR1 Register), two 16-bit output compare
registers (OCR1A/B Registers) and is controlled through a control register
“Timer/counter control register 1 (TCCR1)”. The input capture register is used
for measuring pulse widths or capturing times. The output compare registers
are used for producing frequencies or pulses from the timer/counter to an
output pin on the microcontroller. Each of these modes will be discussed in the
sections that follow. Remember, however, that although each mode is discussed
separately, the modes may be, and often are, mixed together in a programme.
Timer 1 is also conceptually very different from Timer 0. Timer 0 is usually
stopped, started, reset and so on in its normal use. Timer 1, on the other
hand, is usually left running. This creates some considerable differences in its
use. These differences will be discussed in detail in the sections that follow,
covering the special uses of Timer 1.
subtracted to find the time that it took for the event to occur. In Timer 1, these
tasks are managed by the input capture register (ICR1).
ICR1 is a 16-bit register that will capture the actual reading of Timer 1
when the microcontroller receives a certain signal. The signal that causes a
capture to occur can be either a rising or a falling edge applied to the input
capture pin, ICP, of the microcontroller. As shown in Figure 7.16 the choice
of a rising or falling edge trigger for the capture is controlled by the input
capture edge select bit, ICES1. Setting ICES1 will allow ICR1 to capture the
Timer 1 time on a rising edge, and clearing it will allow ICR1 to capture the
time on a falling edge.
As is probably obvious by now, since there is only one capture register
available to Timer 1, the captured contents must be read out as soon as they
are captured to prevent the next capture from overwriting and destroying
the previous reading. In order to accomplish this, an interrupt is provided
that occurs whenever new data is captured into ICR1. Each time the capture
interrupt occurs, the programme must determine whether the interrupt signals
the beginning or the ending of an event that is being timed, so that it can treat
the data in ICR1 appropriately.
Timer 1 also provides an input noise canceller feature, to prevent mis-
cellaneous unwanted spikes in the signal applied to the ICP from causing a
capture to occur at the wrong time. When the noise canceller feature is active,
the ICP must remain at the active level (high for a rising edge, or low for a
falling edge) for four successive samples before the microcontroller will treat
the trigger as legitimate and capture the data. This prevents a noise spike from
triggering the capture register. Setting the input capture noise canceller bit,
ICNC1, in TCCR1B enables the noise canceller feature.
In Section 7.8.1, we gave an example of how to use the capture regis-
ter to measure the width of a pulse applied to the input capture pin of the
microcontroller.
PWM Basics
PWM is the scheme in which the duty cycle of a square wave is varied
to provide a varying DC output by filtering the actual output waveform to
get the average DC. The basic drive waveform is shown in Figure 7.18 and
Figure 7.19 illustrates this principle.
As shown in Figure 7.19, varying the duty cycle (the proportion of the cycle
that is high), or equivalently changing the mark/space ratio (MSR), will vary the
average DC voltage of the waveform. The waveform is then filtered and used
to control analogue devices, creating a digital-to-analogue converter (DAC).
Examples of PWM applications in control and the circuits that can be used to
provide the filtering action are shown in Figure 7.20.
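The averaging arithmetic behind Figures 7.18 and 7.19 reduces to a one-line relation, sketched here in C (assuming the low level of the waveform is 0 V; the function name is invented for illustration):

```c
#include <assert.h>
#include <math.h>

/* The averaging principle of PWM: after filtering, the output settles
 * to duty_cycle * V_high, where duty_cycle is the fraction of each
 * period for which the waveform is high. */
double pwm_average_dc(double v_high, double duty_cycle) {
    return v_high * duty_cycle;
}
```

For example, a 12 V waveform at 25 percent duty cycle averages 3 V after filtering.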
Figure 7.20 demonstrates some typical circuits that are in use to filter the
PWM signal. In Figure 7.20 at A, the RC circuit provides the filtering. The
time constant of the RC circuit must be significantly longer than the period
of the PWM waveform. Figure 7.20 at B shows an LED whose brightness is
controlled by the PWM waveform. Note that in this example a logic 0 will
turn the LED on, and so the brightness will be inversely proportional to the
PWM. In this case, our eyes provide the filtering because we cannot distinguish
frequencies above about 42 Hz, which is sometimes called the flicker rate. In
this case the frequency of the PWM waveform must exceed 42 Hz, or we will
see the LED blink.
The final example, Figure 7.20 at C, shows DC motor control using PWM.
The filtering in this circuit is largely a combination of the mechanical inertia of
the DC motor and the inductance of the windings. It simply cannot physically
change speed fast enough to keep up with the waveform. The capacitor also
adds some additional filtering, and the diode is important to suppress voltage
spikes caused by switching the current on and off in the inductive motor.
One method of creating PWM with Timer 1 would be to use the out-
put compare register and, each time a match occurs, vary the increment
number being reloaded to create the PWM waveform. However, Timer 1
provides a built-in method to provide PWM without the need for the pro-
gramme to be constantly servicing the compare register to create a pulse width
modulator.
PWM Applications
1. PWM is the common technique used to control the speed of DC motors
using an H-bridge. In this application, the duty cycle is varied between
For example, choosing 9-bit resolution will result in a top count of 511
and the PWM frequency as calculated below (given an 8 MHz system clock):
a particular resolution of the PWM, the choices from the table would be limited
by the resolution, and the frequency might not have been as close.
Figure 7.23 also demonstrates that the frequency choices for any given
combination of crystal, prescaler, and PWM resolution are fairly limited. If a
project requires a very specific PWM frequency, the system clock crystal will
likely have to be chosen to allow this frequency to be generated at the desired
resolution.
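The relation between clock, prescale and resolution can be sketched as a small helper. This assumes the phase-correct PWM relation used by the classic AVR timers, f_PWM = f_clk/(N × 2 × TOP); a fast-PWM mode would instead give f_clk/(N × (TOP + 1)), so the device datasheet should be checked for the exact formula of a given part:

```c
#include <assert.h>
#include <math.h>

/* Hypothetical helper: PWM frequency for a given system clock f_clk,
 * prescale factor N and resolution in bits, assuming the phase-correct
 * relation f = f_clk / (N * 2 * TOP). */
double pwm_frequency_hz(double f_clk, unsigned prescale, unsigned res_bits) {
    unsigned top = (1u << res_bits) - 1u; /* 9-bit resolution -> TOP = 511 */
    return f_clk / (prescale * 2.0 * top);
}
```

With an 8 MHz system clock, no prescaling and 9-bit resolution (TOP = 511), this gives 8 MHz/1022 ≈ 7.8 kHz.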
In Section 7.8.4 (Application 4) we discussed how to use PWM to provide
four different duty cycles (10, 20, 30, and 40 percent at a frequency close to
2 kHz).
7.6 Timer 2
Bit Description
PWM2 Setting this bit enables the Timer 2 PWM function
COM21, COM20 These two bits set the output compare mode function. The bit definitions are identical to the COM1x1 and COM1x0 bits of Timer 1
CTC2 Set to enable counter clear on compare match
CS22, CS21, CS20 Counter prescaler select bits for Timer 2
Setting the AS2 bit allows Timer 2 to use the external oscillator as its clock
source. This means that the clock source for Timer 2 is running asynchronously
to the microcontroller system clock. The other three bits in the ASSR register
are used by the programmer to ensure that data is not written into the Timer
2 registers at the same moment that the hardware is updating the Timer 2
registers. This is necessary because the oscillator of Timer 2 is asynchronous
to the system oscillator, and it is possible to corrupt the data in the Timer 2
registers by writing data to the registers as they attempt to update.
A single control register, TCCR2, controls the operation of Timer 2. The
TCCR2 bit definitions are shown in Figure 7.25.
Using, for instance, a 32.768 kHz crystal allows Timer 2 to function as
the time base for a real-time clock. Using Timer 2 in this way allows the
microcontroller to keep accurate time when necessary.
Bit Description
FOC0 Force Output Compare: set only when the WGM00 bit specifies a non-PWM mode.
WGM01, WGM00 Waveform Generation Mode: these bits control the counting sequence of the counter, the source for the maximum (TOP) counter value, and the type of waveform generation to be used. Modes of operation supported by the Timer/Counter unit are: Normal mode, Clear Timer on Compare Match (CTC) mode, and two types of Pulse Width Modulation (PWM) modes. See Table 7.3.
COM01, COM00 Compare Match Output Mode: these bits control the Output Compare pin (OC0) behaviour. If one or both of the COM01:0 bits are set, the OC0 output overrides the normal port functionality of the I/O pin it is connected to. Note, however, that the Data Direction Register (DDR) bit corresponding to the OC0 pin must be set in order to enable the output driver.
CS02, CS01, CS00 Clock Select: the three Clock Select bits select the clock source to be used by the Timer/Counter, as given for Timer 0.
with the counter value (TCNT0). A match can be used to generate an output
compare interrupt, or to generate a waveform output on the OC0 pin.
The operation of ATmega8515 Timer 0 is the same as that of Timer 1;
accordingly, no further explanation is included.
to reach, others, such as space probes, are simply not accessible to human
operators. If their software ever hangs, such systems are permanently disabled.
In other cases, the speed with which a human operator might reset the system
would be too slow to meet the uptime requirements of the product.
For those embedded systems that can’t be constantly watched by a human,
and need to be automatically restarted after failing to give a response within
a predefined period of time, watchdog timers may be the solution.
A watchdog timer is a piece of hardware, often built into a microcontroller,
that can be used to automatically detect software anomalies and reset the
processor if any occur. Generally speaking, a watchdog timer is based on a
counter that counts down, at constant speed, from some initial value to zero.
The embedded software selects the counter’s initial value (a real-time value)
and periodically restarts it. It is the responsibility of the software to set the
count to its original value often enough to ensure that it never reaches zero.
If the counter ever reaches zero before the software restarts it, the software
is presumed to be malfunctioning and the processor’s reset signal is asserted.
The processor (and the embedded software it’s running) will be restarted as if
a human operator had cycled the power.
The watchdog timer can be considered as a special type of timer. We
configure a watchdog timer with a real-time value, just as with a regular timer.
However, instead of the timer generating a signal for us every X time units,
we must generate a signal for the watchdog timer every X time units. If we
fail to generate this signal in time, then the timer “times out” and generates a
signal indicating that we failed.
Figure 7.27 shows a typical arrangement. In the figure, the watchdog timer
is a chip external to the processor. However, it could also be included within
the same chip as the CPU. This is done in many microcontrollers. In either
case, the output from the watchdog timer is tied directly to the processor’s
reset signal.
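The down-counter behaviour just described can be sketched as a small simulation. This is a hypothetical model for illustration only; the names `watchdog`, `wdt_kick`, `wdt_tick` and `run_system` are invented:

```c
#include <assert.h>

/* Minimal behavioural model of a watchdog timer: a counter that counts
 * down at constant speed from an initial value, asserting reset when it
 * reaches zero unless the software restarts it first. */
typedef struct {
    unsigned initial; /* programmed time-out, in ticks   */
    unsigned count;   /* current counter value           */
    int reset;        /* set when the count reaches zero */
} watchdog;

static void wdt_kick(watchdog *w) { w->count = w->initial; } /* software restart */

static void wdt_tick(watchdog *w) {                          /* hardware clock   */
    if (w->count > 0 && --w->count == 0)
        w->reset = 1;
}

/* Run for n ticks, kicking the watchdog every kick_every ticks
 * (kick_every = 0 models hung software); returns 1 if the processor
 * would have been reset. */
int run_system(unsigned timeout, unsigned kick_every, unsigned n) {
    watchdog w = { timeout, timeout, 0 };
    unsigned t;
    for (t = 1; t <= n; t++) {
        wdt_tick(&w);
        if (kick_every != 0 && t % kick_every == 0)
            wdt_kick(&w);
    }
    return w.reset;
}
```

Kicking well within the time-out keeps the system alive; kicking too slowly, or not at all, triggers the reset.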
A simple example is shown in Listing 1. Here we have a single infinite loop
that controls the entire behavior of the system. This software architecture is
Figure 7.27 A watchdog timer external to the processor. The timer's output is tied to the processor's reset line, and the processor restarts the timer by writing to checkreg.
Figure 7.28 ATM timeout using a watchdog timer : (a) timer structure, (b) main pseudo-code,
(c) watchdog reset routine.
When multitasking kernels are used, deadlocks can occur. For example, a
group of tasks might get stuck waiting on each other and some external signal
that one of them needs, leaving the whole set of tasks hung indefinitely.
If such faults are transient, the system may function perfectly for some
length of time after each watchdog-induced reset. However, failed hardware
could lead to a system that constantly resets. For this reason it may be wise to
count the number of watchdog-induced resets, and give up trying after some
fixed number of failures.
When using the watchdog timer to enable an embedded system to restart
itself in case of a failure, we have to modify the system’s programme to
include statements that reset the watchdog timer. We place these statements
such that the watchdog timer will be reset at least once during every time out
interval if the programme is executing normally. We connect the fail signal
from the watchdog timer to the microprocessor’s reset pin. Now, suppose the
programme has an unexpected failure, such as entering an undesired infinite
loop, or waiting for an input event that never arrives. The watchdog timer will
time out, and thus the microprocessor will reset itself, starting its programme
from the beginning. In systems where such a full reset during system operation
is not practical, we might instead connect the fail signal to an interrupt pin,
and create an interrupt service routine that jumps to some safe part of the
programme. We might even combine these two responses, first jumping to an
interrupt service routine to test parts of the system and record what went wrong,
and then resetting the system. The interrupt service routine may record infor-
mation as to the number of failures and the causes of each, so that a service
technician may later evaluate this information to determine if a particular
part requires replacement. Note that an embedded system often must self-
recover from failures whenever possible, as the user may not have the means
to reboot the system in the same manner that he/she might reboot a desktop
system.
Another common use is to support time outs in a programme while keeping
the programme structure simple. For example, we may desire that a user re-
spond to questions on a display within some time period. Rather than sprinkling
response-time checks throughout our programme, we can use a watchdog timer
to check for us, thus keeping our programme neater.
It is important to note here that it is also possible to design the watchdog
timer hardware so that a reset signal that occurs too soon will cause the
watchdog timer to reset the system. In order to use such a system, very precise
knowledge of the timing characteristics of the main loop of your programme
is required.
If sanity checks OK
    Reset the watchdog
Else
    Record failure
Flag1 = TRUE;
Flag2 = TRUE;
Flag3 = TRUE;
Figure 7.30 Use three flags to check that certain points within the main loop have been visited.
watchdog from within the waiting loop. If there are many such places in your
software, control of the watchdog can become problematic.
System initialization is a part of the code that often takes longer than the
watchdog timer’s maximum period. Perhaps a memory test or ROM to RAM
data transfer slows this down. For this reason, some watchdogs can wait longer
for their first reset than they do for subsequent resets.
As threads of control are added to software (in the form of ISRs and
software tasks), it becomes ineffective to have just one place in the code
where the watchdog is reset.
Choosing a proper reset interval is also an important issue, one that can
only be addressed in a system-specific manner.
persistent data the device requires, and whether that data is stored regularly
and read after the system resets.
This allows the processor time to initialize, without having to worry about
the watchdog biting.
While the watchdog can often respond fast enough to halt mechanical systems,
it offers little protection against damage that can be done by software alone.
Consider an area of non-volatile RAM which may be overwritten with rubbish
data if some loop goes out of control. It is likely that such an overwrite would
occur far faster than a watchdog could detect the fault. For those situations you
need some other protection such as a checksum. The watchdog is really just
one layer of protection, and should form part of a comprehensive safety net.
it is supposed to be running. The watchdog timer has an output that has the
capability to reset the controller.
The Watchdog Timer (Figure 7.31) is clocked from a separate on-chip
oscillator which runs at 1 MHz. This is the typical value at VCC = 5 V. (See
characterization data for typical values at other VCC levels). By controlling
the Watchdog Timer prescaler, the Watchdog reset interval can be adjusted, see
Table 7.4 for a detailed description. The WDR-Watchdog Reset — instruction
resets the Watchdog Timer. Eight different clock cycle periods can be selected
to determine the reset period. If the reset period expires without another
Watchdog reset, the AT90S4414/8515 resets and executes from the reset
vector.
To prevent unintentional disabling of the watchdog, a special turn-off
sequence must be followed when the watchdog is disabled. Refer to the
description of the Watchdog Timer Control Register for details.
The different conditions for the watchdog timer are defined by the contents of
the watchdog timer control register (WDTCR).
The Watchdog Enable bit WDE enables the watchdog function if set.
Otherwise, the watchdog is disabled and no check of programme flow occurs.
The bits WDP2, WDP1 and WDP0 determine the watchdog timer prescal-
ing when the watchdog timer is enabled. The different prescaling values and
their corresponding timeout periods are shown in Table 7.4.
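The prescaler arithmetic behind Table 7.4 can be sketched as follows. This is a hypothetical reconstruction: the time-out is taken as 16K watchdog-oscillator cycles scaled by 2^WDP, with the oscillator at its nominal 1 MHz. The nominal result for WDP = 011 (0.131 s) is close to the 0.12 s "typical" figure quoted above, but the datasheet values should be used for real designs:

```c
#include <assert.h>
#include <math.h>

/* Nominal watchdog time-out in seconds for a given prescale setting,
 * assuming 16K oscillator cycles scaled by 2^WDP at a 1 MHz watchdog
 * oscillator (wdp is the value of WDP2:WDP1:WDP0). */
double wdt_timeout_s(unsigned wdp) {
    return (16384.0 * (double)(1u << wdp)) / 1.0e6;
}
```

The eight selectable periods then range from about 16 ms (WDP = 000) up to about 2.1 s (WDP = 111).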
Bit Description
WDTOE Watchdog Turn-Off Enable: must be set when the WDE bit is cleared; otherwise, the watchdog will not be disabled. Once set, hardware clears this bit to zero after four clock cycles.
WDE Watchdog Enable: set to enable the Watchdog Timer. If WDE is cleared (zero), the Watchdog Timer function is disabled. WDE can only be cleared if the WDTOE bit is set.
WDP2, WDP1, WDP0 Watchdog Timer Prescaler: these bits define the Watchdog Timer prescaling when the Watchdog Timer is enabled (see Table 7.4).
Watchdog Timer Control Register WDTCR
//**************************************************
void main(void)
{
    WDTCR = 0x0B;         // enable WDT and set to 120 ms timeout
    while (1)
    {
        #asm("wdr")       // reset the watchdog timer
        if (expression)   // if true, disable the WDT
        {
            WDTCR = 0x18; // set both WDE and WDTOE
            WDTCR = 0x00; // clear WDE
            /* other code here */
        }
    }
}
//**************************************************
In the given programme fragment, the third line enables the watchdog timer
and sets the timeout to 120 milliseconds. This is accomplished by setting
the WDE bit and the watchdog prescaler bits in a single write to WDTCR.
The watchdog timer is reset each time the while(1) loop executes, by the
#asm("wdr") instruction. This is an assembly instruction (Watchdog Reset),
not available in the C language, and it must be executed before the watchdog
timer has an opportunity to time out.
The if (expression) is used to determine when to disable the watchdog timer:
when expression is found to be true, the WDT is disabled. Disabling the
watchdog timer, as mentioned before, is a two-step process; lines nine and
ten represent the two steps.
The Approach
To implement a time domain measurement, in general, a known signal is
measured for an unknown time. In our case, the known signal is the timer
clock and the unknown signal is the pulse that needs to be measured (See
Figure 7.32).
7.8 Timer Applications 369
In Figure 7.32, if the rising edge of the unknown signal is used to enable a
second signal of known frequency into a counter which will be disabled with
the falling edge of the unknown signal, the value in the counter at the end
of the unknown time will provide a measure of the duration of the unknown
signal. The resolution of the measurement is a direct function of the frequency
of the clock to the counter, fcounter .
If from the rising edge to the falling edge of the unknown pulse the counter
counts N, the time period Tp of the unknown pulse will be:
Tp = N/fcounter
For example, if the known frequency to the counter is 10 MHz and at the
end of measurement 7500 counts have accrued, then Tp = 7500/(10 × 10^6) = 750 µs.
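The relation Tp = N/fcounter can be written as a one-line helper (the function name is invented for this sketch):

```c
#include <assert.h>
#include <math.h>

/* Pulse period (or width) from an accrued count N and the known counter
 * clock frequency: T_p = N / f_counter. */
double pulse_period_s(unsigned long counts, double f_counter_hz) {
    return counts / f_counter_hz;
}
```

With a 10 MHz counter clock and 7500 accrued counts, the measured pulse is 750 µs wide.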
Solution
The hardware and software shown in Figures 7.33 and 7.34, respectively,
demonstrate the use of the input capture register. This hardware and software
are used to measure the width of a positive-going pulse applied to the ICP of
the microcontroller and output the result, in milliseconds, on port C.
The software in Figure 7.34 has several features worth noting. #define
statements are used to connect a meaningful name with the output port and
also to define the input capture pin, ICP, so it can be tested in the software.
The Interrupt Service Routine (ISR) for the Timer 1 overflow is used to
handle the possibility that the period of the unknown pulse is greater than
the range of the timer/counter. The ISR does nothing more than increment an
overflow counter when the overflow occurs during a pulse measurement. The
number of overflows is used in the calculation for the pulse width.
The ISR for the Timer 1 input capture event checks the actual state of
the ICP to determine whether a rising or a falling edge has occurred. If the
interrupt was triggered by a rising edge (ICP is now high), the programme
must be starting to measure a new pulse, and the ISR records the rising edge
time, changes the trigger sense on the input capture register to capture on a
falling edge (to capture the end of the pulse), and clears the overflow counter
for this measurement. If, on the other hand, ICP is found to be low, the interrupt
must have been triggered by a falling edge, and the pulse has ended. In this
case the length of the pulse is calculated, the result is output to port C, and the
trigger sense on the input capture register is set to catch a rising edge for the
start of the next pulse. Note that the CodeVisionAVR compiler allows access
to the 16-bit input capture register, ICR1, under a single name, ICR1. This is
defined in the ATmega8515.h file. Not all compilers provide for 16-bit access
to these registers, nor does CodeVisionAVR allow 16-bit access to all of the
16-bit registers. The reader has to check the appropriate header file for his
processor.
The actual pulse width, in milliseconds, is calculated using the clock rate
applied to the counter. The clock applied to Timer 1 is system clock (4 MHz)
divided by 8, in this case 500 kHz, as initialized in TCCR1B at the beginning
of the main() function. The period of the 500 kHz clock is 2 microseconds. At
2 microseconds per clock tick, it takes 500 clock ticks to make 1 millisecond.
Therefore the programme divides the resulting clock ticks by 500 to calculate
the correct result in milliseconds.
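The width arithmetic can be sketched as a host-testable C function. This is a hypothetical version of the Figure 7.34 calculation, assuming the 500 kHz (2 µs per tick) Timer 1 clock described above, 16-bit capture values, and one overflow adding 65 536 ticks:

```c
#include <assert.h>
#include <stdint.h>

/* Pulse width in whole milliseconds from the rising- and falling-edge
 * capture values and the number of Timer 1 overflows between them.
 * 500 ticks of the 500 kHz timer clock make one millisecond. */
#define TICKS_PER_MS 500L /* 500 kHz timer clock */

long pulse_width_ms(uint16_t rise, uint16_t fall, unsigned overflows) {
    long ticks = (long)overflows * 65536L + (long)fall - (long)rise;
    return ticks / TICKS_PER_MS;
}
```

Note how the overflow count handles pulses longer than one 16-bit timer range: a rise captured at 60000 and a fall at 500 with one overflow gives 65 536 + 500 − 60 000 = 6036 ticks, about 12 ms.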
Limitations of using Timer 1 for time measurements:
The only limitation of the above technique arises when the resolution of the
timer is coarser than the accuracy needed for the measurement. It is possible
to change the timer/counter clock to get better resolution, and accordingly
better accuracy, but it may happen that the maximum accuracy obtainable from
the timer is still less than what is needed. In such cases, using external
hardware with the required accuracy is the only solution.
Figure 7.35 Measure the period of a signal based on a hardware implementation outside of the
microprocessor.
Figure 7.36 Timing diagram for system to measure the period of a signal based on a hardware
implementation outside of the microprocessor.
response, the state machine enables the synchronizer. The synchronizer serves
a dual role. In addition to synchronizing the external signal to the internal
clock, the second synchronizer flip-flop serves to delimit the measurement.
One to two clock1 pulses after the external signal makes a 0 to 1 transition, the
Enable signal is asserted by the state machine, thereby starting the counter.
On the second 0 to 1 transition by the external signal, the Enable is deasserted
and the Complete event is sent to the microprocessor. The state of the counter,
representing the period of the unknown signal, can be read at any time after
that.
signal will be
funknown = N/Tp
When using a microcontroller, applying an unknown signal to the
timer/counter capture input allowed us to measure time, that is, the duration
of the unknown signal. If we apply the signal to the counter input instead, we
can measure frequency. More precisely, a frequency measurement needs two
counters: one for the time base, and one counting the edges of the unknown
signal. Ideally, the second counter should clear on capture, driven from the
period signal of the time base.
As in the case of measuring time period, measuring the frequency of
an unknown signal can be implemented using the built-in timer of the
microcontroller, or using an external circuit.
a) Use of the Internal Timer/counter for Frequency Measurement
As shown in Figure 7.37, we use one counter as time base to open a gate
for a known and specified interval. The unknown signal is used to increment
another counter. If the known interval is Tknown , and when the interval ends,
the counter contains the value N, the frequency of the unknown signal will be
funknown = N/Tknown .
The width of the window Tknown depends on the precision needed and the
resolution of the timer/counter.
To implement the measurement, a timer and a counter are needed. The un-
known signal is used as an interrupt into the microcontroller. When the first
interrupt occurs, the timer is started and the counter is incremented. The
counter is incremented for each subsequent interrupt until the timer expires. At
that point, the external interrupt is disabled. The counter will now contain the
number of accrued counts, which translates directly to the unknown frequency.
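The gate-time arithmetic behind this method is simple enough to sketch; `freq_from_count` is a hypothetical helper name, not part of the book's listings:

```c
#include <stdint.h>

/* Gate-time frequency measurement: n edges of the unknown       */
/* signal are counted during a known gate interval t_known_s     */
/* (in seconds); the unknown frequency is then n / t_known_s.    */
static double freq_from_count(uint32_t n, double t_known_s)
{
    return (double)n / t_known_s;
}
```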
In the following we show how to build a frequency meter using the timer
of the microcontroller. The circuit given in Figure 7.38 can be used to measure
frequencies up to 40 MHz with a very high accuracy (less than 1% error).
i. The hardware
The hardware shown in Figure 7.38 uses ATMEGA16 microcontroller
with an internal oscillator of 8 MHz.
The frequency measurement probe is connected to the W1 terminal, which
feeds both pin PB0 and the clock input of the 74191 4-bit counter. The
4-bit counter divides the measured frequency by 16 before feeding it to the
microcontroller (pin PB1): the frequency at output Q3 of the counter is
always the counter's input frequency divided by 16. In the figure, Q3 is
connected to pin PB1 of the microcontroller.
than F_max. In both cases timer T2 is used to generate the known period
shown in Figure 7.39.
However, because there is some delay between the moment Timer 2
overflows and the moment the accumulated count is actually processed,
this factor needs some calibration to reflect the measured frequency more
accurately.
Figure 7.40 Block diagram for system to measure the frequency of a signal based on a hardware
implementation outside of the microprocessor.
Figure 7.41 Timing diagram for system to measure the frequency of a signal based on a
hardware implementation outside of the microprocessor.
Solution:
The scheme that can be used to generate the square wave using the compare
mode works something like this:
1. When the first match occurs, the output bit is toggled, and the interrupt
occurs.
2. In the interrupt service routine, the programme will calculate when the
next match should occur and load that number into the compare register.
In this case, the 20 kHz waveform has a period of 50 microseconds with
25 microseconds in each half of the waveform. So, the time from one
toggle to the next is 25 microseconds. Using the frequency of the clock
applied to the timer, the programme calculates the number of clock ticks
that will occur in 25 microseconds.
3. This number of clock ticks is added to the existing contents of the compare
register and reloaded into the compare register to cause the next toggle
and to repeat the calculation and reload cycle.
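The tick calculation in step 2 can be sketched as follows; the function name is illustrative, and the fragment is host-testable rather than target code:

```c
#include <stdint.h>

/* Ticks between successive output-compare matches for a square  */
/* wave: half the output period, measured in timer clocks. With  */
/* a 1 MHz timer clock (8 MHz / 8) and a 20 kHz target this is   */
/* 25, the value reloaded in the interrupt service routine.      */
static uint16_t toggle_interval_ticks(uint32_t timer_hz, uint32_t f_out_hz)
{
    return (uint16_t)(timer_hz / (2UL * f_out_hz));
}
```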
An example of hardware and software demonstrating these concepts is
shown in Figures 7.42 and 7.43, respectively. Note that no external hardware
is required because the signal is produced entirely by the microcontroller.
In the programme in Figure 7.43, notice first the initializations that occur
in main. Register DDRD is set to 0x20 so that the output compare bit,
OC1A, which relates to output compare register A, OCR1A, is configured for
output mode, so that the output signal can appear on the pin. TCCR1A and
TCCR1B set the prescaler to clock/8 (in this case clock = 8 MHz, so clock/8
= 1 MHz) and set the output compare mode for output compare register A to toggle the
//******************************************************************
// Timer 1 output compare A interrupt service routine
interrupt [TIM1_COMPA] void timer_compa_isr(void)
{
    OCR1A = OCR1A + 25;   //set to next match (toggle) point
}

void main(void)
{
    DDRD   = 0x20;   //set OC1A bit for output
    TCCR1A = 0x40;   //enable output compare mode to toggle OC1A pin on match
    TCCR1B = 0x02;   //set prescaler to clock/8 (1 microsecond clocks)
    TIMSK  = 0x10;   //unmask output compare match interrupt for register A
    while (1)
        ;            //do nothing
}
//******************************************************************
Figure 7.44 20 kHz waveform (period 50 µs; 25 µs per half-cycle).
output bit, OC1A, when a match occurs. And finally, the compare interrupt
is unmasked by setting the OCIE1A bit in the timer interrupt mask register,
TIMSK.
Information such as that shown in Figure 7.44 is used to calculate the
number that is added to the compare register contents to determine the point
of the next toggle. In this case one-half of the waveform is 25 microseconds
long. Since the clock applied to the counter is 1 MHz (clock/8), the number of
clock cycles between match points (waveform toggle points) is given by the
following:
25 µs × 1 tick/µs = 25 clock ticks
And so, for this example, each time the compare match interrupt occurs,
25 is added to the current contents of the compare register to determine when
the next match and toggle of the output waveform will occur.
One additional important point to consider relative to the output compare
registers, and specifically relative to the calculation of the next match point,
is the situation in which adding the interval number to the current contents
of the compare register results in a number bigger than 16 bits. For example,
if the output compare register contains the number 65,000, and the interval
is 1000, then the sum, 66,000, does not fit in 16 bits.
As long as unsigned integers are used for this calculation, bits above the
16th are truncated, and so the actual result will be:
65,000 + 1000 = 464 (the 17th bit of 66,000 is dropped: 66,000 − 65,536 = 464)
This will work out perfectly, since the output compare register is a 16-bit
register and the timer/counter is a 16-bit device as well. The timer counter will
count from 65,000 up to 65,536 (a total of 536 counts) and then an additional
464 counts to reach the match point. The 536 counts plus the 464 counts is
exactly 1000 counts as desired. In other words, in both the timer/counter and
the compare register, rollover occurs at the same point, and as long as unsigned
integers are used for the related math, rollover is not a problem.
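The wraparound arithmetic can be demonstrated directly with C's unsigned 16-bit type; this fragment is an illustration, not one of the book's listings:

```c
#include <stdint.h>

/* Next compare-match value with 16-bit wraparound: unsigned     */
/* arithmetic drops any bits above the 16th, so the compare      */
/* register and the timer/counter roll over at the same point.   */
static uint16_t next_match(uint16_t current, uint16_t interval)
{
    return (uint16_t)(current + interval);   /* modulo 65536 */
}
```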
The ability to control different kinds of motors is very important in many appli-
cations, ranging from assembly robots to remotely controlled vehicles to the
precision positioning of medical instruments. Motors typically found in such
applications fall into three categories: DC motors, servo motors, and stepper
motors. DC and servo motors can be controlled using the same technique,
PWM; PWM cannot be used to control a stepper motor.
opposite poles of the stator, thereby causing the rotor to turn. As the rotor
turns, the electromagnet becomes polarized in the opposite direction and the
poles of the rotor are now repelled by the nearer poles and are attracted to the
opposite poles of the permanent magnet causing the rotor to turn once again.
The commutator is a split ring against which the brushes make physical
contact. One portion of the commutator is connected to one end of the elec-
tromagnet, and the other portion is connected to the opposite end. Through
the action of the commutator, the direction of the field in the electromagnet is
continually switched, thus causing the rotor to continue to move.
Figure 7.46 is a simple illustration showing the actions of the commutator,
brushes, and electromagnet. The brushes are fixed. However, as the rotor
rotates, the commutator (which is attached to the rotor) acts like a switch,
connecting the voltage source first one way then the opposite way across the
electromagnet, thereby changing its polarization.
The DC motor has the ability to turn through 360 degrees, continuously,
in one direction, when power is applied. Two important features can be noted
here:
• If the applied voltage is held constant, the speed of the motor is also
held constant; increasing or decreasing the applied voltage will have a
corresponding effect on the speed of the motor.
• If the polarity of the applied voltage is reversed, the motor will run in the
opposite direction.
These features define the mechanisms that can be used to control the motor:
control the magnitude of the voltage applied across the electromagnet to
control the speed, and control the polarity of the applied voltage to control the direction
of rotation. As an example, assume that a DC motor is driven by a voltage
signal ranging from 0 to 12 V. To run the motor at full speed, a 12 V signal is
applied; to run the motor at half speed a 6 V signal is applied; to run the motor
at one-quarter speed a 3 V signal is applied; and so on.
This can be achieved using PWM. With a PWM scheme, the average
magnitude of the applied voltage, and hence the speed of the motor, can
effectively be controlled. An H-bridge may be used to manage the polarity reversal.
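The averaging effect of PWM reduces to a one-line relation; `pwm_average_volts` is an illustrative name used here only to make the point testable:

```c
/* Average voltage delivered by a PWM drive: the supply voltage  */
/* scaled by the duty cycle (0.0 to 1.0). A 50% duty cycle on a  */
/* 12 V supply averages 6 V, matching the half-speed example.    */
static double pwm_average_volts(double v_supply, double duty)
{
    return v_supply * duty;
}
```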
Generally, a DC motor is not used for positioning tasks unless it is incor-
porated into a control system that can provide position (and possibly velocity)
feedback information.
Figures 7.47 and 7.48 show the hardware and software, respectively, of
an example programme using PWM. This programme provides four different
duty cycles (10, 20, 30, and 40 percent at a frequency close to 2 kHz) based
on the logic levels applied to port C.
Using Figure 7.23 it is clear that 8-bit resolution and system clock/8 would
give 1.96 kHz, which is very close to the desired frequency.
The programme shown in Figure 7.48 uses a switch/case statement to
select the duty cycle of the PWM output. A #define statement is used to mask
port C so that only the two lower bits are used as input, and port C is set to
//********************************************************
#include <mega8515.h>
//use a 'define' to read the two lowest bits of port C as control input.
#define PWM_select (PINC & 3)

void main(void)
{
    unsigned int oldtogs = 0xFFFF;  //storage for past value of input
    PORTC  = 0x03;   //enable the internal pull-ups for the input bits
    TCCR1A = 0x91;   //Compare A for non-inverted PWM and 8-bit resolution
    TCCR1B = 0x02;   //clock/8 prescaler
    while (1)
    {
        if (PWM_select != oldtogs)
        {
            oldtogs = PWM_select;   //save toggle switch value
            switch (PWM_select)
            {
                case 0:
                    OCR1A = 25;    //10% duty cycle desired
                    break;
                case 1:
                    OCR1A = 51;    //20% duty cycle desired
                    break;
                case 2:
                    OCR1A = 76;    //30% duty cycle desired
                    break;
                case 3:
                    OCR1A = 102;   //40% duty cycle desired
                    break;
            }
        }
    }
}
//*************************************************
0x3 so that the internal pull-ups are enabled on the same lower two bits. This
avoids the need for external pull-up resistors.
TCCR1A and TCCR1B are initialized to select the PWM mode and the
prescaler to provide the desired frequency output.
Processor time in many microcontroller applications is precious. In this
programme, an if statement is used along with the variable “oldtogs” so that
new data is written to the pulse width modulator only if the switches are
actually changed, thereby preventing continuous writes to the output compare
register.
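The four OCR1A values above follow from the 8-bit resolution (TOP = 255); this host-testable sketch, with an illustrative function name, reproduces them using truncating integer division:

```c
#include <stdint.h>

/* Compare value for an 8-bit (TOP = 255) non-inverting PWM      */
/* channel, given a duty cycle in percent. Truncating integer    */
/* division reproduces the values used in the programme above:   */
/* 10% -> 25, 20% -> 51, 30% -> 76, 40% -> 102.                  */
static uint8_t ocr_for_duty(uint8_t duty_percent)
{
    return (uint8_t)((255U * duty_percent) / 100U);
}
```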
Unlike the DC or servo motor, which moves in either the forward or reverse
direction with a smooth and continuous motion, the stepper motor moves in
a series of increments or steps. The size
of each step is specified in degrees and varies with the design of the motor.
The step size is selected based on the precision of the positioning required.
The accompanying diagram in Figure 7.49 presents a high-level view of the
essential elements of a stepper motor.
From Figure 7.49 it is clear that in a stepper motor the rotor, rather than the
stator as in the DC and servo motors, is a permanent magnet. It is also easy
to observe that the rotor of the stepper motor shown in Figure 7.49 has two
teeth, and the stator has four poles and four electromagnets. This configuration
has a step angle of 90 degrees, based on the spacing of the poles.
Connections are made to the
electromagnets through the signals marked X1, X2, Y1, and Y2. Like the DC
motor, the stepper can rotate through 360 degrees and in either direction. The
speed of the motor is determined by the repetition rate of the steps.
all the signals straight. The control is based on the view that the stepper
motor is an electromagnetic device that converts digital pulses (current) into
mechanical shaft rotation. By controlling the sequence of the pulses applied to
the electromagnets we can control the angle of rotation, the direction of rotation
and the speed of rotation. For example, to rotate the motor one step, we pass current
through one or two of the coils; which particular coil or coils depends on the
present orientation of the motor. Thus, rotating the motor 360 degrees requires
applying current to the coils in a specified sequence. Applying the sequence
in reverse causes reversed rotation.
To understand this, we return to Figure 7.49. The polarization of the
electromagnets as illustrated in Figure 7.49 requires that the indicated input
signals be applied to X1, X2, Y1, and Y2: V to X1 and Y2, and 0 to X2 and Y1.
If the input signals are now changed to V on X1 and Y1 and 0 on X2 and Y2,
the polarization of the electromagnets changes to that shown in Figure 7.50.
The two north poles at the top of the drawing will repel, and the north pole
on the rotor will be attracted to the south pole on the right-hand side of the
stator. The rotor will thus move 90◦ in the clockwise direction (see Figure 7.51).
Similarly, changes to the input signal levels shown in the accompanying
table will produce the rotor movements shown in Figures 7.52 and 7.53.
X1   X2   Y1   Y2   Position
V    0    0    V    0°
V    0    V    0    90°
0    V    V    0    180°
0    V    0    V    270°
Figure 7.51 Stepper Motor with 90◦ per step.
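The excitation table above can be stored and walked in software; this host-testable sketch (names are illustrative, with V encoded as 1) shows the idea:

```c
/* The 90-degrees-per-step excitation table above, one row per   */
/* step: {X1, X2, Y1, Y2}, with V encoded as 1. Stepping forward */
/* through the rows rotates the rotor one way; stepping backward */
/* reverses it.                                                  */
static const unsigned char step_table[4][4] = {
    {1, 0, 0, 1},   /*   0 degrees */
    {1, 0, 1, 0},   /*  90 degrees */
    {0, 1, 1, 0},   /* 180 degrees */
    {0, 1, 0, 1},   /* 270 degrees */
};

/* Index of the next table row; dir is +1 (forward) or -1 (reverse). */
static int next_step(int current, int dir)
{
    return (current + dir + 4) % 4;
}
```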
to generate the appropriate high signals to the coils that will cause the motor
to rotate one step.
In the following we discuss the two cases.
a) Use of Stepper Motor Driver
Controlling a stepper motor requires applying a series of voltages to the
four (typically) coils of the stepper motor. The coils are energized one or two
at a time causing the motor to rotate one step. In this example, we are using
a 9-volt, 2-phase bipolar stepper motor. Figure 7.55 shows a table indicating
the input sequence required to rotate the motor. In this figure the motor step is
7.5 degrees. The entire sequence given in the table shown in the figure must
be applied to make the motor rotate 7.5 degrees. To rotate the motor in the
opposite direction, we simply apply the sequence in reverse order. To rotate
the motor through an angle of 7.5° × N (N = 1, 2, . . .) the sequence must be
repeated N times.
In the figure we used an 8051 microcontroller and a stepper motor driver
(MC3479P) chip to control the stepper motor. We need only worry about
setting the direction on the clockwise/counterclockwise pin (cw/ccw) and
pulsing the clock pin (clk) on the stepper motor driver chip using the 8051
microcontroller. Figure 7.55 gives the schematic showing how to connect the
stepper motor to the driver, and the driver to the 8051. Figure 7.56 gives some
sample code to run the stepper motor.
/*********************************************/
/* main.c */
sbit clk = P1^1;
sbit cw  = P1^0;

void main(void) {
    /* turn the motor forward */
    cw = 0;        /* set direction */
    clk = 0;       /* pulse clock */
    delay();
    clk = 1;
We can store this table in software, and write a step routine that applies high
values to the inputs based on the table values that follow the previously
applied values.
In the second example, the stepper motor driver is eliminated. The stepper
motor is connected directly to the 8051 microcontroller. Figure 7.57 gives
the schematic showing how to connect the stepper motor to the 8051. The
direction of the stepper motor is controlled manually. If P2.4 is grounded,
the motor rotates counterclockwise, otherwise the motor rotates clockwise.
Figure 7.58 gives the code required to execute the input sequence to turn the
motor.
Note that the 8051 ports are unable to directly supply the current needed
to drive the motor. This can be solved by adding buffers. A possible way to
implement the buffers is shown in Figure 7.57. The 8051 alone cannot drive
the stepper motor, so several transistors were added to increase the current
going to the stepper motor. The Q1 devices are MJE3055T NPN transistors
and Q2 is an MJE2955T PNP transistor. A is connected to the 8051
microcontroller and B is connected to the stepper motor.
In this chapter we studied one of the most popular devices used in embedded
and digital systems: timers. Any system must include at least one hardware
//******************************************************************
int lookup[20] = {
    1, 1, 0, 0, 0, 1, 1, 0, 0, 0,
    1, 1, 1, 0, 0, 1, 1, 1, 0, 0};

sbit notA = P2^0;
sbit isA  = P2^1;
sbit notB = P2^2;
sbit isB  = P2^3;
sbit dir  = P2^4;

void delay() {
    int a, b;
    for (a = 0; a < 5000; a++)
        for (b = 0; b < 10000; b++);
}

void move(int dir, int steps) {
    int y, z;
    if (dir == 1) {
        for (y = 0; y <= steps; y++) {
            for (z = 0; z <= 19; z += 4) {
                isA  = lookup[z];
                isB  = lookup[z+1];
                notA = lookup[z+2];
                notB = lookup[z+3];
                delay();
            }
        }
    }
    if (dir == 0) {
        for (y = 0; y <= steps; y++) {
            for (z = 19; z >= 3; z -= 4) {
                isA  = lookup[z];
                isB  = lookup[z-1];
                notA = lookup[z-2];
                notB = lookup[z-3];
                delay();
            }
        }
    }
}

void main()
{
    while (1) {
        /* move forward 15 degrees */
        move(1, 2);
        /* move backwards 7.5 degrees */
        move(0, 1);
    }
}
//******************************************************************
Figure 7.58 Controlling a stepper motor directly-software.
timer, and there can be a number of software timers. A software timer (SWT)
is a virtual timing device. A timer is essentially a counter receiving its count
input at regular time intervals; this input can be the system clock or the system
clock divided by a prescaler.
7.10 Review Questions 395
continuously. The following table lists the names and frequencies of the
notes.
Names and frequencies of musical notes
Musical note   Tonic sol-fa   Frequency (Hz)
C4             Doh            262
D4             Re             294
E4             Mi             330
F4             Fa             349
G4             So             392
A4             La             440
B4             Ti             494
C5             Doh            523
7.8 A 16-bit counter is getting inputs from an internal clock of 12 MHz.
There is a prescaling circuit, which prescales by a factor of 16. What
are the time intervals at which overflow interrupts will occur from this
timer?
What will be the period before these interrupts must be serviced?
8
Interface to Local Devices — Analogue Data
and Analogue Input/Output Subsystems
8.1 Introduction
400 Analogue Data and Analogue Input/Output Subsystems
Figure 8.1(a) shows a block diagram of an analogue input subsystem. The input
to the analogue input subsystem comes from an analogue input device. The
analogue input device is usually an input transducer, or sensor, that converts
some form of energy, such as heat, pressure, position, or light, into an analogue
voltage or current.
For almost every type of physical measurand, numerous input transducers
exist. A few transducers directly generate a digital output, like the optical
shaft encoder, which transforms angular position to digital output pulses.
However, we are now interested in input transducers that generate an analogue
output. An analogue input subsystem must convert the transducer’s analogue
output voltage into a digital number or code so that it can be read by the
microprocessor.
The task of an analogue output subsystem is just the opposite of that of
an analogue input subsystem (Figure 8.1(b)). A digital number or code is
converted to an analogue voltage or current. A system may need to generate
output data in analogue form to provide analogue control signals, drive ana-
logue output transducers, or to synthesize analogue waveforms. An analogue
output device is often called an output transducer or actuator. The output
transducer changes the analogue voltage to some other form of energy, for
example mechanical position, rotation, pressure, or light.
Figure 8.1 Analogue I/O subsystems: (a) analogue input device or input transducer provides
analogue input signal to analogue input subsystem; (b) analogue output subsystem provides
analogue signal to output device or output transducer.
[Diagram residue: an analogue input channel (measurand -> input transducer -> analogue signal conditioning -> track-hold -> analogue-to-digital converter -> system bus interface), and the acquisition flowchart: start conversion; wait until done; store the result in memory; optionally advance to the next channel and continue.]
Figure 8.4 Sample and hold (S/H): (a) Operation (b) Circuit.
8.2 Analogue Data and Analogue I/O Subsystems 405
that is usually quite fast. The droop rate is the output voltage slope (dVout /dt)
when the control input equals hold. Normally the gain K should be 1 and the
offset Voff should be 0. The gain and offset errors specify how close Vout is
to the desired Vin when the control input equals sample.
Using one converter per channel eliminates the need for multiplexing but is
more costly. Each approach has its advantages and disadvantages.
Example 8.1: A 16-channel multiplexer
Figure 8.6 shows the structure of a 16-channel multiplexer. One of 16
inputs may be selected by a combination of 4 input signals derived from A0,
A1, A2, A3 on the address bus.
Whenever all 16 lines must be sampled sequentially, a hardware counter
may be used. The resulting organization is shown in Figure 8.7, and the
resulting switching in Figure 8.8.
Note:
Sensors and Transducers: A distinction is sometimes made between a
sensor and a transducer. The definitions that follow correspond to the most
commonly made distinction. An element that senses one form of energy, or a
change in that form of energy, and transforms it into another form of energy, or
change in another form of energy, is a sensor. An electrical sensor converts
a measurand, or a change in measurand, to an electrical signal. In contrast, a
transducer is a device composed of a sensor and other elements that convert
the output of the sensor to a signal that is more appropriate to the application.
[Figure 8.7 (diagram residue): a clocked 4-bit binary counter (outputs QA-QD) drives the multiplexer address inputs A0-A3, sequentially switching analogue inputs S1-S16 onto the analogue output AIN.]
interfaced to an SPI synchronous serial port. In such serial DACs, the input
data is serially shifted into the DAC.
The DAC precision is the number of distinguishable DAC outputs (e.g.,
256 alternatives, 8 bits). The DAC range is the maximum and minimum DAC
output (volts, amperes).
The DAC resolution is the smallest distinguishable change in output. The
units of resolution are in volts or amperes depending on whether the output is
voltage or current. The resolution is the change that occurs when the digital
input changes by 1.
[Diagram residue: digital input bits B1-Bn feed a digital-to-analogue converter (DAC), which produces an analogue output voltage or current.]
converter may or may not be included on a DAC IC. If not, an external voltage
reference and an external op-amp are used.
The output of an ideal (error free, i.e. no offset voltage) DAC that accepts
an n-bit straight binary code is:
Vo = VREF × (B1·2^(-1) + B2·2^(-2) + · · · + Bn·2^(-n))   (8.1)
where B1 is the most significant bit and Bn the least significant bit of the
binary input. The output voltage of the DAC is the product of the reference
voltage and the binary fraction 0.B1B2…Bn. Some DACs require VREF to be a constant
value for the equation to be valid. Some DACs have a negative output voltage.
The sign of the output voltage is a function of the type of DAC circuit and the
polarity of the voltage reference.
In the case when the DAC has an offset voltage Vos , the above equation
takes the form (for unsigned inputs):
Vo = VREF × (B1·2^(-1) + B2·2^(-2) + · · · + Bn·2^(-n)) + Vos   (8.2)
One can choose the full-scale range of the DAC to simplify the use of
fixed-point mathematics. For example, if an 8-bit DAC had a full-scale range
of 0 to 2.55 V, then the resolution would be exactly 10 mV. This means that if
the D/A digital input were 123, then the D/A output voltage would be 1.23 V.
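Equation 8.1 can be evaluated numerically, since the binary fraction 0.B1B2…Bn equals code/2^n; this is an illustrative sketch, not a listing from the book:

```c
#include <stdint.h>

/* Ideal (zero-offset) n-bit straight-binary DAC output,         */
/* Equation 8.1: Vo = VREF * code / 2^n.                         */
static double dac_output_volts(double vref, uint32_t code, unsigned n_bits)
{
    return vref * (double)code / (double)(1UL << n_bits);
}
```

With VREF = 2.56 V and 8 bits, the step is exactly 10 mV, consistent with the 0 to 2.55 V fixed-point example above (code 123 gives 1.23 V).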
Linearity and monotonicity: To define linearity, let m, n be digital inputs,
and let f(n) be the analogue output of the DAC. Let Δ be the DAC resolution.
The DAC is linear if:
f(n + 1) − f(n) = f(m + 1) − f(m) = Δ for all n, m
The DAC is monotonic if:
sign[f(n + 1) − f(n)] = sign[f(m + 1) − f(m)] for all n, m
Conversely, the DAC is nonlinear if
f(n + 1) − f(n) ≠ f(m + 1) − f(m) for some m, n
Practically speaking, all DACs are nonlinear, but the worst nonlinearity is
nonmonotonicity. The DAC is nonmonotonic if:
sign[f(n + 1) − f(n)] ≠ sign[f(m + 1) − f(m)] for some n, m
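The monotonicity condition can be checked numerically over a measured transfer function; a minimal host-side sketch (illustrative name, assuming a rising ramp is expected):

```c
#include <stddef.h>

/* Check monotonicity of a measured DAC transfer function f:     */
/* for a rising input ramp, every consecutive output difference  */
/* must have the same (non-negative) sign.                       */
static int dac_is_monotonic(const double *f, size_t len)
{
    size_t i;
    for (i = 1; i < len; i++)
        if (f[i] < f[i - 1])
            return 0;
    return 1;
}
```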
Figure 8.14 Conversion relationship for an ideal 3-bit straight binary D/A converter.
Table 8.1 Output vs. binary coding for a +10 V FS 8-bit unipolar DAC.
Binary Scale Vout
0000 0000 0 0.00
0000 0001 +1 lsb +0.04
0010 0000 +1/8 FS +1.25
0100 0000 +1/4 FS +2.50
1000 0000 +1/2 FS +5.0
1100 0000 +3/4 FS +7.5
1111 1111 +FS – 1 lsb +9.96
maximum output is 8.75 V. Table 8.1 gives some output voltages for an 8-bit,
10 V FS DAC as a function of its binary input.
The size of the change at the output of the DAC for a one least significant bit
(1 lsb) change in its input code is called a step and equals FS/2^n. For a 3-bit,
0 to 10 V DAC, the step size is 10/2^3 = 1.25 V.
IC DACs with 8- to 16-bit inputs are common. The determination of the
required size (number of bits) for a DAC is a function of the application. DACs
with a larger number of bits have a correspondingly reduced step size. A DAC
with an 8-bit straight binary input code has 256 distinct output values, one for
each possible input code. The step size for an 8-bit 10 V FS DAC is 39.06 mV.
Table 8.2 shows the resolution of a DAC with a straight binary input as a
function of the number of input bits. Resolution is the measure of the output
Table 8.2 Number of output values and resolution as a function of the number of bits for a
binary DAC.
Bits   Output values   Percentage   PPM (parts per million)   dB     Resolution (for 10 V FS)
1      2               50           500,000                   −6     5000 mV
6      64              1.6          15,625                    −36    156 mV
8      256             0.4          3,906                     −48    39.1 mV
10     1,024           0.1          977                       −60    9.77 mV
12     4,096           0.024        244                       −72    2.44 mV
16     65,536          0.0015       15                        −96    0.153 mV
20     1,048,576       0.0001       1                         −120   0.00954 mV
24     16,777,216      0.000006     0.06                      −144   0.000596 mV
output. The unity gain op-amp buffer prevents a load connected to the DAC
from drawing current from the resistor divider.
The main drawback of this circuit is that the number of components
needed increases exponentially with the number of bits. An n-bit resistor
divider circuit requires 2^n resistors, 2^n switches, and an n-to-2^n decoder.
Thus, this circuit is impractical for values of n above 8.
Two versions of resistor DAC circuits where the number of components
required increases linearly with n are presented in the following subsection.
These circuits are both R-2R ladder implementations.
Figure 8.16 Current steering R-2R ladder network voltage output DAC.
to its input code. The op-amp converts this current to an output voltage with:
Vout = −Is R
where Is is the sum of the currents through those switches that are in their
logic 1 positions.
The input to the circuit is a 3-bit word, B1B2B3, which represents a straight
binary fraction. B1, the MSB (most significant bit), has a weight of 2^(-1); B2
has a weight of 2^(-2); and B3, the LSB (least significant bit), has a weight of 2^(-3).
Each bit controls an analogue switch. When Bi is 1, its analogue switch passes
a current through the associated horizontal 2R resistor and into the op-amp's
summing junction. The summing junction of the op-amp is held at ground
potential by negative feedback through the feedback resistor R. The current into
the summing junction due to Bi being 1 is:
Ii = (VREF/R)·2^(-i)
When Bi is 0, the analogue switch directs the current through its associated
horizontal resistor to circuit ground, instead of into the op-amp’s summing
junction.
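Since each switched current is (VREF/R)·2^(-i) and the op-amp output is −Is·R, the R terms cancel; this sketch, with an illustrative function name, computes the 3-bit output:

```c
/* Output of the 3-bit current-steering R-2R DAC: Vout = -Is*R   */
/* with Ii = (VREF/R)*2^-i for each bit set to 1, so R cancels:  */
/* Vout = -VREF * (B1/2 + B2/4 + B3/8).                          */
static double r2r_vout(double vref, int b1, int b2, int b3)
{
    return -vref * (b1 / 2.0 + b2 / 4.0 + b3 / 8.0);
}
```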
Due to the virtual ground effect of the op-amp in this negative feedback
configuration, the inverting ( - ) terminal of the op-amp is effectively at ground
8.3 Digital-to-Analogue Converters (DACs) 417
Figure 8.17 Voltage switching R-2R ladder network voltage output DAC.
The output of a voltage switching mode DAC has the same polarity as its
reference voltage. In contrast, the polarity of a current steering DAC’s output
is the negative of its reference voltage.
[Figure 8.18 (diagram residue): a double-buffered 12-bit DAC interface; cascaded 8-bit latches and a 4-bit latch, clocked by output device select pulses ODSP(N) and ODSP(N+1), present bits B0-B11 to the 12-bit D/A converter's input.]
old 12-bit word. This intermediate input to the DAC circuit may produce a
significant glitch.
Double buffering is used to solve this problem (Figure 8.18). The eight
least significant bits of a 12-bit word are first output to latch 1. The 4 most
significant bits are then output to latch 3. The output device select pulse that
clocks the 4 bits from the data bus into latch 3, simultaneously clocks the
output of latch 1 into latch 2. Thus, all 12 bits appear at the input to the DAC
circuit nearly simultaneously.
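The double-buffered load sequence can be modelled on a host; the latch variables and function names below are illustrative, standing in for the two output device select pulses:

```c
#include <stdint.h>

/* Host model of the double-buffered load in Figure 8.18:        */
/* latch1 stages the low byte; writing the high nibble           */
/* simultaneously transfers latch1 onward, so the DAC input      */
/* changes only once per 12-bit word.                            */
static uint16_t latch1;      /* staging latch (ODSP(N) strobe)   */
static uint16_t dac_input;   /* what the DAC circuit sees        */

static void write_low_byte(uint8_t b)        /* ODSP(N)          */
{
    latch1 = b;
}

static uint16_t write_high_nibble(uint8_t n) /* ODSP(N+1)        */
{
    dac_input = (uint16_t)(((uint16_t)(n & 0x0F) << 8) | latch1);
    return dac_input;
}
```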
Even with double buffering, glitches from data loading can exist. These
glitches result from the timing relationship between /WR and valid data during
a write or output bus cycle and the fact that most DACs use level-triggered
latches. Some microprocessors/microcontrollers assert /WR at the same time
that it starts to drive valid data onto the data bus. If the DAC’s registers
are level-triggered, they will be transparent as soon as /WR goes low. The
DAC will initially respond to invalid data. During this time a glitch can be
generated.
Figure 8.19 The waveform at (b) is created by a DAC with one bit more than that at (a).
the finer the control the system has over the waveform it creates. Establishing
an a priori specification for this important parameter is a real difficulty
for the designer. One effective approach in such a case is to design a
prototype system with a very high precision (e.g. 12, 14, or 16 bits). The
prototype is simulated either by using standard programmes (usually supplied
by manufacturers) or by tailored software. The goal of the simulation is to
produce the output waveform corresponding to each DAC precision. The
software must be flexible enough that the designer can modify it for the available
precision. Figure 8.19 gives an example of the expected output of such a
simulation. From the simulation output waveforms, and according to
the specifications given to the designer, the needed precision can be decided.
Channels: Even though multiple channels could be implemented using
multiple DAC chips, it is usually more efficient to design a multiple-channel
system using a multiple-channel DAC. Some advantages of using a DAC
with more channels than originally conceived are future expansion, automated
calibration, and automated testing.
Configuration: DACs can have voltage or current outputs. Current-output
DACs can be used in a wide spectrum of applications (e.g., adding gain
and filtering) but do require external components. DACs can have internal
or external references. An internal-reference DAC is easier to use for standard
digital-input/analogue-output applications, but the external-reference DAC
can often be used in variable-gain applications (multiplying DAC). Note
also that some DACs generate a unipolar output, while others produce
bipolar outputs.
Speed: Manufacturers use a couple of parameters to specify the
dynamic behaviour of the DAC. The most common is settling time;
sometimes the maximum output rate is also given. When operating the DAC
in variable-gain mode, we are also interested in the gain-bandwidth product
of the analogue amplifier. When comparing specifications reported by different
manufacturers,
8.3 Digital-to-Analogue Converters (DACs)
Figure 8.20 The waveform at (b) was generated by a system with twice the output rate than
in (a).
Analogue Data and Analogue Input/Output Subsystems
logic interface, the individual data bits are connected to a dedicated computer
output port. For example, a 12-bit DAC requires 12 output port bits to
interface. The software simply writes to the parallel port(s) to change the DAC
output. The second approach is called µP-bus or microprocessor-compatible.
These devices are intended to be interfaced onto the address/data bus of an
expanded-mode microcomputer. Some processors, e.g. the MC68HC812A4, have
built-in address decoders enabling most µP-bus DACs to be interfaced to
the address/data bus without additional external logic. The third approach is
to use SPI (Serial Peripheral Interface) to interface the DAC. The SPI/DAC8043
interface is an example. This approach requires the fewest number of I/O pins.
Even if the microcomputer does not support the SPI directly, these devices
can be interfaced to regular I/O pins via the bit-banging software approach.
Package: The standard DIP is convenient for creating and testing an
original prototype. On the other hand, surface-mount packages like the SO
and µMAX require much less board space. Because surface-mount packages
do not require holes in the printed circuit board, circuits with these devices
are easier/cheaper to manufacture (Figure 8.22).
Cost: Cost is always a factor in engineering design. Beside the direct costs
of the individual components in the DAC interface, other considerations that
affect cost include (1) power supply requirements, (2) manufacturing costs,
(3) the labour involved in individual calibration if required, and (4) software
development costs.
Example of Practical DAC: The DAC0832 integrated circuit
The DAC0832 is a CMOS 8-bit DAC designed to interface directly
with several popular microprocessors. A resistor ladder network divides the
reference current and provides the circuit with good temperature tracking
characteristics. The circuit uses CMOS current switches and control logic to
achieve low power consumption and low output leakage current errors. The use
of an input latch and a DAC register allows the device to output a voltage cor-
responding to one digital number while holding the next digital number. This
permits the simultaneous updating of any number of these devices. Figure 8.23
shows the block diagram of the device.
point. Any analogue input in the quantum from 1/16 FS to 3/16 FS is assigned
the output code representing 1/8 FS (001b). Thus, quantization produces an
inherent quantization error of ±Q/2. Accordingly an output, M, from an ADC
indicates that the analogue input has a value of M ± Q/2 (M ± FS/2^(n+1)).
Since quantization error is inherent in the conversion process, the only way
quantization error can be reduced is by selecting an ADC with a larger number
of bits. In a practical converter, the transition points are not perfectly placed
and nonlinearity, offset, and gain errors result. These errors are in addition to
the inherent quantization error.
Errors in ADCs are defined and measured in terms of the location of the
actual transition points in relation to their ideal locations (Figure 8.25). If the
first transition does not occur at exactly +1/2 LSB (+1/2 Q), there is an offset
error. If the difference between the points at which the last transition and first
transition occur is not equal to FS −2 LSB, there is a gain error. Linearity
error exists if the differences between transition points are not all equal, in
which case the midpoints of some decision quanta do not lie on a straight line
between 0 and FS. The midpoints between transition points should be 1 LSB
apart. Differential nonlinearity is the deviation between the actual difference
between midpoints and 1 LSB, for adjacent pairs of codes. If the differential
nonlinearity is equal to or more negative than −1 LSB, then one or more codes
will be missing.
ADCs differ in how the operations of quantization and coding are accom-
plished. Several types of ADCs and their associated conversion techniques are
presented in the next two sections.
Flash ADCs are the fastest converters available. Conversion time is the
sum of the delay through the comparator stage and the encoders. Because of
the flash converter’s conversion speed, a track-hold is usually not needed.
The flash ADC’s drawback is the number of comparators required. An
8-bit converter requires 255 comparators! Practical IC flash converters range
in size from 4 to 10 bits.
It is possible to build a signed flash ADC. There are two approaches to
produce such a signed flash ADC. The direct approach is to place the equal
resistors from a +10.00 V reference to a −10.00 V reference (instead of +10 V
and ground shown in Figure 8.27). The middle of the series resistors (“zero”
reference) should be grounded. The digital circuit would then be modified to
produce the desired digital code.
The other method would be to add analogue preprocessing to convert the
signed input to an unsigned range. This unsigned voltage is then converted
with an unsigned ADC. The digital code for this approach would be offset
binary. This approach will work to convert any unsigned ADC into one that
operates on bipolar inputs.
Figure 8.28 Basic hardware structure for software ADC conversion routines.
This guarantees that the final value will be reached in log2 n operations, where
n is the number of elements. The logarithmic search is used in programming
to retrieve an element within a file. A simple analogy is opening
a book in the middle, then jumping to the middle of the first half or
the middle of the second half, depending on whether the item one is looking
for lies before or after the middle of the book, and so on. This is also called
“binary search”. The intent is to reduce the amount of time required for the
search process. The successive approximation algorithm is, in effect, a binary
search: starting with the most significant bit, each comparison determines one
bit of the final result and the approximation to be used for the next comparison.
A successive approximation converter uses a DAC to generate the voltage
that is compared to the unknown voltage. A software implementation of a
successive approximation converter uses the same hardware used for the
counting converter (Figure 8.28). However, for a successive approximation
converter, the offset required when the DAC is loaded with all 0s must be
−1/2 LSB, rather than +1/2 LSB.
The steps in the successive approximation algorithm are as follows:
4. If all bits have not been determined, the next most significant bit is tested.
To test the next most significant bit, the next bit in the approximation code
that resulted from step 3 is made a 1, and then step 2 is repeated. When
all bits have been determined, the conversion process is complete.
The effect of this algorithm is to first test whether the analogue input
voltage is greater or less than 1/2 FS. If the analogue input is greater than 1/2
FS, having a 1 as the most significant bit of the approximation code output
to the DAC for the first comparison makes the comparator output a 1. When
this is the case, the most significant bit is left as a 1. The next most significant
bit in the approximation code is then set to 1, and the analogue input is again
tested (second comparison) to see whether it is greater than 3/4 FS.
In contrast, if during the first comparison the analogue input is less than
1/2 FS, the comparator’s output is a 0, and the most significant bit is changed
to a 0. The next most significant bit is then set to 1, and a test is made to
determine whether the unknown voltage is greater than 1/4 FS. This process
is repeated in succession for each bit until the last bit is tested.
1. A Load command to transfer the contents of the Register into the Timer
2. An Enable to the gate controlling the Clk to the Timer
3. A Convert command to connect the unknown voltage to the input of the
VCO
After one second, the timer issues an Interval End to the State Machine, which
responds by disconnecting the unknown signal from the VCO and issues
8.4 Analogue-to-Digital Conversion (ADC)
samples. But if fs is less than or equal to twice f, then one cannot determine
A, f, and φ. For example, if the frequency of an input sine wave is 1000 Hz,
we must sample at a rate fs > 2000 samples per second.
In general, the choice of sampling rate fs is determined by the maximum
useful frequency fmax contained in the signal. One must sample at least twice
this maximum useful frequency. Faster sampling rates may be required to
implement various digital filters and digital signal processing.
fs > 2fmax .
We must note here that, even though the largest signal frequency of interest
is fmax , there may be significant signal magnitudes at frequencies above fmax .
These signals may arise from the input signal x, from added noise in the
transducer, or from added noise in the analogue processing. Once the sampling
rate is chosen at fs , then, as discussed before, a low-pass analogue filter may be
required to remove frequency components above 0.5fs . It is also worth mentioning
here that a digital filter cannot be used to remove aliasing.
2. How Many Bits Does One Need for the ADC?
The choice of the ADC precision is a compromise of various factors. The
overall objective of the DAS will dictate the potential number of useful bits
in the signal. If the transducer is nonlinear, then the ADC precision must be
larger than the precision specified in the problem statement. For example, let
y be the transducer output and let x be the real-world signal. Assume for now
that the transducer output is connected to the ADC input. Let the range of
x be rx , let the range of y be ry , and let the required precision of x be nx .
The resolutions of x and y are Δx and Δy, respectively. Let the following
expression describe the nonlinear transducer:
y = f (x)
an ADC clock between 50 kHz and 200 kHz. The selection bits and division
ratios are shown in Figure 8.32.
Although it could be done by trial and error, the most direct method for
choosing the ADC pre-selector factor is to divide the system clock by 200 kHz
and then choose the next higher division factor. This will ensure an ADC clock
that is as fast as possible but under 200 kHz.
The ADC, like the serial UART, is somewhat slower than the processor.
If the processor were to wait for each analogue conversion to be complete, it
would be wasting valuable time. As a result, the ADC is usually used in an
interrupt-driven mode.
Although the discussion that follows uses the more common interrupt-
driven mode, it is also possible for the ADC to operate in free-running mode,
in which it continuously does conversions as fast as possible. When reading
the ADC output in free-running mode, it is necessary to disable the interrupts
or stop the free-running conversions, read the result, and then re-enable the
interrupts and free-running mode. These steps are necessary to ensure that the
data read is accurate, in that the programme will not be reading the data during
the time that the processor is updating the ADC result registers.
The ADC is usually initialized as follows:
1. Set the three lowest bits of ADCSR for the correct division factor.
2. Set ADIE high to enable interrupt mode.
3. Set ADEN high to enable ADC.
4. Set ADSC to immediately start a conversion.
For a division factor of 8, the following lines of code would initialize the ADC
to read the analogue voltage on the ADC2 pin:
The initialization above sets up the ADC, enables it, and starts the first
conversion all at once. This is useful because the first conversion cycle after
the ADC is enabled is an extra long cycle to allow for the setup time of
the ADC. The long cycle, then, occurs during the balance of the programme
initialization, and the ADC interrupt will occur immediately after the global
interrupt enable bit is set. Notice that the ADMUX register is loaded with
the number of the ADC channel to be read.
Figures 8.33 and 8.34 show the hardware and software, respectively, for a
limit detector system based on the analogue input voltage to ADC channel 3.
Briefly, the system lights the red LED if the input voltage exceeds 3 V, lights
the yellow LED if the input voltage is below 2 V, or lights the green LED if the
input voltage is within the range of 2 V to 3 V.
The limit detector programme in Figure 8.34 shows a typical application
for the ADC. The ADC is initialized and started in main() by setting ADCSR
to 0xCE. ADC channel 3 is selected by setting ADMUX to 3. This starts
the sequence so the ADC interrupt will occur at the end of the first conver-
sion. Checking the ADC output to see which LED to light and lighting the
appropriate LED are all handled in the ADC interrupt ISR.
#include <ATMega8515.h>
//Define output port and light types
#define LEDs PORTD
#define red 0b111
#define green 0b101
#define yellow 0b110
Notice that the 10-bit output from the ADC is read by reading the data
from ADCW. ADCW is a special register name provided by CodeVisionAVR
that allows retrieving the data from the two ADC result registers, ADCL and
ADCH, at once. Other compilers would more likely require the programmer
to read both ADC registers (in the correct order, even) and combine them in
the 10-bit result as a part of the programme.
Also notice that the programmer has used the analogue-to-digital conver-
sion formula in the ISR. The compiler will do the math and create a constant
that will actually be used in the programme. You will find that this technique
can be used to your advantage in several other places, such as loading the
UBRR for the UART.
The ADC peripheral in the AVR microcontrollers varies somewhat accord-
ing to the specific microcontroller in use. All of the ADCs require some noise
suppression on the ADC Vcc connection (see Figure 8.33). Some also have
Bit Description
ACD Analogue Comparator Disable bit. Set to disable the analogue comparator
ACO Analogue Comparator Output bit
ACI Analogue Comparator Interrupt flag
ACIE Analogue Comparator Interrupt mask bit
ACIC Analogue Comparator Interrupt Capture bit.
Set to enable input capture on comparator change-of-state
ACIS1 Analogue Comparator Mode Select bits
ACIS0 (See definition below)
a built-in noise canceller function, and some have the ability to control Vref
internally. As mentioned in Chapter 3, you will need to check the specification
for your particular microcontroller when using the ADC.
by the analogue comparator. When the battery voltage becomes too low, the
LED is lighted.
Caution: the circuit shown has a design flaw. When the battery gets low,
the LED is turned on and left on, further draining an already ailing battery.
The LED should be pulsed in any actual circuit of this type. However, the
circuit does demonstrate the use of the analogue comparator peripheral.
The two analogue converter inputs are connected to voltage dividers: one is
powered by the regulated +5 V as a reference, and the other is powered directly
by the battery. The voltage divider powered by the +5 V is designed to provide
approximately 2.2 V at its center point. The other voltage divider is designed
so that when the battery discharges to approximately 6 V, the center point
of the voltage divider will also measure approximately 2.2 V. The analogue
comparator is going to be set to detect when the voltage from the battery’s
voltage divider drops below the 2.2 V provided by the reference divider.
As Figure 8.37 shows, using the analogue comparator is relatively simple.
In this example, ACSR is loaded with 0x0A to enable the analogue comparator,
#include <ATMega8515.h>
// Analogue Comparator interrupt service routine
interrupt [ANA_COMP] void ana_comp_isr (void)
{
PORTB.1 = 0; //light the LED
}
void main(void)
{
PORTB = 0x02; //start with LED off
DDRB = 0x02; //set bit 1 for output
ACSR = 0x0A; //enable analogue comp, AC interrupt, falling edge
while (1)
; //do nothing
}
to enable its interrupt, and to set it so that the interrupt occurs on a falling edge
when AIN0 drops below AIN1 (AIN1 becomes greater than AIN0).
Example: Measuring Engine Temperature Using the Analogue-to-Digital
Converter (ADC)
A simple data acquisition system is used to measure the engine temperature
and send the collected data to a PC. The practical measurements taken
from the output of the thermocouple and corresponding conditioning circuitry
showed that the temperature of the motor changes in the range 100°F to 250°F.
The ADC is 10-bit and takes one sample every second in free-running
mode. Analyze the system and write the corresponding programmes.
Step 1: Using the 10-bit measurement mode on the ADC means that the
resulting measured values will be as follows:
100°F = 0x000 = 0 (decimal)
250°F = 0x3FF = 1023 (decimal)
This sets the conversion formula to be
Temp = (150°F × ADC reading)/1023 + 100°F
The ADC may most conveniently be run in free-running mode for this use.
In this way the temperature will be kept up to date as fast as possible so that
when the data is stored at the one-second interval, the most recent value will
be recorded. In free-running mode, the ADC interrupt occurs at the end of
each conversion and can be used to update the current temperature value.
8.5 AVR Analogue Peripherals
will appear in the spreadsheet exactly as shown above occupying a space two
cells high by three cells wide. In this case, you decide that the first column
in the spreadsheet will contain the engine rpm, the second will contain the
shaft rpm, and the third will contain the engine temperature. Each line of the
spreadsheet will contain one set of data, so each line will contain data taken
one second after the data in the previous line. The following code is the while
(1) from the main() function of the code:
while (1)
{
    if (!PINA.0)          // Note: switch must be released
    {                     // before data is all sent
        unsigned char x;  // temporary counter variable

        // print column titles into the spreadsheet
        // in the first row
        printf("%s, %s, %s\n",
               "Engine RPM", "Shaft RPM",
               "Temperature");
        for (x = 0; x < 120; x++)
        {
            // print one set of data into
            // one line on spreadsheet
            printf("%d, %d, %d\n",
                   e_rpm[x], s_rpm[x],
                   temp[x]);
        }
    }
}
8.6 Some Practical ADC: The ADC0809 IC
This routine sends the 120 sets of data (and a line of column titles) using the
built-in printf() function. A new line character ('\n') is included with each set
of data to cause the next set to appear in the following line of the spreadsheet.
register. The 8-channel multiplexer can directly access any of eight analogue
input signals.
The 8-channel analogue multiplexer can switch the converter to any of the
input channels which are numbered from 0 to 7. These are selected by means
of the 3-bit address bus (ADDA-ADDC), as shown in Table 8.4. The upper
and lower limits of the analogue range are determined by setting the values
on the REF+ and REF− pins. The eight output pins D0–D7 are latched. The
input pin START will cause the conversion to commence, and the device will
signal the completion of the conversion by setting EOC (End Of Conversion).
If it is required to run the converter in continuous operation, then the START
and EOC pins can be connected together.
Figure 8.40 Timing diagram for reading the digital value from the ADC.
Task:
Using the circuit arrangement of Figure 8.39, write a programme, using
external interrupt 0, to put the number from the A-D conversion into address
50h. Channel 0 is the selected analogue input.
Possible solution:
The MOVX instruction is used to input the data from the ADC0809 at the
appropriate time. The analogue channel is selected by outputs on Port 1 pins.
Possible solution:
The MOVX instruction is again used to read data from the external device.
The pins /CS and /XFER on the DAC are connected to P2.7 on the 8051, which
is the MSB of the 16-bit address. Therefore any address can be put into the
DPTR so long as the MSB is zero. In this case 7FFFh is chosen.
ORG 0000h
MOV A, 50h
MOV DPTR, #7FFFh
MOVX @DPTR, A ; Send data to DAC
SJMP $
One DAC can be selected when bit 14 is zero, while the other requires bit 13 to be
zero. Typical values of addresses which would meet these requirements are
BFFFh and DFFFh respectively.
The operation is done in two stages. Firstly, the data for each DAC is
written to the relevant input register. Then the XFER line is brought low to
transfer the data into the DAC register for conversion. This can be done by
using an address with bit 15 cleared, such as 7FFFh.
The programme to achieve this is as follows:
ORG 0000h
LOOP: MOV A, P1 ; Put P1 data into A
MOV DPTR, #0BFFFh ; Address bit 14 clear
MOVX @DPTR, A ; Data to DAC input reg
MOV A, P3 ; Put P3 data into A
MOV DPTR, #0DFFFh ; Address bit 13 clear
MOVX @DPTR, A ; Data to DAC input reg
MOV DPTR, #7FFFh ; Address bit 15 clear
MOVX @DPTR, A ; Output both DACs
SJMP LOOP
In this chapter we introduced the meaning of a data acquisition system and the
performance criteria that can be used to evaluate the overall data acquisition
8.14 Write the expression for analogue current, Ia of a 4-bit D/A converter.
Calculate values of Ia for input codes b3b2b1b0 = 0000, 0001, 1000,
1010, and 1111, if Iref = 1 mA.
8.15 a. Calculate the output voltage of an 8-bit bipolar DAC (offset binary)
for code = 00000000. FS = 8 V.
b. Calculate the output voltage of the same DAC for code = 11111111.
c. Write the code for FS/2 and state the value of the output voltage.
8.16 a. Calculate the output voltage of an 8-bit bipolar DAC (2’s complement)
for code = 00000000. FS = 8 V.
b. Calculate the output voltage of the same DAC for code = 11111111.
c. Write the code for FS/2 and state the value of the output voltage.
8.17 a. Determine the range of voltages that will generate a code of 00000000
for an 8-bit bipolar ADC (2’s complement coding) with a full scale range
of −4 V to +4 V.
b. Determine the range of voltages that will generate a code of 11111111
for the same ADC.
c. Calculate the output code when the input voltage is at the halfway
point of its range.
8.18 a. Determine the range of voltages that will generate a code of 00000000
for an 8-bit unipolar ADC with a full scale value of 8 volts.
b. Determine the range of voltages that will generate a code of 11111111
for the same ADC.
c. Calculate the output code when the input voltage is at the halfway
point of its range.
8.19 An ADC has a full scale voltage of 8 V. Write the codes for 0 V, 2 V, 4 V,
and 6 V for the cases where the code is 4 bits, 6 bits, and 8 bits. Also
write the voltage for the highest code for each case.
(d) Write a simple software loop to repeat the measurements in part (c)
ten times. Plot your samples and the minimum and maximum error for
each cardinal point. Are the errors consistent across the range of values?
If not, how do you explain the differences?
9
Multiprocessor Communications
(Network-Based Interface)
9.1 Introduction
The main target of using microcontrollers and microprocessors in an embedded
system is to process data coming from input devices and/or send data to
control some output devices. System I/O devices and timing devices play
the most significant role in any embedded system. Some of the I/O devices,
e.g. timers and the UART, are built into the microcontroller chip. Such devices
access the system processor internally. The majority of the I/O devices are
outside the microcontroller. Such I/O devices are connected to and access
the microcontroller through a port, with each port having an assigned port
address similar to a memory address.
Until recently, embedded systems had two main features: (a) the
external I/O devices and the processor are local to the immediate environ-
ment; (b) all the needed processing takes place within the microcon-
troller/microprocessor, i.e. the I/O devices are passive and have no role in
processing. Now, embedded systems have expanded into nearly every corner
of the modern world. Embedded systems, accordingly, have become complex
systems comprising potentially several processors, programmable logic devices,
Figure 9.1 Serial data connections classified by direction of transfer: (a) Simplex; (b) Half-
duplex; (c) Full-duplex.
458 Multiprocessor Communications
A simplex channel can transfer data in only one direction. One application
of the simplex channel is the ring network.
A half-duplex communication system, Figure 9.1b, allows information to
transfer in either direction, but in only one direction at a time. Half-duplex
is a term usually defined for modem communications, but it is possible to
expand its meaning to include any serial protocol that allows communication
in both directions, but in only one direction at a time. One application of the
half-duplex protocol is the desktop serial bus, or multidrop network.
A full-duplex communication system, Figure 9.1c, allows information
(data, characters) to transfer in both directions simultaneously. A full-
duplex channel allows bits (information, error checking, synchronization, or
overhead) to transfer simultaneously in both directions.
When the communications channel consists of wires or lines, the minimum
number required for a simplex channel is two: the data, or signal line, and a
signal ground. A half-duplex channel also requires only two lines, signal and
ground. However, there must be some method to “turn the line around” when
the direction of transfer is to change. This involves a protocol and logic to
establish one device as the transmitter and the other as the receiver. After
transfer in one direction is complete, the roles of transmitter and receiver
are reversed and transfer takes place in the opposite direction, on the same
signal line. Full-duplex channels require two signal lines and a signal ground.
Each signal line is dedicated to data transfer in a single direction, allowing
simultaneous transfer in opposite directions.
A frame is a complete and nondivisible packet of bits. A frame includes
both information (e.g., data, characters) and overhead (start bit, error checking,
and stop bits). A frame is the smallest packet that can be transmitted. The
RS232 and RS422 protocols, for example, have one start bit, seven/eight data
bits, no/even/odd parity, and one/1.5/two stop bits (Figure 9.2).
Parity is generated by the transmitter and checked by the receiver. Parity is
used to detect errors. The parity bit can be calculated using a built-in parity
generator that automatically calculates the parity bit for the serial frame data
Figure 9.2 An RS232 frame (RS232 uses negative logic).
9.2 Serial Communications Channels
overhead include:
two seconds to send each bit, the receiver will examine the signal to determine
if it is a 1 or a 0 after one second has passed, then it will wait two seconds and
then examine the value of the next bit, and so on.
The sender does not know when the receiver has “looked” at the value of
the bit. The sender only knows when the clock says to begin transmitting the
next bit of the word.
When the entire data word has been sent, the transmitter may add a Parity
Bit that the transmitter generates. The Parity Bit may be used by the receiver
to perform simple error checking. After the parity bit, if any, at least one Stop
Bit is sent by the transmitter.
When the receiver has received all of the bits in the data word, it may check
the Parity Bit (both sender and receiver must agree on whether a Parity
Bit is to be used), and then the receiver looks for a Stop Bit. If the Stop Bit
does not appear when it is supposed to, the UART considers the entire word
to be distorted and will report a Framing Error to the host processor when
the data word is read. The usual cause of a Framing Error is that the sender
and receiver clocks were not running at the same speed, or that the signal was
interrupted.
Regardless of whether the data was received correctly or not, the UART
automatically discards the Start, Parity and Stop bits. If the sender and receiver
are configured identically, these bits are not passed to the host.
If another word is ready for transmission, the Start Bit for the new word
can be sent as soon as the Stop Bit for the previous word has been sent.
Because asynchronous data is “self synchronizing”, if there is no data to
transmit, the transmission line can be idle.
Along with “odd” and “even” parity, there is also “No”, “Mark,” and
“Space” parity. “No” parity means that no parity bit is transmitted with the data
in the packet. “Mark” or “Space” parity means that a “1” or “0,” respectively,
is always sent after the data in the packet. This type of parity is very rarely
used, and when it is used, the reason for its use is to allow the receiver time to
process the data before the next data byte comes along.
The number of stop bits is also an option, for the same reasons as mark
and space parity. A second stop bit can be inserted in the data packet to give
the receiver more time to process the received byte before preparing to receive
the next one.
In virtually all modern asynchronous communications, the data format is
“8-N-1,” which means 8 data bits, no parity, and one stop bit as the packet
format. The parity and the additional end interval are generally not required
for serial communications.
9.3 Asynchronous Serial Communication: UART
With such key elements in place, the asynchronous data recovery at the
receiving end takes place, roughly, in two phases:
1. Synchronize the Internal Clock: The system uses this phase for defining
a valid “start bit” that will be used to synchronize the internal clock to
the incoming serial frame. This phase is called “Asynchronous Clock
Recovery”.
2. Data Recovery: When the Receiver clock is synchronized to the start bit,
the second phase, which is the data recovery, starts. This phase is called
“asynchronous data recovery”.
For example, if the baud rate is 9600 and the oversampling rate is selected to
be 16, then the bit timer runs at a clock rate of 153.6 kHz.
When the clock recovery logic detects a high (idle) to low (start) transition
on the input line (RxD line), the start bit detection sequence is initiated; the
bit timer is enabled. Let sample 1 denote the first zero-sample, as shown in
Figure 9.5b. The clock recovery logic then uses samples 8, 9, and 10 for
Normal mode, and samples 4, 5, and 6 for Double Speed mode (indicated
with sample numbers inside boxes on the figure), to decide if a valid start bit
is received. If two or more of these three samples have logical high levels (the
majority wins), the start bit is rejected as a noise spike (“glitch”) and the receiver
starts looking for the next high-to-low transition. If, however, a valid start bit is
detected, the clock recovery logic is synchronized and the data recovery can
begin. The synchronization process is repeated for each start bit. The block
diagram in Figure 9.5a illustrates the design.
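The two-out-of-three decision used for start-bit validation can be sketched in a few lines of C; the function names and sample ordering are our own, and only the majority rule comes from the description above:

```c
#include <assert.h>

/* Majority vote over the three centre samples of a bit (samples 8, 9, 10
   in Normal mode; 4, 5, 6 in Double Speed mode). Returns the recovered
   logic level, 0 or 1. */
static int majority3(int s1, int s2, int s3) {
    return (s1 + s2 + s3) >= 2;
}

/* A start bit is valid only if the majority of the centre samples is low;
   two or more high samples reject it as a glitch. */
static int valid_start_bit(int s1, int s2, int s3) {
    return majority3(s1, s2, s3) == 0;
}
```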
The above-mentioned process of clock synchronization is used for both
hardware (using clock recovery logic) and software asynchronous data
Figure 9.5 (a) Bit sampling logic at the receiver: the receiver clock drives a 3-bit (or
4-bit) sample counter whose start-detect output enables clocking data into the shift
register. (b) Start bit sampling on the RxD line with oversampling of 16 (U2X = 0)
or 8 (U2X = 1).
receiving. In the case of a software receiver, a timed loop is used for the half-bit
and full-bit delays.
It is important to mention here that when using oversampling of 16, the
output of the most significant bit of the bit timer makes a 0-1 transition when
the count advances from 7 (0111) to 8 (1000). That transition can be used to
implement other functions that will be used by the system for data recovery.
For example, it can be used to clock the incoming data into a shift register. The
same transition can be used to increment a second bit counter that counts the
number of incoming data bits that have been stored. When the agreed-upon
(by the sender and receiver) number of bits has been stored, it is known that
a full character has been received. Both the bit timer and bit counter return to
the quiescent state awaiting the next start bit.
The data recovery logic then recovers the first, second, and succeeding data bits.
As in the case of the start bit, the logic level of each received bit is decided by
majority voting over the three samples in the center of the received bit. This majority voting process
acts as a low pass filter for the incoming signal on the input pin (RxD pin). The
recovery process is then repeated until a complete frame is received, including
the first stop bit. Note that the receiver only uses the first stop bit of a frame.
The same majority voting is done to the stop bit as done for the other bits
in the frame. If the stop bit is registered to have a logic 0 value, a Frame Error
(FE) Flag in the UART status register will be set. The bit counter helps to
achieve that; after counting the agreed number of data bits, it is expected that
the next bit is a stop bit. A new high to low transition indicating the start bit of
a new frame can come right after the last of the bits used for majority voting.
Many UARTs also provide additional circuits for signals that can be used to indicate the state
of the transmission media, and to regulate the flow of data in the event that
the remote device is not prepared to accept more data. For example, when the
device connected to the UART is a modem, the modem may report the presence
of a carrier on the phone line while the computer may be able to instruct the
modem to reset itself or to not take calls by raising or lowering one or more of
these extra signals. The function of each of these additional signals is defined
in the EIA RS232-C standard.
• Baud Rate Register — UBRR: This register determines the baud rate
at which the serial port transmits and receives data.
The following sections discuss serial communication as implemented
by AVR microcontrollers. First we deal with the AVR AT90S
series, which has a single UART, and then we discuss the USART of
the AVR Mega series.
Figure: A main computer (DTE) connected through its UART and an RS-232
interface to a modem (DCE), whose UART drives the serial channel.
Figure 9.8 The UART Data Register (UDR), bits 7 to 0: a read returns RXB[7:0],
a write loads TXB[7:0]; all bits read/write, initial value 0.
• 8 or 9 bits data
• Noise filtering by oversampling
• Overrun detection
• Framing Error detection
• False Start Bit detection
• Three separate interrupts on TX Complete, TX Data Register Empty and
RX Complete
The UART Data Register or UDR (Figure 9.8) is actually two registers
sharing a single I/O address ($0C): one a read-only register and the other a
write-only register.
• The read-only register, or the USART Receive Data Buffer Register
(RXB), contains any serial byte received, and
• The write-only register, or the USART Transmit Data Buffer Register
(TXB), contains any serial byte to be transmitted.
So, when a programme reads the UDR, it is reading the receive UDR to
get data that has been received serially. When the programme writes to the
transmit UDR, it is writing data to be serially transmitted.
In other words, the Transmit Data Buffer Register (TXB) will be the
destination for data written to the UDR Register location. Reading the UDR
Register location will return the contents of the Receive Data Buffer Register
(RXB).
UART Control Register (UCR)
Figure 9.9 shows the bit definitions for the UART control register, UCR.
The UART control register is used to initialize and set the function of the
UART. The most significant three bits, as mentioned before, are the mask bits
for the three interrupts associated with the UART.
The UART Status Register (USR)
The current state of the UART is reflected in the UART status register.
The status register has bits to indicate when a character has been received,
when a character has been transmitted, when the transmit UDR is empty,
470 Multiprocessor Communications
Figure 9.9 The UART Control Register (UCR), bits 7 to 0: RXCIE, TXCIE, UDRIE,
RXEN, TXEN, CHR9, RXB8, TXB8, where:
• RXCIE: Mask bit for the receiver interrupt enable. Set to unmask the interrupt.
• TXCIE: Mask bit for the Transmit interrupt enable. Set to unmask the interrupt.
• UDRIE: Mask bit for the UART Data Register Empty interrupt enable. Set to unmask the interrupt.
Figure 9.10 The status register (UCSRA), bits 7 to 0: RXC, TXC, UDRE, FE, DOR,
PE, U2X, MPCM; all initial values are 0 except UDRE, which is initially set.
and any errors that may have occurred. The microcontrollers that have a UART
(e.g. AT90S8515) have one 8-bit status register, USR, that is completely separate
from the control register, UCR. Figure 9.10 shows the contents of the 8-bit
USR of the AVR AT90 series, and Figure 9.11 gives the definition of each bit.
The status register USR/UCSRA is important due to the fact that serial
communication is always slower than parallel communication. During the
1.04 milliseconds that it takes to transmit a single serial byte at 9600 baud, a
microcontroller using a system clock of 8 MHz can execute as many as 8000
instructions. So, in order for the microcontroller not to be waiting around for
the serial port, it is important for the programme to be able to tell the state
of the serial port. In the case of a received character, it takes the same 1.04
milliseconds from the time the start bit is received until the character has been
completely received. After the eighth bit is received, the RXC bit is set in the
USR/UCSRA to indicate that a serial byte has been received. The programme
uses the RXC bit to know when to read valid data from the receive UDR. The
data must be read from the UDR before the next character is received, because
it will overwrite and destroy the data in the UDR. This explains the provision
for an interrupt to occur when a serial character is received so that it may be
read promptly without consuming large amounts of processor time polling to
see if a byte has been received.
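The polling pattern just described can be mocked on a host machine as below; USR_sim and UDR_sim are illustrative stand-ins for the real I/O registers (which live at fixed I/O addresses on the AVR), and the RXC mask follows the bit layout in Figure 9.10:

```c
#include <assert.h>
#include <stdint.h>

/* Host-side stand-ins for the UART registers; on the AVR these are
   memory-mapped I/O locations, not variables */
static volatile uint8_t USR_sim = 0x00;
static volatile uint8_t UDR_sim = 0x00;

#define RXC_BIT 0x80   /* bit 7 of the status register: receive complete */

/* Blocking receive: poll until RXC is set, then read the data register.
   The explicit flag clear here mimics the hardware behaviour. */
static uint8_t uart_getc(void) {
    while (!(USR_sim & RXC_BIT))
        ;                             /* wait for a byte to arrive */
    USR_sim &= (uint8_t)~RXC_BIT;     /* consume the receive-complete flag */
    return UDR_sim;
}
```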
In a similar manner, the UDRE bit is used to indicate that the transmit
UDR is empty and that another byte may be loaded into the transmit UDR to
be transmitted serially. Again, this is necessary because the microcontroller
can load out bytes much faster than the UART can transmit them. In order to
keep the programme from having to poll the USR continuously to see when it
is available to send another byte, an interrupt is provided that indicates when
the transmit UDR is empty.
The transmitter side of the UART is actually double buffered. That is, the
UDR that the programme writes to holds the data until the actual transmit
register is empty. Then the data from the UDR is transferred to the transmit
register and begins to serially shift out on the transmit pin. At this point, the
UDR is available to accept the next data word from the programme, UDRE
flag is set high and, if enabled, the UDRE interrupt occurs. Occasionally it is
necessary for the microcontroller to actually know when a byte has been sent.
The TXC flag is provided to indicate that the transmit register is empty and
that no new data is waiting to be transmitted. The programme uses this flag,
and its associated interrupt, when it is necessary to know exactly when the
data has been sent.
Baud Rate Register UBRR
The final register associated with the UART is the UART baud rate register,
UBRR. This register determines the baud rate at which the serial port transmits
and receives data. The number entered in the UBRR is determined, in the case of
the UART, according to the following formula:

UBRR = fCK/(16 × BAUD) − 1

where fCK is the system clock frequency and BAUD is the desired baud rate.
Bits 7 to 0, I/O address $09 ($29): UBRR, MSB to LSB; all bits read/write,
initial value 0.
Figure 9.12 AVR Baud Rate Register (UBRR).
Example 9.2: Calculate the number to be entered in the UBRR to transmit the
data asynchronously with baud rate of 9600. Assume the internal clock has a
frequency of 8 MHz.
Answer: Using the formula of the normal UART, the UBRR number is
calculated as follows:

UBRR = 8,000,000/(16 × 9600) − 1 = 52.08 − 1 ≈ 51 = 0x33
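The arithmetic of Example 9.2 can be checked with a one-line helper; the function name and the round-to-nearest adjustment are our own additions:

```c
#include <assert.h>
#include <stdint.h>

/* UBRR = fck/(16 * baud) - 1 for normal-speed asynchronous mode,
   rounded to the nearest integer (the + 8*baud term does the rounding) */
static uint16_t ubrr_value(uint32_t fck, uint32_t baud) {
    return (uint16_t)((fck + 8UL * baud) / (16UL * baud) - 1UL);
}
```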
The microcontrollers that handle asynchronous mode only, i.e. that have
a UART, use an 8-bit read/write register to store the UBRR number. The contents
of this register, the UBRR register, specify the UART baud rate according
to the above equation. In many AVR microcontrollers this register is at I/O
location $09 ($29). Figure 9.12 shows the AVR UBRR register.
The following example shows how to handle the UBRR register.
Example 9.3: The following example prints the message “Yabba Dabba Do”
on a terminal device. The programme considers the case of asynchronous
transmission by baud rate of 9600 and internal frequency clock of 8 MHz.
To transmit with a baud rate of 9600 using an internal frequency of 8 MHz,
the UBRR number as calculated in Example 9.2 is 0x33. The programme, in the
case of using the AT90S8535 microcontroller, is as follows:
#include <90s8535.h>
#include <stdio.h>                //standard input/output functions

void main(void)
{
    UCR = 0x18;                   //serial port initialized: enable
                                  //receiver and transmitter
    UBRR = 0x33;                  //set baud rate to 9600
    printf("\n\rYabba Dabba Do"); //print phrase
    while (1)
        ;                         //do nothing else
}
As the simple serial example shows, apart from initializing the serial port in
the first two lines of main(), serial communication is extremely easy using the
built-in library functions of CodeVisionAVR. CodeVisionAVR, like most C
language compilers, provides built-in library functions to handle the common
serial communication tasks. These tasks usually involve communicating with a
terminal device, for example, a PC executing a terminal programme to transmit
serially the characters typed on the keyboard and to display in a window
the characters received serially from the microcontroller. CodeVisionAVR
provides a built-in terminal emulator in the development environment as a
convenience to the programmer. The standard library functions are included
in the header file stdio.h, which may be included in the C language programme.
Using the header file with its built-in functions makes serial communication
very easy. This will be shown during the following discussions.
Data Transmission Using UART
A block schematic of the UART transmitter is shown in Figure 9.13.
Data transmission is initiated by writing the data to be transmitted to the
UART I/O Data Register, UDR. The data is transferred from UDR to the
Transmit shift register when:
• A new character has been written to UDR after the stop bit from the
previous character has been shifted out. The shift register is loaded
immediately.
• A new character has been written to UDR before the stop bit from the
previous character has been shifted out. The shift register is loaded when
the stop bit of the character currently being transmitted has been shifted
out.
If the 10(11)-bit Transmitter shift register is empty, data is transferred from
UDR to the shift register. At this time the UDRE (UART Data Register Empty)
bit in the UART Status Register, USR, is set. When this bit is set (one), the
UART is ready to receive the next character. At the same time as the data is
transferred from UDR to the 10(11)-bit shift register, bit 0 of the shift register
is cleared (start bit) and bit 9 or 10 is set (stop bit). If 9 bit data word is selected
(the CHR9 bit in the UART Control Register, UCR is set), the TXB8 bit in
UCR is transferred to bit 9 in the Transmit shift register.
On the Baud Rate clock following the transfer operation to the shift register,
the start bit is shifted out on the TXD pin. Then follows the data, with LSB
first. When the stop bit has been shifted out, the shift register is loaded if
any new data has been written to the UDR during the transmission. During
loading, UDRE is set. If there is no new data in the UDR register to send when
the stop bit is shifted out, the UDRE flag will remain set until UDR is written
again. When no new data has been written, and the stop bit has been present
on TXD for one bit length, the TX Complete Flag, TXC, in USR is set.
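The UDRE handshake on the transmit side follows the same polling pattern as reception; again USR_sim and UDR_sim are host-side mocks of the real registers (UDRE is bit 5 of the status register, matching its initial value of one in Figure 9.10):

```c
#include <assert.h>
#include <stdint.h>

/* Host-side stand-ins for the UART registers */
static volatile uint8_t USR_sim = 0x20;  /* UDRE (bit 5) set: buffer empty */
static volatile uint8_t UDR_sim = 0x00;

#define UDRE_BIT 0x20

/* Blocking transmit: wait until the transmit UDR is empty, then load it. */
static void uart_putc(uint8_t c) {
    while (!(USR_sim & UDRE_BIT))
        ;            /* poll until the data register is free */
    UDR_sim = c;     /* in hardware, writing UDR clears UDRE until the
                        byte moves on to the transmit shift register */
}
```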
The TXEN bit in UCR enables the UART transmitter when set (one).
When this bit is cleared (zero), the PD1 pin can be used for general I/O. When
TXEN is set, the UART Transmitter will be connected to PD1, which is forced
to be an output pin regardless of the setting of the DDD1 bit in DDRD.
Data Reception
Figure 9.14 shows a block diagram of the UART Receiver.
9.4 The EIA-232 Standard 475
DB-25 connector (DTE to DCE, selected pins):
2 TXD - Transmitted Data
3 RXD - Received Data
4 RTS - Request to Send
5 CTS - Clear to Send
6 DSR - Data Set Ready
7 SG - Signal Ground
8 CD - Carrier Detect
20 DTR - Data Terminal Ready
22 RI - Ring Indicator
DB-9 connector:
1 CD - Carrier Detect
2 RXD - Received Data
3 TXD - Transmitted Data
4 DTR - Data Terminal Ready
5 SG - Signal Ground
6 DSR - Data Set Ready
7 RTS - Request to Send
8 CTS - Clear to Send
9 RI - Ring Indicator
Figure 9.16 Pin Numbering for the DB-9 and DB-25 EIA-232 Connectors.
The EIA-232 standard was originally defined for a 25-pin (DB-25)
connector. The standard specifies signals for 22 of the available pins. Today
a subset of these appears on a wide variety of computing equipment, with
cables terminating in DB9P and DB9S connectors. In either case, the original
cabling is parallel, straight through — no crossover connections. The drawings
in Figure 9.16 show the pin numbers on the DB-9 and DB-25 connectors as
well as the EIA-232 inputs/outputs to which they correspond.
In addition to the lines for exchanging data, the standard also specifies a
number of handshaking and status lines. Figure 9.16 shows the more commonly
used signals. In the following we describe them in more detail.
Timing signals
Some synchronous devices provide a clock signal to synchronize data
transmission, especially at higher data rates. Two timing signals are provided
by the DCE on pins 15 and 17. Pin 15 is the transmitter clock, or send timing
(ST); the DTE puts the next bit on the data line (pin 2) when this clock
transitions from OFF to ON (so it is stable during the ON to OFF transition
when the DCE registers the bit). Pin 17 is the receiver clock, or receive timing
(RT); the DTE reads the next bit from the data line (pin 3) when this clock
transitions from ON to OFF.
Alternatively, the DTE can provide a clock signal, called transmitter timing
(TT), on pin 24 for transmitted data. Again, data is changed when the clock
transitions from OFF to ON and read during the ON to OFF transition. TT can
be used to overcome the issue where ST must traverse a cable of unknown
length and delay, clock a bit out of the DTE after another unknown delay, and
return it to the DCE over the same unknown cable delay. Since the relation
between the transmitted bit and TT can be fixed in the DTE design, and since
both signals traverse the same cable length, using TT eliminates the issue.
TT may be generated by looping ST back with an appropriate phase change
to align it with the transmitted data. Looping ST back to TT lets the DTE use the
DCE as the frequency reference and correct the clock-to-data timing.
Finally, note that the data link remains in the marking state
(< −3 V) until the start bit, a space (> +3 V), is sent.
#include <90s8535.h>

unsigned char qcntr, sndcntr;   //indexes into the queue
unsigned char queue[50];        //character queue
unsigned char ch;               //character received from the PC

/* msg1, msg2, msg3 and the sendmsg() function, which loads a message
   into the queue and starts the transmission, are defined later in
   this example */
void sendmsg(char *s);
extern char msg1[], msg2[], msg3[];

void main(void)
{
    UCR = 0x58;     //enable receiver, transmitter and TX interrupt
    UBRR = 0x33;    //9600 baud with an 8 MHz clock
    #asm("sei")     //enable global interrupts (CodeVisionAVR syntax)
    while (1)
    {
        if (USR & 0x80)          //check for character received
        {
            ch = UDR;            //get character sent from PC
            switch (ch)
            {
                case 'a':
                    sendmsg(msg1);   //send first message
                    break;
                case 'b':
                    sendmsg(msg2);   //send second message
                    break;
                default:
                    sendmsg(msg3);   //send default message
            }
        }
    }
}
The first two lines in main() initialize the UART. The UCR is loaded with
0x58 to enable the receiver, the transmitter, and the transmit interrupt. The
UBRR is set to 0x33 to set the baud rate to 9600. Once the UART is set
up and the interrupt is enabled, the programme enters a while (1) loop that
continuously checks for a received character using an if statement, whose
expression will evaluate to TRUE after a character is received. Receiving the
character will set the RXC bit in the USR, which is being tested by the if
statement. This is an example of manually polling the status of the serial port
to determine when a character is received.
When the if statement determines that a character has been received, the
character is read from the UDR and used in a switch statement to determine
which message to send. The message being transmitted (such as “That was
an a.”) is composed of several characters, each of which must be sent serially
to the PC along with a carriage return, CR, and a line feed, LF, character so that
each message will start on a new line. The longest message is composed of 25
characters, and adding the CR and LF characters makes the total message 27
characters long. At 9600 baud (1.04 ms per serial byte) this message will take
over 27 milliseconds to transmit. It is not appropriate for the microcontroller
to wait around for 27 milliseconds while this message is being sent, because
it could be executing as many as 216,000 instructions (8 instructions per
microsecond × 27 milliseconds × 1000 microseconds per millisecond = 216,000
instructions). In order to free the microcontroller from the need to wait while
messages are being transmitted, a FIFO queue is used in conjunction with the
transmit interrupt to actually transmit the message.
A queue is a temporary holding device to hold bytes of data and return
them on a first in, first out (FIFO) basis. In this case, the queue is mechanized
as a variable array of data called, appropriately, “queue”. An index, “qcntr”,
is used to indicate where new data is to be placed in the queue. As each new
byte is added to the queue, the index is incremented so the next byte is added
in sequence and so on, until the queue holds all of the necessary data.
A separate index, “sndcntr”, is used to retrieve the data from the queue.
As each byte is retrieved, this index is incremented and finally, when the two
indices are equal, the programme knows that the queue has been emptied.
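The two-index scheme described above can be sketched as plain C; the array and index names mirror the example (queue, qcntr, sndcntr), but the enqueue/dequeue helpers are our own wrapping of the idea:

```c
#include <assert.h>

unsigned char queue[50];     /* character queue                      */
unsigned char qcntr = 0;     /* index where new data is placed       */
unsigned char sndcntr = 0;   /* index where data is retrieved        */

/* Add one byte at the tail of the queue */
void enqueue(unsigned char c) {
    queue[qcntr++] = c;
}

/* Remove and return the byte at the head; call only when not empty */
unsigned char dequeue(void) {
    return queue[sndcntr++];
}

/* The queue has been emptied when the two indices are equal */
int queue_empty(void) {
    return qcntr == sndcntr;
}
```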
Actually, the CodeVisionAVR C language compiler can provide a
transmitter queue, a receiver queue, or both using the CodeWizardAVR code
generator feature. This example is provided for educational reasons, to
demonstrate how the queue works.
Getting back to the example programme, the sendmsg() function called
from the switch statement puts the message in the queue and starts the transmit
function. The switch statement passes a pointer (an address) to the appropriate
message when it calls the function. The function first puts the CR and LF
characters in the queue and then puts the message to be transmitted in the
queue using the pointer. Finally, the function writes the first byte in the queue
into the UDR to start the transmission process. After this character has been
transmitted, the TXC interrupt occurs, the ISR loads the next character from
the queue into the UDR, and so the cycle continues until the queue is empty,
as indicated by the two indices being equal.
The most popular form of serial bus protocol is “I2C”, which stands for
“Inter-Integrated Circuit”. Sometimes the bus is called the IIC or I2C bus. This
standard was originally developed by Philips Semiconductors in the early 1980s
as a method to provide an interface between microprocessors and peripheral
devices without wiring full address, data, and control busses between devices;
more specifically, it was designed to provide an easy way to connect a CPU
to peripheral chips in a TV-set.
The problem that faced the designers at that time, and still faces
many of them, is that the peripheral devices in embedded systems are often
connected to the microcomputer unit (normally a microcontroller) as memory-
mapped I/O devices, using the microcontroller’s parallel address and data bus.
This results in a lot of wiring on the PCBs to route the address and data lines.
The number of wires increases further if we consider the need to connect a
number of address decoders and glue logic. In mass-production
items such as TV sets, VCRs and audio equipment, this is not
acceptable. In such applications every component that can be saved means
increased profitability for the manufacturer and more affordable products for
the end customer. Another problem directly related to the use of a lot of
wires is the possible interference: lots of control lines imply that the system is
more susceptible to disturbances by Electromagnetic Interference (EMI) and
Electrostatic Discharge (ESD).
The need for a more effective and economical way of connecting different
devices (normally ICs) resulted in the introduction of the 2-wire communication
bus that we now call the I2C bus protocol. This protocol enables peripheral ICs
in electronic systems to communicate with each other using simple commu-
nication hardware. The ICs can be on the same board or linked via cables.
The length of the cable is limited by the total bus capacitance and the noise
generated on the bus.
9.5 Inter-Integrated Circuits (I2C) 485
Today, I2C has become a de facto world standard that is now implemented
in over 1000 different ICs and is licensed to more than 50 companies including
some of the leading chip manufacturers like Xicor, ST Microelectronics, Infi-
neon Technologies, Intel, Texas Instruments, Maxim, Atmel, Analog Devices
and others.
Figure 9.21 Connecting devices of the same type to form I2C system.
Figure 9.22 I2C bus Supporting Devices Operating on different voltage levels.
Increasing the length of the bus increases its total capacitance and degrades
the quality of the signals on the bus. This results from the effect of the RC
constant on the slew rate of the signal edges. At a certain point, because of the slew rate,
the ICs will not be able to distinguish clearly between logic 1 and logic 0.
Increasing the length of the line has another effect: it increases the possibility
of getting reflections at high speed. The reflected waves can corrupt the data
on the bus to the extent that it becomes unreadable.
To avoid such problems, a number of strict electrical specifications must
be followed when using the I2C bus. According to the specification, the length of
the bus is limited by the total bus capacitance: the total capacitance of the bus
must remain under 400 pF.
The original I2C specification allows data transfer rates of up to 100
kbit/s and 7-bit addressing. Seven-bit addressing allows a total of 128 de-
vices to communicate over a single shared bus. With increased data transfer rate
requirements, the I2C specification has since been enhanced (Version 1.0,
1992, adding fast mode at 400 kbit/s and 10-bit addressing; Version 2.0, 1998,
adding high-speed mode at 3.4 Mbit/s). At present the I2C range includes more
than 150 CMOS and bipolar
I2C-bus compatible types for performing communication functions between
intelligent control devices (e.g. microcontrollers), general-purpose circuits
(e.g. LCD drivers, thermal sensors, remote I/O ports, RAM, Flash and EPROM
memories) and application-oriented circuits (e.g. digital tuning and signal pro-
cessing circuits for radio and video systems). Other common devices capable
of interfacing to an I2C bus include real time clocks and watchdog timers.
In Figure 9.23 we depict a sample I2C network with four devices attached to
the bus. In reality any number of devices may be connected to the bus as long
as the bus length keeps its total capacitance under 400 pF. The system given in
Figure 9.23 is used here to explain the basic operation of I2C. In Figure 9.23,
the device that initiates a transaction on the I2C bus is termed the master. The
master normally controls the clock signal. A device being addressed by the
master is called a slave. In our case, the master is the microcontroller, the other
three devices, a temperature sensor, an EEPROM, and a LCD-controller, are
slaves. Each of the slave devices is assigned a unique address (Figure 9.23a).
The address comprises 7 bits. The four most significant bits (A6 to A3) identify
the category of the device being addressed. The three least significant bits (A2
to A0) identify a programmable hardware address assigned to the device. Thus,
up to eight instances of the same device can be included in the system. For
example, as we are going to see later, the code 1010 is reserved for the serial
EEPROM, so no more than eight EEPROMs may be connected to the
system. The addresses of all eight such devices start with 1010 concatenated with
one of the combinations 000 to 111. The seven bits of the address are followed
by a direction bit (W/R bit) which is used to inform the slave if the master is
writing (W/R = 0) to it or reading from it (W/R = 1).
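Assembling this first byte, category code, hardware address, and direction bit, is simple bit arithmetic. The helper below is illustrative (our own naming), with only the 1010 EEPROM category code and the bit ordering taken from the text:

```c
#include <assert.h>
#include <stdint.h>

#define I2C_WRITE 0   /* W/R = 0: master writes to the slave  */
#define I2C_READ  1   /* W/R = 1: master reads from the slave */

/* Build the byte sent after the start condition: a 4-bit device
   category, a 3-bit hardware address, then the direction bit. */
static uint8_t i2c_addr_byte(uint8_t category4, uint8_t hw3, uint8_t rw) {
    uint8_t addr7 = (uint8_t)((category4 << 3) | (hw3 & 0x07));
    return (uint8_t)((addr7 << 1) | (rw & 1));
}
```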
Only master devices can initiate a data transfer on an I2C bus. The protocol
is a multi-master bus; it does not limit the number of master devices on an
I2C bus, but typically, in a microcontroller-based system, the microcontroller
serves as the master. Both master and slave devices can be senders or receivers
of data. This will depend on the function of the device. In the example given in
Figure 9.23, the microcontroller and EEPROM send and receive data, while
the temperature sensor sends data and the LCD-controller receives data. In
Figure 9.23(a), arrows connecting the devices to the I2C bus wires depict the
data movement direction. Normally, all the slave devices connected to I2C
bus assert high-impedance on the bus while the master device maintains logic
high, signaling an idle condition.
Frame- Start/Stop conditions: When the I2C bus is in the “idle state”,
both the clock and the data lines are not being driven and are pulled high. To
begin any data transfer on an I2C bus, a “start” condition is put on the bus.
A start condition is shown in Figure 9.23(b). A high to low transition of the
SDA line while the SCL signal is held high signals a start condition. All data
transfers on an I2C bus are terminated by a “stop” condition. A stop condition
is shown in Figure 9.23(b). A low to high transition of the SDA line while the
SCL signal is held high signals a stop condition.
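The start/stop rules can be expressed as a small decoder over successive (SDA, SCL) line samples. This is a host-side model of the bus conditions for illustration, not driver code:

```c
#include <assert.h>

/* Classify the transition between two consecutive samples of the bus.
   SDA falling while SCL stays high -> start condition; SDA rising while
   SCL stays high -> stop condition; anything else is ordinary data. */
enum bus_event { EV_NONE, EV_START, EV_STOP };

static enum bus_event classify(int sda_prev, int scl_prev,
                               int sda_now, int scl_now) {
    if (scl_prev && scl_now) {
        if (sda_prev == 1 && sda_now == 0) return EV_START;
        if (sda_prev == 0 && sda_now == 1) return EV_STOP;
    }
    return EV_NONE;   /* SDA may change freely while SCL is low */
}
```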
Data Transfer: Actual data is transferred in between start and stop condi-
tions. Data is transmitted in a synchronous fashion, with the most significant bit
sent first. Data can only change when SCL line is in the LOW state. A command
is sent from the master to the receiver in the format shown in Figure 9.23(d). Next
we explain the steps needed for a master write and master read operations
(normally called bus events).
I2C Bus Events: Transmitting a byte to a slave device
To write to a slave, the write cycle is as follows:
• The master device initiates the transfer by a start condition. This acts as
an ’Attention’ signal to all of the connected devices. All ICs on the bus
will listen to the bus for incoming data.
• Once the start condition has been sent, the master sends a byte to the slave. This
first byte is used to identify the slave on the bus and to select the mode of
operation. As shown in Figure 9.23c, the 7 bits forming the address of the
slave, starting with the most significant bit (MSB), are sent first, followed
by the eighth bit, which defines the direction of the data. The eighth bit is
labeled R/W in the figure, with 1 for “read” and 0 for “write”. Each bit
value is placed on the SDA line by the master device while the SCL
line is low and maintained stable until after a clock pulse on SCL.
• If performing a write, directly after sending the address of the receiving
device, the master sends a zero.
• Having received the address, all the ICs will compare it with their own
address. If there is no match, the IC will ignore the rest of this transaction
and wait for the next, i.e. wait until the bus is released by the stop
condition. On the other hand, if the address matches, the receiving device
will respond by producing an acknowledge signal. The receiving device
produces this signal by holding the SDA line low during the first ACK
clock cycle.
• Having sent the I2C address (and the internal register address) the master
can now send the data byte (or bytes). The master device transmits a byte
of data starting with the most significant bit down to the least significant
bit. The receiving device, in this case one of the slaves, acknowledges
the reception of data by holding the SDA line low during the second
ACK clock cycle. This means that for every 8 bits transferred, the device
receiving the data sends back an acknowledge bit, so there are actually 9
SCL clock pulses to transfer each 8 bit byte of data. If the receiving device
sends back a low ACK bit, then it has received the data and is ready to
accept another byte. If it sends back a high, then it is indicating it cannot
accept any further data and the master should terminate the transfer by
sending a stop sequence.
• All the master has to do (see Figure 9.24) is generate a rising edge on the
SCL line (2), read the level on SDA (3) and generate a falling edge on
the SCL line (4). The slave will not change the data while SCL is high
(otherwise a start or stop condition might inadvertently be generated).
During (1) and (5), the slave may change the state of the
• In total, this sequence has to be performed 8 times to complete the data
byte.
The meaning of the bytes being read depends on the slave. There is no
such thing as a "universal status register". The user needs to consult the data
sheet of the slave being addressed to know the meaning of each bit in any byte
transmitted.
Possible modifications on the timing diagram: Repeat start condition
Figure 9.23(d) depicts a timing diagram of a typical read/write cycle.
Sometimes this timing diagram may change. Some of the possible changes
are (See Figure 9.25):
• Acknowledge Extension: Under normal circumstances, following the
ACK bit time, the master will release the SCL line so that transmission
may continue with the next byte. If, however, the receiver (usually a slave
device, but possibly the master) is temporarily unable to proceed, it will hold
the SCL line LOW, thereby extending the ACK interval; in other words, the
acknowledge bit is allowed to float high during this time. When the
receiver is ready to proceed again, it will release the SCL line and
transmission continues. The timing diagram in Figure 9.25 illustrates the
ACK interval extension.
• Multi-register Slaves: While explaining the write/read cycle we assumed
that the slave is simple and has only one register, in which case just
sending the address of the slave is enough. In many cases, however, the
slave has many registers, each with its own address within the slave
device. If the master wants to write to a specific register within the
slave, it must send this address as well: having addressed the slave
device, the master must then send out the internal location or register
number inside the slave that it wishes to write to or read from. This
number obviously depends on what the slave actually is and how many
internal registers it has. The master can then continue to send data
bytes to the slave, and these will normally be placed in successive
registers, because the slave automatically increments the internal register
address after each byte. When the master has finished writing all data to
the slave, it sends a stop sequence, which completes the transaction.
• Repeated Start: During an I2C transfer there is often the need to first send
a command and then read back an answer right away. This has to be done
without the risk of another (multi-master) device interrupting this atomic
operation. For this, the I2C protocol defines a so-called repeated start
condition. After having sent the address byte (address and read/write bit)
the master may send any number of bytes followed by a stop condition. If,
however, it wishes to establish a connection with a different slave, then
rather than issuing the Stop, the master issues another Start, using the
address of the new device, and then sends more data. This is defined
recursively, allowing any number of start conditions to be sent. The
purpose, in general, is to allow combined write/read operations to one or
more devices without releasing the bus, and thus with the guarantee that
the operation is not interrupted.
• We should also mention that, in some devices, a start bit has to be
resent to reset the receiving device for the next command; e.g., in a serial
EEPROM read, the first command sends the address to read from and the
second reads the data at that address.
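The multi-register write with address auto-increment described above can be modelled in a few lines. The slave model and all names below are illustrative, not taken from any real device:

```python
class MultiRegisterSlave:
    """Toy I2C slave with 8 internal registers and an auto-incrementing
    register pointer (hypothetical device model)."""
    def __init__(self):
        self.regs = [0] * 8
        self.pointer = 0

    def receive(self, byte, first=False):
        if first:                     # first byte after the address: register number
            self.pointer = byte % len(self.regs)
        else:                         # data bytes land in successive registers
            self.regs[self.pointer] = byte
            self.pointer = (self.pointer + 1) % len(self.regs)

def master_write(slave, reg, data):
    # Start condition and slave-address byte would precede this; then the
    # internal register number, then the data bytes, then a stop condition.
    slave.receive(reg, first=True)
    for byte in data:
        slave.receive(byte)

dev = MultiRegisterSlave()
master_write(dev, 2, [0x11, 0x22, 0x33])
print(dev.regs)   # registers 2..4 now hold the three data bytes
```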
Multiple address is shown in Figure 9.26.
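The serial-EEPROM read with a repeated start can be written out as a bus-event trace. This sketches only the sequence of conditions on the bus; the event names are hypothetical:

```python
def eeprom_random_read_sequence(addr7, mem_addr):
    """Bus events for an EEPROM random read: write the word address, then a
    repeated START (no STOP in between) switches the same slave to read mode."""
    return [
        ("START",),
        ("ADDR", (addr7 << 1) | 0),   # slave address + W: set internal pointer
        ("DATA", mem_addr),           # word address inside the EEPROM
        ("START",),                   # repeated start -- the bus is never released
        ("ADDR", (addr7 << 1) | 1),   # same slave address + R
        ("READ",),                    # data byte clocked out of the EEPROM
        ("STOP",),
    ]

trace = eeprom_random_read_sequence(0x50, 0x10)
print(sum(1 for e in trace if e[0] == "START"), "start conditions,",
      trace[-1][0], "at the end")
```

Note the two START events before the single STOP: the whole write-then-read operation stays atomic on the bus.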
I2C Addresses Standard: Special Addresses and Exceptions
The first byte of an I2C transfer contains the slave address and the data
direction (Figure 9.23). The address is 7 bits long, followed by the direction
bit (W/R bit). A 7-bit address theoretically allows 128 I2C addresses;
however, some addresses are reserved for special purposes, leaving only 112
addresses available under the 7-bit address scheme.
As a matter of fact, I2C gives a loose standard for the address. It uses the
most significant four bits to identify the type of the device and the next three
bits are used to specify one of eight devices of this type (or further specify
the device type). Also, some devices require certain patterns for the last three
bits, while others (such as large serial EEPROMs) use these bits to specify
an address inside the device. This shows the importance of mapping out the
devices to be put on the bus and all their addresses.
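The composition of the first byte, and the loose type/device split of the 7-bit address, can be expressed directly. The field split below follows the description above; what each field actually means still depends on the device:

```python
def address_byte(addr7, read):
    """First byte of a transfer: the 7-bit slave address followed by the R/W bit."""
    assert 0 <= addr7 < 0x80
    return (addr7 << 1) | (1 if read else 0)

def split_fields(addr7):
    """Split the 7-bit address into the device-type field (upper 4 bits) and
    the device-number field (lower 3 bits) described in the text."""
    return addr7 >> 3, addr7 & 0x7

print(hex(address_byte(0x50, read=True)))   # address 0x50 with the R bit set
print(split_fields(0x50))                   # type 0b1010, device number 0
```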
In the I2C address map there are also what are called “reserved addresses”.
Table 9.2 shows these special addresses.
Figure 9.28 HS-device connection. Notes: (1) SDA and SCL are not used here but
may be used for other functions. (2) To input filter. (3) Only the active master
can enable its current-source pull-up circuit. (4) Dotted transistors are
optional open-drain outputs which can stretch the serial signal SCLH.
The SDA and SCL lines are only used in mixed-speed bus systems and are not
connected in an Hs-mode only system. In such cases, these pins can be used for
other functions.
As long as the two CPUs monitor what is going on on the bus (start and stop
conditions), and as long as they are aware that a transaction is in progress
because the last issued condition was not a STOP, there is no problem. Problems
arise if the two CPUs decide to initiate a start condition at the same time, or
if one of the CPUs misses the start condition and still thinks that the bus is
idle. In fact, this is not the only problem that may arise when a system has
multiple masters; a second problem arises when there are multiple clocks in the
system. The first problem is resolved by “arbitration” and the second by
“synchronization”. The two problems and their solutions are discussed next.
9.5.4.1 Arbitration
For proper functioning in a multi-master system, each device needs to be able
to cope with the fact that another device may currently be talking and the bus
is therefore busy. This translates into:
• Being able to follow arbitration logic. If two devices start to communicate
at the same time, the one writing more zeros to the bus (or the slower
device) wins the arbitration, and the other device immediately discontinues
any operation on the bus.
• Bus busy detection. Each device must detect an ongoing bus communi-
cation and must not interrupt it. This is achieved by recognizing traffic
and waiting for a stop condition to appear before starting to talk on the
bus.
The physical I2C bus setup is designed not only to help the devices to
monitor what is going on the bus but more importantly, it is designed to
prevent any risk of data corruption that may arise from data collision. The bus
monitoring and collision avoidance is discussed next.
Bus monitoring:
The I2C bus structure is a wired AND. This means that if one device pulls
a line low it stays low, and accordingly any device can test whether the bus is
idle or occupied. When a master (the sender) changes the state of a line to
HIGH, it MUST always check that the line really has gone HIGH. If it stays low,
this is an indication that the bus is occupied and some other device is pulling
the line low.
Therefore the general rule of thumb on the I2C bus is: if a master (a sender)
cannot get a certain line to go high, it has lost arbitration and needs to back
off and wait until a stop condition is seen before making another attempt to
start transmitting.
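The wired-AND rule and the back-off check can be modelled directly. The bus model below is an illustration, not a driver:

```python
class WiredAndBus:
    """Wired-AND line: the line reads low if any attached device pulls it low."""
    def __init__(self):
        self.pulling_low = set()

    def drive(self, who, level):
        # Driving 1 means releasing the line; driving 0 means pulling it low.
        (self.pulling_low.discard if level else self.pulling_low.add)(who)

    def read(self):
        return 0 if self.pulling_low else 1

def try_release(bus, who):
    """Release the line; if it does not go high, some other master is holding
    it low -- we have lost arbitration and must back off."""
    bus.drive(who, 1)
    return bus.read() == 1     # False -> lost arbitration, wait for STOP

bus = WiredAndBus()
bus.drive("other_master", 0)           # another master holds the line low
print(try_release(bus, "this_master")) # False: back off until a STOP is seen
```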
Possibility of Collision:
Since the previous rule says that a master loses arbitration when it cannot
get either SCL or SDA to go high when needed, the problem of data corruption
(data collision) does not exist. It is the device that is sending the ’0’ that
rules the bus. One master cannot disturb the transmission of the other: if it
cannot detect one of the lines going high, it backs off, and if it is the other
master that cannot do so, that master behaves the same way.
This kind of back-off condition will only occur if the two levels transmitted
by the two masters are not the same. As an example, let’s consider Figure 9.30,
where two CPUs start transmitting at the same time.
The two CPUs are accessing a slave in write mode at address 1111001, and
the slave acknowledges this. So far, both masters are under the impression
that they “own” the bus. Now CPU1 wants to transmit 01010101 to the slave,
while CPU2 wants to transmit 01100110. The moment the data bits no longer
match (because what a CPU sends is different from what is present on the bus),
one of them loses arbitration and backs off; obviously, this is the CPU that
did not get its data onto the bus. For as long as no STOP has appeared on the
bus, it will not touch the bus and leaves the SDA and SCL lines alone (shaded
zone). The moment a STOP is detected, CPU2 can attempt to transmit again.
From the example above we can conclude that, in an arbitration situation, it
is the master pulling the line LOW that always wins the arbitration. The
master which wanted the line to be HIGH while it is being pulled low by
the other master loses the bus. We call this a loss of arbitration or a
back-off condition. When a CPU loses arbitration, it has to wait for a STOP
condition to appear on the bus; then it knows that the previous transmission
has been completed.
When masters with different clock speeds share the bus, their clocks are
synchronized through the wired-AND connection of the SCL line: the synthesized
SCL clock has its LOW period determined by the master with the longest clock
LOW period, and its HIGH period determined by the one with the shortest
clock HIGH period. There is no drawback to this technique except the loss of
speed/bandwidth and some software overhead in the masters.
It is possible to use this mechanism between masters in a multi-master
environment, to prevent another master from taking over the bus. In a
two-master system this is not useful, but as soon as the system has three or
more masters it is very handy: a third master cannot interrupt a transfer
between masters 1 and 2 in this way. For some mission-critical situations this
can be a very useful feature.
It is possible to make this technique more rigid by pulling not only the SCL
line low, but also the SDA line. Then any master other than the two masters
talking to each other will immediately back off. Before continuing, the master
must first let SCL go back high and then SDA, which represents a stop condition.
Any master which attempted to communicate in the meantime would have
detected a back-off situation and would be waiting for a STOP to appear.
If any node finds that the contents are relevant to it, the node processes the
message; otherwise it ignores the message.
CAN also uses the identifier to determine the priority of the message
in terms of competition for bus access: the lower the numerical value of
the identifier, the higher the priority of the message. CAN standards use
the identifier to avoid collisions on the bus. This technique is used by CAN
to achieve what is called “non-destructive arbitration”. It guarantees that in
situations where two or more nodes attempt to transmit at the same time, the
messages are sent in order of priority and no messages are lost.
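Because a dominant 0 on the bus overrides a recessive 1 bit by bit, the frame with the lowest numerical identifier always survives the comparison. As a one-line sketch:

```python
def highest_priority(identifiers):
    """The message with the lowest numerical identifier wins bus access."""
    return min(identifiers)

# Three nodes contend for the bus; the numerically smallest identifier wins.
print(hex(highest_priority([0x65, 0x7F, 0x12])))
```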
The Controller Area Network was first developed by Robert Bosch in 1986. It
is documented in ISO 11898 (for applications up to 1 Mbps) and ISO 11519
(for applications up to 125 Kbps).
[Figure: CAN node layers — the CAN controller implements the Data Link layer
and the CAN transceiver implements the Physical layer, between the processor
and the bus.]
The physical layer is responsible for defining how the signals are actually
transmitted. Some of the tasks of the physical layer are:
– Transmission Medium: The physical layer specifies the physical
and electrical characteristics of the bus.
– Signal Level and Bit Representation: This includes the hardware
required to convert the characters of a message into electrical signals
for transmitted messages and electrical signals into characters for
received messages.
The Physical Layer is always implemented as “real” hardware; it is discussed
next.
with the transceivers; this is a physical limitation and not set by the CAN
protocol. The length of the bus is limited by two factors:
• The propagation delay time: the time necessary for a signal to go from one
end of the bus to the other and back again before the next signal is
transmitted.
• The time needed by the electronic circuitry to transmit and receive these
signals.
Increasing the length of the bus increases the sum of the propagation
delay and the time needed by the transmitting and receiving devices. This, in
turn, increases the nominal bit time and, accordingly, decreases the possible
transmission bit rate. As a result, every CAN system must trade bus length
for bit speed. For example, to achieve a speed of 1 Mbit/s, the maximum
possible bus length is specified as 25 meters; for longer bus lengths it is
necessary to reduce the bit rate. Table 9.3 gives some indication of the bit
rates and the corresponding maximum bus lengths.
The bit rate can always be slower than the maximum possible speed for a
given bus length. Conversely, the bus length can be shorter than the maximum
possible bus length for a given transmission speed. The CAN system designer
needs to keep in mind that transmissions tend to become more reliable with a
slower bit speed and shorter bus lengths.
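The trade-off can be made concrete with a back-of-the-envelope bound. The propagation speed and safety margin used here are illustrative assumptions, not values from the CAN standard:

```python
def max_bit_rate(bus_length_m, v_prop=2e8, margin=2.0):
    """Rough upper bound on the bit rate from round-trip propagation delay.

    Assumes signals travel at about 2e8 m/s on the cable and that the bit
    time must be at least `margin` times the round-trip delay; both values
    are illustrative, and real designs must also budget for transceiver and
    controller delays.
    """
    t_round_trip = 2 * bus_length_m / v_prop   # signal out and back again
    return 1.0 / (margin * t_round_trip)

print(f"25 m bus: about {max_bit_rate(25):.0f} bit/s upper bound")
```

Under these assumptions a 25 m bus bounds the rate at around 2 Mbit/s; the additional circuit delays mentioned in the text are what push the practical figure down toward the 1 Mbit/s in Table 9.3.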
a. Bus Arbitration
One of the subfunctions of the Data Link layer is called Medium Access
Control (MAC). The main function of MAC is to prevent conflicts on the
network. If two or more transmitters seek to send messages across the network
at the same time, MAC will make sure that each one of them will be given
an opportunity to transmit one at a time so that the messages do not interfere
with each other. As in any communication network, the use of medium access
control has a significant impact on the performance of the network.
Access control methods come in two varieties, Determined and Random:
• Determined access control: In case of determined access control, the
right to access the bus is defined before a node tries to access the bus,
guaranteeing that no conflicts will occur. Determined access control re-
quires either a central entity to manage network access or a decentralized
agreement between nodes (such as token passing). Centrally controlled
access methods are more vulnerable to system failures. One of the reasons
is that if the central entity fails then the entire network fails. Decentralized
determined access methods are more complex than centralized ones. For
decentralized methods, it also becomes difficult to dynamically assign
priority to nodes.
• Random access control: In case of random access control any node can
access the bus as soon as it is idle. Most random access control meth-
ods are based on a mechanism called “Carrier-Sense Multiple Access”
(CSMA). In CSMA all nodes monitor the network and wait for it to
become idle. Once the network is idle all of the nodes that have a message
to transmit will attempt to access the network at the same time. Of course
only one node is able to transmit at a time, so a method must be found to
sort out which node has priority. Two mechanisms can be used; Collision
Avoidance method and Collision Detect method:
– Collision Avoidance method: In this case the CSMA is set up to
limit or prevent collisions between messages.
– Collision Detection method: In this case the network is set up to
allow for message conflicts, but then intervenes to detect and clean
up these conflicts. With the Collision Detection method each node
will check to see if the bus is clear before transmitting. If two or
more nodes transmit simultaneously, the nodes that are transmitting
will detect the conflict, stop transmitting, and then try to re-transmit
at a randomly determined time in the future. Because neither of the
two nodes has priority, whichever node retries first gains the
advantage. If the nodes clash again, or clash with a third node, there
will be further delay.
The primary problem with the Collision Detection method occurs
when there is a lot of contention on the network: frames are
constantly aborted and retransmitted, which wastes bandwidth and
creates long delays. Consider, as an example, a desktop network that
uses Ethernet, a technology applying the collision detection method
just mentioned: a well-managed Ethernet network is operated well
below full capacity, keeping such clashes to a minimum, but still
leaving a nondeterministic component in the communications. Since
the original clashing messages are both destroyed, this situation is
sometimes referred to as destructive arbitration.
Figure 9.35 Bit Wise Arbitration: Nodes 1 & 2 start arbitration at point 1. Node 1 yields to
Node 2 at point 2, and stops transmitting. Only Node 2 can continue to transmit over the bus.
The only way a node knows there is a collision is for the node to see
something on the bus that is different from what it transmitted. So the
successful node, and any other listening nodes, never see any evidence of a
collision on the bus.
The highest priority message always gets through, but at the expense of
the lower-priority messages. Thus, CAN’s real-time properties are analogous
to the properties of a preemptive real-time kernel on a single processor. In both
cases, the goal is to ensure that the highest-priority work gets completed as
soon as possible. It is still possible to miss a hard real-time deadline, but there
should never be a case where a high priority job misses its deadline because
it was waiting for a lower-priority task to complete.
Once the higher-priority message has been transmitted and the bus becomes
idle, the lower-priority nodes are able to try again. The result is that no
bandwidth is wasted. Note that every node in the network looks at every
message transmitted on the bus, though most of the time any given node will
ignore most of the messages it sees.
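The bit-by-bit mechanism can be simulated on identifier bit strings. This is a toy model of the wired-AND bus, with '0' dominant:

```python
def bitwise_arbitration(frames):
    """Simulate CAN non-destructive arbitration over the identifier bits.

    `frames` maps node name -> identifier bit string (equal lengths, sent
    MSB first).  0 is dominant: a node that writes 1 but reads back 0 has
    lost arbitration and stops transmitting.  Because identifiers are
    unique, exactly one node survives, and its message goes out undisturbed,
    so no bandwidth is lost.
    """
    contenders = dict(frames)
    nbits = len(next(iter(frames.values())))
    for i in range(nbits):
        bus = min(bits[i] for bits in contenders.values())   # wired-AND level
        # Any node whose transmitted bit differs from the bus level backs off
        contenders = {n: b for n, b in contenders.items() if b[i] == bus}
    (winner,) = contenders
    return winner

# node1 sends 0101, node2 sends 0110: at bit 3 node2 writes a recessive 1,
# reads back node1's dominant 0, and drops out.
print(bitwise_arbitration({"node1": "0101", "node2": "0110"}))
```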
The arbitration field used to define the priority of the message can be 11 or
29 bits long, depending on which variation of the protocol is used. It is
possible to use the first few bits for priority and the remaining bits to
identify the message type. The CAN standard does not dictate what meaning you
attach to those bits, but the many higher-level protocols that sit on top of
CAN do define them. For example, the J1939 standard allows one portion of the
bits to be a destination address, since the CAN protocol itself specifies a
source address for all packets but does not mandate a destination address.
This is quite reasonable, since much of the traffic on an automotive bus
consists of broadcasts of measured information, which is not destined for one
specific node.
Benefits of using Non-Destructive bitwise arbitration:
The use of Non-destructive bitwise arbitration achieves some benefits for
the network:
• It provides bus allocation on the basis of need, and delivers efficiency
benefits that cannot be gained from either fixed time schedule allocation
(e.g. Token ring) or destructive bus allocation (e.g. Ethernet.)
• With only the maximum capacity of the bus as a speed limiting factor,
CAN will not collapse or lock up. Outstanding transmission requests
are dealt with in their order of priority, with minimum delay, and with
maximum possible utilization of the available capacity of the bus.
A. Data Frames
CAN systems use Data Frames to transmit data over the network. A Data
Frame contains an identifier and various pieces of control information, and can
hold up to eight bytes of data. CAN provides two versions of the Data Frame,
the Base Format and the Extended Format. The Extended Format Data Frame was
introduced in the early 1990s as part of CAN Specification 2.0B to relieve the
shortage in the Base Format caused by the length of the identifier: by the
beginning of the 1990s, the number of different messages created by
transmitters on the network exceeded the number of unique identifier codes
that can be created using the 11 bits assigned to the identifier field of the
Base Format. The Extended Data Format adds more bits to the identifier field:
the identifier of the extended format has 29 bits, which allows the CAN system
to create more than 536 million (2^29) different unique messages and
priorities.
Base Format Data Frame (2.0A Format)
The Base Format Data Frame (2.0A Format) is shown in Figure 9.36.
Extended Format Data Frame (2.0B Format)
The Extended Format Data Frame is nearly identical to the Base Format
Data Frame. The only differences between the two formats are found in
the Identifier Extension (IDE) bit in the Control Field and in the size and
arrangement of the Arbitration Field. Extended format (2.0B) controllers are
completely compatible with base format (2.0A) controllers and can transmit
and receive messages in either format. This means that both the Base Format
and the Extended Format can coexist in the same CAN system. The rule is that
Base Format frames always have priority over Extended Format frames, and the
Extended Format is designed to be backwards compatible with the Base Format.
B. Remote Frame
Generally, data transmission is performed on an autonomous basis, with
the data source node (e.g. a sensor) sending out a Data Frame. It is also
possible, however, for a destination to request the data from the source by
sending a Remote Frame. In other words, Remote Frames are used by receivers
to request information from another node; they are often sent out on a regular
schedule to draw updates from sensors. The format of a Remote Frame is
identical to that of a Data Frame (Figure 9.37). Both frame types feature base
and extended formats, and both have a single Remote Transmission Request
(RTR) bit at the end of the Arbitration Field. In a Remote Frame, this bit is
transmitted recessively to identify it as a Remote Frame; in a Data Frame the
RTR bit is always transmitted dominantly.
The RTR bit is considered part of the bitwise arbitration, so a Data Frame
will always dominate over a Remote Frame with the same identifier. This is
reasonable, since a request for data should not take precedence over the data
being requested. In the rare event that a Data Frame and a Remote Frame are
transmitted at the same time with the same identifier, the Data Frame
therefore wins arbitration, and the node that transmitted the Remote Frame
receives the desired data immediately.
C. Error Frame
Receivers send out an Error Frame whenever they detect that a frame contains
an error. The sending of an error frame by one node forces all other nodes in
the network to send an error frame as well; the original transmitter then
resends the message. The Error Frame can be sent at any point during a
transmission, and is always sent before a Data Frame or Remote Frame has
completed successfully. The transmitter constantly monitors the bus while it is
transmitting; when it detects the Error Frame, it aborts the current frame and
prepares to resend once the bus becomes idle again.
• Stuff Rule Check (Bit Stuffing). CAN uses a technique known as bit
stuffing as a check on communication integrity. After five consecutive
identical bit levels have been transmitted, the transmitter automatically
injects (stuffs) a bit of the opposite polarity into the bit stream.
Receivers of the message automatically delete (de-stuff) such bits
before processing the message in any way. Because of the bit stuffing
rule, if any receiving node detects more than five consecutive bits of the
same level, a stuff error is flagged and an Error Frame is sent.
• Bit Check (Bit Monitoring). Any transmitter automatically monitors
and compares the actual bit level on the bus with the level that it trans-
mitted. If the two are not the same, i.e. if it sends out a dominant bit when
it was supposed to transmit a recessive bit, or a recessive bit when the
message called for a dominant bit, a bit error is flagged.
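The stuff and de-stuff rules above can be prototyped in a few lines. This is a plain-Python model of the rule, not production CAN code:

```python
def stuff(bits):
    """Insert a complementary bit after every five consecutive identical bits."""
    out, run_bit, run_len = [], None, 0
    for b in bits:
        out.append(b)
        run_bit, run_len = (b, run_len + 1) if b == run_bit else (b, 1)
        if run_len == 5:
            out.append(1 - b)              # stuffed bit of opposite polarity
            run_bit, run_len = 1 - b, 1
    return out

def destuff(bits):
    """Remove stuffed bits; six identical bits in a row is a stuff error."""
    out, run_bit, run_len, i = [], None, 0, 0
    while i < len(bits):
        b = bits[i]
        out.append(b)
        run_bit, run_len = (b, run_len + 1) if b == run_bit else (b, 1)
        if run_len == 5:
            i += 1                          # next bit must be the stuffed one
            if i < len(bits):
                if bits[i] == b:
                    raise ValueError("stuff error: six identical bits")
                run_bit, run_len = bits[i], 1
        i += 1
    return out

msg = [1, 1, 1, 1, 1, 1, 0, 0]
print(stuff(msg))     # a dominant 0 is stuffed after the first five 1s
```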
9.7 Serial Communication Using SPI
Figure 9.39 Self-clocking synchronous serial transfer. Frames Sent without start or stop bits.
There are two common kinds of synchronous serial data transfer, one
related to data communications and the other to serial peripheral ICs. In data
communications, synchronous serial transfers are self clocking, and have no
shared clock signal. In self-clocking synchronous transmission, one frame
is sent immediately after the other, Figure 9.39. Frames do not have start
and stop bits for synchronization. Frame synchronization is usually derived
from bit or character synchronization. Whenever another data character is not
immediately ready for transmission, the transmitter repeatedly sends a special
SYNC character until it can transmit the next data character. To achieve bit
synchronization, a two-step process is typically used:
• Encoding the clock (transmitter clock) in the data.
• Recovering the clock from the data (at the receiving end).
In this way the transmitter and receiver may have independent clocks, but the
receiver can synchronize its clock with the recovered clock using the
transitions in the received signal. Since start and stop bits are not required,
this approach has low overhead and is typically used for transferring large
blocks of data at high speeds. Since this form of synchronous serial transfer
is not prevalent in embedded systems, it is not considered further in this text.
The second kind of synchronous serial transfer is commonly used with
serial peripheral ICs. A separate line is used to carry a common clock signal
used by the transmitter and receiver for synchronization (Figure 9.40). This
form of synchronous serial transfer is important in embedded systems. In some
versions of this scheme, the clock signal is generated by the transmitter, and
in others it is generated by the receiver.
This form of synchronous transmission is used with printers and fixed disk
devices, in which the data is sent on one set of wires while a clock or strobe
is sent on a different wire. Printers and fixed disk devices are not normally
serial devices, however, because most fixed disk interface standards send an
entire word of data for each clock or strobe signal, using a separate wire for
each bit of the word. In the PC industry, these are known as parallel devices.
Compared with asynchronous transfer, synchronous communication is usually
more efficient because only data bits are transmitted between sender and
receiver; however, synchronous communication can be more costly if extra
wiring and circuits are required to share a clock signal between the sender
and receiver.
Figure 9.40 Synchronous serial transfer with a common clock: (a) 2-wire scheme;
(b) 3-wire scheme.
In SPI, the master generates the clock signal for the data transfer. It also
determines the state of the chip select lines, i.e. it activates the slave it
wants to communicate with. In this way the master controls the speed of the
data transfer and is, therefore, in control of the transfer.
The SPI bus specifies four logic signals: two control lines and two data lines.
The names of the four signals on the pins of an SPI port depend on the
manufacturer; Figure 9.41 illustrates the names given by Motorola. The SPI
device can be as simple as a shift register, as shown in Figure 9.43, or can
extend to an independent subsystem, but the basic principle of a shift register
must always be present. The most common names of the four signals are (the
first names listed are those used by Motorola):
• SCLK — Serial Clock (output from master).
• SS — Slave Select (active low; output from master). This signal is also
called Chip Select (CS) or Slave Transmit Enable (STE).
• MOSI/SIMO — Master Out, Slave In (output from master). This signal is
sometimes called Serial Data In (SDI), Data In (DI), or Serial In (SI).
• MISO/SOMI — Master In, Slave Out (output from slave). Also called
Serial Data Out (SDO), Data Out (DO), or Serial Out (SO).
The SDO/SDI convention requires that SDO on the master be connected
to SDI on the slave, and vice-versa.
The MISO line (the slave data output SDO) serves for reading back data. It
also offers the possibility of cascading several devices: the data output of
the preceding device then forms the data input of the next IC.
SPI communication, as mentioned before, involves a master and a slave.
Multiple slave devices are allowed with individual slave select (chip select)
lines. Figure 9.42 shows the four signals in a single-slave configuration. The
chip select pin is mostly active-low. If multiple slave devices exist, the master
generates a separate slave select signal for each slave. In the unselected state
the MISO lines are in high-impedance (hi-Z) state and therefore inactive.
9.7 Serial Communication Using SPI 521
This prevents an unselected device from interfering with the currently
activated devices. This arrangement also permits several devices to talk to a
single input. The master decides with which peripheral device it wants to
communicate. The clock line SCLK is brought to the device whether it is
selected or not; the clock serves to synchronize the data communication.
More Possible Signals: Interrupt Signal
SPI devices sometimes use another signal line to send an interrupt signal to
a host CPU. Examples include pen-down interrupts from touchscreen sensors,
thermal limit alerts from temperature sensors, alarms issued by real time clock
chips, SDIO, and headset jack insertions from the sound codec in a cell phone.
Interrupts are not covered by the SPI standard; their usage is neither forbidden
nor specified by the standard.
Figure 9.43 shows a master and a slave connected as SPI shift registers: the
master's clock generator drives SCLK, and the /SS line enables the slave. A
transfer proceeds as follows:
1. The master starts the communication by setting the SS line of the selected
slave low.
2. Once the selected SS is low, one edge (rising or falling) of the SCLK
signals the devices (Master and Slave) to toggle the MOSI and MISO to
the correct bit of data being transmitted.
3. The other edge of the SCLK line (rising or falling) signals the devices to
register the bits on the MOSI and MISO, effectively reading the bit into
the device.
Steps 2 and 3 mean that the master and the slave are communicating in
a full duplex data transmission mode:
• the master sends a bit on the MOSI line; the slave reads it from that
same line
• the slave sends a bit on the MISO line; the master reads it from that
same line
4. The transmission continues in this fashion until the devices have
exchanged the specified number of bits (usually 8, 16, or 32).
This means that the master supplies the slave with eight (or 16 or 32) bits
of data, which are shifted out of the master-out-slave-in (MOSI) pin. The same
bits are shifted into the slave unit, one bit per clock pulse, on its MOSI
line. The slave simultaneously uses its own master-in-slave-out pin, connected
to the MISO pin of the master, to shift eight bits into the master.
In other words, at the end of the eight shift cycles, eight bits have been
shifted into the slave from the master and eight bits have been shifted into
the master from the slave. SPI communication, then, is essentially a circle in
which eight bits flow from the master to the slave and a different set of
eight bits flows from the slave to the master. In this way the master and a
slave can exchange data in a single communication.
5. After the transmission is over, the master pulls the SS line of the slave
back high, and either moves on to another slave on the network or
reinitiates a transmission with the same slave by pulling the corresponding
SS line low again.
Data is usually shifted out with the most significant bit first, while shifting
a new least significant bit into the same register. After that register has been
shifted out, the master and slave have exchanged register values. Then each
device takes that value and does something with it, such as writing it to
memory. If there is more data to exchange, the shift registers are loaded with
new data and the process repeats.
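The circular exchange described in the steps above can be captured in a few lines. This is a pure simulation; real hardware does the same work in the two shift registers:

```python
def spi_exchange(master_byte, slave_byte, nbits=8):
    """One SPI transfer: the two shift registers rotate through each other.

    Each clock pulse shifts one bit master->slave on MOSI and one bit
    slave->master on MISO, MSB first.  After nbits pulses the two devices
    have simply swapped register contents.
    """
    m, s = master_byte, slave_byte
    mask = (1 << nbits) - 1
    for _ in range(nbits):
        mosi = (m >> (nbits - 1)) & 1    # master shifts its MSB out on MOSI
        miso = (s >> (nbits - 1)) & 1    # slave shifts its MSB out on MISO
        m = ((m << 1) | miso) & mask     # master shifts the slave's bit in
        s = ((s << 1) | mosi) & mask     # slave shifts the master's bit in
    return m, s

print([hex(v) for v in spi_exchange(0xA5, 0x3C)])  # registers swapped
```

After eight clock pulses the master holds the slave's original byte and vice versa, which is exactly the "circle" of data described in the text.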
While the above discussion implies that the amount of data sent and the
amount of data received must be equal, it is possible for the side that does
not have any data to send to provide dummy data. In fact, in the majority of
SPI applications the data only goes in one direction, and the opposite
direction always carries a dummy value. In many SPI applications the slave is
the source of the data. As mentioned before, the frequency and the instants at
which the slave has to transmit are controlled by the master. Accordingly, in
such applications the slave must always have a valid byte ready to send; the
slave has no control over when the master sends the next clock pulse
requesting more data. If the slave device is dedicated to a single job, this
may not be difficult. Consider, for example, a thermistor-based sensor that
communicates as an SPI slave: it could provide a single one-byte buffer that
always contains the last temperature reading. Whenever clock pulses appear,
that byte is transmitted and the master gets a reading.
Keeping in mind that SPI specifies neither the length of the message nor its
contents, failing to update the buffer can cause a problem: if the transmitted
message ends with a checksum, repeating one character during transmission
will make the checksum fail. The master and slave are then no longer
synchronized, and some recovery must take place.
It is possible to give the slave more time to update the contents of the
buffer by letting the master pause after each byte is transmitted. This only
partially solves the slave's problem, since it still has a deadline by which
to update the buffer.
One possible solution to this timing issue is to avoid transmitting and
receiving at the same time. It is possible to provide an extra signal that the
slave asserts when it wants to transmit. When the master sees this signal, it
knows that the slave has a byte ready, and the master then provides the clock
to fetch that byte. When the master has something to send, it checks that the
slave is not sending before clocking out its own byte; anything simultaneously
received from the slave is ignored. Sending the messages in this way, in turn,
means that a large fraction of the potential bandwidth is lost. In exchange,
one gets more reliable and flexible software.
Every slave on the bus that hasn’t been activated using its slave select line
must disregard the input clock and MOSI signals, and must not drive MISO.
The master must select only one slave at a time.
Some devices use this technique of SPI implementation (clock data out
as data is clocked in) to implement an efficient, high-speed full-duplex data
stream for applications such as digital audio, digital signal processing, or
full-duplex telecommunications channels.
On many devices, the "clocked-out" data is the data last used to programme
the device. Read-back is a helpful built-in self-test, often used for
high-reliability systems such as avionics or medical systems. At this point it is
important to mention that SPI does not have an acknowledgement mechanism
to confirm receipt of data. In fact, without a communication protocol, the SPI
master has no knowledge of whether a slave even exists. SPI also offers no
flow control. If the application needs hardware flow control, the user might
need to do something outside of SPI.
In an SPI network, several microcontrollers can be hooked together. For such
a network, two protocol variants are possible. The first protocol deals with
the case where the SPI network contains only one master and several slaves;
the slaves can be microcontrollers. In the second protocol, any
microcontroller (or processor) connected to the network can take
the role of master. The selection of the slave can take place either by hardware
or software. When using the hardware variant, the master uses the chip select
to select the targeted slave. In the case of the software variant, the
software assigns an ID to each slave, and the slave is selected by including
its ID in the frames. Only the selected slave drives its output; all other slaves
are in high-impedance state. The output remains active as long as the slave is
selected by its address.
First variant: Single-master protocol
The single-master protocol resembles the normal master-slave commu-
nication. The system can contain any number of microcontrollers, but only
one of them will be configured as master. The microcontrollers configured as
slaves behave like normal peripheral devices.
There are two ways of connecting a single master to multiple slave devices:
the slaves can be connected in cascade (daisy chain), or the slaves can work
independently.
a. Cascading Several SPI devices: Daisy Chain SPI Configuration
Figure 9.45 shows the type of connection for cascading several devices.
The slaves are connected in a daisy chain configuration and all of them are
connected to the same SS line coming from the master. The first slave output
is connected to the second slave input, etc., thus forming a wider shift register.
In other words, the whole chain acts as an SPI communication shift register;
daisy chaining is often done with shift registers to provide a bank of inputs or
outputs through SPI.
b. Multiple Independent Slave SPI Configuration
If independent slaves are to be connected to a master, the master generates a
separate slave select signal for each slave. The connection in this case is shown
in Figure 9.46. The clock and the MOSI data lines are brought to each slave
from the master. The MISO data lines of the different slaves are tied together
and go back to the master. Only the chip selects are separately brought to each
SPI device. The master uses the pins of its general purpose input/output ports
to generate the slave select signals. Accordingly, the number of slaves that can
be connected to one master is limited by the available chip select pins.
Second variant: Multiple Masters
The bus in this variant allows any device connected to the network to become
master, simply by deciding to transmit data. The protocol in this case is
called the multi-master protocol.
Table 9.4 SPI modes.

SPI Mode   CPOL   CPHA
    0        0      0
    1        0      1
    2        1      0
    3        1      1
In the AVR, the SPI is controlled through three registers: the Data Register
(SPDR), the Control Register (SPCR) and the Status Register (SPSR). Many
microcontrollers use the same registers with the same names.
Bit             7     6     5     4     3     2     1     0
SPDR           MSB                                      LSB
Read/Write     R/W   R/W   R/W   R/W   R/W   R/W   R/W   R/W
Initial value   X     X     X     X     X     X     X     X   (undefined)
Figure 9.48 SPI Data Register (SPDR).
Bit          Description
SPIE         SPI Interrupt Mask bit
SPE          SPI Enable bit
DORD         Data Order bit; when cleared, the MSB of the data word is
             transmitted first
MSTR         Master/Slave Select bit
CPOL         Clock Polarity bit
CPHA         Clock Phase bit
SPR1, SPR0   SPI Clock Rate Select bits
Bit             7     6     5     4     3     2     1     0
$0E ($2E)     SPIF  WCOL    -     -     -     -     -     -
Read/Write      R     R     R     R     R     R     R     R
Initial value   0     0     0     0     0     0     0     0
(a)
Bit             7     6     5     4     3     2     1     0
SPSR          SPIF  WCOL    -     -     -     -     -   SPI2X
Read/Write      R     R     R     R     R     R     R    R/W
Initial value   0     0     0     0     0     0     0     0
(b)
Figure 9.50 SPI status register SPSR: (a) AT90S series, (b) AVR Mega series.
It is common for different devices to use SPI communications with different
lengths, as, for example, when SPI is used to access the scan chain of a
digital IC by issuing a command word of one size (32 bits) and then getting
a response of a different size (153 bits, one for each pin in that scan
chain). A device can cope with a transfer longer than its own register by
ignoring all the extra inputs and continuing to shift the same output bit.
• When communicating with more than one slave, I2C has the advantage
of in-band addressing as opposed to having a chip select line for each slave.
• I2C also supports slave acknowledgment which means that the sender
can be absolutely sure that it is actually communicating with something.
With SPI, a master can be sending data to nothing at all and have no way
to know that.
• In general, SPI is better suited for applications that deal with longer data
streams, not just words such as address locations. Longer data streams
mostly occur in applications involving a digital signal processor or an
analogue-to-digital converter. For example, SPI would be well suited to
playing back audio stored in an EEPROM through a digital-to-analogue
converter (DAC).
• SPI can support significantly higher data rates compared to I2C, mostly
due to its duplex capability; accordingly, it is far better suited for
high-speed applications reaching tens of megahertz.
• Since there is no device addressing involved in SPI, the protocol is much
harder to use in multiple-slave systems. This means that when dealing with
more than one node, I2C is generally the way to go.
• Since SPI has a higher clocking speed than I2C, it is the better choice in
high-speed applications.
#asm
    in r30, spsr   ; reading SPSR with SPIF set...
    in r30, spdr   ; ...followed by an access to SPDR clears the SPIF flag
#endasm
Example 9.7 Using the UART for receiving messages from the PC:
This example explains how to use the UART of another Atmel microcontroller,
the AT90S2313, to receive and transmit serial data via the RS232 (COM) port
of a PC. This LED example uses only the data reception feature. To receive
data from a PC, a level converter, the MAX232, is used. A level converter is
needed because the COM port of a PC switches the data between approximately
−9.23 and +9.23 Volt: +9.23 Volt corresponds to a logical ‘0’ (lo) and −9.23
Volt to a logical ‘1’ (hi). The MAX232 translates these levels to TTL levels
(0 and 5 Volt), so its outputs can be connected directly to the I/Os of an
AVR. The MAX232 can convert at a maximum speed of 120 kbit/s. Figure 9.53
shows a simple diagram of an RS232 converter (receive data only).
Not many components are needed: only five electrolytic capacitors and a
MAX232. For the PC cable, use shielded data cable. The UART (software)
setup for the AVR is also very simple: first set the baud rate and enable
the UART.
Figure 9.53 RS232 level converter MAX232 for receiving message-data for the moving
message sign.
Now that the UART has been set up, you can use the received data in the
software. The number 25 must be calculated with a formula that is explained
in detail in the AT90S2313 datasheet.
Another way of calculating this number is to let the assembler do it for you:
simply put the numbers into the code. The assembler then does the
calculation, which makes it much easier to change the baud rate. You can
even use variables such as ’BAUD’ and ’XTAL’.
This way you can change the parameters of the programme from a list of
variables. The advantage is that in a huge programme you don’t have to fill
in each parameter by hand: change just one variable and everything that
depends on it changes too. Figure 9.54 shows a low-cost UART interface for
RS232 with low-cost standard components.
Figure 9.54 RS232 level converter with standard low-cost components for receiving message-
data for the moving message sign.
9.9 Review Questions
9.5 In an EIA-232 exchange, what is the purpose of the start and stop bits
in a data character?
9.6 What are the two primary signalling lines on the I2C bus? What is the
function of each?
9.7 Describe the steps involved in a bus master read operation on the I2C
bus.
9.8 Describe the steps involved in a bus master write operation on the I2C
bus.
9.9 With multiple masters, how is a bus contention situation resolved?
9.10 What are some of the more significant differences between the CAN
bus and the I2C, USB, and EIA-232 signalling?
9.11 What are the two primary signalling lines on the CAN bus? What is the
function of each?
9.12 The CAN bus signalling protocol specifies two states, dominant and
recessive. What do these terms mean in this context?
9.13 Describe the format of a message that is sent over the CAN bus.
9.14 What are the bit times for data transferred at the following rates:
a. 4800 baud
b. 14.4 kbaud
c. 38.4 kbaud
d. 115 kbaud
9.15 Draw the 10-bit frame for the asynchronous transfer of the following
ASCII characters with even parity.
a. S
b. H
c. E
d. CR (carriage return)
9.16 Compare and contrast the transport mechanisms used for the four
bus architectures discussed in the chapter. What are the strengths and
weaknesses of each?
9.17 What kinds of devices can be placed on the CAN bus? Give some
examples.
9.18 What kinds of message types can be exchanged over the CAN bus?
Explain the purpose and meaning of each of the different types.
9.19 How are errors managed on the CAN bus? Compare the CAN strategy
with that of the other three busses that we have studied in this chapter.
9.20 Typically, a Universal Asynchronous Receiver Transmitter (UART) is
used to manage data flow over an EIA-232 network. Without using a
UART, design a logic block that will accept 7 data bit characters, in
parallel, from your microcontroller, add an odd parity bit over the 7
bits, convert each to a 10-bit serial EIA-232 compatible character, and
transmit the character over a 9600 baud bit stream.
9.21 A new generation automobile has about 100 embedded systems. How
do the bus arbitration bits, control bits for address and data length, data
bits, CRC check bits, acknowledgement bits and ending bits in the CAN
bus help the networking devices distributed in an automobile embedded
system?
9.22 Search the Internet and design a table that gives the features of the
following latest-generation serial buses: (i) IEEE 802.3-2000 [1 Gbps
bandwidth Gigabit Ethernet MAC (Media Access Control)] for 125
MHz performance, (ii) IEEE P802.3ae draft 4.1 [10 Gbps Ethernet MAC]
for 156.25 MHz dual-direction performance, (iii) IEEE P802.3ae draft
4.1 [12.5 Gbps Ethernet MAC] for four-channel 3.125 Gbps per channel
transceiver performance, (iv) XAUI (10 Gigabit Attachment Unit Interface),
(v) XSBI (10 Gigabit Serial Bus Interchange), (vi) SONET OC-48, OC-192
and OC-768, and (vii) ATM OC-12/461192.
9.23 Refer to the material in this chapter and use a Web search. Design a
table that compares the maximum operational speeds and bus lengths
and gives two examples of uses of each of the following serial devices:
(i) UART, (ii) 1-wire CAN, (iii) Industrial I2C, (iv) SM I2C Bus, (v)
SPI of 68 Series Motorola Microcontrollers, (vi) Fault-tolerant CAN,
(vii) Standard Serial Port, (viii) MicroWire, (ix) I2C, (x) High-Speed
CAN, (xi) IEEE 1284, (xii) High-Speed I2C, (xiii) USB 1.1 Low-Speed
Channel and High-Speed Channel, (xiv) SCSI parallel, (xv) Fast SCSI,
(xvi) Ultra SCSI-3, (xvii) FireWire/IEEE 1394 and (xviii) High-Speed
USB 2.0.
9.24 Refer to the material in this chapter and use a Web search. Design a
table that compares the maximum operational speeds and bus lengths
and gives two examples of uses of each of the following parallel devices:
(i) ISA, (ii) EISA, (iii) PCI, (iv) PCI-X, (v) Compact PCI, (vi) GMII
(Gigabit Ethernet MAC Interchange Interface), (vii) XGMII (10 Gigabit
Ethernet MAC Interchange Interface).
References
[1] I. Scott MacKenzie, The 8051 Microcontroller, Prentice Hall, 1999.
[2] Claus Kuhnel, AVR RISC Microcontroller Handbook, Newnes, Boston, 1998.
[3] Arnold S. Berger, Embedded Systems Design: An Introduction to Processes, Tools and
Techniques, CMP Books, Nov. 2001.