Embedded System Architecture by Ralf Niemann
Embedded Tutorial
Hardware/Software Codesign
of Embedded Systems
Lecture Contents
2007-02-06
Introduction
Codesign of embedded systems
[Figure: software path (SW specification, programming, SW simulation, SW implementation) next to hardware path (HW specification, HW design, HW simulation, HW implementation).]
Design Time
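[Figure: design time of the two approaches. Traditional design: specification & partitioning, then HW design & simulation, then SW design & simulation, then integration & test. HW/SW codesign: specification & partitioning, then HW design & simulation in parallel with SW design & simulation (linked by co-simulation and co-verification), then integration & test. The result is a reduced time-to-market (TTM).]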
Prof. Z. Peng, ESLAB/ LiTH
HW/SW Codesign
Why Codesign?
Reduce time-to-market.
Achieve better designs:
- More design alternatives can be explored.
- Better solutions can be found by advanced optimization techniques.
Vertical Codesign
Instruction set processor design, for both general-purpose systems and ASIPs (Application Specific Instruction Processors).
[Figure: a specification implemented in software and hardware, with the instruction set as the boundary between them.]
Codesign of Processors
General-Purpose Processors
- Architectural support for operating systems.
- Cache design and tuning (e.g., selection of cache size and control schemes).
- Pipeline control design (control mechanisms, compiler design).
ASIPs
- Customization of instruction sets and specific resources (e.g., accelerator and coprocessor).
- Design of register files, busses and interconnections.
- Development of a specific compiler.
Horizontal Codesign
Codesign of
Specialized processor
Programmable
ASICs
Processor
Hardware
Embedded Systems
Microprocessor market shares in 1999: embedded systems 99%, general-purpose systems 1%.
Embedded Controllers
[Figure: an embedded controller (CPU, memory, timers, A/D and D/A conversion, and an HW unit with application-specific logic) connected to its environment through sensors and actuators.]
Reactive systems.
[Figure: distributed embedded system. Each ECU contains an I/O interface (to the sensors), CPU, RAM, ROM, ASIC, and a network interface; the ECUs are connected through networks and gateways.]
Time constraints:
- They have to perform in real time: if data are not ready by a certain deadline, the system fails to perform correctly.
- Hard deadline: failure to meet it leads to major hazards.
- Soft deadline: failure to meet it can be tolerated, but quality of service is reduced.
Power constraints:
- There are several reasons why low power/energy consumption is required.
- Battery life: high energy consumption leads to a short battery lifetime.
- Cost aspects: high power consumption requires a strong power supply and an expensive cooling system.
The Consequences
System-Level Design
[Figure: system-level design flow. An informal specification with constraints is modeled into a system model (checked by functional simulation and formal verification). Architecture selection yields the system architecture. Mapping, estimation, and scheduling are repeated while the result is not OK. The mapped and scheduled model (checked by simulation and formal verification) leads to a software model and a hardware model, which are handed over to lower-level design.]
Additional Improvements
Formal verification
- It is impossible to do an exhaustive simulation.
- Especially for safety-critical systems, formal verification is needed.
Simulation
- Used not only for functional validation.
- Should also be used after mapping and scheduling in order to check, for example, timing properties.
- May also be used during the implementation steps: hardware/software co-simulation.
Hardware/software trade-offs
- Hardware/software partitioning to decide what is to be mapped on a programmable processor (SW) and what is going into HW.
- Hardware/software co-synthesis to coordinate the HW and SW synthesis processes and allow moving of functionality from one to the other.
Software generation:
- Encoding in an implementation language (C, C++, assembler).
- Compiling (this can include particular optimizations for application-specific processors, DSPs, etc.).
- Generation of a real-time kernel or adapting to an existing operating system.
Hardware synthesis:
- Encoding in an HDL (VHDL or Verilog).
- Successive synthesis steps: high-level, register-transfer-level, and logic-level synthesis.
Hardware/software integration:
- The software is run together with the hardware model (co-simulation).
Prototyping:
- A prototype of the hardware is constructed and the software is executed on the target architecture.
Lower-Level Design
There are established CAD tools on the market which
automatically perform many of the low level tasks:
Concluding Remarks
Analysis, Co-Simulation
and Design Space Exploration
Zebo Peng
Embedded Systems Laboratory (ESLAB)
Linköping University
Outline
Co-simulation approaches
[Figure: examples of embedded systems and their components: microprocessor, DSP, ASIC, analog circuit, sensor, embedded memory, software, network, and high-speed electronics. Image sources: S3; Stratus Computers.]
Hardware/Software Partitioning
Input:
Hardware/Software Partitioning
Assumptions:
Hardware/Software Partitioning
Wi N Lim1 i Hw
iH
Features of CO Problems
...
- 25 tasks in 6 centuries.
Heuristics
Heuristic Approaches to CO

                      Constructive           Transformational (iterative improvement)
Problem specific      Clustering             Kernighan-Lin algorithm
                      List scheduling
                      Left-edge algorithm
Generic methods                              Neighborhood search
(meta heuristics)                            Simulated annealing
                                             Tabu search
                                             Genetic algorithms
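Branch-and-Bound
4-City TSP
[Figure: the four cities 0 to 3 with the lengths of the edges between them.]
Branch-and-Bound Example
Search tree starting from city 0 (L is the length of the path so far, or of the closed tour for a complete path):
- {0,1}, L ≥ 3
  - {0,1,2}, L ≥ 43
    - {0,1,2,3}, L = 88
  - {0,1,3}, L ≥ 8
    - {0,1,3,2}, L = 18
- {0,2}, L ≥ 6
  - {0,2,1}, L ≥ 46
    - {0,2,1,3}, L = 92
  - {0,2,3}, L ≥ 10
    - {0,2,3,1}, L = 18
- {0,3}, L ≥ 41
  - {0,3,1}, L ≥ 46
    - {0,3,1,2}, L = 92
  - {0,3,2}, L ≥ 45
    - {0,3,2,1}, L = 88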
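As a sketch of how such a search tree is pruned, the following Python example runs a depth-first branch-and-bound on the 4-city instance. The distance matrix is an assumption: it is inferred from the path lengths listed in the example (for instance, {0,1} gives d(0,1) = 3 and {0,1,2} gives d(1,2) = 40). The bound used is simply the length of the partial path, and the function name is hypothetical.

```python
# Distances inferred from the partial path lengths in the example
# (an assumption; the original figure is not reproduced here).
DIST = [
    [0, 3, 6, 41],
    [3, 0, 40, 5],
    [6, 40, 0, 4],
    [41, 5, 4, 0],
]

def branch_and_bound(dist):
    """Depth-first branch-and-bound for a tour that starts and ends at city 0."""
    n = len(dist)
    best = {"cost": float("inf"), "tour": None, "expanded": 0}

    def expand(path, length):
        best["expanded"] += 1
        # Bound: a partial path that is already at least as long as the
        # best complete tour cannot improve on it, so it is pruned.
        if length >= best["cost"]:
            return
        if len(path) == n:
            total = length + dist[path[-1]][0]  # close the tour
            if total < best["cost"]:
                best["cost"], best["tour"] = total, path[:]
            return
        for city in range(1, n):
            if city not in path:
                path.append(city)
                expand(path, length + dist[path[-2]][city])
                path.pop()

    expand([0], 0)
    return best["cost"], best["tour"], best["expanded"]

cost, tour, expanded = branch_and_bound(DIST)
print(cost, tour, expanded)
```

The shortest tour found has length 18, which agrees with the two shortest complete tours in the example, and fewer nodes are expanded than the 16 nodes of the full tree because the subtrees under {0,2,1} and {0,3} are cut off.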
Step 1 (Initialization)
Step 3 (Update)
    Re-set x_now = x_next.
    If c(x_now) < best_cost, perform Step 1(B).
    Goto Step 2.
- Simulated annealing
- Tabu search
[Figure: cost plotted over the solutions.]
The SA Algorithm
Select an initial solution x_now ∈ X;
Select an initial temperature t > 0;
Select a temperature reduction function α;
Repeat
    Repeat
        Randomly select x_next ∈ N(x_now);
        δ = cost(x_next) - cost(x_now);
        If δ < 0 then x_now = x_next
        else generate random p uniformly in the range (0, 1);
             If p < exp(-δ/t) then x_now = x_next;
    Until iteration_count = nrep;
    Set t = α(t);
Until stopping condition = true.
Return x_now as the approximation to the optimal solution.
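The pseudocode can be sketched in Python as follows. This is a minimal illustration, not a tuned implementation: the geometric cooling function t := alpha * t, the stopping condition t <= t_min, the default parameter values, and the toy cost function are all assumptions. The sketch also remembers the best solution seen, which the pseudocode does not do.

```python
import math
import random

def simulated_annealing(x0, cost, neighbor, t0=10.0, alpha=0.95,
                        nrep=50, t_min=1e-3, seed=0):
    """Simulated annealing with geometric cooling (assumed schedule)."""
    rng = random.Random(seed)
    x_now, t = x0, t0
    x_best = x_now
    while t > t_min:                      # assumed stopping condition
        for _ in range(nrep):             # inner loop: nrep moves per temperature
            x_next = neighbor(x_now, rng)
            delta = cost(x_next) - cost(x_now)
            # Accept improvements always, deteriorations with prob. exp(-delta/t).
            if delta < 0 or rng.random() < math.exp(-delta / t):
                x_now = x_next
            if cost(x_now) < cost(x_best):
                x_best = x_now
        t = alpha * t                     # temperature reduction
    return x_now, x_best

# Toy problem (hypothetical): minimise (x - 7)^2 over the integers.
def cost(x):
    return (x - 7) ** 2

def neighbor(x, rng):
    return x + rng.choice((-1, 1))

x_final, x_best = simulated_annealing(40, cost, neighbor)
print(x_final, x_best)
```

At high temperature almost every move is accepted; as t shrinks, uphill moves become rare and the search settles into a minimum.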
[Figure: cost (about 35000 to 70000) versus number of iterations (0 to 1400).]
Analysis Techniques
Performance Metrics
Simulation-based Techniques
Static Analysis
Techniques that use information collected by analyzing the programs without executing them.
Restrictions on software:
- bounded loops
- absence of recursive functions
- absence of dynamic function calls
Program Analysis
[Figure: actual WCET and estimated WCET marked on a time axis t.]
Source code.
Compiler.
Machine architecture description.
Operating system.
ILP Formulation
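Let $x_i$ be the number of times a basic block $B_i$ is executed, and let $c_i$ be the execution time of the basic block $B_i$, which is assumed to be a constant.
The total execution time of the program for a particular execution is:
$\sum_{i=1}^{N} c_i x_i$
where $N$ is the number of basic blocks. For an example CFG with block costs $C_1$ to $C_7$, one particular execution gives:
$C_1 + C_2 + C_4 + 11 C_5 + 10 C_6 + C_7$
The WCET estimate is obtained by maximizing this sum:
$\max \sum_{i=1}^{N} c_i x_i$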
An Example
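/* k >= 0 */
s = k;
while (k < 10) {
    if (ok)
        j++;
    else {
        j = 0;
        ok = true;
    }
    k++;
}
r = j;

CFG (x_i is the execution count of block B_i, d_j is the count of edge j):

Block  Count  Statements          Incoming edges  Outgoing edges
B1     x1     s = k;              d1              d2
B2     x2     while (k<10)        d2, d8          d3, d9
B3     x3     if (ok)             d3              d4, d5
B4     x4     j++;                d4              d6
B5     x5     j = 0; ok=true;     d5              d7
B6     x6     k++                 d6, d7          d8
B7     x7     r = j;              d9              d10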
Constraints I
Structural constraints (derived from the CFG of the example):
d1 = 1
x1 = d1 = d2
x2 = d2 + d8 = d3 + d9
x3 = d3 = d4 + d5
...
Constraints II
Functionality constraints:
/* k >= 0 */
s = k;              /* x1 */
while (k < 10) {    /* x2 */
    if (ok)         /* x3 */
        j++;        /* x4 */
    else {
        j = 0;      /* x5 */
        ok = true;
    }
    k++;            /* x6 */
}
r = j;              /* x7 */
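To make the formulation concrete, the sketch below maximizes the sum of c_i x_i for this example by plain enumeration rather than with an ILP solver. Several things are assumptions and not taken from the slides: the block costs are hypothetical, the structural constraints for B4 to B7 are read off the CFG, and the two functionality constraints (the loop body runs at most 10 times because k >= 0, and the else branch runs at most once because it sets ok to true) are one possible reading of the code.

```python
from itertools import product

# Hypothetical execution times c_i (in cycles) of the basic blocks B1..B7.
COST = {1: 2, 2: 3, 3: 2, 4: 1, 5: 4, 6: 1, 7: 2}

def block_counts(d4, d5):
    """Execution counts x_1..x_7 implied by the structural constraints.

    d4 and d5 are the counts of the then-edge and the else-edge; all other
    counts follow from flow conservation (d1 = 1, x3 = d4 + d5, x6 = x3,
    x2 = d2 + d8 = 1 + x3, x7 = d9 = 1).
    """
    x3 = d4 + d5
    return {1: 1, 2: 1 + x3, 3: x3, 4: d4, 5: d5, 6: x3, 7: 1}

def wcet_bound(cost, max_iterations=10, max_else=1):
    """Maximise sum(c_i * x_i) subject to the (assumed) functionality constraints."""
    best_time, best_x = -1, None
    for d4, d5 in product(range(max_iterations + 1), repeat=2):
        # Functionality constraints: x3 <= 10 and x5 <= 1.
        if d4 + d5 > max_iterations or d5 > max_else:
            continue
        x = block_counts(d4, d5)
        time = sum(cost[i] * x[i] for i in x)
        if time > best_time:
            best_time, best_x = time, x
    return best_time, best_x

bound, counts = wcet_bound(COST)
print(bound, counts)
```

With these costs the maximum is reached when the loop runs 10 times and the else branch is taken once, so the loop header executes 11 times and the loop body 10 times.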
Simulation
Co-Simulation
Problems:
Approaches to Co-Simulation 1
[Figure: co-simulation framework. The software (SW) runs on a gate-level model (VHDL) of the processor; together with the ASIC model (VHDL) it is executed in a VHDL simulation.]
- Gate-level simulation of the processor is very slow (tens of clock cycles/sec). Example: at 10 cycles/sec, 100 million seconds (3.2 years) are needed to simulate one second of real time of a 1 GHz processor.
- This provides a very accurate solution and is very simple from the co-simulation point of view.
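The slowdown figure follows from a one-line calculation, shown here as a small Python sketch. The function and parameter names are illustrative only.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600

def simulation_time(target_hz, simulated_cycles_per_sec, real_seconds=1.0):
    """Wall-clock seconds needed to simulate `real_seconds` of target execution."""
    cycles_to_simulate = target_hz * real_seconds
    return cycles_to_simulate / simulated_cycles_per_sec

# 1 GHz processor, gate-level simulator running at 10 cycles/sec.
seconds = simulation_time(1e9, 10)
years = seconds / SECONDS_PER_YEAR
print(seconds, round(years, 1))
```

The same function can be used to compare faster simulation models by changing the number of simulated cycles per second.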
Approaches to Co-Simulation 2
[Figure: co-simulation framework. The software (SW) runs on an ISA model (C program) of the processor; the ASIC model (VHDL) is executed in a VHDL simulation.]
- There is no hardware model of the target processor; the software is executed on an ISA model (usually written in C). Execution on the ISA model provides the interface information (including timing) needed for co-simulation.
- This is fast, but timing accuracy depends on the interface information.
Approaches to Co-Simulation 3
Translation-based models
[Figure: co-simulation framework. The software is compiled into native code for the host and the program runs directly on the host; the ASIC model (VHDL) is executed in a VHDL simulation.]
- There is no hardware model of the target processor; the software is compiled into native code for the host processor. Software execution provides the interface information (including timing) needed for co-simulation.
Approaches to Co-Simulation 4
Approaches to Co-Simulation 5
Concluding Remarks
Hardware/Software Codesign
2.
3.
Processor Specialisation
4.
5.
6.
7.
8.
9.
Reconfigurable Systems
[Figure: design flow. The system model (checked by functional simulation and formal verification) and architecture selection lead to the system architecture; mapping, estimation, and scheduling are repeated while the result is not OK; the mapped and scheduled model (checked by simulation and formal verification) leads to the software model and the hardware model; software generation and hardware synthesis produce software blocks and hardware blocks, which go through simulation, prototype, and testing before fabrication.]
Petru Eles, IDA, LiTH
Architecture Selection
- General purpose vs. application specific
- Software vs. hardware (fixed or reconfigurable)
- Monoprocessor vs. multiprocessor
- Single chip vs. multichip
[Figure: software, reconfigurable hardware, and hardware compared on low-to-high scales.]
[Figure: energy consumed (low, med., high) versus flexibility (low, med., high) for GP processor, ASIP, FPGA, and ASIC, with gaps marked as orders of magnitude.]
[Figure: ASIP design loop. The processor architecture and the algorithm(s) are passed to a compiler and a simulator, which produce performance numbers.]
[Figure: example system with glue logic, A/D and D/A, a controller (ASIP), a VLIW processor (ASIP), on-chip memory, and a DSP (GP).]
[Figure: processor with instruction fetch & decode, datapath, and register files 1 to 3.]
[Figure: operations op1 to op11 grouped into clusters 1 to 3.]
Retargetable compiler
Configurable simulator
Retargetable Compiler
[Figure: the retargetable compiler takes the processor architecture and the algorithm as input and produces object code.]
Configurable Simulator
Such a simulator can be configured for a particular architecture (based on an architecture description).
[Figure: the simulator takes the processor architecture and the object code as input and produces performance numbers.]
[Figure: platform architecture with processor cores 1 to 3, cache, DMA, and memory on a system bus, and a bridge to a peripheral bus with peripherals and reconfigurable logic.]
[Figure: platform design loop. The platform architecture and the applications go through mapping/compiling and a simulator, which produces performance numbers.]
Instantiating a Platform
[Figure: a platform instance is derived from the platform architecture; the application goes through mapping/compiling and a simulator, which produces performance numbers.]
System Platforms
What we have discussed so far (see previous slides) are so-called hardware platforms.
The hardware platform is delivered together with a software layer:
hardware platform + software layer = system platform.
Software layer:
- real-time operating system
- device drivers
- network protocol stack
- compilers
The software layer creates an abstraction of the hardware platform (an application program interface) to be seen by the application programs.
Possible(!) definition
A core is a design block which is larger than a typical RTL
component.
Of course:
We also reuse software components!
[Figure: a system assembled from cores 1 to 3, a processor (core 4), an interface, and I/O, taken from the libraries of vendors A, B, and C and attached through glue logic to an interconnection bus/switch.]
Types of Cores
Hard cores are fully designed, placed, and routed by the supplier: a completely validated layout with definite timing.
rapid integration
low flexibility
less predictability
flexibility during
place and route
maximal flexibility
Reconfigurable Systems
Programmable Hardware Circuits:
They implement arbitrary combinational or sequential circuits
and can be configured by loading a local memory that determines
the interconnection among logic blocks.
Reconfiguration can be applied an unlimited number of times.
Main applications:
- Software acceleration
- Prototyping
[Figure: processor, memory, and an FPGA accelerator whose configuration changes at successive times t1 to t4 (temporally partitioned).]
[Figure: CPU with on-chip memory and a reconfigurable datapath. The C code goes through Hw/Sw partitioning; the kernels go through datapath synthesis.]
Summary
Architecture selection is about making trade-offs along the
dimensions of speed, cost, flexibility, and power consumption.
ASIPs are programmable processors, specialised for a particular
application or for a family of applications.
Specialisation of an ASIP concerns instruction set, function units
and data path, memory system, interconnect, and control.
Two design tools are of great importance in order to perform
processor specialisation: retargetable compiler and configurable
simulator.
Not only processors can be specialised but also platforms. A
Platform is specialised to execute a certain family of applications.
The particular hardware to be used for a given application is a
specialised instantiation of the platform.
Low Power/Energy - 1
2.
3.
4.
5.
6.
$P = \frac{1}{2} C V_{DD}^2 f N_{SW} + Q_{SC} V_{DD} f N_{SW} + I_{leak} V_{DD}$
- Switching power (first term): power required to charge/discharge circuit nodes.
- Short-circuit power (second term): dissipation due to short-circuit current.
- Leakage power (third term, static): dissipation due to leakage current.
C = node capacitances
N_SW = switching activities (number of gate transitions per clock cycle)
f = frequency of operation
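The three terms of the power equation can be evaluated separately, as in the following Python sketch. All parameter values are hypothetical and only chosen to show how the terms scale; the function name is illustrative.

```python
import math

def cmos_power(c, v_dd, f, n_sw, q_sc, i_leak):
    """Return (switching, short_circuit, leakage) power in watts."""
    switching = 0.5 * c * v_dd ** 2 * f * n_sw   # charging/discharging nodes
    short_circuit = q_sc * v_dd * f * n_sw       # short-circuit current
    leakage = i_leak * v_dd                      # static leakage
    return switching, short_circuit, leakage

# Hypothetical values: 1 nF total node capacitance, 100 MHz, activity 0.1.
params = dict(c=1e-9, f=1e8, n_sw=0.1, q_sc=1e-12, i_leak=1e-3)
high = cmos_power(v_dd=2.0, **params)
low = cmos_power(v_dd=1.0, **params)
print(high, low)
```

Halving the supply voltage cuts the switching term by a factor of four but the leakage term only by a factor of two, which is why supply voltage reduction is so attractive for dynamic power.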
[Figure: MOS transistor with gate, drain, source, and body terminals and body bias Vbs; threshold voltage.]
[Figure: CMOS inverter with supply Vdd and output load CL; dynamic power.]
Static power
[Figure: CMOS inverter with supply Vdd, body bias Vbs, and output load CL, showing the leakage currents.]
- Subthreshold leakage conduction
- Junction leakage (drain and source to body)
Vbs = body bias voltage
Vth = threshold voltage
Vdd = supply voltage
CL = output load capacitance
For a long time:
Leakage power was considered negligible compared to dynamic power.
Today:
Total dissipation from leakage is approaching the total from dynamic power.
Behavioral level
Schedule and map operations so that the number of cycles is minimised (with an increased number of switching events per clock cycle): you can then run at a slower clock rate, and therefore you can reduce the supply voltage.
Allocate and share modules so that power consumption is reduced (for example, by reducing switching activity).
application
power aware OS
hardware
Goal:
Energy optimization
QoS constraints satisfied
[Figure: power states of a processor. RUN at 150 MHz (0.75 V, 60 mW), at 600 MHz (1.3 V, 450 mW), and at 800 MHz (1.6 V, 900 mW); IDLE at 40 mW; SLEEP at 160 µW. The transitions between states take 10 µs, 90 µs, 160 µs, 1.5 ms, or 140 ms and are annotated "Don't forget these!"]
[Figure: timeline of requests (busy, idle, busy) and the power state (working, shut-down delay Tsd, sleeping, wake-up delay Twu, working), with the time points T1 to T4.]
Send the device to sleep only if the saved energy justifies the overhead!
The main problems:
- Don't shut down such that delays occur too frequently.
- Don't shut down such that the savings due to sleeping are smaller than the power overhead of the state changes.
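One common way to quantify "the saved energy justifies the overhead" is a break-even time: the idle period length at which sleeping and staying idle cost the same energy. The Python sketch below computes it under simple assumptions that are not stated in the slides: constant idle and sleep power, a fixed energy and a fixed total time for the shut-down plus wake-up transitions, and hypothetical numeric values.

```python
import math

def energy_if_idle(p_idle, t):
    """Energy spent when the device stays idle for t seconds."""
    return p_idle * t

def energy_if_sleep(p_sleep, e_transition, t_transition, t):
    """Energy spent when the device shuts down and wakes up within t seconds."""
    return e_transition + p_sleep * (t - t_transition)

def break_even_time(p_idle, p_sleep, e_transition, t_transition):
    """Shortest idle period for which sleeping does not cost more than idling."""
    t_be = (e_transition - p_sleep * t_transition) / (p_idle - p_sleep)
    # The idle period must at least cover the transitions themselves.
    return max(t_be, t_transition)

# Hypothetical device: 40 mW idle, 160 uW asleep, 50 mJ and 140 ms per
# shut-down/wake-up pair.
P_IDLE, P_SLEEP, E_TR, T_TR = 0.040, 160e-6, 0.050, 0.140
t_be = break_even_time(P_IDLE, P_SLEEP, E_TR, T_TR)

def should_sleep(idle_period):
    return idle_period > t_be

print(round(t_be, 3), should_sleep(2.0), should_sleep(0.5))
```

Idle periods shorter than the break-even time are better spent in the idle state, because the transition energy would exceed the savings.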
Time-out Policy
It is assumed that, after a device is idle for a period τ (the interval T1 - T2 in the shut-down timeline), it will stay idle for at least a period which makes it efficient to shut down.
Drawback: you waste energy during the period τ (compared to instantaneous shut-down).
Policies:
- Fixed time-out period: you set the value of τ, which then stays constant.
- Adjusted at run-time: increase or decrease τ, depending on the length of previous idle periods.
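Both variants of the time-out policy can be sketched in a few lines of Python. The energy model (constant idle and sleep power, fixed transition energy and time), the adjustment rule, and all numbers are assumptions made for illustration; the slides do not prescribe a particular rule for adjusting the time-out.

```python
def timeout_energy(idle_periods, tau, p_idle, p_sleep, e_transition, t_transition):
    """Energy spent over a list of idle periods with a fixed time-out tau."""
    total = 0.0
    for t in idle_periods:
        if t <= tau:
            total += p_idle * t                  # never shut down
        else:
            asleep = max(t - tau - t_transition, 0.0)
            # Wait tau in idle, then pay the transitions and sleep.
            total += p_idle * tau + e_transition + p_sleep * asleep
    return total

def adjust_timeout(tau, last_idle, t_break_even,
                   step=0.1, tau_min=0.1, tau_max=5.0):
    """Assumed run-time rule: shorten tau after long idle periods, else lengthen it."""
    if last_idle > tau + t_break_even:
        return max(tau - step, tau_min)
    return min(tau + step, tau_max)

# Hypothetical workload and device parameters.
IDLE_PERIODS = [0.2, 3.0, 0.4, 10.0]
DEVICE = dict(p_idle=0.040, p_sleep=160e-6, e_transition=0.050, t_transition=0.140)

with_timeout = timeout_energy(IDLE_PERIODS, tau=0.5, **DEVICE)
always_idle = DEVICE["p_idle"] * sum(IDLE_PERIODS)
print(with_timeout, always_idle)
```

For this workload the time-out avoids shutting down in the two short idle periods and still sleeps through most of the two long ones.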
Predictive Policy
The length of an idle period is predicted. If the prediction is for an idle period that is long enough, the shut-down is performed immediately (no time interval T1 - T2 in the shut-down timeline).
- L-shaped distribution of the idle period versus the previous busy period.
Policy: shut down after a short busy period!
[Figure: idle period plotted against busy period.]
Stochastic Policy
Predictions are based on Markov models: requests and power state transitions of the device are modelled as probabilistic state machines.
The power manager observes the arriving requests, the request queue, and the device, and generates shutdown commands.
[Figure: the power manager observes the request generator (Markov model), the request queue, and the device (Markov model), which provides the service, and sends commands to the device.]
[Figure: task graph with tasks 1 to 8 mapped on processors p3 and p4, which are connected by a bus, with a table of the WCET and energy of each task on p3 and p4.]
[Figure: two alternative schedules on p3, p4, and the bus over the time interval 0 to 64, with the messages C1-2, C3-5, C5-7, and C4-8 (first schedule) or C7-8 (second schedule).]
$f = k \cdot \frac{(V_{DD} - V_t)^2}{V_{DD}}$, k: circuit dependent constant; $V_t$: threshold voltage.
The execution time of the task: $t_{exe} = N_{CY} \cdot \frac{V_{DD}}{k \cdot (V_{DD} - V_t)^2}$
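The two relations can be written as small Python functions to see how strongly the execution time reacts to the supply voltage. The values of k and of the threshold voltage are hypothetical.

```python
import math

def frequency(v_dd, v_t, k):
    """Maximum clock frequency at supply voltage v_dd."""
    return k * (v_dd - v_t) ** 2 / v_dd

def execution_time(n_cycles, v_dd, v_t, k):
    """Execution time of a task that needs n_cycles clock cycles."""
    return n_cycles / frequency(v_dd, v_t, k)

# Hypothetical circuit constants.
K, V_T = 2e7, 0.5
N_CYCLES = 10 ** 9

t_high = execution_time(N_CYCLES, 5.0, V_T, K)
t_low = execution_time(N_CYCLES, 2.5, V_T, K)
print(t_high, t_low)
```

Lowering the supply voltage lengthens the execution time, which is the price paid for the energy reduction.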
[Figure: V² over time. 10^9 cycles at 5 V: t_exe = 20 sec, leaving slack up to 25 sec; E_total = 40 J.]
[Figure: V² over time. 750×10^6 cycles at 5 V followed by 250×10^6 cycles at 2.5 V: t_exe = 25 sec; E_total = 32.5 J.]
[Figure: V² over time. 10^9 cycles at 4 V: t_exe = 25 sec; E_total = 25 J.]
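The three totals can be reproduced with a short calculation. The energy per cycle and the clock frequency at each voltage are not given explicitly here; the values below are assumptions inferred from the totals and execution times in the three schedules (for example, 40 J for 10^9 cycles at 5 V implies 40 nJ/cycle, and 20 sec implies 50 MHz).

```python
import math

# Assumed per-voltage characteristics, inferred from the three schedules.
ENERGY_PER_CYCLE = {5.0: 40e-9, 4.0: 25e-9, 2.5: 10e-9}   # joules per cycle
FREQUENCY = {5.0: 50e6, 4.0: 40e6, 2.5: 25e6}             # hertz

def evaluate(schedule):
    """Return (energy in J, execution time in s) of a list of (cycles, voltage)."""
    energy = sum(cycles * ENERGY_PER_CYCLE[v] for cycles, v in schedule)
    time = sum(cycles / FREQUENCY[v] for cycles, v in schedule)
    return energy, time

full_speed = evaluate([(10 ** 9, 5.0)])
two_levels = evaluate([(750 * 10 ** 6, 5.0), (250 * 10 ** 6, 2.5)])
one_level = evaluate([(10 ** 9, 4.0)])
print(full_speed, two_levels, one_level)
```

Running the whole task at the single lowest voltage that still meets the 25 sec limit uses less energy than mixing a high and a low voltage, as long as the energy per cycle is the same for all cycles.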
[Figure: V² over time for two tasks, task 1 with 250×10^6 cycles and task 2 with 750×10^6 cycles, both executed at 4 V and finishing at 25 sec; E_total = 25 J.]
Energy of task 1:
- 50 nJ/cycle at VDD = 5V.
- 32 nJ/cycle at VDD = 4V.
- 12.5 nJ/cycle at VDD = 2.5V.
Energy of task 2:
- 12.5 nJ/cycle at VDD = 5V.
- 8 nJ/cycle at VDD = 4V.
- 3 nJ/cycle at VDD = 2.5V.
[Figure: V² over time. Both tasks executed at 4 V (task 2 with 750×10^6 cycles), finishing at 25 sec; E_total = 14 J.]
[Figure: V² over time. Task 1 executed at 2.5 V and task 2 (750×10^6 cycles) at 5 V, finishing at 25 sec; E_total = 12.5 J.]
If power consumption per cycle is not constant (but differs from task
to task), the rule on slide 33 is not true any more.
Voltage levels have to be reduced with priority for those tasks which
have a larger energy consumption per cycle.
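The effect can be checked by enumerating all voltage assignments for the two tasks. In this sketch the per-cycle energies are the ones listed for the two tasks, while the cycle counts (250×10^6 for task 1, 750×10^6 for task 2), the clock frequencies per voltage, and the 25 sec deadline are assumptions inferred from the schedules.

```python
import math
from itertools import product

ENERGY_PER_CYCLE = {
    "task1": {5.0: 50e-9, 4.0: 32e-9, 2.5: 12.5e-9},
    "task2": {5.0: 12.5e-9, 4.0: 8e-9, 2.5: 3e-9},
}
CYCLES = {"task1": 250 * 10 ** 6, "task2": 750 * 10 ** 6}   # assumed
FREQUENCY = {5.0: 50e6, 4.0: 40e6, 2.5: 25e6}               # assumed, in hertz
DEADLINE = 25.0                                              # assumed, in seconds

def evaluate(v1, v2):
    """Return (energy in J, finishing time in s) for the two voltage levels."""
    energy = (CYCLES["task1"] * ENERGY_PER_CYCLE["task1"][v1]
              + CYCLES["task2"] * ENERGY_PER_CYCLE["task2"][v2])
    time = CYCLES["task1"] / FREQUENCY[v1] + CYCLES["task2"] / FREQUENCY[v2]
    return energy, time

# Keep only the assignments that meet the deadline, then pick the cheapest.
feasible = [(evaluate(v1, v2)[0], v1, v2)
            for v1, v2 in product(FREQUENCY, repeat=2)
            if evaluate(v1, v2)[1] <= DEADLINE]
best_energy, best_v1, best_v2 = min(feasible)
print(best_energy, best_v1, best_v2)
```

Running both tasks at 4 V costs 14 J, whereas running the energy-hungry task 1 at 2.5 V and task 2 at 5 V costs 12.5 J within the same time.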
Scheduling Policies
Priority-based scheduling
$E = N_C \cdot C_{eff} \cdot V_{dd}^2 + L_g \cdot \left( V_{dd} \cdot K_3 \cdot e^{K_4 V_{dd}} \cdot e^{K_5 V_{bs}} + |V_{bs}| \cdot I_{ju} \right) \cdot t$
In this energy expression:
- The dynamic term decreases with Vdd, regardless of the increased execution time.
- The leakage term decreases with Vdd, but grows with the execution time t!
[Figure: energy (0 to 8e-10) versus Vdd (0.5 to 1) for 70nm technology, showing dynamic energy, leakage energy, and dynamic + leakage energy (Jejurikar et al., DAC04). Critical point: if you go beyond this point with Vdd, energy grows.]
Summary
Dynamic power management is implemented by the operating system, and is mainly used in portable appliances to shut down unused devices or place them in stand-by.
Typical policies for power management are: time-out, predictive, and stochastic.
Both at task mapping and at scheduling, design decisions can be made which have a huge impact on power/energy consumption.
Real-time scheduling in the context of processors with voltage scaling is extremely interesting. The main trade-off is voltage level vs. execution time. One has to find the optimal voltage levels such that energy consumption is reduced and deadlines are still fulfilled.