Optimization Mechanism for Static Power
Reduction Using Dual-Vth Assignment

ABEL SILVA-FILHO, RODOLFO P. SANTOS,
GABRIELA S. CLEMENTE, CRISTIANO ARAÚJO,
ADRIANO SARMENTO, MANOEL LIMA, AND EDNA BARROS

Federal University of Pernambuco, Informatics Center, Recife, PE, Brazil
Av. Jornalista Aníbal Fernandes - Informatics Center, CEP: 50.740-560
{agsf, rps2, gsc2, cca2, aams, mel, ensb}@cin.ufpe.br

________________________________________________________________________

Reducing power consumption is a major challenge in current integrated circuit design. Techniques for dynamic and static power minimization
have been proposed, but most of them are very time consuming. This work proposes an algorithm for reducing
static power that can be inserted seamlessly in the conventional design flow for integrated systems, considering
an open source environment (Open Access infrastructure). The proposed approach, based on a Dual-Threshold
technique, replaces part of the cells of the circuit by cells with a higher threshold voltage without causing
timing violations in the circuit. The decision to replace a cell is based on timing estimates that model the circuit
with the replacement in place, before the cell is actually replaced. Because only some cells are replaced
at each iteration, the runtime of the algorithm is reduced. In addition, results show a static power reduction of
up to 39.28% when the proposed approach is applied to the ISCAS85 benchmark circuits.

General Terms: Low-Power, VLSI, CMOS, Static Power
Additional Key Words and Phrases: Design, Performance
________________________________________________________________________

1. INTRODUCTION

Nowadays, power consumption is one of the biggest challenges faced by the
semiconductor industry [Kim, N. S. et al. 2003]. With the advancement of technology,
the growing market for portable systems with long battery lifetime has
created an area of great importance in academic research and industry [ITRS 2008]. The
search for technologies that consume less power is constant. In the past,
the main focus of VLSI design was on improving performance and reducing area. With
the growing demand for equipment using wireless networks and for portable devices, power
consumption has become a critical problem and one of the main concerns
during the design of VLSI circuits. In this scenario, it is extremely important to extend
battery lifetime as much as possible. To this end, many efforts have been made to
develop optimization mechanisms that reduce the energy consumption of electronic
systems at different design abstraction levels, such as transistor level [Lee et al. 2004], gate
level [Chi et al. 2007], architecture level [Silva-Filho et al. 2008], and system level
[Silva-Filho et al. 2006], as well as methodologies aimed at low-power design, such as
the work of [Pedram et al. 2002] and [Keating et al. 2008].
Another aspect that has to be taken into account is the impact of previously neglected
power consumption sources. Until very recently, only dynamic power was considered
a significant source of power consumption, supported by "Moore's Law" [Gordon E.
Moore. 1965]. Moore predicted that the number of transistors on a chip would double
every two years. However, with the reduction of device sizes below 0.1 µm, static
power is imposing new challenges for low-power circuits. For circuits implemented in
90 nm and 65 nm technologies [LOW], leakage power is nearly the same as dynamic
power, and in many cases surpasses it, when a suitable optimization strategy is not
adopted.
Recently, several works have targeted leakage power minimization. A method for
reducing static power in standby mode at transistor level has been proposed in [Ketkar,
M. et al. 2002]. This approach proposes a heuristic that divides the optimization of
static power consumption into two distinct stages: the scaling of transistor sizes and the
assignment of the threshold voltage of each gate transistor, chosen between two possible
values. This approach yields up to 21% power savings for some ISCAS85 benchmark
circuits [Brglez, F. et al. 1985]. However, the algorithm implementing this approach is
very time consuming, since it uses the available libraries as its design space, and these
libraries are usually large. A similar disadvantage is also found in the approach
proposed by Lee [Lee, D. et al. 2003]. Another technique for power optimization at
transistor level, called MVT (Mixed-Vth) and applied to CMOS circuits, has been
proposed by [Wei, L. et al. 1999]. This work reports that up to 20% more power savings
can be achieved when compared to techniques applied at gate level. Using this approach,
the final number of high-Vth (HVT) transistors is greater than the number of
HVT transistors obtained after the application of gate-level algorithms. However, this
approach deals with very large standard cell libraries, which in general is not
commercially practicable.
A Dual-Threshold algorithm that introduces a key priority factor for cell
replacements has been proposed in [Jaffari, J. et al. 2004]. In this work, the amount of
sub-threshold current reduction relative to the increase in cell delay determines the
priority of a particular cell to be replaced in the circuit. Using this factor, the cell that
causes the greatest reduction in power consumption with the least degradation in circuit
timing has the highest replacement priority. This technique reports an
average static power reduction of 25% over the initial static power consumption.
However, in this work the timing of the circuit is not updated when the replacements are
made. This can cause serious problems, since the critical paths
of the circuit may change after each replacement. The next cells to be treated will
therefore have their timing constraints underestimated, because a previous replacement
may impose more severe constraints on these cells that are not taken into account.
Moreover, the input circuits used by this
methodology are circuits with a single threshold voltage. This does not match
the most commonly used design flows, in which more than one threshold voltage is
available in the libraries.
In the approach proposed by Elakkumanan [Elakkumanan, P. et al. 2005],
power is minimized with respect to both leakage dissipation and glitches. A Dual-Vth
technique and buffer insertion (for path balancing) are applied. The problem is
solved using an ILP-based (Integer Linear Programming) technique. However, an
ILP-based approach may not be suitable for large designs, because it is a
very time-consuming optimization approach. A technique for reducing the leakage power
of 90 nm and 65 nm integrated circuits with multiple millions of gates, by exchanging
low-Vth (LVT) cells belonging to less critical paths of the circuit for HVT cells, has been
proposed by Gupta [Gupta, S. et al. 2008]. This work targets the problem of long runtime
and static power consumption using a heuristic method. Additionally, this algorithm
aims at optimizing the netlist of an initial multi-Vth circuit. Unlike path-based approaches
(which, in general, optimize the non-critical paths of the circuit), the algorithm of
[Gupta, S. et al. 2008] does not update the timing at each cell replacement. Even with this feature, for
circuits with multiple millions of gates the algorithm runtime becomes a dominant factor
in the design flow. However, this approach always considers the worst case when
estimating delays. Thus, further power savings could be obtained if more
realistic values were estimated. Moreover, several heuristic parameters must be defined
by the designer.
This work proposes a novel approach for reducing static power consumption that is
suitable for insertion in the conventional design flow of integrated circuits. This
approach uses a Dual-Threshold technique, in which cells of the circuit are replaced by
cells with a higher threshold voltage without causing timing violations in the circuit.
More realistic delay and slew estimates are used by the proposed technique
to decide on the replacement of each cell in the circuit before it is actually
replaced. Additionally, this work aims to implement a tool whose runtime is compatible
with design constraints and which fits designs with tight time-to-market.
The rest of this paper is structured as follows. Section 2 explains the Dual-Threshold
technique, which is used in this work for power optimization. Section 3
presents the proposed approach. Section 4 presents experimental results,
obtained by applying the proposed algorithm to several combinational
circuits of the ISCAS85 benchmark. Finally, Section 5 gives conclusions and future
directions.

2. DUAL-VTH TECHNIQUE FOR POWER OPTIMIZATION

This technique exploits the excess slack in non-critical paths to reduce
static power consumption. The optimization process results in a circuit with
reduced power consumption and two types of gates, composed of HVT transistors and
LVT transistors, respectively. Fig. 1 shows a circuit optimized using this technique.

Fig. 1. Circuit with two types of gates

The use of two threshold voltage values is a technique for reducing power both when
the circuit is in active mode and when it is in standby mode. To obtain transistors with
different threshold voltages, several characteristics of the transistors can be varied
during the manufacturing process, such as the channel doping, the channel length, the
body bias of the transistors, and the thickness of the gate oxide.
In situations where there are non-critical paths, meaning that the signal propagates
more quickly than required, these paths can be delayed without affecting the
circuit performance. Since transistors with a high threshold voltage (HVT) present low
sub-threshold currents, while the sub-threshold currents of transistors with a low threshold
voltage (LVT) are higher, this approach optimizes the power consumption by replacing
the LVT cells that are in non-critical paths with HVT cells. However, it must be guaranteed
that the replacement of cells has no impact on the overall performance of the
circuit. This can be achieved through optimization strategies such as the one proposed in this work.
Fig. 2 shows an example in which there is a non-critical path composed of LVT cells.
Given that the critical path sets the required time of the circuit paths to 10
ns, there is a path whose signal arrives 5 ns earlier than required, which means that this path has a
slack of 5 ns.

Fig. 2. Critical and non-critical paths

When the LVT cells are replaced by HVT cells, as shown in Fig. 3, the path still
presents a slack of 2 ns. This implies that the changes in the circuit caused by the
replacement of some cells belonging to a non-critical path can delay the signal by
up to the value of the path slack without degrading the circuit performance. This illustrates how
this approach can reduce the static power consumption, since HVT cells dissipate less
static power. Currently, the application of a Dual-Vth flow during synthesis has become
relatively common [Keating, M. et al. 2007]. This usually involves an initial synthesis
using a primary standard cell library, followed by an optimization using one or more
libraries with several threshold voltages.
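
The slack arithmetic of this example can be written down as a minimal sketch. The function name and the 8 ns arrival time after replacement are illustrative assumptions; the latter is simply the value implied by a 10 ns required time and the 2 ns slack quoted above.

```python
def slack(required_ns: float, arrival_ns: float) -> float:
    """Slack of a path: how much later the signal may arrive without a violation."""
    return required_ns - arrival_ns


required = 10.0      # required time imposed by the critical path (ns)
lvt_arrival = 5.0    # non-critical path built from LVT cells (ns)
hvt_arrival = 8.0    # same path after LVT -> HVT replacement (ns), assumed

slack_before = slack(required, lvt_arrival)
slack_after = slack(required, hvt_arrival)

# The replacement is acceptable as long as the slack does not become negative.
replacement_is_safe = slack_after >= 0.0
```

Here the replacement consumes 3 ns of the available 5 ns and the path still meets its required time.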

Fig. 3. Replacement of LVT cells by HVT cells in non-critical path

[Fig. 2 and Fig. 3 labels: critical path, 10 ns; non-critical path, 5 ns with LVT cells (Fig. 2) and 8 ns with HVT cells (Fig. 3)]

3. THE PROPOSED APPROACH

There are several ways to optimize static power consumption using threshold voltage
(Vth) assignment, including some approaches that try to find the optimal values of Vth.
One problem is that these values are often not available in commercial standard cell
libraries, so designers do not have such a variety of threshold voltages. In general,
even a complex library has a limited number of cells characterized with
different Vth values; usually only two or three values are found.
Another important issue regarding the other approaches is that
many schemes do not take into account the fact that, each time a cell is replaced by
another one with a different Vth, the delays of the circuit must be updated, since the
critical paths may have changed. Furthermore, if the runtime of a power
optimization algorithm is not considered, it may be impracticable to apply it to
circuits composed of millions of cells. Thus, the optimization approach must be chosen
carefully, because many of them may not be useful in practice. Depending on the application, the
optimization algorithm may have a very high computational cost, increasing the design
time.
Since, in general, critical and non-critical paths are composed of a series of cells in
cascade, the replacement of a single cell in a path can completely change the timing
characteristics (slacks and slews) of the whole path. Path-based approaches incorporate
the changes in the circuit due to cell replacement by going through the paths level by level
and updating the database after each change. A drawback of this method is that updating
the database at every change leads to a huge increase in the algorithm runtime, especially
for large circuits.
The problem of static power dissipation considered in this paper is handled after
the global routing stage, when the timing and area constraints have been resolved and
information about the parasitic capacitances is available with good accuracy. However,
this does not prevent the approach from being applied after the synthesis step.
The main goal of the proposed approach is to reduce leakage power consumption
without increasing the computational cost. The computational cost is kept the same
mainly because the timing in the database does not need to be updated every time a
cell is analyzed, unlike in the approaches mentioned above. Instead, the potential library
cells with a higher Vth that can replace the circuit cell are simulated.
The block diagram describing the proposed flow is shown in Fig. 4. The flow takes
a circuit netlist as input, and a static timing analysis is then performed in order to find the LVT
cells with sufficient slack to be exchanged for HVT cells. Each cell of the circuit
is analyzed in order to create the lists of cells that can be replaced and to establish the order
in which the cells are treated. Tests are carried out with cells from the library to
choose the cell that will replace each cell of the circuit. These tests include checking the
maximum fan-out of the library cell, checking the functionality of the cell, and
comparing the slew and delay degradation of this cell with heuristically
defined values. The exchanges are made only after all cells have been checked, so that
effectively only one static timing analysis is performed to ensure that no timing violation is
caused by the substitutions defined along the iterations of the algorithm. The following
subsections explain the proposed flow and present some definitions that are necessary to
understand the proposed approach.

Fig. 4. Detailed algorithm flow.

[Fig. 4 flowchart labels: Initial netlist; STA; Visit ordered circCell list; circCell == TargCell (YES/NO); Calculation of degraded slew and delay of TargCell; Creation of substitute cell list (libCell); Calculation of degraded slew and delay of libCell; (degradedDelay < orDelay) && (degradedSlew < limSlew) (YES/NO); Insert libCell in probable cells list; End of substitute cell list? (YES/NO); libCell choice; Update limSlew; End of circCell? (YES/NO); Do replacements; STA; Dual-VTH netlist]

3.1. Creating the list of candidate cells to be replaced (circCellList)

Once the static timing analysis has been applied, the first step is to create a list
(circCellList) that contains all cells (targCells) that can be replaced. In order to support a
netlist containing both HVT (high-Vth) and LVT (low-Vth) cells, this list includes only the
LVT cells whose slack is greater than a parameter defined as the safe slack
(folgSlack). It should be noted that the ability to support an input netlist characterized
with more than one Vth and the availability of multi-Vth libraries are essential features for
optimization using multi-Vth-based techniques. The pseudo code for the creation of the
circCellList is shown in Fig. 5.

Fig. 5. The pseudo code for the creation of the circCellList

After the creation of the circCellList, the input slews and output loads of the cells are
calculated. It should be emphasized that in this approach all the arcs of the cells are taken
into account, so the worst case is always considered when estimating the cell delays.
This makes the timing analysis of the circuit more complex but more accurate. Such
precision increases the possibility of power reduction and gives better assurance that
there are no timing violations in the circuit. Both delay arcs and slew arcs are taken
into account.

3.2. Substitute candidates List (libCell)

For each cell that can be replaced by the algorithm, a list with its respective substitute
candidates is determined. This list is then filtered in order to determine the
substitute cell that will replace the target cell.
At each iteration of the algorithm, this list is created for the current circuit cell
(circCell), taking into account some restrictions of the library. First, the library cells
must have the same functionality as the circuit cell, because the design behavior cannot
be changed. Moreover, as
the purpose of the cell replacement is to reduce the static power consumption, all the cells
in this list must be HVT cells.

Fig. 6. Pseudo code for the creation of the list of substitute candidates

A heuristic parameter is defined in order to limit the maximum fan-out of the
cells in the list. All cells belonging to the substitute candidates list must have a
maximum fan-out equal to or lower than the circCell maximum fan-out and higher than
half of this value. These limits were defined after exhaustive simulations in order to
obtain the best optimization results, based on [Gupta et al. 2008]. The pseudo code for the
creation of the substitute candidates list is described in Fig. 6.

// Fig. 5: build the list of replaceable cells (LVT cells with enough slack)
CircCellList (netlist, folgSlack) {
    while (*Cell = iterator.getnext()) {
        if ((SlackCell > folgSlack) && (Cell == LVT)) {
            CircCellList.insert(iterator++, Cell);
        }
    }
}

// Fig. 6: build the list of substitute candidates for one circuit cell
substituteList (circCell) {
    while (*libCell = iterator.getnext()) {
        if ((libCell == HVT) && (circCellFunction == libCellFunction)
            && (libCellFanout <= circCellFanout)
            && (libCellFanout > circCellFanout/2)) {
            substituteList.insert(iterator++, libCell);
        }
    }
}
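
The two list-building steps can also be expressed as a small runnable sketch. It is only an illustration of the filters stated in Sections 3.1 and 3.2: the `Cell` record, its field names, and the example cells are hypothetical and do not come from any real library or from the authors' tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    name: str
    function: str        # logic function, e.g. "NAND2"
    vth: str             # "LVT" or "HVT"
    max_fanout: float
    slack: float = 0.0   # worst slack from STA (meaningful for circuit cells only)


def circ_cell_list(netlist, folg_slack):
    """LVT cells whose slack exceeds the safe slack (folgSlack)."""
    return [c for c in netlist if c.vth == "LVT" and c.slack > folg_slack]


def substitute_list(circ_cell, library):
    """HVT library cells with the same function and a max fan-out in (max/2, max]."""
    return [
        lib for lib in library
        if lib.vth == "HVT"
        and lib.function == circ_cell.function
        and circ_cell.max_fanout / 2 < lib.max_fanout <= circ_cell.max_fanout
    ]


# Hypothetical circuit and library data, for illustration only.
netlist = [
    Cell("u1", "NAND2", "LVT", 8, slack=0.40),
    Cell("u2", "NAND2", "LVT", 8, slack=0.05),
    Cell("u3", "INV", "HVT", 4, slack=0.90),
]
library = [
    Cell("NAND2_H4", "NAND2", "HVT", 4),
    Cell("NAND2_H6", "NAND2", "HVT", 6),
    Cell("NAND2_H8", "NAND2", "HVT", 8),
    Cell("NAND2_L8", "NAND2", "LVT", 8),
    Cell("INV_H8", "INV", "HVT", 8),
]

targets = circ_cell_list(netlist, folg_slack=0.1)
candidates = substitute_list(targets[0], library)
```

With these values only `u1` is a target, and `NAND2_H4` is rejected because its maximum fan-out is exactly half of the original value rather than higher than half.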

3.3. Parameter limSlew

This parameter is the maximum slew degradation acceptable at the output pin after the
cell replacement. Due to the cascading of the cells, this parameter influences and
determines the input slews of all fan-out cells. In the proposed approach, the precise
estimation of this parameter is one of the main issues. This parameter is calculated
by simulating all situations in which the substitute cells could replace the target cells without
causing timing violations in the circuit. Instead of replacing the cell right after choosing its
substitute and then updating the circuit timing, the algorithm stores each pair formed by a
cell to be exchanged and its substitute, and the replacements are made only after all the cells have
been analyzed. Thus a single static timing analysis is performed at the end of the
algorithm.
The limSlew is used during these simulations to estimate the degradation of the input
slew of the cell being analyzed, taking into account the possible slew degradation caused
by any previously analyzed cell belonging to its fan-in cone. It is calculated from the
average of the degraded output slews of the cells in the substitute candidates list, as
shown in the equation below, where origSlew is the original output slew related to the
worst timing arc.

limSlew = [Average_output_slew_with_degraded_input] - origSlew

(1)

These degraded slews are determined considering the same output capacitance as the
circuit cell output capacitance. This parameter therefore depends only on the libraries used and
the design characteristics, which makes it generic, and it does not need to be configured by the
user. Regarding the output capacitances used during the simulation, it is important to
emphasize that we are dealing with technologies below 90 nm, so interconnect
parasitics are dominant compared to the cell output load. Thus, the simulation with
the actual wire capacitance represents the worst case and guarantees
that no timing violations will be introduced.
After the substitute cells have been chosen using the slew degradation defined in (1), the
limSlew parameter of all circuit cells is updated according to the real output slew
degradation obtained at the end of the analysis of each cell. The way the limSlew
parameter is updated depends on the cell being analyzed, which may be (1) an HVT cell,
which propagates its input slew to its output; (2) an LVT cell that will not be exchanged;
or (3) an LVT cell that will be exchanged.

Situation 1: HVT cells

Assuming a circuit with only three cells in cascade, cell 1 is the entry of the circuit, as shown in Fig.
7.

Fig. 7. Situation of HVT cell to calculate the limSlew
[Fig. 7 labels: cell 1 (HVT or LVT), cell 2 (HVT), cell 3 (LVT); C_I, C_II]

In Fig. 7, cell 2 is the one for which the limSlew is being calculated. The circuit is
composed of three cells in cascade, which means that the propagation of slew degradation
must be taken into account. In this situation, cell 2 has at one of its inputs cell 1, which may
or may not have been replaced. If cell 1 has not been replaced, the output slew of cell
2 remains the same, because its fan-in cone has not been modified. Cell 2 itself will not be
replaced because it is an HVT cell. However, if cell 1 has been
changed, cell 2 must propagate the degraded slew caused by the new output slew
of the substitute of cell 1. As the situation has changed, the
new degradation due to the slew propagation through cell 2 and reaching cell 3 must
be calculated. Thus, cell 3 will have its input slew degradation determined by this
parameter. Note that this requires traversing the circuit from its primary inputs to its
primary outputs. The limSlew of cells that are not replaced is calculated as follows:

limSlew = circSlewDeg - origSlew

(2)

The value circSlewDeg is the degraded slew at the output pin of the circuit cell
when a degraded signal is applied at its input.

Situation 2: LVT cells that will not be exchanged

Considering the same circuit as before, in this situation we assume that cell 2, which
is now an LVT cell, has no substitute candidate or does not have sufficient slack.
This case is illustrated in Fig. 8.

Fig. 8. Situation of the LVT cell not changed for the calculation of limSlew.
[Fig. 8 labels: cell 1 (HVT or LVT), cell 2 (LVT), cell 3 (LVT); C_I, C_II]

Since the algorithm runs from primary inputs to primary outputs, cell 3 has not yet been
analyzed when the analysis of cell 2 starts. As cell 2 will not be exchanged,
the slew propagated through cell 2 may be degraded, depending on what
happened to cell 1. To account for this propagation, the cell must be simulated under
the possibly new conditions that result from the exchange of cell 1. So, the equation used to
calculate the slew degradation in this case is the same as (2).

Situation 3: LVT cells that will be exchanged

In this situation the circuit cell has a substitute candidate that has passed the simulation
tests. Fig. 9 illustrates this situation.

Fig. 9. Situation of the LVT cell exchanged for the calculation of limSlew.
[Fig. 9 labels: cell 1 (HVT or LVT), cell 2 (LVT replaced by HVT), cell 3 (LVT); C_I, C_II]

In this case the cell being analyzed (cell 2) will be replaced. The replacement of this
cell results in a slew degradation that will be seen at the input pin of cell 3. Once the
slew degradation of cell 1, whether exchanged or not, is well defined and cell 2 has been
analyzed following the above cases, the slew degradation caused by
the exchange of cell 2 can be determined in accordance with the following equation:

limSlew = [Output_slew_of_chosen_cell] - origSlew

(3)

All the situations discussed above are related to the calculation of the limSlew of a cell
that will be needed at the next cell input as a parameter for the calculation of its input slew
degradation.
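
The three situations can be condensed into a short sketch, reading (2) and (3) as differences against origSlew in the same way as (1). The function names and the slew values are hypothetical; in a real flow the slews would be looked up in the library timing tables for the cell's actual output load.

```python
from statistics import mean


def lim_slew_estimate(candidate_output_slews, orig_slew):
    """Equation (1): average degraded output slew of the candidates minus origSlew."""
    return mean(candidate_output_slews) - orig_slew


def lim_slew_update(orig_slew, circ_slew_deg, chosen_output_slew=None):
    """Real degradation passed on to the fan-out cells.

    chosen_output_slew is None for a cell that stays in the circuit
    (situations 1 and 2); otherwise it is the output slew of the chosen
    substitute (situation 3).
    """
    if chosen_output_slew is None:
        return circ_slew_deg - orig_slew
    return chosen_output_slew - orig_slew


# Hypothetical slews in ns for cell 2 of the three-cell example.
orig_slew = 0.20
estimate = lim_slew_estimate([0.26, 0.30], orig_slew)
kept = lim_slew_update(orig_slew, circ_slew_deg=0.23)
replaced = lim_slew_update(orig_slew, circ_slew_deg=0.23, chosen_output_slew=0.27)
```

In each case the returned value is what cell 3 would add to its own original input slew when it is analyzed next.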

3.4. Other parameters related to delay degradation

The parameters related to delay degradation include circDelayDeg, circSlewDeg,
orDelay, folgSlack and the fan-out limit, which are described below.

- circDelayDeg: this parameter corresponds to the delay degradation at the output
of the circuit cell when a degraded input is applied. Since the contribution of this
work is an algorithm that takes into account more realistic situations
in the circuit, the degraded input signal is based on the limSlew parameter
of all fan-in cells. This parameter is used during the filtering process
applied to the cells belonging to the substitute candidates list. The delay
degradation is calculated for each timing arc of the cell.

- circSlewDeg: this parameter, which corresponds to the
output slew degradation, is calculated similarly to the delay degradation of
the cells in the circuit, so it must also be calculated arc by arc. It is also used during
the analysis of the substitute candidates list in order to reduce the number of
library cells that can replace a cell in the circuit.

- orDelay: the delay budget of a cell is the maximum delay degradation
allowed due to the exchange of the cell. This value is proportional to the
delay contribution of the cell to the worst path to which it belongs. This parameter is
calculated for each arc of the cell. It is important to the algorithm, specifically in
building the probable cells list, since it is the key
reference for testing the cells in the substitute candidates list: the degradation in
delay caused by the exchange of a cell cannot exceed the value of orDelay.
It is calculated using the following equation, where Delay
is the input-output arc delay of the cell:

orDelay = (Delay / Slack) * (Slack - folgSlack)

(4)

The slack is the worst arc slack and is obtained from the static timing analysis. This
parameter is based on [Gupta et al. 2008].

- folgSlack: this parameter compensates for errors that can occur
due to the heuristic approximations and also compensates for the propagation of
degraded slew due to the cascading of cells in the circuit.
This parameter is used by the designer to minimize the timing penalties
that can be caused by this heuristic. Thus, for each circuit, a value is
determined that optimizes the results and ensures that the resulting circuit will not
have cells with negative slack.

- Fan-out limit: In order to choose the cells that will be part of the substitute
candidates list, a lower limit of 50% on the drive strength of the cell is imposed.
This means that the library cells having a drive strength less than 50% of the drive
strength of the original cell in the circuit will not be part of the list. Moreover,
the drive strength upper limit is defined as the drive strength of the original cell.

3.5. Cell sorting

The procedure used to determine the standard cells that may replace a gate in the
circuit in order to reduce the static power consumption imposes an order in which the
gates of the circuit must be analyzed. This order must be such that a gate is
analyzed only after all the gates belonging to the paths that arrive at its inputs
have been analyzed. This condition is necessary because the criteria used when
simulating a possible replacement (for example, the slew
degradation) depend on the fan-in gates of the gate under consideration.
The method implemented for obtaining the desired order is based on the Depth First
Search (DFS) algorithm. The search starts from the primary outputs of the
circuit and proceeds toward the primary inputs. The starting node of the ordering method is a cell
connected to a primary output of the circuit. Then, all the cells connected to its inputs
(cells belonging to its fan-in cone) are visited. For each fan-in cell visited, the previous
step is executed again. This procedure is executed recursively until the search
reaches a cell whose input pins are connected to primary inputs. When this occurs, the cell is
inserted at the end of the sorted list.
During the search loop, when the fan-ins of a cell are being visited, a fan-in cell that has
already been visited is skipped and the next iteration of the loop is executed. The pseudo
code of Fig. 10 is executed with each primary output of
the circuit as a starting node.
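
A runnable counterpart of this ordering procedure is sketched below. It assumes the circuit is combinational and acyclic and that the connectivity is available as a plain dictionary; the names `fanin`, `order_cells`, and the gate labels are hypothetical.

```python
def order_cells(fanin, primary_output_cells):
    """Return the cells ordered so that every cell comes after all of its fan-in cells.

    fanin maps a cell name to the list of cells driving its inputs; cells
    driven only by primary inputs have no entry (or an empty list).
    """
    ordered = []
    done = set()

    def visit(cell):
        # Order the whole fan-in cone first (depth-first, post-order).
        for driver in fanin.get(cell, []):
            if driver not in done:
                visit(driver)
        done.add(cell)
        ordered.append(cell)

    for cell in primary_output_cells:
        if cell not in done:
            visit(cell)
    return ordered


# Hypothetical four-gate circuit: g1 feeds g2 and g3, g2 feeds g3 and g4.
fanin = {"g2": ["g1"], "g3": ["g1", "g2"], "g4": ["g2"]}
order = order_cells(fanin, primary_output_cells=["g3", "g4"])
```

For very deep circuits the recursion would have to be replaced by an explicit stack, since Python limits the recursion depth.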

updateFan-inCell (circCell) {
    while (there is a fan-in cell of circCell that has not been visited) {
        if (the current fan-in cell is not ordered) {
            updateFan-inCell (fan-inCell);
        }
    }
    set circCell ordered;
    insert circCell at the end of the ordered list;
}

Fig. 10. Pseudo code of the cell sorting algorithm

3.6. Calculating the Effect of Cell Replacement

Two other key parameters of the algorithm are libSlewDeg and
libDelayDeg. They are, respectively, the output slew and the delay of a substitute
cell when a degraded signal is applied to its inputs. This signal degradation is defined
based on the parameters of the fan-in cells. The libSlewDeg is the degraded output slew
of the substitute cell when a slew-degraded signal is applied at its input pins. The
libDelayDeg is the delay of the substitute cell when a degraded signal is applied to its input.
It should be noted that in both cases the values are determined considering the same
output load as that of the circCell concerned.

Fig. 11. Pseudo code of libSlewDeg calculation

Using the parameters described above, it is possible to calculate the degradDelay and
degradSlew. These parameters represent the degradation of the delay and of the slew,
respectively, caused by the hypothetical replacement of the circuit cell by the substitute
candidate cell being analyzed. Thus, the delay degradation is the difference
between circDelayDeg and the original delay of the circuit cell, and the slew
degradation is the difference between circSlewDeg and the output slew of the
original cell in the circuit.

3.7. Calculating Probable cells list and Replacing Cells

The probable cells list is determined through a series of tests on the
substitute candidates list. These tests must ensure that a new cell placed in the
circuit introduces no timing violations. The tests are based on
heuristic parameters that were found by exhaustive simulations, in order to achieve the
best power savings.

Fig. 12. Pseudo code for determining the probable cells list

The tests consist basically of two criteria that allow the inclusion of a substitute
candidate in this list. The first ensures that the delay degradation of the substitute
candidate does not exceed the delay budget of the cell in the circuit. The delay budget is
the parameter that determines how much delay degradation is allowed due to the
replacement of the cell in the circuit. The second criterion concerns the slew degradation:
a candidate is not allowed in the probable cells list if its slew
degradation is larger than the limSlew of the cell in the circuit.
Once the list of probable cells has been created, a library cell must be selected. The
selection criterion is simple: the algorithm visits all the cells in the list of probable
cells and chooses the one with the lowest leakage power consumption. The leakage power
consumption of each probable cell is read from the standard cell library file.

Pseudo code of Fig. 11 (libSlewDeg calculation):

loadCircCell = takeLoadCircCell (circCell);
inputSlewDeg = takeCircSlew (circSlew) + limSlew (takeInputCell (circCell));
libSlewDeg (circCell, loadCircCell, inputSlewDeg) {
    while (*arcLibCell = iterator.getnext()) {
        slew/delayArcVector = getTimingArcSlew/Delay (libCell);
    }
}

Pseudo code of Fig. 12 (probable cells list):

probablesList (circCell, substituteList) {
    orDelay = orDelay (circCell);
    for (iterator = substituteList.begin(); iterator != substituteList.end(); iterator++) {
        violates = false;
        while (*arcLibCell = iterator.getnext()) {
            degradDelay = libDelayDeg - circDelayDeg;
            degradSlew = libSlewDeg - circSlewDeg;
            if ((degradDelay > orDelay) || (degradSlew > limSlew)) {
                violates = true;
                break;
            }
        }
        if (!violates) probablesList.insert (*iterator);
    }
}
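The two admission tests and the final selection can be sketched in Python as follows. This is an illustrative sketch, not the authors' C/C++ implementation: the Candidate record, the function names and the leakage figures are assumptions, the degradations follow the subtraction used in the pseudo code of Fig. 12, and a candidate whose degradation equals the limit is accepted, as in the worked example of this section.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    lib_delay_deg: float  # libDelayDeg: delay of the candidate with the degraded input
    lib_slew_deg: float   # libSlewDeg: output slew of the candidate with the degraded input
    leakage: float        # leakage power from the library file (arbitrary units here)


def probable_cells(candidates, circ_delay_deg, circ_slew_deg, or_delay, lim_slew):
    """Keep the candidates whose delay and slew degradations both stay within the limits."""
    eps = 1e-9  # tolerance so that a value equal to its limit passes despite rounding
    probable = []
    for cand in candidates:
        degrad_delay = cand.lib_delay_deg - circ_delay_deg
        degrad_slew = cand.lib_slew_deg - circ_slew_deg
        if degrad_delay <= or_delay + eps and degrad_slew <= lim_slew + eps:
            probable.append(cand)
    return probable


def choose_substitute(probable):
    """Return the probable cell with the lowest leakage, or None if the list is empty."""
    return min(probable, key=lambda c: c.leakage) if probable else None


# Two hypothetical substitutes; the leakage values are invented for the example.
subs = [Candidate("I", 0.3, 0.3, leakage=1.0), Candidate("II", 0.2, 0.2, leakage=2.0)]
probable = probable_cells(subs, circ_delay_deg=0.1, circ_slew_deg=0.1,
                          or_delay=0.1, lim_slew=0.1)
chosen = choose_substitute(probable)
```

With these numbers candidate I is rejected by both tests and candidate II is the only probable cell, so it is chosen even though its invented leakage is higher.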

All cell replacements are made after every cell of the circuit has been analyzed. Once the
library cell that will replace each cell in the original circuit has been chosen, a list of
pairs, each composed of a circuit cell and its corresponding substitute cell, is created.
When this list is complete, all cell replacements are performed at once. The static timing
analysis (STA) is performed at the beginning of the process and again at the end of the
algorithm. Since running STA during the algorithm execution is extremely costly, restricting
it to these two points is very important for reaching one of the goals of this work, which
is to implement a tool whose run-time is compatible with design constraints and which fits
projects with tight time-to-market.
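The deferred replacement scheme described above can be sketched as follows. The netlist representation (a dictionary from instance name to library cell name), the cell names and the two callbacks are hypothetical stand-ins for the OpenAccess data model and the timing engine.

```python
def dual_vth_pass(netlist, ordered_cells, pick_substitute, run_sta):
    """Analyze every cell first, then apply all replacements in one step.

    netlist maps an instance name to its library cell name.
    pick_substitute(instance, lib_cell) returns a substitute name or None.
    run_sta(netlist) stands for one static timing analysis run.
    """
    run_sta(netlist)                       # first STA: timing of the original circuit
    pairs = []                             # (circuit cell, substitute cell) pairs
    for instance in ordered_cells:
        substitute = pick_substitute(instance, netlist[instance])
        if substitute is not None:
            pairs.append((instance, substitute))

    new_netlist = dict(netlist)
    for instance, substitute in pairs:     # all replacements performed at once
        new_netlist[instance] = substitute
    run_sta(new_netlist)                   # second STA: check of the resulting circuit
    return new_netlist, pairs


# Toy usage with invented cell names.
sta_calls = []
netlist = {"U1": "NAND2_LVT", "U2": "INV_LVT", "U3": "NOR2_HVT"}
new_netlist, pairs = dual_vth_pass(
    netlist,
    ordered_cells=["U1", "U2", "U3"],
    pick_substitute=lambda inst, lib: "NAND2_HVT" if inst == "U1" else None,
    run_sta=lambda n: sta_calls.append(dict(n)),
)
```

However many cells are replaced, the timing engine is invoked only twice in this arrangement.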
After the complete execution of the algorithm, the resulting circuit consists of a netlist
compatible with commercial tools such as those from Cadence. As this is a gate-level
approach that reduces the leakage power consumption by replacing LVT cells with HVT cells,
the algorithm generates a new netlist similar to the original, but with some replaced
cells. Moreover, the changes in the circuit do not affect the placement and routing of the
gates, so the information listed in the constraint files generated during the design flow
remains unchanged.
In order to illustrate the proposed method in detail, a step-by-step explanation is given
below, using as an example the Dual-Vth combinational circuit illustrated in Fig. 13.

Fig. 13. Hypothetical circuit with HVT and LVT cells, used as an example

1. Sorting the cells of the circuit from primary input to primary output. In the example the
sequence is 1, 2, 3, 4, 5, and 6.
2. Creating the circCellList list, assuming that only cells 1, 2 and 3 have a slack greater
than the parameter folgSlack = 3;

Cell    1    2    3    4    5    6
Slack   4    4    4    1    0    2
Slew    0.1  0.1  0.1  0.2  0.2  0.1

3. Iteration through the cells of the circuit (all cells);

4. Creating the substitute cell list of cell 1 (cell 1 is in circCellList);

5. Assuming there are cells in the library that may be part of this list, execute the method for
creating the initial limSlew. Take the average of the output slews of the substitute cells with the
degraded input (in this case the input slew is unchanged because the cell is at an input of the
circuit) and the same output load as cell 1;
Initial limSlew 1 = 0.1

6. Calculating circDelayDeg and circSlewDeg of cell 1. With a degraded input signal (unchanged
here, as cell 1 is at the input of the circuit), the delay and the output slew of cell 1 are
calculated;
circDelayDeg (inputSlew, outputLoad) = 0.1
circSlewDeg (inputSlew, outputLoad) = 0.1

7. For each cell in the substitute cell list, the libDelayDeg and libSlewDeg are calculated;
Assuming that library cells I and II can replace cell 1:
libDelayDeg I (inputSlew, outputLoad) = 0.3
libSlewDeg I (inputSlew, outputLoad) = 0.3
libDelayDeg II (inputSlew, outputLoad) = 0.2
libSlewDeg II (inputSlew, outputLoad) = 0.2

8. Considering the values obtained in step 7, the degradSlew and degradDelay of cell 1 are
calculated;
degradSlew I = 0.3 - 0.1 = 0.2
degradDelay I = 0.3 - 0.1 = 0.2
degradSlew II = 0.2 - 0.1 = 0.1
degradDelay II = 0.2 - 0.1 = 0.1

9. The orDelay of cell 1 is calculated;
orDelay (cell 1) = 0.1

10. Test to create the probable cell list. If the delay degradation does not exceed orDelay and
the slew degradation does not exceed limSlew, the substitute cell is included in the probable
cell list;
degradSlew I > limSlew: not included in the probable list
degradDelay I > orDelay: not included in the probable list
degradSlew II = limSlew: OK
degradDelay II = orDelay: OK

11. If the probable cell list is not empty, the cell with the lowest power consumption is
chosen to replace cell 1;
Library cell II was chosen

12. Assuming that there is a substitute for cell 1, the limSlew of cell 1 is updated. It will be
taken into account during the analysis of cell 4. It is the value of libSlewDeg minus the
original slew of cell 1;
limSlew 1 = libSlewDeg - initial slew (cell 1) = 0.2 - 0.1 = 0.1

13. Analyzing cell 2. Since cell 2 has the same conditions as cell 1, its treatment is
similar, so cell 2 also has a substitute and its limSlew is updated;
After the analysis, limSlew 2 is updated to 0.2

14. Analyzing cell 3. As one of its inputs is connected to the output of cell 2, the worst slew at
the input should be taken into account;

15. As cell 3 is present in circCellList, its substitute cell list is created;

16. The initial limSlew is created as for cells 1 and 2;
With a degradation of 0.2 at the input
Initial limSlew 3 = 0.1

17. Calculating circDelayDeg and circSlewDeg of cell 3. With a degraded input (governed by
the limSlew generated and updated in cell 2), the degraded delay and output slew of
cell 3 are calculated;
circDelayDeg 3 (inputSlew, outputLoad) = 0.2
circSlewDeg 3 (inputSlew, outputLoad) = 0.2

18. For each cell in the substitute cell list, the libDelayDeg and libSlewDeg are calculated;
Supposing two substitutes, 1 and 2:
libDelayDeg 1 (inputSlew, outputLoad) = 0.5
libSlewDeg 1 (inputSlew, outputLoad) = 0.5
libDelayDeg 2 (inputSlew, outputLoad) = 0.4
libSlewDeg 2 (inputSlew, outputLoad) = 0.4

19. The degradSlew and degradDelay of cell 3 are calculated;
degradSlew 1 = 0.5 - 0.2 = 0.3
degradDelay 1 = 0.5 - 0.2 = 0.3
degradSlew 2 = 0.4 - 0.2 = 0.2
degradDelay 2 = 0.4 - 0.2 = 0.2

20. The orDelay of cell 3 is calculated;
Assuming that orDelay (cell 3) = 0.1

21. Test to create the probable cell list. If the delay degradation does not exceed orDelay and
the slew degradation does not exceed limSlew, the substitute cell is included in the probable
cell list;
degradSlew 1 > limSlew: not included in the probable list
degradSlew 2 > limSlew: not included in the probable list

22. Since no substitute cell is able to pass the tests, cell 3 is not replaced.

23. The limSlew is updated. In this case, the parameter is governed by the circSlewDeg
minus the original output slew of cell 3 (worst arc);
limSlew 3 = circSlewDeg - original output slew of cell 3 = 0.2 - 0.1 = 0.1

24. Analyzing cell 4. As this is an HVT cell, it is treated differently: the algorithm is only
concerned with the propagation of possible delay degradations from its fan-in cone.

25. Calculating circDelayDeg and circSlewDeg of cell 4. With a degraded input (governed by
the limSlew generated and updated in cell 2), the degraded delay and output slew of
cell 4 are calculated;
circSlewDeg 4 (inputSlew, outputLoad) = 0.2

26. The limSlew is updated. In this case, the parameter is governed by the circSlewDeg
minus the original output slew of cell 4 (worst arc);

27. Analyzing cell 5. At the input of the cell there are the output degradations of cell 1
(governed by its substitute) and of cell 3 (governed by the propagation of its input, since
this cell will not be replaced);

28. Choosing the worst degradation among the inputs of cell 5 for the calculation of its
limSlew.
Worst degradation generated by cell 2 = 0.2

29. As this is an HVT cell, it will not be replaced and its treatment is similar to that of cell 4;
circSlewDeg 5 (inputSlew, outputLoad) = 0.3
limSlew = circSlewDeg - original output slew of cell 5 = 0.3 - 0.2 = 0.1

30. Analyzing cell 6. Although it is an LVT cell, it is not part of the circCellList list. Its
treatment therefore differs from that of the other LVT cells of the circuit and is similar to
the one given to HVT cells.

31. With the worst limSlew between cells 3 and 4, the degradation at its input is calculated in
order to calculate the circSlewDeg;
Worst degradation generated by cell 2 or cell 4 = 0.1

32. With circSlewDeg, its limSlew is created, governed by the difference between
circSlewDeg and the original output slew of cell 4.
circSlewDeg 4 (inputSlew, outputLoad) = 0.2
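The bookkeeping behind steps 2, 12, 23 and 29 can be reproduced with a few lines of Python. The function names are hypothetical; the slack and slew values are taken from the walkthrough, and the assignment of cells 1 to 3 to LVT is an assumption consistent with steps 24, 29 and 30.

```python
def circ_cell_list(slack, vth, folg_slack):
    """LVT cells whose slack exceeds folgSlack, i.e. the replacement candidates."""
    return [c for c in slack if vth[c] == "LVT" and slack[c] > folg_slack]


def updated_lim_slew(original_slew, circ_slew_deg, chosen_lib_slew_deg=None):
    """limSlew of a cell once it has been analyzed.

    Replaced cell: libSlewDeg of the chosen substitute minus the original slew.
    Kept cell:     circSlewDeg minus the original slew.
    """
    degraded = circ_slew_deg if chosen_lib_slew_deg is None else chosen_lib_slew_deg
    return degraded - original_slew


# Slack table of step 2 and threshold type of each cell.
slack = {1: 4, 2: 4, 3: 4, 4: 1, 5: 0, 6: 2}
vth = {1: "LVT", 2: "LVT", 3: "LVT", 4: "HVT", 5: "HVT", 6: "LVT"}
candidates = circ_cell_list(slack, vth, folg_slack=3)

lim_slew_1 = updated_lim_slew(0.1, 0.1, chosen_lib_slew_deg=0.2)  # step 12, cell replaced
lim_slew_3 = updated_lim_slew(0.1, 0.2)                           # step 23, cell kept
lim_slew_5 = updated_lim_slew(0.2, 0.3)                           # step 29, HVT cell
print(candidates, round(lim_slew_1, 3), round(lim_slew_3, 3), round(lim_slew_5, 3))
```

Cell 6 is excluded from the candidates because its slack of 2 does not exceed folgSlack, which matches step 30, and the three limSlew values all come out at 0.1 as in the walkthrough.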

Once all the steps mentioned above are executed, the new circuit has cells 1 and 2
replaced by their corresponding substitutes. Thus, a power reduction can be achieved
even for a circuit already synthesized with two threshold voltages.
The circuit generated by the proposed algorithm is shown in Fig. 14 below.

Fig. 14. Circuit resulting from the algorithm steps.

4. EXPERIMENTAL RESULTS

In order to optimize the static power consumption, the proposed algorithm has been
applied to several combinational circuits belonging to the ISCAS85 benchmark. The
proposed algorithm was implemented in C/C++, using the OpenAccess infrastructure
[Guiney et al. 2006]. The synthesized circuits of this benchmark vary in size from 4
cells to 2327 cells, using a 90 nm TSMC standard cell library. The multi-Vth designs
were created using the tcbn90lphphvttc and tcbn90lphphvttc libraries, which
characterize the HVT and LVT cells respectively.
In order to execute the algorithm on all the benchmark circuits and obtain results as
consistent as possible with a real situation within an integrated circuit design flow, a
timing constraint was imposed on each circuit. The timing constraint of each circuit was
determined according to the cycle time of its most critical path, so that the input
circuit has no cell with negative slack. The most conservative timing constraints were
determined with an accuracy of 0.1 ns.
The results were satisfactory, with an average static power reduction of 26.88% of the
total static power of the original circuit. The maximum static power saving reached
39.28%, as illustrated in Fig. 15. Except for circuit c17, which was discarded due to its
very small size, all circuits presented a reduction in their power consumption. Another
important point is that the goal of developing a tool that fits the conventional design
flow was achieved.
Since the main objective is the reduction in static power, the heuristic parameters
were set up to provide the maximum reduction in static power. These settings were chosen
so that no timing penalties are inserted in the circuit. For example, the folgSlack
parameter, which guarantees that the propagation of slews and delays will not result in
timing violations, must be chosen as small as possible. However, the optimization process
must not leave any cell with negative slack. Table I shows the results for the ISCAS85
benchmark circuits.

Table I: Experimental Results

Circuit   Total   # of replaced   folgSlack   Cycle time   Leakage
          cells   cells (%)       (ps)        (ps)         reduction (%)
C499      146     11.6%           0           1.9          5.00%
C1355     146     24.7%           9           1.9          11.81%
C880      167     35.3%           28          1.3          39.28%
C2670     287     30.7%           33          2.0          24.50%
C3540     522     32.0%           84          3.1          31.38%
C5315     616     33.1%           17          2.8          29.10%
C7552     849     39.8%           46          4.9          31.87%
C6288     2357    20.6%           155         5.6          15.23%
Average           29.5%                                    21.3%

The results presented in Table I show that, as expected, the circuits with a larger number
of gates have a higher cycle time due to the increased number of parasitic
interconnections in the circuit. Thus, their performance decreases with increasing size.
The folgSlack parameter, in turn, tends to grow with the size of the circuits. As outlined
before, this parameter compensates for possible timing violations during signal
propagation caused by the degradations that arise when a cell is replaced. As the circuits
grow, the value of this parameter also tends to grow because it must prevent the circuit
from having timing violations.
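The choice of folgSlack discussed above, as small as possible while leaving no negative slack, can be pictured as a simple search. The sketch below is only an illustration of that idea: worst_slack_after is a hypothetical callback standing for one full run of the algorithm followed by the final STA, and the toy model used in the example is invented.

```python
def smallest_safe_folg_slack(candidates_ps, worst_slack_after):
    """Return the smallest candidate folgSlack that leaves no negative slack.

    worst_slack_after(folg_slack) must return the worst slack of the circuit
    obtained with that folgSlack. Returns None if no candidate is safe.
    """
    for folg_slack in sorted(candidates_ps):
        if worst_slack_after(folg_slack) >= 0:
            return folg_slack
    return None


# Toy model: the circuit only meets timing once folgSlack reaches 28 ps.
best = smallest_safe_folg_slack(range(0, 200), lambda f: f - 28)
```

A linear scan is used because nothing in the text guarantees that the worst slack improves monotonically with folgSlack; if it did, a binary search would need fewer runs.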

Fig. 15. Static power savings

Fig. 16 shows the total number of cells replaced by the algorithm in each circuit. The
number of cells replaced is proportional to the number of cells in the circuit. This shows
that the influence on timing of replacing a cell in a small circuit is much more
significant than in a circuit with a large number of cells.

Fig. 16. Total number of cells replaced

5. CONCLUSIONS AND FUTURE WORK

A Dual-Vth algorithm for reducing static power consumption, with a run-time viable for
commercial integrated circuit projects, was presented in this paper. The static power
savings reached up to 39.28% of the total static power consumption of the original
circuit, considering the ISCAS85 benchmark circuits.
As future work, the algorithm will be integrated with a dynamic optimization approach for
low-power constraints that has been developed in parallel with this work.

[Fig. 15 plot: "Static Power Reduction"; x-axis: ISCAS85 circuits (C499, C1355, C880, C2670, C3540, C5315, C7552, C6288); y-axis: Leakage reduction (%), 0 to 45]
[Fig. 16 plot: "Number of changed cells"; x-axis: ISCAS85 circuits (C499, C1355, C880, C2670, C3540, C5315, C7552, C6288); y-axis: Number of cells, 0 to 500]
REFERENCES

BRGLEZ, F. AND FUJIWARA, H. 1985. A neutral netlist of 10 combinational benchmark circuits and a target
translator in Fortran. In International Symposium on Circuits and Systems, 663-698.
CHI, J. C., LEE, H. H., TSAI, S. H., AND CHI, M. C. 2007. Gate level multiple supply voltage assignment algorithm
for power optimization under timing constraint. In IEEE Transactions on Very Large Scale Integration
(VLSI) Systems, 15, 6, 637-648.
ELAKKUMANAN, P., THYAGARAJAN, K., PRASAD, K., AND SRIDHAR, R. 2005. Optimal Vth assignment and buffer
insertion for simultaneous leakage and glitch minimization through integer linear programming (ILP). In
48th Midwest Symposium on Circuits and Systems, 2, 1880-1883.
GUINEY, M. AND LEAVITT, E. 2006. An introduction to OpenAccess: an open source data model and API for IC
design. In Proceedings of the 2006 Asia and South Pacific Design Automation Conference, 434-436.
GUPTA, S., SINGH, J., AND ROY, A. 2008. A novel cell-based heuristic method for leakage reduction in
multi-million gate VLSI designs. In Proceedings of the 9th International Symposium on Quality Electronic
Design, 526-530.
INTERNATIONAL TECHNOLOGY ROADMAP FOR SEMICONDUCTORS (ITRS). 2008. [ONLINE]. Available:
http://public.itrs.net
JAFFARI, J. AND AFZALI-KUSHA, A. 2004. New dual-threshold voltage assignment technique for low-power digital
circuits. In Proceedings of the 16th International Conference on Microelectronics (ICM), 413-416.
KEATING, M., FLYNN, D., AITKEN, R., GIBBONS, A., AND SHI, K. 2007. Low Power Methodology Manual for
System-on-Chip Design. Springer Publications.
KETKAR, M. AND SAPATNEKAR, S. S. 2002. Standby power optimization via transistor sizing and dual threshold
voltage assignment. In Proceedings of the 2002 IEEE/ACM International Conference on Computer-Aided
Design (ICCAD '02), 375-378.
KIM, N. S., AUSTIN, T., BLAAUW, D., MUDGE, T., FLAUTNER, K., HU, J. S., IRWIN, M. J., KANDEMIR, M., AND
NARAYANAN, V. 2003. Leakage current: Moore's law meets static power. In IEEE Computer, 36,
12, 68-75.
LEE, D. AND BLAAUW, D. 2003. Static leakage reduction through simultaneous threshold voltage and state
assignment. In Proceedings of the Design Automation Conference (DAC '03), 191-194.
LEE, D., BLAAUW, D., AND SYLVESTER, D. 2004. Gate oxide leakage current analysis and reduction for
VLSI circuits. In IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 12, 2, 155-166.
PEDRAM, M. AND RABAEY, J. M. 2002. Power Aware Design Methodology. Kluwer Academic Publishers.
SILVA-FILHO, A. G., CORDEIRO, F. R., SANTANNA, R. E., AND LIMA, M. E. 2006. Heuristic for two-level cache
hierarchy exploration considering energy consumption and performance. In Int. Circuit and System
Design, Power and Timing Modeling, Optimiz. and Simulation (PATMOS), 75-83.
SILVA-FILHO, A. G. AND LIMA, S. M. L. 2008. Energy consumption reduction mechanism by tuning cache
configuration using NIOS II processor. In IEEE International SOC Conference (SOCC), 291-294.
WEI, L., CHEN, Z., ROY, K., JOHNSON, M. C., YE, Y., AND DE, V. K. 1999. Design and optimization of
dual-threshold circuits for low-voltage low-power applications. In IEEE Transactions on Very Large Scale
Integration (VLSI) Systems, 7, 1, 16-24.
WEI, L., CHEN, Z., ROY, K., YE, Y., AND DE, V. 1999. Mixed-Vth (MVT) CMOS circuit design methodology for
low power applications. In Proceedings of the 36th Annual ACM/IEEE Design Automation Conference,
430-435.
