Artigo Rodolfo v4-Rev-COMPILED Cca2 Aams Agsf Ensb-V4
- Fan-out limit: To select the cells that enter the substitute candidates list, a lower
limit of 50% of the drive strength of the original cell is imposed. Library cells whose
drive strength is below 50% of that of the original cell in the circuit are therefore
excluded from the list. The upper limit is the drive strength of the original cell itself.
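A minimal Python sketch of the drive strength window described above. The `LibCell` type, the cell names and the numeric drive strengths are hypothetical and serve only as illustration; the tool itself reads these values from the standard cell library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LibCell:
    name: str              # hypothetical library cell name
    drive_strength: float  # relative drive strength (assumed unit)


def within_drive_limits(original: LibCell, candidate: LibCell) -> bool:
    """True if the candidate drive strength lies between 50% and 100% of the original."""
    lower = 0.5 * original.drive_strength
    return lower <= candidate.drive_strength <= original.drive_strength


original = LibCell("NAND2_X4_LVT", 4.0)
library = [
    LibCell("NAND2_X1_HVT", 1.0),
    LibCell("NAND2_X2_HVT", 2.0),
    LibCell("NAND2_X4_HVT", 4.0),
    LibCell("NAND2_X8_HVT", 8.0),
]
candidates = [c.name for c in library if within_drive_limits(original, c)]
print(candidates)  # ['NAND2_X2_HVT', 'NAND2_X4_HVT']
```

With these assumed values, the X1 cell is rejected for being below the 50% limit and the X8 cell for exceeding the drive strength of the original cell.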
3.5. Cell sorting
The procedure that determines which standard cells may replace a gate in order to reduce
the static power consumption requires the gates of the circuit to be analyzed in a
specific order. A gate may be analyzed only after all the gates on the paths that arrive
at its inputs have been analyzed. This condition is necessary because the criteria
evaluated when a replacement is simulated (for example, the slew degradation) depend on
the fan-in gates of the gate under consideration.
The method implemented to obtain this order is based on the Depth First Search (DFS)
algorithm. The search starts at the primary outputs of the circuit and proceeds toward
the primary inputs. The starting node is a cell connected to a primary output of the
circuit. All the cells connected to its inputs (the cells belonging to its fan-in cone)
are then visited, and the same step is executed recursively for each fan-in cell visited.
The recursion stops when the search visits a cell whose input pins are connected to
primary inputs; that cell is inserted at the end of the sorted list, and every other cell
is inserted once all its fan-in cells have been inserted.
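The ordering described above can be sketched in Python as a post-order DFS. This is an illustrative sketch, not the OpenAccess-based implementation: the `fanin` dictionary and the cell names `g1` to `g6` are hypothetical, and the netlist is assumed to be combinational (acyclic).

```python
def sort_cells(fanin, primary_output_cells):
    """Order cells so that each cell appears after every cell of its fan-in cone.

    fanin maps a cell name to the list of cells driving its inputs; a cell fed
    only by primary inputs maps to an empty list.
    """
    ordered = []
    visited = set()

    def visit(cell):
        visited.add(cell)
        for driver in fanin.get(cell, []):
            if driver not in visited:  # fan-in cells already visited are skipped
                visit(driver)
        ordered.append(cell)  # appended only after its whole fan-in cone

    for cell in primary_output_cells:
        if cell not in visited:
            visit(cell)
    return ordered


# Hypothetical netlist, not taken from any figure of the paper.
fanin = {
    "g1": [], "g2": [],
    "g3": ["g2"], "g4": ["g2"],
    "g5": ["g1", "g3"], "g6": ["g3", "g4"],
}
order = sort_cells(fanin, ["g5", "g6"])
print(order)  # ['g1', 'g2', 'g3', 'g5', 'g4', 'g6']
```

The recursion is kept for readability; for very deep circuits an explicit stack would avoid the Python recursion limit.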
While the fan-in cells of a cell are being visited, a fan-in cell that has already been
visited is skipped and the loop moves on to the next one. The pseudo code below is
executed with each primary output of the circuit as a starting node.
updateFan-inCell (circCell) {
    while (there is a fan-in cell of circCell that has not been visited) {
        if (the current fan-in cell is not ordered) {
            updateFan-inCell (fan-inCell);
        }
    }
    set circCell ordered;
    insert circCell at the end of the ordered list;
}
Fig. 10. Pseudo code of the cell sorting algorithm
3.6. Calculating the Effect of Cell Replacement
Two other key parameters of the algorithm are libSlewDeg and libDelayDeg. They are,
respectively, the output slew and the delay of a substitute cell when a degraded signal
is applied to its input pins. This signal degradation is defined from the parameters of
the fan-in cells. It should be noted that in both cases the values are determined
considering the same output load as that of the circCell concerned.
Fig. 11. Pseudo code of libSlewDeg calculation
Using the parameters described above, it is possible to calculate degradDelay and
degradSlew. These parameters represent the degradation of the delay and of the slew,
respectively, caused by the hypothetical replacement of the circuit cell by the
substitute candidate cell under analysis. The delay degradation is libDelayDeg minus
circDelayDeg, the delay of the original circuit cell under the same degraded input, and
the slew degradation is libSlewDeg minus circSlewDeg, the corresponding output slew of
the original cell in the circuit.
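As a small illustration of these quantities, the sketch below follows the pseudo code of Fig. 11 and Fig. 12 and the worked example: the degraded input slew is the circuit slew plus the limSlew of the fan-in cell, and each degradation is the substitute value minus the value of the original cell. Taking the worst (largest) limSlew among several fan-in cells is an assumption drawn from the worked example, and the function names are hypothetical.

```python
from decimal import Decimal


def degraded_input_slew(circuit_input_slew, fanin_lim_slews):
    """Input slew of a cell after adding the worst limSlew of its fan-in cells."""
    # A cell driven only by primary inputs has no fan-in degradation.
    worst = max(fanin_lim_slews, default=Decimal("0"))
    return circuit_input_slew + worst


def degradation(lib_value_deg, circ_value_deg):
    """degradDelay or degradSlew: substitute value minus original-cell value."""
    return lib_value_deg - circ_value_deg


input_slew_deg = degraded_input_slew(Decimal("0.1"), [Decimal("0.1"), Decimal("0.2")])
degrad_delay = degradation(Decimal("0.5"), Decimal("0.2"))
degrad_slew = degradation(Decimal("0.4"), Decimal("0.2"))
print(input_slew_deg, degrad_delay, degrad_slew)  # 0.3 0.3 0.2
```

`Decimal` is used only to keep the illustrative arithmetic exact; the numbers themselves are arbitrary.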
3.7. Calculating Probable cells list and Replacing Cells
The probable cells list is determined through a series of tests on the possible
candidates list. These tests must ensure that a new cell placed in the circuit introduces
no timing violations. The tests rely on heuristic parameters that were found through
exhaustive simulations in order to achieve the best power savings.
Fig. 12. Pseudo code for determining the probable cells list
The tests consist of two main criteria for including a substitute candidate in this list.
The first ensures that the delay degradation of the substitute candidate does not exceed
the delay budget of the cell in the circuit, that is, the amount of delay degradation
allowed as a result of replacing that cell. The second concerns the slew degradation: a
candidate is excluded from the probable cells list if its slew degradation is larger than
the limSlew of the cell in the circuit.
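The two criteria can be sketched as a per-arc check in Python. This is a hedged illustration: the tuple layout, the candidate names and the picosecond values are hypothetical, and a candidate is assumed to be rejected as soon as any of its timing arcs violates either criterion.

```python
def passes_tests(arcs, delay_budget, lim_slew):
    """arcs: (lib_delay_deg, circ_delay_deg, lib_slew_deg, circ_slew_deg) per timing arc."""
    for lib_delay, circ_delay, lib_slew, circ_slew in arcs:
        degrad_delay = lib_delay - circ_delay
        degrad_slew = lib_slew - circ_slew
        if degrad_delay > delay_budget or degrad_slew > lim_slew:
            return False  # one violating arc is enough to reject the candidate
    return True


def probable_cells(candidates, delay_budget, lim_slew):
    """Return the names of the candidates whose arcs all satisfy both criteria."""
    return [name for name, arcs in candidates.items()
            if passes_tests(arcs, delay_budget, lim_slew)]


# Hypothetical candidates; all values are integers in picoseconds.
candidates = {
    "cand_a": [(120, 100, 90, 80), (130, 100, 95, 80)],  # second arc: delay +30 > 25
    "cand_b": [(115, 100, 85, 80), (120, 100, 95, 80)],  # both arcs within limits
}
probable = probable_cells(candidates, delay_budget=25, lim_slew=20)
print(probable)  # ['cand_b']
```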
Pseudo code of Fig. 11 (libSlewDeg calculation):
loadCircCell = takeLoadCircCell (circCell);
inputSlewDeg = takeCircSlew (circSlew) + limSlew (takeInputCell (circCell));
libSlewDeg (circCell, loadCircCell, inputSlewDeg) {
    while (*arcLibCell = iterator.getnext()) {
        slew/delayArcVector = getTimingArcSlew/Delay (libCell);
    }
}
Pseudo code of Fig. 12 (determining the probable cells list):
probablesList (circCell, substituteList) {
    orDelay = orDelay (circCell);
    for (iterator = substituteList.begin(); iterator != substituteList.end(); iterator++) {
        accepted = true;
        while (*arcLibCell = iterator.getnext()) {
            degradDelay = libDelayDeg - circDelayDeg;
            degradSlew = libSlewDeg - circSlewDeg;
            // a candidate is rejected if any timing arc violates either criterion
            if ((degradDelay > orDelay) || (degradSlew > limSlew)) {
                accepted = false;
                break;
            }
        }
        if (accepted) probablesList.insert (*iterator);
    }
}
Once the list of probable cells has been created, a library cell must be selected. The
criterion for choosing, among the probable cells, the cell that will replace the cell in
the circuit is simple: the algorithm visits all the cells in the list of probable cells
and chooses the one with the lowest leakage power consumption. The leakage power
consumption of each probable cell is found in the standard cell library file.
All cell replacements are made after every cell of the circuit has been analyzed. Once
the library cell that will replace each cell of the original circuit has been chosen, a
list of pairs, each composed of a circuit cell and its corresponding substitute cell, is
created. When this list is complete, all cell replacements are performed at once. Static
timing analysis (STA) is performed only twice: at the beginning of the process and at the
end of the algorithm. Because executing STA during the algorithm would be extremely
costly, this is essential to one of the goals of this work, namely a tool whose run-time
is compatible with design constraints and that fits projects with tight time-to-market.
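The selection and deferred replacement described above can be sketched as follows. The dictionaries, instance names, cell names and leakage values are hypothetical; the sketch only shows the two-phase structure (record the pairs first, apply them all at once afterwards).

```python
def choose_substitute(probable, leakage):
    """Pick the probable cell with the lowest leakage power, or None if there is none."""
    return min(probable, key=leakage.__getitem__) if probable else None


def plan_and_apply(netlist, probable_by_cell, leakage):
    # Phase 1: analyse every cell and only record the (circuit cell, substitute) pairs.
    pairs = []
    for instance, probable in probable_by_cell.items():
        chosen = choose_substitute(probable, leakage)
        if chosen is not None:
            pairs.append((instance, chosen))
    # Phase 2: apply all replacements at once, after the analysis is complete.
    new_netlist = dict(netlist)
    for instance, chosen in pairs:
        new_netlist[instance] = chosen
    return new_netlist, pairs


# Hypothetical data: instance -> library cell, and leakage in arbitrary units.
netlist = {"u1": "INV_X2_LVT", "u2": "NAND2_X2_LVT", "u3": "NOR2_X1_LVT"}
probable_by_cell = {
    "u1": ["INV_X2_HVT", "INV_X1_HVT"],
    "u2": ["NAND2_X2_HVT"],
    "u3": [],
}
leakage = {"INV_X2_HVT": 4.0, "INV_X1_HVT": 2.5, "NAND2_X2_HVT": 5.0}
new_netlist, pairs = plan_and_apply(netlist, probable_by_cell, leakage)
print(pairs)  # [('u1', 'INV_X1_HVT'), ('u2', 'NAND2_X2_HVT')]
```

Keeping the original netlist untouched until the second phase mirrors the idea that no intermediate timing analysis is needed while the cells are being analysed.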
The circuit resulting from the complete execution of the algorithm is a netlist
compatible with commercial tools such as those from Cadence. Since this is a gate-level
approach that reduces leakage power consumption by replacing LVT cells with HVT cells,
the algorithm generates a new netlist similar to the original, but with some cells
replaced. Moreover, the changes in the circuit do not affect the placement and routing of
the gates, so the information listed in the constraint files generated during the design
flow remains unchanged.
In order to illustrate the proposed method in detail, a step-by-step explanation is given
below, using as an example the Dual-Vth combinational circuit illustrated in Fig. 13.
Fig. 13: Hypothetical circuit with HVT and LVT cells for exemplification
[Figure: circuit with cells 1 to 6 and labels I to IV; legend: HVT, LVT, input slew, degraded slew.]
1. Sorting the cells of the circuit from primary input to primary output. In the example the
sequence is 1, 2, 3, 4, 5, and 6.
2. Creating the circCellList list. Assume that only cells 1, 2 and 3 have slack greater
than the parameter folgSlack = 3:
Cell   1    2    3    4    5    6
Slack  4    4    4    1    0    2
Slew   0.1  0.1  0.1  0.2  0.2  0.1
3. Iterating through the cells of the circuit (all cells);
4. Creating the substitute cell list of cell 1 (cell 1 is in circCellList);
5. Assuming there are cells in the library that may be part of this list, execute the method for
creating the initial limSlew. Take the average of the output slews of the substitute cells with the
degraded input (in this case the input slew is unchanged because the cell is at an input of the
circuit) and the same output load as cell 1;
limSlew 1 (initial) = 0.1
6. Calculating circDelayDeg and circSlewDeg of cell 1. With a degraded input signal (unchanged
here, as cell 1 is at the input of the circuit), the delay and the output slew of cell 1 are
calculated;
circDelayDeg (inputSlew, outputLoad) = 0.1
circSlewDeg (inputSlew, outputLoad) = 0.1
7. For each cell in the substitute cell list, libDelayDeg and libSlewDeg are calculated.
Assuming that library cells I and II can replace cell 1:
libDelayDeg I (inputSlew, outputLoad) = 0.3
libSlewDeg I (inputSlew, outputLoad) = 0.3
libDelayDeg II (inputSlew, outputLoad) = 0.2
libSlewDeg II (inputSlew, outputLoad) = 0.2
8. Considering the values obtained in step 7, the degradSlew and degradDelay of cell 1 are
calculated;
degradSlew I = 0.3 - 0.1 = 0.2
degradDelay I = 0.3 - 0.1 = 0.2
degradSlew II = 0.2 - 0.1 = 0.1
degradDelay II = 0.2 - 0.1 = 0.1
9. The orDelay of cell 1 is calculated;
orDelay (cell 1) = 0.1
10. Test to create the probable cell list. If the delay degradation does not exceed orDelay and
the slew degradation does not exceed limSlew, the substitute cell is included in the probable
cell list;
degradSlew I > limSlew: not included in the probable list
degradDelay I > orDelay: not included in the probable list
degradSlew II = limSlew: OK
degradDelay II = orDelay: OK
11. If the probable cell list is not empty, the cell with the lowest power consumption is
chosen to replace cell 1;
Library cell II was chosen
12. Assuming that there is a substitute for cell 1, the limSlew of cell 1 is updated. It will be
taken into account during the analysis of cell 4. Its value is libSlewDeg minus the
original slew of cell 1;
limSlew 1 = libSlewDeg - initial slew (cell 1) = 0.2 - 0.1 = 0.1
13. Analyzing cell 2. Since cell 2 has the same conditions as cell 1, its treatment is
similar, so cell 2 also has a substitute and its limSlew is updated;
After the analysis, limSlew 2 is updated to 0.2
14. Analyzing cell 3. As one of its inputs is connected to the output of cell 2, the worst slew at
the input must be taken into account;
15. As cell 3 is present in circCellList, its substitute cell list is created;
16. The initial limSlew is created as for cells 1 and 2;
With a degradation of 0.2 at the input
limSlew 3 (initial) = 0.1
17. Calculating circDelayDeg and circSlewDeg of cell 3. With a degraded input (governed by
the limSlew generated and updated for cell 2), the degraded delay and output slew of
cell 3 are calculated;
circDelayDeg 3 (inputSlew, outputLoad) = 0.2
circSlewDeg 3 (inputSlew, outputLoad) = 0.2
18. For each cell in the substitute cell list, libDelayDeg and libSlewDeg are calculated.
Supposing two substitutes, I and II:
libDelayDeg I (inputSlew, outputLoad) = 0.5
libSlewDeg I (inputSlew, outputLoad) = 0.5
libDelayDeg II (inputSlew, outputLoad) = 0.4
libSlewDeg II (inputSlew, outputLoad) = 0.4
19. The degradSlew and degradDelay of cell 3 are calculated;
degradSlew I = 0.5 - 0.2 = 0.3
degradDelay I = 0.5 - 0.2 = 0.3
degradSlew II = 0.4 - 0.2 = 0.2
degradDelay II = 0.4 - 0.2 = 0.2
20. The orDelay of cell 3 is calculated;
Assuming that orDelay (cell 3) = 0.1
21. Test to create the probable cell list. If the delay degradation does not exceed orDelay and
the slew degradation does not exceed limSlew, the substitute cell is included in the probable
cell list;
degradSlew I > limSlew: not included in the probable list
degradDelay I > orDelay: not included in the probable list
degradSlew II > limSlew: not included in the probable list
degradDelay II > orDelay: not included in the probable list
22. No substitute cell passes the tests, so cell 3 is not replaced.
23. The limSlew is updated. In this case the parameter is governed by circSlewDeg
minus the original output slew of cell 3 (worst arc);
limSlew 3 = circSlewDeg - original output slew of cell 3 = 0.2 - 0.1 = 0.1
24. Analyzing cell 4. As this is an HVT cell, it is treated differently: the
algorithm is only concerned with propagating the possible delay degradations of its
fan-in cone.
25. Calculating circDelayDeg and circSlewDeg of cell 4. With a degraded input (governed by
the limSlew generated and updated for cell 2), the degraded delay and output slew of
cell 4 are calculated;
circSlewDeg 4 (inputSlew, outputLoad) = 0.2
26. The limSlew is updated. This parameter is governed by circSlewDeg
minus the original output slew of cell 4 (worst arc);
limSlew 4 = circSlewDeg - original output slew of cell 4 = 0.2 - 0.1 = 0.1
27. Analyzing cell 5. At the inputs of the cell there are the output degradations of cell
I1 (governed by its substitute) and of cell 3 (governed by the propagation of its input; this cell
will not be replaced);
28. Choosing the worst degradation among the inputs of cell 5 for the calculation of its
limSlew.
Worst degradation, generated by cell 2 = 0.2
29. As this is an HVT cell, it will not be replaced and it is treated like cell 4;
circSlewDeg 5 (inputSlew, outputLoad) = 0.3
limSlew 5 = circSlewDeg - original output slew of cell 5 = 0.3 - 0.2 = 0.1
30. Analyzing cell 6. Although it is an LVT cell, it is not part of circCellList, so it is not
treated like the other LVT cells of the circuit. It receives a treatment similar to the one
given to HVT cells.
31. With the worst limSlew between cells 3 and 4, the degradation at its input is calculated in
order to calculate circSlewDeg;
Worst degradation, generated by cell 2 or cell 4 = 0.1
32. With circSlewDeg, its limSlew is created as the difference between
circSlewDeg and the original output slew of cell 6.
circSlewDeg 6 (inputSlew, outputLoad) = 0.2
limSlew 6 = circSlewDeg - original output slew of cell 6 = 0.2 - 0.1 = 0.1
Once all the steps above have been executed, the new circuit has cells 1 and 2 replaced
by their corresponding substitutes. A reduction in power can therefore be obtained even
for a circuit synthesized with two threshold voltages.
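The arithmetic of the example for cells 1 and 3 can be checked with a short Python sketch. The values are those listed in the steps above; the function name `analyse` and the tuple layout are hypothetical, and `Decimal` is used only to keep the subtractions exact.

```python
from decimal import Decimal as D


def analyse(circ_delay_deg, circ_slew_deg, or_delay, lim_slew, substitutes):
    """Return the substitutes whose delay and slew degradations stay within the limits."""
    accepted = []
    for name, lib_delay_deg, lib_slew_deg in substitutes:
        degrad_delay = lib_delay_deg - circ_delay_deg
        degrad_slew = lib_slew_deg - circ_slew_deg
        if degrad_delay <= or_delay and degrad_slew <= lim_slew:
            accepted.append(name)
    return accepted


# Cell 1 (steps 6 to 11): substitute I is rejected, substitute II is accepted.
cell1 = analyse(D("0.1"), D("0.1"), D("0.1"), D("0.1"),
                [("I", D("0.3"), D("0.3")), ("II", D("0.2"), D("0.2"))])
# Cell 3 (steps 17 to 22): no substitute passes the tests.
cell3 = analyse(D("0.2"), D("0.2"), D("0.1"), D("0.1"),
                [("I", D("0.5"), D("0.5")), ("II", D("0.4"), D("0.4"))])

# limSlew updates (steps 12 and 23).
lim_slew_1 = D("0.2") - D("0.1")  # libSlewDeg of the chosen substitute minus original slew
lim_slew_3 = D("0.2") - D("0.1")  # circSlewDeg minus original output slew
print(cell1, cell3, lim_slew_1, lim_slew_3)  # ['II'] [] 0.1 0.1
```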
The circuit generated by the proposed algorithm is shown in Fig. 14 below.
Fig. 14: Circuit resulting after the steps of the algorithm.
[Figure: circuit with cells 1 to 6 and labels I to IV; legend: HVT, LVT, input slew, degraded slew.]
4. EXPERIMENTAL RESULTS
In order to optimize the static power consumption, the proposed algorithm was applied to
several combinational circuits of the ISCAS85 benchmark. The algorithm was implemented in
C/C++ using the OpenAccess infrastructure [Guiney and Leavitt 2006]. The synthesized
circuits of this benchmark vary in size from 4 cells to 2327 cells, using a 90 nm TSMC
standard cell library. The multi-Vth designs were created with the tcbn90lphphvttc and
tcbn90lphphvttc libraries, which characterize the HVT and LVT cells, respectively.
To run the algorithm on all the benchmark circuits and obtain results as consistent as
possible with a real integrated circuit design flow, a timing constraint was imposed on
each circuit. The timing constraint of each circuit was determined from the cycle time of
its most critical path, so that the input circuit has no cell with negative slack. The
most conservative timing constraints were determined with an accuracy of 0.1 ns.
The results were satisfactory, with an average static power reduction of 26.88% of the
total static power of the original circuit. The maximum static power saved reached
39.28%, as illustrated in Fig. 15. Except for circuit c17, which was discarded due to its
very small size, all circuits presented a reduction in their power consumption. Another
important point is that the goal of developing a tool that fits the conventional design
flow was achieved.
Since the main objective is the reduction in static power, the heuristic parameters were
set to provide the maximum reduction in static power, while ensuring that no timing
penalties are inserted in the circuit. For example, the folgSlack parameter, which
guarantees that the propagation of slews and delays will not result in timing violations,
must be chosen as small as possible, provided that the optimization process does not
create any cell with negative slack. Table I shows the results for the ISCAS85 benchmark
circuits.
Table I: Experimental Results
Circuit   Total cells   Replaced cells (%)   folgSlack (ps)   Cycle time (ns)   Leakage reduction (%)
C499      146           11.6                 0                1.9               5.00
C1355     146           24.7                 9                1.9               11.81
C880      167           35.3                 28               1.3               39.28
C2670     287           30.7                 33               2.0               24.50
C3540     522           32.0                 84               3.1               31.38
C5315     616           33.1                 17               2.8               29.10
C7552     849           39.8                 46               4.9               31.87
C6288     2357          20.6                 155              5.6               15.23
Average                 29.5                                                    21.3
The results presented in Table I show that, as expected, the circuits with a larger
number of gates have a higher cycle time due to the increased number of parasitic
interconnections in the circuit. Thus, their performance decreases with increasing size.
The folgSlack parameter also tends to grow with the size of the circuits. As outlined
before, this parameter compensates for possible timing violations during signal
propagation caused by the degradations introduced when a cell is replaced. As the
circuits grow, the value of this parameter tends to grow as well, because it must prevent
the circuit from having timing violations.
Fig. 15. Static power savings
Fig. 16 shows the total number of cells replaced by the algorithm in each circuit. The
number of cells replaced is proportional to the number of cells in the circuit. This
shows that the influence on timing of replacing a cell in a small circuit is much more
significant than in a circuit with a large number of cells.
Fig. 16. Total number of cells replaced
5. CONCLUSIONS AND FUTURE WORK
A Dual-Vth algorithm for reducing static power consumption, with a run-time viable for
commercial integrated circuit projects, was presented in this paper. The static power
savings reached up to 39.28% of the total static power consumption of the original
circuit, considering the ISCAS85 benchmark circuits.
As future work, the algorithm will be integrated with a dynamic optimization approach for
low-power constraints that has been developed in parallel with this work.
[Fig. 15 plot: "Static Power Reduction" (series: fpower). x-axis: ISCAS85 circuits (C499, C1355, C880, C2670, C3540, C5315, C7552, C6288); y-axis: Leakage reduction (%), 0 to 45.]
[Fig. 16 plot: "Number of changed cells" (series: fpower). x-axis: ISCAS85 circuits (C499, C1355, C880, C2670, C3540, C5315, C7552, C6288); y-axis: Number of cells, 0 to 500.]
REFERENCES
BRGLEZ, F. AND FUJIWARA, H. 1985. A neutral netlist of 10 combinational benchmark circuits and a target
translator in FORTRAN. In International Symposium on Circuits and Systems, 663-698.
CHI, J. C., LEE, H. H., TSAI, S. H., AND CHI, M. C. 2007. Gate level multiple supply voltage assignment algorithm
for power optimization under timing constraint. In IEEE Transactions on Very Large Scale Integration
(VLSI) Systems. 15, 6, 637-648.
ELAKKUMANAN, P., THYAGARAJAN, K., PRASAD, K., AND SRIDHAR, R. 2005. Optimal Vth assignment and buffer
insertion for simultaneous leakage and glitch minimization through integer linear programming (ILP). In
48th Midwest Symposium on Circuits and Systems, 2, 1880-1883.
GUINEY, M. AND LEAVITT, E. 2006. An introduction to OpenAccess: an open source data model and API for IC
design. In Proceedings of the 2006 Asia and South Pacific Design Automation Conference, 434-436.
GUPTA, S., SINGH, J., AND ROY, A. 2008. A novel cell-based heuristic method for leakage reduction in multi-
million gate VLSI designs. In Proceedings of the 9th International Symposium on Quality Electronic
Design, 526-530.
INTERNATIONAL TECHNOLOGY ROADMAP FOR SEMICONDUCTORS (ITRS). 2008. [ONLINE]. Available:
http://public.itrs.net
JAFFARI, J., AFZALI-KUSHA, A. 2004. New dual-threshold voltage assignment technique for low-power digital
circuits. In Proceedings of the 16th International Conference on Microelectronics (ICM), pp. 413-416
KEATING, M., FLYNN, D., AITKEN, R., GIBBONS, A., SHI K. 2007. Low Power Methodology Manual for System-
on-Chip Design. Springer Publications.
KETKAR, M. AND SAPATNEKAR, S. S. 2002. Standby power optimization via transistor sizing and dual threshold
voltage assignment. In Proceedings of the 2002 IEEE/ACM international Conference on Computer-Aided
Design (ICCAD '02). 375-378.
KIM, N. S., AUSTIN, T., BLAAUW, D., MUDGE, T., FLAUTNER, K., HU, J. S., IRWIN, M. J., KANDEMIR, M., AND
NARAYANAN, V. 2003. Leakage current: Moore's law meets static power. In IEEE Computer Society, 36,
12, 68-75.
LEE, D., BLAAUW, D. 2003. Static leakage reduction through simultaneous threshold voltage and state
assignment. In Proceedings of the Design Automation Conference (DAC'03), pp. 191-194.
LEE, D., BLAAUW, D. AND SYLVESTER, D. 2004. Gate Oxide Leakage Current Analysis and Reduction for
VLSI Circuits. In: IEEE Transactions on VLSI Systems, vol. 12, no. 2, pp. 155-166.
PEDRAM, M., AND RABAEY, J.M. 2002. Power Aware Design Methodology, Kluwer Academic Pub.
SILVA-FILHO, A.G., CORDEIRO, F.R., SANTANNA, R.E., AND LIMA, M.E. 2006. Heuristic for Two-Level Cache
Hierarchy Exploration Considering Energy Consumption and Performance, In: Int. Circuit and System
Design, Power and Timing Modeling, Optimiz. and Simulation (PATMOS), pp. 75-83.
SILVA-FILHO, A.G., AND LIMA, S.M.L. 2008. Energy Consumption Reduction Mechanism by Tuning Cache
Configuration using NIOS II Processor, In: IEEE International SOC Conference (SOCC), pp. 291-294.
WEI, L., CHEN, Z., ROY, K., JOHNSON, M. C., YE, Y., AND DE, V. K. 1999. Design and optimization of dual-
threshold circuits for low-voltage low-power applications. In IEEE Transactions on Very Large Scale
Integration (VLSI) Systems, 7, 1, 16-24.
WEI, L., CHEN, Z., ROY, K., YE, Y., AND DE, V. 1999. Mixed-Vth (MVT) CMOS circuit design methodology for
low power applications. In Proceedings of the 36th Annual ACM/IEEE Design Automation Conference,
430-435