Low Power CAD: Trends and Challenges
Massoud Pedram
Department of EE - Systems
University of Southern California
Los Angeles, CA 90089
Email: pedram@pollux.usc.edu
Nikkei Microdevices
Nikkei Business Publications Inc.
October 1994
Abstract
Essential elements of a low power design environment include means of analyzing the power dissipation of a proposed or an existing design, mechanisms for minimizing the power consumption
when needed, and techniques to explore the impact of design trade-offs on the power consumption, area and performance of a design. This paper describes the state of the art of
CAD tools and methodologies, points to references where more in-depth technical
information in specific fields can be found, and highlights the research areas.
1 Introduction
Low power, yet high-throughput and computationally intensive, circuits are becoming a
critical application domain. One driving factor behind this trend is the growing class of
personal computing devices (digital pens, portable desktops, audio- and video-based multimedia products) as well as wireless communications and imaging systems (personal digital
assistants, personal communicators, smart cards) which demand high-speed computations,
complex functionalities and often real-time processing capabilities with low power consumption. Another crucial driving factor is that excessive power consumption is becoming the
limiting factor in integrating more transistors on a single chip or on a multiple-chip module.
Unless power consumption is dramatically reduced, the resulting heat will limit the feasible
packing and performance of VLSI circuits and systems. Indeed, circuits synthesized for low
power are also less susceptible to run time failures.
In rising to the challenge to reduce power, the semiconductor industry has adopted a
multifaceted approach, attacking the problem on three fronts:
Reducing chip and packaging capacitance through process scaling and advanced interconnect substrates such as MCM. This approach can be effective but is very expensive.
Supply voltage scaling. There are some immediate advantages to this approach but
they are not generally scalable without incurring the cost of new IC fab processing.
Furthermore, supply scaling may be contrary to the industry "pull" for voltage standards and may run into fundamental problems when issues such as signal-to-noise are
considered.
Level           Example Techniques
Algorithmic     Correct data representation and choice of algorithms
System          Power mode management
                Application of various energy recovery techniques
                Use of optimum supply voltage and level converters for modules
                Task partitioning between hardware modules and programmable processors
Architectural   Concurrency increasing transformations
                Pipelining, scheduling, module and register assignment
                Design of application-specific instruction sets
                Use of on-chip cache, Sub 1-volt swing bus architectures
Logic           Power-sensible retiming and state assignment
                Two- and multi-level logic optimization targeting low power dissipation
                Path balancing, technology mapping and pin assignment for low power
Physical        Low power physical partitioning, floorplanning, placement and routing
                Transistor and/or wire sizing, library design
Table 1: Examples of power reduction mechanisms at various levels of abstraction
inside gates during technology mapping. Gate resizing, signal-to-pin assignment and I/O
encoding can further reduce the power consumption. At the physical design level, power
may be reduced by appropriate netlist partitioning, placement, global routing, wire sizing
and clock tree generation (see Table 1).
The design for low power problem cannot be solved without good power prediction
and optimization tools (see Figure 1). The remainder of this paper describes the CAD tools
and methodologies required to effect efficient design for low power during logic synthesis
and layout optimization. As low power design is a relatively new field, the paper is targeted
at a wide audience to achieve the following:
Convey an understanding of the breadth and depth of the problem.
Explain the state of the art in CAD tools and methodologies.
Describe the needs and challenges for new CAD tools to support low power design.
[Figure 1. Diagram blocks: Hardware Description; System-Level and Behavioral Synthesis; Behavioral Power Prediction and/or Simulation; Logic Synthesis and Physical Design; Logic-Level Power Prediction and/or Simulation; Final Layout.]
negligible values. With reduced power supply and device threshold voltages, the subthreshold
current will however become more pronounced. In addition, at short channel lengths, the
subthreshold current also becomes exponentially dependent on drain voltage instead of being
independent of VDS (see [15] for a recent analysis). The subthreshold current will remain
10^2 - 10^5 times smaller than the "on current" even at submicron device sizes.
The short-circuit power consumption for an inverter gate is proportional to the gain of
the inverter, the cubic power of supply voltage minus device threshold, the input rise/fall
time, and the operating frequency [41]. The maximum short-circuit current flows when there
is no load; this current decreases with the load. If gate sizes are selected so that the input
and output rise/fall times are about equal, the short-circuit power consumption will be less
than 15% of the dynamic power consumption. If, however, design for high performance is
taken to the extreme where large gates are used to drive relatively small loads, then there
will be a stiff penalty in terms of short-circuit power consumption.
It is widely accepted that the short-circuit and subthreshold currents in CMOS circuits
can be made small with proper circuit and device design techniques. The dominant source of
power dissipation is thus the charging and discharging of the node capacitances (also referred
to as the dynamic power dissipation) and is given by:
\[
P = \frac{V_{dd}^{2}}{2\,T_{cycle}} \sum_{g} C_g \, E_g(sw)
\]
where Vdd is the supply voltage, Tcycle is the clock cycle time, Cg is the capacitance seen by
gate g and Eg(sw) is the expected number of transitions at the output of g per clock cycle.
Calculation of Eg(sw) is difficult as it depends on (1) the input patterns and the sequence
in which they are applied, (2) the delay model used and (3) the circuit structure.
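As a minimal sketch of how the dynamic power expression is evaluated once Cg and Eg(sw) are known, the following computes P for a list of gates. The supply voltage, cycle time, capacitances and activities are hypothetical values chosen only for illustration.

```python
def dynamic_power(vdd, t_cycle, gates):
    """Average dynamic power: P = Vdd^2 / (2 * Tcycle) * sum over g of Cg * Eg(sw).

    gates is an iterable of (capacitance_in_farads, expected_transitions_per_cycle).
    """
    switched_capacitance = sum(cap * activity for cap, activity in gates)
    return vdd ** 2 / (2.0 * t_cycle) * switched_capacitance


# Hypothetical design point: 3.3 V supply, 20 ns clock cycle, three gates.
example_gates = [(50e-15, 0.375), (80e-15, 0.5), (120e-15, 0.1)]
power_watts = dynamic_power(3.3, 20e-9, example_gates)
print(f"{power_watts * 1e6:.2f} uW")
```

Because the supply voltage enters quadratically, halving Vdd in this sketch cuts the computed power by a factor of four, while the activity and capacitance terms contribute only linearly.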
Switching activity at the output of a gate depends not only on the switching activities
at the inputs and the logic function of the gate, but also on the spatial and temporal
dependencies among the gate inputs. For example, consider a two-input AND gate g with
independent inputs i and j whose signal probabilities are 1/2; then Eg(sw) = 3/8. Now
suppose it is known that only patterns 00 and 11 can be applied to the gate inputs and that
both patterns are equally likely; then Eg(sw) = 1/2. Alternatively, assume that it is known
that every 0 applied to input i is immediately followed by a 1 while every 1 applied to input
j is immediately followed by a 0; then Eg(sw) = 4/9. The first of these two constrained cases is an example of spatial
correlations between gate inputs while the second illustrates temporal correlations on
gate inputs.
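The three values quoted for the AND gate can be checked by enumerating pairs of consecutive input vectors. The sketch below does this with exact fractions. For the temporal case it assumes that i and j are independent of each other, that the transitions not fixed by the text (a 1 on i, a 0 on j) go to 0 or 1 with equal probability, and that the inputs are in steady state; these modeling choices are assumptions made here for illustration.

```python
from fractions import Fraction
from itertools import product

HALF = Fraction(1, 2)


def expected_switching(pair_probability):
    """Expected output transitions per cycle for a two-input AND gate.

    pair_probability(i0, j0, i1, j1) is the joint probability of applying
    input vector (i0, j0) in one cycle and (i1, j1) in the next.
    """
    total = Fraction(0)
    for i0, j0, i1, j1 in product((0, 1), repeat=4):
        if (i0 & j0) != (i1 & j1):
            total += pair_probability(i0, j0, i1, j1)
    return total


def independent(i0, j0, i1, j1):
    # Inputs independent of each other and across cycles, each 1 with probability 1/2.
    return HALF ** 4


def spatially_correlated(i0, j0, i1, j1):
    # Only patterns 00 and 11 occur, equally likely, independent across cycles.
    return Fraction(1, 4) if (i0 == j0 and i1 == j1) else Fraction(0)


# Assumed Markov chains for the temporal case, indexed as NEXT[current][next].
NEXT_I = {0: {0: Fraction(0), 1: Fraction(1)}, 1: {0: HALF, 1: HALF}}
NEXT_J = {0: {0: HALF, 1: HALF}, 1: {0: Fraction(1), 1: Fraction(0)}}
STEADY_I = {0: Fraction(1, 3), 1: Fraction(2, 3)}  # stationary distribution of NEXT_I
STEADY_J = {0: Fraction(2, 3), 1: Fraction(1, 3)}  # stationary distribution of NEXT_J


def temporally_correlated(i0, j0, i1, j1):
    return STEADY_I[i0] * NEXT_I[i0][i1] * STEADY_J[j0] * NEXT_J[j0][j1]


activity_independent = expected_switching(independent)
activity_spatial = expected_switching(spatially_correlated)
activity_temporal = expected_switching(temporally_correlated)
print(activity_independent, activity_spatial, activity_temporal)  # 3/8 1/2 4/9
```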
Based on the delay model used, the power estimation techniques could account for steady-state transitions (which consume power, but are necessary to perform a computational task)
and/or hazards and glitches (which dissipate power without doing any useful computation).
It is shown in [1] that although the mean value of the ratio of hazardous component to the
total power dissipation varies significantly with the considered circuits (from 9% to 38%),
the hazard/glitch power dissipation cannot be neglected in static CMOS circuits. Indeed,
an average of 15-20% of the total power is dissipated in glitching. The glitch power problem
is likely to become even more important in future scaled technology.
In real networks, statistical perturbations of circuit parameters may change the propagation delays and produce changes in the number of transitions because of the appearance
or disappearance of hazards. It is therefore useful to determine the change in the signal
transition count as a function of these statistical perturbations. Variation of gate delay parameters may change the number of hazards occurring during a transition as well as their
duration. For this reason, it is expected that the hazardous component of power dissipation
is more sensitive to IC parameter fluctuations than the power strictly required to perform
the transition between the initial and final state of each node.
The major difficulty in computing the signal probabilities lies in the reconvergent fanout nodes.
Indeed, if a network consists of simple gates and has no reconvergent fanout stems (or nodes),
then the exact signal probabilities can be computed during a single post-order traversal of
the network. For networks with reconvergent fanout, the problem is much more difficult.
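The post-order computation for fanout-free networks can be sketched as follows. The tuple-based network encoding and the gate set (AND, OR, NOT) are hypothetical choices made for this illustration. The second example shows how the same independence assumption gives a wrong answer once a signal reconverges.

```python
from fractions import Fraction


def signal_probability(node, input_probs):
    """Post-order evaluation of P(node = 1), treating all fanins of a gate as independent."""
    kind = node[0]
    if kind == "in":
        return input_probs[node[1]]
    child_probs = [signal_probability(child, input_probs) for child in node[1:]]
    if kind == "not":
        return 1 - child_probs[0]
    if kind == "and":
        all_high = Fraction(1)
        for p in child_probs:
            all_high *= p
        return all_high
    if kind == "or":
        none_high = Fraction(1)
        for p in child_probs:
            none_high *= 1 - p
        return 1 - none_high
    raise ValueError(f"unknown gate type: {kind}")


half = Fraction(1, 2)
probs = {"a": half, "b": half, "c": half}

# Fanout-free network: (a AND b) OR (NOT c). The traversal is exact here.
tree = ("or", ("and", ("in", "a"), ("in", "b")), ("not", ("in", "c")))
tree_prob = signal_probability(tree, probs)

# Reconvergent fanout: a AND (NOT a) is always 0, yet the traversal reports 1/4.
reconvergent = ("and", ("in", "a"), ("not", ("in", "a")))
reconvergent_prob = signal_probability(reconvergent, probs)
print(tree_prob, reconvergent_prob)  # 5/8 1/4
```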
as the sum of the power dissipation for all cells in the power vector path.
In summary, accuracy and efficiency are the key requirements for any power analysis
prediction tool. PowerMill and Entice-Aspen are steps in the right direction as they provide
intermediate level simulation that bridges the gaps between circuit-level and switch-level
simulation paradigms (see Table 2).
         With Spatial Correlations            Without Spatial Correlations
         With Temporal    Without Temporal    With Temporal    Without Temporal
         Correlations     Correlations        Correlations     Correlations
Max      0.0463           0.2020              0.2421           0.2478
Mean     0.0115           0.0591              0.0658           0.0969
RMS      0.0185           0.0767              0.0722           0.1103
STD      0.0149           0.0505              0.0960           0.0544
tion if the OBDD representation of the entire circuit can be constructed. Otherwise, a circuit
partitioning scheme which breaks the circuit into blocks for which OBDD representations
can be built is recommended. In this case, the correlation coefficients must be calculated
and propagated from the circuit inputs toward the circuit outputs in order to improve the
accuracy.
Estimation under a Real Delay Model
The above methods only account for steady-state behavior of the circuit and thus ignore hazards and glitches. This section reviews some techniques that examine the dynamic
behavior of the circuit and thus estimate the power dissipation due to hazards and glitches.
In [17], the exact power estimation of a given combinational logic circuit is carried out
by creating a set of symbolic functions such that summing the signal probabilities of the
functions corresponds to the average switching activity at a circuit line x in the original
combinational circuit. The inputs to the created symbolic functions are the circuit input lines
at time instances 0⁻ and ∞. Each function is the exclusive-or of the characteristic functions
describing the logic values of x at two consecutive instances. The major disadvantage of this
estimation method is its exponential complexity. However, for the circuits that this method
is applicable to, the estimates provided by the method can serve as a basis for comparison
among different approximation schemes.
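To make the gap between zero-delay and real-delay estimates concrete, the sketch below exhaustively enumerates input-vector pairs for a small hypothetical circuit, y = (a XOR b) XOR c, under a unit-delay model. This is a brute-force illustration, not the symbolic method of [17]; like that method, however, it is exponential in the number of inputs. Input vectors are assumed uniformly distributed and independent from cycle to cycle.

```python
from fractions import Fraction
from itertools import product


def expected_transitions():
    """Expected transitions per cycle at y = (a XOR b) XOR c.

    Returns (zero_delay, unit_delay). Inputs switch at time 0 and each XOR
    gate has one unit of delay, so y may change at time 1 and again at time 2.
    """
    weight = Fraction(1, 64)  # 8 previous vectors x 8 next vectors, equally likely
    zero_delay = Fraction(0)
    unit_delay = Fraction(0)
    for a0, b0, c0, a1, b1, c1 in product((0, 1), repeat=6):
        n_old, n_new = a0 ^ b0, a1 ^ b1
        y_start = n_old ^ c0  # steady state before the inputs change
        y_mid = n_old ^ c1    # time 1: y sees the new c but still the old a XOR b
        y_end = n_new ^ c1    # time 2: final steady state
        zero_delay += weight * (y_start != y_end)
        unit_delay += weight * ((y_start != y_mid) + (y_mid != y_end))
    return zero_delay, unit_delay


zero_delay_activity, unit_delay_activity = expected_transitions()
print(zero_delay_activity, unit_delay_activity)  # 1/2 1
```

In this toy circuit the unit-delay activity at y is twice the zero-delay value; the difference is the contribution of the spurious transitions caused by the unequal arrival times at the second gate.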
The concept of a probability waveform is introduced in [5]. This waveform consists of a
sequence of transition edges or events over time from the initial steady state (time 0⁻) to the
final steady state (time ∞) where each event is annotated with an occurrence probability.
The probability waveform of a node is a compact representation of the set of all possible
logical waveforms at that node. Given these waveforms, it is straightforward to calculate
the switching activity of x which includes the contribution of hazards and glitches. Given
such waveforms at the circuit inputs and with some convenient partitioning of the circuit, the
authors examine every sub-circuit and derive the corresponding waveforms at the internal
circuit nodes [27].
A tagged probabilistic simulation approach is described in [35] that correctly accounts
for reconvergent fanout and glitches. The key idea is to break the set of possible logical
waveforms at a node n into four groups, each group being characterized by its steady state
values. Next, each group is combined into a probability waveform with the appropriate
steady-state tag. Given the tagged probability waveforms at the input of a simple gate, it is
             Zero-delay    Real Delay
                           Probabilistic    Tagged Probabilistic    Symbolic
Accuracy     --            -                +                       ++
Efficiency   ++            +                -                       --
            Combinational    Sequential Techniques
            Techniques       Non-linear Equations    C-K Equations
Error       30-60%           5-10%                   None
Run Time    1                5-8                     Exponential
Table 5: Comparing various power estimation techniques for sequential logic circuits
algebraic system of equations in terms of the signal probabilities of the present state and
combinational inputs of the FSM. The fixed point (or zero) of this system of equations can
be found using the Picard-Peano or Newton-Raphson iteration. Increasing the number of
variables or the number of equations in the above system results in increased accuracy. For
a wide variety of examples, it is shown that the approximation scheme is within 1-3% of
the exact method, but is orders of magnitude faster for large circuits. Previous sequential
switching activity estimation methods have significantly greater inaccuracies (see Table 5).
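A fixed-point (Picard-Peano style) iteration on state-line probabilities can be sketched as below. The two-bit FSM, its next-state functions and the input probability P_X are hypothetical, and the update equations assume the primary input is independent of the present-state lines; the sketch illustrates the iteration only, not the full equation system of the cited work.

```python
def picard_fixed_point(update, start, tol=1e-12, max_iter=1000):
    """Iterate p <- update(p) until successive iterates differ by less than tol."""
    current = tuple(start)
    for _ in range(max_iter):
        following = update(current)
        if max(abs(a - b) for a, b in zip(following, current)) < tol:
            return following
        current = following
    raise RuntimeError("Picard iteration did not converge")


P_X = 0.5  # assumed probability that the primary input x is 1


def next_state_probs(p):
    """Hypothetical FSM: s1' = x OR s2 and s2' = x AND (NOT s1)."""
    p1, p2 = p
    return (1.0 - (1.0 - P_X) * (1.0 - p2), P_X * (1.0 - p1))


state_probs = picard_fixed_point(next_state_probs, (0.5, 0.5))
print(state_probs)
```

For this example the iteration settles near (0.6, 0.2), which agrees with solving the two equations by hand. Newton-Raphson would typically need fewer iterations but requires the Jacobian of the update.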
[19] considers how changes in the global function of an internal node affect the switching
activity (and thus, the power consumption) of nodes in its transitive fanout. Power consumption in a combinational logic circuit has been reduced by some 10% as a result of this
optimization.
Common Subexpression Extraction
Extraction based on algebraic division (using cube-free primary divisors or kernels) has
proven to be very successful in creating an area-optimized multi-level Boolean network [3].
The kernel extraction procedure is modified in [30] to generate multi-level circuits with low
power consumption. The main idea is to calculate the power savings factor for each candidate
kernel based on how its extraction will affect the loading on its input lines and the amount of
logic sharing. Results show 12% reduction in power compared to a minimum-literal network.
Path Balancing
Balancing path delays reduces hazards/glitches in the circuit which in turn reduces the
average power dissipation in the circuit. This can be achieved before technology mapping by
selective collapsing and logic decomposition or after technology mapping by delay insertion
and pin reordering.
The rationale behind selective collapsing is that by collapsing the fanins of a node into
that node, the arrival time at the output of the node can be changed. Logic decomposition
can be performed so as to minimize the level difference between the inputs of nodes which
are driving high capacitive nodes. The key issue in delay insertion is to use the minimum
number of delay elements to achieve the maximum reduction in spurious switching activity.
Path delays may sometimes be balanced by appropriate signal to pin assignment. This is
possible as the delay characteristics of CMOS gates vary as a function of the input pin which
is causing a transition at the output.
Technology Decomposition
It is difficult to come up with a decomposed network which will lead to a minimum power
implementation after technology mapping since gate loading and mapping information are
unknown at this stage. Nevertheless, it has been observed that a decomposition scheme
which minimizes the sum of the switching activities at the internal nodes of the network is
a good starting point for power-efficient technology mapping.
Given the switching activity value at each input of a complex node, a procedure for AND
decomposition of the node is described in [36] which minimizes the total switching activity in
the resulting two-input AND tree under a zero-delay model. The decomposition procedure
(which is similar to Huffman's algorithm for constructing a binary tree with minimum average weighted path length) is optimal for dynamic CMOS circuits and produces very good
results for static CMOS circuits. It is shown that the low power technology decomposition
reduces the total switching activity in the networks by 5% over the conventional balanced
tree decomposition method.
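A Huffman-style greedy merge can be sketched as follows. The cost model is an assumption made for this illustration: inputs are independent, and the activity of each internal AND node is taken to be proportional to the probability that its output is 1 (a dynamic-logic style measure). The procedure in [36] may differ in its details; the balanced tree that pairs inputs in the given order serves only as a baseline.

```python
import heapq
from fractions import Fraction


def greedy_and_decomposition(input_probs):
    """Repeatedly AND the two lowest-probability signals; return the summed node probabilities."""
    heap = list(input_probs)
    heapq.heapify(heap)
    total = Fraction(0)
    while len(heap) > 1:
        lowest = heapq.heappop(heap)
        second = heapq.heappop(heap)
        merged = lowest * second  # P(AND output = 1) for independent inputs
        total += merged
        heapq.heappush(heap, merged)
    return total


def balanced_and_decomposition(input_probs):
    """Pair signals in the given order, level by level; return the summed node probabilities."""
    level = list(input_probs)
    total = Fraction(0)
    while len(level) > 1:
        next_level = []
        for k in range(0, len(level) - 1, 2):
            merged = level[k] * level[k + 1]
            total += merged
            next_level.append(merged)
        if len(level) % 2:
            next_level.append(level[-1])  # odd signal passes through to the next level
        level = next_level
    return total


probs = [Fraction(1, 10), Fraction(2, 10), Fraction(8, 10), Fraction(9, 10)]
greedy_cost = greedy_and_decomposition(probs)
balanced_cost = balanced_and_decomposition(probs)
print(greedy_cost, balanced_cost)  # 63/1250 943/1250
```

The root node has the same probability in every decomposition, so the difference comes entirely from the intermediate nodes: the greedy order keeps low-probability signals near the leaves.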
Technology Mapping
A successful and efficient solution to the minimum area mapping problem was suggested
in [21] and implemented in programs such as DAGON and MIS. The idea is to reduce
technology mapping to DAG covering and to approximate DAG covering by a sequence of
tree coverings which can be performed optimally using dynamic programming.
The problem of minimizing the average power consumption during technology mapping
is addressed in [36]. This approach consists of two steps. In the first step, power-delay
curves (that capture power consumption versus arrival time tradeoffs) at all nodes in the
network are computed. In the second step, the mapping solution is generated based on
the computed power-delay curves and the required times at the primary outputs. For a
NAND-decomposed tree, subject to load calculation errors, this two-step approach finds the
minimum power mapping satisfying any delay constraint if such a solution exists. Compared
to a technology mapper that minimizes the circuit delay, this procedure leads to an average
of 18% reduction in power consumption at the expense of 16% increase in area without any
degradation in performance.
Figures 2 and 3 compare the results of this power-delay mapper with the area-delay
mapper of [9] for the s832 benchmark circuit. From Figure 2, we can see that the power-delay mapper reduces the number of high switching activity nets at the expense of increasing
the number of low switching activity nets. From Figure 3, we can see that for the remaining
high switching activity nets, the power-delay mapper reduces the average load on the nets.
By taking these two steps, this mapper minimizes the total weighted switching activity and
hence the total power consumption in the circuit.
Figure 3: Average load per net vs. switching rate for s832
Signal-to-Pin Assignment
In general, library gates have pins that are functionally equivalent, which means that
inputs can be permuted on those pins without changing the function of the gate output. These
equivalent pins may have different input pin loads and pin-dependent delays. It is well
known that the signal-to-pin assignment in a CMOS logic gate has a sizable impact on the
propagation delay through the gate.
If we ignore the power dissipation due to charging and discharging of internal capacitances, it becomes obvious that high switching activity inputs should be matched with pins
that have low input capacitance. However, the internal power dissipation also varies as a
function of the switching activities and the pin assignment of the input signals. To find the
minimum power pin assignment for a gate g, one must solve a difficult optimization problem [38]. Alternatively, one can use heuristics. For example, a reasonable heuristic assigns
the signal with the largest probability of assuming a controlling value (zero for NMOS and one
for PMOS) to the transistor near the output terminal of the gate. The rationale is that
this transistor will switch off as often as possible, thus blocking the internal nodes from
non-productive charge and discharge events.
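When internal capacitances are ignored, the matching of high-activity signals to low-capacitance pins reduces to a sort. The sketch below pairs signals and pins this way and compares the result against an exhaustive search over all assignments. The signal names, activities and pin capacitances are hypothetical, and internal power is deliberately left out of the cost.

```python
from itertools import permutations


def assign_pins(activities, pin_caps):
    """Match the highest-activity signal to the lowest-capacitance pin, and so on down."""
    signals = sorted(activities, key=activities.get, reverse=True)
    pins = sorted(pin_caps, key=pin_caps.get)
    return dict(zip(signals, pins))


def switched_input_cap(assignment, activities, pin_caps):
    """Sum of activity times input pin capacitance; internal nodes are ignored."""
    return sum(activities[sig] * pin_caps[pin] for sig, pin in assignment.items())


# Hypothetical three-input gate: transitions per cycle and pin capacitances in fF.
activities = {"x": 0.40, "y": 0.10, "z": 0.25}
pin_caps = {"A": 12.0, "B": 9.0, "C": 15.0}

heuristic = assign_pins(activities, pin_caps)
heuristic_cost = switched_input_cap(heuristic, activities, pin_caps)

# Exhaustive check over all signal-to-pin permutations.
best_cost = min(
    switched_input_cap(dict(zip(activities, perm)), activities, pin_caps)
    for perm in permutations(pin_caps)
)
print(heuristic, heuristic_cost, best_cost)
```

Under this simplified cost the sorted matching coincides with the exhaustive optimum; once internal node capacitances are modeled, that guarantee no longer holds, which is why the general problem is hard.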
Table 6 summarizes the reported or predicted (in one to three years) power reduction as a
result of various logic synthesis steps. The percentage reduction is given based on our current
low power design tools at USC, reported results in the literature, or our preliminary studies.
Note that the power savings at the different design phases compound, in the best case, multiplicatively. For
example, a 10% power savings from network optimization together with a 15% power savings
from state assignment will yield a total power savings of (1 - 0.9 × 0.85) × 100 = 23.5%.
However, more often than not, the total power savings is less, say 20% in this example, since
the various optimizations may adversely affect each other.
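The best-case combination rule used in the example generalizes to any number of steps by multiplying the remaining power fractions. A minimal sketch, with a hypothetical function name:

```python
def combined_savings(percent_savings):
    """Best-case total savings (in percent) when each step scales the remaining power."""
    remaining = 1.0
    for pct in percent_savings:
        remaining *= 1.0 - pct / 100.0
    return (1.0 - remaining) * 100.0


best_case = combined_savings([10, 15])
print(round(best_case, 1))  # 23.5
```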
Optimization                 % Power Reduction
Retiming                     10-15
State Assignment             15-30
Two-Level Minimization       10-25
Network Optimization         10-20
Subexpression Extraction     10-30
Path Balancing               5-10
Technology Decomposition     5-10
Technology Mapping           20-40
Pin Assignment               10-15
Table 6: Power reduction due to logic synthesis
power, reliability), various optimization techniques are used to partition, place, resize and
route gates.
Under a zero-delay model, the switching activity of gates remains unchanged during
layout optimization, and hence, the only way to reduce power dissipation is to decrease the
load on high switching activity gates by proper netlist partitioning and gate placement, gate
and wire sizing, transistor reordering, and routing. At the same time, if a real-delay model is
used, various layout optimization operations influence the hazard activity in the circuit. This
is however a very difficult analysis and optimization problem and requires further research.
Circuit Partitioning
Netlist partitioning is key in breaking a complex design into pieces which are subsequently
optimized and implemented as separate blocks. In general, the off-block capacitances are
much higher than the on-block capacitances (one to two orders of magnitude). It is therefore
essential to develop partitioning schemes that keep the high switching activity nets entirely
within the same block as much as possible. Techniques based on local neighborhood search
(e.g., the FM heuristic [13]) can be easily adapted to do this. In particular, it is adequate to
assign net weights based on the switching activity values of the driver gates and then find a
minimum cost partitioning solution.
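The effect of activity-based net weights can be illustrated with a weighted cut cost and a very simple local search. The sketch below uses greedy pairwise swaps, which keep block sizes fixed; it is a much simpler stand-in for the FM heuristic of [13], not an implementation of it. The four-cell netlist and its weights (driver transitions per 100 cycles) are hypothetical.

```python
def cut_cost(nets, side):
    """Sum of activity weights of nets whose cells lie in more than one block."""
    return sum(weight for cells, weight in nets if len({side[c] for c in cells}) > 1)


def improve_by_swaps(nets, side):
    """Apply the best cost-reducing pairwise swap until no swap lowers the cut cost."""
    side = dict(side)
    while True:
        base = cut_cost(nets, side)
        best_gain, best_pair = 0, None
        cells = sorted(side)
        for idx, u in enumerate(cells):
            for v in cells[idx + 1:]:
                if side[u] == side[v]:
                    continue
                side[u], side[v] = side[v], side[u]  # trial swap
                gain = base - cut_cost(nets, side)
                side[u], side[v] = side[v], side[u]  # undo trial swap
                if gain > best_gain:
                    best_gain, best_pair = gain, (u, v)
        if best_pair is None:
            return side
        u, v = best_pair
        side[u], side[v] = side[v], side[u]


# Hypothetical netlist: (cells on the net, switching-activity weight of the driver).
nets = [({"a", "b"}, 90), ({"c", "d"}, 80), ({"a", "c"}, 10), ({"b", "d"}, 10)]
initial = {"a": 0, "c": 0, "b": 1, "d": 1}

initial_cost = cut_cost(nets, initial)
final = improve_by_swaps(nets, initial)
final_cost = cut_cost(nets, final)
print(initial_cost, final_cost)  # 170 20
```

After the search, the two high-activity nets lie entirely inside one block each and only the low-activity nets cross the block boundary.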
Floorplanning
timing constraints and uses a linear programming solver to find the global optimum solution.
Wire Sizing
Wiresizing and/or driver sizing are often needed to reduce the interconnect delay on
time-critical nets. Wiresizing however tends to increase the load on the driver and hence increase the power dissipation. [10] presents a combined wiresizing and driver sizing approach
which reduces the interconnect delay with only a small increase in the power dissipation.
Experimental results show that for the same delay constraint, this approach reduces the
power by about 10% when compared to the conventional method of driver sizing only. Alternatively, this approach produces delay values which are up to 40% lower when compared
to the conventional method (at the cost of increasing the power dissipation by 25%).
Clock Tree Generation
The clock is the fastest and most heavily loaded net in a digital system. Power dissipation of
the clock net contributes a large fraction of the total power consumption. A two-level clock
distribution scheme based on area pad technology for MCMs is described in [42]. The first
level of the tree is routed on the MCM substrate connecting the clock source to the clock
area pads while the second level tree lies inside each die with the area pads as the source.
The objective is to minimize the load on the clock drivers subject to meeting a tolerable
clock skew. A significant power reduction (70% for one benchmark circuit) over the method
with one clock pad per die is reported by using this scheme.
Table 7 summarizes the reported or predicted (in one to three years) power reduction as a
result of various layout synthesis steps.
Optimization                 % Power Reduction
Circuit Partitioning         10-30
Floorplanning                15-25
Placement                    10-15
Routing                      5-10
Transistor / Gate Sizing     10-30
Wire Sizing                  10-25
Clock Tree Generation        10-30
Table 7: Power reduction due to layout optimization
5 Concluding Remarks
The need for lower power systems is being driven by many market segments. There are
several approaches to reducing power; however, the approach with the highest return on investment
is designing for low power. Unfortunately, designing for low power adds another
dimension to the already complex design problem: the design has to be optimized for power
as well as performance and area.
Optimizing the three axes necessitates a new class of power-conscious CAD tools. The
problem is further complicated by the need to optimize the design for power at all design
phases. The successful development of new power-conscious tools and methodologies requires
a clear and measurable goal. In this context, the research work should strive to reduce power
by 5-10x in three years through design and tool development. That is, any power reduction
through process scaling or voltage scaling should be above and beyond the 5-10x goals.
References
[10] J. Cong, C-K. Koh, and K-S. Leung. Wiresizing with driver sizing for performance and power
optimization. In Proceedings of the 1994 International Workshop on Low Power Design, pages
81-86, April 1994.
[11] C. Deng. Power analysis for CMOS/BiCMOS circuits. In Proceedings of the 1994 International
Workshop on Low Power Design, pages 3-8, April 1994.
[12] S. Ercolani, M. Favalli, M. Damiani, P. Olivo, and B. Ricco. Estimate of signal probability in
combinational logic networks. In First European Test Conf., pages 132-138, 1989.
[13] C. M. Fiduccia and R. M. Mattheyses. A linear-time heuristic for improving network partitions.
In Proceedings of the 19th Design Automation Conference, pages 175-181, June 1982.
[14] J. P. Fishburn and A. E. Dunlop. TILOS: A posynomial programming approach to transistor
sizing. In Proceedings of the IEEE International Conference on Computer Aided Design, pages
326-328, November 1985.
[15] T. A. Fjeldly and M. Shur. Threshold voltage modeling and the subthreshold regime of
operation of short-channel MOSFET's. IEEE Transactions on Electron Devices, 40(1):137-145,
Jan. 1993.
[16] B. J. George, D. Gossain, S. C. Tyler, M. G. Wloka, and G. K. H. Yeap. Power analysis and
characterization for semi-custom design. In Proceedings of the 1994 International Workshop
on Low Power Design, pages 215-218, April 1994.
[17] A. A. Ghosh, S. Devadas, K. Keutzer, and J. White. Estimation of average switching activity in combinational and sequential circuits. In Proceedings of the 29th Design Automation
Conference, pages 253-259, June 1992.
[18] L. H. Goldstein. Controllability/observability of digital circuits. IEEE Transactions on Circuits and Systems, 26(9):685-693, September 1979.
[19] S. Iman and M. Pedram. Multi-level network optimization for low power. In Proceedings of
the IEEE International Conference on Computer Aided Design, November 1994.
[20] S. M. Kang. Accurate simulation of power dissipation in VLSI circuits. IEEE Journal of Solid
State Circuits, 21(5):889-891, Oct. 1986.
[21] K. Keutzer. DAGON: Technology mapping and local optimization. In Proceedings of the
Design Automation Conference, pages 341-347, June 1987.
[22] R. Marculescu, D. Marculescu, and M. Pedram. Logic level power estimation considering spatiotemporal correlations. In Proceedings of the IEEE International Conference on Computer
Aided Design, November 1994.
[23] J. Monteiro, S. Devadas, and A. Ghosh. Retiming sequential circuits for low power. In
Proceedings of the IEEE International Conference on Computer Aided Design, pages 398-402,
November 1993.
[24] J. Monteiro, S. Devadas, and A. Ghosh. Estimation of switching activity in sequential logic
circuits with applications to synthesis for low power. In Proceedings of the 31st Design Automation Conference, page , June 1994.
[25] J. Monteiro, S. Devadas, B. Lin, C-Y. Tsui, M. Pedram, and A. M. Despain. Exact and approximate methods of switching activity estimation in sequential logic circuits. In Proceedings
of the 1994 International Workshop on Low Power Design, pages 117-122, April 1994.
[26] F. N. Najm. Transition density, a stochastic measure of activity in digital circuits. In Proceedings of the 28th Design Automation Conference, pages 644-649, June 1991.
[27] F. N. Najm, R. Burch, P. Yang, and I. Hajj. Probabilistic simulation for reliability analysis of
CMOS VLSI circuits. IEEE Transactions on Computer-Aided Design of Integrated Circuits
and Systems, 9(4):439-450, April 1990.
[28] K. P. Parker and J. McCluskey. Probabilistic treatment of general combinational networks.
IEEE Transactions on Computers, C-24:668-670, Jun. 1975.
[29] S. Rajgopal and G. Mehta. Experiences with simulation-based schematic level current estimation. In Proceedings of the 1994 International Workshop on Low Power Design, pages 9-14,
April 1994.
[30] K. Roy and S. C. Prasad. Circuit activity based logic synthesis for low power reliable operations. IEEE Transactions on VLSI Systems, 1(4):503-513, December 1993.
[31] H. Savoj, R. K. Brayton, and H. J. Touati. Extracting local don't cares for network optimization. In Proceedings of the IEEE International Conference on Computer Aided Design, pages
514-517, November 1991.
[32] S.C. Seth, L. Pan, and V.D. Agrawal. PREDICT - Probabilistic estimation of digital circuit
testability. In Proceedings of the Fault Tolerant Computing Symposium, pages 220-225, June
1985.
[33] A. A. Shen, A. Ghosh, S. Devadas, and K. Keutzer. On average power dissipation and random pattern testability of CMOS combinational logic networks. In Proceedings of the IEEE
International Conference on Computer Aided Design, November 1992.
[34] C-Y. Tsui, M. Pedram, C-A. Chen, and A. M. Despain. Low power state assignment targeting two- and multi-level logic implementations. In Proceedings of the IEEE International
Conference on Computer Aided Design, November 1994.
[35] C-Y. Tsui, M. Pedram, and A. M. Despain. Efficient estimation of dynamic power dissipation
under a real delay model. In Proceedings of the IEEE International Conference on Computer
Aided Design, pages 224-228, November 1993.
[36] C-Y. Tsui, M. Pedram, and A. M. Despain. Technology decomposition and mapping targeting
low power dissipation. In Proceedings of the 30th Design Automation Conference, pages 68-73,
June 1993.
[37] C-Y. Tsui, M. Pedram, and A. M. Despain. Exact and approximate methods for calculating
signal and transition probabilities in FSMs. In Proceedings of the 31st Design Automation
Conference, page , June 1994.
[38] C-Y. Tsui, M. Pedram, and A. M. Despain. Power efficient technology decomposition and
mapping under an extended power consumption model. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 13(9), September 1994.
[39] A. Tyagi. Hercules: A power analyzer of MOS VLSI circuits. In Proceedings of the IEEE
International Conference on Computer Aided Design, pages 530-533, November 1987.
[40] H. Vaishnav and M. Pedram. PCUBE: a performance driven placement algorithm for low
power designs. In Proceedings of the European Design Automation Conference, pages 72-77,
September 1993.
[41] H. J. M. Veendrick. Short-circuit dissipation of static CMOS circuitry and its impact on the
design of buffer circuits. IEEE Journal of Solid State Circuits, 19:468-473, August 1984.
[42] Q. Zhu, J. G. Xi, W. W-M. Dai, and R. Shukla. Low power clock distribution based on area
pad interconnect for multichip modules. In Proceedings of the 1994 International Workshop
on Low Power Design, pages 87-92, April 1994.