Automatic Clock Gating For Power Reduction
Zia Khan
Gaurav Mehta
ABSTRACT
As the transistor count and operating frequencies of graphics chips increase, power
consumption has become a critical problem in the design of these components. Often
the location of these components on a system chassis makes it difficult to provide
enough airflow to cool the parts or to mount a cooling device. Power consumption
above a certain limit is therefore a critical problem for these components.
Significant power savings can be achieved in a chip by turning off portions of logic
whose output is not needed all the time. One way to do this is to shut off the clocks
to unneeded sections of a design. Determining when a section of the design can be
shut off requires detailed analysis of the design. The Power Compiler can automate
this process for certain types of logic structures.
This paper presents the use of an automated tool to implement clock gating for
reducing power consumption in a design. The Power Compiler (v1998.02) from
Synopsys was used to automatically insert clock-gating circuitry in the design. In this
paper we discuss how we analyzed the risk factors and changed our design
methodology to incorporate clock gating for power optimization.
1.0 Introduction:
Demand for high performance graphics in personal computers is driving the need to
provide increasing levels of performance at an affordable cost. Increased complexity,
larger transistor count and higher operating frequencies deliver improved performance
but come at a cost of higher power consumption. Dissipation of this power poses
significant design challenges.
Since graphics chips in the personal computer market have low selling prices, even a
small increase in the cost of cooling results in a significant increase in the total system
cost. Often the location of these components on the system chassis makes it difficult to
provide enough airflow to cool the parts or to mount cooling devices. The need to
prolong the battery life of mobile products adds further pressure to reduce the power
dissipated in designs. Minimizing power consumption is therefore a critical problem for
these products [1].
Traditionally, graphics designs have used several power-saving techniques, but the
power problem has not been addressed as aggressively as other design specifications
such as performance, area, and schedule.
Clock gating is one of the most effective and widely known techniques to reduce power.
However, extensive clock gating is not commonly used because of the complexity of
implementation, its impact on clock skew and layout area, and clock synthesis issues.
In many designs it is also avoided because of the effort required to identify and
implement clock-gating opportunities. We overcame these roadblocks by enhancing the
design methodology to reduce the impact of clock gating and by automating the
identification and implementation of clock gating. This paper describes the
methodology used in a graphics controller design to automatically identify and
implement clock gating using Power Compiler.
In this paper we present the results of our work on clock gating using the Synopsys Power
Compiler tool. First, we analyze power characteristics of a graphics design and identify
methods that can reduce power significantly. In subsequent sections we describe how
Power Compiler implements clock-gating structures in the design and its limitations.
We discuss how we developed our methodology to overcome these limitations. Then
we describe our design trade-offs and present data to justify our choices. Finally, we
present results to show the effectiveness of our methodology.
SNUG 1999
For the purpose of this analysis, we identified various sections of the design and their
contribution to total power consumption. These sections were broadly classified into
embedded RAMs, IO buffers, analog circuits (D-to-A converters and PLLs) and
combinational and sequential gates. Since the clocks toggle the sequential circuits
continuously during normal operation, we classified the power consumed by sequential
cells as clock power. This data is shown in Figure 1.
[Figure 1: I/O + DAC 13%; RAMs 3%; Combinational Cells 10%; Clock Nets, PLLs & Sequential Cells 74%. Clock Power breakdown: Clock Network (PLLs, clock buffers) 29%; Sequential Cells 71%.]
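The two breakdowns in Figure 1 can be combined into shares of total power. The following is a minimal Python sketch of that arithmetic; it assumes that the second chart splits the 74% clock-power slice of the first chart, and the dictionary and function names are illustrative only.

```python
# Shares read from Figure 1, in percent of total power.
TOTAL_SHARES = {
    "io_dac": 13.0,
    "rams": 3.0,
    "combinational": 10.0,
    "clock_power": 74.0,  # clock nets, PLLs and sequential cells
}

# Assumed split of the clock-power slice, in percent of clock power.
CLOCK_SPLIT = {"clock_network": 29.0, "sequential_cells": 71.0}


def share_of_total(split, slice_percent):
    """Convert a breakdown of one slice into percent of total power."""
    return {name: slice_percent * pct / 100.0 for name, pct in split.items()}


clock_of_total = share_of_total(CLOCK_SPLIT, TOTAL_SHARES["clock_power"])
for name, pct in clock_of_total.items():
    print(f"{name}: {pct:.1f}% of total power")
```

Under that assumption, sequential cells alone account for roughly 52.5% of total power and the clock network for roughly 21.5%.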
while running power virus test. A histogram in Figure 2 shows the activity factors of
data nodes (i.e. non-clock nodes) in the design. It shows that 80% of the nodes
toggle no more than 2% of the time. These data nodes toggle infrequently because the
logic is either idle, unused, or operating on the same operands.
[Figure 2: Histogram of data-node activity factors. Horizontal axis bins: 0.0-0.02, 0.02-0.04, 0.04-0.06, 0.06-0.08, 0.08-0.10, >0.1. Vertical axis: percentage of nodes, 0% to 90%.]
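A histogram like the one in Figure 2 could be built from per-node toggle counts along the following lines. This is a sketch under assumptions: the activity factor is taken to be toggles per clock cycle, values on a bin boundary fall into the lower bin, and the node names and counts are made up for illustration.

```python
from bisect import bisect_left

# Upper edges of the Figure 2 bins; the last bin (>0.1) is open-ended.
BIN_EDGES = [0.02, 0.04, 0.06, 0.08, 0.10]
BIN_LABELS = ["0.0-0.02", "0.02-0.04", "0.04-0.06",
              "0.06-0.08", "0.08-0.10", ">0.1"]


def activity_factor(toggles, cycles):
    """Assumed definition: number of toggles divided by simulated clock cycles."""
    return toggles / cycles


def activity_histogram(toggle_counts, cycles):
    """Return the percentage of nodes that fall into each activity-factor bin."""
    counts = [0] * len(BIN_LABELS)
    for toggles in toggle_counts.values():
        af = activity_factor(toggles, cycles)
        counts[bisect_left(BIN_EDGES, af)] += 1
    total = len(toggle_counts)
    return {label: 100.0 * c / total for label, c in zip(BIN_LABELS, counts)}


# Hypothetical toggle counts over 1000 cycles (not data from the design).
example_counts = {"n0": 5, "n1": 10, "n2": 0, "n3": 15, "n4": 300}
example_hist = activity_histogram(example_counts, cycles=1000)
for label, pct in example_hist.items():
    print(f"{label:>10}: {pct:.0f}%")
```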
overhead. There are two common methods for implementing clock gating. These are
described in the following sections.
[Figure: central clock unit with input Clk and enables En_1, En_2, En_3, producing gated clocks GCLK1, GCLK2, GCLK3 for Unit1, Unit2, Unit3.]
The central clock unit method has been used successfully in many designs. The
popularity of this method is mainly due to the ease of its implementation. There is no
modification required in the HDL code, synthesis or timing analysis flow. The central
clock unit is designed as a separate unit and other unit designers do not have to worry
about clocking issues. However, there are some disadvantages of this method.
All clock-gating elements are in the central clock unit, and the gated clocks are routed
to the destination logic sections. This requires more routing tracks for the clock
network.
This method is not very practical if the number of clocks is large, say a few hundred
clocks. Thus, this method is applicable only to simple designs in which the unit
clocks are derived by gating the main clock.
[Figure: Global Clock Unit distributing Global Clk to Unit1, Unit2 and Unit3, each with its own En signal.]
The distributed clock gating scheme is made possible by enhancing the design flow to
handle clock gating circuits within units without impacting layout, timing analysis and
clock skew.
[Diagram labels: CLOCK GATE cells; signals parent_clk, clk, ungated_clk, gated_clk, enclk, en, te, vdd]
Figure 5: Insertion of Dummy Clock Gating cell for balancing the delays
The decision to enable or disable a clock gate for a flip-flop is made after a detailed
analysis of the design and its various modes of operation. This is a complex task and
represents significant effort on the part of the designer. There are a large number of
recirculating flip-flops in the design. Thus, analysis of gating conditions for all
recirculating flip-flops to determine their suitability for flop-level clock gating is a
[Figure: flip-flop shown with and without clock gating. Diagram labels: Clk, Clock Gate (clk, en, te, en_clk), Select, Data, Flip flop, ckb, clk_en, Clk_g, Out.]
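The idea behind flop-level clock gating of a recirculating flip-flop can be illustrated with a small behavioral model. The Python sketch below is not taken from the design; it assumes a flip-flop whose input multiplexer either loads new data or feeds back the held value, and compares it with a flip-flop that is simply not clocked when the enable is low. The function names and stimulus are hypothetical.

```python
def simulate_recirculating(data, select, init=0):
    """Flop clocked every cycle; a mux feeds back Out when select is 0."""
    out, trace, clock_events = init, [], 0
    for d, sel in zip(data, select):
        out = d if sel else out  # mux picks new data or the held value
        clock_events += 1        # the clock pin toggles on every cycle
        trace.append(out)
    return trace, clock_events


def simulate_gated(data, enable, init=0):
    """Flop whose clock is gated off whenever enable is 0."""
    out, trace, clock_events = init, [], 0
    for d, en in zip(data, enable):
        if en:
            out = d              # the flop sees a clock edge only when enabled
            clock_events += 1
        trace.append(out)
    return trace, clock_events


# Hypothetical stimulus: one data and one enable value per clock cycle.
data = [1, 0, 1, 1, 0, 0, 1, 0]
enable = [1, 0, 0, 1, 0, 0, 0, 1]

recirc_trace, recirc_clocks = simulate_recirculating(data, enable)
gated_trace, gated_clocks = simulate_gated(data, enable)
print(recirc_trace == gated_trace, recirc_clocks, gated_clocks)
```

Both models produce the same output sequence, but in this stimulus the gated flip-flop receives 3 clock events instead of 8, which is where the saving in clock power comes from.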
gating AND gate and enable latch may be placed far apart during layout, resulting in
a significant clock skew between the AND gate and the clock pin of the enable
latch. This could create timing problems that may be very difficult to address.
Power Compiler inserts clock gates at elaboration time. Hence it has no knowledge of
the total load on the gated clock net. Sometimes these nets become very heavily
loaded (i.e. have a very large fan-out). This may cause clock skew problems
in the design.
Power Compiler inserts a separate gating element for each vector. For example,
even if vectors A[3:0] and B[7:0] are controlled by the same enable signal, Power
Compiler creates two separate clock-gating elements. The tool may therefore use
more clock-gating elements than necessary, incurring an area penalty.
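The role of the enable latch that accompanies the gating AND gate can be shown with a simple sampled-waveform model. This Python sketch is illustrative only: it assumes a latch that is transparent while the clock is low, and it assumes that the test enable te is ORed with en ahead of the latch. Neither detail is confirmed by the text above, and the waveforms are made up.

```python
def and_only_gate(clk, en):
    """Gate the clock with a bare AND: enable changes pass straight through."""
    return [c & e for c, e in zip(clk, en)]


def latch_based_gate(clk, en, te=None):
    """Gate the clock with an enable latch followed by an AND gate.

    The latch is transparent while clk is low and holds while clk is high.
    Assumption: the test enable te is ORed with en ahead of the latch.
    """
    te = te if te is not None else [0] * len(clk)
    latched = 0
    gated = []
    for c, e, t in zip(clk, en, te):
        if c == 0:
            latched = e | t      # latch follows the enable while clk is low
        gated.append(c & latched)
    return gated


# Two samples per clock phase; en changes in the middle of two high phases.
clk = [0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1]
en = [0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0]

print(and_only_gate(clk, en))     # contains two half-width pulses
print(latch_based_gate(clk, en))  # contains one full-width pulse
```

In this model the bare AND gate produces truncated pulses when en changes while the clock is high, whereas the latch-based gate passes only whole clock pulses. The latch samples the enable while the clock is low, which is why skew between the AND gate and the latch clock pin matters.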
Bank Width   Clock Gates                  % Flops Covered
             No Merging   With Merging    No Merging   With Merging
1            4592         1621            100%         100%
4            851          817             74%          93%
8            504          606             63%          86%
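The trade-off in the table between bank width, merging, and flop coverage can be explored with a simplified model. The Python sketch below is not Power Compiler's algorithm. It assumes that "bank width" is the minimum number of flip-flops a clock gate must drive, and that "merging" means sharing one gate among all registers that have the same enable signal. The register list is hypothetical.

```python
from collections import defaultdict


def plan_clock_gates(registers, min_bank_width, merge):
    """Return (number of clock gates, percent of flops covered).

    registers: list of (name, width_in_flops, enable_signal) tuples.
    A bank is gated only if it holds at least min_bank_width flops.
    """
    if merge:
        banks = defaultdict(int)
        for _name, width, enable in registers:
            banks[enable] += width   # registers sharing an enable share a gate
        bank_widths = list(banks.values())
    else:
        bank_widths = [width for _name, width, _enable in registers]
    gated = [w for w in bank_widths if w >= min_bank_width]
    total_flops = sum(width for _name, width, _enable in registers)
    return len(gated), 100.0 * sum(gated) / total_flops


# Hypothetical registers: A[3:0] and B[7:0] share an enable, as in the text.
registers = [
    ("A", 4, "en_x"),
    ("B", 8, "en_x"),
    ("C", 2, "en_y"),
    ("D", 2, "en_y"),
    ("E", 1, "en_z"),
]

for width in (1, 4):
    for merge in (False, True):
        gates, covered = plan_clock_gates(registers, width, merge)
        print(f"bank width {width}, merge={merge}: {gates} gates, {covered:.0f}% covered")
```

In this toy example, merging reduces the gate count at bank width 1 and raises coverage at bank width 4, because small registers with a common enable together reach the minimum bank width. The table shows the same qualitative pattern.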
7.0 Conclusions:
Management of power in modern graphics chips is an important design requirement.
Significant power reduction is possible by selectively switching off clocks to unused
units of a design. However, the complexity of the task requires an automated solution.
An effective methodology for reducing power in a large graphics controller has been
shown to reduce the power of synthesized core logic by up to 40%. This power
saving was made possible by two levels of clock gating: block-level clock gating by
manual changes to the design, and flip-flop-level clock gating by using Power Compiler.
8.0 Acknowledgment:
The authors would like to thank Omar Malik for his leadership in setting the direction
and scope of this effort. We would also like to thank Nick Sadowy and Balaji
Veeraswamy for their help.
9.0 References:
1. Tiwari, V., et al.: Reducing Power in High-performance Microprocessors, Design
Automation Conference, 1998.
2. Intel740 Power Consumption Report, Intel Internal Document, 1997.
3. Mehta, G., et al.: Where is power going on Intel740?, Intel Internal Document,
1997.
4. Power Compiler Reference Manual, Synopsys Inc., Release 1998.02, 1998.