Minimizing Sensitivity to Delay Variations in High-Performance
Synchronous Circuits
Xun Liu, Marios C. Papaefthymiou
Department of Electrical Engineering and Computer Science
University of Michigan
Ann Arbor, Michigan 48109
Eby G. Friedman
Department of Electrical Engineering
University of Rochester
Rochester, New York 14627
Abstract
This paper investigates retiming and clock skew scheduling for improving the tolerance of synchronous circuits to
delay variations. It is shown that when both long and short
paths are considered, circuits optimized by the combined
application of the two techniques are more tolerant to delay variations than when optimized by either of the two
techniques separately. A novel mixed-integer linear programming formulation is given for simultaneous retiming
and clock scheduling with a target clock period and tolerance under setup and hold constraints. Experiments with
LGSynth93 and ISCAS89 benchmark circuits demonstrate
the effectiveness of the combined optimization. For half of
the test circuits, tolerance to delay variations increased by
at least 23% over the separate application of retiming and
clock scheduling. Moreover, for two thirds of the test circuits, maximum tolerance improved by at least 11%.
1. Introduction
Retiming is an architectural-level transformation that optimizes digital circuits by relocating their storage elements.
Clock scheduling adjusts the delays of the clock signals in
a circuit and can be used as an alternative to retiming. Significant research has been devoted to each of the two optimizations separately. The investigation of the combined
application of these techniques has been limited, however.
This paper investigates the simultaneous application of
retiming and clock scheduling for increasing the tolerance
of a digital circuit's timing to delay variations. These variations often present a fundamental constraint in the design
of high-performance circuits. Typically, there are three
sources of delay variations: process parameter variations,
temperature or environmental variations, and power supply variations. The creation of new design techniques and
methodologies that minimize the sensitivity of circuit timing to delay variations is of paramount importance for high-performance design.
Two main analytical contributions are contained in this
paper. First, we give a set of $O(E^2)$ constraints for the
problem of simultaneous retiming and clock scheduling to
achieve a target clock period and delay tolerance. Second, we formulate the problem of simultaneous retiming
and clock scheduling under setup and hold constraints as a
mixed-integer linear program (MILP). A circuit with maximum tolerance to delay variations can be computed by performing a binary search over the range of possible tolerance
values.
In experiments with benchmark circuits from the
LGSynth93 and ISCAS89 suites, simultaneous retiming
and clock scheduling resulted in significantly more tolerant
circuits than the independent application of the two optimization techniques. For half of the circuits in our test suite,
maximum tolerance to delay variations improved by at least
23% over separate retiming or clock skew scheduling. For
about two thirds of the test circuits, maximum tolerance to
delay variations improved by at least 11%.
Retiming has been investigated for a variety of clocking
disciplines [7, 9, 10, 15], delay models [8, 16], and optimization objectives [1, 4, 12, 14]. A linear programming
formulation of the clock scheduling problem was first described in [5]. The combined application of retiming and
clock scheduling was discussed in [11]. A two-step procedure for maximizing the operating frequency of a synchronous circuit by combining retiming with clock scheduling was proposed in [2]. That work is concerned only with
setup violations, however, and does not explore the expanded solution space resulting when both setup and hold
constraints are considered.
The main challenge with the integration of retiming and
clock scheduling is the formulation of the problem as a conjunction of linear constraints. As is the case with other
retiming problems [8, 16], the co-existence of setup and
hold constraints introduces disjunctions among constraints.
Thus, the resulting solution space precludes the application
of powerful convex programming techniques. This paper
presents a mixed-integer linear program for simultaneous
retiming and clock scheduling that is derived by a combination of upper bounding and graph-theoretic techniques.
Figure 1. (a) Original and (b) retimed circuit. (Labels: registers i, j, and k with clock delays 0, s(j), and 0; vertex delays 7/6, 3/1, and 5/1; skew ranges [-5,6] and [-2,4] in (a), and [-2,7] and [-1,7] in (b).)
The remainder of this paper has eight sections. Section 2 demonstrates the performance advantage of simultaneous retiming and clock scheduling. Background material is given in Section 3. In Section 4, we give a shortest-paths formulation for the problem of clock scheduling with
a target tolerance, a target clock period, and fixed register
locations. Section 5 presents necessary and sufficient conditions for achieving correct timing when a circuit is optimized by simultaneous retiming and clock scheduling under setup and hold constraints. An alternative formulation
of these conditions in terms of an auxiliary graph is given
in Section 6. This formulation is used in Section 7 to derive an equivalent mixed-integer linear program. Section 8
compares the results obtained by the separate application of
retiming and clock scheduling with those obtained by the simultaneous application of the two optimizations. Our contributions are summarized in Section 9.
2. Motivation
The effectiveness of simultaneous retiming and clock
scheduling is demonstrated by the circuit in Figure 1. Each
vertex represents a block of combinational logic, and each
rectangle represents an edge-triggered register. Each pair
x/y denotes the maximum and minimum propagation delay of the signals through the corresponding node. The
clock skew between the input/output registers i and k is assumed to be zero. The setup and hold constraints along each
combinational path yield a range [x, y] of permissible clock
skews [13] for register j. The permissible skew range of j
is obtained by intersecting all these possible ranges.
Consider the original circuit in Figure 1(a). For a target
clock period of 12 time units, the intersection of the two
ranges is [-2,4]. When clock skew is zero, the permissible
range of j is [-2,2], assuming symmetric clock delay variations. Thus the tolerance of this circuit is 4. When clock
signals arrive at j with a delay s(j) = 1, however, the permissible range is [-2,4], and delay tolerance increases to 6.
Figure 1(b) shows a retimed version of the original circuit that is obtained by shifting j forward. In this case, the
intersection of the two skew ranges is [-1,7]. When clock
skew is zero, the permissible range of j is [-1,1], and the
tolerance drops to 2. When the arrival of the clock signals
at j is delayed by s(j) = 3, however, the permissible range
becomes [-1,7], and tolerance increases to 8. This value is
the maximum tolerance that can be achieved by simultaneous retiming and clock scheduling. Moreover, it cannot be
achieved by the separate application of the optimizations on
the original circuit.
An interesting observation in this example is that the delay tolerance of the retimed circuit is smaller than that of
the original circuit when skews are zero. Nevertheless, the
retimed circuit exhibits maximum tolerance to delay variations when clock skews are nonzero.
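The arithmetic of this example can be reproduced with a short sketch. It assumes, as the example does, that delay variations are symmetric around the nominal clock delay s(j), so the tolerance is twice the distance from s(j) to the nearer end of the permissible range. The function names are illustrative and do not come from the paper.

```python
def tolerance(lo, hi, skew):
    """Width of the largest window centered at `skew` that fits in [lo, hi]."""
    if not lo <= skew <= hi:
        return 0.0  # the nominal skew already violates a setup or hold constraint
    return 2.0 * min(skew - lo, hi - skew)


def best_skew(lo, hi):
    """The midpoint of the range maximizes the symmetric tolerance."""
    return (lo + hi) / 2.0


def intersect(*ranges):
    """Intersect closed skew ranges; assumes the intersection is non-empty."""
    return max(r[0] for r in ranges), min(r[1] for r in ranges)


original = intersect((-5, 6), (-2, 4))   # skew ranges of Figure 1(a)
retimed = intersect((-2, 7), (-1, 7))    # skew ranges of Figure 1(b)
for name, (lo, hi) in (("original", original), ("retimed", retimed)):
    s = best_skew(lo, hi)
    print(name, (lo, hi), tolerance(lo, hi, 0), s, tolerance(lo, hi, s))
```

With zero skew the retimed circuit is worse (2 versus 4), while at the midpoint skews 1 and 3 it is better (8 versus 6), which matches the observation above.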
3. Background
3.1. Circuit and Delay Model
An edge-triggered circuit is modeled as a directed multigraph $G = \langle V, E, d, w \rangle$. The vertices $V$ correspond to the
combinational logic elements in the circuit. Each vertex
$v \in V$ is associated with a nonnegative weight $d(v)$ which
describes the propagation delay through the corresponding
logic block. Our results can be extended to include the case
where each logic block has a maximum propagation delay
$d_{\max}(v)$ and a minimum propagation delay $d_{\min}(v)$.
The directed edges $E$ of the graph model the interconnections between the combinational blocks. Each edge
$e \in E$ corresponds to a wire that connects an output of a
combinational block to the input of another combinational
block, possibly through one or more globally clocked, edge-triggered registers. For each edge $e \in E$, the register count
of the corresponding wire is given by an integer, nonnegative edge-weight $w(e)$. In every directed cycle of $G$, there
is an edge with a strictly positive register count.
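The circuit model can be captured with a plain edge list. The sketch below is an illustration only: the representation and the function name are assumptions, not taken from the paper. It checks the last condition of the model, that every directed cycle contains a register, by testing that the subgraph of register-free edges is acyclic.

```python
from collections import defaultdict, deque


def every_cycle_has_register(vertices, edges):
    """Return True if no directed cycle consists only of zero-register edges.

    `edges` is a list of (u, v, w) triples, where w is the register count.
    Parallel edges are allowed, so the list models a multigraph.
    """
    succ = defaultdict(list)
    indeg = {v: 0 for v in vertices}
    for u, v, w in edges:
        if w < 0:
            raise ValueError("register counts must be nonnegative")
        if w == 0:  # only register-free wires can close a combinational loop
            succ[u].append(v)
            indeg[v] += 1
    # Kahn's algorithm: the zero-weight subgraph is acyclic exactly when
    # every vertex can be removed in topological order.
    ready = deque(v for v in vertices if indeg[v] == 0)
    removed = 0
    while ready:
        u = ready.popleft()
        removed += 1
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    return removed == len(vertices)
```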
3.2. Retiming
A retiming of an edge-triggered circuit $G = \langle V, E, d, w \rangle$
is an integer-valued vertex-labeling $r : V \to \mathbb{Z}$ that denotes
a transformation of the original circuit $G$ into a functionally
equivalent circuit $G_r = \langle V, E, d, w_r \rangle$. For each edge $u \xrightarrow{e} v$
in $G_r$, $w_r$ is defined by the equation
\[ w_r(e) = w(e) + r(v) - r(u) . \tag{1} \]
The retiming transformation for a vertex $v$ in $V$ is shown in
Figure 2. The output of $v$'s computation in $G_r$ is generated
$r(v)$ clock cycles later than in $G$. The retimed circuit $G_r$ is
well-formed if for all edges $e \in E$, we have
\[ w_r(e) \ge 0 . \tag{2} \]
Figure 2. Retiming a vertex v by r(v) = 1.
Equation (1) implies that for every vertex pair $u, v$ in $V$,
the change in the register count along any path $u \stackrel{p}{\leadsto} v$ depends solely on its two endpoints:
\[ w_r(p) = w(p) + r(v) - r(u) , \tag{3} \]
where $w(p) = \sum_{e \in p} w(e)$. Thus, the maximum decrease
in the register count of any path $u \stackrel{p}{\leadsto} v$ is
\[ W(u, v) = \min \{ w(p) : u \stackrel{p}{\leadsto} v \} . \tag{4} \]
The only paths $u \stackrel{p}{\leadsto} v$ that can become combinational (and
possibly lead to a timing violation) in $G_r$ are those for
which $w(p) = W(u, v)$ in $G$. For each of the $O(V^2)$ vertex
pairs $u, v$ in $V$, the quantities
\[ D(u, v) = \max \{ d(p) : u \stackrel{p}{\leadsto} v ,\ w(p) = W(u, v) \} , \tag{5} \]
\[ \delta(u, v) = \min \{ d(p) : u \stackrel{p}{\leadsto} v ,\ w(p) = W(u, v) \} , \tag{6} \]
where $d(p) = \sum_{x \in p} d(x)$, represent the longest and shortest propagation delays from $u$ to $v$, respectively, whenever
the retimed circuit includes a combinational path between
the two vertices. Therefore, the clock period of any retimed
circuit $G_r$ is always some element in the $O(V^2)$-size set of
$D(u, v)$.
When only long paths are considered, a retimed circuit
that achieves a given clock period $c$ can be computed in
$O(VE)$ steps. A retimed circuit that achieves the minimum possible clock period can be computed in $O(VE + V^2 \lg V)$
steps [9].
3.3. Clock Skew Scheduling
In synchronous circuits, clock signals provide a global
time reference that synchronizes the flow of data between
storage elements. These signals are delivered by a distribution network [6]. A variety of factors such as differences
in interconnect delay, parasitic impedances, and process parameter variations affect their arrival times at the storage
elements of the circuit. The difference between the arrival
times at two sequentially-adjacent registers is known as the
clock skew between these registers [6].
A clock schedule of a circuit $G = \langle V, E, d, w \rangle$ is a real-valued edge-labeling $s : E \to \mathbb{R}$ that gives the propagation
delay from the global clock source to each wire $e$ in the
circuit. By adjusting these delays, timing violations can be
fixed (or created). For example, consider a combinational
path $u \stackrel{p}{\leadsto} v$ which is bounded by registers on $? \xrightarrow{e} u$ and
$v \xrightarrow{e'} ?$. If $s(e) \ge s(e')$, then the time available for the propagation of signals from $e$ to $e'$ decreases by $s(e) - s(e')$. Conversely, if $s(e) \le s(e')$, then the available time increases
by $s(e') - s(e)$. These changes may introduce new critical
paths or eliminate existing ones. They may also introduce
or eliminate hold violations.
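The effect of a clock schedule on a single register-to-register path can be sketched as follows. The sketch assumes zero setup and hold times and takes the longest and shortest path delays of Section 3.2 as inputs; the function and parameter names are illustrative rather than taken from the paper.

```python
def setup_slack(c, d_max, s_launch, s_capture):
    """Time left over after the slowest signal arrives, for clock period c.

    A later launching clock (larger s_launch) shrinks the available time;
    a later capturing clock (larger s_capture) extends it.
    """
    return c + s_capture - s_launch - d_max


def hold_slack(d_min, s_launch, s_capture):
    """Margin by which the fastest signal arrives after the capturing edge."""
    return d_min + s_launch - s_capture


def timed_correctly(c, d_max, d_min, s_launch, s_capture):
    """Both slacks must be nonnegative for the path to be timed correctly."""
    return (setup_slack(c, d_max, s_launch, s_capture) >= 0
            and hold_slack(d_min, s_launch, s_capture) >= 0)
```

For instance, with c = 12, a longest delay of 8, and a shortest delay of 2, delaying the launching clock by 1 leaves a slack of 3 on both constraints, whereas delaying it by 5 creates a setup violation and advancing it by 3 creates a hold violation.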
A linear programming framework for clock scheduling
was first presented in [5]. A graph-theoretic approach to
clock scheduling was subsequently described in [3]. In both
papers, the placement of the storage elements was assumed
to be fixed. Algorithms for scheduling local clocks to improve the tolerance of a circuit to process parameter variations were presented in [13].
4. Clock Scheduling Constraints
This section gives a precise statement of the clock
scheduling problem with a given tolerance as a single-source shortest-paths problem with $O(E^2)$ constraints.
The following theorem captures the timing conditions
that must be satisfied by a clock schedule that achieves a
target clock period. These conditions can be extended to
include nonzero setup and hold times. The proof of the theorem follows from [5].
Theorem 1 Let $G = \langle V, E, d, w \rangle$ be an edge-triggered circuit and $c$ a given constant. Moreover, let $s_m : E \to \mathbb{R}$ and
$s_M : E \to \mathbb{R}$ be assignments of minimum and maximum
clock delays, respectively. Then, $G$ is timed correctly if and
only if for every pair $? \xrightarrow{e} u$, $v \xrightarrow{e'} ?$ in $E$ such that $w(e) \ge 1$,
$w(e') \ge 1$, and $W(u, v) = 0$, we have
\[ \delta(u, v) + s_m(e) - s_M(e') \ge 0 , \tag{7} \]
\[ D(u, v) + s_M(e) - s_m(e') \le c . \tag{8} \]
We can now express the clock scheduling problem with
a target clock period and tolerance as a shortest-paths
problem with $O(E^2)$ inequalities that can be computed in
$O(E^2)$ time and can be solved in $O(E^3)$ steps using the
Bellman-Ford single-source shortest-paths algorithm.
Theorem 2 Let $G = \langle V, E, d, w \rangle$ be an edge-triggered circuit. Moreover, let $c$ and $t$ be given real constants. Then, $G$
achieves a clock period $c$ with tolerance $t$ if and only if there
exist nonnegative functions $s_m : E \to \mathbb{R}$ and $s_M : E \to \mathbb{R}$
such that for each edge $u \xrightarrow{e} v$,
\[ s_m(e) \le s_M(e) - t , \tag{9} \]
and for every edge pair $? \xrightarrow{e} u$, $v \xrightarrow{e'} ?$ in $E$ such that $w(e) \ge 1$, $w(e') \ge 1$, and $W(u, v) = 0$,
\[ s_M(e') \le s_m(e) + \delta(u, v) , \tag{10} \]
\[ s_M(e) \le s_m(e') + c - D(u, v) . \tag{11} \]
For a target clock period $c$, the maximum tolerance achievable by clock scheduling
can be determined by a binary search in $t$. Given $s_m$ and
$s_M$, the corresponding schedule $s$ with maximum tolerance
to symmetric delay variations is obtained by setting $s(e) =
(s_m(e) + s_M(e))/2$ for all $e$ in $E$.
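The shortest-paths formulation and the binary search can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: registers stand in for the register-bearing edges of the paper, the longest and shortest delay of each register-to-register path are supplied directly, and all function names are hypothetical. Each inequality of the form x[a] - x[b] <= k becomes an edge of a constraint graph, and Bellman-Ford reports infeasibility when it finds a negative cycle.

```python
def feasible(num_vars, constraints, eps=1e-12):
    """Bellman-Ford feasibility test for constraints x[a] - x[b] <= k.

    `constraints` is a list of (a, b, k) triples. Each one becomes an edge
    b -> a of weight k; starting every distance at 0 plays the role of a
    virtual source. Returns a satisfying assignment, or None if the
    constraint graph contains a negative cycle.
    """
    dist = [0.0] * num_vars
    for _ in range(num_vars):
        changed = False
        for a, b, k in constraints:
            if dist[b] + k < dist[a] - eps:
                dist[a] = dist[b] + k
                changed = True
        if not changed:
            return dist
    return None


def schedule(registers, paths, c, t):
    """Solve constraints (9)-(11) for a window of width t at every register.

    `paths` holds (launch, capture, d_max, d_min) for each combinational
    path. Returns {register: (s_m, s_M)} or None if t is not achievable.
    """
    idx = {r: i for i, r in enumerate(registers)}
    cons = [(2 * idx[r], 2 * idx[r] + 1, -t) for r in registers]   # (9)
    for e, e2, d_max, d_min in paths:
        cons.append((2 * idx[e2] + 1, 2 * idx[e], d_min))          # (10)
        cons.append((2 * idx[e] + 1, 2 * idx[e2], c - d_max))      # (11)
    sol = feasible(2 * len(registers), cons)
    if sol is None:
        return None
    base = min(sol)  # shifting all delays equally keeps them nonnegative
    return {r: (sol[2 * i] - base, sol[2 * i + 1] - base) for r, i in idx.items()}


def max_tolerance(registers, paths, c, iterations=60):
    """Binary search for the largest t that admits a feasible schedule."""
    low, high = 0.0, float(c)
    for _ in range(iterations):
        mid = (low + high) / 2
        if schedule(registers, paths, c, mid) is not None:
            low = mid
        else:
            high = mid
    return low


# Path delays read off Figure 1 for c = 12; the mapping of the figure's
# labels to paths is an assumption of this sketch.
fig1a = [("i", "j", 7, 6), ("j", "k", 8, 2)]
fig1b = [("i", "j", 10, 7), ("j", "k", 5, 1)]
print(max_tolerance(["i", "j", "k"], fig1a, 12))
print(max_tolerance(["i", "j", "k"], fig1b, 12))
```

Because this sketch gives every register its own window of width t, while Section 2 fixes the clocks of i and k, the two searches converge to 3 and 4 rather than to the values 6 and 8 quoted for register j alone. The retimed circuit still comes out ahead.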
5. Clock Scheduling and Retiming
The following theorem gives a set of $O(E^2)$ constraints
for correct timing when clock scheduling and retiming are
applied simultaneously. Its correctness follows from Theorem 2.
Theorem 3 Let $G = \langle V, E, d, w \rangle$ be a synchronous circuit,
and let $c$ and $t$ be given constants. Moreover, let $r : V \to \mathbb{Z}$
be a retiming function, let $s_M : E \to \mathbb{R}$ be an assignment of
maximum clock delays, and let $s_m : E \to \mathbb{R}$ be an assignment of minimum clock delays. Then the retimed circuit $G_r$
is well-formed and achieves a clock period $c$ with tolerance
$t$ if and only if for every edge $u \xrightarrow{e} v \in E$,
\[ s_m(e) \le s_M(e) - t , \tag{12} \]
\[ w(e) + r(v) - r(u) \ge 0 , \tag{13} \]
and for every pair of edges $? \xrightarrow{e} u$, $v \xrightarrow{e'} ? \in E$,
\[ E(e, e') > 0 \;\Rightarrow\; W_r(u, v) \ge 1 \ \text{or}\ ( w_r(e) = 0 \ \text{or}\ w_r(e') = 0 ) , \tag{14} \]
where $E(e, e') = D(u, v) + s_M(e) - s_m(e') - c$ and
$E(e, e') = -(\delta(u, v) + s_m(e) - s_M(e'))$ for the setup
and hold constraints, respectively.
For simplicity, the constraints of Theorem 3 assume zero
setup and hold times. Non-zero times $T_{setup}$ and $T_{hold}$
can be included in a straightforward manner by setting
$E(e, e') > -T_{setup}$ or $E(e, e') > -T_{hold}$, as appropriate, in the left-hand side of the implication in Relation (14).
For a target clock period $c$, the maximum tolerance over
all retimings and clock schedules can be determined by a
binary search in $t$.
6. Companion Graph
A companion graph $G' = \langle V', E', w' \rangle$ can be used to
transform the timing constraints from Theorem 3 into a
mixed-integer linear program. The construction of $G'$ from
the circuit graph $G$ is identical to that in [8]. Each edge
$u \xrightarrow{e} v \in E$ is segmented into two edges, $u \xrightarrow{e_1} x_{uv}$ and
$x_{uv} \xrightarrow{e_2} v$, where $x_{uv}$ is a dummy vertex. The edge $e_1$ has
exactly one register when the corresponding edge $e \in E$ has
a positive register count and zero registers otherwise. Thus,
the register count of $e_1$ serves as an index function for the
register count of the corresponding generating edge $e \in E$.
The edge $e_2$ carries the balance of the registers up to $w(e)$.
In mathematical terms, the companion graph $G' =
\langle V', E', w' \rangle$ is defined as
\[ V' = V \cup \{ x_{uv} : u \xrightarrow{e} v \in E \} , \]
\[ E' = \{ u \xrightarrow{e_1} x_{uv} ,\ x_{uv} \xrightarrow{e_2} v : u \xrightarrow{e} v \in E \} , \]
where for each edge $u \xrightarrow{e} v \in E$,
\[ w'(e_1) = \min \{ 1, w(e) \} , \quad \text{and} \]
\[ w'(e_2) = w(e) - \min \{ 1, w(e) \} . \]
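The construction of the companion graph is mechanical and can be sketched directly from its definition. In this illustration the dummy vertex of each edge is represented by a tuple indexed by the position of the generating edge, which is an assumption made here so that parallel edges receive distinct dummy vertices; the function name is likewise hypothetical.

```python
def companion_graph(vertices, edges):
    """Split every edge (u, v, w) into u -> x -> v as described in Section 6.

    Returns (V', E'), where E' is a list of (tail, head, weight) triples.
    The dummy vertex is the tuple ("x", k), with k the position of the
    generating edge in `edges`.
    """
    new_vertices = list(vertices)
    new_edges = []
    for k, (u, v, w) in enumerate(edges):
        x = ("x", k)
        first = min(1, w)                     # w'(e1): 1 iff e carries a register
        new_vertices.append(x)
        new_edges.append((u, x, first))       # e1
        new_edges.append((x, v, w - first))   # e2 carries the remaining registers
    return new_vertices, new_edges
```

The total register count of each generating edge is preserved, since the two weights always sum to w(e).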
The following lemma recasts Theorem 3 in terms of $G'$
and a corresponding retiming function $r'$. Given $r'$, $r(u)$
can be obtained for every $u \in V$ by setting $r(u) = r'(u)$.
Lemma 4 Let $G = \langle V, E, d, w \rangle$ be a circuit graph, let
$G' = \langle V', E', w' \rangle$ be its corresponding companion graph,
and let $c$ and $t$ be given constants. Moreover, let $r' : V' \to \mathbb{Z}$
be a retiming function, let $s_M : E \to \mathbb{R}$ be an assignment
of maximum clock delays, and let $s_m : E \to \mathbb{R}$ be an assignment of minimum clock delays. Then the retimed circuit $G_r$
is well-formed and achieves a clock period $c$ with tolerance
$t$ if and only if for every edge $u \xrightarrow{e} v \in E$, we have
\[ s_m(e) \le s_M(e) - t , \tag{15} \]
for every edge $u \xrightarrow{e} v \in E'$,
\[ w'(e) + r'(v) - r'(u) \ge 0 , \tag{16} \]
for every edge $u \xrightarrow{e_1} x_{uv} \in E'$,
\[ w'(e_1) + r'(x_{uv}) - r'(u) \le 1 , \tag{17} \]
for every pair of edges $u \xrightarrow{e_1} x_{uv}$, $x_{uv} \xrightarrow{e_2} v \in E'$,
\[ w'(e_2) + r'(v) - r'(x_{uv}) \le F \cdot ( w'(e_1) + r'(x_{uv}) - r'(u) ) , \tag{18} \]
where $F = \max \{ W(u, v) + W(v, u) : u, v \in V \}$, and for
every pair of edges $? \xrightarrow{e} u$, $v \xrightarrow{e'} ? \in E$,
\[ E(e, e') > 0 \;\Rightarrow\; W_{r'}(u, v) \ge 1 \ \text{or}\ ( w'_{r'}(e_1) = 0 \ \text{or}\ w'_{r'}(e'_1) = 0 ) , \tag{19} \]
where $E(e, e') = D(u, v) + s_M(e) - s_m(e') - c$ and
$E(e, e') = -(\delta(u, v) + s_m(e) - s_M(e'))$ for the setup
and hold constraints, respectively.
Relation (19) is recast as an equivalent disjunction in the
following lemma.
Lemma 5 For every pair of edges $? \xrightarrow{e} u$, $v \xrightarrow{e'} ? \in E$,
Relation (19) is equivalent to the disjunction
\[ E(e, e') \le 0 \ \text{or}\ w'_{r'}(e_1) + w'_{r'}(e'_1) - W_{r'}(u, v) \le 1 , \tag{20} \]
where $E(e, e') = D(u, v) + s_M(e) - s_m(e') - c$ and
$E(e, e') = -(\delta(u, v) + s_m(e) - s_M(e'))$ for the setup
and hold constraints, respectively.
Figure 3. Solution space for Relation (20). (Axes: E(e,e') and Wr'(e1,e1').)
The solution space of Relation (20) is described by the
solid lines in Figure 3. This space is not convex and precludes the use of convex programming techniques.
7. Mixed-Integer Linear Program
This section presents a set of $O(E^2)$ constraints that ensure correct timing under simultaneous retiming and clock
skew scheduling. These constraints are obtained by restricting the solution space of the constraints in Lemma 4 while
maintaining their feasibility. The final constraint set comprises linear inequalities with integer and real unknowns.
The following lemma gives upper bounds on the quantity $w'_{r'}(e_1) + w'_{r'}(e'_1) - W_{r'}(u, v)$ from Relation (20) that
restrict the solution space in the first and second quadrant.
Lemma 6 Let $r' : V' \to \mathbb{Z}$ be a retiming function that
satisfies the conditions in Lemma 4. Then, for every pair of
edges $? \xrightarrow{e} u$, $v \xrightarrow{e'} ? \in E$,
\[ w'_{r'}(e_1) + w'_{r'}(e'_1) - W_{r'}(u, v) \le 2 , \tag{21} \]
\[ w'_{r'}(e_1) + w'_{r'}(e'_1) - W_{r'}(u, v) \le 2 - \frac{E(e, e')}{E_{\max}(e, e')} , \tag{22} \]
where $E_{\max}(e, e')$ is an upper bound of $E(e, e')$ that depends on the maximum possible clock skew values, as they
are determined by the largest realizable chip die size.
Figure 4. Equivalent convex solution space. (Axes: E(e,e'), with Emin and Emax marked, and Wr'(e1,e1').)
The convex solution space derived from the bounds in
Lemma 6 is illustrated in Figure 4. The bold line segments
represent possible solutions. The shaded lines and points
denote the points of the original solution space that are now
excluded. The horizontal line in the second quadrant arises
from Inequality (21), and the sloped upper bound in the first
quadrant arises from Inequality (22). The two vertical lines
correspond to the bounds on $E(e, e')$.
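The claim that the linear bounds of Lemma 6 admit the same integer solutions as the disjunction in Relation (20) can be sanity-checked by enumeration. The sketch below is a numerical check on a grid, not a proof. It relies on two properties stated in the text: the first-segment register counts are 0 or 1, and the retimed path weight is a nonnegative integer. The bounds passed to the check are hypothetical values.

```python
from fractions import Fraction
from itertools import product


def relation_20(E, q):
    """Disjunction (20): E <= 0, or the integer quantity q is at most 1."""
    return E <= 0 or q <= 1


def relations_21_22(E, q, E_max):
    """Linear bounds (21) and (22); assumes E_max > 0 and E <= E_max."""
    return q <= 2 and q <= 2 - Fraction(E) / E_max


def agree_everywhere(E_min, E_max, steps=40, W_max=3):
    """Compare both formulations on a grid of E values and all small q."""
    for i in range(steps + 1):
        E = E_min + Fraction(E_max - E_min) * i / steps
        for w1, w1p, W in product((0, 1), (0, 1), range(W_max + 1)):
            q = w1 + w1p - W   # w'(e1) + w'(e1') - W(u, v) after retiming
            if relation_20(E, q) != relations_21_22(E, q, E_max):
                return False
    return True


print(agree_everywhere(-5, 7))   # hypothetical bounds E_min = -5, E_max = 7
```

Exact rational arithmetic is used so that the boundary case in which E equals its upper bound, where the right-hand side of (22) is exactly 1, is not affected by rounding.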
Based on Lemmas 5 and 6, the simultaneous retiming
and clock scheduling problem can now be recast as a mixed-integer linear program with $O(E^2)$ constraints.
Theorem 7 Let $G = \langle V, E, d, w \rangle$ be a synchronous circuit,
and let $c$ and $t$ be given constants. Moreover, let $r : V \to \mathbb{Z}$
be a retiming function, let $s_M : E \to \mathbb{R}$ be an assignment of
maximum clock delays, and let $s_m : E \to \mathbb{R}$ be an assignment of minimum clock delays. Then the retimed circuit $G_r$
is well-formed and achieves a clock period of at most $c$ with
tolerance $t$ if and only if for every edge $u \xrightarrow{e} v \in E$,
\[ s_m(e) \le s_M(e) - t , \tag{23} \]
for every edge $u \xrightarrow{e} v \in E'$,
\[ w'(e) + r'(v) - r'(u) \ge 0 , \tag{24} \]
for every edge $u \xrightarrow{e_1} x_{uv} \in E'$,
\[ w'(e_1) + r'(x_{uv}) - r'(u) \le 1 , \tag{25} \]
for every pair of edges $u \xrightarrow{e_1} x_{uv}$, $x_{uv} \xrightarrow{e_2} v \in E'$,
\[ w'(e_2) + r'(v) - r'(x_{uv}) \le F \cdot ( w'(e_1) + r'(x_{uv}) - r'(u) ) , \tag{26} \]
where $F = \max \{ W(u, v) + W(v, u) : u, v \in V \}$, and for
every pair of edges $? \xrightarrow{e} u$, $v \xrightarrow{e'} ? \in E$,
\[ E(e, e') \le E_{\max}(e, e') , \tag{27} \]
\[ E(e, e') \ge E_{\min}(e, e') , \tag{28} \]
\[ w'_{r'}(e_1) + w'_{r'}(e'_1) - W_{r'}(u, v) \le 2 , \tag{29} \]
\[ w'_{r'}(e_1) + w'_{r'}(e'_1) - W_{r'}(u, v) \le 2 - \frac{E(e, e')}{E_{\max}(e, e')} , \tag{30} \]
where $E(e, e') = D(u, v) + s_M(e) - s_m(e') - c$ and
$E(e, e') = -(\delta(u, v) + s_m(e) - s_M(e'))$ for the setup
and hold constraints, respectively.
8. Experimental Results
This section presents results from the application of simultaneous retiming and clock scheduling on LGSynth93
and ISCAS89 benchmark circuits. Each test circuit was optimized to achieve maximum delay tolerance with a clock
period of $1.1\,c_{\min}$, where $c_{\min}$ was the shortest clock period of the original circuit. The following experimental
procedure was applied. Each circuit was optimized using
retiming, clock scheduling, and simultaneous retiming and
clock scheduling. An additional optimization heuristic was
applied, in which the original circuit was first retimed for
maximum tolerance with zero skew, and clock skews were
subsequently scheduled to increase tolerance further.
Our results are listed in Table 1. The first three columns
give the name and size of each test circuit. The fourth column gives the target clock period. The fifth column gives
the maximum tolerance of the original circuit with zero
skew. The sixth column gives the maximum tolerance
of the original circuit after clock scheduling. (Retiming results are omitted, since clock scheduling always resulted in
circuits with greater tolerance.) The seventh column gives
the maximum tolerance achieved by applying the two
optimizations in sequence. The eighth column gives the relative improvement achieved over separate scheduling, and
the ninth column gives the runtime of the heuristic. The
tenth column gives the maximum tolerance that was
achieved by simultaneous retiming and clock scheduling.
The relative improvements achieved over separate scheduling are given in the eleventh column. The runtimes of the
combined optimization are listed in the last column.
Simultaneous retiming and clock scheduling improved
the tolerance of all test circuits and resulted in significant
improvements for most of them. For half of the circuits in
our test suite, relative improvements over scheduling were
at least 23%. For about two thirds of the circuits, improvements were at least 11%. Our sequential retiming and clock
scheduling heuristic improved the maximum tolerance of
most test circuits. For one quarter of the circuits, relative
improvements exceeded 10%. The runtime of this optimization was comparable to scheduling. Our experiments
were performed on an Intel Pentium II with 128MB of main
memory. Our simultaneous retiming and clock scheduling
algorithm was terminated if no further improvements were
achieved for 10 hours of execution.
Gate delays were calculated using the formula $a + b \cdot (\mathit{fanout} + \mathit{rand})$.
The parameters a and b denote the intrinsic gate delay and the delay increment of a single gate
load, respectively. Their values were obtained from the library iwls93.mis2lib in the LGSynth93 benchmark.
The parameter rand was a uniformly distributed random
number that introduced variation to gate delays. The range
of rand was [-1,0] and [0,1] for minimum and maximum
propagation delays, respectively.
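The delay model is simple enough to sketch directly. The parameter values below are hypothetical placeholders rather than values from the iwls93.mis2lib library, and the function name is illustrative.

```python
import random


def gate_delays(a, b, fanout, rng=random):
    """Return (d_min, d_max) for one gate using a + b * (fanout + rand).

    rand is drawn uniformly from [-1, 0] for the minimum delay and from
    [0, 1] for the maximum delay, as described in the text.
    """
    d_min = a + b * (fanout + rng.uniform(-1.0, 0.0))
    d_max = a + b * (fanout + rng.uniform(0.0, 1.0))
    return d_min, d_max


rng = random.Random(0)          # fixed seed so the sketch is repeatable
a, b = 1.0, 0.2                 # hypothetical library parameters
for fanout in (1, 2, 4):
    lo, hi = gate_delays(a, b, fanout, rng)
    print(fanout, round(lo, 3), round(hi, 3))
```

For nonnegative b, the minimum delay never exceeds the nominal value a + b * fanout and the maximum delay never falls below it, so each gate receives a delay interval of width at most 2b.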
Our simultaneous retiming and clock scheduling algorithm explores the solution space using a branch and bound
approach. During its execution, it maintains a permissible
range for the retiming value of each vertex. Once the retiming function is fixed, the clock delays are computed using a
Bellman-Ford single-source shortest-paths algorithm. The
optimal tolerance is determined by iterating this algorithm
in a binary search. The overall complexity of our algorithm
is exponential in the worst case. When register mobility is
constrained by considering loops, however, the permissible
region of most vertices becomes very small.
9. Conclusion
This paper explores the application of retiming and clock
scheduling for maximizing the tolerance of synchronous
circuits to delay variations. When both long and short paths
are considered, we show that the combined optimization can
result in more delay-tolerant circuits than if either of the
two optimizations is applied separately. Moreover, we give
a MILP formulation of the simultaneous retiming and clock
scheduling problem. Our experiments show that retiming
and clock scheduling can significantly increase the maximum tolerance of benchmark circuits to delay variations.
Circuit   Nodes  Edges       c  Zero skew  Sched.   Seq.  Seq. %  Seq. CPU  Simult.  Simult. %  Simult. CPU
daio         17     30    2.91       0.00    0.76   0.76       0       0.1     0.79          4            1
dk27         24    254    3.74       0.21    0.59   0.64       8       0.4     0.64          8          265
tav          26     59    3.36       0.00    1.05   1.08       2       0.4     1.22         15            2
bbtas        31     87    3.78       0.20    0.46   0.63      39       0.9     0.63         39         6914
s208         37    112    4.67       0.00    1.64   1.67       2       1.5     1.67          2        29086
dk512        39    107    4.11       0.23    0.63   0.70      11       1.4     0.70         11        77313
dk17         40    114    4.69       0.26    0.52   0.58       0       1.4     0.60          3         6505
s420         46    177    5.67       0.00    1.67   1.76       5       4.4     2.28         36        53360
dk15         49    154    5.26       0.29    1.55   1.55       0       2.2     2.05         32          202
dk14         69    238    5.64       0.35    1.27   1.40      11       7.7     1.62         28       116412
ex4          70    207    5.83       0.00    0.66   0.70       7       7.2     0.83         26        38346
opus         71    242    8.53       0.47    2.11   2.11       0      10.1     2.69         27         7370
ex6         102    379    7.28       0.41    1.02   1.06       4      27.6     1.16         14        76091
dk16        120    567    8.89       0.49    1.38   1.38       0     133.0     1.45          5        12030
ex1         193    887   14.63       0.00    2.16   2.16       0     367.0     2.94         36        49821
s713        377    590   41.30       0.00    3.42   3.78      10     546.0     4.21         23        48871
Table 1. Tolerance to delay variations for original and optimized circuits.
(Tolerance columns: zero skew, clock scheduling only (Sched.), retiming followed by scheduling (Seq.), and simultaneous retiming and scheduling (Simult.). The % columns give the relative improvement over scheduling only; CPU times are in seconds.)
Acknowledgments
This research was supported in part by the National Science Foundation under Grant No. MIP-9423886 and Grant
No. MIP-9610108, a grant from the New York State Science
and Technology Foundation, and by grants from the Xerox,
IBM, and Intel Corporations.
References
[1] S. Chakradhar and S. Dey. Resynthesis and retiming for
optimum partial scan. In Proc. 31st ACM/IEEE Design Automation Conf., pages 87–93, June 1994.
[2] L.-F. Chao and E. H.-M. Sha. Retiming and clock skew for
synchronous systems. In Proc. International Symp. on Circuits and Systems, pages 283–286, June 1994.
[3] R. B. Deokar and S. S. Sapatnekar. A graph-theoretic approach to clock skew optimization. In Proc. International
Symp. on Circuits and Systems, pages 407–410, May 1995.
[4] S. Dey and S. Chakradhar. Retiming sequential circuits to
enhance testability. In Proc. 12th IEEE VLSI Test Symp.,
pages 28–33, April 1994.
[5] J. P. Fishburn. Clock skew optimization. IEEE Trans. on
Computers, 39(7):945–951, July 1990.
[6] E. G. Friedman. Clock Distribution Networks in VLSI Circuits and Systems. IEEE Press, 1995.
[7] A. T. Ishii, C. E. Leiserson, and M. C. Papaefthymiou.
Optimizing two-phase, level-clocked circuitry. J. ACM,
41(1):148–199, Jan. 1997.
[8] K. N. Lalgudi and M. C. Papaefthymiou. DELAY: an efficient tool for retiming with realistic delay modeling. In Proc.
32nd ACM/IEEE Design Automation Conf., June 1995.
[9] C. E. Leiserson and J. B. Saxe. Retiming synchronous circuitry. Algorithmica, 6(1), 1991.
[10] B. Lockyear and C. Ebeling. Optimal retiming of multiphase, level-clocked circuits. In Advanced Research in VLSI
and Parallel Systems: Proc. 1992 Brown/MIT Conf. MIT
Press, March 1992.
[11] H.-G. Martin. Retiming by combination of relocation and
clock delay adjustment. In Proc. European Design Automation Conf., pages 384–389, September 1993.
[12] J. Monteiro, S. Devadas, and A. Ghosh. Retiming sequential circuits for low power. In Digest of Technical Papers of
the 1993 IEEE International Conf. on CAD, pages 398–402,
Nov. 1993.
[13] J. L. Neves and E. G. Friedman. Optimal clock skew
scheduling tolerant to process variations. In Proc. 33rd
ACM/IEEE Design Automation Conf., pages 623–628, June
1996.
[14] M. C. Papaefthymiou and K. H. Randall. TIM: A timing
package for two-phase, level-clocked circuitry. In Proc. 30th
ACM/IEEE Design Automation Conf., June 1993.
[15] N. Shenoy, R. K. Brayton, and A. Sangiovanni-Vincentelli.
Retiming of circuits with single phase level-sensitive
latches. In International Conf. on Computer Design, Oct.
1991.
[16] T. Soyata, E. Friedman, and J. Mulligan. Incorporating interconnect, register, and clock distribution delays into the retiming process. IEEE Trans. on Computer-Aided Design of
Integrated Circuits and Systems, 16(1):105–120, Jan. 1997.