Minimizing Sensitivity to Delay Variations in High-Performance Synchronous Circuits

Xun Liu, Marios C. Papaefthymiou
Department of Electrical Engineering and Computer Science
University of Michigan
Ann Arbor, Michigan 48109

Eby G. Friedman
Department of Electrical Engineering
University of Rochester
Rochester, New York 14627

Abstract

This paper investigates retiming and clock skew scheduling for improving the tolerance of synchronous circuits to delay variations. It is shown that when both long and short paths are considered, circuits optimized by the combined application of the two techniques are more tolerant to delay variations than when optimized by either of the two techniques separately. A novel mixed-integer linear programming formulation is given for simultaneous retiming and clock scheduling with a target clock period and tolerance under setup and hold constraints. Experiments with LGSynth93 and ISCAS89 benchmark circuits demonstrate the effectiveness of the combined optimization. For half of the test circuits, tolerance to delay variations increased by at least 23% over the separate application of retiming and clock scheduling. Moreover, for two thirds of the test circuits, maximum tolerance improved by at least 11%.

1. Introduction

Retiming is an architectural-level transformation that optimizes digital circuits by relocating their storage elements. Clock scheduling adjusts the delays of the clock signals in a circuit and can be used as an alternative to retiming. Significant research has been devoted to each of the two optimizations separately; the combined application of these techniques, however, has received only limited attention.

This paper investigates the simultaneous application of retiming and clock scheduling for increasing the tolerance of a digital circuit's timing to delay variations. These variations often present a fundamental constraint in the design of high-performance circuits. Delay variations typically arise from three sources: process parameter variations, temperature or other environmental variations, and power supply variations. The creation of new design techniques and methodologies that minimize the sensitivity of circuit timing to delay variations is therefore of paramount importance for high-performance design.

This paper makes two main analytical contributions. First, we give a set of $O(E^2)$ constraints for the problem of simultaneous retiming and clock scheduling to achieve a target clock period and delay tolerance. Second, we formulate the problem of simultaneous retiming and clock scheduling under setup and hold constraints as a mixed-integer linear program (MILP). A circuit with maximum tolerance to delay variations can then be computed by performing a binary search over the range of possible tolerance values.

In experiments with benchmark circuits from the LGSynth93 and ISCAS89 suites, simultaneous retiming and clock scheduling resulted in significantly more tolerant circuits than the independent application of the two optimization techniques. For half of the circuits in our test suite, maximum tolerance to delay variations improved by at least 23% over separate retiming or clock skew scheduling. For about two thirds of the test circuits, maximum tolerance to delay variations improved by at least 11%.

Retiming has been investigated for a variety of clocking disciplines [7, 9, 10, 15], delay models [8, 16], and optimization objectives [1, 4, 12, 14]. A linear programming formulation of the clock scheduling problem was first described in [5]. The combined application of retiming and clock scheduling was discussed in [11]. A two-step procedure for maximizing the operating frequency of a synchronous circuit by combining retiming with clock scheduling was proposed in [2]. That work, however, is concerned only with setup violations and does not explore the expanded solution space that results when both setup and hold constraints are considered.
The main challenge with the integration of retiming and clock scheduling is the formulation of the problem as a conjunction of linear constraints. As is the case with other retiming problems [8, 16], the coexistence of setup and hold constraints introduces disjunctions among constraints. The resulting solution space is therefore nonconvex, which precludes the application of powerful convex programming techniques. This paper presents a mixed-integer linear program for simultaneous retiming and clock scheduling that is derived by a combination of upper bounding and graph-theoretic techniques.

The remainder of this paper has eight sections. Section 2 demonstrates the performance advantage of simultaneous retiming and clock scheduling. Background material is given in Section 3. In Section 4, we give a shortest-paths formulation for the problem of clock scheduling with a target tolerance, a target clock period, and fixed register locations. Section 5 presents necessary and sufficient conditions for achieving correct timing when a circuit is optimized by simultaneous retiming and clock scheduling under setup and hold constraints. An alternative formulation of these conditions in terms of an auxiliary graph is given in Section 6. This formulation is used in Section 7 to derive an equivalent mixed-integer linear program. Section 8 compares the results obtained by the separate application of retiming and clock scheduling with those obtained by the simultaneous application of the two optimizations. Our contributions are summarized in Section 9.

[Figure 1. (a) Original and (b) retimed circuit.]

2. Motivation

The effectiveness of simultaneous retiming and clock scheduling is demonstrated by the circuit in Figure 1. Each vertex represents a block of combinational logic, and each rectangle represents an edge-triggered register.
Each pair x/y denotes the maximum and minimum propagation delays of the signals through the corresponding node. The clock skew between the input and output registers i and k is assumed to be zero. The setup and hold constraints along each combinational path yield a range [x, y] of permissible clock skews [13] for register j, and the permissible skew range of j is obtained by intersecting all of these ranges.

Consider the original circuit in Figure 1(a). For a target clock period of 12 time units, the intersection of the two ranges is [-2,4]. When clock skew is zero, the permissible range of j is [-2,2], the largest interval centered at zero that fits within [-2,4] under the assumption of symmetric clock delay variations. Thus, the tolerance of this circuit is 4. When the clock signals arrive at j with a delay s(j) = 1, however, the permissible range is [-2,4], and the delay tolerance increases to 6.

Figure 1(b) shows a retimed version of the original circuit that is obtained by shifting j forward. In this case, the intersection of the two skew ranges is [-1,7]. When clock skew is zero, the permissible range of j is [-1,1], and the tolerance drops to 2. When the arrival of the clock signals at j is delayed by s(j) = 3, however, the permissible range becomes [-1,7], and the tolerance increases to 8. This value is the maximum tolerance that can be achieved by simultaneous retiming and clock scheduling, and it cannot be achieved by the separate application of the two optimizations to the original circuit.

An interesting observation in this example is that, with zero skews, the retimed circuit is less tolerant to delay variations than the original circuit. Nevertheless, with nonzero clock skews, the retimed circuit exhibits the maximum tolerance to delay variations.

3. Background

3.1. Circuit and Delay Model

An edge-triggered circuit is modeled as a directed multigraph $G = \langle V, E, d, w \rangle$. The vertices $V$ correspond to the combinational logic elements in the circuit.
Each vertex $v \in V$ is associated with a nonnegative weight $d(v)$ that describes the propagation delay through the corresponding logic block. Our results can be extended to the case where each logic block has a maximum propagation delay $d_{\max}(v)$ and a minimum propagation delay $d_{\min}(v)$. The directed edges $E$ of the graph model the interconnections between the combinational blocks. Each edge $e \in E$ corresponds to a wire that connects an output of a combinational block to the input of another combinational block, possibly through one or more globally clocked, edge-triggered registers. For each edge $e \in E$, the register count of the corresponding wire is given by a nonnegative integer edge weight $w(e)$. Every directed cycle of $G$ contains at least one edge with a strictly positive register count.

3.2. Retiming

A retiming of an edge-triggered circuit $G = \langle V, E, d, w \rangle$ is an integer-valued vertex labeling $r : V \to \mathbb{Z}$ that denotes a transformation of the original circuit $G$ into a functionally equivalent circuit $G_r = \langle V, E, d, w_r \rangle$. For each edge $u \xrightarrow{e} v$ in $G_r$, $w_r$ is defined by the equation

$$w_r(e) = w(e) + r(v) - r(u). \qquad (1)$$

The retiming transformation for a vertex $v$ in $V$ is shown in Figure 2. The output of $v$'s computation in $G_r$ is generated $r(v)$ clock cycles later than in $G$. The retimed circuit $G_r$ is well-formed if for all edges $e \in E$, we have

$$w_r(e) \ge 0. \qquad (2)$$

[Figure 2. Retiming a vertex $v$ by $r(v) = 1$.]

Equation (1) implies that for every vertex pair $u, v$ in $V$, the change in the register count along any path $u \stackrel{p}{\leadsto} v$ depends solely on its two endpoints:

$$w_r(p) = w(p) + r(v) - r(u), \qquad (3)$$

where $w(p) = \sum_{e \in p} w(e)$. Thus, the register count of any path $u \leadsto v$ can decrease by at most

$$W(u,v) = \min\{\, w(p) : u \stackrel{p}{\leadsto} v \,\}. \qquad (4)$$

The only paths $u \stackrel{p}{\leadsto} v$ that can become combinational (and possibly lead to a timing violation) in $G_r$ are those for which $w(p) = W(u,v)$ in $G$. For each of the $O(V^2)$ vertex pairs $u, v$ in $V$, the quantities

$$D(u,v) = \max\{\, d(p) : u \stackrel{p}{\leadsto} v,\ w(p) = W(u,v) \,\}, \qquad (5)$$
$$\Delta(u,v) = \min\{\, d(p) : u \stackrel{p}{\leadsto} v,\ w(p) = W(u,v) \,\}, \qquad (6)$$

where $d(p) = \sum_{x \in p} d(x)$, represent the longest and shortest propagation delays from $u$ to $v$, respectively, whenever the retimed circuit includes a combinational path between the two vertices. Therefore, the clock period of any retimed circuit $G_r$ is always some element of the $O(V^2)$-size set of values $D(u,v)$. When only long paths are considered, a retimed circuit that achieves a given clock period $c$ can be computed in $O(VE)$ steps, and a retimed circuit that achieves the minimum possible clock period can be computed in $O(VE + V^2 \lg V)$ steps [9].

3.3. Clock Skew Scheduling

In synchronous circuits, clock signals provide a global time reference that synchronizes the flow of data between storage elements. These signals are delivered by a clock distribution network [6]. A variety of factors, such as differences in interconnect delay, parasitic impedances, and process parameter variations, affect their arrival times at the storage elements of the circuit. The difference between the clock arrival times at two sequentially adjacent registers is known as the clock skew between these registers [6].

A clock schedule of a circuit $G = \langle V, E, d, w \rangle$ is a real-valued edge labeling $s : E \to \mathbb{R}$ that gives the propagation delay from the global clock source to each wire $e$ in the circuit. By adjusting these delays, timing violations can be fixed (or created). For example, consider a combinational path $u \stackrel{p}{\leadsto} v$ that is bounded by registers on $? \xrightarrow{e} u$ and $v \xrightarrow{e'} ?$. If $s(e) \ge s(e')$, then the time available for the propagation of signals from $e$ to $e'$ decreases by $s(e) - s(e')$. Conversely, if $s(e) \le s(e')$, then the available time increases by $s(e') - s(e)$. These changes may introduce new critical paths or eliminate existing ones. They may also introduce or eliminate hold violations.
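The quantities $W(u,v)$, $D(u,v)$, and $\Delta(u,v)$ of Equations (4)-(6) can be computed for all vertex pairs by an all-pairs shortest-paths computation over lexicographically ordered pairs, in the manner of the WD algorithm of Leiserson and Saxe [9], which weights each edge $u \xrightarrow{e} v$ by $(w(e), -d(u))$. The Python sketch below is illustrative only: the function names `lex_apsp` and `compute_wd` and the example circuit are hypothetical, and obtaining $\Delta(u,v)$ with the weights $(w(e), +d(u))$ is our own analogous assumption. As in Section 3.1, it assumes that every directed cycle carries at least one register, so no lexicographically negative cycles exist.

```python
import math
from itertools import product


def lex_apsp(n, arcs):
    """Floyd-Warshall over (register count, delay term) pairs.

    arcs: (u, v, (w, t)) tuples. Python tuples compare lexicographically,
    so the register count is minimized first and the delay term second.
    Unreachable pairs keep the value (inf, inf).
    """
    dist = [[(math.inf, math.inf)] * n for _ in range(n)]
    for v in range(n):
        dist[v][v] = (0, 0)                      # empty path from v to v
    for u, v, pair in arcs:
        dist[u][v] = min(dist[u][v], pair)       # keep the best parallel edge
    for k, i, j in product(range(n), repeat=3):  # k varies slowest
        via = (dist[i][k][0] + dist[k][j][0], dist[i][k][1] + dist[k][j][1])
        if via < dist[i][j]:
            dist[i][j] = via
    return dist


def compute_wd(d, edges):
    """Return dicts W, D, Delta keyed by (u, v) for every reachable pair.

    d: vertex delays d(v); edges: (u, v, w) triples with register count w.
    """
    n = len(d)
    # Tie-breaking on -d(u) yields the longest minimum-register path, Eq. (5).
    longest = lex_apsp(n, [(u, v, (w, -d[u])) for u, v, w in edges])
    # Tie-breaking on +d(u) yields the shortest minimum-register path, Eq. (6).
    shortest = lex_apsp(n, [(u, v, (w, d[u])) for u, v, w in edges])
    W, D, Delta = {}, {}, {}
    for u, v in product(range(n), repeat=2):
        if longest[u][v][0] == math.inf:
            continue                             # no path from u to v
        W[u, v] = longest[u][v][0]
        D[u, v] = d[v] - longest[u][v][1]        # arc terms omit endpoint v
        Delta[u, v] = d[v] + shortest[u][v][1]
    return W, D, Delta


# Hypothetical 3-vertex example with two register-free paths from 0 to 2.
d = [3, 2, 4]
edges = [(0, 1, 0), (1, 2, 0), (0, 2, 0), (2, 0, 1)]
W, D, Delta = compute_wd(d, edges)
print(W[0, 2], D[0, 2], Delta[0, 2])  # 0 9 7
```

In the example, the paths 0, 1, 2 and 0, 2 both carry zero registers, so $D(0,2) = 3 + 2 + 4 = 9$ and $\Delta(0,2) = 3 + 4 = 7$. Floyd-Warshall runs in $O(V^3)$ time; it is chosen here for brevity rather than speed.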
A linear programming framework for clock scheduling was first presented in [5]. A graph-theoretic approach to clock scheduling was subsequently described in [3]. In both papers, the placement of the storage elements was assumed to be fixed. Algorithms for scheduling local clocks to improve the tolerance of a circuit to process parameter variations were presented in [13].

4. Clock Scheduling Constraints

This section states the clock scheduling problem with a given tolerance as a single-source shortest-paths problem with $O(E^2)$ constraints. The following theorem captures the timing conditions that must be satisfied by a clock schedule that achieves a target clock period. These conditions can be extended to include nonzero setup and hold times. The proof of the theorem follows from [5].

Theorem 1 Let $G = \langle V, E, d, w \rangle$ be an edge-triggered circuit and $c$ a given constant. Moreover, let $s_m : E \to \mathbb{R}$ and $s_M : E \to \mathbb{R}$ be assignments of minimum and maximum clock delays, respectively. Then, $G$ is timed correctly if and only if for every pair $? \xrightarrow{e} u$, $v \xrightarrow{e'} ?$ in $E$ such that $w(e) \ge 1$, $w(e') \ge 1$, and $W(u,v) = 0$, we have

$$\Delta(u,v) + s_m(e) - s_M(e') \ge 0, \qquad (7)$$
$$D(u,v) + s_M(e) - s_m(e') \le c. \qquad (8)$$

We can now express the clock scheduling problem with a target clock period and tolerance as a shortest-paths problem with $O(E^2)$ inequalities, which can be generated in $O(E^2)$ time and solved in $O(E^3)$ steps using the Bellman-Ford single-source shortest-paths algorithm.

Theorem 2 Let $G = \langle V, E, d, w \rangle$ be an edge-triggered circuit. Moreover, let $c$ and $t$ be given real constants. Then, $G$ achieves a clock period $c$ with tolerance $t$ if and only if there exist nonnegative functions $s_m : E \to \mathbb{R}$ and $s_M : E \to \mathbb{R}$ such that for each edge $u \xrightarrow{e} v$,

$$s_m(e) \le s_M(e) - t, \qquad (9)$$

and for every edge pair $? \xrightarrow{e} u$, $v \xrightarrow{e'} ?$ in $E$ such that $w(e) \ge 1$, $w(e') \ge 1$, and $W(u,v) = 0$,
$$s_M(e') \le s_m(e) + \Delta(u,v), \qquad (10)$$
$$s_M(e) \le s_m(e') + c - D(u,v). \qquad (11)$$

For a target clock period $c$, the maximum tolerance $\tau_s$ can be determined by a binary search in $t$. Given $s_m$ and $s_M$, the corresponding schedule $s$ with maximum tolerance to symmetric delay variations is obtained by setting $s(e) = (s_m(e) + s_M(e))/2$ for all $e$ in $E$.

5. Clock Scheduling and Retiming

The following theorem gives a set of $O(E^2)$ constraints for correct timing when clock scheduling and retiming are applied simultaneously. Its correctness follows from Theorem 2.

Theorem 3 Let $G = \langle V, E, d, w \rangle$ be a synchronous circuit, and let $c$ and $t$ be given constants. Moreover, let $r : V \to \mathbb{Z}$ be a retiming function, let $s_M : E \to \mathbb{R}$ be an assignment of maximum clock delays, and let $s_m : E \to \mathbb{R}$ be an assignment of minimum clock delays. Then the retimed circuit $G_r$ is well-formed and achieves a clock period $c$ with tolerance $t$ if and only if for every edge $u \xrightarrow{e} v \in E$,

$$s_m(e) \le s_M(e) - t, \qquad (12)$$
$$w(e) + r(v) - r(u) \ge 0, \qquad (13)$$

and for every pair of edges $? \xrightarrow{e} u$, $v \xrightarrow{e'} ? \in E$,

$$\mathcal{E}(e,e') > 0 \;\Rightarrow\; W_r(u,v) \ge 1 \ \text{or}\ \big(w_r(e) = 0 \ \text{or}\ w_r(e') = 0\big), \qquad (14)$$

where $\mathcal{E}(e,e') = D(u,v) + s_M(e) - s_m(e') - c$ and $\mathcal{E}(e,e') = -\big(\Delta(u,v) + s_m(e) - s_M(e')\big)$ for the setup and hold constraints, respectively.

For simplicity, the constraints of Theorem 3 assume zero setup and hold times. Nonzero times $T_{\text{setup}}$ and $T_{\text{hold}}$ can be included in a straightforward manner by replacing the left-hand side of the implication in Relation (14) with $\mathcal{E}(e,e') > -T_{\text{setup}}$ or $\mathcal{E}(e,e') > -T_{\text{hold}}$, as appropriate. For a target clock period $c$, the maximum tolerance $\tau_{rs}$ over all retimings and clock schedules can be determined by a binary search in $t$.

6. Companion Graph

A companion graph $G' = \langle V', E', w' \rangle$ can be used to transform the timing constraints of Theorem 3 into a mixed-integer linear program. The construction of $G'$ from the circuit graph $G$ is identical to that in [8]. Each edge $u \xrightarrow{e} v \in E$ is segmented into two edges, $u \xrightarrow{e_1} x_{uv}$ and $x_{uv} \xrightarrow{e_2} v$, where $x_{uv}$ is a dummy vertex. The edge $e_1$ has exactly one register when the corresponding edge $e \in E$ has a positive register count and zero registers otherwise. Thus, the register count of $e_1$ serves as an indicator function for the register count of the corresponding generating edge $e \in E$. The edge $e_2$ carries the balance of the registers up to $w(e)$. In mathematical terms, the companion graph $G' = \langle V', E', w' \rangle$ is defined by

$$V' = V \cup \{\, x_{uv} : u \xrightarrow{e} v \in E \,\},$$
$$E' = \{\, u \xrightarrow{e_1} x_{uv},\ x_{uv} \xrightarrow{e_2} v : u \xrightarrow{e} v \in E \,\},$$

where for each edge $u \xrightarrow{e} v \in E$,

$$w'(e_1) = \min\{1, w(e)\} \quad \text{and} \quad w'(e_2) = w(e) - \min\{1, w(e)\}.$$

The following lemma recasts Theorem 3 in terms of $G'$ and a corresponding retiming function $r'$, where $w'_{r'}$ and $W'_{r'}$ denote the register counts of $G'$ after retiming by $r'$. Given $r'$, the retiming $r$ is obtained by setting $r(u) = r'(u)$ for every $u \in V$.

Lemma 4 Let $G = \langle V, E, d, w \rangle$ be a circuit graph, let $G' = \langle V', E', w' \rangle$ be its companion graph, and let $c$ and $t$ be given constants. Moreover, let $r' : V' \to \mathbb{Z}$ be a retiming function, let $s_M : E \to \mathbb{R}$ be an assignment of maximum clock delays, and let $s_m : E \to \mathbb{R}$ be an assignment of minimum clock delays. Then the retimed circuit $G_r$ is well-formed and achieves a clock period $c$ with tolerance $t$ if and only if for every edge $u \xrightarrow{e} v \in E$, we have

$$s_m(e) \le s_M(e) - t, \qquad (15)$$

for every edge $u \xrightarrow{e} v \in E'$,

$$w'(e) + r'(v) - r'(u) \ge 0, \qquad (16)$$

for every edge $u \xrightarrow{e_1} x_{uv} \in E'$,

$$w'(e_1) + r'(x_{uv}) - r'(u) \le 1, \qquad (17)$$

for every pair of edges $u \xrightarrow{e_1} x_{uv}$, $x_{uv} \xrightarrow{e_2} v \in E'$,

$$w'(e_2) + r'(v) - r'(x_{uv}) \le F \cdot \big(w'(e_1) + r'(x_{uv}) - r'(u)\big), \qquad (18)$$

where $F = \max\{\, W(u,v) + W(v,u) : u, v \in V \,\}$, and for every pair of edges $? \xrightarrow{e} u$, $v \xrightarrow{e'} ? \in E$,

$$\mathcal{E}(e,e') > 0 \;\Rightarrow\; W'_{r'}(u,v) \ge 1 \ \text{or}\ \big(w'_{r'}(e_1) = 0 \ \text{or}\ w'_{r'}(e'_1) = 0\big), \qquad (19)$$

where $\mathcal{E}(e,e') = D(u,v) + s_M(e) - s_m(e') - c$ and $\mathcal{E}(e,e') = -\big(\Delta(u,v) + s_m(e) - s_M(e')\big)$ for the setup and hold constraints, respectively.

Relation (19) is recast as an equivalent disjunction in the following lemma.

Lemma 5 For every pair of edges $? \xrightarrow{e} u$, $v \xrightarrow{e'} ? \in E$, Relation (19) is equivalent to the disjunction

$$\mathcal{E}(e,e') \le 0 \ \text{or}\ w'_{r'}(e_1) + w'_{r'}(e'_1) - W'_{r'}(u,v) \le 1, \qquad (20)$$

where $\mathcal{E}(e,e') = D(u,v) + s_M(e) - s_m(e') - c$ and $\mathcal{E}(e,e') = -\big(\Delta(u,v) + s_m(e) - s_M(e')\big)$ for the setup and hold constraints, respectively.

Since Inequalities (16) and (17) force $w'_{r'}(e_1), w'_{r'}(e'_1) \in \{0, 1\}$, the second disjunct of (20) fails exactly when both equal 1 and $W'_{r'}(u,v) = 0$, which is the case excluded by (19). The solution space of Relation (20) is described by the solid lines in Figure 3. This space is not convex and precludes the use of convex programming techniques.

[Figure 3. Solution space for Relation (20). Horizontal axis: $\mathcal{E}(e,e')$; vertical axis: $w'_{r'}(e_1) + w'_{r'}(e'_1) - W'_{r'}(u,v)$.]

7. Mixed-Integer Linear Program

This section presents a set of $O(E^2)$ constraints that ensure correct timing under simultaneous retiming and clock skew scheduling. These constraints are obtained by restricting the solution space of the constraints in Lemma 4 while maintaining their feasibility. The final constraint set comprises linear inequalities with integer and real unknowns. The following lemma gives upper bounds on the quantity $w'_{r'}(e_1) + w'_{r'}(e'_1) - W'_{r'}(u,v)$ from Relation (20) that restrict the solution space in the first and second quadrants.

Lemma 6 Let $r' : V' \to \mathbb{Z}$ be a retiming function that satisfies the conditions in Lemma 4. Then, for every pair of edges $? \xrightarrow{e} u$, $v \xrightarrow{e'} ? \in E$,

$$w'_{r'}(e_1) + w'_{r'}(e'_1) - W'_{r'}(u,v) \le 2, \qquad (21)$$
$$w'_{r'}(e_1) + w'_{r'}(e'_1) - W'_{r'}(u,v) \le 2 - \frac{\mathcal{E}(e,e')}{\mathcal{E}_{\max}(e,e')}, \qquad (22)$$

where $\mathcal{E}_{\max}(e,e')$ is an upper bound of $\mathcal{E}(e,e')$ that depends on the maximum possible clock skew values, as they are determined by the largest realizable chip die size.

[Figure 4. Equivalent convex solution space, bounded on the horizontal axis by $\mathcal{E}_{\min}(e,e')$ and $\mathcal{E}_{\max}(e,e')$.]

The convex solution space derived from the bounds in Lemma 6 is illustrated in Figure 4.
The bold line segments represent possible solutions. The shaded lines and points denote the points of the original solution space that are now excluded. The horizontal line in the second quadrant arises from Inequality (21), and the sloped upper bound in the first quadrant arises from Inequality (22). The two vertical lines correspond to the bounds on $\mathcal{E}(e,e')$.

Based on Lemmas 5 and 6, the simultaneous retiming and clock scheduling problem can now be recast as a mixed-integer linear program with $O(E^2)$ constraints.

Theorem 7 Let $G = \langle V, E, d, w \rangle$ be a synchronous circuit, let $G' = \langle V', E', w' \rangle$ be its companion graph, and let $c$ and $t$ be given constants. Moreover, let $r' : V' \to \mathbb{Z}$ be a retiming function, let $s_M : E \to \mathbb{R}$ be an assignment of maximum clock delays, and let $s_m : E \to \mathbb{R}$ be an assignment of minimum clock delays. Then the retimed circuit $G_r$ is well-formed and achieves a clock period $\Phi(G_r) \le c$ with tolerance $t$ if and only if for every edge $u \xrightarrow{e} v \in E$,

$$s_m(e) \le s_M(e) - t, \qquad (23)$$

for every edge $u \xrightarrow{e} v \in E'$,

$$w'(e) + r'(v) - r'(u) \ge 0, \qquad (24)$$

for every edge $u \xrightarrow{e_1} x_{uv} \in E'$,

$$w'(e_1) + r'(x_{uv}) - r'(u) \le 1, \qquad (25)$$

for every pair of edges $u \xrightarrow{e_1} x_{uv}$, $x_{uv} \xrightarrow{e_2} v \in E'$,

$$w'(e_2) + r'(v) - r'(x_{uv}) \le F \cdot \big(w'(e_1) + r'(x_{uv}) - r'(u)\big), \qquad (26)$$

where $F = \max\{\, W(u,v) + W(v,u) : u, v \in V \,\}$, and for every pair of edges $? \xrightarrow{e} u$, $v \xrightarrow{e'} ? \in E$,

$$\mathcal{E}(e,e') \le \mathcal{E}_{\max}(e,e'), \qquad (27)$$
$$\mathcal{E}(e,e') \ge \mathcal{E}_{\min}(e,e'), \qquad (28)$$
$$w'_{r'}(e_1) + w'_{r'}(e'_1) - W'_{r'}(u,v) \le 2, \qquad (29)$$
$$w'_{r'}(e_1) + w'_{r'}(e'_1) - W'_{r'}(u,v) \le 2 - \frac{\mathcal{E}(e,e')}{\mathcal{E}_{\max}(e,e')}, \qquad (30)$$

where $\mathcal{E}(e,e') = D(u,v) + s_M(e) - s_m(e') - c$ and $\mathcal{E}(e,e') = -\big(\Delta(u,v) + s_m(e) - s_M(e')\big)$ for the setup and hold constraints, respectively, and $\mathcal{E}_{\min}(e,e')$ and $\mathcal{E}_{\max}(e,e')$ are lower and upper bounds on $\mathcal{E}(e,e')$.

8. Experimental Results

This section presents results from the application of simultaneous retiming and clock scheduling to LGSynth93 and ISCAS89 benchmark circuits.
Each test circuit was optimized to achieve maximum delay tolerance with a clock period of $1.1 \times c_{\min}$, where $c_{\min}$ is the shortest clock period of the original circuit. The following experimental procedure was applied. Each circuit was optimized using retiming, clock scheduling, and simultaneous retiming and clock scheduling. An additional optimization heuristic was also applied, in which the original circuit was first retimed for maximum tolerance with zero skew, and clock skews were subsequently scheduled to increase tolerance further.

Our results are listed in Table 1. The first three columns give the name and size of each test circuit. The fourth column gives the target clock period. The fifth column gives the maximum tolerance $\tau$ of the original circuit with zero skew. The sixth column gives the maximum tolerance $\tau_s$ of the original circuit after clock scheduling. (Retiming results are omitted, since clock scheduling always resulted in circuits with greater tolerance.) The seventh column gives the maximum tolerance $\tau_{r;s}$ achieved by applying the two optimizations in sequence. The eighth column gives the relative improvement of this heuristic over clock scheduling alone, and the ninth column gives the runtime of the heuristic. The tenth column gives the maximum tolerance $\tau_{rs}$ achieved by simultaneous retiming and clock scheduling. The relative improvements of the combined optimization over clock scheduling alone are given in the eleventh column. The runtimes of the combined optimization are listed in the last column.

Simultaneous retiming and clock scheduling improved the tolerance of all test circuits and resulted in significant improvements for most of them. For half of the circuits in our test suite, relative improvements over clock scheduling were at least 23%. For about two thirds of the circuits, improvements were at least 11%. Our sequential retiming and clock scheduling heuristic improved the maximum tolerance of most test circuits. For one quarter of the circuits, its relative improvements were at least 10%.
The runtime of this heuristic was comparable to that of clock scheduling.

Our experiments were performed on an Intel Pentium II with 128 MB of main memory. Our simultaneous retiming and clock scheduling algorithm was terminated if no further improvement was achieved within 10 hours of execution. Gate delays were calculated using the formula $a + b \cdot (\mathit{fanout} + \mathit{rand})$. The parameters $a$ and $b$ denote the intrinsic gate delay and the delay increment contributed by a single gate load, respectively. Their values were obtained from the library iwls93.mis2lib in the LGSynth93 benchmark suite. The parameter $\mathit{rand}$ was a uniformly distributed random number that introduced variation into the gate delays. The range of $\mathit{rand}$ was [-1,0] for minimum propagation delays and [0,1] for maximum propagation delays.

Our simultaneous retiming and clock scheduling algorithm explores the solution space using a branch-and-bound approach. During its execution, it maintains a permissible range for the retiming value of each vertex. Once the retiming function is fixed, the clock delays are computed using the Bellman-Ford single-source shortest-paths algorithm. The optimal tolerance is determined by iterating this algorithm in a binary search. The overall complexity of our algorithm is exponential in the worst case. When register mobility is constrained by considering loops, however, the permissible range of most vertices becomes very small.

9. Conclusion

This paper explores the application of retiming and clock scheduling for maximizing the tolerance of synchronous circuits to delay variations. We show that, when both long and short paths are considered, the combined optimization can result in circuits that are more tolerant to delay variations than when either of the two optimizations is applied separately. Moreover, we give an MILP formulation of the simultaneous retiming and clock scheduling problem. Our experiments show that simultaneous retiming and clock scheduling can significantly increase the maximum tolerance of benchmark circuits to delay variations.
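For a fixed retiming, the clock-scheduling step described in Section 8 amounts to checking the difference constraints (9)-(11) of Theorem 2 with Bellman-Ford and wrapping that check in a binary search on $t$. The following Python sketch illustrates this under stated assumptions: it is not the authors' branch-and-bound implementation; the functions `bellman_ford_feasible`, `schedule_for_tolerance`, and `max_tolerance` are hypothetical; the register pairs with $W(u,v) = 0$ and their $D(u,v)$ and $\Delta(u,v)$ values are assumed to be precomputed; and the search range $[0, c]$ for $t$ is an assumed bound. Each inequality $x_j \le x_i + b$ becomes an arc $i \to j$ of weight $b$, and a negative cycle signals infeasibility.

```python
def bellman_ford_feasible(n, arcs, eps=1e-9):
    """Solve difference constraints x[j] <= x[i] + wgt for arcs (i, j, wgt).

    Starting every x at 0 is equivalent to adding a virtual source with
    zero-weight arcs to all n variables. Returns a feasible x, or None if a
    negative cycle (an infeasible constraint set) is detected.
    """
    x = [0.0] * n
    for _ in range(n):
        changed = False
        for i, j, wgt in arcs:
            if x[i] + wgt < x[j] - eps:   # eps guards against round-off
                x[j] = x[i] + wgt
                changed = True
        if not changed:
            return x
    return None                           # still relaxing after n passes


def schedule_for_tolerance(pairs, num_edges, c, t):
    """Check Inequalities (9)-(11) for clock period c and tolerance t.

    pairs: hypothetical list of (e, e2, D, Delta) tuples, one per register
    pair ? -e-> u, v -e2-> ? with W(u, v) = 0. Variable 2e holds s_m(e) and
    variable 2e + 1 holds s_M(e). Returns (s_m, s_M), or None if infeasible.
    """
    arcs = [(2 * e + 1, 2 * e, -t) for e in range(num_edges)]  # (9)
    for e, e2, D, Delta in pairs:
        arcs.append((2 * e, 2 * e2 + 1, Delta))   # (10): s_M(e2) <= s_m(e) + Delta
        arcs.append((2 * e2, 2 * e + 1, c - D))   # (11): s_M(e) <= s_m(e2) + c - D
    x = bellman_ford_feasible(2 * num_edges, arcs)
    if x is None:
        return None
    shift = -min(x)   # only differences are constrained, so shift to be >= 0
    s_m = [x[2 * e] + shift for e in range(num_edges)]
    s_M = [x[2 * e + 1] + shift for e in range(num_edges)]
    return s_m, s_M


def max_tolerance(pairs, num_edges, c, iters=50):
    """Binary search in t over the assumed range [0, c]; returns (t, s)."""
    if schedule_for_tolerance(pairs, num_edges, c, 0.0) is None:
        return None                       # period c infeasible even for t = 0
    lo, hi = 0.0, c
    for _ in range(iters):
        mid = (lo + hi) / 2
        if schedule_for_tolerance(pairs, num_edges, c, mid) is None:
            hi = mid
        else:
            lo = mid
    s_m, s_M = schedule_for_tolerance(pairs, num_edges, c, lo)
    s = [(a + b) / 2 for a, b in zip(s_m, s_M)]   # center of each window
    return lo, s


if __name__ == "__main__":
    # One register pair with D = 5, Delta = 3 and c = 10: the cycle through
    # (9)-(11) gives t <= (c - D + Delta) / 2 = 4.
    t, s = max_tolerance([(0, 1, 5.0, 3.0)], 2, 10.0)
    print(round(t, 6))  # 4.0
```

Because every inequality constrains only a difference of clock delays, any feasible solution can be shifted so that all delays are nonnegative, which is how the sketch meets the nonnegativity requirement of Theorem 2. The returned schedule centers each window $[s_m(e), s_M(e)]$, matching $s(e) = (s_m(e) + s_M(e))/2$.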
| Circuit | Nodes | Edges | $c$ | $\tau$ | $\tau_s$ | $\tau_{r;s}$ | $\tau_{r;s}/\tau_s - 1$ (%) | CPU ($r;s$) (sec) | $\tau_{rs}$ | $\tau_{rs}/\tau_s - 1$ (%) | CPU ($rs$) (sec) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| daio | 17 | 30 | 2.91 | 0.00 | 0.76 | 0.76 | 0 | 0.1 | 0.79 | 4 | 1 |
| dk27 | 24 | 254 | 3.74 | 0.21 | 0.59 | 0.64 | 8 | 0.4 | 0.64 | 8 | 265 |
| tav | 26 | 59 | 3.36 | 0.00 | 1.05 | 1.08 | 2 | 0.4 | 1.22 | 15 | 2 |
| bbtas | 31 | 87 | 3.78 | 0.20 | 0.46 | 0.63 | 39 | 0.9 | 0.63 | 39 | 6914 |
| s208 | 37 | 112 | 4.67 | 0.00 | 1.64 | 1.67 | 2 | 1.5 | 1.67 | 2 | 29086 |
| dk512 | 39 | 107 | 4.11 | 0.23 | 0.63 | 0.70 | 11 | 1.4 | 0.70 | 11 | 77313 |
| dk17 | 40 | 114 | 4.69 | 0.26 | 0.52 | 0.58 | 0 | 1.4 | 0.60 | 3 | 6505 |
| s420 | 46 | 177 | 5.67 | 0.00 | 1.67 | 1.76 | 5 | 4.4 | 2.28 | 36 | 53360 |
| dk15 | 49 | 154 | 5.26 | 0.29 | 1.55 | 1.55 | 0 | 2.2 | 2.05 | 32 | 202 |
| dk14 | 69 | 238 | 5.64 | 0.35 | 1.27 | 1.40 | 11 | 7.7 | 1.62 | 28 | 116412 |
| ex4 | 70 | 207 | 5.83 | 0.00 | 0.66 | 0.70 | 7 | 7.2 | 0.83 | 26 | 38346 |
| opus | 71 | 242 | 8.53 | 0.47 | 2.11 | 2.11 | 0 | 10.1 | 2.69 | 27 | 7370 |
| ex6 | 102 | 379 | 7.28 | 0.41 | 1.02 | 1.06 | 4 | 27.6 | 1.16 | 14 | 76091 |
| dk16 | 120 | 567 | 8.89 | 0.49 | 1.38 | 1.38 | 0 | 133.0 | 1.45 | 5 | 12030 |
| ex1 | 193 | 887 | 14.63 | 0.00 | 2.16 | 2.16 | 0 | 367.0 | 2.94 | 36 | 49821 |
| s713 | 377 | 590 | 41.30 | 0.00 | 3.42 | 3.78 | 10 | 546.0 | 4.21 | 23 | 48871 |

Table 1. Tolerance to delay variations for original and optimized circuits.

Acknowledgments

This research was supported in part by the National Science Foundation under Grant No. MIP-9423886 and Grant No. MIP-9610108, a grant from the New York State Science and Technology Foundation, and by grants from the Xerox, IBM, and Intel Corporations.

References

[1] S. Chakradhar and S. Dey. Resynthesis and retiming for optimum partial scan. In Proc. 31st ACM/IEEE Design Automation Conf., pages 87–93, June 1994.
[2] L.-F. Chao and E. H.-M. Sha. Retiming and clock skew for synchronous systems. In Proc. International Symp. on Circuits and Systems, pages 283–286, June 1994.
[3] R. B. Deokar and S. S. Sapatnekar. A graph-theoretic approach to clock skew optimization. In Proc. International Symp. on Circuits and Systems, pages 407–410, May 1995.
[4] S. Dey and S. Chakradhar. Retiming sequential circuits to enhance testability. In Proc. 12th IEEE VLSI Test Symp., pages 28–33, April 1994.
[5] J. P. Fishburn. Clock skew optimization. IEEE Trans. on Computers, 39(7):945–951, July 1990.
[6] E. G. Friedman. Clock Distribution Networks in VLSI Circuits and Systems. IEEE Press, 1995.
[7] A. T. Ishii, C. E. Leiserson, and M. C. Papaefthymiou. Optimizing two-phase, level-clocked circuitry. J. ACM, 41(1):148–199, Jan. 1997.
[8] K. N. Lalgudi and M. C. Papaefthymiou. DELAY: an efficient tool for retiming with realistic delay modeling. In Proc. 32nd ACM/IEEE Design Automation Conf., June 1995.
[9] C. E. Leiserson and J. B. Saxe. Retiming synchronous circuitry. Algorithmica, 6(1), 1991.
[10] B. Lockyear and C. Ebeling. Optimal retiming of multiphase, level-clocked circuits. In Advanced Research in VLSI and Parallel Systems: Proc. 1992 Brown/MIT Conf. MIT Press, March 1992.
[11] H.-G. Martin. Retiming by combination of relocation and clock delay adjustment. In Proc. European Design Automation Conf., pages 384–389, September 1993.
[12] J. Monteiro, S. Devadas, and A. Ghosh. Retiming sequential circuits for low power. In Digest of Technical Papers of the 1993 IEEE International Conf. on CAD, pages 398–402, Nov. 1993.
[13] J. L. Neves and E. G. Friedman. Optimal clock skew scheduling tolerant to process variations. In Proc. 33rd ACM/IEEE Design Automation Conf., pages 623–628, June 1996.
[14] M. C. Papaefthymiou and K. H. Randall. TIM: A timing package for two-phase, level-clocked circuitry. In Proc. 30th ACM/IEEE Design Automation Conf., June 1993.
[15] N. Shenoy, R. K. Brayton, and A. Sangiovanni-Vincentelli. Retiming of circuits with single phase level-sensitive latches. In International Conf. on Computer Design, Oct. 1991.
[16] T. Soyata, E. Friedman, and J. Mulligan. Incorporating interconnect, register, and clock distribution delays into the retiming process. IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, 16(1):105–120, Jan. 1997.