
Chapter 4: Retiming

Keshab K. Parhi
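Motivation: for circuits with loops, how can we approach the iteration bound? (Pipelining can reduce the critical path only across feed-forward cutsets.)
The main benefit is speed; the effect on power savings is small.
Retiming leaves the latency unchanged while reducing the critical path. Pipelining also reduces the critical path, but it increases the latency.
• Moving around existing delays (no longer restricted by loops)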
• Does not alter the latency of the system
• Reduces the critical path of the system
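• Node Retiming (type 1)
[Figure: node retiming example, moving 2D across a node (-2D on one side, +2D on the other)]
Prerequisite: the operations must be linear and time-invariant.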
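• Cutset Retiming (type 2)
[Figure: cutset retiming example on a DFG whose nodes include A, B, C, E and F]
In the cutset on the right (2 incoming edges, 3 outgoing edges), removing one D from each incoming edge adds one D to each of the 3 outgoing edges. The cutset can be treated as a single node.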
Chap. 4 2
Retiming vs pipelining

• Retiming is a generalization of pipelining
• Pipelining is equivalent to introducing many delays at the input, followed by retiming

• Retiming Formulation
[Figure: an edge U → V (source node U, destination node V) with weight ω; after retiming with values r(U) and r(V), the edge has weight ω']

ω' = ω + r(V) - r(U)
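The edge-weight rule can be tried on a small graph. The following is a minimal Python sketch, not taken from the slides: the 4-node DFG, its delay counts, the retiming values `r`, and the helper name `retime` are all hypothetical and chosen only for illustration.

```python
def retime(edges, r):
    """Apply w'(U,V) = w(U,V) + r(V) - r(U) to every edge.

    edges: dict mapping (U, V) -> number of delays on the edge U -> V
    r:     dict mapping node -> integer retiming value
    """
    return {(u, v): w + r[v] - r[u] for (u, v), w in edges.items()}


# Hypothetical 4-node DFG (illustration only).
edges = {(1, 3): 1, (1, 4): 2, (2, 1): 1, (3, 2): 0, (4, 2): 0}
r = {1: 0, 2: 1, 3: 0, 4: 0}

retimed = retime(edges, r)

# A retiming is legal only if no edge ends up with a negative delay count.
is_legal = all(w >= 0 for w in retimed.values())
print(retimed, is_legal)
```

In this sketch the loop 1 → 3 → 2 → 1 holds two delays both before and after retiming; only their placement on the edges changes.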

• Properties of retiming
– The weight of the retimed path p = V0 → V1 → … → Vk is given by
ωr(p) = ω(p) + r(Vk) - r(V0). (Note the case of a cycle, where Vk = V0.)
– Retiming does not change the number of delays in a cycle.
– Retiming does not alter the iteration bound in a DFG, as the number of delays in a cycle does not change.
– Adding the constant value j to the retiming value of each node does not alter the number of delays in the edges of the retimed graph. (The constant is added to the node values, not to the edges: r(V) - r(U) is unchanged, so ω' is unchanged.)
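The path-weight property follows from summing the edge rule ω' = ω + r(V) - r(U) along the path, since the intermediate retiming values cancel in pairs. A short derivation, where e_i denotes the edge V_i → V_{i+1} (a label introduced here for convenience):

```latex
\begin{aligned}
\omega_r(p) &= \sum_{i=0}^{k-1} \omega_r(e_i)
             = \sum_{i=0}^{k-1} \bigl( \omega(e_i) + r(V_{i+1}) - r(V_i) \bigr) \\
            &= \omega(p) + r(V_k) - r(V_0).
\end{aligned}
```

For a cycle, V_k = V_0, so ω_r(p) = ω(p). Replacing every r(V) by r(V) + j leaves each difference r(V_{i+1}) - r(V_i) unchanged, which is why a constant shift has no effect.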

• Retiming is done to meet one of the following two objectives:
– Clock period minimization
– Register minimization
• Retiming for clock period minimization
– Feasibility constraint
ω’(U,V) ≥ 0 ⇒ causality of the system
⇒ ω(U,V) ≥ r(U) - r(V) (one inequality per edge)
– Critical Path constraint
r(U) - r(V) ≤ W(U,V) - 1 for all vertices U and V in the graph such that D(U,V) > c, where c is the target clock period. The two quantities W(U,V) and D(U,V) are given as:
W(U,V) = min{ w(p) : p is a path from U to V }
D(U,V) = max{ t(p) : p is a path from U to V and w(p) = W(U,V) }
[Figure: example DFG with nodes A to G. Computation times: 1 u.t. for A, B, C, D, E and G; 2 u.t. for F.]
Example: W(A,E) = 1 and D(A,E) = 5

• Algorithm to compute W(U,V) and D(U,V):
• Let M = tmax · n, where tmax is the maximum computation time of the nodes in G and n is the number of nodes in G.
• Form a new graph G' which is the same as G except that the edge weights are replaced by w'(e) = M·w(e) – t(U) for all edges U → V.
• Solve the all-pairs shortest path problem on G' using the Floyd-Warshall algorithm. Let S'UV be the length of the shortest path from U to V.
• If U ≠ V, then W(U,V) = ⌈S'UV / M⌉ and D(U,V) = M·W(U,V) - S'UV + t(V). If U = V, then W(U,V) = 0 and D(U,V) = t(U).
• Using W(U,V) and D(U,V), the feasibility and critical path constraints are formulated as a set of inequalities. The inequalities are solved using a constraint graph; if a feasible solution is obtained, the circuit can be clocked with period c.
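The steps above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the slides' own code: the 4-node DFG and the function names `compute_W_D` and `retiming_constraints` are hypothetical, and the ceiling in W(U,V) = ⌈S'UV / M⌉ is taken as the intended rounding.

```python
def compute_W_D(t, edges):
    """Return dicts W[(U, V)] and D[(U, V)] for a DFG.

    t:     dict node -> computation time
    edges: dict (U, V) -> number of delays on edge U -> V
    Assumes every cycle of the DFG contains at least one delay.
    """
    nodes = list(t)
    M = max(t.values()) * len(nodes)
    INF = float("inf")

    # Edge weights of G': w'(e) = M * w(e) - t(U).
    S = {u: {v: (0 if u == v else INF) for v in nodes} for u in nodes}
    for (u, v), w in edges.items():
        S[u][v] = min(S[u][v], M * w - t[u])

    # Floyd-Warshall all-pairs shortest paths on G'.
    for k in nodes:
        for i in nodes:
            for j in nodes:
                if S[i][k] + S[k][j] < S[i][j]:
                    S[i][j] = S[i][k] + S[k][j]

    W, D = {}, {}
    for u in nodes:
        for v in nodes:
            if u == v:
                W[u, v], D[u, v] = 0, t[u]
            elif S[u][v] < INF:               # skip unreachable pairs
                W[u, v] = -(-S[u][v] // M)    # integer ceiling of S'/M
                D[u, v] = M * W[u, v] - S[u][v] + t[v]
    return W, D


def retiming_constraints(edges, W, D, c):
    """List of (U, V, k), each meaning r(U) - r(V) <= k, for clock period c."""
    feasibility = [(u, v, w) for (u, v), w in edges.items()]
    critical = [(u, v, W[u, v] - 1) for (u, v) in D if D[u, v] > c]
    return feasibility + critical


# Hypothetical 4-node DFG (illustration only).
t = {1: 1, 2: 1, 3: 2, 4: 2}
edges = {(1, 3): 1, (1, 4): 2, (2, 1): 1, (3, 2): 0, (4, 2): 0}

W, D = compute_W_D(t, edges)
constraints = retiming_constraints(edges, W, D, c=3)
print(W[1, 2], D[1, 2])
print(len(constraints))
```

Each tuple in `constraints` is one inequality of the form r(U) - r(V) ≤ k, which is the form a constraint-graph solver expects.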

• Solving a system of inequalities: given M inequalities in N variables, where each inequality is of the form ri – rj ≤ k for an integer value of k.
– Draw a constraint graph
  – Draw the node i for each of the N variables ri, i = 1, 2, …, N.
  – Draw the node N+1.
  – For each inequality ri – rj ≤ k, draw the edge j → i of length k.
  – For each node i, i = 1, 2, …, N, draw the edge N+1 → i from the node N+1 to the node i with length 0.
– Solve using a shortest path algorithm.
  – The system of inequalities has a solution iff the constraint graph contains no negative cycles.
  – If a solution exists, one solution is where ri is the minimum length path from the node N+1 to node i.
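The constraint-graph procedure can be sketched with the Bellman-Ford algorithm, one possible choice of shortest path algorithm that also detects negative cycles. In this sketch the function name `solve_difference_constraints` and both example systems are hypothetical, chosen only for illustration.

```python
import math


def solve_difference_constraints(variables, constraints):
    """Solve a system of inequalities r_i - r_j <= k.

    variables:   list of variable names
    constraints: list of (i, j, k), each meaning r_i - r_j <= k
    Returns a dict of values, or None if the system has no solution.
    """
    SRC = object()                      # plays the role of node N+1
    dist = {v: math.inf for v in variables}
    dist[SRC] = 0

    # Edge N+1 -> i of length 0 for every node, then j -> i of length k.
    arcs = [(SRC, v, 0) for v in variables]
    arcs += [(j, i, k) for (i, j, k) in constraints]

    # Bellman-Ford: the graph has N+1 nodes, so N relaxation passes suffice.
    for _ in range(len(variables)):
        changed = False
        for a, b, k in arcs:
            if dist[a] + k < dist[b]:
                dist[b] = dist[a] + k
                changed = True
        if not changed:
            break

    # Any edge that can still be relaxed lies on a negative cycle.
    for a, b, k in arcs:
        if dist[a] + k < dist[b]:
            return None
    return {v: dist[v] for v in variables}


# Hypothetical system in four variables.
system = [(1, 2, 0), (3, 1, 5), (4, 1, 4), (4, 3, -1), (3, 2, 2)]
solution = solve_difference_constraints([1, 2, 3, 4], system)
print(solution)

# r1 - r2 <= -1 and r2 - r1 <= 0 contradict each other (negative cycle).
no_solution = solve_difference_constraints([1, 2], [(1, 2, -1), (2, 1, 0)])
print(no_solution)
```

The returned values are the shortest path lengths from node N+1, so they are all zero or negative; adding the same constant to every value gives another valid solution.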

• K-slow transformation
– Replace each D by kD

Original graph: loop through A (1) and B (1) with one delay D; Titer = 2 u.t.

Clock   Operation
0       A0 → B0
1       A1 → B1
2       A2 → B2

After 2-slow transformation: the same loop with 2D; Tclk = 2 u.t., Titer = 2 × 2 u.t. = 4 u.t.

Clock   Operation
0       A0 → B0
1       (null)
2       A1 → B1
3       (null)
4       A2 → B2

*New input samples are accepted every alternate cycle.
*Null operations account for the odd clock cycles.
*Hardware is utilized only 50% of the time.
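• Retiming 2-slow graph
[Figure: the 2-slow graph of A and B after retiming]

Tclk = 1 u.t.
Titer = 2 × 1 = 2 u.t. (Titer is the iteration period: the time between successive iterations, here 2 clock cycles.)

*Hardware utilization = 50%
*Hardware can be fully utilized if two independent operations are available.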
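The effect of slowing down and then retiming can be checked numerically. The sketch below is illustrative only: it assumes the two-node loop has a zero-delay edge A → B and one delay on B → A, and it assumes the retiming values r(A) = 0, r(B) = 1. The helper names `k_slow`, `retime` and `critical_path` are introduced here and do not come from the slides.

```python
def k_slow(edges, k):
    """Replace each D by kD: multiply every edge's delay count by k."""
    return {e: k * w for e, w in edges.items()}


def retime(edges, r):
    """Apply w'(U,V) = w(U,V) + r(V) - r(U) to every edge."""
    return {(u, v): w + r[v] - r[u] for (u, v), w in edges.items()}


def critical_path(t, edges):
    """Longest computation time over paths made only of zero-delay edges.

    Assumes the graph has no zero-delay cycle.
    """
    succ = {u: [] for u in t}
    for (u, v), w in edges.items():
        if w == 0:
            succ[u].append(v)

    memo = {}

    def longest_from(u):
        if u not in memo:
            memo[u] = t[u] + max((longest_from(v) for v in succ[u]), default=0)
        return memo[u]

    return max(longest_from(u) for u in t)


t = {"A": 1, "B": 1}
edges = {("A", "B"): 0, ("B", "A"): 1}      # assumed delay placement

slow = k_slow(edges, 2)                     # feedback edge now holds 2D
retimed = retime(slow, {"A": 0, "B": 1})    # one D ends up on each edge

for name, g in [("original", edges), ("2-slow", slow), ("retimed", retimed)]:
    print(name, g, "Tclk =", critical_path(t, g))
```

Under these assumptions the clock period is 2 u.t. for the original and 2-slow graphs and 1 u.t. after retiming, so the iteration period of the retimed 2-slow graph is 2 × 1 = 2 u.t.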
2-Slow Lattice Filter (Fig. 4.7)
Pipelining cannot be inserted: none of the cutsets is feed-forward.

[Figure: a 100-stage lattice filter with its critical path and critical loop marked; the critical path has 2 multiplications and 101 additions]

Note: critical path = 6 u.t. < 7 u.t.

The 2-slow version
A retimed version of the 2-slow circuit with a critical path of 2 multiplications and 2 additions

If Tm = 2 u.t. and Ta = 1 u.t., then Tclk = 6 u.t. and Titer = 2 × 6 = 12 u.t.

In the original lattice filter, Titer = 105 u.t.

Iteration period bound = 7 u.t.
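These figures follow directly from the operation counts on the critical paths. A worked check, assuming as stated that Tm = 2 u.t. and Ta = 1 u.t.:

```latex
\begin{aligned}
T_{\text{iter}}^{\text{original}} &= 2T_m + 101T_a = 2(2) + 101(1) = 105 \ \text{u.t.} \\
T_{\text{clk}}^{\text{2-slow, retimed}} &= 2T_m + 2T_a = 2(2) + 2(1) = 6 \ \text{u.t.} \\
T_{\text{iter}}^{\text{2-slow, retimed}} &= 2 \times T_{\text{clk}}^{\text{2-slow, retimed}} = 12 \ \text{u.t.}
\end{aligned}
```

The factor of 2 in the last line comes from the 2-slow transformation: a new sample is processed only every second clock cycle.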
Other Applications of Retiming
• Retiming for Register Minimization (Section 4.4.3)
• Retiming for Folding (Chapter 6)
• Retiming for Power Reduction (Chap. 17)
• Retiming for Logic Synthesis (beyond the scope of this class)
• Multi-Rate/Multi-Dimensional Retiming (Denk/Parhi, Trans. VLSI, Dec. 98, Jun. 99)