Chapter4 FA16
Chapter 4: Delay
Delay Definitions
when an input changes, the output will retain
Timing Optimization
In digital circuits, there will be a number of critical paths that limit the
2. The logical level (types of functional blocks used (e.g., ripple carry or lookahead adders),
the number of stages of gates in a clock cycle, and the fan-in and fan-out of the gates).
3. The circuit level (delay can be tuned at the circuit level by choosing transistor sizes or
using other styles of CMOS logic).
Delay Estimation
We would like to be able to easily estimate delay
Effective Resistance
Shockley models have limited value
Ids = Vds/R
RC Delay Model
RC delay models approximate the nonlinear transistor I-V and C-V
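[Figure: equivalent RC circuits for transistors of width k, with terminals g, s, and d. The nMOS transistor is modeled as a resistance R/k and the pMOS transistor as a resistance 2R/k; each has a capacitance kC on its gate, source, and drain.]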
RC Values
Capacitance
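[Figure: RC delay estimate for an inverter (pMOS width 2, nMOS width 1) with output Y driving an identical inverter, built up over several slides. The equivalent circuit has resistance R and capacitances 2C, 2C, C, and C on node Y.]
d = 6RC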
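The 6RC estimate can be reproduced by adding up the capacitance on node Y. The following is a minimal Python sketch, assuming a unit inverter (pMOS width 2, nMOS width 1) whose output sees its own drain diffusion plus the gates of the inverters it drives. The function name and the fanout parameter `h` are illustrative and do not come from the slides.

```python
def inverter_delay(h=1, R=1.0, C=1.0):
    """RC delay of a unit inverter driving h identical inverters.

    Assumes pMOS width 2 and nMOS width 1, so each inverter contributes
    3C of diffusion at its output and 3C of gate capacitance at its input.
    """
    diffusion = (2 + 1) * C        # own pMOS and nMOS drains on node Y
    gate_load = h * (2 + 1) * C    # gates of the h driven inverters
    return R * (diffusion + gate_load)


# Fanout-of-1 case, in units of RC (R = C = 1)
print(inverter_delay(h=1))
```

With `h = 1` the sum is (3 + 3)RC, matching d = 6RC; half of it is the inverter driving its own diffusion.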
Example
Sketch a 3-input NAND gate with transistor
[Figure: 3-input NAND gate with transistor widths: 2 for each pMOS transistor and 3 for each nMOS transistor]
diffusion capacitance.
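[Figure: 3-input NAND gate annotated with capacitances: 2C on the gate, source, and drain of each width-2 pMOS transistor, and 3C on those of each width-3 nMOS transistor]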
diffusion capacitance.
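[Figure: 3-input NAND gate with capacitances lumped per node: 5C on each input, 9C on the output, and 3C on each of the two internal nodes]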
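$t_{pd} \approx \sum_{\text{nodes } i} R_{i\text{-to-source}} C_i = R_1 C_1 + (R_1 + R_2) C_2 + \dots + (R_1 + R_2 + \dots + R_N) C_N$
[Figure: RC ladder with series resistors R1, R2, R3, ..., RN and capacitors C1, C2, C3, ..., CN to ground at successive nodes]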
the RC tree:
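The sum over nodes of (resistance to the source) times (node capacitance) is commonly called the Elmore delay. It is easy to evaluate for a ladder by keeping a running total of the series resistance. The sketch below is illustrative only: the function name and the list-based interface are assumptions, and it handles a simple ladder rather than a general tree.

```python
def elmore_delay_ladder(resistances, capacitances):
    """Elmore delay of an RC ladder driven from one end.

    resistances[i] connects node i to the previous node (or to the source
    for i = 0); capacitances[i] is the capacitance from node i to ground.
    """
    if len(resistances) != len(capacitances):
        raise ValueError("need one resistor and one capacitor per node")
    delay = 0.0
    r_to_source = 0.0
    for r, c in zip(resistances, capacitances):
        r_to_source += r            # total resistance between the source and this node
        delay += r_to_source * c    # this node's term: R_i-to-source * C_i
    return delay


# Two-node ladder: R1*C1 + (R1 + R2)*C2
print(elmore_delay_ladder([1.0, 2.0], [3.0, 4.0]))
```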
Propagation Delay
Example: 2-input NAND
Estimate worst-case rising and falling delay of 2-input
[Figure: 2-input NAND gate with all transistors of width 2 and output Y, driving h copies of itself. Node Y carries 6C of diffusion capacitance plus 4hC of load, (6+4h)C in total; the internal node x between the series nMOS transistors carries 2C.]
[Figure: rising equivalent circuit with (6+4h)C on Y]
$t_{pdr} = (6+4h)RC$
[Figure: falling equivalent circuit with two series resistances R/2, 2C on node x, and (6+4h)C on Y]
$t_{pdf} = (2C)\left(\frac{R}{2}\right) + \left[(6+4h)C\right]\left(\frac{R}{2} + \frac{R}{2}\right) = (7+4h)RC$
Contamination Delay
Best-case (contamination) delay can be substantially
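[Figure: rising equivalent circuit with both pMOS transistors (R and R) conducting in parallel into (6+4h)C on Y]
$t_{cdr} = \left[(6+4h)C\right]\left(\frac{R}{2}\right) = (3+2h)RC$
[Figure: falling equivalent circuit with two series resistances R/2, 2C on node x, and (6+4h)C on Y]
$t_{cdf} = (6+4h)RC$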
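The four NAND2 delays can be checked together as functions of the fanout h. This is a minimal sketch, assuming the capacitances and resistances given on these slides ((6+4h)C on Y, 2C on x, R per pMOS, R/2 per nMOS). The function name and dictionary keys are illustrative. For the falling contamination case the 2C on node x is left out, which is an assumption chosen to match the stated result.

```python
def nand2_delays(h, R=1.0, C=1.0):
    """Worst-case and best-case RC delays of a 2-input NAND driving h copies."""
    c_y = (6 + 4 * h) * C    # output node Y: diffusion plus load
    c_x = 2 * C              # internal node x between the series nMOS
    return {
        # worst-case rising: a single pMOS (resistance R) charges Y
        "t_pdr": R * c_y,
        # worst-case falling: x sees R/2 to ground, Y sees R/2 + R/2
        "t_pdf": (R / 2) * c_x + (R / 2 + R / 2) * c_y,
        # best-case rising: both pMOS in parallel, R/2 in total
        "t_cdr": (R / 2) * c_y,
        # best-case falling: only Y is discharged (assumes x contributes nothing)
        "t_cdf": (R / 2 + R / 2) * c_y,
    }


print(nand2_delays(h=1))
```

The gap between `t_cdr` and `t_pdr` shows how much faster the best case can be than the worst case for the same gate and load.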
manufacturing process
Observe that the delay of an ideal fanout-of-1 inverter with no parasitic capacitance
is $\tau = 3RC$.
Normalized delay (d) is expressed relative to this inverter delay: $d = d_{abs} / \tau$, where $d_{abs}$ is the absolute delay.
Designers simplify delay analysis by characterizing a gate by
the slope and y-intercept of the function
In the linear delay model (LDM), normalized delay is expressed in terms of two
components of delay:
d = f + p
The parasitic delay (p) is the time for a gate to drive its own
internal diffusion capacitance (no load).
The effort delay, or stage effort, (f) depends on h and g:
f = gh
The electrical effort, or fanout, (h) is the ratio of the capacitance of
the external load to the input capacitance of the gate.
The logical effort (g) reflects the complexity of the gate. Complex gates have
greater logical efforts, indicating that they take longer to drive
a given function
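[Figure: transistor widths and input capacitance on input A for three gates]
Inverter: Cin = 3, g = 3/3
2-input NAND: Cin = 4, g = 4/3
2-input NOR: Cin = 5, g = 5/3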
Catalog of Gates
Logical effort of common gates
Gate type        Number of inputs
                 1    2      3          4              n
Inverter         1
NAND                  4/3    5/3        6/3            (n+2)/3
NOR                   5/3    7/3        9/3            (2n+1)/3
Tristate / mux   2    2      2          2              2
XOR, XNOR             4, 4   6, 12, 6   8, 16, 16, 8
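The general NAND and NOR entries follow from input capacitance divided by the inverter's input capacitance of 3. A small sketch using exact fractions, assuming the sizing used earlier: an n-input NAND has nMOS width n and pMOS width 2, and an n-input NOR has nMOS width 1 and pMOS width 2n. The function names are illustrative.

```python
from fractions import Fraction

INVERTER_CIN = 3  # unit inverter: pMOS width 2 + nMOS width 1


def logical_effort_nand(n):
    """Logical effort of an n-input NAND: Cin = n (nMOS) + 2 (pMOS)."""
    return Fraction(n + 2, INVERTER_CIN)


def logical_effort_nor(n):
    """Logical effort of an n-input NOR: Cin = 1 (nMOS) + 2n (pMOS)."""
    return Fraction(2 * n + 1, INVERTER_CIN)


for n in (2, 3, 4):
    print(n, logical_effort_nand(n), logical_effort_nor(n))
```

`Fraction` reduces 6/3 and 9/3 to 2 and 3, which are the same values the table lists unreduced.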
Parasitic Delay
The parasitic delay of a gate is the delay of
Catalog of Gates
Parasitic delay of common gates
Gate type        Number of inputs
                 1    2    3    4    n
Inverter         1
NAND                  2    3    4    n
NOR                   2    3    4    n
Tristate / mux   2    4    6    8    2n
XOR, XNOR             4    6    8
Logical Effort: g = 1
Electrical Effort: h = 1
Parasitic Delay: p = 1
Stage Delay: d = 2
Frequency: fosc = 1/(2*N*d) = 1/(4N)
Logical Effort: g = 1
Electrical Effort: h = 4
Parasitic Delay: p = 1
Stage Delay: d = 5
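Both worked examples apply the stage delay d = gh + p. The sketch below reproduces them, reading the first as an N-stage ring of inverters (g = h = p = 1) and the second as an inverter with a fanout of 4. Those readings, the function names, and the `tau` parameter for converting to absolute time are assumptions for illustration.

```python
def stage_delay(g, h, p):
    """Normalized delay of one stage: effort delay gh plus parasitic delay p."""
    return g * h + p


def ring_oscillator_frequency(n_stages, tau=1.0):
    """Oscillation frequency of an n-stage inverter ring, fosc = 1 / (2 * N * d).

    Each inverter has g = 1, h = 1, p = 1; tau converts normalized delay to
    seconds (tau = 1 leaves the result in units of 1/tau).
    """
    d = stage_delay(1, 1, 1)
    return 1.0 / (2 * n_stages * d * tau)


print(stage_delay(1, 1, 1))   # ring oscillator stage
print(stage_delay(1, 4, 1))   # inverter with h = 4
print(ring_oscillator_frequency(31))
```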
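$H = \frac{C_{\text{out-path}}}{C_{\text{in-path}}}$
Path Effort
$F = \prod f_i = \prod g_i h_i$
[Figure: four-stage path with input capacitance 10 and output load 20. Stage 1: g1 = 1, h1 = x/10. Stage 2: g2 = 5/3, h2 = y/x. Stage 3: g3 = 4/3, h3 = z/y. Stage 4: g4 = 1, h4 = 20/z.]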
[Figure: branching path. A gate with input capacitance 5 drives two gates of input capacitance 15 each; each of those drives a load of 90.]
G = 1
H = 90 / 5 = 18
GH = 18
h1 = (15 + 15) / 5 = 6
h2 = 90 / 15 = 6
F = g1 g2 h1 h2 = 36 = 2GH
Branching Effort
Introduce branching effort
$b = \frac{C_{\text{on-path}} + C_{\text{off-path}}}{C_{\text{on-path}}}$
$B = \prod b_i$
Note: $\prod h_i = BH$
F = GBH
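The branching example can be checked numerically: the product of the stage efforts should equal GBH rather than GH. A minimal sketch with the capacitances from that example (5, 15 + 15, 90); the function names are illustrative.

```python
def branching_effort(c_on_path, c_off_path):
    """b = (C_on-path + C_off-path) / C_on-path for one stage."""
    return (c_on_path + c_off_path) / c_on_path


def path_effort(G, B, H):
    """F = G * B * H."""
    return G * B * H


# First stage drives two gates of capacitance 15; only one is on the path.
b1 = branching_effort(15, 15)
G = 1                      # two inverters, g1 = g2 = 1
H = 90 / 5                 # path load over path input capacitance
F = path_effort(G, b1, H)

# Cross-check against the product of the individual stage efforts.
h1 = (15 + 15) / 5
h2 = 90 / 15
print(F, 1 * 1 * h1 * h2)
```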
Multistage Delays
Path Effort Delay: $D_F = \sum f_i$
Path Parasitic Delay: $P = \sum p_i$
Path Delay: $D = \sum d_i = D_F + P$
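Delay is smallest when every stage bears the same effort:
$\hat{f} = g_i h_i = F^{1/N}$
The minimum delay of an N-stage path is then
$D = N F^{1/N} + P$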
This is a key result of logical effort
Gate Sizes
How wide should the gates be for least delay?
$\hat{f} = gh = g \frac{C_{out}}{C_{in}}$
$\Rightarrow C_{in_i} = \frac{g_i C_{out_i}}{\hat{f}}$
Working backward, apply capacitance transformation
A to B
[Figure: three-stage path from A to B. The first gate (input capacitance 8) drives three gates of input capacitance x; each of those drives two gates of input capacitance y; each final gate drives a load of 45.]
Logical Effort: G = (4/3)*(5/3)*(5/3) = 100/27
Electrical Effort: H = 45/8
Branching Effort: B = 3*2 = 6
Path Effort: F = GBH = 125
Best Stage Effort: $\hat{f} = \sqrt[3]{F} = 5$
Parasitic Delay: P = 2 + 3 + 2 = 7
Delay: D = 3*5 + 7 = 22 = 4.4 FO4
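$\hat{f} = gh = g \frac{C_{out}}{C_{in}} \Rightarrow C_{in_i} = \frac{g_i C_{out_i}}{\hat{f}}$
y = 45 * (5/3) / 5 = 15
x = (15*2) * (5/3) / 5 = 10
[Figure: the sized path from A, with transistor widths P: 4, N: 4 in the first gate, P: 4, N: 6 in the second, and P: 12, N: 3 in the third, driving a load of 45]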
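The whole three-stage example, from path effort to gate sizes, can be sketched in a few lines. The code below is an illustration under stated assumptions: `branches[i]` is the number of identical copies that stage i drives (1 for the last stage), and the function name and return layout are hypothetical. Floating-point arithmetic gives values very close to, but not exactly, the round numbers on the slide.

```python
from math import prod


def size_path(logical_efforts, branches, parasitics, c_in, c_load):
    """Return (F, f_hat, D, input_caps) for a path with equal stage efforts."""
    n = len(logical_efforts)
    G = prod(logical_efforts)
    B = prod(branches)
    H = c_load / c_in
    F = G * B * H
    f_hat = F ** (1.0 / n)               # best stage effort
    D = n * f_hat + sum(parasitics)      # minimum path delay

    # Work backward: C_in_i = g_i * C_out_i / f_hat, where C_out_i is the
    # next stage's input capacitance times the number of copies driven.
    input_caps = []
    c_next = c_load
    for g, b in zip(reversed(logical_efforts), reversed(branches)):
        c_next = g * (b * c_next) / f_hat
        input_caps.append(c_next)
    input_caps.reverse()
    return F, f_hat, D, input_caps


F, f_hat, D, caps = size_path(
    logical_efforts=[4 / 3, 5 / 3, 5 / 3],
    branches=[3, 2, 1],
    parasitics=[2, 3, 2],
    c_in=8,
    c_load=45,
)
print(F, f_hat, D, caps)
```

The first entry of `caps` should come back as the specified input capacitance of 8, which is a useful self-check on the sizing.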
[Figure: inverter chains of N = 1, 2, 3, and 4 stages between an initial driver and a datapath load of 64]
$D = N F^{1/N} + P = N(64)^{1/N} + N$

N:   1     2     3     4
f:   64    8     4     2.8
D:   65    18    15    15.3
                 Fastest
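The table can be regenerated by sweeping N and evaluating the delay for an inverter chain with path effort 64. A minimal sketch, assuming every stage is an inverter with parasitic delay `p_inv = 1`; the function names and the `max_stages` limit are illustrative.

```python
def path_delay(F, n, p_inv=1.0):
    """Delay of an n-stage inverter chain with path effort F: n*F^(1/n) + n*p_inv."""
    return n * F ** (1.0 / n) + n * p_inv


def best_number_of_stages(F, max_stages=10):
    """Stage count in 1..max_stages that gives the smallest delay."""
    return min(range(1, max_stages + 1), key=lambda n: path_delay(F, n))


for n in range(1, 5):
    stage_effort = 64 ** (1.0 / n)
    print(n, round(stage_effort, 1), round(path_delay(64, n), 1))

print("best N:", best_number_of_stages(64))
```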
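Derivation
Consider adding inverters to end of path
[Figure: logic block of n1 stages with path effort F, followed by N - n1 extra inverters]
$D = N F^{1/N} + \sum_{i=1}^{n_1} p_i + (N - n_1) p_{inv}$
$\frac{\partial D}{\partial N} = -F^{1/N} \ln F^{1/N} + F^{1/N} + p_{inv} = 0$
With best stage effort $\rho = F^{1/N}$:
$p_{inv} + \rho (1 - \ln \rho) = 0$
This equation has no closed-form solution.
Neglecting parasitics ($p_{inv} = 0$), we find $\rho = 2.718$ (e).
For $p_{inv} = 1$, solve numerically for $\rho = 3.59$.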
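The numerical solution can be found with a simple bisection on the best-stage-effort equation. This sketch assumes `p_inv >= 0` and relies on the left-hand side decreasing for values of the stage effort above 1; the function name, bracket, and tolerance are illustrative choices.

```python
import math


def best_stage_effort(p_inv, lo=1.0, hi=100.0, tol=1e-9):
    """Solve p_inv + rho * (1 - ln(rho)) = 0 for rho by bisection."""
    def residual(rho):
        return p_inv + rho * (1.0 - math.log(rho))

    # The residual is positive at rho = 1 and decreases for rho > 1.
    if residual(lo) <= 0 or residual(hi) >= 0:
        raise ValueError("root is not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0:
            lo = mid      # root lies to the right of mid
        else:
            hi = mid      # root lies at or to the left of mid
    return 0.5 * (lo + hi)


print(best_stage_effort(0.0))   # neglecting parasitics
print(best_stage_effort(1.0))   # p_inv = 1
```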
Sensitivity Analysis
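How sensitive is delay to using exactly the best number of stages?
[Figure: plot of $D(N)/D(\hat{N})$ (vertical axis, 1.0 to 1.6) against $N/\hat{N}$ (horizontal axis, 0.0 to 2.0, with 0.5, 0.7, 1.0, 1.4, and 2.0 marked). Annotated values on the curve: 1.51, 1.26, and 1.15; two points are labeled ($\rho = 2.4$) and ($\rho = 6$).]
We can be sloppy!
I like $\rho = 4$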
Example, Revisited
Ben Bitdiddle is the memory designer for the Motoroil 68W86,
[Figure: 4:16 decoder with 16 outputs driving a register file of 16 words by 32 bits]
Decoder specifications:
16 word register file
Each word is 32 bits wide
Each bit presents load of 3 unit-sized transistors
True and complementary address inputs A[3:0]
Each input may drive 10 unit-sized transistors
Ben needs to decide:
How many stages to use?
How large should each gate be?
How fast can decoder operate?
Number of Stages
Decoder effort is mainly electrical and branching
Electrical Effort: H = (32*3) / 10 = 9.6
Branching Effort: B = 8
Path Effort: F = GBH = 76.8 (with G = 1)
Number of Stages: $N = \log_4 F = 3.1$
[Figure: decoder schematic. True and complementary address inputs (A[2], A[1], and A[0] are labeled), each presenting 10 units of capacitance, drive the gates that select word[0] through word[15]; each word line has 96 units of wordline capacitance.]
Quantities to determine: G, $\hat{f}$, D, and the gate size y.
Comparison
Compare many alternatives with a spreadsheet
Design                        N    G      P    D
NAND4-INV                     2    2      5    29.8
NAND2-NOR2                    2    20/9   4    30.1
INV-NAND4-INV                 3    2      6    22.1
NAND4-INV-INV-INV             4    2      7    21.1
NAND2-NOR2-INV-INV            4    20/9   6    20.5
NAND2-INV-NAND2-INV           4    16/9   6    19.7
INV-NAND2-INV-NAND2-INV       5    16/9   7    20.4
NAND2-INV-NAND2-INV-INV-INV   6    16/9   8    21.6
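The spreadsheet comparison is easy to script. The sketch below evaluates $D = N F^{1/N} + P$ for each design with B = 8 and H = 9.6 from the decoder example. The logical efforts come from the catalog; the parasitic delays (1 for an inverter, n for an n-input NAND or NOR) are an assumption, chosen because they are consistent with the P = 2 + 3 + 2 example earlier. The dictionary and function names are illustrative.

```python
from math import prod

LOGICAL_EFFORT = {"INV": 1.0, "NAND2": 4 / 3, "NOR2": 5 / 3, "NAND4": 6 / 3}
PARASITIC = {"INV": 1, "NAND2": 2, "NOR2": 2, "NAND4": 4}  # assumed: p = number of inputs


def design_delay(design, B=8, H=9.6):
    """Return (N, G, P, D) for a design written as gate names joined by '-'."""
    gates = design.split("-")
    n = len(gates)
    G = prod(LOGICAL_EFFORT[g] for g in gates)
    P = sum(PARASITIC[g] for g in gates)
    F = G * B * H
    return n, G, P, n * F ** (1.0 / n) + P


designs = [
    "NAND4-INV",
    "NAND2-NOR2",
    "INV-NAND4-INV",
    "NAND4-INV-INV-INV",
    "NAND2-NOR2-INV-INV",
    "NAND2-INV-NAND2-INV",
    "INV-NAND2-INV-NAND2-INV",
    "NAND2-INV-NAND2-INV-INV-INV",
]
for name in designs:
    n, G, P, D = design_delay(name)
    print(f"{name:30s} N={n} G={G:.2f} P={P} D={D:.1f}")
```

Sorting `designs` by the returned delay picks out the fastest option without any manual arithmetic.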
Review of Definitions
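Term                Stage                                                              Path
number of stages    1                                                                  N
logical effort      g                                                                  $G = \prod g_i$
electrical effort   $h = C_{out} / C_{in}$                                             $H = C_{\text{out-path}} / C_{\text{in-path}}$
branching effort    $b = (C_{\text{on-path}} + C_{\text{off-path}}) / C_{\text{on-path}}$   $B = \prod b_i$
effort              $f = gh$                                                           $F = GBH$
effort delay        $f$                                                                $D_F = \sum f_i$
parasitic delay     $p$                                                                $P = \sum p_i$
delay               $d = f + p$                                                        $D = \sum d_i = D_F + P$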
$F = GBH$
$N = \log_4 F$
$D = N F^{1/N} + P$
$\hat{f} = F^{1/N}$
$C_{in_i} = \frac{g_i C_{out_i}}{\hat{f}}$
Summary
Logical effort is useful for thinking of delay in circuits