A Novel Approach to Reduce Delay and Power in
VLSI Interconnects
Submitted in partial fulfillment of
the requirements for the degree of
Master of Science (by Research)
in
Electronics and Communication Engineering
by
Sandeep Saini
<saini sandeep@research.iiit.ac.in>
http://web.iiit.ac.in/∼saini sandeep
Under Guidance of
Dr M. B. Srinivas
Centre for VLSI and Embedded System Technologies
International Institute of Information Technology
Hyderabad, INDIA
May, 2010
© Copyright by Sandeep Saini, 2009
INTERNATIONAL INSTITUTE OF INFORMATION TECHNOLOGY
Hyderabad, India
CERTIFICATE
It is certified that the work contained in this thesis, titled “A Novel Approach to Reduce Delay and
Power in VLSI Interconnects” by Sandeep Saini, has been carried out under my supervision and has not
been submitted elsewhere for a degree.
Date
Advisor: M B Srinivas
To my Parents
Acknowledgement
I am greatly indebted to my advisor, Dr. M. B. Srinivas. Sir, I could not have realized my
potential without your invaluable guidance, consistent encouragement and emphasis on the quality of the
research contribution. Professor Srinivas is a wonderful teacher and person to work with. He has shared
his profound knowledge and professional manner of conducting research. I am very thankful to him for
all the time he devoted to scientific discussions with me, as well as for his constant encouragement.
Special thanks to J.V.R. Ravindra and Srihari for the brainstorming sessions we had, and for
guiding me in the right direction from the very beginning of my research work. I very much appreciate
their invaluable assistance.
To my lab mates through Bachelors and Masters, I owe big thanks for the fun-centered atmosphere
in CVEST and OBH. I have been fortunate enough to meet Gaurav, Maneesh, Khosla, Bajaj,
Rishi, Manan, Handa, Ramavtar, Sumit, Bhatt, Anshul, Bharat, Avinash, Kashi, Mohit, Abheet and Gopal.
All CVEST lab mates were equally supportive.
Finally, and most importantly, this thesis is dedicated to my parents, whose unconditional love
and support I have enjoyed throughout my life.
Abstract
Interconnects play a major role in deep submicron (DSM) technologies such as 90nm and
below. While gate delay dominated interconnect delay in earlier technologies, this is no longer the case,
and delays associated with interconnects are becoming increasingly important. This is because in DSM
technologies an interconnect can no longer be seen as a simple resistor; its associated parasitics, such
as capacitance and inductance, also need to be considered. Thus any signal propagating through such an
interconnect can be expected to be delayed.
Buffer insertion is one popular technique to reduce the delay. In this technique,
buffers are placed at regular intervals along an interconnect to restore the signal each time it
is degraded by the parasitics. However, buffers themselves have a certain switching time that contributes to
delay. A large number of such buffers along an interconnect can thus add to the overall signal
propagation delay. Buffer switching also contributes to power dissipation. Further, in DSM technologies
leakage power is a major problem, and buffers may consume power even when they are not switching.
Thus there is an urgent need to develop techniques that reduce the overall delay while also consuming
less power, dynamic as well as static.
In this thesis, the Schmitt trigger is examined as an alternative to the buffer for reducing delay and
power in interconnects. The most favorable feature of the Schmitt trigger is its adjustable threshold voltage:
since it can be controlled, the threshold voltage can be chosen to be above or below Vdd/2, the voltage at
which a buffer normally switches. Thus a Schmitt trigger can be designed to switch earlier than a buffer,
leading to a reduction in delay. Further, the adjustable low-voltage threshold of the Schmitt trigger handles
more noise and voltage glitches than a buffer does. The proposed approach is first implemented for
linear interconnects of various lengths and then on buses, which are groups of interconnects. It is shown
that the proposed approach is better than buffer insertion in terms of delay, power and crosstalk noise
reduction.
List of Publications
1 Sandeep Saini, A. Mahesh Kumar, Sreehari Veeramachaneni, M.B.Srinivas, “Alternative approach to Buffer Insertion for Delay and Power Reduction in VLSI Interconnects”, accepted in
Journal of Low Power Electronics, to be published by American Scientific Publishers.
2 Sandeep Saini, A. Mahesh Kumar, Sreehari Veeramachaneni, M.B.Srinivas, “Schmitt trigger as
an alternate to buffer for delay reduction in on chip buses”, TENCON 2009, 23rd to 26th Nov 2009,
Singapore, pages 1-5.
3 Sandeep Saini, A. Mahesh Kumar, Sreehari Veeramachaneni, M.B.Srinivas, “Alternative approach to Buffer Insertion for Delay and Power Reduction in VLSI Interconnects”, VLSI Design
2010, 3rd to 7th January 2010, Bangalore, pages 411-416.
Contents
1 Introduction
  1.1 Objective
  1.2 Motivation
  1.3 Literature Survey
    1.3.1 Need for a better approach
  1.4 Contribution of the Thesis
  1.5 Organization of the Thesis
2 Introduction to Interconnects
  2.1 Design Flows for DSM ASICs
  2.2 Interconnect Design Criteria
    2.2.1 Delay
    2.2.2 Power Dissipation
    2.2.3 Noise
    2.2.4 Physical Area
  2.3 Interconnect Characteristics
    2.3.1 Resistance
      2.3.1.1 Diffusion barrier
      2.3.1.2 Surface and grain boundary scattering
      2.3.1.3 Temperature effect
      2.3.1.4 High frequency effects
    2.3.2 Capacitance
    2.3.3 Inductance
      2.3.3.1 Partial inductance
      2.3.3.2 Loop-based inductance
      2.3.3.3 High frequency effects
  2.4 Interconnect Models
    2.4.1 Single Interconnect
      2.4.1.1 Lumped models
      2.4.1.2 Distributed models
      2.4.1.3 Lumped representation of distributed interconnects
      2.4.1.4 Modeling frequency dependent effects
    2.4.2 Parallel Coupled Interconnects
  2.5 Design Methodologies for Interconnect
    2.5.1 Constructing an Interconnect Tree
    2.5.2 Wire Sizing, Shaping, and Spacing
    2.5.3 Repeater Insertion
    2.5.4 Shielding Techniques
    2.5.5 Net-Ordering and Wire Swizzling
3 Buffer Insertion
  3.1 Background
  3.2 Repeater / buffer insertion process: An overview
  3.3 Propagation delay
  3.4 Power dissipation
    3.4.1 Short-circuit power dissipation
    3.4.2 Dynamic power dissipation
    3.4.3 Total power dissipation
  3.5 Area of the repeater system
  3.6 Design criteria for interconnect within a repeater system
    3.6.1 Constrained systems
    3.6.2 Unconstrained systems
      3.6.2.1 Power-delay-product design criterion
      3.6.2.2 Power-delay-area-product design criterion
  3.7 Application of interconnect design methodology
  3.8 Need for a better approach
4 Schmitt Trigger as an alternate to Buffer
  4.1 Classical Schmitt Trigger
  4.2 Hysteresis in Schmitt Trigger
  4.3 CMOS Schmitt Trigger
  4.4 Low Voltage Schmitt Trigger
  4.5 CMOS buffer
  4.6 Schmitt trigger as an alternate to buffer Insertion
  4.7 Conclusions
5 Results and Discussion
  5.1 NTRS 1997 predictions
  5.2 Signal Propagation on a Linear Interconnect
    5.2.1 Types of interconnects
  5.3 Effect of Buffer Insertion on Delay, Noise and Power Reduction
    5.3.1 Delay Reduction using Buffer Insertion
    5.3.2 Noise and Power reduction using Buffer Insertion
  5.4 Effect of Schmitt trigger on delay, noise and power reduction in Linear Interconnects
    5.4.1 Delay reductions with Schmitt trigger
    5.4.2 Noise and power reduction with Schmitt trigger approach
  5.5 Replacement of Buffers in Buses
    5.5.1 Signal Propagation in Buses
    5.5.2 Definitions and Related Work
      5.5.2.1 Low Power Coding
      5.5.2.2 Crosstalk Avoidance Coding
      5.5.2.3 Error Control Coding
      5.5.2.4 CAC coding Schemes
      5.5.2.5 Relationship between delay and crosstalk
      5.5.2.6 Interconnect Power Model
    5.5.3 Comparison with existing bus coding technique
  5.6 Conclusion
6 Conclusions and Future Work
  6.1 Scope of further work
Bibliography
List of Figures
1.1 The waveform for an 8 bit wide 1 mm long bus at 65nm technology.
1.2 Percentage of nets requiring buffers. M3 and M6 represent nets on third and sixth metal layer in a six metal layer technology.
1.3 Buffers as a percentage of the total cell count for the chip.
1.4 Hysteresis in Schmitt trigger.
2.1 A conventional ASIC design flow.
2.2 A data path in a synchronous digital system.
2.3 Components of dynamic power dissipation due to different capacitance sources: gate capacitance, diffusion capacitance, and interconnect capacitance.
2.4 Interconnect coupling noise.
2.5 Cross section of an on-chip copper interconnect.
2.6 Current distribution in the cross section of an interconnect at high frequencies. Darker color indicates higher current density.
2.7 Skin depth of Cu as a function of frequency.
2.8 Current distributions in the cross section of two parallel wires at high frequencies due to the proximity effect.
2.9 Lumped interconnect models.
2.10 Circuit models of transmission lines.
2.11 Modeling frequency dependent impedance with lumped elements.
2.12 Decoupling multiple parallel coupled interconnects.
2.13 An example of an A-tree.
2.14 Shaping interconnect to minimize delay.
2.15 Staggering repeaters to reduce the worst case delay and crosstalk noise.
2.16 Buffered interconnect tree.
2.17 Examples of net-ordering and wire swizzling.
3.1 Comparison of interconnect delay to gate delay.
3.2 Minimum signal propagation delay and transient power dissipation as a function of line width for a repeater system.
3.3 Uniform repeater system driving a distributed RC interconnect.
3.4 Wire sizing in a repeater insertion system.
3.5 Optimum numbers of repeaters for minimum propagation delay for different line widths.
3.6 Optimum repeater size for minimum propagation delay for different line widths.
3.7 Minimum signal propagation delay as a function of interconnect width (l=5mm).
3.8 Minimum signal delay as a function of interconnect width for different line lengths.
3.9 Dynamic power dissipation as a function of interconnect width for l=20 mm.
3.10 Total transient power dissipation as a function of interconnect width.
3.11 Interconnect area as a function of interconnect width for different line lengths.
3.12 Total area of the repeaters as a function of the interconnect width for different line lengths.
3.13 Product of interconnect and transistor area as a function of the interconnect width for different line lengths.
4.1 Schmitt trigger implementation with comparator.
4.2 Hysteresis in conventional Schmitt trigger.
4.3
4.4 N-subcircuit driven by a voltage source: (a) circuit; (b) current-voltage characteristic; (c) superposition of N- and P-subcircuit characteristics.
4.5 1 V CMOS Schmitt trigger circuit.
4.6 0.4 V CMOS Schmitt trigger circuit derived from 1 V Schmitt trigger.
4.7 Measured hysteresis characteristics of 0.4 V CMOS Schmitt trigger circuit, and measured input-output waveform characteristics: (a) measured hysteresis characteristic of 0.4 V CMOS Schmitt trigger circuit; (b) measured input-output (Vin-Vout2) waveform characteristics.
4.8 CMOS buffer.
4.9 4 bit bus with buffers to restore signals.
5.1 An RC interconnect.
5.2 Interconnect structure used for simulations.
5.3 Output end signals on a 2mm, 5mm and 10mm RC interconnect at 180nm technology.
5.4 Output end signals on a 2mm, 5mm and 10mm RLC interconnect at 180nm technology.
5.5 Output end signals on a 2mm, 5mm and 10mm RLC interconnect at 90nm technology.
5.6 Output end signals on a 2mm, 5mm and 10mm RLC interconnect at 65nm technology.
5.7 Buffers inserted in an RLC interconnect.
5.8 Delay reduction in 2mm interconnect with triangular input.
5.9 Delay reduction in 2mm interconnect with square wave input.
5.10 Delay reduction in 5mm interconnect with square wave input.
5.11 Delay reduction in 10mm interconnect with square wave input.
5.12
5.13
5.14 Delay reduction using Schmitt trigger approach in 2mm interconnect with square wave input.
5.15 Delay reduction using Schmitt trigger approach in 5mm interconnect with square wave input.
5.16 Delay reduction using Schmitt trigger approach in 10mm interconnect with square wave input.
5.17 Noise reduction using Schmitt trigger.
5.18 Behavior of buffer and Schmitt trigger towards a noisy signal.
5.19 Data transfer on an 8 bit data bus.
5.20 Data transfer on an 8 bit data bus in 65nm technology.
5.21 A 3 bit to 4 wire coder.
5.22 Transition Probability Graph.
5.23 Example of Transition Probability Graph.
5.24 Data signals rectified using Schmitt trigger approach in an 8 bit data bus.
List of Tables
3.1 Uniform repeater system for different optimization criteria.
5.1 Projected advances in CMOS chip performance.
5.2 Interconnect dimensions.
5.3 Interconnect Resistance, Inductance and Capacitance values.
5.4 Propagation delay values for an interconnect of different length with and without buffer insertion.
5.5 Power consumption values for an interconnect of different length with and without buffer insertion approach.
5.6 Propagation delay values for an interconnect of different length with buffer insertion and delay reduction using Schmitt trigger approach.
5.7 Power consumption values for an interconnect of different length with buffer insertion and reduction using Schmitt trigger approach.
5.8 Delay and Crosstalk Classes for various 3-bit combinations (transitions).
5.9 Propagation delay values for 8 bit buses of different length with buffer insertion and delay reduction using Schmitt trigger approach.
5.10 Power consumption values for 8 bit buses of different length with buffer insertion and reduction using Schmitt trigger approach.
Abbreviations
VLSI : Very Large Scale Integration
DSM : Deep Sub Micron
ASIC : Application-specific integrated circuit
RTL : Register transfer level
VHDL : VHSIC hardware description language
CMOS : Complementary metal-oxide-semiconductor
PTM : Predictive Technology Model
PDP : Power-Delay-Product
MOSFET : Metal-oxide-semiconductor field-effect transistor
SA : Switching Activity
TA : Transition Activity
MCF : Miller’s Coupling Factor
Chapter 1
Introduction
1.1 Objective
In deep submicron (DSM) technologies, interconnects no longer behave as simple resistors; their
associated parasitics, such as capacitance and inductance, must also be taken into account. With a linear
increase in interconnect length, both the interconnect capacitance (C) and the interconnect resistance (R)
increase linearly, so the RC delay increases quadratically. Although the RC delay is not a precise measure
of the time necessary for a signal to propagate through a wire, the total RC delay of a section of a line is
useful as a figure of merit. In order to increase the operating speed of an integrated circuit, it is necessary to
reduce the RC delay. In addition to increased signal propagation delay, increased power dissipation is
another effect of large interconnect impedance.
The total RC delay of an interconnect line can be reduced drastically by inserting a
signal amplifier known as a repeater. In CMOS technology, the simplest form of a repeater is a
two-transistor inverter. However, as discussed in Chapter 3, buffer insertion is becoming increasingly
costly in area and power in DSM technologies, which calls for a different approach. The objective
of this thesis is to develop an alternative to buffer insertion for delay, power and
noise reduction in VLSI interconnects in DSM technologies.
1.2 Motivation
With the continuing trend of Very Large Scale Integration (VLSI) technology scaling and
increasing operating frequency, interconnect delay has become a significant bottleneck in system performance [1,
2]. This trend is a result of the increased resistance, capacitance and inductance of the interconnect as
feature sizes enter the nanometer era. According to International Technology Roadmap for Semiconductors
(ITRS) projections, interconnect delay can contribute more than 50% of the total delay once the feature
size shrinks below 180 nm [3, 4]. As a result, delay optimization techniques for interconnects are increasingly
important for achieving timing closure of high performance designs.
Figure 1.1 The waveform for an 8 bit wide 1 mm long bus at 65nm technology
Signals on an interconnect get highly distorted due to propagation delay and the coupling effects
of adjacent lines. This is shown in Figure 1.1 for a group of 8 interconnects laid side by
side at 65nm technology. The figure depicts the delayed signals on interconnects of length equal to 1
mm. Each signal shows not only a visible propagation delay but also significant
noise glitches caused by switching signals on adjacent lines. Hence, along with power and delay, noise
reduction must also be taken into account when developing a technique for better
transmission.
Reduction of delay and power consumption is the main motivation behind the repeater/buffer
insertion technique. In this technique a long interconnect is broken into shorter pieces that are joined by
CMOS buffers. For example, assume a long interconnect has 5 units of resistance and 10 units of capacitance. The total RC delay would be 50 units. However, if five repeaters are inserted within this line to
break the interconnect into five equal pieces, each piece has 1 unit of resistance and 2 units of capacitance,
and the RC delay would be 5 x (1 x 2) = 10 units. If the delay of the five repeaters is less than 40 units,
then there is a speed benefit to inserting CMOS repeaters. The approach proposed in this thesis follows
the same principle of subdividing the interconnect.
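The arithmetic of this example can be reproduced with a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, the units are the arbitrary "units" of the example, and the repeaters are assumed to load the wire with no additional capacitance.

```python
def rc_delay(resistance, capacitance):
    """First-order RC delay figure of merit, in arbitrary units."""
    return resistance * capacitance


def segmented_rc_delay(resistance, capacitance, segments):
    """Total wire RC delay when the line is cut into equal segments.

    Assumption: repeater delay and repeater loading are ignored here.
    """
    r_seg = resistance / segments
    c_seg = capacitance / segments
    return segments * rc_delay(r_seg, c_seg)


R_TOTAL, C_TOTAL, SEGMENTS = 5.0, 10.0, 5

unbuffered = rc_delay(R_TOTAL, C_TOTAL)
buffered_wire = segmented_rc_delay(R_TOTAL, C_TOTAL, SEGMENTS)
# Delay budget left for the repeaters before the speed benefit is lost.
repeater_budget = unbuffered - buffered_wire

print(unbuffered, buffered_wire, repeater_budget)  # 50.0 10.0 40.0
```

Because the wire delay of the segmented line is the unbuffered delay divided by the number of segments, the benefit of further subdivision shrinks as more repeaters are added, while the total repeater delay keeps growing.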
1.3 Literature Survey
The objective of buffer insertion is to find where to insert buffers in the interconnect so that
the timing requirements are met. Since the propagation (Elmore) delay has a square dependence on
the length of an RC interconnect line, subdividing the line into shorter sections is an effective strategy
to reduce the total propagation delay [6]. The interconnect can be subdivided into shorter sections by
inserting repeaters, which break the quadratic dependence of the delay on the interconnect length but
add parasitic impedances of their own. Thus, an optimum number and size of
repeaters exists that minimizes the total propagation delay of the line [6].
Buffer insertion for a single net or interconnect tree is a well-researched problem. Van Ginneken
[9] proposed a dynamic programming algorithm in 1991 that maximizes the slack of the net and has a
time complexity of O(n^2). Since then, his algorithm has become a classic in this field and a substantial
body of research has developed on the basis of van Ginneken’s algorithm. The work in [14] suggested
a wire segmenting algorithm to be used as a precursor to van Ginneken’s algorithm, resulting in faster
run-time. Lillis et al. [13] extended the framework to minimize buffer cost while satisfying the timing
requirements. Li et al. [15] improved the time bound on van Ginneken’s algorithm to O(n log n). The
authors of [16] proved that optimizing the total cost given arbitrary buffer costs is an NP-hard problem,
and also suggested techniques to improve the efficiency of Lillis’ algorithm. Other researchers [10,
11, 12] have taken other approaches to solve different variants of the buffer insertion problem, such as simultaneous routing, simultaneous gate sizing, and inclusion of slew and signal integrity constraints.
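The existence of an optimum can be illustrated with a deliberately simplified model. The sketch below assumes that a line with total resistance R and capacitance C is cut into k equal sections, that each section contributes a distributed-wire Elmore delay of 0.5 (R/k)(C/k), and that each repeater adds a fixed delay t_rep. The fixed repeater delay, the neglect of repeater input and output loading, and all names are assumptions made for illustration; they are not the delay model used in [6].

```python
import math


def total_delay(r_wire, c_wire, t_rep, k):
    """Delay of a line cut into k sections, each driven by one repeater.

    Simplified model (assumption): wire term 0.5*R*C/k plus k fixed
    repeater delays; repeater loading of the wire is ignored.
    """
    return 0.5 * r_wire * c_wire / k + k * t_rep


def best_repeater_count(r_wire, c_wire, t_rep, k_max=100):
    """Integer k in 1..k_max that minimizes total_delay."""
    return min(range(1, k_max + 1),
               key=lambda k: total_delay(r_wire, c_wire, t_rep, k))


# Hypothetical values: the 5-unit / 10-unit line of Section 1.2 with
# an assumed repeater delay of 1 unit.
k_opt = best_repeater_count(5.0, 10.0, 1.0)
k_continuous = math.sqrt(0.5 * 5.0 * 10.0 / 1.0)  # closed-form minimum

print(k_opt, k_continuous, total_delay(5.0, 10.0, 1.0, k_opt))
```

In this model the wire term falls as 1/k while the repeater term rises linearly in k, so the minimum sits where the two are equal. Fewer repeaters leave too much quadratic wire delay, and more repeaters spend the gain on repeater delay.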
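The candidate-based dynamic programming idea behind this family of algorithms can be sketched for the simplest case of a two-pin line. The code below is a simplified illustration in the spirit of van Ginneken's algorithm, not a reproduction of [9]: it assumes one buffer type, an Elmore delay model with each wire piece treated as a pi-section, and legal buffer positions only at the upstream end of each wire piece. All function names and numeric values are hypothetical. A brute-force search over every buffer placement is included only to check the result on a small instance.

```python
from itertools import combinations


def prune(candidates):
    """Drop dominated (cap, q) pairs; lower cap and higher q are better."""
    kept, best_q = [], float("-inf")
    for cap, q in sorted(candidates, key=lambda cq: (cq[0], -cq[1])):
        if q > best_q:  # better timing than every lighter-load candidate
            kept.append((cap, q))
            best_q = q
    return kept


def van_ginneken_line(segments, sink_cap, sink_rat, buf, r_driver):
    """Best required arrival time at the driver for a two-pin line.

    segments: (r, c) wire pieces ordered from sink to source.
    buf: (intrinsic delay, output resistance, input capacitance).
    """
    d_buf, r_buf, c_buf = buf
    candidates = [(sink_cap, sink_rat)]  # (downstream cap, required time)
    for r, c in segments:
        # Move every candidate across the wire piece (Elmore delay).
        candidates = [(cap + c, q - r * (c / 2 + cap))
                      for cap, q in candidates]
        # All buffered options present the same load, so keep the best one.
        q_buf = max(q - (d_buf + r_buf * cap) for cap, q in candidates)
        candidates = prune(candidates + [(c_buf, q_buf)])
    return max(q - r_driver * cap for cap, q in candidates)


def brute_force_line(segments, sink_cap, sink_rat, buf, r_driver):
    """Reference check: try every subset of buffer positions."""
    d_buf, r_buf, c_buf = buf
    best = float("-inf")
    for k in range(len(segments) + 1):
        for chosen in combinations(range(len(segments)), k):
            cap, q = sink_cap, sink_rat
            for i, (r, c) in enumerate(segments):
                q -= r * (c / 2 + cap)
                cap += c
                if i in chosen:
                    q -= d_buf + r_buf * cap
                    cap = c_buf
            best = max(best, q - r_driver * cap)
    return best


# Hypothetical instance: five identical wire pieces, arbitrary units.
SEGMENTS = [(1.0, 2.0)] * 5
BUFFER = (1.0, 0.5, 0.5)
dp_result = van_ginneken_line(SEGMENTS, 1.0, 100.0, BUFFER, 1.0)
reference = brute_force_line(SEGMENTS, 1.0, 100.0, BUFFER, 1.0)
print(dp_result, reference)
```

The pruning step is what keeps the candidate list short: a candidate with both higher load and lower required time can never lead to a better solution upstream. The sort-based pruning used here is chosen for brevity and adds a logarithmic factor compared with the merge-style bookkeeping that gives the O(n^2) bound cited above.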
Figure 1.3 Buffers as a percentage of the total cell count for the chip.
1.3.1 Need for a better approach
Buffer insertion is a very effective approach for delay reduction. But as is clear from the
above section, with every new generation of deep submicron technology, buffer insertion is becoming a major
problem, both because of the number of buffers required and because they are now a major source of power dissipation. Hence
a trade-off is required between delay and power consumed. Thus there is a need for a new approach that
reduces the delay while also consuming less power.
A Schmitt trigger is a special logic element designed to work with analog input signals. The
primary purpose of a Schmitt trigger is to restore the shape of digital signals. Hence this element can replace
the buffer as far as restoring the signal is concerned. Because of transmission line effects, a digital signal
changes shape from a square wave to a trapezoid, a triangle or a more complex waveform, and during transmission
signals also become noisy and distorted. A Schmitt trigger is a comparator circuit with positive
feedback, which results in hysteresis and a memory effect. Unlike simple logic elements, Schmitt
triggers have two threshold levels. Between these threshold values, U1 and U2, the state of the output does not
change, which is what is called hysteresis. This effect stabilizes the output against rapid triggering by
noise.
Figure 1.4 Hysteresis in Schmitt trigger.
The benefit of a Schmitt trigger over a circuit with only a single input threshold (such as a buffer)
is its greater stability (noise immunity). With only one input threshold, a noisy input signal near that
threshold could cause the output to switch rapidly back and forth from noise alone. A noisy Schmitt
trigger input signal near one threshold can cause only one switch in output value, after which it would
have to move beyond the other threshold in order to cause another switch. A Schmitt trigger can be easily
implemented with 6 CMOS transistors.
This implementation provides better noise rejection and earlier detection of the rise and fall of the signal,
which also results in less propagation delay. Thus if the buffer is replaced with a Schmitt trigger in interconnects,
greater noise, delay and power reduction is expected. In this thesis the advantages of having a
Schmitt trigger in place of a buffer in an interconnect are shown in detail.
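The difference between a single threshold and a pair of thresholds can be shown with a small behavioral model. The sketch below is purely illustrative and is not a circuit simulation: the function names, the threshold values and the synthetic input (a slow 0 to 1 V ramp with an alternating 100 mV disturbance, expressed in integer millivolts) are all assumptions.

```python
def single_threshold(samples, threshold):
    """Buffer-like element: output follows one fixed threshold."""
    return [1 if v > threshold else 0 for v in samples]


def schmitt_trigger(samples, v_low, v_high, initial=0):
    """Two-threshold element: rises above v_high, falls below v_low."""
    out, state = [], initial
    for v in samples:
        if state == 0 and v > v_high:
            state = 1
        elif state == 1 and v < v_low:
            state = 0
        out.append(state)  # between the thresholds the state is held
    return out


def count_transitions(bits):
    """Number of output switching events."""
    return sum(1 for a, b in zip(bits, bits[1:]) if a != b)


# Synthetic noisy ramp in millivolts: 10 mV per step, +/-100 mV noise.
noisy_ramp = [10 * i + (100 if i % 2 else -100) for i in range(101)]

buffer_out = single_threshold(noisy_ramp, threshold=500)
schmitt_out = schmitt_trigger(noisy_ramp, v_low=300, v_high=700)

print(count_transitions(buffer_out), count_transitions(schmitt_out))
```

On this input the single-threshold element switches repeatedly while the ramp crosses its threshold, whereas the two-threshold element switches once, because the disturbance is smaller than the gap between the two thresholds.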
1.4 Contribution of the Thesis
Delay and noise are two equally important concerns in DSM technology. For the purpose of signal
restoration and to handle on-chip delay and noise, the buffer insertion technique is modified by
replacing the buffers with Schmitt triggers at all possible nodes in VLSI interconnects. In a Schmitt trigger,
the threshold voltage of the device can be adjusted, so if it is set low, a rising signal is detected
earlier and the propagation delay is reduced. The results of this replacement approach for various
lengths of linear interconnect at all technology nodes considered are compared in this work. The results show
that the proposed technique is better at all of these technology nodes. Since the Schmitt trigger has
dual thresholds, it also provides better noise immunity to the circuit. Better results are observed
when the noise reduction of the buffer insertion and Schmitt trigger approaches is compared. The same
replacement approach is proposed for data buses as an alternative to bus coding techniques for
delay, crosstalk noise and power reduction. It is compared with some of the existing bus coding
techniques and found to perform better than them.
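The effect of a lowered threshold on a rising edge can be estimated with a first-order model. The sketch below assumes that the far end of the interconnect charges as a single-pole RC response, v(t) = Vdd (1 - exp(-t/RC)), so that the time to reach a threshold Vth is t = -RC ln(1 - Vth/Vdd). The single-pole response, the function name and the example thresholds of 0.5 Vdd and 0.3 Vdd are assumptions for illustration and are not simulation results from this thesis.

```python
import math


def crossing_time(tau, v_threshold, vdd):
    """Time for a single-pole RC step response to reach v_threshold.

    tau is the RC time constant; requires 0 <= v_threshold < vdd.
    """
    return -tau * math.log(1.0 - v_threshold / vdd)


TAU, VDD = 1.0, 1.0  # normalized, hypothetical values

t_half = crossing_time(TAU, 0.5 * VDD, VDD)  # threshold at Vdd/2
t_low = crossing_time(TAU, 0.3 * VDD, VDD)   # lowered rising threshold

print(round(t_half, 3), round(t_low, 3))
```

In this model the threshold at Vdd/2 is reached after ln 2 (about 0.69) time constants, while a threshold at 0.3 Vdd is reached after about 0.36 time constants, so the rising edge is detected in roughly half the time. A falling edge benefits in the same way from a raised falling threshold, which is why an adjustable threshold pair is useful for both transitions.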
1.5 Organization of the Thesis
The remaining chapters of this thesis are organized as follows:
• Chapter 2 provides a description of interconnects while explaining the interconnect design criteria,
their basic properties and the models to represent them in circuits. Various existing models for
interconnect design are discussed in this chapter. It then deals with the existing problems in
interconnects and how they are expected to grow in upcoming technologies, the possible solutions and
the effectiveness of these solutions.
• Chapter 3 gives an introduction to the conventional buffer insertion technique for the purpose of signal
restoration and delay reduction. Benefits of buffer insertion in linear interconnect and its use in
delay and noise reduction are explained along with various buffer insertion techniques existing in the
literature. This chapter provides an understanding of the basics of propagation delay, power dissipation
and design criteria. It is also shown how buffer insertion is becoming an increasingly bulky technique
that will consume more and more resources in upcoming technologies. Limitations of buffer insertion
in terms of area and power consumption are discussed at the end.
• Chapter 4 introduces the Schmitt trigger. History, invention and basic circuit implementation of the
Schmitt trigger are discussed in the early sections of the chapter. Implementation and working of the
Schmitt trigger are discussed in detail. CMOS Schmitt triggers are covered in the later sections
of the chapter. Benefits of the Schmitt trigger over the buffer for the purpose of signal restoration and
delay reduction are discussed at the end.
• Chapter 5 contains the simulation results for all types of interconnects, namely local, intermediate
and global, with existing as well as proposed approaches. First the problems in interconnects are simulated, followed by the conventional solution of buffer insertion. Simulations are carried out
for the 180nm to 65nm nodes using PTM parameters with the H-Spice tool. Simulations are based on
the following criteria:
Propagation Delay
Noise reduction
Power reduction
• Chapter 6 draws conclusions of the thesis.
Chapter 2
Introduction to Interconnects
Due to the importance of interconnects in current and future ICs, significant research has been
published over the past several decades, covering different areas such as parasitic extraction, interconnect models, and interconnect design methodologies.
In this chapter, a brief review of the background of on-chip electrical interconnect is provided.
In Section 2.1, a typical design flow for application-specific integrated circuits (ASIC) is described.
Challenges in DSM technologies due to interconnect dominant behavior are discussed. In Section
2.2, different design criteria that need to be considered during the interconnect design procedure are
described. The impedance characteristics of interconnect are presented in Section 2.3; specifically, the
resistance, capacitance, and inductance. Interconnect models and design methodologies are reviewed in
Sections 2.4 and 2.5, respectively. Finally, some conclusions are offered in Section 2.6.
2.1 Design Flows for DSM ASICs
A conventional design flow for ASICs is shown in Fig. 2.1 [19]. A typical design process
can be divided into two steps: functional design (front-end) and physical design (back-end). The functional design phase includes functional specification, VHDL/Verilog coding in the register transfer level
(RTL), and logic synthesis. A gate level netlist is generated as the result of logic synthesis. Functional
design is implemented during the front-end design process. The back-end physical design process converts a gate level netlist into a layout, including floorplanning, module placement, and interconnect
routing. From the physical layout, parasitic impedances are extracted. A post-layout timing analysis
tool is used to detect any timing violations. Necessary corrections are made in the physical layout or
gate level netlist to fix these violations. This design flow is successful for those technologies where
gate delays dominate. The timing of the circuits is determined by the gate types and loads. The effect
of the interconnect parasitic impedances typically produces only a few timing violations in a medium
speed application, making the design flow efficient. With interconnect becoming increasingly important, the interconnect delay needs to be considered during the functional design process. Due to the lack
of placement and routing information, the interconnect delay is approximated with statistical fan-out
based wire load models. The circuit design based on these inaccurate delay models can produce a large
number of timing violations. Design iterations are usually required to achieve timing closure. A method
to alleviate this problem is to introduce physical information earlier into the logic synthesis stage. An
initial floor plan is created before the synthesis procedure to provide an estimate of the location of the
cells as well as the interconnect lengths. A timing model based on this estimation is significantly more
accurate, making the synthesis process more efficient and resulting in a placed gate level netlist. This
synthesis procedure is called physical synthesis. In the DSM regime, the functional and physical design
processes are no longer separated, requiring tight integration of the front-end and back-end design processes. Interconnect plays an important role in both the physical synthesis and timing verification stages
in the design flow. Requirements placed on the interconnect analysis are different in these two stages.
During the synthesis process, since the detailed routing information is not available, higher efficiency
with reasonable accuracy is preferred, such as closed-form models. In the post-layout verification stage,
realistic timing information describing the entire IC is determined, requiring both high efficiency and
high accuracy.
Figure 2.1 A conventional ASIC design flow.
2.2 Interconnect Design Criteria
Since interconnect has become a dominant issue in high performance ICs, the focus of the
circuit design process has shifted from logic optimization to interconnect optimization. Multiple criteria
should be considered during the interconnect design process, such as delay, power dissipation, noise,
bandwidth, and physical area. These criteria are individually discussed in the following subsections.
2.2.1 Delay
Interconnect delay is a primary design criterion due to the close relationship to the speed of
a circuit. Early interconnect design methodologies [20] focused primarily on delay optimization. A
typical data path in a synchronous digital circuit is shown in Fig. 2.2. In the case of zero clock skew,
the minimum allowable clock period is
$T_{p,min} = T_{C-Q} + T_{int} + T_{logic,max} + T_{setup}$ (2.1)
where $T_{C-Q}$ is the time required for the data to leave the initial register after the clock signal
arrives, $T_{int}$ is the interconnect delay, $T_{logic,max}$ is the maximum logic gate delay, and $T_{setup}$ is the
required setup time of the receiving register. From (2.1), by reducing $T_{int}$, the clock
period can be decreased, increasing the overall clock frequency of the circuit (assuming the data path is
a critical path).
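As a minimal illustration of (2.1), the following Python sketch evaluates the minimum clock period and the corresponding clock frequency before and after a reduction in interconnect delay. The function name and all delay values are hypothetical and chosen only to show the arithmetic; they are not results from this thesis.

```python
def min_clock_period(t_cq, t_int, t_logic_max, t_setup):
    """Minimum clock period from (2.1), assuming zero clock skew."""
    return t_cq + t_int + t_logic_max + t_setup


# Hypothetical delays in picoseconds (illustrative values only).
base_ps = min_clock_period(t_cq=50, t_int=400, t_logic_max=500, t_setup=50)
improved_ps = min_clock_period(t_cq=50, t_int=200, t_logic_max=500, t_setup=50)

# A period in ps converts to a frequency in GHz as 1000 / period.
f_base_ghz = 1e3 / base_ps
f_improved_ghz = 1e3 / improved_ps

print(f"period: {base_ps} ps -> {improved_ps} ps")
print(f"frequency: {f_base_ghz:.2f} GHz -> {f_improved_ghz:.2f} GHz")
```

In this example, halving only the interconnect term shortens the period by 20%, which shows why the interconnect delay matters once it is comparable to the logic delay.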
Figure 2.2 A data path in a synchronous digital system
In advanced microprocessors, multiple computational cores can be fabricated on the same die [17].
Communication among these cores and on-chip memories generally requires multiple clock cycles.
Sometimes the computational core enters an idle state waiting for the required data or control signals
from other regions of the IC. The computational resource of these cores, therefore, cannot be efficiently
utilized due to the large amount of multi-cycle communication. By reducing the interconnect delay, the
speed of the system, i.e., the computational efficiency of the cores, can be improved at the architecture
level.
2.2.2 Power Dissipation
Due to higher clock frequencies and on-chip integration levels, power dissipation has significantly increased. The on-chip power dissipation of current state-of-the-art microprocessors is on the
order of hundreds of watts and the power density has exceeded the power density of a kitchen hot plate.
In Fig. 2.3, the components of dynamic power due to different capacitance sources are
shown for a state-of-the-art microprocessor [21]. The dynamic power due to the interconnect capacitance can be greater than 50% of the total dynamic power. Furthermore, the repeaters and pipeline
registers inserted in the interconnect introduce additional dynamic, leakage, and short-circuit power.
High power dissipation increases the packaging cost due to heating problems and shortens the battery
life in portable applications. Power dissipation, therefore, is another important criterion in interconnect
design.
Figure 2.3 Components of dynamic power dissipation due to different capacitance sources: gate capacitance, diffusion capacitance, and interconnect capacitance.
2.2.3 Noise
With interconnect scaling, coupling capacitance between (and among) interconnects dominates the ground capacitance. Furthermore, inductive coupling has to be considered due to increasing
signal frequencies, making coupling noise more significant (and complicated). Interconnect coupling induced noise can be classified into two categories: voltage level noise and delay uncertainty, as shown in
Fig. 2.4. Noise may cause a malfunction in the circuit if the noise level is greater
than a certain threshold, thereby reducing yield. In addition to coupling effects, delay uncertainty can
also be caused by other factors, such as process variations (on both interconnects and the inserted repeaters or pipeline registers), temperature variations, and power/ground noise. Delay uncertainty is both
spatially dependent (due to process variations) and temporally dependent (due to coupling, temperature
variations, and power/ground noise). Timing margins are assigned to manage this delay uncertainty,
thereby increasing the clock period and reducing the overall performance of the circuits. When delay
uncertainty exceeds these margins, setup or hold violations may occur, reducing the yield.
Figure 2.4 Interconnect coupling noise.
2.2.4 Physical Area
With technology scaling, billions of transistors can now be integrated onto a single monolithic
die. The number of interconnects has therefore also significantly increased. The die size, however,
is expected to remain approximately fixed for future technologies as predicted in [18]. The number
of metal layers, therefore, needs to be increased to provide sufficient metal resources for interconnect
routing. Increasing the number of metal layers, however, increases the fabrication cost. Furthermore,
buffers and pipeline registers inserted along the interconnects make the constraint on silicon area more
stringent. The area criterion, therefore, should be considered during the interconnect design processes,
such as wire sizing and repeater insertion.
2.3 Interconnect Characteristics
The impedance characteristics of on-chip interconnect include the resistance, capacitance,
and inductance. These parameters can be extracted from the geometry of the interconnect structures, as
illustrated in the following subsections.
2.3.1 Resistance
For a conductor with a rectangular cross-section, the resistance is described by the following
expression,
$R = \rho \frac{l}{W H}$ (2.2)
where $\rho$ is the material resistivity, and $l$, $W$, and $H$ are the length, width, and thickness of the interconnect,
respectively. In present DSM CMOS technologies, copper has been adopted to replace aluminum as the
primary interconnect material due to the lower resistivity of copper as compared to aluminum. Due to
specialized processing and operating conditions of the on-chip copper interconnect, certain non-ideal
effects need to be considered, making the effective resistivity deviate from the ideal bulk resistivity.
Figure 2.5 Cross section of an on-chip copper interconnect.
2.3.1.1 Diffusion barrier
For on-chip Cu interconnect, a thin and highly resistive barrier layer is built on three sides
of the interconnect to prevent Cu from diffusing into the surrounding dielectric, as shown in Fig. 2.5.
This barrier layer consumes part of the cross sectional area allocated to the interconnect. The effective
resistivity ρb due to this barrier induced reduction in the cross sectional area is
$\rho_b = \frac{\rho_0}{1 - \frac{A_b}{W H}}$ (2.3)
where $\rho_0$ is the bulk resistivity at a given temperature, and $A_b$ is the cross sectional area occupied by
the barrier layer.
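The effect of the barrier on the wire resistance can be sketched numerically by combining (2.2) and (2.3). The Python example below is only an illustration: the wire geometry, the barrier thickness, the assumption that the barrier covers the bottom and the two side walls, and the approximate bulk resistivity of copper are all assumed values, not data from this thesis.

```python
def wire_resistance(rho, length, width, height):
    """DC resistance of a rectangular wire, R = rho * l / (W * H), as in (2.2)."""
    return rho * length / (width * height)


def barrier_resistivity(rho0, barrier_area, width, height):
    """Effective resistivity from (2.3): rho0 / (1 - A_b / (W * H))."""
    return rho0 / (1.0 - barrier_area / (width * height))


# Hypothetical values in SI units (assumptions for illustration only).
RHO_CU = 1.7e-8      # ohm*m, approximate bulk resistivity of copper
WIDTH = 100e-9       # m
HEIGHT = 200e-9      # m
LENGTH = 1e-3        # m
T_BARRIER = 5e-9     # m, barrier thickness

# Assumed barrier placement: bottom plus both side walls (three sides).
copper_area = (WIDTH - 2 * T_BARRIER) * (HEIGHT - T_BARRIER)
barrier_area = WIDTH * HEIGHT - copper_area

rho_b = barrier_resistivity(RHO_CU, barrier_area, WIDTH, HEIGHT)
r_ideal = wire_resistance(RHO_CU, LENGTH, WIDTH, HEIGHT)
r_barrier = wire_resistance(rho_b, LENGTH, WIDTH, HEIGHT)

print(f"R without barrier: {r_ideal:.1f} ohm")
print(f"R with barrier:    {r_barrier:.1f} ohm")
```

Using the effective resistivity with the full drawn cross section gives the same resistance as using the bulk resistivity with the reduced copper area, which is the intent of (2.3).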
2.3.1.2 Surface and grain boundary scattering
When the dimensions of the interconnect are scaled deep into the DSM regime, the resistivity
of the interconnect increases as the wire dimensions shrink. This behavior is due to surface and grain
boundary scattering [22], as illustrated in Fig. 2.7.
The electron mean-free path λ of copper is 42.1 nm at 0 degrees Celsius [22]. When any
dimension of the wire shrinks to the order of λ, the electrons will experience more collisions at the
surface, increasing the effective resistivity. A typical value of p, the fraction of electrons scattered elastically at the surface, for copper is 0.47 [22]. Note that in
(2.6) and (2.7), only one dimension (thin film structure) surface scattering is considered. For thin wires
with two-dimensional surface scattering effect, the effective resistivity is larger. A reduced k is used in
[24] to consider this two-dimensional surface scattering effect.
2.3.1.3 Temperature effect
The resistivity of copper increases approximately linearly with temperature and can be characterized as
$\rho_T = \rho_0 (1 + \beta \Delta T)$ (2.4)
where $\beta$ is the temperature coefficient of resistivity (TCR) and $\Delta T$ is the difference in temperature from a reference temperature. Since the electron mean-free path λ will decrease with increasing
temperature, k will increase, resulting in a smaller ratio of ρs/ρ0. The TCR for thin-film interconnect,
therefore, is smaller than that of bulk Cu [23].
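The linear temperature model in (2.4) can be evaluated directly. The short Python sketch below uses an approximate bulk copper resistivity and an approximate bulk TCR as assumed inputs. Since the TCR of thin-film interconnect is smaller than the bulk value, the result should be read as an illustration of the formula rather than a prediction for an actual wire.

```python
def resistivity_at(rho_ref, beta, delta_t):
    """Linear temperature model of (2.4): rho_ref * (1 + beta * delta_t)."""
    return rho_ref * (1.0 + beta * delta_t)


# Assumed, approximate bulk copper values (not taken from this thesis).
RHO_REF = 1.7e-8   # ohm*m at the reference temperature (about 20 C)
BETA = 0.0039      # 1/C, approximate bulk TCR of copper

# Temperature rise from the 20 C reference to 100 C.
rho_hot = resistivity_at(RHO_REF, BETA, delta_t=80.0)
increase_percent = (rho_hot / RHO_REF - 1.0) * 100.0

print(f"resistivity at +80 C: {rho_hot:.3e} ohm*m ({increase_percent:.1f}% higher)")
```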
2.3.1.4 High frequency effects
At sufficiently high frequencies, the current density in an interconnect is no longer uniform, as
shown in Fig. 2.6. The current tends to flow near the interconnect surface. This phenomenon is called
the skin effect [25]. The effective cross sectional area of the interconnect is reduced, thereby increasing
the interconnect resistance.
The skin depth is the distance below the conductor surface where the current density drops to
1/e of that at the surface, and is determined as:
$\delta(f) = \sqrt{\frac{\rho}{\pi \mu f}}$ (2.5)
where µ is the permeability of the conductor. Expression (2.2) actually characterizes the DC resistance,
and is no longer accurate when δ is smaller than the wire cross sectional dimension. The skin depth
of bulk Cu as a function of frequency at 20 degrees Celsius is shown in Fig. 2.7. As the frequency
increases to tens of GHz, the skin depth enters the DSM region and decreases slowly.
Whether to consider these non-ideal effects depends upon the accuracy requirements of the models and
the operating regime of the circuits. Often more than one effect needs to be simultaneously considered.
For example, the skin effect and the surface scattering effect, when considered together, are known as
the anomalous skin effect (ASE).
Figure 2.6 Current distribution in the cross section of an interconnect at high frequencies. Darker color
indicates higher current density.
Figure 2.7 Skin depth of Cu as a function of frequency.
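To give a feel for the magnitudes involved in the skin effect, the Python sketch below evaluates the skin depth expression (2.5) for bulk copper at a few frequencies. It assumes that the permeability of copper equals that of free space and uses an approximate bulk resistivity; both are assumptions of this example.

```python
import math

MU_0 = 4e-7 * math.pi  # H/m, permeability of free space (assumed for copper)


def skin_depth(rho, freq, mu=MU_0):
    """Skin depth from (2.5): sqrt(rho / (pi * mu * f))."""
    return math.sqrt(rho / (math.pi * mu * freq))


RHO_CU = 1.7e-8  # ohm*m, approximate bulk resistivity of copper

for freq in (1e9, 10e9, 100e9):
    depth_nm = skin_depth(RHO_CU, freq) * 1e9
    print(f"{freq / 1e9:g} GHz: skin depth = {depth_nm:.0f} nm")
```

Because the skin depth scales with the inverse square root of frequency, a hundredfold increase in frequency reduces it only tenfold, which is consistent with the slow decrease noted above.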
2.3.2 Capacitance
Since interconnect delay dominates gate delay in the DSM regime, the requirement on the
accuracy of parasitic extraction of the interconnect impedances increases. 2-D or 3-D extraction is
generally required. A 3-D field solver, such as FastCap [26], can provide accurate capacitance results,
but at the cost of large run time and memory requirements. With increasing integration, the number and
geometric complexity of the on-chip interconnects drastically increases. It is, therefore, not practical to
apply a field solver to an entire IC. Modern 3-D on-chip capacitance extraction can be divided into three
steps. Initially, test patterns are measured or simulated with a 2-D or 3-D field solver. The generated
data are used to derive closed-form formulae or to build look-up tables. The geometric parameters of the
interconnects are extracted next. Finally, the geometric parameters are matched to the test patterns, and
the capacitance values are obtained through formulae or look-up tables. Due to the short-range nature
of electrostatic interaction, only the nearest neighbors are considered during the process of capacitance
extraction. The capacitance matrices, therefore, are fairly sparse. Interconnect capacitance is composed
of two components: the capacitance between the interconnect and adjacent metal layers or substrate, Cg,
and the coupling capacitance between neighboring interconnects in the same layer, Cc. Cc is expected
to dominate Cg in the DSM regime due to the increasing aspect ratio and decreasing wire spacing. In
early stage interconnect design and analysis, adjacent layers are generally treated as a ground plane for
capacitance extraction. By numerical fitting, closed-form capacitance expressions have been derived for
parallel lines above one ground plane or between two ground planes in [27, 28].
2.3.3 Inductance
As compared with resistance and capacitance, the interconnect inductance is significantly
more difficult to extract. One reason for this difficulty is due to the loop-based inductance definition,
$L_{ij} = \frac{\psi_{ij}}{I_j}$ (2.6)
where $\psi_{ij}$ is the magnetic flux in loop i induced by the current $I_j$ in loop j. To form a loop,
the current return paths need to be identified a priori. The current distribution in a circuit, however,
itself depends on the interconnect characteristics. The effect of inductance in wide global interconnects in top
metal layers is more significant than that of local interconnects in lower metal layers. Since the wires
in adjacent layers are generally orthogonal, adjacent layers can no longer be treated as a ground plane
as in capacitance extraction. Another reason for the difficulty in inductance extraction is due to long
range inductive coupling effects. Artificially restricting the inductance extraction to nearby geometries
not only introduces inaccuracy but may also result in unstable models. The pattern matching method
used for capacitance extraction, therefore, can not be used for inductance extraction due to the complex
geometries surrounding the wire.
2.3.3.1 Partial inductance
One way to avoid determining a priori the current return path is to use the concept of partial
inductance [28]. In determining the partial inductance, the flux area extends from the conductor to
infinity. The loop inductance of a closed loop can be uniquely determined by the partial self-inductance
of each segment of the loop and the partial mutual inductance between any pair of those segments. The
partial inductance is used in partial element equivalent circuit (PEEC) models, which can be used to
accurately simulate a circuit. Partial inductance nonlinearly depends upon the interconnect length. This
behavior is the result of inductive coupling among different segments of the same line [25]. For a loop
formed by two closely placed parallel interconnects (where the length of the loop is more than ten times
longer than the loop width), the loop inductance depends linearly on the length of the loop. Note that the
inductance of a wire not forming a closed loop has no physical meaning [28]. When applying the concept
of partial inductance in circuit models, all of the wires that form the current loops should be included,
e.g., the reference ground lines. The current return paths are determined from circuit simulation. The
PEEC model generally results in huge and dense inductance matrices, increasing the computational
complexity of the simulation. Various methods have been presented to sparsify the inductance matrices
[29], such as the shell technique, the halo technique, and the K matrix technique.
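The relation between partial and loop inductance described above can be illustrated with a two-wire loop, where the loop inductance is the sum of the two partial self-inductances minus twice the partial mutual inductance. The Python sketch below uses widely quoted closed-form approximations for the partial self and mutual inductance of straight rectangular wires; these formulas and all geometry values are assumptions of this example and are not taken from this thesis.

```python
import math

MU_0_OVER_2PI = 2e-7  # H/m


def partial_self_inductance(length, width, thickness):
    """Approximate partial self-inductance of a straight rectangular wire."""
    wt = width + thickness
    return MU_0_OVER_2PI * length * (
        math.log(2.0 * length / wt) + 0.5 + 0.2235 * wt / length)


def partial_mutual_inductance(length, spacing):
    """Approximate partial mutual inductance of two parallel wires."""
    return MU_0_OVER_2PI * length * (
        math.log(2.0 * length / spacing) - 1.0 + spacing / length)


def two_wire_loop_inductance(length, width, thickness, spacing):
    """Loop inductance of a signal wire and an identical return wire."""
    l_self = partial_self_inductance(length, width, thickness)
    m_mutual = partial_mutual_inductance(length, spacing)
    return 2.0 * l_self - 2.0 * m_mutual


# Hypothetical geometry: 1 um x 1 um wires, 4 um center-to-center spacing.
W, T, D = 1e-6, 1e-6, 4e-6
self_ratio = partial_self_inductance(2e-3, W, T) / partial_self_inductance(1e-3, W, T)
loop_ratio = (two_wire_loop_inductance(2e-3, W, T, D)
              / two_wire_loop_inductance(1e-3, W, T, D))

print(f"partial self-inductance ratio for 2x length: {self_ratio:.3f}")
print(f"loop inductance ratio for 2x length:         {loop_ratio:.3f}")
```

Doubling the length more than doubles the partial self-inductance, while the loop inductance of the closely spaced pair scales almost exactly linearly, in line with the behavior described in this subsection.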
2.3.3.2 Loop-based inductance
As an alternative to the PEEC model, a loop-based inductance model is preferred in well-designed interconnect structures, such as shielded buses and clock distribution networks. In early design
stages, a good assumption regarding the current return path is the nearby power/ground networks, since
these tracks are generally wide with low resistive impedance. FastHenry is a commonly used numerical tool for extracting the partial or loop inductance of simple interconnect structures. By estimating
the distribution of the return current, more accurate loop-based inductance models have been developed
[30, 31].
2.3.3.3 High frequency effects
Inductance is also a function of frequency due to the variation of the current distribution with
frequency. In addition to the skin effect mentioned in Subsection 2.3.1, the current distribution inside
a conductor also changes with frequency due to the proximity effect [25]. The proximity effect in two
parallel interconnects is illustrated in Fig. 2.8. If the current in these two wires flows in opposite
directions, the currents concentrate towards each other, as shown in Fig. 2.8(a); otherwise, the two
currents shift away from each other, as shown in Fig. 2.8(b). Both the skin effect and the proximity
effect are essentially due to the same mechanism. The current tends to concentrate closer to the current
return path in order to minimize the inductance [35]. Note that at high frequencies, the resistance of a
conductor also depends on the surrounding signal activities due to the proximity effect.
Figure 2.8 Current distributions in the cross section of two parallel wires at high frequencies due to the
proximity effect.
Another effect of frequency on the inductance is due to multi-path current redistribution [34].
In an integrated circuit, there are many possible current return paths, e.g., the power/ground network,
nearby signal lines, and the substrate. The distribution of the return current among these possible paths
is determined by the impedance of the individual paths. At different frequencies, the relationship among
the impedances of different paths will change, as well as the distribution of the return current, as shown
in Fig. 2.11. The return current is distributed in those paths so as to minimize the total impedance at a
specific frequency [25].
2.4 Interconnect Models
Interconnect modeling is critical in both the circuit design and verification processes. An
efficient and accurate interconnect model can significantly enhance these processes. In Subsections
2.4.1 and 2.4.2, models of single interconnect and coupled interconnects are described, respectively.
2.4.1 Single Interconnect
The single interconnect model is the basis for many interconnect network simulation tools.
Various on-chip interconnect models have been presented over the past several decades, from lumped
C/RC/RLC models to distributed transmission lines. A tradeoff between efficiency and accuracy is
required in selecting the appropriate model.
2.4.1.1 Lumped models
For local interconnects with a length of tens of micrometers and below, the circuit behavior is
typically dominated by the capacitance and effective resistance of the gates. Modeling the interconnect
as a lumped capacitance or lumped RC structure is generally sufficiently accurate. Commonly used
lumped models include L, T, and π shaped structures, as depicted in figure 2.9.
Figure 2.9 Lumped interconnect models.
2.4.1.2 Distributed models
For long intermediate and global interconnects, the signal propagation delay along the interconnect is larger than the gate delay. In this case, the distributed characteristics of the interconnect
should be considered. Distributed interconnect can be characterized by the Telegrapher’s equations in
transmission line theory,
$\frac{\partial V}{\partial x} = -(R + sL) I$ (2.7)
$\frac{\partial I}{\partial x} = -sC V$ (2.8)
where R, L, and C are the interconnect impedance parameters per unit length, x is the distance along the interconnect, and s is the complex frequency. The conductance between the signal line
and ground can typically be ignored in on-chip structures. If the interconnect is non-uniform, these
parameters are a function of x. If frequency dependent effects need to be considered, these interconnect parameters are also a function of s. Besides the difficulties in inductance extraction, including
inductance in the model also makes circuit analysis more complicated due to inductance induced signal
reflection, ringing, and coupling effects. A figure of merit to characterize the condition when on-chip
inductance should be considered is presented in [35],
$\frac{t_r}{2\sqrt{LC}} < l < \frac{2}{R}\sqrt{\frac{L}{C}}$ (2.9)
where $t_r$ is the signal transition time and l is the interconnect length.
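The figure of merit (2.9), read here as t_r / (2 sqrt(LC)) < l < (2 / R) sqrt(L / C), can be turned into a simple check of whether inductance should be included for a given wire. The Python sketch below is a minimal example; the per-unit-length parameters and the transition time are hypothetical values chosen for illustration, and the function names are not from this thesis.

```python
import math


def inductance_length_range(r, l_ind, c, t_rise):
    """Return (lower, upper) wire lengths between which inductance matters.

    r, l_ind and c are per-unit-length resistance, inductance and capacitance.
    Returns None when the range is empty (lower bound >= upper bound).
    """
    lower = t_rise / (2.0 * math.sqrt(l_ind * c))
    upper = (2.0 / r) * math.sqrt(l_ind / c)
    return (lower, upper) if lower < upper else None


def inductance_matters(length, r, l_ind, c, t_rise):
    """True when the wire length falls inside the range given by (2.9)."""
    bounds = inductance_length_range(r, l_ind, c, t_rise)
    return bounds is not None and bounds[0] < length < bounds[1]


# Hypothetical per-unit-length parameters (SI units).
R_PER_M = 20e3      # ohm/m  (20 ohm/mm)
L_PER_M = 0.5e-6    # H/m    (0.5 nH/mm)
C_PER_M = 200e-12   # F/m    (0.2 pF/mm)
T_RISE = 20e-12     # s

lower, upper = inductance_length_range(R_PER_M, L_PER_M, C_PER_M, T_RISE)
print(f"inductance matters for {lower * 1e3:.1f} mm < l < {upper * 1e3:.1f} mm")
print(inductance_matters(3e-3, R_PER_M, L_PER_M, C_PER_M, T_RISE))
```

A slower transition time raises the lower bound and a more resistive wire lowers the upper bound, so for sufficiently slow edges or resistive lines the range disappears and an RC model is adequate.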
Transmission line models are based on transverse electro-magnetic (TEM) mode or quasi-TEM mode wave propagation. The TEM or quasi-TEM mode assumption is valid when the line cross-sectional dimension is much smaller than the wavelength. This requirement can be generally satisfied
in on-chip structures. For example, the wavelength of a 100 GHz frequency component is on the order
of 1 mm, which is several orders of magnitude greater than the cross-sectional dimensions of interconnects in DSM
technologies. When using a transmission line model, both the resistance and the inductance should be
extracted from the loop formed by the signal line and the ground return path. Since the resistance of the
ground return path is generally much smaller than that of the signal line, the resistance of the ground
can be ignored. In a typical circuit representation of a transmission line, the loop inductance is assigned
to the signal line as shown in Fig. 2.10.
Figure 2.10 Circuit models of transmission lines.
2.4.1.3 Lumped representation of distributed interconnects
A transient time domain simulation of a transmission line can be grouped into two categories:
impulse response convolution and lumped equivalent circuits [35]. In the first method, the transmission line is initially analyzed in the frequency domain. Next, a time domain impulse response (called a
Green’s function) is obtained based on the frequency domain solution. Finally, the time domain solution
is determined by convolving the Green’s function with the voltages at the line ports. Accurate results
can be provided with the penalty of long simulation times and excessive memory requirements due to
the convolution procedure. Furthermore, this method is not compatible with general circuit simulators,
such as SPICE. The second method is to partition the transmission line into a number of segments and
model each segment as a lumped structure. Additional segments provide more accurate results, but
consume more computational resources. The key issue in this method, therefore, is to determine the
appropriate number of segments.
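The tradeoff in choosing the number of segments can be sketched with a purely resistive-capacitive line. The Python example below computes the Elmore delay at the far end of a ladder of n identical lumped sections and compares it with the limiting value RC/2 that the ladder approaches as n grows. The use of the Elmore delay as the accuracy measure, the L-shaped section, and the total R and C values are all assumptions of this example.

```python
def ladder_elmore_delay(r_total, c_total, n_segments):
    """Elmore delay at the far end of a line split into n lumped RC sections.

    Each section is a series resistance r_total/n followed by a shunt
    capacitance c_total/n.
    """
    r_seg = r_total / n_segments
    c_seg = c_total / n_segments
    # The i-th capacitor (1-based) is charged through i series resistors.
    return sum(i * r_seg * c_seg for i in range(1, n_segments + 1))


R_TOTAL = 1000.0   # ohm, hypothetical total wire resistance
C_TOTAL = 1e-12    # F, hypothetical total wire capacitance
distributed_limit = 0.5 * R_TOTAL * C_TOTAL  # value approached for large n

for n in (1, 2, 5, 10, 50):
    delay = ladder_elmore_delay(R_TOTAL, C_TOTAL, n)
    error = (delay - distributed_limit) / distributed_limit
    print(f"n={n:3d}  delay={delay * 1e12:7.1f} ps  error={error * 100:6.1f}%")
```

For this section type the relative error falls as 1/n, so each additional segment buys less accuracy while adding nodes to the simulation.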
Using lumped models to represent a distributed transmission line introduces inaccuracy when
evaluating circuits that operate at high frequencies. The highest frequency of interest, therefore, should
be determined in order to evaluate the maximum error induced by using lumped models. The frequency
domain representation of a normalized saturated ramp signal with rise time tr is
$V_r(s) = \frac{1 - e^{-s t_r}}{t_r s^2}$ (2.10)
2.4.1.4 Modeling frequency dependent effects
After partitioning a distributed line into lumped segments, frequency dependent effects can be
modeled in each segment by a ladder structure of frequency independent lumped RL elements, as shown
in figure 2.11. Additional ladder stages provide higher accuracy when operating at high frequencies.
Two stages are used in [30] and three stages are used in [31, 36]. The value of the circuit elements can
be obtained by matching the impedance of the model to the extracted impedance at different frequencies.
Figure 2.11 Modeling frequency dependent impedance with lumped elements.
2.4.2 Parallel Coupled Interconnects
Modeling parallel coupled interconnects draws special attention in the circuit design process
due to the commonly used bus structure [37]. A general solution for coupled multiconductor systems
is composed of two steps: decoupling the system into independent interconnects, followed by applying
single line models to each of these interconnects. The decoupling procedure is illustrated in figure 2.12.
Figure 2.12 Decoupling multiple parallel coupled interconnects.
The Telegrapher’s equation describing a coupled multiple interconnect system becomes
$\frac{\partial \mathbf{V}}{\partial x} = -(\mathbf{R} + s\mathbf{L}) \mathbf{I}$ (2.11)
$\frac{\partial \mathbf{I}}{\partial x} = -s\mathbf{C} \mathbf{V}$ (2.12)
where V and I are vectors of voltage and current along N coupled interconnects. R, L and C
are the matrices characterizing the impedance parameters per unit length.
The use of (2.11) and (2.12) assumes that the capacitive and inductive coupling among interconnects is restricted to the direction perpendicular to the direction of signal propagation, i.e., forward
coupling [38] is ignored. For well designed circuits, this simplification is often valid [38]. By applying a
modal analysis [37], a coupled multiconductor system can be decoupled, i.e., the matrices R
+ sL and sC in (2.11) and (2.12) can be converted into (much simpler) diagonal matrices. The modal decoupling method, however, generally is not analytically tractable, except for certain special cases, such
as two identical interconnects [40], multiple lossless wires [41], wires in a homogeneous dielectric, and
wires only coupled to direct neighbors. In general, the computational complexity required to decouple
a large number of coupled lossy interconnects with a modal analysis is high.
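For the special case of two identical interconnects, the decoupling can be carried out by hand using even and odd modes. The Python sketch below is a minimal illustration of that case only: it applies the even/odd mode transformation to hypothetical 2x2 capacitance and inductance matrices and shows that both become diagonal. The matrix entries and names are assumed values, not extracted data.

```python
import math


def matmul2(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


# Hypothetical per-unit-length parameters of two identical coupled lines.
C_G = 50e-12    # F/m, capacitance to ground
C_C = 100e-12   # F/m, coupling capacitance
L_S = 0.5e-6    # H/m, self inductance
L_M = 0.2e-6    # H/m, mutual inductance

cap = [[C_G + C_C, -C_C], [-C_C, C_G + C_C]]
ind = [[L_S, L_M], [L_M, L_S]]

# Even/odd mode transformation; T is symmetric and orthogonal, so T^-1 = T.
S = 1.0 / math.sqrt(2.0)
T = [[S, S], [S, -S]]

cap_modal = matmul2(matmul2(T, cap), T)
ind_modal = matmul2(matmul2(T, ind), T)

print("modal capacitances:", cap_modal[0][0], cap_modal[1][1])
print("modal inductances: ", ind_modal[0][0], ind_modal[1][1])
```

The even mode sees only the ground capacitance and the sum of self and mutual inductance, while the odd mode sees the ground capacitance plus twice the coupling capacitance and the difference of the inductances. Each mode can then be treated with a single line model.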
2.5 Design Methodologies for Interconnect
Since interconnect plays an important role in ICs, interconnect design methodologies have
been developed at different levels to satisfy specific performance requirements. In Subsection 2.5.1,
interconnect topology optimization methods are discussed, where interconnect trees are constructed.
Wire geometry optimization methods are reviewed in Subsection 2.5.2. Circuit level interconnect design
methodologies are described in Subsections 2.5.3, 2.5.4, and 2.5.5, including buffer insertion, shielding
techniques, and net-ordering/wire swizzling, respectively.
2.5.1 Constructing an Interconnect Tree
An interconnect tree network is a commonly used structure in ICs. Signals are transmitted
from the root of a tree to each leaf of the tree. When the circuit is dominated by the gates, the interconnects can be modeled as a lumped capacitance. A minimum Steiner tree (MST) is generally constructed
in this case such that the total wire length required to connect the source and sinks is minimized. The
capacitance of the tree, therefore, is also minimized, as well as the circuit delay and dynamic power.
With the circuit now dominated by the interconnect, both the interconnect resistance and inductance
need to be considered during the tree construction process. In this case, the delay at different sinks is
different. The required arrival time at each sink is also different. The slack at a node is defined as
$T_{slack} = T_{rat} - T_{delay}$ (2.13)
where $T_{rat}$ is the required arrival time at that node and $T_{delay}$ is the delay from the source to that node.
In a properly designed tree, the slack at the source should be maximized for high performance while
minimizing the area and power overhead.
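The slack definition (2.13) can be applied to every sink of a tree to find the critical one. The Python sketch below is a small illustration with hypothetical sink names, required arrival times and delays. Taking the slack at the source as the minimum slack over all sinks is an assumption of this example, used here as a common convention.

```python
def slack(t_rat, t_delay):
    """Slack at a node as defined in (2.13)."""
    return t_rat - t_delay


# Hypothetical sinks: (required arrival time, source-to-sink delay) in ps.
sinks = {
    "sink_a": (500.0, 420.0),
    "sink_b": (450.0, 430.0),
    "sink_c": (600.0, 390.0),
}

sink_slack = {name: slack(rat, delay) for name, (rat, delay) in sinks.items()}

# Assumed convention: the source slack is set by the most critical sink.
critical_sink = min(sink_slack, key=sink_slack.get)
source_slack = sink_slack[critical_sink]

print(sink_slack)
print(f"critical sink: {critical_sink}, slack at source: {source_slack} ps")
```

Note that the sink with the largest delay is not necessarily the critical one; a sink with a shorter delay but an earlier required arrival time can determine the slack.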
Some examples of tree constructions are A-tree, P-tree, and C-tree. In an A-tree, the Manhattan distance from the source to each sink is minimized. Subject to this constraint, the total wire length
is also minimized. An example of an A-tree is illustrated in Fig. 2.13. During construction of a P-tree,
the solution space is limited to a set of topologies induced by a permutation on the sinks. From this solution space, the optimal solution is chosen based on the delay or delay-area product. In the C-tree, the
sinks are first clustered according to the spatial, temporal, and polarity properties. After the clustering
procedure, tree structures are built within and among these clusters.
Figure 2.13 An example of an A-tree.
2.5.2 Wire Sizing, Shaping, and Spacing
Given a metal layer in a specified technology, the thickness of the wires and inter-layer dielectric (ILD) is fixed. The wire width and spacing, however, can be varied to satisfy different design criteria.
By explicitly characterizing the relationship between the interconnect impedance and wire geometries,
tradeoffs among the delay, bandwidth, and power of the global interconnect can be made. In [52], the
effects of inductance are included during the wire width optimization process to lower the power dissipation.
Figure 2.14 Shaping interconnect to minimize delay.
It is known that the optimal shape of an RC interconnect that minimizes the Elmore delay is
an exponential taper, as shown in Fig. 2.14. Wire tapering increases the wire width near the driver and
decreases the wire width near the load. Since the near end resistance sees more downstream capacitance
than the far end resistance, assigning less resistance to the near end than to the far end will reduce
the total RC delay. In [44], the optimal shape of an RC interconnect is also shown to be exponential.
Exponential shaping, however, is more difficult to implement than uniformly sized wires.
2.5.3 Repeater Insertion
The delay of an RC interconnect is $0.377\,RCl^2$, where R and C are the resistance and capacitance per unit length. This delay is proportional to the square of the
wire length l. By splitting the interconnect into k segments with repeaters, the interconnect delay term
is reduced to $0.377\,RCl^2/k$. These repeaters, however, introduce additional gate delay. The optimal
number and size of the repeaters can be determined to achieve the minimum delay [20]. As signals
propagate along the interconnect, sharper transition edges are regenerated by the repeaters, increasing
the bandwidth of the interconnect. By dividing the interconnect into segments, the coupling between
interconnects is also reduced due to the shorter length of coupling between neighboring lines. Inserting
repeaters in long interconnects, however, introduces an area and power penalty. A tradeoff among different design criteria is, therefore, required for an efficient repeater insertion methodology.
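The tradeoff between the shrinking wire term and the added gate delay can be sketched numerically. The example below is only a sketch: it assumes that every segment adds one fixed repeater delay `t_rep`, which is a simplification introduced here, and the line parameters are hypothetical.

```python
def wire_delay(r, c, length, k=1):
    """Interconnect delay term 0.377*R*C*l^2/k for a line split into k segments.

    r and c are per unit length (ohm/mm and F/mm), length is in mm.
    """
    return 0.377 * r * c * length ** 2 / k


def total_delay(r, c, length, k, t_rep):
    """Wire term plus an assumed fixed delay t_rep for each of the k segments."""
    return wire_delay(r, c, length, k) + k * t_rep


def best_segment_count(r, c, length, t_rep, k_max=100):
    """Exhaustive search for the segment count with the smallest total delay."""
    return min(range(1, k_max + 1),
               key=lambda k: total_delay(r, c, length, k, t_rep))


# Hypothetical 10 mm line: 100 ohm/mm, 0.2 pF/mm, 20 ps per repeater stage.
R, C, LENGTH, T_REP = 100.0, 0.2e-12, 10.0, 20e-12

k_best = best_segment_count(R, C, LENGTH, T_REP)
print("unrepeated wire delay (ps):", wire_delay(R, C, LENGTH) * 1e12)
print("best number of segments:", k_best)
print("total delay at best k (ps):", total_delay(R, C, LENGTH, k_best, T_REP) * 1e12)
```

Under this simplified model the wire term falls as 1/k while the gate term grows linearly with k, so the total has a single minimum; a closed-form treatment that also sizes the repeaters is discussed in Chapter 3.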
Figure 2.15 Staggering repeaters to reduce the worst case delay and crosstalk noise.
In [44], a repeater staggering technique is proposed to reduce the worst case delay and crosstalk
noise in bus structures. As shown in Fig. 2.15, the repeaters in adjacent wires are interleaved. By placing a repeater in the middle of two repeaters in adjacent wires, a potential worst case capacitive coupling
only persists for half the wire length. For the other half length, the capacitive coupling is the best case.
The worst case delay as well as the delay uncertainty can therefore be reduced. One of the advantages of
this technique is that no additional area overhead is required. By staggering the repeaters, the inductive
coupling among the wires can also be averaged. As shown in Fig. 2.15, for two simultaneously switching adjacent wires, the direction of current is the same for half the wire length and opposite for the other
half length. Inductive coupling due to the current flowing in different directions in the neighboring wire
can be cancelled. In [45], the optimum position of staggered repeaters is determined for RC interconnect
to achieve the minimum worst case delay.
Figure 2.16 Buffered interconnect tree.
Another significant application of repeater insertion is the buffered tree. The repeaters inserted
in an interconnect tree are also called buffers. Buffer insertion in tree structures is an important design
tool for interconnect optimization. Van Ginneken presented a dynamic programming algorithm to insert
buffers in a Steiner tree to minimize the Elmore delay. Van Ginneken's algorithm is composed of two phases. The
first phase is a bottom-up process, where all of the possible buffer insertion candidates are determined
for each node in the tree. In this process, suboptimal candidates are eliminated such that the
number of candidates does not increase exponentially. After the candidates at the root are determined,
the candidate with the maximum slack is chosen. The second phase traces back the computations in the
first phase from this candidate and places buffers at the appropriate locations. Various extensions to this
algorithm have been presented in the last decade that consider low power, blockage constraints, and
more accurate delay models. In a properly designed buffered tree, as shown in Fig. 2.16, the buffers
should be inserted in the following situations:
1. Splitting long interconnect (buffers 1 and 2);
2. Isolating large capacitances from the critical path (buffer 3);
3. Cascading buffers to drive large capacitances (buffers 4, 5, and 6);
4. Reversing the signal polarity if necessary (inverter 7).
Note that interconnect tree construction, buffer insertion, and wire sizing can be performed simultaneously in order to achieve an optimal solution.
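The two phases described above can be illustrated on the simplest possible tree, a single source-to-sink line with candidate buffer sites between equal wire segments. This is a minimal sketch in the spirit of the bottom-up candidate generation and pruning, not the full tree algorithm: there are no branches, only one buffer type, and the trace-back is replaced by carrying the chosen buffer sites along with each candidate. All function names and parameter values are hypothetical (units: kOhm, fF, ps).

```python
def prune(cands):
    """Drop dominated candidates: more downstream capacitance and no more slack."""
    kept, best_q = [], float("-inf")
    for cap, q, bufs in sorted(cands, key=lambda x: (x[0], -x[1])):
        if q > best_q:
            kept.append((cap, q, bufs))
            best_q = q
    return kept


def line_slack(n_seg, r_seg, c_seg, c_sink, rat, r_drv, r_buf, c_buf, t_buf, bufs):
    """Slack at the driver input for a fixed set of buffered nodes (Elmore delay)."""
    cap, q = c_sink, rat
    for pos in range(n_seg, 0, -1):          # walk from the sink toward the source
        q -= r_seg * (c_seg / 2 + cap)       # Elmore delay of one wire segment
        cap += c_seg
        if pos - 1 in bufs:                  # node pos-1 is the upstream end
            q -= t_buf + r_buf * cap         # buffer delay while driving `cap`
            cap = c_buf                      # upstream only sees the buffer input
    return q - r_drv * cap


def van_ginneken_line(n_seg, r_seg, c_seg, c_sink, rat, r_drv, r_buf, c_buf, t_buf):
    """Bottom-up candidate generation with pruning on a two-pin line."""
    cands = [(c_sink, rat, ())]              # (capacitance, slack, buffer sites)
    for pos in range(n_seg, 0, -1):
        cands = [(cap + c_seg, q - r_seg * (c_seg / 2 + cap), bufs)
                 for cap, q, bufs in cands]
        if pos > 1:                          # node 0 is the driver itself
            buffered = [(c_buf, q - t_buf - r_buf * cap, bufs + (pos - 1,))
                        for cap, q, bufs in cands]
            # All buffered options present the same load, so keep the best one.
            cands.append(max(buffered, key=lambda x: x[1]))
        cands = prune(cands)
    cap, q, bufs = max(cands, key=lambda x: x[1] - r_drv * x[0])
    return q - r_drv * cap, sorted(bufs)


params = dict(n_seg=10, r_seg=0.1, c_seg=200.0, c_sink=10.0, rat=1000.0,
              r_drv=0.5, r_buf=0.5, c_buf=5.0, t_buf=20.0)
slack, buffers = van_ginneken_line(**params)
print("best slack (ps):", slack)
print("buffered nodes:", buffers)
print("slack without buffers (ps):", line_slack(**params, bufs=set()))
```

Because every buffered candidate at a node has the same input capacitance, only the one with the largest slack needs to be kept, which is what keeps the candidate list from growing exponentially.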
2.5.4 Shielding Techniques
Shielding techniques are widely used in ICs to reduce capacitive and inductive coupling. By
inserting a shield line (generally connected to the power or ground grid) between signal lines, the effective capacitance of the interconnect is almost fixed and no longer depends upon the signal switching
Figure 2.17 Examples of net-ordering and wire swizzling.
activity. With shielding, the normalized peak crosstalk noise can be reduced to less than 5% of Vdd for
RC interconnect with lengths ranging up to 2 mm.
Inductive coupling can also be reduced by inserting a shield line, though not as efficiently
as reducing capacitive coupling due to the long range magnetic coupling property. The shield line
provides a nearby current return path, reducing the self and mutual inductance of the signal lines. Due
to the importance of the on-chip clock signal, the clock distribution network in a high speed circuit is
generally shielded on both sides in the same layer. Additional parallel shielding in the N-2 layer has
been reported in [46] to further prevent inductive coupling from the lower layers. The primary drawback
of the shielding technique is the overhead of the metal resources.
2.5.5 Net-Ordering and Wire Swizzling
Interconnect coupling is closely related to the signal switching activity. For example, simultaneously opposite switching on two adjacent RC lines produces the worst case delay. By ordering the
nets such that the sensitive nets are not placed adjacent to each other, the total capacitive coupling among
the nets can be minimized. Examples of net-ordering and wire swizzling are shown in Fig. 2.17. The
net-ordering technique, however, is less efficient in reducing long range inductive coupling. In [47], the
net-ordering and shield insertion techniques are simultaneously performed to minimize both capacitive
and inductive coupling.
In wire swizzling, the wires are split into several segments, and the wire sequences in each
segment are changed, such that the capacitive coupling among the wires averages out for each wire,
reducing both the worst case delay and the delay uncertainty. For a group of k wires, the number of
permutations required to realize all possible adjacencies is k/2. For the example shown in Fig. 2.17, k =
4 and two permutations are required: 1234 and 2413. In [48], it is also shown that the mutual inductance
in a bus structure can be reduced by wire swizzling.
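The count of k/2 orderings follows from simple arithmetic: k wires have k(k-1)/2 distinct pairs, and one ordering provides only k-1 adjacent pairs, so at least k/2 orderings are needed. The short check below confirms that the two orderings quoted for k = 4 do cover every pair; the helper names are introduced here for illustration only.

```python
from itertools import combinations


def adjacent_pairs(order):
    """Unordered pairs of wires that are neighbors in one segment ordering."""
    return {frozenset(pair) for pair in zip(order, order[1:])}


def covers_all_adjacencies(orders, k):
    """True if every pair of the k wires is adjacent in at least one ordering."""
    needed = {frozenset(pair) for pair in combinations(range(1, k + 1), 2)}
    seen = set()
    for order in orders:
        seen |= adjacent_pairs(order)
    return seen == needed


orders = [(1, 2, 3, 4), (2, 4, 1, 3)]
print(covers_all_adjacencies(orders, 4))
```

With only the first ordering, the pairs (1,3), (1,4), and (2,4) are never adjacent, so the coupling seen by each wire would not average out.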
Chapter 3
Buffer Insertion
Over the past 10 years, the source of the critical signal delays has undergone a major transition.
With the scaling of active device feature sizes into the deep submicron regime, the on chip interconnect
has become the primary bottleneck in signal flow within high complexity, high speed integrated circuits
(ICs). The smaller feature size in DSM technology nodes reduces the delay of the active devices; however, the effect on delay due to the passive interconnects has increased rapidly, as described by the 2005
International Technology Roadmap for Semiconductors (ITRS). The transition from an IC dominated
by gate delays for feature sizes greater than 250 nm to where the interconnects are the primary source
of delay is graphically illustrated in Fig. 3.1. As noted in the figure, the disparity between the relative
delay of the interconnect and active devices is exacerbated in each successive technology node. The
local wire delay decreases with feature size due to a reduction in the distance among the active devices.
Special attention must, however, be placed on the global lines, since the overall speed of current ICs is
most often limited by the long distance global interconnects.
Figure 3.1 Comparison of interconnect delay to gate delay
In this chapter we discuss the concept of buffer/repeater insertion in interconnects. This topic
has been researched extensively, and many studies have optimized interconnect delay with the
help of buffer insertion.
3.1 Background
As VLSI technology moves into the nanoscale regime, interconnect delay becomes a dominant
constraint in circuit design. A great amount of effort has been made to reduce interconnect delay and
buffer insertion appears to be a very effective technique. It is reported in [13] that a large number of
buffers are needed with current IC technology. In two recent IBM ASIC designs, 25% of the gates are buffers
[14].
Interconnect design has become a dominant issue in high-speed integrated circuits (ICs). With
the decreased feature size of CMOS circuits, on-chip interconnect now dominates both circuit delay
and power dissipation. Many algorithms have been proposed to determine the optimum wire size that
minimizes a cost function such as the delay [49].
According to [2], the number of long interconnects doubles every three years thus increasing
the importance of on-chip interconnect further. The behavior of inductive interconnect can no longer be
neglected, particularly in long, low-resistance interconnect lines [3]. As on-chip inductance becomes
important, some wire optimization algorithms have been enhanced to consider RLC impedances [4].
Uniform repeater insertion is an effective technique for driving long interconnects. Based on a
distributed RC interconnect model, a repeater insertion technique to minimize signal propagation delay
was introduced in [5]. A uniform repeater structure decreases the total delay as compared to a tapered
buffer structure when driving long resistive interconnects while buffer tapering is more efficient for driving large capacitive loads [6, 7]. Different techniques have been developed to enhance the model of a
repeater system that considers a variety of design factors [8,14]. The drain/source capacitance of each
repeater and multistage repeaters are considered in [15]. Noise-aware techniques for repeater insertion
and wire sizing have been described in [16-19]. In [20,22], signal integrity, interconnect reliability, and
manufacturability issues have been discussed.
The work described in [23] assumes that increasing the interconnect width while maintaining the thickness, spacing, and height from the substrate does not reduce the signal delay since the
resistance decreases and the capacitance increases. This assumption however is not accurate. Different
factors affect the total delay such as the coupling capacitance, the driver size, and the load capacitance.
Furthermore, with increasing inductive impedances, trends in the propagation delay with changing line
width depend upon the number of repeaters and the size of the inserted repeaters.
For an RC line, repeater insertion outperforms wire sizing [24]. It is shown in this chapter that
this behavior is not the case for an RLC line. The minimum signal propagation delay always decreases
with increasing line width for RLC lines if an optimum repeater system is used. With increasing demand
for low-power ICs, different strategies have been developed to minimize power in the repeater insertion
process. Power dissipation and area overhead have been considered in previous work [25-30]. The line
inductance, however, has yet to be considered in the optimization process of sizing a wire driven by a
repeater system. As shown in Fig. 3.2, the minimum delay for a signal to propagate along an RC line
decreases while the power dissipation increases for wider interconnect [31].
Figure 3.2 Minimum signal propagation delay and transient power dissipation as a function of line
width for a repeater system.
3.2 Repeater / buffer insertion process: An overview
The primary objective of a uniform repeater insertion system is to minimize the time for a
signal to propagate through a long interconnect. Uniform repeater insertion techniques divide the interconnect into equal sections and employ equal size repeaters to drive each section as shown in Fig. 3.3.
In some practical situations, the optimum location of the repeaters cannot be achieved due to physical
space constraints. Changing the repeater size can also compensate for a change in the ideal physical
placement. Bakoglu and Meindl have developed closed-form expressions for the optimum number and
size of repeaters to achieve the minimum signal propagation delay in an RC interconnect [5]. Adler
and Friedman characterized a timing model for a CMOS inverter driving an RC load [32, 33]. They
used this model to enhance the accuracy of the repeater insertion process in RC interconnects. Alpert
considered the interconnect width as a design parameter [24]. He showed that, for RC lines, repeater
insertion outperforms wire sizing.
The delay can be greatly affected by the line inductance, particularly low-resistance materials
with fast signal transitions. Ismail and Friedman extended previous research in repeater insertion by
Figure 3.3 Uniform repeater system driving a distributed RC interconnect.
Figure 3.4 Wire sizing in a repeater insertion system
considering the line inductance [34]. They showed that on-chip inductance can decrease the delay, area,
and power of the repeater insertion process as compared to an RC line model [35].
Interconnect sizing within a repeater system affects two primary design parameters, the number of repeaters and the optimum size of each repeater as shown in Fig. 3.4. Different tradeoffs in sizing
long inductive interconnect driven by an optimum repeater system are investigated in this chapter. Design
criteria are developed to determine the optimum width, while considering different design objectives,
such as the delay, power, and area.
3.3 Propagation delay
The interconnect resistance decreases with increasing line width, increasing Lint/Rint, the
ratio between the line inductance and resistance. An increase in Lint/Rint decreases the number of
inserted repeaters needed to achieve the minimum propagation delay. For an RLC line, the minimum signal
propagation delay decreases with wider wires until no repeaters should be used. Wire sizing outperforms
repeater insertion in RLC lines.
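Expressions for the optimum number of repeaters kopt−RC and the optimum repeater size
hopt−RC [34] are
$$k_{opt-RC}(W_{int}) = \sqrt{\frac{R_{int}(W_{int})\,C_{int}(W_{int})}{2.3\,R_0 C_0}} \cdot \frac{1}{\left[1 + 0.16\left(T_{L_{int}/R_{int}}(W_{int})\right)^{3}\right]^{0.24}} \qquad (3.1)$$
$$h_{opt-RC}(W_{int}) = \sqrt{\frac{R_0\,C_{int}(W_{int})}{R_{int}(W_{int})\,C_0}} \cdot \frac{1}{\left[1 + 0.16\left(T_{L_{int}/R_{int}}(W_{int})\right)^{3}\right]^{0.3}} \qquad (3.2)$$
where
$$T_{L_{int}/R_{int}}(W_{int}) = \sqrt{\frac{L_{int}(W_{int})/R_{int}(W_{int})}{R_0 C_0}} \qquad (3.3)$$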
C0 and R0 are the input capacitance and output resistance of a minimum size repeater, respectively. Rint (Wint ) and Cint (Wint ) are the interconnect line resistance and capacitance as functions of
the interconnect width.
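Expressions (3.1) to (3.3) translate directly into code. The sketch below evaluates them for one line; R0 = 2 kOhm and C0 = 1 fF are the values quoted in the text, while the total line resistance, capacitance, and inductance are hypothetical numbers chosen only to exercise the formulas.

```python
import math


def t_lr(l_int, r_int, r0, c0):
    """Dimensionless ratio T_{Lint/Rint} from (3.3)."""
    return math.sqrt((l_int / r_int) / (r0 * c0))


def k_opt(r_int, c_int, l_int, r0, c0):
    """Optimum number of repeaters, following (3.1)."""
    t = t_lr(l_int, r_int, r0, c0)
    return math.sqrt(r_int * c_int / (2.3 * r0 * c0)) / (1 + 0.16 * t ** 3) ** 0.24


def h_opt(r_int, c_int, l_int, r0, c0):
    """Optimum repeater size in multiples of a minimum repeater, following (3.2)."""
    t = t_lr(l_int, r_int, r0, c0)
    return math.sqrt(r0 * c_int / (r_int * c0)) / (1 + 0.16 * t ** 3) ** 0.3


R0, C0 = 2e3, 1e-15                         # from the text: 2 kOhm and 1 fF
R_INT, C_INT, L_INT = 250.0, 1e-12, 5e-9    # hypothetical totals for one line

print("T_L/R :", t_lr(L_INT, R_INT, R0, C0))
print("k_opt :", k_opt(R_INT, C_INT, L_INT, R0, C0))
print("h_opt :", h_opt(R_INT, C_INT, L_INT, R0, C0))
print("k_opt with negligible inductance:", k_opt(R_INT, C_INT, 0.0, R0, C0))
```

Setting the inductance to zero makes the bracketed correction factor equal to one, which shows how a growing Lint/Rint ratio reduces the number of repeaters relative to the inductance-free case.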
For a copper interconnect line, low-k dielectric material, R0 = 2 kΩ, and C0 = 1 fF, kopt−RC
is determined from (3.1). For different line lengths l, the optimum number of repeaters kopt−RC is
illustrated in Fig. 3.5. It is shown in the figure that for an RC line, the optimum number of repeaters
which minimizes the signal propagation delay decreases with an increase in the line width for all line
lengths. The number of repeaters reaches zero (or only one driver at the beginning of the line) for an
interconnect width of 3 µm and 4 µm for l = 5 mm and 10 mm, respectively. For widths greater than 4 µm,
the wire should be treated as one segment. A repeater system should not be used above a certain width
for each line length.
The line capacitance per unit length increases with line width. As the number of inserted
repeaters decreases with wider lines, a longer line section is driven by each repeater. An increase in the
section length and width increases the capacitance driven by each repeater. To drive a high capacitive
load, a larger repeater size is required to decrease the overall delay. As shown in Fig. 3.6, the optimum
repeater size hopt−RC is an increasing function of line width.
The minimum signal propagation delay of an optimum repeater system decreases with increasing line width as the total gate delay decreases. For an interconnect line, the total signal propagation
delay is
tpd−total (Wint ) = kopt−RC (Wint ) ∗ tpd−section (Wint )
(3.4)
where tpd−section (Wint) is the signal delay of each RC section as a function of the interconnect width, given by
$$t_{pd-section}(W_{int}) = \frac{e^{-2.9\,\zeta^{1.35}}}{\omega_n} + 0.74\,\Big(R_{tr}(W_{int})\,C_{section}(W_{int}) + R_{section}(W_{int})\,C_L(W_{int}) + R_{tr}(W_{int})\,C_L(W_{int}) + 0.5\,R_{section}(W_{int})\,C_{section}(W_{int})\Big) \qquad (3.5)$$
Figure 3.5 Optimum numbers of repeaters for minimum propagation delay for different line widths.
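where
$$\zeta = \frac{\omega_n}{2}\,\Big(0.5\,R_{section}(W_{int})\,C_{section}(W_{int}) + C_{section}(W_{int})\,R_{tr}(W_{int}) + C_L(W_{int})\,\big(R_{section}(W_{int}) + R_{tr}(W_{int})\big)\Big) \qquad (3.6)$$
$$\omega_n = \frac{1}{\sqrt{L_{section}(W_{int})\,\big(C_{section}(W_{int}) + C_L(W_{int})\big)}} \qquad (3.7)$$
$$C_L(W_{int}) = C_{section}(W_{int}) + h_{opt-RC}(W_{int})\,C_0 \qquad (3.8)$$
$$R_{tr}(W_{int}) = \frac{R_0}{h_{opt-RC}(W_{int})} \qquad (3.9)$$
$$R_{section}(W_{int}) = \frac{R_{line}(W_{int})}{k_{opt-RC}(W_{int})} \qquad (3.10)$$
$$L_{section}(W_{int}) = \frac{L_{line}(W_{int})}{k_{opt-RC}(W_{int})} \qquad (3.11)$$
$$C_{section}(W_{int}) = \frac{C_{line}(W_{int})}{k_{opt-RC}(W_{int})} \qquad (3.12)$$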
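The section delay model of (3.4) to (3.12) can be evaluated with a few lines of code. The sketch below is one reading of those expressions, with two assumptions stated plainly: the exponent of the exponential term is taken as negative, so that the term vanishes for strongly damped lines, and the load capacitance follows (3.8) exactly as written. The numerical values in the example are hypothetical.

```python
import math


def section_delay(r_line, l_line, c_line, k, h, r0, c0):
    """Delay of one repeater section, following (3.5) to (3.12)."""
    r_s, l_s, c_s = r_line / k, l_line / k, c_line / k      # (3.10) to (3.12)
    c_load = c_s + h * c0                                   # (3.8) as written
    r_tr = r0 / h                                           # (3.9)
    rc_sum = 0.5 * r_s * c_s + c_s * r_tr + c_load * (r_s + r_tr)
    omega_n = 1.0 / math.sqrt(l_s * (c_s + c_load))         # (3.7)
    zeta = 0.5 * omega_n * rc_sum                           # (3.6)
    # Assumption: negative exponent, so the term decays as damping increases.
    return math.exp(-2.9 * zeta ** 1.35) / omega_n + 0.74 * rc_sum


def total_delay(r_line, l_line, c_line, k, h, r0, c0):
    """Total delay (3.4): number of sections times the delay of one section."""
    return k * section_delay(r_line, l_line, c_line, k, h, r0, c0)


# Hypothetical line: 250 ohm, 5 nH, 1 pF in total, split into 4 sections,
# each driven by a repeater 50 times the minimum size (R0 = 2 kOhm, C0 = 1 fF).
print("section delay (ps):", section_delay(250.0, 5e-9, 1e-12, 4, 50.0, 2e3, 1e-15) * 1e12)
print("total delay (ps):  ", total_delay(250.0, 5e-9, 1e-12, 4, 50.0, 2e3, 1e-15) * 1e12)
```

When the section inductance becomes negligible, the damping factor grows large, the exponential term disappears, and the expression reduces to the familiar weighted sum of RC products.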
Figure 3.6 Optimum repeater size for minimum propagation delay for different line widths.
The minimum delay [obtained from (3.4)] is shown in Fig. 3.7 as a function of interconnect
width. An increase in the inductive behavior of the line and a reduction in the number of repeaters
decrease the minimum signal propagation delay that can be achieved by a repeater system. The signal
delay for different line lengths is shown in Fig. 3.8. The lower limit in the propagation delay decreases
with increasing line width until the number of repeaters is zero. For a system of repeaters, there is no
optimum width at which the total propagation delay is minimum. Rather, the delay is a continuously
decreasing function of line width. An RLC line with no repeaters produces a
smaller signal propagation delay than using any number of repeaters with any repeater size. For RLC
interconnect, wire sizing outperforms repeater insertion, producing a smaller signal propagation delay.
This characteristic is an important trend when developing a wire sizing methodology for a repeater
system.
3.4 Power dissipation
The power characteristics of a repeater insertion system are discussed in this section. The work
described in [25 - 30] considers power and area as design constraints. The line inductance, however, has
not been considered. In Section 3.4.1, the factors that affect the short-circuit power while considering
the line inductance of an interconnect driven by a repeater system are discussed. The dependence of
the dynamic power on wire size is described in Section 3.4.2. The total transient power dissipation
characteristics are summarized in Section 3.4.3.
Figure 3.7 Minimum signal propagation delay as a function of interconnect width (l=5mm).
3.4.1 Short-circuit power dissipation
Short-circuit current flows when both transistors within an inverting repeater are simultaneously on. In interconnects, thin lines cause less dynamic power and higher short-circuit power to be
dissipated. For thin resistive lines, the number of repeaters can be large, so in this work the short-circuit power dissipation in all repeaters along a line is considered. Short-circuit power depends on both
the input signal transition time and the load characteristics. A simple and accurate expression for the
short-circuit power dissipation of a repeater driving an RC load has been presented in [32]
$$P_{sc-section} = \frac{1}{2}\,I_{peak}\,t_{base}\,V_{dd}\,f \qquad (3.13)$$
where Ipeak is the peak current that flows from Vdd to ground, tbase is the time period during
which both transistors are on, Vdd is the supply voltage, and f is the switching frequency.
Tang used this expression to characterize the short-circuit power of an RC load [40]. A closed
form expression for the signal transition time at the far end of an RC line has been described in [41 - 43].
Increasing the line width has two competing effects on the short-circuit power. As described in [43],
the short-circuit power decreases when a line is under-damped. For wide interconnect, the short-circuit
power increases as the line capacitance becomes dominant. Furthermore, increasing the length of the
section by reducing the number of repeaters increases the short-circuit power of each section due to the
higher section impedance.
Figure 3.8 Minimum signal delay as a function of interconnects width for different line lengths.
The total short-circuit power of a repeater system is
$$P_{sc-total} = k_{opt-RC} \cdot P_{sc-section} \qquad (3.14)$$
3.4.2 Dynamic power dissipation
The dynamic power is the power required to charge and discharge the various device and
interconnect capacitances. The total dynamic power is the summation of the $CV^2f$ power from the line
capacitance and the repeaters,
$$P_{dyn-total} = P_{dyn-line} + P_{dyn-repeaters} \qquad (3.15)$$
where
$$P_{dyn-repeaters} = k_{opt-RC}\,h_{opt-RC}\,C_0\,V_{dd}^{2}\,f \qquad (3.16)$$
$$P_{dyn-line} = C_{int}\,V_{dd}^{2}\,f \qquad (3.17)$$
Pdyn−repeaters depends on both the number and size of the repeaters. As the line width increases, the number of
repeaters decreases while the repeater size increases.
The dynamic power dissipated by a line increases with greater line capacitance (as the line
width is increased). The dynamic power of the repeaters, however, decreases since fewer repeaters
are used with wider lines. As shown in Fig. 3.9, the total dynamic power is a minimum for thin
interconnect. The effect of sizing the interconnect on the total transient power dissipation is discussed
in the next subsection.
Figure 3.9 Dynamic power dissipation as a function of interconnect width for l=20 mm.
3.4.3 Total power dissipation
In order to develop an appropriate criterion for determining the optimal interconnect width
between repeaters, the total transient power dissipation of a system needs to be characterized. The total
transient power can be described as
$$P_{total}(W_{int}) = V_{dd}\,f\,\Big[k_{opt-RC}(W_{int})\,\Big(\tfrac{1}{2}\,I_{peak}(W_{int})\,t_{base}(W_{int}) + h_{opt-RC}(W_{int})\,V_{dd}\,C_0\Big) + V_{dd}\,C_{int}(W_{int})\Big] \qquad (3.18)$$
All of the terms in (3.18) are functions of the line width except Vdd , C0 , and f. As described
in subsections 3.4.1 and 3.4.2, both transient power components decrease with increasing line width,
thereby decreasing the total power until the line capacitance becomes dominant.
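Expression (3.18) is simply the sum of the short-circuit power of all repeaters (3.13) and (3.14) and the two dynamic components (3.16) and (3.17). The sketch below makes that bookkeeping explicit. The 250 MHz frequency and C0 = 1 fF are taken from the text; the supply voltage, peak current, overlap time, and repeater count and size are hypothetical placeholders.

```python
def short_circuit_power_section(i_peak, t_base, vdd, f):
    """Short-circuit power of one repeater section, (3.13)."""
    return 0.5 * i_peak * t_base * vdd * f


def dynamic_power_repeaters(k_opt, h_opt, c0, vdd, f):
    """Dynamic power of all repeaters, (3.16)."""
    return k_opt * h_opt * c0 * vdd ** 2 * f


def dynamic_power_line(c_int, vdd, f):
    """Dynamic power of the interconnect capacitance, (3.17)."""
    return c_int * vdd ** 2 * f


def total_power(k_opt, h_opt, i_peak, t_base, c0, c_int, vdd, f):
    """Total transient power, (3.18)."""
    return vdd * f * (k_opt * (0.5 * i_peak * t_base + h_opt * vdd * c0) + vdd * c_int)


# Hypothetical operating point: 5 repeaters of 43x minimum size, 1 pF line,
# 1.0 V supply, 1 mA peak short-circuit current during a 20 ps overlap.
K, H, C0, C_INT = 5, 43.0, 1e-15, 1e-12
VDD, F, I_PEAK, T_BASE = 1.0, 250e6, 1e-3, 20e-12

p_sc = K * short_circuit_power_section(I_PEAK, T_BASE, VDD, F)
p_rep = dynamic_power_repeaters(K, H, C0, VDD, F)
p_line = dynamic_power_line(C_INT, VDD, F)
p_tot = total_power(K, H, I_PEAK, T_BASE, C0, C_INT, VDD, F)

print("short-circuit (uW):", p_sc * 1e6)
print("repeaters     (uW):", p_rep * 1e6)
print("line          (uW):", p_line * 1e6)
print("total         (uW):", p_tot * 1e6)
```

Keeping the three components separate makes it easy to see which one dominates at a given width before solving for the minimum.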
For an RC interconnect, fewer repeaters are necessary to drive a line while achieving the
minimum propagation delay [34]. For an inductive interconnect, the line capacitance is typically larger
than the input capacitance of the repeaters. Increasing the width reduces the power dissipation of the
repeaters and increases the power dissipation of the line. The reduction in power dissipated by the
repeaters overcomes the increase in the interconnect power until the line capacitance dominates the line
impedance. After exceeding a certain width, the total power increases with increasing line width.
The total power dissipation as a function of line width for different interconnect lengths is
shown in Fig. 3.10. As the line width increases from the minimum width (i.e., 0.1 µm in the example
technology), the total power dissipation is reduced. A minimum transient power dissipation therefore
Figure 3.10 Total transient power dissipation as a function of interconnects width.
occurs with thin interconnect (see Fig. 3.10). The minimum transient power dissipation is obtained
from
$$\frac{\partial P_{total}}{\partial W_{int}} = 0 \qquad (3.19)$$
where ∂Ptotal /∂Wint is a nonlinear function of Wint. Numerical methods are used to obtain
values of Wint for specific interconnect and repeater parameters.
Over a range of practical interconnect widths, the total transient power increases as shown in
Fig. 3.10. As the line length increases, the total power dissipation rapidly increases with increasing line
width as the interconnect capacitance becomes dominant.
3.5 Area of the repeater system
For a specific interconnect width within a repeater system, the optimum number and size
of the repeaters can be determined. Previous studies on repeaters have considered the silicon area,
ignoring the metal layer resources [25 - 30]. Long global interconnects are typically wide and require
shielding. In order to develop appropriate criteria for considering the area overhead, both the transistors
and interconnect need to be characterized. The area of the interconnect metal can be described as
$$A_{line}(W_{int}) = W_{int}\,l \qquad (3.20)$$
The interconnect metal area is illustrated in Fig. 3.11 as a function of the interconnect width.
For CMOS inverters used as repeaters, the total silicon area of the active repeaters is
$$A_{repeater}(W_{int}) = 3\,k_{opt-RC}(W_{int})\,h_{opt-RC}(W_{int})\,L_n^{2} \qquad (3.21)$$
where Ln is the feature size. The PMOS transistor of each repeater is assumed to be twice
the size of the NMOS transistor to achieve a symmetric transition. For an RC line, fewer repeaters are
needed to minimize the propagation delay, reducing the silicon area as shown in Fig. 3.12.
The active repeaters and the passive interconnects utilize different layers, making the area
overhead of both elements independent, particularly for interconnects routed on the upper layers. A
weighted product in (3.22) is used as a criterion to consider both area parameters in sizing the interconnect,
$$A_{product}(W_{int}) = A_{repeater}(W_{int})^{w_r}\,A_{line}(W_{int})^{w_l} \qquad (3.22)$$
where wr and wl are the weights of the two cost functions. For wr = wl = 1, the area product
of the system increases with increasing interconnect width as shown in Fig. 3.13. Despite the reduction
in repeater area with increasing interconnect width, the increased area occupied by the interconnect
increases the overall area of the repeater system.
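The area expressions (3.20) to (3.22) are straightforward to evaluate. The sketch below uses the 65 nm feature size of the technology considered later in this chapter together with the 0.8 µm width and the repeater size of 43.3 listed in Table 3.1; treating these as one consistent design point is an assumption made only for illustration.

```python
def line_area(w_int, length):
    """Metal area of the interconnect, (3.20)."""
    return w_int * length


def repeater_area(k_opt, h_opt, l_n):
    """Silicon area of all repeaters, (3.21); PMOS is twice the NMOS size."""
    return 3 * k_opt * h_opt * l_n ** 2


def area_product(a_repeater, a_line, w_r=1.0, w_l=1.0):
    """Weighted area product, (3.22)."""
    return a_repeater ** w_r * a_line ** w_l


# Illustrative point: 0.8 um wide, 5 mm long line, one repeater of size 43.3.
a_line = line_area(0.8e-6, 5e-3)
a_rep = repeater_area(1, 43.3, 65e-9)

print("metal area   (m^2):", a_line)
print("silicon area (m^2):", a_rep)
print("area product      :", area_product(a_rep, a_line))
```

Raising one weight above the other shifts the product toward whichever resource, silicon or metal, is scarcer in a given design.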
Figure 3.11 Interconnect area as a function of interconnects width for different line lengths.
Figure 3.12 Total area of the repeaters as a function of the interconnect width for different line lengths.
Figure 3.13 Product of interconnect and transistor area as a function of the interconnect width for
different line lengths.
3.6 Design criteria for interconnect within a repeater system
In this section, different design criteria to size interconnect within a repeater system are developed. The optimization criteria can be applied to different repeater systems. In subsection 3.6.1, a
constrained system is considered. Application to an unconstrained system is discussed in subsection
3.6.2.
3.6.1 Constrained systems
For a constrained system, there is a delay target (minimum speed or maximum delay) and/or a
limit on the power dissipation. The minimum signal propagation delay determines a lower limit on the
line width while the maximum power dissipation determines the upper limit.
If the minimum limit on the line width obtained from (3.4) is greater than the maximum width
obtained from (3.18), both limits cannot be simultaneously satisfied and one of the design constraints
needs to be relaxed. If the minimum limit is lower than the maximum limit, both constraints can be
satisfied.
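The feasibility test described above can be sketched as a scan over candidate widths. The delay and power models below are deliberately simple placeholders (delay falling as 1/W, power rising linearly with W) that only mimic the trends of (3.4) and (3.18); they are not the models of this chapter.

```python
def feasible_widths(widths, delay_fn, power_fn, max_delay, max_power):
    """Widths that meet both the delay target and the power limit."""
    return [w for w in widths
            if delay_fn(w) <= max_delay and power_fn(w) <= max_power]


def toy_delay(w):
    """Placeholder: delay decreases with width."""
    return 1.0 / w


def toy_power(w):
    """Placeholder: power increases with width."""
    return float(w)


widths = [1, 2, 3, 4, 5, 6]

# The delay target sets the lower limit (W >= 2), the power limit the upper (W <= 4).
ok = feasible_widths(widths, toy_delay, toy_power, max_delay=0.5, max_power=4.0)
print("feasible widths:", ok)

# A tighter delay target pushes the lower limit above the upper limit.
none = feasible_widths(widths, toy_delay, toy_power, max_delay=0.2, max_power=4.0)
print("feasible widths with tighter delay target:", none)
```

An empty result corresponds to the case where the minimum width from the delay target exceeds the maximum width from the power limit, so one of the constraints has to be relaxed.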
For a constrained system, the transistor or metal area has an upper limit. The two factors
change differently with the width; therefore, there is a tradeoff between the two area components.
3.6.2 Unconstrained systems
For an RC line, there are four criteria to size interconnect in an unconstrained system. The
first criterion is for minimum power while sacrificing speed. The optimum solution for this criterion is
obtained from (3.19).
The second criterion is for minimum delay. As no optimum interconnect width exists for
minimum propagation delay, the practical limit is either the maximum repeater size or no repeaters,
whichever produces a tighter constraint. The constraint in this case is either the maximum repeater size
or the maximum line width. The optimum number of repeaters for a target line width is determined
from [34]. If not possible, no repeaters should be used and the design problem reduces to choosing the
width of a single section of interconnect [31].
The third and fourth criteria are presented in the following subsections. In Section 3.6.2.1, the
Power-Delay-Product (PDP) as a criterion to size an interconnect within a repeater system is described.
The Power-Delay-Area-Product (PDAP) is introduced in Section 3.6.2.2 as an alternative design criterion.
3.6.2.1 Power-delay-product design criterion
The PDP criterion satisfies both the power dissipation and speed with no constraints on the
area. From the discussions in Sections 3.3 and 3.4, the minimum signal propagation delay of an RC
interconnect driven by a repeater system decreases with increasing line width. Alternatively, the total
transient power has a global minimum at a narrow width. Beyond that width, the
total transient power increases with increasing line width. At a line width smaller than the line width
for minimum power, the power and delay both increase. An upper limit on the line width is reached
where the minimum propagation delay of a repeater system is attained. Beyond that limit, a single
segment sizing criterion should be used to optimize the width according to a cost function (i.e., delay
[1] or power [41-43]). Between these two limits, a tradeoff exists between the power dissipation and
signal propagation delay. A single expression for the Power-Delay-Product (PDP) as a function of the
interconnect width is
$$PDP(W_{int}) = P_{total}(W_{int})^{w_p} \cdot t_{pd-total}(W_{int})^{w_d} \qquad (3.23)$$
where wp and wd are the weights of the cost functions. A local minimum for the PDP exists for each line
length. The minimum power delay product is obtained by numerically solving the nonlinear equation,
$$\frac{\partial PDP}{\partial W_{int}} = 0 \qquad (3.24)$$
The weights wp and wd describe which design objective is more highly valued.
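Since (3.24) is solved numerically, a simple way to picture the procedure is a search over a grid of widths. The sketch below uses toy power and delay curves, chosen only so that power has a minimum at a narrow width and delay decreases monotonically with width; they are placeholders and not the models of (3.4) and (3.18).

```python
def argmin_on_grid(cost, w_min, w_max, steps=2000):
    """Width on a uniform grid that gives the smallest cost."""
    best_w, best_cost = None, float("inf")
    for i in range(steps + 1):
        w = w_min + (w_max - w_min) * i / steps
        value = cost(w)
        if value < best_cost:
            best_w, best_cost = w, value
    return best_w


def toy_power(w):
    """Placeholder power curve with a minimum at w = 1 (arbitrary units)."""
    return w + 1.0 / w


def toy_delay(w):
    """Placeholder delay curve that decreases with width (arbitrary units)."""
    return 0.1 + 1.0 / w


def pdp(w, w_p=1.0, w_d=1.0):
    """Weighted power-delay product in the form of (3.23)."""
    return toy_power(w) ** w_p * toy_delay(w) ** w_d


w_power = argmin_on_grid(toy_power, 0.1, 20.0)
w_pdp = argmin_on_grid(pdp, 0.1, 20.0)
w_pdp_fast = argmin_on_grid(lambda w: pdp(w, w_p=1.0, w_d=2.0), 0.1, 20.0)

print("width for minimum power:", w_power)
print("width for minimum PDP:  ", w_pdp)
print("width with delay weighted more heavily:", w_pdp_fast)
```

With these toy curves the PDP optimum lies at a wider line than the minimum-power point, and raising the delay weight moves it wider still, which is the qualitative behavior the weights are meant to control.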
3.6.2.2 Power-delay-area-product design criterion
The criterion described in Section 3.6.2.1 does not include the area of the system as a design
parameter. In order to include the area of the system, the PDAP criterion is introduced. This criterion
satisfies both the power dissipation and speed while considering area. The Power-Delay-Area-Product
(PDAP) can be used as a criterion to size the interconnect. A single expression for the PDAP as a
function of the interconnect width is
$$PDAP(W_{int}) = P_{total}(W_{int})^{w_p} \cdot t_{pd-total}(W_{int})^{w_d} \cdot A_{repeater}(W_{int})^{w_r} \cdot A_{line}(W_{int})^{w_l} \qquad (3.25)$$
A local minimum for the PDAP exists for each line length. The minimum PDAP is obtained by numerically solving the nonlinear equation,
$$\frac{\partial PDAP}{\partial W_{int}} = 0 \qquad (3.26)$$
3.7 Application of interconnect design methodology
The four criteria are applied to a 65nm CMOS technology to determine the optimum solution
for different line lengths. No limit on the maximum buffer size is assumed. In order to characterize
the line inductance in terms of the geometric dimensions, an interconnect line shielded by two ground
lines is assumed. An interconnect line with resistance per square R□ = 250 mΩ/□, capacitance per
unit length for minimum width CWmin = 66 fF/mm, and inductance per unit length for minimum width
LWmin = 1 nH/mm is used. For a repeater system with C0 = 1 fF and wp =
wd = 1, the optimum solution for each criterion is listed in Table 3.1. A clock signal with a 20 ps transition
time ramp input and a frequency of 250 MHz is used to determine the propagation delay and power
dissipation.

                                Minimum Power   No repeater   Minimum PDP
l = 5 mm:
  Wint (µm)                     0.8             2.1           2.1
  Number of repeaters           1               0             0
  Repeater size (of minimum)    43.3            61.2          61.2
  Minimum delay (ns)            0.157           0.051         0.051
  Total increase (times)        2               1             1
  Power (mW)                    1.73            1.98          1.98
  Total increase (percentage)   0               14.5          14.5
l = 15 mm:
  Wint (µm)                     0.8             20            3.9
  Number of repeaters           5               0             1
  Repeater size (of minimum)    43.2            225.6         80.7
  Minimum delay (ns)            3.87            0.19          0.43
  Total increase (times)        19.36           1             1.26
  Power (mW)                    5.2             21.31         7.58
  Total increase (percentage)   0               310           45.7

Table 3.1 Uniform repeater system for different optimization criteria
The optimum line width for each design criterion is listed in the first row for each line length.
The optimum number and size of the repeaters for each line width is listed in the second and third row
of each line length. The per cent increase in the minimum propagation delay based on the optimum
power and PDP as compared to no repeaters is also listed. The per cent increase in the total transient
power dissipation is provided.
For an l = 5mm line, the optimum interconnect width for both minimum PDP and no repeaters
is the same, producing a 14.5% increase in power as compared to the optimum width for minimum
power and a reduction of 68% as compared to the optimum width for minimum signal propagation
delay.
For short interconnects, few repeaters are necessary to produce the minimum propagation delay. For longer interconnect, an increase in the line capacitance rapidly increases the power dissipation,
while the minimum propagation delay decreases more slowly.
For l = 15mm, the solution that minimizes PDP increases the delay by a factor of 1.26, rather than
by about 20 for the minimum-power solution. The power increases by 45%, rather than by a factor of
3.1 for the no-repeater solution. Optimizing the interconnect to produce the minimum power-delay
product therefore produces a smaller increase in both the power and the delay than separately
optimizing either the power or the delay. A reduction in the minimum propagation delay of 89% and in
the power dissipation of 65% is achieved if the optimum width for the minimum PDP is used rather
than the optimum width for either minimum power or no repeaters.
3.8 Need for a better approach
In this chapter, various aspects of the buffer insertion technique for interconnect modeling,
including delay reduction, power consumption and the area consumed by repeaters, are discussed.
Figure 3.7 depicts the increase in propagation delay with decreasing technology size. Figure 3.5 shows
the exponential increase in the optimum number of buffers required for different interconnect lengths
with decreasing technology size. Hence even an optimum number of buffers is not enough to offset the
enormous increase in propagation delay.
It is observed from the graph in figure 3.9 that with decreasing interconnect width the total dynamic
power and the interconnect power decrease, but the power consumed by repeaters increases. This
proves to be a major factor in power optimization.
One of the major limitations of buffer insertion is the increasing number of repeaters in the system.
In chapter 1, figure 1.2 shows the increasing percentage of buffered nets at every technology node.
Similarly, figure 1.3 shows the increase in buffered cells with each new technology. Hence buffers
occupy a major portion of the total area of the system. Similar results are shown in figure 3.12,
which shows the exponential increase in the area consumed by buffers for different interconnect
lengths.
None of these factors favors buffer insertion for interconnect modeling, so a major breakthrough
is needed to handle interconnects. Keeping in mind the problems that buffer insertion faces now and
will face in future technologies, this thesis proposes an alternative to the buffer and analyzes the
results. In the new approach the buffer is replaced by a Schmitt trigger, and all the above-mentioned
factors are analyzed in the next chapter.
Chapter 4
Schmitt Trigger as an alternate to Buffer
4.1 Classical Schmitt Trigger
The classic Schmitt trigger is implemented using an op-amp with two resistors to provide
regenerative feedback [61].
Schmitt triggers are typically built around comparators, connected to have positive feedback
instead of the usual negative feedback. For this circuit the switching occurs near ground, with the
amount of hysteresis controlled by the resistances R1 and R2. The circuit representation of the
Schmitt trigger is shown in figure 4.1.
Figure 4.1 Schmitt trigger implementation with comparator
The comparator gives out the highest voltage it can, +VS , when the non-inverting (+) input
is at a higher voltage than the inverting (-) input, and then switches to the lowest output voltage it can,
−VS , when the positive input drops below the negative input. For very negative inputs, the output will
be low, and for very positive inputs, the output will be high, and so this is an implementation of a
"non-inverting" Schmitt trigger.
For instance, if the Schmitt trigger is currently in the high state, the output will be at the
positive power supply rail (+VS ). V+ is then a voltage divider between Vin and +VS . The comparator
will switch when V+ = 0 (ground). Current conservation shows that this requires:
$$\frac{V_{in}}{R_1} = -\frac{V_s}{R_2} \tag{4.1}$$
and so $V_{in}$ must drop below $-\frac{R_1}{R_2} V_s$ to get the output to switch. Once the comparator
output has switched to $-V_s$, the threshold becomes $+\frac{R_1}{R_2} V_s$ to switch back to high. So
this circuit creates a switching band centered around zero, with trigger levels $\pm\frac{R_1}{R_2} V_s$.
The input voltage must rise above the top of the band, and then below the bottom of the band, for the
output to switch on and then back off. If R1 is zero or R2 is infinity (i.e., an open circuit), the band
collapses to zero width, and it behaves as a standard comparator. The output characteristic is shown in
the picture on the right. The value of the threshold T is given by $\frac{R_1}{R_2} V_s$ and the maximum
value of the output M is the power supply rail.
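The switching band described above can be illustrated with a short behavioural sketch. It assumes an ideal comparator that saturates at ±Vs and trigger levels of ±(R1/R2)·Vs as derived from Eq. (4.1); the function names and the component values (R1 = 1 kΩ, R2 = 5 kΩ, Vs = 5 V) are hypothetical and chosen only for illustration.

```python
def schmitt_thresholds(r1, r2, vs):
    """Lower and upper trigger levels, -(R1/R2)*Vs and +(R1/R2)*Vs."""
    t = (r1 / r2) * vs
    return -t, t

def schmitt_output(vin_samples, r1, r2, vs, initial_high=False):
    """Ideal non-inverting Schmitt trigger: the output is +vs or -vs and
    changes state only when the input leaves the switching band."""
    lower, upper = schmitt_thresholds(r1, r2, vs)
    state_high = initial_high
    out = []
    for vin in vin_samples:
        if state_high and vin < lower:
            state_high = False      # input fell below the bottom of the band
        elif not state_high and vin > upper:
            state_high = True       # input rose above the top of the band
        out.append(vs if state_high else -vs)
    return out

# Hypothetical component values, for illustration only.
R1, R2, VS = 1e3, 5e3, 5.0
lower, upper = schmitt_thresholds(R1, R2, VS)

# Input rises through the band and then falls back through it.
sweep = [-2.0, -0.5, 0.5, 1.5, 0.5, -0.5, -1.5]
out = schmitt_output(sweep, R1, R2, VS)
print(lower, upper, out)
```

Inputs of ±0.5 V lie inside the band, so the output at those points depends on the direction from which the input arrived, which is the hysteresis behaviour.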
4.2 Hysteresis in Schmitt Trigger
Hysteresis is a phenomenon wherein two (or more) physical quantities bear a relationship which
depends on prior history. More specifically, the response Y takes on different values for an increasing
input X than for a decreasing X.
If one cycles X over an appropriate range, the plot of Y versus X gives a closed curve which is
referred to as the hysteresis loop. The response Y appears to be lagging the input X. Hysteresis occurs
in many fields of science, and the Schmitt trigger also has this property. The reason for hysteresis in
a Schmitt trigger is its dual threshold voltage. Once the input voltage crosses one of the threshold
voltages, the output remains in the corresponding state until the input crosses the other threshold
voltage. This lag results in the hysteresis curve of the Schmitt trigger. Figure 4.2 shows the
hysteresis in the classical Schmitt trigger.
4.3 CMOS Schmitt Trigger
The CMOS Schmitt trigger, along with its transfer characteristics is shown in figure 4.3.
In bipolar technology, p-n-p transistors are much slower than their n-p-n counterparts [61],
and the bipolar prototype for the whole circuit of Fig. 4.3 (a) is not known. A bipolar Schmitt trigger
includes an n-p-n differential pair loaded with a resistor. The circuit of Fig. 4.3(a) includes two similar
subcircuits (M1, M2, M3 and M4, M5, M6). Each of them is a highly nonlinear load for the other.
However, as shown subsequently, at each transition point one subcircuit can be considered as a linear
resistive load for the other. In the circuit of Fig. 4.3 (a), the bottom circuit M1, M2, M3 (which is
called here the N-subcircuit), is loaded by the top circuit, M4, M5, M6 (P-subcircuit). As a result of
the circuit symmetry, the inverse statement is also valid. To obtain the voltage-current characteristics
of these nonlinear loads, one can take, for example, the N-subcircuit, apply a voltage source Vo, and
calculate the source current Io, assuming a constant voltage VG at the gates of M1 and M2 [Fig. 4.4
(a)].
Figure 4.2 Hysteresis in conventional Schmitt trigger.
Figure 4.3
Figure 4.4 N-subcircuit driven by a voltage source: (a) circuit; (b) current-voltage characteristic; (c)
superposition of N- and P-subcircuit characteristics.
When the voltage $V_o$ is very small, transistor M3 will be off, and M1 and M2 are in the triode
mode of operation. The current $I_o$ is equal to
$$I_o = 2k_1 (V_G - V_{TN}) V_N \tag{4.2}$$
if we consider transistor M1, or
$$I_o = 2k_2 (V_G - V_N - V_{TN})(V_o - V_N) \tag{4.3}$$
if one considers M2. Here $k_i = 0.5(\mu_n C_{ox})(W/L)_i$ for transistor $M_i$, as usual, and $V_{TN}$ is the
threshold voltage of n-channel transistors. For p-channel transistors, one has to use $\mu_p$ and $V_{TP}$.
It is assumed in (4.2) and (4.3) that $V_G > V_{TN}$. For the triode mode of operation,
$V_N \ll V_G - V_{TN}$ and the last equation can be simplified to
$$I_o = 2k_2 (V_G - V_{TN})(V_o - V_N) \tag{4.4}$$
From (4.2) and (4.4) one can obtain that
$$V_N = V_o \frac{k_2}{k_1 + k_2} \tag{4.5}$$
and
$$I_o = \frac{2 k_1 k_2 (V_G - V_{TN})}{k_1 + k_2} V_o \tag{4.6}$$
From (4.6) one can find that
$$R_{LN} = \left[\frac{\partial I_o}{\partial V_o}\right]^{-1} = \frac{k_1^{-1} + k_2^{-1}}{2(V_G - V_{TN})} \tag{4.7}$$
It is seen from (4.5) and (4.7) that, in this part of the subcircuit operation, transistors M1 and
M2 may be considered as a series connection of two resistors.
When $V_o$ increases, M2 enters into saturation (pinch-off). Then $I_o$ is determined, depending
on the considered transistor, by
$$I_o = 2k_1 \left[V_G - V_{TN} - \frac{V_N}{2}\right] V_N \tag{4.8}$$
or
$$I_o = k_2 (V_G - V_N - V_{TN})^2 \tag{4.9}$$
From (4.8) and (4.9) one can find that
$$V_N = (V_G - V_{TN})\left(1 - \sqrt{\frac{k_1}{k_1 + k_2}}\right) \tag{4.10}$$
and does not depend on $V_o$. This means [Fig. 4.4(b)] that when the voltage $V_o$ achieves the value of
$$V_{oS} = V_G - V_{TN} \tag{4.11}$$
then the current $I_o$ becomes constant, equal to
$$I_N = \frac{k_1 k_2}{k_1 + k_2} (V_G - V_{TN})^2 \tag{4.12}$$
Yet, an additional increase of $V_o$ will gradually introduce some changes. When $V_o$ achieves the value of
$$V_{oT} = V_G - (V_G - V_{TN}) \sqrt{\frac{k_1}{k_1 + k_2}} \tag{4.13}$$
then transistor M3 will be turned on, $V_N$ starts to increase again, and the current $I_o$ diminishes. When
$V_o$ becomes equal to
$$V_{oC} = V_G + (V_G - V_{TN}) \sqrt{k_1 / k_3} \tag{4.14}$$
transistor M2 will be completely turned off and $I_o$ becomes equal to zero. At this instant, voltage $V_N$
will be equal to $V_G - V_{TN}$ and M1 is entering into saturation. Transistor M1 carries the current
$$I_N = k_1 (V_G - V_{TN})^2 \tag{4.15}$$
which is completely intercepted by M3. An additional increase of $V_o$ up to $V_{DD}$ does not bring any
changes and completes the current-voltage characteristic of the N-subcircuit.
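The breakpoints of the N-subcircuit characteristic can be evaluated numerically as a consistency check. The sketch below assumes the closed-form expressions (4.7) and (4.10) to (4.15), with the k1 + k2 denominators that follow from equating (4.2) and (4.4); the function name, the dictionary keys and the device values (k1 = k2 = 100 µA/V², k3 = 400 µA/V², VG = 0.8 V, VTN = 0.4 V) are hypothetical.

```python
import math

def n_subcircuit_points(k1, k2, k3, vg, vtn):
    """Characteristic voltages and currents of the N-subcircuit (M1, M2, M3)."""
    vov = vg - vtn                       # gate overdrive V_G - V_TN
    root = math.sqrt(k1 / (k1 + k2))
    return {
        "R_LN": (1.0 / k1 + 1.0 / k2) / (2.0 * vov),     # Eq. (4.7)
        "V_N": vov * (1.0 - root),                       # Eq. (4.10)
        "V_oS": vov,                                     # Eq. (4.11)
        "I_N_plateau": k1 * k2 / (k1 + k2) * vov ** 2,   # Eq. (4.12)
        "V_oT": vg - vov * root,                         # Eq. (4.13)
        "V_oC": vg + vov * math.sqrt(k1 / k3),           # Eq. (4.14)
        "I_M1_sat": k1 * vov ** 2,                       # Eq. (4.15)
    }

# Hypothetical device parameters, for illustration only.
K1, K2, K3 = 100e-6, 100e-6, 400e-6   # A/V^2
VG, VTN = 0.8, 0.4                    # V

pts = n_subcircuit_points(K1, K2, K3, VG, VTN)
for name, value in pts.items():
    print(f"{name:12s} = {value:.6g}")
```

Substituting V_N from (4.10) into either (4.8) or (4.9) should reproduce the plateau current of (4.12), and V_oT should equal V_N + V_TN, the point at which M3 starts to conduct.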
4.4 Low Voltage Schmitt Trigger
With shrinking technology, power consumption is increasing in all CMOS devices and hence
low voltage and low power designs of Schmitt trigger have been proposed. Fig. 4.5 shows the 1 V
Schmitt trigger circuit. In this design, a dynamic body-bias is applied to a simple CMOS inverter circuit,
whereby the threshold voltages of the two MOSFETs can be changed, thus changing the switching
voltage. The operation of the circuit of Fig. 4.5 can be described as follows. First, the values of
bias voltages Vbias,p and Vbias,n are, respectively, set externally to values (−|Vthp3| + 0.1) V and (|Vthn3| − 0.1) V. This ensures that the drain voltage magnitudes of the MOSFETs Mp3 and Mn3 (body
voltage magnitudes of the MOSFETs Mp1 and Mn1) will have a value of +0.1 V minimum, and 0.1 V maximum, respectively, when the transistor is conducting. This will limit forward body-bias in
transistors Mn1 and Mp1 to 0.4 V. A forward bias greater than 0.4 V may trigger latch-up in a CMOS
circuit. When a low-value signal is applied to Vin, Vout2 goes low. Vout2 provides zero forward body-bias to transistor Mn1, through Mn3 operating in the linear region, and a forward bias of 0.4 V to
Mp1 through Mp3 operating in saturation region. The substrate of transistor Mn1 is biased at -0.5 V
and its threshold voltage now corresponds to the value at zero substrate bias, Vtho,n1 , while the substrate
of transistor Mp1 is biased at +0.1 V with its threshold voltage corresponding to +0.4 V forward-bias
value, Vth,p1 .
Transistor Mp1 remains on and Mn1 remains off until Vin increases to a certain voltage Vhl, at
which output Vout1 switches from a high to a low value and Vout2 switches from a low to a high value.
Since the Mn1 substrate is at zero body-bias, its threshold voltage Vtho,n1 is higher than the value for
forward body-bias. Hence, a higher voltage is needed to turn Mn1 on. For a ramp input, this results in
a time delay t1, as Vout1 goes to a low value and Vout2 goes to a high value of VDD. This provides a
0.4 V forward body-bias to Mn1 through the transistor Mn3 operating in saturation at the end of the
switching transient period. A zero body-bias is now provided to Mp1 through the transistor Mp3
operating in the linear region at the end of the switching transient. Transistor Mp1 is now off and Mn1
remains on until Vin decreases to a certain voltage Vlh, at which output Vout1 switches from low to
high and Vout2 switches from high to low. Since Mn1 has forward substrate body-bias, a lower voltage
is now needed to turn it off. This results in a time delay t2 for a ramp input. The different switching
voltages, or switching times, cause the hysteresis. Vout1 is buffered by an Mp2-Mn2 inverter, which
provides high fan-out capability. Thus, the output is taken at the Vout2 terminal. The circuit of the
0.4 V Schmitt trigger is shown in figure 4.6 and the output voltage curve is shown in figure 4.7.
4.5 CMOS buffer
A buffer is designed with two cascaded CMOS inverters, as shown in figure 4.8. The buffer is
designed with minimal lambda parameters for 65nm technology, keeping Wp = 3Wn to ensure equal
rise and fall times. The second inverter is four times the size of the first to provide the required
current-carrying ability.
Figure 4.5 1 V CMOS Schmitt trigger circuit
Figure 4.6 0.4 V CMOS Schmitt trigger circuit derived from 1 V Schmitt trigger
Figure 4.7 Measured characteristics of the 0.4 V CMOS Schmitt trigger circuit: (a) measured hysteresis
characteristic; (b) measured input-output (Vin-Vout2) waveform characteristics.
Figure 4.8 CMOS buffer.
4.6 Schmitt trigger as an alternate to buffer Insertion
It has been discussed in chapter 3 that buffers are used for signal restoration and delay
reduction. The most basic form of an interconnect is a linear interconnect with no neighbors. When
only one linear interconnect is considered, RC delay is the major factor deciding the signal
propagation delay, so buffers have to handle only signal delay. In this thesis, the Schmitt trigger is
studied as a replacement for the buffer.
Initially the focus is on a linear interconnect, and the effect of replacing the buffer with a Schmitt
trigger is studied. Since interconnects are also organized in groups to act as address or data buses,
the effect of the Schmitt trigger in buses is studied as well. In particular, any possible advantage of
the Schmitt trigger in mitigating signal crosstalk is studied in detail.
As an example of a bus, a typical 4-bit bus is shown in figure 4.9. Buffers are placed at regular
distances between transmitter and receiver. The interconnect segments between them are modeled as
RC elements. The total delay between transmitter and receiver is the sum of the RC delays of all the
elements and the switching times of the buffers.
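The delay composition described above, segment RC delays plus buffer switching times, can be sketched numerically. This is a minimal sketch, assuming each segment is an RC ladder evaluated with the Elmore approximation and each buffer contributes a fixed switching time; buffer output resistance and input capacitance are ignored. The function names and the values (100 Ω, 50 fF, 10 ps) are hypothetical and not taken from this thesis.

```python
def elmore_delay(resistances, capacitances):
    """Elmore delay at the far end of an RC ladder.

    resistances[i] is the series resistance of section i and capacitances[i]
    the shunt capacitance at its far node. Each capacitor is weighted by the
    total resistance between the driver and that capacitor.
    """
    delay = 0.0
    r_upstream = 0.0
    for r, c in zip(resistances, capacitances):
        r_upstream += r
        delay += r_upstream * c
    return delay

def buffered_line_delay(segments, t_buffer):
    """Total delay of a line split into segments, each driven by one buffer."""
    return sum(elmore_delay(r, c) + t_buffer for r, c in segments)

# Hypothetical segment: two RC sections of 100 ohm and 50 fF each.
segment = ([100.0, 100.0], [50e-15, 50e-15])

# Three such segments, each preceded by a buffer with 10 ps switching time.
total = buffered_line_delay([segment, segment, segment], t_buffer=10e-12)
print(f"segment Elmore delay: {elmore_delay(*segment) * 1e12:.1f} ps")
print(f"total buffered delay: {total * 1e12:.1f} ps")
```

Because the Elmore delay of an unbuffered ladder grows roughly quadratically with the number of sections, splitting the line keeps each segment's contribution small at the cost of one buffer switching time per segment.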
Figure 4.9 4 bit bus with buffers to restore signals.
In deep submicron technology, parasitics play a noticeable role in deciding the delay and
waveform shape. RC delay, or Elmore delay [42], becomes the main factor in the total delay. The
crosstalk effect of adjacent signals increases the switching activity and in turn the delay. Delay
reduction in buses is addressed by various bus coding techniques [62, 63, 64, 66]. In all these
techniques extra hardware is added before the transmitter, and data bits are encoded to have minimum
switching activity [67] and thus minimum delay. This saves some delay, but at the cost of extra
hardware.
We have used a Schmitt trigger in place of the conventional buffer for the following reasons:
• A Schmitt trigger can act as a signal-restoring circuit; this is the main reason for investigating
it as an alternative to the buffer as a data-restoring element in interconnects.
• The switching time of the buffer and the Schmitt trigger is the same for a given DSM technology,
but the lower threshold of the Schmitt trigger reduces the rise time and hence the total delay.
Although the saving in rise time is only a few ps, it becomes very significant for slowly rising
signals.
• With the introduction of the Schmitt trigger, bus coding techniques can be dispensed with, which
removes the extra hardware and the power consumed by those transistors.
• A low-threshold buffer cannot be used simply to lower the triggering voltage level, as it would
result in a non-uniform duty cycle, which is never desired in data transmission. A low-threshold
Schmitt trigger, in contrast, does not disturb the duty cycle of the waveform because of its dual
threshold.
• The higher noise margin of the Schmitt trigger allows the circuit to handle larger noise glitches,
making the proposed approach more efficient. With this noise margin all 6 types of crosstalk
noise are removed quite effectively.
• Reduced noise glitches result in lower power consumption and hence help in reducing the total
power consumed.
• The Schmitt trigger has 15% more cell area when fabricated, but the major reduction in power and
delay and the better noise handling justify the extra area consumed.
4.7 Conclusions
In this chapter the Schmitt trigger, a circuit element that has the properties of hysteresis and
dual thresholds for switching between two logic levels, has been studied as an alternative to buffers
in conventional approaches. It has been observed that the Schmitt trigger outperforms the buffer on
the following points:
• The programmable dual-threshold property of the Schmitt trigger allows the designer to set lower thresholds for fast signal switching.
• Lower thresholds are also beneficial at the time of switching, as they do not allow all the
transistors to be in active or saturation mode.
• The noise immunity of the Schmitt trigger is higher than that of the buffer due to its wider
hysteresis band.
In the next chapter the simulation results obtained with this replacement approach are
discussed.
Chapter 5
Results and Discussion
5.1 NTRS 1997 predictions.
Interconnect design has become a dominant issue in high-speed integrated circuits (ICs). With
the decreased feature size of CMOS circuits, on-chip interconnect now dominates both circuit delay and
power dissipation. The number of long interconnects doubles every three years [68], further increasing
the importance of on-chip interconnect.
The 1997 National Technology Roadmap for Semiconductors (NTRS ’97) [69] proposes aggressive goals for chip performance as CMOS devices approach 40 nm minimum feature sizes. Table
5.1 indicates some pertinent factors which have been adapted from [69] to reflect technology shrinkage
of 0.7 per generation and a corresponding doubling of clock frequency every two generations. Although
the clock frequency for the 250 nm technology generation in Table 5.1 is smaller than the NTRS ’97
value of 750 MHz, it rises nearly to the 3000 MHz value predicted for across-chip clock frequency for
high-performance processor chips in the 40 nm generation. CV/I data for nMOSFETs from [70] shows
that device scaling will be able to provide comparable decreases in gate delay of 0.7 per generation.
MOSFETs can meet these higher clock frequency requirements, but chip performance will
be limited by "long, lossy lines", to use Davidson's terminology [71]. Since the RC time constants of
interconnects remain the same when comparably scaled, interconnects increasingly dominate delay and
cycle time as devices are scaled.
To raise clock frequencies the effects of long, lossy wires must be reduced. Only two approaches are possible - reduce length or reduce loss. Interconnect length can be reduced by confining
high-speed clocking to a limited area or by using repeaters to chop long wires into a series of short
wires. Interconnect loss can be reduced by changing materials to improve resistivity or interconnect
cross sections. If the dimensions of interconnect cross sections are doubled, the interconnect’s resistance per unit length will drop by a factor of four while the interconnect’s capacitance per unit length
(Cint ) will remain the same.
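The factor-of-four claim for resistance follows directly from R per unit length = ρ / (width × thickness). A minimal sketch is given below; the resistivity is an approximate bulk value for copper and the wire dimensions are hypothetical, neither taken from this thesis.

```python
def resistance_per_length(rho, width, thickness):
    """Resistance per unit length (ohm/m) of a wire with rectangular cross section."""
    return rho / (width * thickness)

RHO_CU = 1.7e-8          # ohm*m, approximate bulk copper resistivity (assumed)
W, T = 0.5e-6, 1.0e-6    # hypothetical width and thickness in metres

r_base = resistance_per_length(RHO_CU, W, T)
r_doubled = resistance_per_length(RHO_CU, 2 * W, 2 * T)
ratio = r_base / r_doubled

print(f"base:    {r_base / 1e3:.1f} ohm/mm")
print(f"doubled: {r_doubled / 1e3:.1f} ohm/mm")
print(f"ratio:   {ratio:.1f}")
```

The capacitance per unit length depends on ratios of dimensions rather than on absolute size, which is why it stays the same when the whole cross section, including spacing and dielectric height, is scaled together.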
Year   Technology (nm)   Fc (MHz)   CV/I [82] (ps)   Die Area (mm2)   Chip edge length (mm)   Logic transistor density (M/100mm2)
1997   250               500        7                300              17                      1.8
1999   180               700        5                340              19                      3.6
2003   130               1000       3.5              432              21                      7.2
2006   90                1400       2.5              520              23                      14.4
2009   60                2000       1.8              620              25                      28.8
2012   40                2800       1.2              750              27                      57.6
Table 5.1 Projected advances in CMOS chip performance
It should be noted that the effects of long, lossy lines on delay are exacerbated by the NTRS
'97 projections of increased die area shown in Table 5.1. Corresponding chip edge lengths are also shown,
assuming a square die. Increased areas allow many more transistors on a chip. A consistent set of logic
transistor densities which double in every generation is also shown in Table 5.1. (These values are considerably smaller than the values NTRS ’97 assumes, decreasing from about half the value at 250 nm to
about a third at 40 nm.) An advantage of CMOS technology has been the ability to increase processor
performance by using more transistors instead of faster circuits.
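The scaling rules behind Table 5.1 (feature size shrinking by 0.7 per generation, clock frequency doubling every two generations, logic transistor density doubling every generation) can be reproduced with a few lines. This is only an illustrative sketch: the function name is hypothetical, the starting values are those of the 250 nm row, and the table lists rounded values (for example 180 nm instead of 175 nm, 700 MHz instead of about 707 MHz).

```python
def project_roadmap(generations, feature_nm=250.0, clock_mhz=500.0, density=1.8):
    """Return (feature size in nm, clock in MHz, logic density) per generation."""
    rows = []
    for n in range(generations):
        rows.append((
            feature_nm * 0.7 ** n,      # linear shrink of 0.7 per generation
            clock_mhz * 2 ** (n / 2),   # clock doubles every two generations
            density * 2 ** n,           # density doubles every generation
        ))
    return rows

rows = project_roadmap(6)
for feature, clock, dens in rows:
    print(f"{feature:7.1f} nm  {clock:7.0f} MHz  {dens:5.1f} M/100mm2")
```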
5.2 Signal Propagation on a Linear Interconnect
Various interconnect models and their representations, including the L, T, and Π shaped structures
depicted in figure 2.9, have been discussed in chapter 2, section 2.4. When long interconnects are
modeled as RC networks, they are divided into smaller sections cascaded one after another. A typical
RC interconnect model is shown in figure 5.1.
Each RLC element introduces its own delay and glitches in the output waveform.
All simulations in this work use the latest technology parameter model files from the Predictive
Technology Model (PTM) website (http://www.eas.asu.edu/~ptm/latest.html). The structure used
for each simulation is shown in figure 5.2. Corresponding values of width, spacing between adjacent
lines, thickness and height above the ground have been considered for simulations. Dimensions of the
interconnect taken for simulations are given in Table 5.2. Corresponding values of resistance (R),
inductance (L), coupling capacitance (Ccouple) and ground capacitance (Cground) are provided in Table 5.3.
5.2.1 Types of interconnects
For simulation purposes different types of linear interconnect are taken into consideration.
Three major types of such interconnect are:
• Local interconnect
• Intermediate Interconnect
• Global Interconnect
Figure 5.1 An RC interconnect
Figure 5.2 Interconnect structure used for simulations

Tech     Length L (mm)   t (µm)   h (µm)   K     W (µm)   S (µm)
180 nm   2               0.45     0.65     3.5   0.28     0.28
180 nm   5               0.65     0.65     3.5   0.35     0.35
180 nm   10              1.25     0.65     3.5   0.80     0.80
130 nm   2               0.45     0.45     3.2   0.20     0.20
130 nm   5               0.45     0.45     3.2   0.28     0.28
130 nm   10              1.20     0.45     3.2   0.60     0.60
90 nm    2               0.30     0.30     2.8   0.15     0.15
90 nm    5               0.45     0.30     2.8   0.20     0.20
90 nm    10              1.20     0.30     2.8   0.50     0.50
65 nm    2               0.20     0.20     2.2   0.10     0.10
65 nm    5               0.35     0.2      2.2   0.14     0.14
65 nm    10              1.2      0.2      2.2   0.45     0.45
Table 5.2 Interconnect dimensions
Local interconnects are the shortest interconnects, typically used to connect consecutive logic
blocks. While designing these interconnects the width is kept at half of the height and thickness of
the interconnect. The minimum spacing between two local interconnects is kept at least equal to the
width of the line. Interconnects up to 2mm in length are considered in this category. Intermediate
interconnects are longer interconnects, typically used to connect logic blocks placed far apart.
These can have a length of around 5mm. While designing these interconnects the typical ratio of width,
thickness and height is kept at 4:9:6. Global interconnects are the longest interconnects possible on
the chip. These are mainly used to provide power supplies to different parts of the chip. These
interconnects can be as long as 10mm. The ratio of width, thickness and height is kept at 4:8:3.
In the first test case, RC elements of length 2mm, 5mm and 10mm for 180nm technology are taken
into consideration. A fast rising signal with an operating frequency of 500MHz is fed at the input
end. The output is observed at the output end for each length and shown in figure 5.3.
In figure 5.3, waveform 1 is the input signal with an operating voltage of 1.8V and a frequency of
500MHz. Interconnects are considered to be purely RC elements in this simulation. Waveform 2 is the
output wave appearing at the end of the 2mm long RC interconnect. It can be observed that the output
is delayed and parabolic in shape due to Elmore delays. Although the output maintains the same
frequency as the input signal, it is delayed in reaching the output.

Tech     Length (mm)   R (ohms)   L (nH)   Ccouple (fF)   Cground (fF)
180 nm   2             349        3.6      94.8           33.2
180 nm   5             483        9.72     246            104
180 nm   10            880        16.5     435            214
130 nm   2             488        3.6      122            34
130 nm   5             1242       10.5     312.5          82.5
130 nm   10            2444       21.2     612.5          173.4
90 nm    2             977        3.8      107.2          38.2
90 nm    5             2444       10.6     268            97
90 nm    10            4888       22.4     536.3          194.2
65 nm    2             2200       3.99     107.2          38.2
65 nm    5             5500       10.9     268            97
65 nm    10            11000      23.2     536.2          194.2
Table 5.3 Interconnect Resistance, Inductance and Capacitance values

Waveform 3 corresponds to a 5mm long interconnect at the same technology node with the same
input signal; due to the larger R and C, the signal is more delayed and deformed. It takes almost 40%
of the clock cycle to reach 50 percent of Vdd, and the same holds for reaching 10% of Vdd while
switching back to the zero level. Such high RC delays are therefore an alarming issue. The situation
is worse for the 10mm length. Waveform 4, which corresponds to the 10 mm long interconnect, shows
that the output is delayed by such a large amount that it does not even reach 50% of Vdd during the
complete clock cycle. The same pattern is observed while switching from high to low. This situation
makes the use of long interconnects almost impossible for data transmission, since frequency mismatch
can result in a large amount of data loss.
When RLC models are considered, the situation becomes more complicated. Mutual inductance
adds further distortion to the output signal. The effect of inductance on the same signal is shown
in figure 5.4.
The output waveforms in figure 5.4 show that interconnect inductance adds noise glitches to the
output signal. Hence the signal is delayed and, due to the added noise glitches, consumes more power.
The same trends have been observed for smaller feature sizes. Output results for the same
interconnect lengths are simulated for 90nm and 65nm technologies. Figures 5.5 and 5.6 show the
results corresponding to 90nm and 65nm technology respectively.
Figure 5.3 Output end signals on a 2mm, 5mm and 10mm RC interconnect at 180nm technology.
Thus it is concluded that in DSM, with each new technology generation, linear interconnects
face the following problems:
• RC delay in the output signal
• Noise induced due to interconnect inductance
• Extra power consumption.
In the next section, the conventional approach to handling these problems, namely buffer
insertion, is discussed.
5.3 Effect of Buffer Insertion on Delay, Noise and Power Reduction
With the continuous trend of Very Large Scale Integration (VLSI) circuits technology scaling
down and frequency increasing, interconnect delay becomes a significant bottleneck in system performances. This trend is a result of increased resistance of the interconnect when feature sizes enter the
nanometer era. From International Technology Roadmap for Semiconductors (ITRS) projection, interconnect delay can contribute to more than 50% of the delay when the feature size is beyond 180 nm. As
Figure 5.4 Output end signals on a 2mm, 5mm and 10mm RLC interconnect at 180nm technology.
Figure 5.5 Output end signals on a 2mm, 5mm and 10mm RLC interconnect at 90nm technology.
arrangements is shown in figure 5.7.
Figure 5.7 Buffers inserted in an RLC interconnect.
The operating voltage for 65nm technology is 1.2 V and the operating frequency is 1 GHz.
One triangular and one fast rising signal are taken as input signals. The outputs of the RC
interconnect and the buffered interconnect are shown in figures 5.8 and 5.9.
Figure 5.8 Delay reduction in 2mm interconnect with triangular input.
Figure 5.8 shows a triangular input being fed to a 2mm RC interconnect. Waveform 1 is the
input triangular wave. Waveform 2 is the delayed output due to RC effects. Here it is observed that the
signal is highly deformed and hence buffer insertion is required. Waveform 3 is the output of the
buffered interconnect, which is not deformed and also reaches the output end earlier than the delayed
wave. Hence buffer insertion is capable of reducing delay in VLSI interconnects.
Figure 5.9 Delay reduction in 2mm interconnect with square wave input.
Further experiments are carried out on longer interconnects of lengths 5mm and 10mm. Figures
5.10 and 5.11 respectively show the reduced propagation time for these two lengths.
It is observed from figures 5.10 and 5.11 that buffer insertion is quite a handy technique for
delay reduction in VLSI interconnects. Simulations have been carried out for different technologies
for all the interconnect lengths mentioned above. Detailed results are provided in table 5.4. It
can be observed from this table that for all technology nodes buffer insertion is a useful technique
which gives significant delay reduction.
5.3.2 Noise and Power reduction using Buffer Insertion
Advances in integrated circuit technology have led to an increase in switching speeds of digital
circuits. This increase is the primary reason why inductance-induced noise (e.g., oscillation, delay,
and crosstalk) is beginning to cause chips to fail. Thus, there is great interest in the inductance of
on-chip signal lines. Inductance is associated with a current loop. In a VLSI chip, when a single
signal line switches, numerous current loops are formed through the interconnect substrate, power and
ground lines. A conventional transmission line assumes only one current return path. We can use
conventional transmission line analysis if we assume that there is no transient potential drop on the
return paths and thus lump them together as a single terminal. The interconnect circuit and model are
shown in Fig. 5.12. The driver resistance is modeled as a constant linear resistance, denoted by
Rsource. The receiver can be one of the following: (i) a static gate, (ii) a transmission gate, (iii) a
pass transistor, or (iv) a domino gate. The load can be modeled as a capacitance (in the case of a
static inverter, domino gate, and non-conducting pass transistor or transmission gate) or a resistance
(in the case of a conducting pass transistor or transmission gate); it is assumed constant and is
denoted by Cload or Rload.
Figure 5.10 Delay reduction in 5mm interconnect with square wave input.
Figure 5.11 Delay reduction in 10mm interconnect with square wave input.
Table 5.4 Propagation delay values for an interconnect of different length with and without buffer
insertion

Technology                                    180nm    130nm    90nm      65nm
Length = 2mm
  Delay without any element insertion (ps)    22.7     41.2     72.5      163.9
  Delay in buffered interconnect (ps)         16.5     31.76    53.14     128.76
  % reduction with buffer insertion           27.1     23.5     26.54     21.6
Length = 5mm
  Delay without any element insertion (ps)    83.1     271.5    458.2     916.5
  Delay in buffered interconnect (ps)         64.5     198.5    338.36    704.2
  % reduction with buffer insertion           22.3     27.4     26.2      23.2
Length = 10mm
  Delay without any element insertion (ps)    267.8    1047.2   1833.56   4127.5
  Delay in buffered interconnect (ps)         192.1    717.9    1285      2948
  % reduction with buffer insertion           28.5     31.4     30.2      29.6
The buffer is designed by cascading two inverters, with the second inverter four times the size
of the first. An inverter removes all glitches of magnitude less than Vdd/2. Hence the first inverter
gives a clean output at the opposite logic level, and the second inverter brings it back to the
original logic level with all such noise glitches removed. In this process the earlier property of
delay reduction is still preserved by the circuit. Hence buffer insertion is capable of removing
unwanted inductive noise glitches occurring in a linear interconnect. If the noise glitch is larger
than Vdd/2, the signal will switch to the opposite logic level and come back, resulting in extra
switching.
In the next simulation the same input is fed to the buffered interconnect. Here each buffer removes
the glitches occurring in the RLC elements that precede it, thereby providing a clean signal
at the final output. The simulations are shown in Figure 5.13.
Figure 5.12
Figure 5.13
Reduced noise glitches result in less current in the circuit. This implies that the same
circuit operation is obtained with lower power consumption. A detailed analysis of the reduced
power in the interconnect is given in Table 5.5.
Table 5.5 Power consumption values for an interconnect of different length with and without buffer
insertion approach.
Technology                                              180nm     130nm     90nm      65nm
Length = 2mm
  Power consumption in the interconnect (µW)            111.2     151.5     198.5     245.2
  Power consumption in buffered interconnect (µW)       89.6      121.6     153.6     177.6
  % reduction with buffer insertion                     19.5      19.9      22.8      27.7
Length = 5mm
  Power consumption in the interconnect (µW)            301.2     410.2     504.6     614.5
  Power consumption in buffered interconnect (µW)       231.1     340       417.5     497.28
  % reduction with buffer insertion                     23.3      18.1      17.3      19.1
Length = 10mm
  Power consumption in the interconnect (µW)            620.1     921.4     1114.6    1340
  Power consumption in buffered interconnect (µW)       531.6     715       907.6     1094.1
  % reduction with buffer insertion                     14.4      22.4      19.1      18.4
While buffer insertion is a useful technique, its limitations, discussed in chapter 4,
have forced us to think of new algorithms and approaches. In the next section the effect
of using a Schmitt trigger in place of a buffer in linear interconnects is studied.
5.4 Effect of Schmitt trigger on delay, noise and power reduction in Linear Interconnects
In chapter 4, it was suggested that replacing buffers with Schmitt triggers may have an advantage. In
this section the effect of the Schmitt trigger on interconnect delay, noise and power reduction is
analyzed and discussed.
5.4.1 Delay reductions with Schmitt trigger
Consider an RC-delayed waveform rising slowly to the high level. A buffer is triggered when the
input reaches Vdd/2, and its output is generated within the switching time of the buffer. A Schmitt
trigger, however, has two thresholds that can be controlled by changing the W/L ratios
of its transistors. The thresholds can therefore be set much lower than Vdd/2, while remaining higher
than the Vt of the transistors. With this approach, as soon as the rising signal reaches the set threshold,
the output switches to the high level within the switching time of the Schmitt trigger.
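As a rough illustration of why a lower switching threshold shortens the propagation delay, the sketch below assumes an ideal first-order RC step response, V(t) = Vdd(1 - exp(-t/RC)), and computes the time at which the waveform reaches a given threshold. The values of R, C and Vdd and the choice of Vdd/3 as the lowered threshold are illustrative assumptions, not figures taken from the simulations in this chapter.

```python
import math

def crossing_time(threshold, vdd, r_ohm, c_farad):
    """Time at which V(t) = vdd * (1 - exp(-t / RC)) reaches `threshold`."""
    if not 0.0 < threshold < vdd:
        raise ValueError("threshold must lie strictly between 0 and vdd")
    return -r_ohm * c_farad * math.log(1.0 - threshold / vdd)

# Illustrative (assumed) values, not taken from the thesis simulations.
VDD = 1.8        # supply voltage in volts
R = 1e3          # lumped wire resistance in ohms
C = 100e-15      # lumped wire capacitance in farads

t_buffer = crossing_time(VDD / 2, VDD, R, C)    # element triggering at Vdd/2
t_schmitt = crossing_time(VDD / 3, VDD, R, C)   # element triggering at Vdd/3

print(f"trigger at Vdd/2: {t_buffer * 1e12:.1f} ps")
print(f"trigger at Vdd/3: {t_schmitt * 1e12:.1f} ps")
```

Under these assumptions the crossing time scales as RC ln(1/(1 - Vth/Vdd)), so any threshold below Vdd/2 is reached earlier than the Vdd/2 point on the same waveform.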
For comparison with buffer insertion, the same input signal and interconnect model have been used
for the Schmitt trigger. Simulations are shown in Figure 5.14. Waveform 1 is the
input waveform and waveform 2 is the waveform delayed by RLC effects. Waveform 3 is the output
using buffer insertion and waveform 4 is the output using the Schmitt trigger. It can be observed
that the Schmitt trigger results in less propagation delay.
Figure 5.14 Delay reduction using Schmitt trigger approach in 2mm interconnect with square wave
input.
The same trend is followed for 5mm and 10mm long interconnects. Simulation results for
these two interconnects are shown in Figures 5.15 and 5.16 respectively.
It is observed from Figures 5.14 and 5.15 that the Schmitt trigger replacement approach is more
efficient than buffer insertion. Hence experiments for all these interconnect lengths have been
conducted for different technology nodes. Detailed results are provided in Table 5.6.
Figure 5.15 Delay reduction using Schmitt trigger approach in 5mm interconnect with square wave
input.
Figure 5.16 Delay reduction using Schmitt trigger approach in 10mm interconnect with square wave
input.
Table 5.6 Propagation delay values for an interconnect of different length with buffer insertion and delay
reduction using Schmitt trigger approach
Technology                                      180nm     130nm     90nm      65nm
Length = 2mm
  Delay in buffered interconnect (ps)           16.5      31.76     53.14     128.76
  Delay with Schmitt trigger approach (ps)      12.3      21.65     38.1      95.6
  % reduction with Schmitt trigger              25.3      31.4      28.3      25.6
Length = 5mm
  Delay in buffered interconnect (ps)           64.5      198.5     338.36    704.2
  Delay with Schmitt trigger approach (ps)      46.1      148.8     236.8     514
  % reduction with Schmitt trigger              28.5      25        30        27
Length = 10mm
  Delay in buffered interconnect (ps)           192.1     717.9     1285      2948
  Delay with Schmitt trigger approach (ps)      134.4     490.3     842.5     2098.2
  % reduction with Schmitt trigger              29.8      31.7      34.5      28.2
5.4.2 Noise and power reduction with Schmitt trigger approach
As discussed in section 5.3, interconnects suffer from inductive noise. Buffer
insertion, along with reducing the signal propagation delay, is quite capable of handling inductive noise
and thus also reduces the power consumption (Table 5.5). Buffers are useful as long as the noise
glitches have a magnitude of less than Vdd/2. In these cases there is no unwanted switch to the
opposite logic level due to the noisy signal. But when the glitches are of higher magnitude, unwanted
switching to the opposite level can occur because the buffer triggers at Vdd/2. These switches always
add to the power consumption as well. Noise reduction using the Schmitt trigger approach is shown in
Figure 5.17.
Figure 5.17 Noise reduction using Schmitt trigger
A Schmitt trigger, however, possesses two threshold voltages. Hence, to switch from one logic
level to the other, the noisy signal has to cross both thresholds. Thus a larger noise margin is obtained
with the Schmitt trigger. Extensive noise analysis has been carried out by introducing artificial noise at
Vdd/2 of the input signal. The noise is generated by coupling the signal with a high frequency waveform,
so that glitches are produced at Vdd/2. When this waveform is fed to the buffered interconnect, the buffer
makes unwanted switches from high to low due to the instability at Vdd/2, thus adding extra delay to
the output signal. Figure 5.18 shows that waveform 3, which is the output waveform of the buffered
interconnect, makes one incomplete transition from low to high, which adds to the delay as well as to
the power consumption. When the same input is applied to the Schmitt trigger, the signal switches to
the opposite logic level on reaching Vdd/3. Since the glitches are not large enough to
cross both thresholds of the Schmitt trigger, the output signal does not show any unwanted transitions.
This is shown by waveform 2 in Figure 5.18.
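The difference between the two elements can be sketched behaviourally. The code below is only an idealised model, not the transistor-level circuit simulated in this chapter: it assumes a slow ramp with a superimposed sinusoidal ripple as the noisy input, models the buffer as a single-threshold comparator at Vdd/2, and models the Schmitt trigger as a conventional hysteresis comparator. The thresholds (0.4 and 0.6 with Vdd = 1) and the ripple amplitude are illustrative assumptions.

```python
import math

def noisy_ramp(n_samples=2001, ripple_amp=0.05, ripple_cycles=50, vdd=1.0):
    """Slow 0 -> vdd ramp with a superimposed high-frequency ripple."""
    samples = []
    for i in range(n_samples):
        t = i / (n_samples - 1)
        ripple = ripple_amp * math.sin(2 * math.pi * ripple_cycles * t)
        samples.append(vdd * t + ripple)
    return samples

def single_threshold(samples, v_th):
    """Idealised buffer: output is high whenever the input exceeds v_th."""
    return [1 if v > v_th else 0 for v in samples]

def schmitt(samples, v_low, v_high):
    """Idealised hysteresis comparator: goes high above v_high and
    returns low only after the input falls below v_low."""
    out, state = [], 0
    for v in samples:
        if state == 0 and v > v_high:
            state = 1
        elif state == 1 and v < v_low:
            state = 0
        out.append(state)
    return out

def count_transitions(bits):
    """Number of output level changes."""
    return sum(1 for a, b in zip(bits, bits[1:]) if a != b)

signal = noisy_ramp()
buffer_toggles = count_transitions(single_threshold(signal, 0.5))
schmitt_toggles = count_transitions(schmitt(signal, 0.4, 0.6))
print("buffer output transitions: ", buffer_toggles)
print("Schmitt output transitions:", schmitt_toggles)
```

Because the ripple (0.1 peak to peak) is smaller than the assumed hysteresis window (0.2), the hysteresis model switches exactly once, whereas the single-threshold model toggles several times while the input hovers around Vdd/2.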
Power reduction in the Schmitt trigger approach is due to the following reason.
• From the point of view of device operation, the Schmitt trigger is made of 6 MOS transistors. We have
set the lower threshold to be just above the threshold voltage of the transistor. If we consider
switching from the lower to the higher logic level, whenever the signal crosses the lower threshold,
the Schmitt trigger makes the signal switch to the higher level within its switching time. At the
point of switching, the 3 NMOS transistors are in the on state while the 3 PMOS transistors are in the
cut-off region. At the switching point of a buffer, in contrast, all 4 MOS transistors
are in the saturation region. Thus we have a further reduction in static power consumption.
Detailed data regarding the power consumption for various interconnect lengths for all technologies are
provided in Table 5.7.
Figure 5.18 Behavior of buffer and Schmitt trigger towards a noisy signal.
Table 5.7 Power consumption values for an interconnect of different length with buffer insertion and
reduction using Schmitt trigger approach.
Technology                                                  180nm     130nm     90nm      65nm
Length = 2mm
  Power consumption in buffered interconnect (µW)           89.6      121.6     153.6     177.6
  Power consumption with Schmitt trigger approach (µW)      70.8      96.8      114.6     140
  % reduction with Schmitt trigger                          21        21.1      25.7      21.2
Length = 5mm
  Power consumption in buffered interconnect (µW)           231.1     340       407.5     497.28
  Power consumption with Schmitt trigger approach (µW)      181.24    266.2     299.5     392.1
  % reduction with Schmitt trigger                          21.8      22.5      26.5      21.4
Length = 10mm
  Power consumption in buffered interconnect (µW)           531.6     715       907.6     1094.1
  Power consumption with Schmitt trigger approach (µW)      416.8     572.3     697.1     834.1
  % reduction with Schmitt trigger                          23.5      20        22.1      23.3
5.5 Replacement of Buffers in Buses
In deep-submicron technology, minimizing the propagation delay and power consumption
on buses is the most important design objective in system-on-chip design. In particular, the coupling
effects between wires on the bus can cause serious problems such as crosstalk delay, noise and
increased power consumption. One of the fastest growing areas in the computing industry is the
provision of high-throughput, low-power digital signal processing (DSP) and communication systems.
Recent trends show that the systems-on-chip (SoCs) used for such systems are becoming increasingly
complex as they add more functionality, while remaining subject to size, performance, and power
consumption constraints. The basic problems affecting the issue are:
• Minimizing the crosstalk delay.
• Minimizing the power consumption on the bus.
Bus coding techniques are often used to reduce delay and power in interconnect buses. It is known
that lowering the transition (switching) activity on the bit lines of a bus leads to a significant
reduction in the dynamic bus power consumption.
5.5.1 Signal Propagation in Buses
In a data bus, interconnects are laid side by side, very close to each other, and parallel data bits
are transmitted on them simultaneously. The values on adjacent interconnects keep changing with
new data values. Every rise or fall of the data value on one line affects the adjacent lines due to
the coupling capacitance between them. In 180nm technology this effect was very small
because the interconnects were laid far from each other, so the coupling capacitance was
very low or close to negligible. This implied that crosstalk noise glitches were not prominent in
the transmitted signal. The lower values of interconnect resistance and capacitance also result in a
lower RC delay in 180nm technology. Thus the signal transmitted on the bus is only slightly delayed,
with negligible noise glitches. Figure 5.19 shows the data bits transmitted on all the lines of a 2mm long
8-bit bus.
However, the situation is different in current technologies. In DSM
technologies the feature size is much smaller, which results in increased coupling capacitance and
higher resistance values. These factors lead to a large increase in R and C values and hence a larger
Elmore delay, and because of the very small distance between adjacent interconnects the coupling
capacitance is very high. Hence crosstalk noise has a much larger magnitude. Figure 5.20 shows the data
being transmitted on an 8-bit bus.
Figure 5.20 Data transfer on an 8 bit data bus in 65nm technology.
Thus we observe that data signals on data buses in DSM technologies are distorted by delay
as well as by crosstalk noise. Various bus coding techniques have been proposed earlier to recover the
signal from these effects. These are discussed in the next subsection.
5.5.2 Definitions and Related Work
5.5.2.1 Low Power Coding
The power dissipation in the bus depends on the data transition activity. We refer to codes that
reduce the average transition activity as low-power codes (LPCs). A simple but effective LPC is the
bus-invert code, in which the data is inverted and an invert bit is sent to the decoder if the current data
word differs from the previous data word in more than half the number of bits. The effectiveness of
bus-invert coding decreases with increasing bus width. Therefore, for wide buses, the bus is partitioned
into several sub-buses, each with its own invert bit. It should be noted that bus-invert coding is nonlinear.
It has been shown that linear codes do not reduce transition activity.
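The bus-invert rule described above can be sketched in a few lines. This is a minimal illustration rather than a hardware description: it assumes that the comparison is made against the word last driven onto the bus, that the bus starts in the all-zero state, and that words are held as Python integers. The function names are hypothetical.

```python
def hamming(a, b):
    """Number of bit positions in which two words differ."""
    return bin(a ^ b).count("1")

def bus_invert_encode(words, width):
    """Return a list of (bus_value, invert_bit) pairs for a word sequence."""
    mask = (1 << width) - 1
    prev_sent = 0                      # assumed initial bus state
    encoded = []
    for word in words:
        if 2 * hamming(word, prev_sent) > width:
            # More than half the lines would toggle: send the complement.
            sent, invert = ~word & mask, 1
        else:
            sent, invert = word, 0
        encoded.append((sent, invert))
        prev_sent = sent
    return encoded

def bus_invert_decode(encoded, width):
    """Undo the inversion wherever the invert bit is set."""
    mask = (1 << width) - 1
    return [(~sent & mask) if invert else sent for sent, invert in encoded]

data = [0x00, 0xFF, 0x0F, 0xF0]
coded = bus_invert_encode(data, 8)
decoded = bus_invert_decode(coded, 8)
print(coded)     # [(0, 0), (0, 1), (15, 0), (15, 1)]
print(decoded)   # [0, 255, 15, 240]
```

With this rule at most half of the data lines toggle in any cycle; the price is one extra invert line per (sub-)bus.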
5.5.2.2 Crosstalk Avoidance Coding
The delay of a wire in the bus depends on the transitions on the wire itself and on the wires adjacent to
it. The worst-case delay of a wire is $(1+4\lambda)\tau_0$. The purpose of crosstalk avoidance coding is to
limit the worst-case delay to $(1+2\lambda)\tau_0$. Crosstalk avoidance codes (CACs) reduce the worst-case
delay by ensuring that a transition from one codeword to another does not cause adjacent wires
to transition in opposite directions. We refer to this condition as the forbidden transition (FT) condition.
Shielding the wires of a bus by inserting grounded wires between adjacent wires is the simplest way to
satisfy this condition. A forbidden transition code (FTC) that requires fewer wires than shielding has
been proposed. There is no linear code that satisfies the FT condition while requiring fewer wires than
shielding. The number of valid n-bit codewords satisfying the forbidden transition condition
is $M_{FT}(n) = F_{n+2}$, where $F_n$ is the Fibonacci sequence satisfying $F_n = F_{n-1} + F_{n-2}$ with initial
conditions $F_1 = F_2 = 1$.
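The FT condition and the Fibonacci count can be checked by brute force for small bus widths. The sketch below is an illustration under stated assumptions: codewords are represented as bit strings, shield wires are modelled as constant '0' bits, and all function names are hypothetical. It relies on the observation that a codebook is free of forbidden transitions exactly when, at every boundary between two adjacent wires, it does not contain both a '01' and a '10' pattern.

```python
from itertools import product

def violates_ft(old, new):
    """True if some pair of adjacent wires switches in opposite directions."""
    for i in range(len(old) - 1):
        if {old[i:i + 2], new[i:i + 2]} == {"01", "10"}:
            return True
    return False

def shield(word):
    """Insert a grounded wire ('0') between every two data wires."""
    return "0".join(word)

def max_ft_codebook_size(n):
    """Size of the largest n-bit codebook in which no pair violates FT.

    For every wire boundary one of the patterns '01' / '10' is banned;
    all such choices are tried and the largest resulting codebook is kept.
    """
    words = ["".join(bits) for bits in product("01", repeat=n)]
    best = 0
    for banned in product(("01", "10"), repeat=n - 1):
        size = sum(
            all(w[i:i + 2] != banned[i] for i in range(n - 1)) for w in words
        )
        best = max(best, size)
    return best

def fib(n):
    """Fibonacci numbers with F1 = F2 = 1."""
    a, b = 1, 1
    for _ in range(n - 1):
        a, b = b, a + b
    return a

print(violates_ft("01", "10"))                  # True: opposite switching
print(violates_ft(shield("01"), shield("10")))  # False: shield in between
for n in range(1, 7):
    print(n, max_ft_codebook_size(n), fib(n + 2))
```

For n = 4 the count is 8, which is why three data bits can be carried on four wires without forbidden transitions.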
The worst-case delay can also be reduced to $(1+2\lambda)\tau_0$ by excluding the bit patterns "010" and
"101" from every codeword. We refer to this condition as the forbidden pattern (FP) condition. The
simplest method to satisfy the FP condition is to duplicate every data wire, whereby each data bit is
transmitted using two adjacent wires. There is no linear forbidden pattern code (FPC) that satisfies the
FP condition while requiring fewer wires than duplication. The number of codewords is given by
$M_{FP}(n) = 2F_{n+1}$, where $F_n$ is the Fibonacci sequence. However, this increase in the number of
codewords translates into at most one additional data bit that can be encoded for the same n.
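The codeword count for the FP condition is easy to confirm by enumeration. The sketch below assumes codewords are represented as bit strings; the helper names are hypothetical. It counts the n-bit words that contain neither "010" nor "101" and compares the result with 2F(n+1), and it also checks that wire duplication yields FP-free words.

```python
from itertools import product

def fib(n):
    """Fibonacci numbers with F1 = F2 = 1."""
    a, b = 1, 1
    for _ in range(n - 1):
        a, b = b, a + b
    return a

def count_fp_words(n):
    """Number of n-bit words containing neither '010' nor '101'."""
    words = ("".join(bits) for bits in product("01", repeat=n))
    return sum(1 for w in words if "010" not in w and "101" not in w)

def duplicate(word):
    """Send every data bit on two adjacent wires."""
    return "".join(bit * 2 for bit in word)

for n in range(1, 9):
    print(n, count_fp_words(n), 2 * fib(n + 1))

print(duplicate("101"))   # 110011, contains no forbidden pattern
```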
5.5.2.3 Error Control Coding
Error control is possible if the Hamming distance between any two codewords in the codebook
is greater than one. If the minimum Hamming distance between any two codewords is two, then all
single errors appearing on the bus can be detected.
If the minimum Hamming distance is three, then all single errors can be corrected. Error
detection is simpler to implement than error correction but requires retransmission of the data when an
error occurs. In systematic codes, a few redundant bits are appended to the input bits to generate the
codeword. Hamming code is an example of a linear systematic error correcting code.
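The relation between minimum Hamming distance and error control capability can be illustrated with two small codebooks. The sketch below is only an example: the single-parity code and the triple-repetition code are chosen here for illustration and are not codes evaluated in this thesis, and the function names are hypothetical.

```python
from itertools import combinations, product

def hamming_distance(a, b):
    """Number of positions in which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))

def min_distance(codebook):
    """Minimum Hamming distance over all pairs of distinct codewords."""
    return min(hamming_distance(a, b) for a, b in combinations(codebook, 2))

def capabilities(codebook):
    """Errors that can be detected / corrected for the given codebook."""
    d = min_distance(codebook)
    return {"d_min": d, "detects": d - 1, "corrects": (d - 1) // 2}

# Systematic code: 3 data bits followed by one even-parity bit.
parity_code = [
    "".join(bits) + str(sum(map(int, bits)) % 2)
    for bits in product("01", repeat=3)
]
# Triple repetition of a single data bit.
repetition_code = ["000", "111"]

print(capabilities(parity_code))      # {'d_min': 2, 'detects': 1, 'corrects': 0}
print(capabilities(repetition_code))  # {'d_min': 3, 'detects': 2, 'corrects': 1}
```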
5.5.2.4 CAC coding Schemes
Coding involves mapping k data (information) bits to n code bits, resulting in an (n, k) code
with a code rate of k/n. This mapping can be done either with memory or without memory
(memoryless). However, codes with memory, in general, suffer from error propagation at the decoder.
Complex techniques, such as those employed in communication systems, are needed to ensure that error
propagation is not catastrophic. Such techniques are prohibitively complex to be used for on-chip buses
in the foreseeable future. Further, even when error propagation is not a concern, codes with memory
tend to have significantly more complex encoders and decoders than memoryless codes. Memoryless
codes are therefore preferred over codes with memory. The design of memoryless codes boils down
to determining a subset C of cardinality $2^k$ consisting of n-bit vectors drawn from the set S of
all $2^n$ possible n-bit vectors. The codewords in C, referred to as the codebook, provide delay, power,
or reliability benefits by satisfying specific constraints. For example, an (n, k, p) CAC achieves delay
reduction by reducing the worst-case delay of a bus from $(1+4\lambda)\tau_0$ to $(1+p\lambda)\tau_0$, where
$\tau_0$ is the delay of a crosstalk-free bus line, $\lambda$ is the ratio of the coupling capacitance to the bulk
capacitance, and p = 1, 2 or 3 is the maximum coupling. For large buses, it is impractical to encode all
k bits at once, because the hardware complexity of the codec grows exponentially with k. Therefore,
partial coding is employed, wherein the bus is broken into sub-buses of smaller width which are encoded
into sub-channels. For example, a 32-bit bus can be broken into sub-buses of size 3, each of which is
encoded onto 4 wires (3-4 encoding).
A 3-bit to 4-wire coder is shown in Figure 5.21, together with the mapping between data words and
codewords. This coding scheme removes the FT condition present in the data. Using the partial coding
technique described above, an array of ten of these simple coders could be used to implement a
crosstalk-immune 32-bit bus with 53 wires. Compared to a 63-wire shielded channel, this amounts to
cutting ten wires from the channel for the cost of a handful of gates.
5.5.2.5 Relationship between delay and crosstalk
Figure 5.21 A 3 Bit to 4 wire coder
The delay between two data tuples $d_t$ (the data already available on the bus lines) and $d_{t+1}$
(the data to be transmitted on the bus lines) at time instances t and t+1 is
given by equations (5.1) and (5.2). Equation (5.2) gives the delay on the
k-th data line (wire), whereas equation (5.1) states that, for an n-bit bus, the delay between two data
tuples is the maximum of the line delays over all bit positions of the two data tuples $(d_t, d_{t+1})$.
\[
D(d_t, d_{t+1}) = \max_{1 \le k \le n} D_k(d_t, d_{t+1}) \tag{5.1}
\]
\[
D_k(d_t, d_{t+1}) = C_G R_S \cdot
\begin{cases}
\left((1+\lambda) M_1 - \lambda M_2\right) M_1 & \text{if } k = 1,\\
\left((1+2\lambda) M_k - \lambda (M_{k-1} + M_{k+1})\right) M_k & \text{if } 1 < k < n,\\
\left((1+\lambda) M_n - \lambda M_{n-1}\right) M_n & \text{if } k = n.
\end{cases} \tag{5.2}
\]
Here $M_k = d^{k}_{t+1} - d^{k}_{t}$, D denotes the delay function, max denotes the maximum value, and
$R_S$ and $C_G$ represent the total resistance of a particular wire and the total capacitance between a line
and ground respectively. In equation (5.2) the technology parameter $\lambda$ is the ratio of the coupling
capacitance ($C_C$) to the capacitance to ground ($C_G$), i.e. $\lambda = C_C / C_G$, and n denotes the number
of data lines. Table 5.8 shows the different crosstalk classes defined in the literature, depending upon
the transition activity between adjacent interconnects.
For example, consider $d_t$ = 010 and $d_{t+1}$ = 100. Then the delay $D(d_t, d_{t+1})$ on the bus
computed using equations (5.1) and (5.2) is $C_G R_S (1+3\lambda)$. For different 3-bit transitions (from $d_t$
to $d_{t+1}$), the normalized delay (with respect to $C_G R_S$) on the middle line and the crosstalk class are
given in Table 5.8. The classification of the delay into classes has been dealt with in [62, 63]. Throughout
the chapter the symbols ↑, ↓ and − are used to indicate 0 → 1, 1 → 0 and no-change (1 → 1 or 0 → 0) bit
transitions respectively.
Crosstalk class   Delay of line k   Transitions (Δk−1, Δk, Δk+1)
1                 0                 ↑−↑, ↓−↓, ↑−↓, ↓−↑, ↑−−, ↓−−, −−−, −−↑, −−↓
2                 1                 ↑↑↑, ↓↓↓
3                 1+λ               ↑↑−, ↓↓−, −↑↑, −↓↓
4                 1+2λ              −↑−, −↓−, ↑↓↓, ↑↑↓, ↓↓↑, ↓↑↑
5                 1+3λ              −↑↓, −↓↑, ↓↑−, ↑↓−
6                 1+4λ              ↑↓↑, ↓↑↓
Table 5.8 Delay and Crosstalk Classes for various 3-bit combinations (transitions)
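Equations (5.1) and (5.2) and the entries of Table 5.8 can be checked numerically. The sketch below evaluates the delay normalised to the product of C_G and R_S; it assumes data words are given as bit strings with at least two lines, uses 0-based line indices, and takes λ = 2 purely as an illustrative value. The function names are hypothetical.

```python
def transitions(old, new):
    """M_k for every line: +1 for a rise, -1 for a fall, 0 for no change."""
    return [int(b) - int(a) for a, b in zip(old, new)]

def line_delay(m, k, lam):
    """Normalised delay of line k (0-based) following equation (5.2)."""
    n = len(m)
    if k == 0:
        return ((1 + lam) * m[0] - lam * m[1]) * m[0]
    if k == n - 1:
        return ((1 + lam) * m[-1] - lam * m[-2]) * m[-1]
    return ((1 + 2 * lam) * m[k] - lam * (m[k - 1] + m[k + 1])) * m[k]

def bus_delay(old, new, lam):
    """Normalised bus delay following equation (5.1)."""
    m = transitions(old, new)
    return max(line_delay(m, k, lam) for k in range(len(m)))

LAM = 2  # illustrative value of the coupling ratio (assumption)

# Worked example from the text: 010 -> 100 gives 1 + 3*lambda.
print(bus_delay("010", "100", LAM), 1 + 3 * LAM)

# Middle-line delay for one representative pattern of each crosstalk class.
examples = {
    1: [1, 0, 1],    # middle line idle
    2: [1, 1, 1],    # all three lines switch together
    3: [1, 1, 0],    # one neighbour switches with the middle line
    4: [0, 1, 0],    # only the middle line switches
    5: [0, 1, -1],   # one neighbour switches against the middle line
    6: [1, -1, 1],   # both neighbours switch against the middle line
}
for cls, m in examples.items():
    print(cls, line_delay(m, 1, LAM))
```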
5.5.2.6 Interconnect Power Model
In general, the four sources of power dissipation in any CMOS VLSI circuit are given by
\[
P_{avg} = P_{static} + P_{dynamic} + P_{leakage} + P_{shortckt} \tag{5.3}
\]
The major share of the overall power dissipation is that of dynamic power dissipation. Furthermore, the
dynamic power dissipation in a CMOS VLSI circuit is given by
\[
P_{dynamic} = \left( X \, (C_S + C_L) + Y \, C_C \right) V_{DD}^{2} \, f_c \tag{5.4}
\]
where $C_S$ is the self-capacitance, $C_L$ is the loading capacitance, $C_C$ is the coupling capacitance,
$V_{DD}$ is the supply voltage and $f_c$ is the clock frequency. X and Y are formulated in the following way:
\[
X = \sum_{i=0}^{W} X_i, \qquad Y = \sum_{i=0}^{W-1} Y_{i,i+1} \tag{5.5}
\]
where W is the number of bit lines of the bus.
Self transition activity
X denotes the self transition activity for the self-capacitance $C_S$ and the loading capacitance $C_L$. Let
$p^{i}_{r,s}$ be the transition probability that signal line i of the bus changes from state r ∈ {0, 1} to
state s ∈ {0, 1}. Then the quantity $X_i$ for signal line i is computed by
\[
X_i = p^{i}_{0,1} \tag{5.6}
\]
Coupling transition activity
Let $p^{i,j}_{pq,rs}$ denote the coupling transition probability that signal line i of the bus changes from
p ∈ {0, 1} to r ∈ {0, 1} and, at the same time, the adjacent signal line j of the bus changes from
q ∈ {0, 1} to s ∈ {0, 1}. Then $Y_{i,j}$ between signal lines i and j is computed by
\[
Y_{i,j} = \alpha \left( p^{i,j}_{00,01} + p^{i,j}_{11,10} + p^{i,j}_{00,10} + p^{i,j}_{11,01} \right)
        + \beta \left( p^{i,j}_{01,10} + p^{i,j}_{10,01} \right) \tag{5.7}
\]
The capacitance ratio $\gamma$ is defined as $\gamma = C_C / (C_S + C_L)$. The value of $\gamma$ increases as the
aspect ratio of the interconnect increases. It is easily shown that the dynamic power consumption is
proportional to the value of
\[
Z = X + \gamma \, Y \tag{5.8}
\]
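The quantities X, Y and Z can be estimated from a recorded sequence of bus words. The sketch below is a minimal illustration under several assumptions: probabilities are replaced by relative frequencies over the trace, bus words are given as bit strings, the α term counts the cases in which exactly one of two lines at the same level switches, the β term counts opposite switching of the two lines, and the values α = 1, β = 2 and γ = 0.5 are placeholders chosen here, not values taken from this thesis.

```python
# (state before, state after) of an adjacent line pair, written as 'ij'.
ALPHA_PATTERNS = {("00", "01"), ("11", "10"), ("00", "10"), ("11", "01")}
BETA_PATTERNS = {("01", "10"), ("10", "01")}

def activity(trace, width, gamma, alpha=1.0, beta=2.0):
    """Return (X, Y, Z) estimated from a list of bus words (bit strings)."""
    steps = list(zip(trace, trace[1:]))
    n = len(steps)

    # X: sum over lines of the relative frequency of a 0 -> 1 transition.
    x = sum(
        sum(1 for old, new in steps if old[i] == "0" and new[i] == "1") / n
        for i in range(width)
    )

    # Y: sum over adjacent line pairs of the weighted coupling activity.
    y = 0.0
    for i in range(width - 1):
        pairs = [(old[i:i + 2], new[i:i + 2]) for old, new in steps]
        a = sum(1 for p in pairs if p in ALPHA_PATTERNS)
        b = sum(1 for p in pairs if p in BETA_PATTERNS)
        y += (alpha * a + beta * b) / n

    return x, y, x + gamma * y

trace = ["00", "01", "10", "11"]          # hypothetical 2-bit bus trace
X, Y, Z = activity(trace, width=2, gamma=0.5)
print(X, Y, Z)
```

For the short trace above, line 0 rises once and line 1 rises twice in three steps, and the pair sees one α-type and one β-type event, so X = 1, Y = 1 and Z = 1.5 under the assumed weights.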
The total power consumption for an encoding graph $G_{en}$ with mapping function f is measured by
the quantity Z:
\[
Z = \sum p(c_i, c_j) \, w(f(c_i), f(c_j)) \tag{5.9}
\]
where $p(c_i, c_j)$ is the transition probability from $c_i$ to $c_j$ in the transition probability graph. Note
that $p(c_i, c_j)$ is the percentage of occurrences of this transition among all the transitions obtained
from the transition profile.
The transition from one state to another is a random process, so a probability distribution state
diagram is drawn from the average number of occurrences of every specific transition. The graph is
shown in Figure 5.22.
Figure 5.22 Transition Probability Graph
One example of this graph is shown in Figure 5.23, for which Z = 3.45.
Our main aim is to reduce the value of Z.
5.5.3 Comparison with existing bus coding technique
As shown in section 5.5, the main aim of bus coding techniques is to reduce or
remove crosstalk noise by various methods. An encoder is placed at the input stage to reduce crosstalk,
so that the coded signal transmitted on the data bus does not suffer from crosstalk noise. Thus the total
delay and power consumption are reduced by using bus coding techniques.
Figure 5.23 Example of Transition Probability Graph
In this thesis, it is proposed to replace the signal restoring buffers in data buses with Schmitt
triggers. Since a Schmitt trigger is able to handle very large noise glitches, it does not suffer
from unwanted switching due to crosstalk noise. It is shown that if both the encoder and the decoder are
removed from the input and output ends and Schmitt triggers are used as signal restoring elements in
place of buffers, an overall delay reduction can result. Owing to the wide gap between its two
thresholds, the Schmitt trigger is able to remove all the crosstalk noise classes from the input signal and
thus provide a clean output. Moreover, the extra hardware in the form of an encoder and a decoder is not
required in the proposed approach. Thus, along with area savings, more power savings are also achieved
by the proposed approach.
The output signal on an 8-bit data bus is shown in Figure 5.24.
The gains in terms of delay and power savings are shown in Tables 5.9 and 5.10 respectively.
5.6 Conclusion
In this chapter, the utility of the Schmitt trigger in avoiding crosstalk noise has been discussed.
It has been shown that the dual thresholds of the Schmitt trigger help in avoiding crosstalk
in buses, provided the noise falls within the threshold interval. Where the noise is below the threshold
level, the Schmitt trigger can be adjusted to prevent the noise from coupling to adjacent lines.
Figure 5.24 Data signals rectified using Schmitt trigger approach in an 8 bit data bus.
Table 5.9 Propagation delay values for 8 bit buses of different length with buffer insertion and delay
reduction using Schmitt trigger approach
Technology                                      180nm     130nm     90nm      65nm
Length = 2mm
  Delay in buffered interconnect (ps)           18.5      35.76     59.14     138.76
  Delay with Schmitt trigger approach (ps)      16.3      31.65     53.1      125.6
  % reduction with Schmitt trigger              12.1      12.5      10.1      9.5
Length = 5mm
  Delay in buffered interconnect (ps)           74.7      215.5     354.36    765.2
  Delay with Schmitt trigger approach (ps)      66.1      187.8     304.8     700
  % reduction with Schmitt trigger              12.5      13.55     14.3      9.8
Length = 10mm
  Delay in buffered interconnect (ps)           192.1     717.9     1285      2948
  Delay with Schmitt trigger approach (ps)      181.4     638.3     1130.5    2600.2
  % reduction with Schmitt trigger              9.8       11.7      12.5      13.2
Table 5.10 Power consumption values for 8 bit buses of different length with buffer insertion and
reduction using Schmitt trigger approach.
Technology                                                  180nm     130nm     90nm      65nm
Length = 2mm
  Power consumption in buffered bus (µW)                    99.6      141.6     183.6     197.6
  Power consumption with Schmitt trigger approach (µW)      89.8      126.8     160.6     170
  % reduction with Schmitt trigger                          11        12.1      13.7      15.2
Length = 5mm
  Power consumption in buffered bus (µW)                    211.1     310       387.5     467.28
  Power consumption with Schmitt trigger approach (µW)      181.24    266.2     299.5     392.1
  % reduction with Schmitt trigger                          12.8      9.5       11.5      13.4
Length = 10mm
  Power consumption in buffered bus (µW)                    581.6     785       977.6     1184.1
  Power consumption with Schmitt trigger approach (µW)      520.8     708.3     850.1     934.1
  % reduction with Schmitt trigger                          12.5      11.6      14.1      15.3
Chapter 6
Conclusions and Future Work
In this thesis, the Schmitt trigger has been proposed as an alternative to the existing buffer insertion
technique for delay, power and noise reduction in linear VLSI interconnects.
It has been shown that replacing the buffer with a Schmitt trigger helps in reducing delay and power
consumption. The Schmitt trigger has two thresholds, as compared to the single threshold of a
buffer, and both voltage thresholds can be set as desired. Thus the signal can
be made to rise or fall faster by keeping the voltage threshold lower. The smaller number of transistors
in active mode at the time of switching also results in lower power consumption, giving the Schmitt
trigger further power savings compared to the buffer.
It has also been shown that the Schmitt trigger helps in reducing crosstalk noise in the circuit.
Crosstalk noise is a problem in data buses, where the close proximity of interconnects results in induced
noise in neighboring lines, which contributes to glitches. It is suggested that there may not be a need
for bus coding techniques for the purpose of crosstalk noise reduction if Schmitt triggers are used
instead. The extra hardware required in the form of an encoder and a decoder for bus coding techniques
is then also not required. Hence area savings may also be achieved.
6.1 Scope of further work
This thesis focused mainly on linear VLSI interconnects and analyzed local, intermediate and
global interconnects only. However, since practical interconnects also have tree and mesh structures,
the utility of the Schmitt trigger in such interconnect structures may also be explored.
Bibliography
[1] Chandrakasan, A. P., S. Sheng, and R. W. Brodersen, “Low-power Digital CMOS Design,” IEEE
Journal of Solid State Circuits, pp. 473-484, April 1992.
[2] Proakis and Manolakis, “Digital Signal Processing, Principles, Algorithms, and Applications, 3/e”
Prentice Hall of India 2003
[3] P. Saxena, N. Menezes, P. Cocchini, and D. A. Kirkpatrick, “Repeater scaling and its impact
on CAD,” IEEE Transactions on Computer-Aided Design, vol. 23, no. 4, pp. 451-463, 2004.
[4] P. J. Osler, “Placement driven synthesis case studies on two sets of two chips: hierarchical and flat,”
in Proceedings of International Symposium on Physical Design, San Diego, California, 2004, pp.
190-197.
[5] Maged M. Ghoneima, Muhammad M. Khellah, “Skewed Repeater Bus: A Low-Power Scheme
for On-Chip Buses,” IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 55, no. 7,
August 2008.
[6] Y. I. Ismail and E. G. Friedman, “Effects of inductance on the propagation delay and repeater insertion
in VLSI circuits,” in Proceedings of the Conference on Design Automation, New Orleans, Louisiana,
1999, pp. 721-724.
[7] Y. I. Ismail, E. G. Friedman, and J. L. Neves, “Repeater insertion in tree structured inductive
interconnect,” in Proceedings of the International Conference on Computer-Aided Design, San Jose,
California, 2001, pp. 420-424.
[8] Z. Jiang, S. Hu, J. Hu, Z. Li, and W. Shi, “A new RLC buffer insertion algorithm,” in Proceedings of
the International Conference on Computer-Aided Design, San Jose, California, 2006, pp. 553-557.
[9] L. P. P. P. van Ginneken, “Buffer placement in distributed RC-tree network for minimal Elmore delay,”
in Proc. Int. Symp. on Circuits and Systems, 1990, pp. 865-868.
[10] S. Lin and M. Marek-Sadowska, “A fast and efficient algorithm for determining fanout tree in large
networks,” in Proc. of EDAC, Feb 1991, pp. 539-544.
[11] H. Zhou, D. F. Wong, I. M. Liu, and A. Aziz, “Simultaneous routing and buffer insertion with
restrictions on buffer locations,” IEEE Trans. on Computer Aided Design of Integrated Circuits and
Systems, vol. 19, no. 7, pp. 819-824, July 2000.
[12] C. C. N. Chu and D. F. Wong, “A quadratic programming approach to simultaneous buffer
insertion/sizing and wire sizing,” IEEE Trans. on Computer Aided Design of Integrated Circuits and
Systems, vol. 18, no. 6, pp. 787-798, Sept. 1999.
[13] J. Lillis, C. K. Cheng and T.-T. Y. Lin, “Optimal wire sizing and buffer insertion for low power
and a generalized delay model,” IEEE Trans. Solid-State Circuits, vol. 31, no. 3, pp. 437-447, March
1996.
[14] C. J. Alpert and A. Devgan, “Wire segmenting for improved buffer insertion,” in Proc. ACM/IEEE
Design Automation Conf., 1997, pp. 588-593.
[15] W. Shi and Z. Li, “An O(n log n) time algorithm for optimal buffer insertion,” in Proc. ACM/IEEE
Design Aut
[16] W. Shi, Z. Li and C. J. Alpert, “Complexity analysis and speedup techniques for optimal buffer
insertion with minimum cost,” in Proc. Asia and South Pacific Design Automation Conf., 2004, pp.
609-614.
[17] S. D. Naffziger et al., “The Implementation of a 2-Core, Multi-Threaded Itanium Family Processor,” IEEE Journal of Solid-State Circuits, Vol. 41, No. 1, pp. 197-209, January 2006.
[18] International Technology Roadmap for Semiconductors. Semiconductor Industry Association,
2003.
[19] H. Veendrick, Deep Submicron CMOS ICs - From Basics to ASICs. Deventer, Netherlands:
Kluwer, 1998.
[20] H. B. Bakoglu and J. D. Meindl, “Optimal Interconnection Circuits for VLSI,” IEEE Transactions
on Electron Devices, Vol. ED-32, No. 5, pp. 903-909, May 1985.
[21] N. Magen et al., “Interconnect-Power Dissipation in a Microprocessor,” Proceedings of the ACM
International Workshop on System Level Interconnect Prediction, pp. 7-13, February 2004.
[22] F. Chen and D. Gardner, “Influence of Line Dimensions on the Resistance of Cu Interconnections,”
IEEE Electron Device Letters, Vol. 19, No. 12, pp. 508-510, December 1998.
[23] A. H. Ajami et al., “Analysis of IR-Drop Scaling with Implications for Deep Submicron P/G Network Designs,” Proceedings of the IEEE International Symposium on Quality Electronic Design,
pp. 35-40, March 2003.
[24] W. Wu and K. Maex, “Studies on Size Effect of Copper Interconnect lines,” Proceedings of International Conference on Solid-State and Integrated-Circuit Technology, pp. 416-418, October 2001.
[25] A. V. Mezhiba and E. G. Friedman, Power Distribution Networks in High Speed Integrated Circuits. MA: Kluwer Academic Publishers, 2004.
[26] K. Nabors and J. White, “FastCap: A Multipole Accelerated 3-D Capacitance Extraction Program,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 10,
No. 11, pp. 1447-1459, November 1991.
[27] J. H. Chern et al., “Multilevel Metal Capacitance Models for CAD Design Synthesis Systems,”
IEEE Electron Device Letters, Vol. 13, No. 1, pp. 32-34, January 1992.
[28] S. Wong, G. Lee, and D. Ma, “Modeling of Interconnect Capacitance, Delay, and Crosstalk in
VLSI,” IEEE Transactions on Semiconductor Manufacturing, Vol. 13, No. 1, pp. 108-111, February
2000.
[29] K. Gala et al., “Inductance 101: Analysis and Design Issues,” Proceedings of the IEEE/ACM
Design Automation Conference, pp. 329-334, June 2001.
[30] B. Krauter and S. Mehrotra, “Layout Based Frequency Dependent Inductance and Resistance Extraction for On-Chip Interconnect Timing Analysis,” Proceedings of the IEEE/ACM Design Automation Conference, pp. 303-308, June 1998.
[31] S. Sim et al., “A Unified RLC Model for High-Speed On-Chip Interconnects,” IEEE Transactions
on Electron Devices, Vol. 50, No. 6, pp. 1501-1510, June 2003.
[32] X. Huang et al., “Loop-Based Interconnect Modeling and Optimization Approach for Multigigahertz Clock Network Design,” IEEE Journal of Solid-State Circuits, Vol. 38, No. 3, pp. 457-463,
March 2003.
[33] S. Yu et al., “Loop-Based Inductance Extraction and Modeling for Multiconductor On-Chip Interconnects,” IEEE Transactions on Electron Devices, Vol. 53, No. 1, pp. 135-145, January 2006.
[34] A. Mezhiba and E. G. Friedman, “Frequency Characteristics of High Speed Power Distribution Networks,” Analog Integrated Circuits and Signal Processing, Vol. 35, No. 2/3, pp. 207-214,
May/June 2003.
[35] T. Dhaene and D. D. Zutter, “Selection of Lumped Element Models for Coupled Lossy Transmission Lines,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol.
11, No. 7, pp. 805-815, July 1992.
[36] S. Sim, K. Lee, and C. Y. Yang, “High-Frequency On-Chip Inductance Model,” IEEE Electron
Device Letters, Vol. 23, No. 12, pp. 740-742, December 2002.
[37] S. Lin and E. Kuh, “Transient Simulation of Lossy Interconnects Based on the Recursive Convolution Formulation,” IEEE Transactions on Circuits and Systems, Vol. 39, No. 11, pp. 879-892,
November 1992.
[38] T. Lin, M. W. Beattie, and L. T. Pileggi, “On the Efficacy of Simplified 2D On-Chip Inductance
Models,” Proceedings of the IEEE/ACM Design Automation Conference, pp. 757-762, June 2002.
[39] G. Lei, G. Pan, and B. K. Gilbert, “Examination, Clarification, and Simplification of Modal Decoupling Method for Multiconductor Transmission Lines,” IEEE Transactions on Microwave Theory
and Techniques, Vol. 43, No. 9, pp. 2090-2100, September 1995.
[40] L. Yin and L. He, “An Efficient Analytical Model of Coupled On-Chip RLC Interconnects,” Proceedings of the IEEE Design Automation Conference Asia and South Pacific, pp. 385-390, January
2001.
[41] F. Chang, “Transient Analysis of Lossless Coupled Transmission Lines in a Nonhomogeneous
Dielectric Medium,” IEEE Transactions on Microwave Theory and Techniques, Vol. 18, No. 9, pp.
616-626, September 1970.
[42] W. C. Elmore, “The Transient Response of Damped Linear Networks,” Journal of Applied Physics,
Vol. 19, pp. 55-63, January 1948.
[43] L. T. Pillage and R. A. Rohrer, “Asymptotic Waveform Evaluation for Timing Analysis,” IEEE
Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 9, No. 4, pp.
352-366, April 1990.
[44] M. A. El-Moursy and E. G. Friedman, “Optimum Wire Shaping of an RLC Interconnect,” Proceedings of the IEEE Midwest Symposium on Circuits and Systems, December 2003.
[45] M. Ghoneima and Y. Ismail, “Optimum Positioning of Interleaved Repeaters in Bidirectional
Buses,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol.
24, No. 3, pp. 461-469, March 2005.
[46] F. Anderson, J. S. Wells, and E. Z. Berta, “The Core Clock System on the Next Generation Itanium
Microprocessor,” Proceedings of the IEEE International Solid-State Circuits Conference, pp. 110-111, February 2002.
[47] L. He and K. M. Lepak, “Simultaneous Shield Insertion and Net ordering for Capacitive and
Inductive Coupling Minimization,” Proceedings of the ACM International Symposium on Physical
Design, pp. 56-61, 2000.
[48] B. Soudan, “The Effects of Swizzling on Inductive and Capacitive Coupling for Wide Signal
Busses,” Proceedings of the International Conference on Microelectronics, pp. 300-303, December 2003.
[49] J. J. Cong and K.-S. Leung, “Optimal Wiresizing under Elmore Delay Model,” IEEE Transactions on
Computer-Aided Design of Integrated Circuits and Systems, Vol. 14, No. 3, pp. 321-336, 1995.
[50] P. Sotiriadis and A. Chandrakasan, “Reducing Bus Delay in Sub-Micron Technology Using Coding,”
Proceedings of the IEEE Asia and South Pacific Design Automation Conference (ASPDAC01), pp. 109-114, 2000.
[51] P. Sotiriadis, “Interconnect Modeling and Optimization in Deep Submicron Technologies,” Dissertation Thesis, MIT, May 2002.
[52] L. Li, N. Vijaykrishnan, M. T. Kandemir, and M. J. Irwin, “A Crosstalk Aware
Interconnect with Variable Cycle Transmission,” Proceedings of Design Automation and Test in Europe (DATE),
pp. 102-107, 2004.
[53] F.J. Taylor, Digital Filter Design Handbook, Marcel Dekker, Inc., NYC, 1984.
[54] G. K. Ma and F. J. Taylor, “Multiplier Policies for Digital Signal Processing,” IEEE ASSP Magazine,
pp. 6-20, January 1990.
[55] A. G. Dempster and M. D. Macleod, “Use of Minimum-Adder Multiplier Blocks in FIR Digital Filters,” IEEE Transactions on Circuits and Systems II, Vol. 42, No. 9, pp. 569-577, September 1995.
[56] R. Hashemian, “A New Method for Conversion of a 2's Complement to Canonic Sign Digit Number System and Its Representation,” Proceedings of the Asilomar Conference on Signals, Systems and
Computers, pp. 904-907, 1997.
[57] K. Dejhan, P. Tooprakai, T. Rerkmaneewan, and C. Soonyeekan, “A High-Speed Direct Bootstrapped
CMOS Schmitt Trigger Circuit,” Proceedings of the IEEE International Conference on Semiconductor
Electronics (ICSE 2004), 7-9 December 2004.
[58] O.H. Schmitt to H.R. Lang, Nov. 25, 1937, O.H. Schmitt Papers, University Archives, University
of Minnesota, Minneapolis, MN, Box SF114.
[59] O. H. Schmitt, “A Thermionic Trigger,” Journal of Scientific Instruments, Vol. 15, No. 1, pp. 24-26.
[60] B. Hart, “Picturing Schmitt's Trigger,” Electronics World, Vol. 105, No. 1764, pp. 1040-1046, 1999.
[61] P. R. Gray and R. G. Meyer, Analysis and Design of Analog Integrated Circuits, 2nd edition.
New York: Wiley, 1984.
[62] P. Sotiriadis and A. Chandrakasan, “Low Power Coding Techniques Considering Inter-Wire Capacitances,” Proceedings of the IEEE Custom Integrated Circuits Conference (CICC00), pp. 507-510,
2000.
[63] M. R. Stan and W. P. Burleson, “Bus Invert Coding for Low Power I/O,” IEEE Transactions on VLSI
Systems, pp. 49-58, March 1995.
[64] P. Sotiriadis, “Interconnect Modeling and Optimization in Deep Submicron Technologies,” Dissertation Thesis, Massachusetts Institute of Technology, May 2002.
[65] Y. Zhang et al., “Odd/Even Bus Invert with Two Phase Transfer for Buses with Coupling,” Proceedings of ISLPED 02, pp. 80-83, August 12-14, Monterey, CA, USA.
[66] J. Natesan and D. Radhakrishnan, “Shift Invert Coding (SINV) for Low Power VLSI,”
Proceedings of the EUROMICRO Symposium on Digital System Design (DSD04), pp. 190-194, 2004.
[67] J. V. R. Ravindra, K. S. Sainarayanan, and M. B. Srinivas, “A Novel Bus Coding Technique for Low Power
Data Transmission,” Proceedings of the IEEE Symposium on VLSI Design and Test (VDAT-2005), pp. 263-266,
August 2005.
[68] P. Ghosh, R. Mangaser, and K. Rose, “Interconnect-Dominated VLSI Design,” Proceedings of the Conference on Advanced Research, pp. 114-122, March 1999.
[69] Semiconductor Industry Association, The National Technology Roadmap for Semiconductors 1997 Edition.
[70] M. T. Bohr, Proc. IEEE International Electron Devices Meeting, p. 241, 1995.
[71] E. E. Davidson, IEEE Micro, 18/4, p. 33, 1998.