
A Novel Approach to Reduce Delay and Power in VLSI Interconnects

2010
A Novel Approach to Reduce Delay and Power in VLSI Interconnects
Submitted in partial fulfillment of the requirements for the degree of Master of Science (by Research) in Electronics and Communication Engineering
by
Sandeep Saini
<saini sandeep@research.iiit.ac.in>
http://web.iiit.ac.in/saini sandeep
Under Guidance of Dr M. B. Srinivas
Centre for VLSI and Embedded System Technologies
International Institute of Information Technology
Hyderabad, INDIA
May, 2010
© Copyright by Sandeep Saini, 2009
INTERNATIONAL INSTITUTE OF INFORMATION TECHNOLOGY
Hyderabad, India

CERTIFICATE

It is certified that the work contained in this thesis, titled "A Novel Approach to Reduce Delay and Power in VLSI Interconnects" by Sandeep Saini, has been carried out under my supervision and is not submitted elsewhere for a degree.

Date
Advisor: M B Srinivas

To my Parents

Acknowledgement

I am greatly indebted to my advisor, Dr. M. B. Srinivas. Sir, I could not have realized my potential without your invaluable guidance, consistent encouragement and emphasis on the quality of the research contribution. Professor Srinivas is a wonderful teacher and person to work with, and he has shared his profound knowledge and his professional manner of conducting research. I am very thankful to him for all the time he devoted to scientific discussions with me, as well as for his constant encouragement.

Special thanks to J. V. R. Ravindra and Srihari for the brainstorming sessions we had, and for guiding me in the right direction from the very beginning of my research work. I very much appreciate their invaluable assistance.

To my lab mates through Bachelors and Masters, I owe big thanks for the fun-centered atmosphere in CVEST and OBH. I have been fortunate enough to meet Gaurav, Maneesh, Khosla, Bajaj, Rishi, Manan, Handa, Ramavtar, Sumit, Bhatt, Anshul, Bharat, Avinash, Kashi, Mohit, Abheet and Gopal. All CVEST lab mates were equally supportive.
Finally, and most importantly, this thesis is dedicated to my parents, whose unconditional love and support I have enjoyed throughout my life.

Abstract

Interconnects play a major role in deep submicron (DSM) technologies such as 90nm and below. In earlier technologies gate delay dominated interconnect delay; this is no longer the case, and the delay associated with interconnects is becoming increasingly important. This is because in DSM technologies an interconnect can no longer be treated as a simple resistor: its associated parasitics, such as capacitance and inductance, must also be considered. Any signal propagating through such an interconnect is therefore delayed. Buffer insertion is one popular technique to reduce this delay. In this technique, buffers are placed at regular intervals along an interconnect to restore the signal each time it is degraded by the parasitics. However, buffers have a finite switching time of their own, so a large number of buffers along an interconnect adds to the overall propagation delay. Buffer switching also contributes to power dissipation. Further, in DSM technologies leakage power is a major problem, and buffers may consume power even when they are not switching. There is thus an urgent need for techniques that reduce the overall delay while also consuming less power, both dynamic and static. In this thesis, the Schmitt trigger is examined as an alternative to the buffer for reducing delay and power in interconnects. The most favorable feature of the Schmitt trigger is its adjustable threshold voltage: because it can be controlled, the threshold can be chosen to be above or below Vdd/2, the voltage at which a buffer normally switches. A Schmitt trigger can thus be designed to switch earlier than a buffer, leading to a reduction in delay.
Further, the adjustable low-voltage threshold of the Schmitt trigger tolerates more noise and voltage glitches than a buffer does. The proposed approach is first implemented for linear interconnects of various lengths and then for buses, which are groups of interconnects. It is shown that the proposed approach outperforms buffer insertion in terms of delay, power and crosstalk noise reduction.

List of Publications

1. Sandeep Saini, A. Mahesh Kumar, Sreehari Veeramachaneni, M. B. Srinivas, "Alternative approach to Buffer Insertion for Delay and Power Reduction in VLSI Interconnects", accepted in Journal of Low Power Electronics, to be published by American Scientific Publishers.
2. Sandeep Saini, A. Mahesh Kumar, Sreehari Veeramachaneni, M. B. Srinivas, "Schmitt trigger as an alternate to buffer for delay reduction in on chip buses", TENCON 2009, 23rd to 26th Nov 2009, Singapore, pages 1-5.
3. Sandeep Saini, A. Mahesh Kumar, Sreehari Veeramachaneni, M. B. Srinivas, "Alternative approach to Buffer Insertion for Delay and Power Reduction in VLSI Interconnects", VLSI Design 2010, 3rd to 7th January 2010, Bangalore, pages 411-416.

Contents

1 Introduction
1.1 Objective
1.2 Motivation
1.3 Literature Survey
1.3.1 Need for a better approach
1.4 Contribution of the Thesis
1.5 Organization of the Thesis
2 Introduction to Interconnects
2.1 Design Flows for DSM ASICs
2.2 Interconnect Design Criteria
2.2.1 Delay
2.2.2 Power Dissipation
2.2.3 Noise
2.2.4 Physical Area
2.3 Interconnect Characteristics
2.3.1 Resistance
2.3.1.1 Diffusion barrier
2.3.1.2 Surface and grain boundary scattering
2.3.1.3 Temperature effect
2.3.1.4 High frequency effects
2.3.2 Capacitance
2.3.3 Inductance
2.3.3.1 Partial inductance
2.3.3.2 Loop-based inductance
2.3.3.3 High frequency effects
2.4 Interconnect Models
2.4.1 Single Interconnect
2.4.1.1 Lumped models
2.4.1.2 Distributed models
2.4.1.3 Lumped representation of distributed interconnects
2.4.1.4 Modeling frequency dependent effects
2.4.2 Parallel Coupled Interconnects
2.5 Design Methodologies for Interconnect
2.5.1 Constructing an Interconnect Tree
2.5.2 Wire Sizing, Shaping, and Spacing
2.5.3 Repeater Insertion
2.5.4 Shielding Techniques
2.5.5 Net-Ordering and Wire Swizzling
3 Buffer Insertion
3.1 Background
3.2 Repeater / buffer insertion process: An overview
3.3 Propagation delay
3.4 Power dissipation
3.4.1 Short-circuit power dissipation
3.4.2 Dynamic power dissipation
3.4.3 Total power dissipation
3.5 Area of the repeater system
3.6 Design criteria for interconnect within a repeater system
3.6.1 Constrained systems
3.6.2 Unconstrained systems
3.6.2.1 Power-delay-product design criterion
3.6.2.2 Power-delay-area-product design criterion
3.7 Application of interconnect design methodology
3.8 Need for a better approach
4 Schmitt Trigger as an alternate to Buffer
4.1 Classical Schmitt Trigger
4.2 Hysteresis in Schmitt Trigger
4.3 CMOS Schmitt Trigger
4.4 Low Voltage Schmitt Trigger
4.5 CMOS buffer
4.6 Schmitt trigger as an alternate to buffer Insertion
4.7 Conclusions
5 Results and Discussion
5.1 NTRS 1997 predictions
5.2 Signal Propagation on a Linear Interconnect
5.2.1 Types of interconnects
5.3 Effect of Buffer Insertion on Delay, Noise and Power Reduction
5.3.1 Delay Reduction using Buffer Insertion
5.3.2 Noise and Power reduction using Buffer Insertion
5.4 Effect of Schmitt trigger on delay, noise and power reduction in Linear Interconnects
5.4.1 Delay reductions with Schmitt trigger
5.4.2 Noise and power reduction with Schmitt trigger approach
5.5 Replacement of Buffers in Buses
5.5.1 Signal Propagation in Buses
5.5.2 Definitions and Related Work
5.5.2.1 Low Power Coding
5.5.2.2 Crosstalk Avoidance Coding
5.5.2.3 Error Control Coding
5.5.2.4 CAC coding Schemes
5.5.2.5 Relationship between delay and crosstalk
5.5.2.6 Interconnect Power Model
5.5.3 Comparison with existing bus coding technique
5.6 Conclusion
6 Conclusions and Future Work
6.1 Scope of further work
Bibliography

List of Figures

1.1 The waveform for an 8 bit wide 1 mm long bus at 65nm technology
1.2 Percentage of nets requiring buffers. M3 and M6 represent nets on third and sixth metal layer in a six metal layer technology.
1.3 Buffers as a percentage of the total cell count for the chip.
1.4 Hysteresis in Schmitt trigger.
2.1 A conventional ASIC design flow.
2.2 A data path in a synchronous digital system
2.3 Components of dynamic power dissipation due to different capacitance sources: gate capacitance, diffusion capacitance, and interconnect capacitance.
2.4 Interconnect coupling noise.
2.5 Cross section of an on-chip copper interconnect.
2.6 Current distribution in the cross section of an interconnect at high frequencies. Darker color indicates higher current density.
2.7 Skin depth of Cu as a function of frequency.
2.8 Current distributions in the cross section of two parallel wires at high frequencies due to the proximity effect.
2.9 Lumped interconnect models.
2.10 Circuit models of transmission lines.
2.11 Modeling frequency dependent impedance with lumped elements.
2.12 Decoupling multiple parallel coupled interconnects.
2.13 An example of an A-tree.
2.14 Shaping interconnect to minimize delay.
2.15 Staggering repeaters to reduce the worst case delay and crosstalk noise.
2.16 Buffered interconnect tree.
2.17 Examples of net-ordering and wire swizzling.
3.1 Comparisons of interconnect delay to gate delay
3.2 Minimum signal propagation delay and transient power dissipation as a function of line width for a repeater system.
3.3 Uniform repeater system driving a distributed RC interconnect.
3.4 Wire sizing in a repeater insertion system
3.5 Optimum number of repeaters for minimum propagation delay for different line widths.
3.6 Optimum repeater size for minimum propagation delay for different line widths.
3.7 Minimum signal propagation delay as a function of interconnect width (l=5mm).
3.8 Minimum signal delay as a function of interconnect width for different line lengths.
3.9 Dynamic power dissipation as a function of interconnect width for l=20 mm.
3.10 Total transient power dissipation as a function of interconnect width.
3.11 Interconnect area as a function of interconnect width for different line lengths.
3.12 Total area of the repeaters as a function of the interconnect width for different line lengths.
3.13 Product of interconnect and transistor area as a function of the interconnect width for different line lengths.
4.1 Schmitt trigger implementation with comparator
4.2 Hysteresis in conventional Schmitt trigger.
4.4 N-subcircuit driven by a voltage source: (a) circuit; (b) current-voltage characteristic; (c) superposition of N- and P-subcircuit characteristics.
4.5 1 V CMOS Schmitt trigger circuit
4.6 0.4 V CMOS Schmitt trigger circuit derived from 1 V Schmitt trigger
4.7 Measured hysteresis characteristics of 0.4 V CMOS Schmitt trigger circuit, and measured input-output waveform characteristics: (a) measured hysteresis characteristic of 0.4 V CMOS Schmitt trigger circuit; (b) measured input-output (Vin-Vout2) waveform characteristics
4.8 CMOS buffer.
4.9 4 bit bus with buffers to restore signals.
5.1 An RC interconnect
5.2 Interconnect structure used for simulations
5.3 Output end signals on a 2mm, 5mm and 10mm RC interconnect at 180nm technology.
5.4 Output end signals on a 2mm, 5mm and 10mm RLC interconnect at 180nm technology.
5.5 Output end signals on a 2mm, 5mm and 10mm RLC interconnect at 90nm technology.
5.6 Output end signals on a 2mm, 5mm and 10mm RLC interconnect at 65nm technology.
5.7 Buffers inserted in an RLC interconnect.
5.8 Delay reduction in 2mm interconnect with triangular input.
5.9 Delay reduction in 2mm interconnect with square wave input.
5.10 Delay reduction in 5mm interconnect with square wave input.
5.11 Delay reduction in 10mm interconnect with square wave input.
5.14 Delay reduction using Schmitt trigger approach in 2mm interconnect with square wave input.
5.15 Delay reduction using Schmitt trigger approach in 5mm interconnect with square wave input.
5.16 Delay reduction using Schmitt trigger approach in 10mm interconnect with square wave input.
5.17 Noise reduction using Schmitt trigger
5.18 Behavior of buffer and Schmitt trigger towards a noisy signal.
5.19 Data transfer on an 8 bit data bus.
5.20 Data transfer on an 8 bit data bus in 65nm technology.
5.21 A 3 Bit to 4 wire coder
5.22 Transition Probability Graph
5.23 Example of Transition Probability Graph
5.24 Data signals rectified using Schmitt trigger approach in an 8 bit data bus.

List of Tables

3.1 Uniform repeater system for different optimization criteria
5.1 Projected advances in CMOS chip performance
5.2 Interconnect dimensions
5.3 Interconnect Resistance, Inductance and Capacitance values
5.4 Propagation delay values for an interconnect of different length with and without buffer insertion
5.5 Power consumption values for an interconnect of different length with and without buffer insertion approach.
5.6 Propagation delay values for an interconnect of different length with buffer insertion and delay reduction using Schmitt trigger approach
5.7 Power consumption values for an interconnect of different length with buffer insertion and reduction using Schmitt trigger approach.
5.8 Delay and Crosstalk Classes for various 3-bit combinations (transitions)
5.9 Propagation delay values for 8 bit buses of different length with buffer insertion and delay reduction using Schmitt trigger approach
5.10 Power consumption values for 8 bit buses of different length with buffer insertion and reduction using Schmitt trigger approach.

Abbreviations

VLSI: Very Large Scale Integration
DSM: Deep Sub Micron
ASIC: Application-Specific Integrated Circuit
RTL: Register Transfer Level
VHDL: VHSIC Hardware Description Language
CMOS: Complementary Metal-Oxide-Semiconductor
PTM: Predictive Technology Model
PDP: Power-Delay-Product
MOSFET: Metal-Oxide-Semiconductor Field-Effect Transistor
SA: Switching Activity
TA: Transition Activity
MCF: Miller's Coupling Factor

Chapter 1 Introduction

1.1 Objective

In deep submicron (DSM) technologies, interconnects no longer behave as simple resistors; their associated parasitics, such as capacitance and inductance, must also be taken into account. As the interconnect length increases linearly, both the interconnect capacitance (C) and the interconnect resistance (R) increase linearly, so the RC delay increases quadratically. Although the RC delay is not a precise measure of the time necessary for a signal to propagate through a wire, the total RC delay of a section of a line is useful as a figure of merit. To increase the operating speed of an integrated circuit, it is necessary to reduce the RC delay. In addition to increased signal propagation delay, increased power dissipation is another effect of large interconnect impedance. The total RC delay of an interconnect line can be reduced drastically by inserting signal amplifiers known as repeaters. In CMOS technology, the simplest form of a repeater is a two-transistor inverter. However, as discussed in Chapter 3, buffer insertion is becoming an increasingly bulky technique in DSM technologies, which calls for a different approach.
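The quadratic growth of the RC figure of merit with length, and the way cutting a line into equal sections reduces it, can be checked with a few lines of arithmetic. The sketch below is a minimal Python illustration that assumes a uniform wire whose resistance and capacitance are both proportional to length. The function names are hypothetical, and the delay of the repeaters that would drive each section is deliberately left out. The numbers reuse the 5-unit resistance and 10-unit capacitance wire from the example in Section 1.2.

```python
def rc_figure_of_merit(r_per_len: float, c_per_len: float, length: float) -> float:
    """Lumped R*C product of a uniform wire; both R and C scale with length."""
    return (r_per_len * length) * (c_per_len * length)


def segmented_rc(r_total: float, c_total: float, sections: int) -> float:
    """Sum of the R*C products of `sections` equal pieces of one wire.

    Repeater delay is ignored, so this shows only the wire contribution.
    """
    r_sec = r_total / sections
    c_sec = c_total / sections
    return sections * r_sec * c_sec


if __name__ == "__main__":
    # Doubling the length quadruples the RC figure of merit.
    print(rc_figure_of_merit(1.0, 1.0, 2.0) / rc_figure_of_merit(1.0, 1.0, 1.0))  # 4.0
    # 5 units of R and 10 units of C, unsegmented and cut into five sections.
    print(segmented_rc(5.0, 10.0, 1))  # 50.0
    print(segmented_rc(5.0, 10.0, 5))  # 10.0
```

Cutting the wire into k equal sections divides the summed wire RC by k, which is why repeaters pay off whenever their own added delay is smaller than the RC they remove.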
The objective of the thesis is to develop an alternative to buffer insertion for delay, power and noise reduction in VLSI interconnects in DSM technologies.

1.2 Motivation

With the continuous scaling of Very Large Scale Integration (VLSI) technology and increasing operating frequencies, interconnect delay has become a significant bottleneck in system performance [1, 2]. This trend results from the increased resistance, capacitance and inductance of interconnects as feature sizes enter the nanometer era. According to projections of the International Technology Roadmap for Semiconductors (ITRS), interconnect delay can contribute more than 50% of the total delay for feature sizes below 180 nm [3, 4]. As a result, delay optimization techniques for interconnects are increasingly important for achieving timing closure in high-performance designs.

Figure 1.1 The waveform for an 8 bit wide 1 mm long bus at 65nm technology

Signals on an interconnect are strongly distorted by propagation delay and by coupling to adjacent lines. This effect is shown in Figure 1.1 for a group of 8 interconnects, each 1 mm long, laid side by side at 65nm technology. Each signal shows not only a visible propagation delay but also significant noise glitches caused by switching signals on adjacent lines. Hence, along with power and delay, noise cancellation is an important consideration when developing a technique for better signal transmission.

Reduction of delay and power consumption is the main motivation behind the repeater/buffer insertion technique. In this technique, a long interconnect is broken into shorter pieces that are joined by CMOS buffers. For example, assume a long interconnect has 5 units of resistance and 10 units of capacitance. The total RC delay would be 50 units.
However, if five repeaters are inserted so that the interconnect is broken into five equal sections, each with 1 unit of resistance and 2 units of capacitance, the RC delay becomes 1 × 2 + 1 × 2 + 1 × 2 + 1 × 2 + 1 × 2 = 10 units. If the total delay of the five repeaters is less than 40 units, inserting the CMOS repeaters yields a net speed benefit. This is the manner in which the problem has traditionally been approached.

1.3 Literature Survey

The objective of buffer insertion is to determine where buffers should be inserted in an interconnect so that the timing requirements are met. Since the propagation (Elmore) delay of an RC interconnect line depends quadratically on its length, subdividing the line into shorter sections is an effective strategy to reduce the total propagation delay [6]. The interconnect can be subdivided by inserting repeaters, which break the quadratic dependence of the delay on the interconnect length but add the parasitic impedances of the repeaters themselves. Thus, there exists an optimum number and size of repeaters that minimizes the total propagation delay of the line [6].

Buffer insertion for a single net or interconnect tree is a well-researched problem. In 1991, van Ginneken [9] proposed a dynamic programming algorithm that maximizes the slack of a net with a time complexity of O(n^2). His algorithm has since become a classic in this field, and a substantial body of research builds on it. The work in [14] suggested a wire segmenting algorithm, used as a precursor to van Ginneken's algorithm, that results in faster run time. Lillis et al. [13] extended the framework to minimize buffer cost while satisfying the timing requirements. Li et al. [15] improved the time bound of van Ginneken's algorithm to O(n log n). The authors of [16] proved that optimizing the total cost given arbitrary buffer costs is NP-hard, and also suggested techniques to improve the efficiency of Lillis' algorithm.
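To make the dynamic-programming idea concrete, the following is a simplified Python sketch of van Ginneken-style buffer insertion for the special case of a single two-pin line. It assumes the Elmore delay model, a single buffer type, and candidate buffer locations at every internal node of a uniformly segmented wire. All names and parameter values are hypothetical. It illustrates candidate propagation and dominance pruning only, and is not the full tree algorithm of [9].

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Candidate:
    cap: float                # capacitance seen looking downstream from this node (F)
    rat: float                # required arrival time at this node (s)
    buffers: Tuple[int, ...]  # internal nodes holding a buffer in this partial solution


def prune(cands: List[Candidate]) -> List[Candidate]:
    """Drop candidates dominated by one with no more capacitance and no less required time."""
    cands = sorted(cands, key=lambda c: (c.cap, -c.rat))
    kept: List[Candidate] = []
    for c in cands:
        if not kept or c.rat > kept[-1].rat:
            kept.append(c)
    return kept


def buffer_line(n_seg, r_seg, c_seg, c_load, rat_sink,
                r_drv, r_buf, c_buf, t_buf) -> Tuple[float, Tuple[int, ...]]:
    """Return (best slack at the driver, buffer nodes) for n_seg equal wire segments.

    Node 0 is the driver, node n_seg is the sink; buffers may go at nodes 1..n_seg-1.
    """
    cands = [Candidate(c_load, rat_sink, ())]
    for node in range(n_seg - 1, -1, -1):
        # Walk back across segment (node, node + 1): Elmore delay of a pi-segment.
        cands = [Candidate(c.cap + c_seg, c.rat - r_seg * (c_seg / 2 + c.cap), c.buffers)
                 for c in cands]
        if node > 0:
            # Option: a buffer here hides all downstream capacitance behind c_buf.
            best = max(cands, key=lambda c: c.rat - t_buf - r_buf * c.cap)
            buffered = Candidate(c_buf, best.rat - t_buf - r_buf * best.cap,
                                 (node,) + best.buffers)
            cands = prune(cands + [buffered])
    # The driver resistance charges whatever capacitance remains at the source.
    best = max(cands, key=lambda c: c.rat - r_drv * c.cap)
    return best.rat - r_drv * best.cap, best.buffers


if __name__ == "__main__":
    # Hypothetical values: 10 segments of 100 ohm / 100 fF, 10 fF sink load.
    slack, bufs = buffer_line(10, 100.0, 100e-15, 10e-15, 0.0,
                              100.0, 100.0, 10e-15, 20e-12)
    print(f"slack = {slack * 1e12:.1f} ps, buffers at nodes {bufs}")
```

For a short line, the result can be checked against exhaustive enumeration of all buffer placements. The pruning step is what keeps the candidate list small as the number of segments grows.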
Previous researchers [10, 11, 12] have taken other approaches to solve variants of the buffer insertion problem, such as simultaneous routing, simultaneous gate sizing, and the inclusion of slew and signal integrity constraints.

Figure 1.3 Buffers as a percentage of the total cell count for the chip.

1.3.1 Need for a better approach

Buffer insertion is a very effective approach for delay reduction. However, as the above section makes clear, buffer insertion is becoming a major problem in every new deep submicron technology generation, both because of the number of buffers required and because buffers are now a major source of power dissipation. A trade-off between delay and power consumption is therefore required, and there is a need for a new approach that reduces delay while also consuming less power.

The Schmitt trigger is a special logic element adapted to work with analog input signals. Its primary purpose is to restore the shape of digital signals, so it can replace a buffer as far as signal restoration is concerned. Because of transmission line effects, a digital signal is transformed from a square wave into a trapezoid, a triangle or a more complex waveform, and during transmission it also becomes noisy and distorted. A Schmitt trigger is a comparator circuit with internal positive feedback, which results in hysteresis and a memory effect. Unlike simple logic elements, a Schmitt trigger has two threshold levels, U1 and U2; between these thresholds the output state does not change, which is what is called hysteresis. This effect stabilizes the output against rapid triggering by noise. The benefit of a Schmitt trigger over a circuit with only a single input threshold (such as a buffer) is its greater stability (noise immunity). With only one input threshold, a noisy input signal near that threshold could cause the output to switch rapidly back and forth from noise alone.
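The difference between one threshold and two can be illustrated with a purely behavioural Python sketch. The input is a hypothetical slow 0 to 1 V ramp with a sinusoidal ripple standing in for coupled noise. A buffer-like receiver switching at 0.5 V is compared with an idealized Schmitt trigger whose thresholds are assumed to be 0.4 V and 0.6 V. This is not a transistor-level model of the CMOS Schmitt trigger, and the threshold values are illustrative assumptions.

```python
import math
from typing import List


def noisy_ramp(n: int = 1000, vdd: float = 1.0, ripple: float = 0.05,
               cycles: int = 40) -> List[float]:
    """Slow 0 -> vdd ramp plus a sinusoidal ripple standing in for coupled noise."""
    return [vdd * k / (n - 1) + ripple * math.sin(2 * math.pi * cycles * k / (n - 1))
            for k in range(n)]


def single_threshold(v: List[float], vth: float = 0.5) -> List[int]:
    """Buffer-like receiver: output is 1 whenever the input is above one threshold."""
    return [1 if x > vth else 0 for x in v]


def schmitt(v: List[float], vt_low: float = 0.4, vt_high: float = 0.6,
            state: int = 0) -> List[int]:
    """Two-threshold receiver: goes high above vt_high, low below vt_low, else holds."""
    out = []
    for x in v:
        if state == 0 and x > vt_high:
            state = 1
        elif state == 1 and x < vt_low:
            state = 0
        out.append(state)
    return out


def transitions(bits: List[int]) -> int:
    """Count output edges."""
    return sum(1 for a, b in zip(bits, bits[1:]) if a != b)


if __name__ == "__main__":
    v = noisy_ramp()
    print("single-threshold edges:", transitions(single_threshold(v)))
    print("Schmitt trigger edges:", transitions(schmitt(v)))
```

Because the ripple amplitude (0.05 V) is smaller than half the hysteresis window (0.1 V), the two-threshold receiver produces exactly one output edge, while the single-threshold receiver toggles several times as the ramp passes 0.5 V.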
With a Schmitt trigger, a noisy input signal near one threshold can cause only one switch in the output value, after which the input would have to move beyond the other threshold to cause another switch. A Schmitt trigger can easily be implemented with 6 CMOS transistors. This implementation provides more noise reduction and an earlier rise and fall of the output signal, which also reduces propagation delay. Thus, if buffers in interconnects are replaced with Schmitt triggers, greater noise, delay and power reduction is expected. In this thesis, the advantages of using a Schmitt trigger in place of a buffer in an interconnect are shown in detail.

Figure 1.4 Hysteresis in Schmitt trigger.

1.4 Contribution of the Thesis

Delay and noise are two equally important factors in DSM technology. For signal restoration, and to handle on-chip delay and noise, the buffer insertion technique has been modified so that Schmitt triggers replace buffers in VLSI interconnects at all possible nodes. In a Schmitt trigger the threshold voltage of the device can be adjusted; if it is set low, a rising signal is detected earlier, resulting in less propagation delay. The results of this replacement approach for linear interconnects of various lengths are compared across all the technology nodes studied, and they show that the proposed technique performs better at every one of these nodes. Since the Schmitt trigger has two thresholds, it also provides better noise immunity to the circuit, and better results are observed when the noise reduction of buffer insertion and of the Schmitt trigger approach are compared. The same replacement approach is proposed for data buses as an alternative to bus coding techniques for delay, crosstalk noise and power reduction. It has been compared with some of the existing bus coding techniques and found to be better than them.
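Why moving the switching threshold away from Vdd/2 can shorten delay is easiest to see in a first-order model. The Python sketch below assumes the receiver input follows a single-pole RC step response with time constant tau. It ignores the receiver's own intrinsic delay and the ringing of RLC lines, and the 100 ps time constant is a hypothetical value. It only computes when the input waveform crosses a given threshold.

```python
import math


def rise_crossing_time(tau: float, vth: float, vdd: float = 1.0) -> float:
    """Time for v(t) = vdd * (1 - exp(-t / tau)) to rise to vth (0 < vth < vdd)."""
    return -tau * math.log(1.0 - vth / vdd)


def fall_crossing_time(tau: float, vth: float, vdd: float = 1.0) -> float:
    """Time for v(t) = vdd * exp(-t / tau) to fall to vth (0 < vth < vdd)."""
    return -tau * math.log(vth / vdd)


if __name__ == "__main__":
    tau = 100e-12  # hypothetical 100 ps time constant at the receiver input
    for vth in (0.4, 0.5, 0.6):
        print(f"Vth = {vth:.1f} V: rise {rise_crossing_time(tau, vth) * 1e12:5.1f} ps, "
              f"fall {fall_crossing_time(tau, vth) * 1e12:5.1f} ps")
```

With tau = 100 ps, a rising input reaches 0.4 V after about 51 ps instead of about 69 ps at 0.5 V. On a falling edge the advantage goes to the higher threshold, so in this model which threshold helps depends on the direction of the transition.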
1.5 Organization of the Thesis

The rest of the chapters in this thesis are organized as follows:

• Chapter 2 describes interconnects, explaining the interconnect design criteria, their basic properties and the models used to represent them in circuits. Various existing models for interconnect design are discussed. The chapter then deals with the existing problems in interconnects, how they grow in upcoming technologies, the possible solutions and the effectiveness of these solutions.
• Chapter 3 introduces the conventional buffer insertion technique for signal restoration and delay reduction. The benefits of buffer insertion in linear interconnects and its use in delay and noise reduction are explained, along with various buffer insertion techniques from the literature. This chapter covers the basics of propagation delay, power dissipation and design criteria. It also shows how buffer insertion is becoming a bulky technique that will consume more and more resources in upcoming technologies. The limitations of buffer insertion in terms of area and power consumption are discussed at the end.
• Chapter 4 introduces the Schmitt trigger. Its history, invention and basic circuit implementation are discussed in the early sections of the chapter, followed by a detailed discussion of its implementation and working. CMOS Schmitt triggers are covered in the later sections. The benefits of the Schmitt trigger over the buffer for signal restoration and delay reduction are discussed at the end.
• Chapter 5 contains the simulation results for all types of interconnects, namely local, intermediate and global, with the existing as well as the proposed approaches. First the problems in interconnects are simulated, followed by the conventional solution of buffer insertion. Simulations are done for the 180nm to 65nm nodes using PTM parameters with the HSPICE tool.
Simulations are based on the following criteria: propagation delay, noise reduction and power reduction.
• Chapter 6 draws the conclusions of the thesis.

Chapter 2 Introduction to Interconnects

Due to the importance of interconnects in current and future ICs, significant research has been published over the past several decades, covering areas such as parasitic extraction, interconnect models and interconnect design methodologies. This chapter provides a brief review of the background of on-chip electrical interconnects. In Section 2.1, a typical design flow for application-specific integrated circuits (ASICs) is described, and the challenges that interconnect-dominated behavior poses in DSM technologies are discussed. In Section 2.2, the design criteria that need to be considered during interconnect design are described. The impedance characteristics of interconnects, specifically the resistance, capacitance and inductance, are presented in Section 2.3. Interconnect models and design methodologies are reviewed in Sections 2.4 and 2.5, respectively. Finally, some conclusions are offered in Section 2.6.

2.1 Design Flows for DSM ASICs

A conventional design flow for ASICs is shown in Fig. 2.1 [19]. A typical design process can be divided into two steps: functional design (front-end) and physical design (back-end). The functional design phase includes functional specification, VHDL/Verilog coding at the register transfer level (RTL), and logic synthesis, which produces a gate-level netlist. The back-end physical design process converts the gate-level netlist into a layout, including floorplanning, module placement and interconnect routing. Parasitic impedances are extracted from the physical layout, and a post-layout timing analysis tool is used to detect any timing violations.
Necessary corrections are made in the physical layout or gate-level netlist to fix these violations. This design flow is successful for technologies in which gate delays dominate, since the timing of the circuits is then determined by the gate types and loads. The interconnect parasitic impedances typically produce only a few timing violations in a medium-speed application, which keeps the design flow efficient. As interconnect becomes increasingly important, however, the interconnect delay needs to be considered during the functional design process. Due to the lack of placement and routing information, the interconnect delay is approximated with statistical, fan-out based wire load models. A circuit designed with these inaccurate delay models can produce a large number of timing violations, and design iterations are usually required to achieve timing closure. One way to alleviate this problem is to introduce physical information earlier, at the logic synthesis stage. An initial floorplan is created before synthesis to estimate the cell locations and the interconnect lengths. A timing model based on this estimate is significantly more accurate, making the synthesis process more efficient and resulting in a placed gate-level netlist. This synthesis procedure is called physical synthesis. In the DSM regime, the functional and physical design processes are no longer separate, requiring tight integration of the front-end and back-end design processes.

Figure 2.1 A conventional ASIC design flow.

Interconnect plays an important role in both the physical synthesis and timing verification stages of the design flow, and the requirements placed on interconnect analysis differ between these two stages. During synthesis, since detailed routing information is not available, high efficiency with reasonable accuracy is preferred, as provided by closed-form models.
In the post-layout verification stage, realistic timing information describing the entire IC is determined, requiring both high efficiency and high accuracy.

2.2 Interconnect Design Criteria

Since interconnect has become a dominant issue in high performance ICs, the focus of the circuit design process has shifted from logic optimization to interconnect optimization. Multiple criteria should be considered during interconnect design, such as delay, power dissipation, noise, bandwidth and physical area. These criteria are discussed individually in the following subsections.

2.2.1 Delay

Interconnect delay is a primary design criterion because of its close relationship to circuit speed. Early interconnect design methodologies [20] focused primarily on delay optimization. A typical data path in a synchronous digital circuit is shown in Fig. 2.2.

Figure 2.2 A data path in a synchronous digital system.

In the case of zero clock skew, the minimum allowable clock period is

$$T_{P,\min} = T_{C\text{-}Q} + T_{int} + T_{logic,\max} + T_{setup} \tag{2.1}$$

where $T_{C\text{-}Q}$ is the time required for the data to leave the initial register after the clock signal arrives, $T_{int}$ is the interconnect delay, $T_{logic,\max}$ is the maximum logic gate delay, and $T_{setup}$ is the required setup time of the receiving register. From (2.1), reducing $T_{int}$ decreases the clock period and thus increases the overall clock frequency of the circuit (assuming the data path is a critical path). In advanced microprocessors, multiple computational cores can be fabricated on the same die [17]. Communication among these cores and on-chip memories generally requires multiple clock cycles, and a computational core sometimes enters an idle state while waiting for data or control signals from other regions of the IC. The computational resources of these cores, therefore, cannot be used efficiently because of the large amount of multi-cycle communication.
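As a small worked example of (2.1), the sketch below computes the minimum clock period and the corresponding maximum frequency for two values of the interconnect delay. The path delays are hypothetical numbers in picoseconds, chosen only to show how a reduction in $T_{int}$ propagates directly into the clock period.

```python
def min_clock_period(t_cq, t_int, t_logic_max, t_setup):
    """Minimum clock period from (2.1), assuming zero clock skew."""
    return t_cq + t_int + t_logic_max + t_setup

# Hypothetical path delays in picoseconds (illustrative only).
t_cq, t_logic, t_setup = 50.0, 300.0, 40.0
for t_int in (400.0, 200.0):
    tp = min_clock_period(t_cq, t_int, t_logic, t_setup)
    f_ghz = 1e3 / tp  # 1 / (period in ps), expressed in GHz
    print(f"T_int = {t_int:.0f} ps -> T_P,min = {tp:.0f} ps, f_max = {f_ghz:.2f} GHz")
```

With these numbers, halving $T_{int}$ from 400 ps to 200 ps shortens the period from 790 ps to 590 ps, raising the maximum frequency from about 1.27 GHz to about 1.69 GHz.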
By reducing the interconnect delay, the speed of the system, i.e., the computational efficiency of the cores, can be improved at the architecture level.

2.2.2 Power Dissipation

Due to higher clock frequencies and on-chip integration levels, power dissipation has increased significantly. The on-chip power dissipation of current state-of-the-art microprocessors is on the order of hundreds of watts, and the power density has exceeded that of a kitchen hot plate. Fig. 2.3 shows the components of dynamic power due to different capacitance sources for a state-of-the-art microprocessor [21]. The dynamic power due to the interconnect capacitance can exceed 50% of the total dynamic power. Furthermore, the repeaters and pipeline registers inserted in the interconnect introduce additional dynamic, leakage and short-circuit power. High power dissipation increases the packaging cost due to heating problems and shortens the battery life in portable applications. Power dissipation, therefore, is another important criterion in interconnect design.

Figure 2.3 Components of dynamic power dissipation due to different capacitance sources: gate capacitance, diffusion capacitance, and interconnect capacitance.

2.2.3 Noise

With interconnect scaling, the coupling capacitance between (and among) interconnects dominates the ground capacitance. Furthermore, inductive coupling has to be considered due to increasing signal frequencies, making coupling noise more significant (and more complicated). Noise induced by interconnect coupling can be classified into two categories, voltage level noise and delay uncertainty, as shown in Fig. 2.4. Noise may cause the circuit to malfunction if the noise level exceeds a certain threshold, thereby reducing yield.

Figure 2.4 Interconnect coupling noise.

In addition to coupling effects, delay uncertainty can
also be caused by other factors, such as process variations (in both the interconnects and the inserted repeaters or pipeline registers), temperature variations, and power/ground noise. Delay uncertainty is both spatially dependent (due to process variations) and temporally dependent (due to coupling, temperature variations, and power/ground noise). Timing margins are assigned to manage this delay uncertainty, which increases the clock period and reduces the overall performance of the circuits. When the delay uncertainty exceeds these margins, setup or hold violations may occur, reducing the yield.

2.2.4 Physical Area

With technology scaling, billions of transistors can now be integrated onto a single monolithic die, and the number of interconnects has therefore also increased significantly. The die size, however, is expected to remain approximately fixed for future technologies, as predicted in [18]. The number of metal layers, therefore, needs to be increased to provide sufficient metal resources for interconnect routing, which in turn increases the fabrication cost. Furthermore, buffers and pipeline registers inserted along the interconnects make the constraint on silicon area more stringent. The area criterion, therefore, should be considered in interconnect design processes such as wire sizing and repeater insertion.

2.3 Interconnect Characteristics

The impedance characteristics of on-chip interconnects include resistance, capacitance and inductance. These parameters can be extracted from the geometry of the interconnect structures, as described in the following subsections.

2.3.1 Resistance

For a conductor with a rectangular cross-section, the resistance is

$$R = \rho \frac{l}{WH} \tag{2.2}$$

where $\rho$ is the material resistivity, and $l$, $W$ and $H$ are the length, width and thickness of the interconnect, respectively. In present DSM CMOS technologies, copper has replaced aluminum as the primary interconnect material because of its lower resistivity. Due to the specialized processing and operating conditions of on-chip copper interconnects, certain non-ideal effects need to be considered, which cause the effective resistivity to deviate from the ideal bulk resistivity.

Figure 2.5 Cross section of an on-chip copper interconnect.

2.3.1.1 Diffusion barrier

For on-chip Cu interconnects, a thin and highly resistive barrier layer is built on three sides of the interconnect to prevent Cu from diffusing into the surrounding dielectric, as shown in Fig. 2.5. This barrier layer consumes part of the cross-sectional area allocated to the interconnect. The effective resistivity $\rho_b$ due to this barrier-induced reduction in cross-sectional area is

$$\rho_b = \frac{\rho_0}{1 - \dfrac{A_b}{WH}} \tag{2.3}$$

where $\rho_0$ is the bulk resistivity at a given temperature and $A_b$ is the cross-sectional area occupied by the barrier layer.

2.3.1.2 Surface and grain boundary scattering

When the dimensions of the interconnect are scaled deep into the DSM regime, the resistivity of the interconnect increases as the wire dimensions shrink. This behavior is due to surface and grain boundary scattering [22], as illustrated in Fig. 2.7. The electron mean free path $\lambda$ of copper is 42.1 nm at 0 °C [22]. When any dimension of the wire shrinks to the order of $\lambda$, the electrons experience more collisions at the surface, increasing the effective resistivity. A typical value of ρ for copper is 0.47 [22]. Note that in (2.6) and (2.7), only one-dimensional (thin film) surface scattering is considered. For thin wires with two-dimensional surface scattering, the effective resistivity is larger; a reduced k is used in [24] to account for this two-dimensional effect.
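Equations (2.2) and (2.3) can be combined to estimate how much a diffusion barrier raises the resistance of a narrow wire. The sketch below assumes a hypothetical 1 mm long, 100 nm wide, 200 nm thick wire with a uniform 5 nm barrier on the bottom and both sidewalls, and an approximate bulk Cu resistivity of 1.7e-8 Ω·m; these values are illustrative, not taken from a specific technology.

```python
def wire_resistance(rho, length, width, thickness):
    """DC resistance of a rectangular conductor, R = rho * l / (W * H), eq. (2.2)."""
    return rho * length / (width * thickness)

def barrier_resistivity(rho0, width, thickness, barrier_area):
    """Effective resistivity when a barrier of area A_b occupies part of the
    W * H cross-section, eq. (2.3): rho_b = rho0 / (1 - A_b / (W * H))."""
    core_fraction = 1.0 - barrier_area / (width * thickness)
    if core_fraction <= 0:
        raise ValueError("barrier occupies the whole cross-section")
    return rho0 / core_fraction

# Hypothetical geometry and material values (illustrative only).
rho_cu = 1.7e-8                  # ohm*m, approximate bulk Cu resistivity
W, H, L = 100e-9, 200e-9, 1e-3   # width, thickness, length in metres
t_b = 5e-9                       # barrier thickness on bottom and both sidewalls
A_b = W * H - (W - 2 * t_b) * (H - t_b)   # cross-section lost to the barrier

rho_eff = barrier_resistivity(rho_cu, W, H, A_b)
print(f"R with bulk rho:   {wire_resistance(rho_cu, L, W, H):.0f} ohm")
print(f"R with barrier:    {wire_resistance(rho_eff, L, W, H):.0f} ohm")
```

With these numbers the barrier removes about 12% of the cross-section, raising the resistance from about 850 Ω to about 969 Ω. Equivalently, the effective resistivity applied to the full $WH$ area gives the same result as the bulk resistivity applied to the copper core alone.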
2.3.1.3 Temperature effect

The resistivity of copper increases approximately linearly with temperature and can be characterized as

$$\rho_T = \rho_0 (1 + \beta \Delta T) \tag{2.4}$$

where $\beta$ is the temperature coefficient of resistivity (TCR) and $\Delta T$ is the difference in temperature from a reference temperature. Since the electron mean free path $\lambda$ decreases with increasing temperature, k increases, resulting in a smaller ratio $\rho_s/\rho_0$. The TCR for thin-film interconnect, therefore, is smaller than that of bulk Cu [23].

2.3.1.4 High frequency effects

At sufficiently high frequencies, the current density in an interconnect is no longer uniform, as shown in Fig. 2.6; the current tends to flow near the surface of the interconnect. This phenomenon is called the skin effect [25]. The effective cross-sectional area of the interconnect is reduced, thereby increasing the interconnect resistance. The skin depth is the distance below the conductor surface at which the current density drops to 1/e of its value at the surface, and is given by

$$\delta(f) = \sqrt{\frac{\rho}{\pi \mu f}} \tag{2.5}$$

where $\mu$ is the permeability of the conductor. Expression (2.2) characterizes the DC resistance and is no longer accurate when $\delta$ is smaller than the cross-sectional dimensions of the wire. The skin depth of bulk Cu at 20 °C is shown as a function of frequency in Fig. 2.7. As the frequency increases to tens of GHz, the skin depth enters the DSM region and decreases slowly.

Figure 2.6 Current distribution in the cross section of an interconnect at high frequencies. Darker color indicates higher current density.

Figure 2.7 Skin depth of Cu as a function of frequency.

Whether these non-ideal effects must be considered depends upon the accuracy requirements of the models and the operating regime of the circuits. Often more than one effect needs to be considered simultaneously; for example, the skin effect and the surface scattering effect considered together are known as the anomalous skin effect (ASE).
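Expressions (2.4) and (2.5) are straightforward to evaluate numerically. The sketch below uses approximate bulk Cu values (ρ ≈ 1.68e-8 Ω·m at 20 °C, β ≈ 3.9e-3 /K) and the vacuum permeability for μ, since copper is essentially non-magnetic; these are textbook approximations assumed here, and they ignore the thin-film effects described above.

```python
import math

MU0 = 4e-7 * math.pi  # vacuum permeability in H/m (Cu is essentially non-magnetic)

def resistivity_at_temperature(rho0, tcr, delta_t):
    """Linear temperature model, eq. (2.4): rho_T = rho0 * (1 + beta * dT)."""
    return rho0 * (1.0 + tcr * delta_t)

def skin_depth(rho, freq, mu=MU0):
    """Skin depth, eq. (2.5): delta = sqrt(rho / (pi * mu * f))."""
    return math.sqrt(rho / (math.pi * mu * freq))

rho_20c = 1.68e-8   # ohm*m, approximate bulk Cu resistivity near 20 C
beta_cu = 3.9e-3    # 1/K, approximate bulk Cu TCR

print(f"rho at 100 C: {resistivity_at_temperature(rho_20c, beta_cu, 80):.3e} ohm*m")
for f in (1e8, 1e9, 1e10, 1e11):
    print(f"f = {f:.0e} Hz -> skin depth = {skin_depth(rho_20c, f) * 1e6:.2f} um")
```

The skin depth scales as $1/\sqrt{f}$: it is about 2.06 µm at 1 GHz and about 0.65 µm at 10 GHz, which is comparable to the cross-section of wide global wires.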
2.3.2 Capacitance

Since interconnect delay dominates gate delay in the DSM regime, the accuracy required of parasitic extraction of the interconnect impedances increases, and 2-D or 3-D extraction is generally required. A 3-D field solver such as FastCap [26] can provide accurate capacitance results, but with long run times and large memory requirements. With increasing integration, the number and geometric complexity of on-chip interconnects increase drastically, so it is not practical to apply a field solver to an entire IC. Modern 3-D on-chip capacitance extraction can be divided into three steps. First, test patterns are measured or simulated with a 2-D or 3-D field solver, and the generated data are used to derive closed-form formulae or to build look-up tables. Next, the geometric parameters of the interconnects are extracted. Finally, the geometric parameters are matched to the test patterns, and the capacitance values are obtained from the formulae or look-up tables. Due to the short-range nature of electrostatic interaction, only the nearest neighbors are considered during capacitance extraction; the capacitance matrices, therefore, are fairly sparse. Interconnect capacitance is composed of two components: the capacitance $C_g$ between the interconnect and the adjacent metal layers or substrate, and the coupling capacitance $C_c$ between neighboring interconnects in the same layer. $C_c$ is expected to dominate $C_g$ in the DSM regime because of the increasing aspect ratio and decreasing wire spacing. In early stage interconnect design and analysis, adjacent layers are generally treated as ground planes for capacitance extraction. By numerical fitting, closed-form capacitance expressions have been derived for parallel lines above one ground plane or between two ground planes in [27, 28].

2.3.3 Inductance

Compared with resistance and capacitance, interconnect inductance is significantly more difficult to extract.
One reason for this difficulty is the loop-based definition of inductance,

$$L_{ij} = \frac{\psi_{ij}}{I_j} \tag{2.6}$$

where $\psi_{ij}$ is the magnetic flux in loop i induced by the current $I_j$ in loop j. To form a loop, the current return paths need to be identified. The current distribution in a circuit, however, depends on the interconnect characteristics and is not known a priori. The effect of inductance is more significant in wide global interconnects in the top metal layers than in local interconnects in the lower metal layers. Since the wires in adjacent layers are generally orthogonal, adjacent layers cannot be treated as a ground plane as they are in capacitance extraction. Another source of difficulty in inductance extraction is long-range inductive coupling. Artificially restricting inductance extraction to nearby geometries not only introduces inaccuracy but may also result in unstable models. The pattern matching method used for capacitance extraction, therefore, cannot be used for inductance extraction because of the complex geometries surrounding the wire.

2.3.3.1 Partial inductance

One way to avoid determining the current return path a priori is to use the concept of partial inductance [28]. In determining the partial inductance, the flux area extends from the conductor to infinity. The loop inductance of a closed loop is uniquely determined by the partial self-inductance of each segment of the loop and the partial mutual inductance between every pair of those segments. Partial inductance is used in partial element equivalent circuit (PEEC) models, which can simulate a circuit accurately. Partial inductance depends nonlinearly upon the interconnect length because of inductive coupling among different segments of the same line [25]. For a loop formed by two closely placed parallel interconnects (where the length of the loop is more than ten times the loop width), the loop inductance depends linearly on the length of the loop.
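The statement that a loop inductance is fully determined by partial self and mutual inductances can be written as $L_{loop} = \sum_i \sum_j s_i s_j L^{p}_{ij}$, where $s_i = \pm 1$ gives the direction of the loop current in segment i relative to that segment's reference direction. The sketch below evaluates this sum; the two-segment partial inductance values are hypothetical, and the short end segments that close the loop are neglected, which is reasonable only for the long, narrow loops described above.

```python
def loop_inductance(partial, directions):
    """Loop inductance from a partial-inductance matrix.

    partial[i][j] is the partial self (i == j) or mutual (i != j) inductance
    between segments i and j; directions[i] is +1 or -1 depending on whether
    the loop current flows along or against segment i's reference direction.
    L_loop = sum_ij s_i * s_j * Lp_ij.
    """
    n = len(partial)
    return sum(directions[i] * directions[j] * partial[i][j]
               for i in range(n) for j in range(n))

# Hypothetical partial inductances (nH) of a signal wire and a parallel return wire.
Lp = [[1.2, 0.9],
      [0.9, 1.2]]

# Current goes out on the signal wire and returns on the other wire.
print(f"{loop_inductance(Lp, [+1, -1]):.2f} nH")   # 1.2 + 1.2 - 2 * 0.9
```

Because the outgoing and return currents flow in opposite directions, the mutual terms subtract, so a closely placed return path (large mutual partial inductance) yields a small loop inductance.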
Note that the inductance of a wire that does not form a closed loop has no physical meaning [28]. When partial inductance is applied in circuit models, all of the wires that form current loops, e.g., the reference ground lines, should be included; the current return paths are then determined by circuit simulation. The PEEC model generally results in huge, dense inductance matrices, which increase the computational complexity of the simulation. Various methods have been presented to sparsify the inductance matrices [29], such as the shell technique, the halo technique and the K matrix technique.

2.3.3.2 Loop-based inductance

As an alternative to the PEEC model, a loop-based inductance model is preferred in well-designed interconnect structures, such as shielded buses and clock distribution networks. In early design stages, a good assumption for the current return path is the nearby power/ground network, since these tracks are generally wide and have low resistance. FastHenry is a commonly used numerical tool for extracting the partial or loop inductance of simple interconnect structures. By estimating the distribution of the return current, more accurate loop-based inductance models have been developed [30, 31].

2.3.3.3 High frequency effects

Inductance is also a function of frequency because the current distribution varies with frequency. In addition to the skin effect mentioned in Subsection 2.3.1, the current distribution inside a conductor also changes with frequency due to the proximity effect [25]. The proximity effect in two parallel interconnects is illustrated in Fig. 2.8. If the currents in the two wires flow in opposite directions, they concentrate towards each other, as shown in Fig. 2.8(a); otherwise, the two currents shift away from each other, as shown in Fig. 2.8(b). The skin effect and the proximity effect arise from the same underlying mechanism.
The current tends to concentrate closer to the current return path in order to minimize the inductance [35].

Figure 2.8 Current distributions in the cross section of two parallel wires at high frequencies due to the proximity effect.

Note that at high frequencies, the resistance of a conductor also depends on the surrounding signal activity because of the proximity effect. Another effect of frequency on inductance is due to multi-path current redistribution [34]. In an integrated circuit there are many possible current return paths, e.g., the power/ground network, nearby signal lines and the substrate. The distribution of the return current among these paths is determined by the impedance of the individual paths. As the frequency changes, the relationship among the impedances of the paths changes, and so does the distribution of the return current, as shown in Fig. 2.11. The return current is distributed among these paths so as to minimize the total impedance at a specific frequency [25].

2.4 Interconnect Models

Interconnect modeling is critical in both the circuit design and verification processes; an efficient and accurate interconnect model can significantly enhance both. Models of single interconnects and of coupled interconnects are described in Subsections 2.4.1 and 2.4.2, respectively.

2.4.1 Single Interconnect

The single interconnect model is the basis for many interconnect network simulation tools. Various on-chip interconnect models have been presented over the past several decades, from lumped C/RC/RLC models to distributed transmission lines, and a tradeoff between efficiency and accuracy is required in selecting the appropriate model.

2.4.1.1 Lumped models

For local interconnects with lengths of tens of micrometers or less, the circuit behavior is typically dominated by the capacitance and effective resistance of the gates.
Modeling the interconnect as a lumped capacitance or lumped RC structure is then generally sufficiently accurate. Commonly used lumped models include L-, T- and π-shaped structures, as depicted in Fig. 2.9.

Figure 2.9 Lumped interconnect models.

2.4.1.2 Distributed models

For long intermediate and global interconnects, the signal propagation delay along the interconnect is larger than the gate delay, and the distributed characteristics of the interconnect should be considered. A distributed interconnect can be characterized by the Telegrapher's equations of transmission line theory,

$$\frac{\partial V}{\partial x} = -(R + sL)\, I \tag{2.7}$$

$$\frac{\partial I}{\partial x} = -sC\, V \tag{2.8}$$

where R, L and C are the interconnect impedance parameters per unit length, x is the distance along the interconnect, and s is the complex frequency. The conductance between the signal line and ground can typically be ignored in on-chip structures. If the interconnect is non-uniform, these parameters are functions of x; if frequency dependent effects need to be considered, they are also functions of s. Besides the difficulty of inductance extraction, including inductance in the model also complicates circuit analysis because of inductance-induced signal reflection, ringing and coupling effects. A figure of merit characterizing the conditions under which on-chip inductance should be considered is presented in [35],

$$\frac{t_r}{2\sqrt{LC}} < l < \frac{2}{R}\sqrt{\frac{L}{C}} \tag{2.9}$$

where $t_r$ is the signal transition time and $l$ is the interconnect length. Transmission line models are based on transverse electromagnetic (TEM) or quasi-TEM mode wave propagation. The TEM or quasi-TEM assumption is valid when the cross-sectional dimension of the line is much smaller than the wavelength, a requirement that is generally satisfied in on-chip structures. For example, the wavelength of a 100 GHz frequency component is on the order of 1 mm, which is several orders of magnitude greater than the cross-sectional dimensions of interconnects in DSM technologies.
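Criterion (2.9) defines a window of line lengths in which inductance should be modeled: below the lower bound the line is electrically short for the given transition time, and above the upper bound the resistance damps inductive behavior. The sketch below evaluates both bounds; the per-millimetre r, l and c values are hypothetical global-wire numbers chosen so that the arithmetic is easy to follow, not extracted parameters.

```python
import math

def inductance_window(r, l_ind, c, t_r):
    """Length range from (2.9) in which on-chip inductance matters.

    r, l_ind, c: resistance, inductance and capacitance per unit length.
    t_r: signal transition time.
    Returns (l_min, l_max); the window is empty when l_min >= l_max.
    """
    l_min = t_r / (2.0 * math.sqrt(l_ind * c))
    l_max = (2.0 / r) * math.sqrt(l_ind / c)
    return l_min, l_max

# Hypothetical global-wire parameters per mm (illustrative only).
r = 20.0        # ohm/mm
l_ind = 0.5e-9  # H/mm
c = 0.2e-12     # F/mm
for t_r in (20e-12, 200e-12):
    lo, hi = inductance_window(r, l_ind, c, t_r)
    verdict = "inductance matters" if lo < hi else "RC model suffices"
    print(f"t_r = {t_r * 1e12:.0f} ps: {lo:.2f} mm < l < {hi:.2f} mm -> {verdict}")
```

With these values a 20 ps edge gives a window of 1 mm to 5 mm, whereas a 200 ps edge pushes the lower bound to 10 mm, beyond the upper bound, so an RC model would suffice at that slower transition time.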
When a transmission line model is used, both the resistance and the inductance should be extracted from the loop formed by the signal line and the ground return path. Since the resistance of the ground return path is generally much smaller than that of the signal line, it can be ignored. In a typical circuit representation of a transmission line, the loop inductance is assigned to the signal line, as shown in Fig. 2.10.

Figure 2.10 Circuit models of transmission lines.

2.4.1.3 Lumped representation of distributed interconnects

Methods for transient time domain simulation of a transmission line fall into two categories: impulse response convolution and lumped equivalent circuits [35]. In the first method, the transmission line is initially analyzed in the frequency domain. Next, a time domain impulse response (called a Green's function) is obtained from the frequency domain solution. Finally, the time domain solution is determined by convolving the Green's function with the voltages at the line ports. This method provides accurate results at the cost of long simulation times and excessive memory requirements due to the convolution procedure; furthermore, it is not compatible with general circuit simulators such as SPICE. The second method partitions the transmission line into a number of segments and models each segment as a lumped structure. Additional segments provide more accurate results but consume more computational resources, so the key issue in this method is determining the appropriate number of segments. Using lumped models to represent a distributed transmission line introduces inaccuracy when evaluating circuits that operate at high frequencies. The highest frequency of interest, therefore, should be determined in order to evaluate the maximum error induced by using lumped models.
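The accuracy versus segment-count tradeoff of the lumped representation can be illustrated with the Elmore delay of an RC line split into n identical L-type segments. This is only a sketch of the trend: it uses the Elmore delay metric (an assumption here; the 50% delay of a distributed line is the 0.377RC figure used later in this chapter) and normalized total resistance and capacitance.

```python
def ladder_elmore_delay(r_total, c_total, n):
    """Elmore delay at the far end of an RC line modeled by n identical
    L-type segments (series R = r_total/n, then shunt C = c_total/n).

    Resistor k (1-based from the driver) charges the capacitance of
    segments k..n, so T = sum_k (R/n) * (n - k + 1) * (C/n)
                        = R * C * (n + 1) / (2 * n).
    """
    r_seg, c_seg = r_total / n, c_total / n
    return sum(r_seg * (n - k + 1) * c_seg for k in range(1, n + 1))

R, C = 1.0, 1.0   # normalized total resistance and capacitance
for n in (1, 2, 5, 10, 100):
    print(n, round(ladder_elmore_delay(R, C, n), 4))
```

The single-segment model overestimates the delay by a factor of two (1.0 versus 0.5 RC), while 10 segments are within 10% of the distributed limit of RC/2; each additional segment, however, adds nodes to the circuit being simulated.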
The frequency domain representation of a normalized saturated ramp signal with rise time $t_r$ is

$$V_r(s) = \frac{1 - e^{-s t_r}}{t_r s^2} \tag{2.10}$$

2.4.1.4 Modeling frequency dependent effects

After a distributed line is partitioned into lumped segments, frequency dependent effects can be modeled in each segment by a ladder structure of frequency independent lumped RL elements, as shown in Fig. 2.11. Additional ladder stages provide higher accuracy at high frequencies; two stages are used in [30] and three stages are used in [31, 36]. The values of the circuit elements can be obtained by matching the impedance of the model to the extracted impedance at different frequencies.

Figure 2.11 Modeling frequency dependent impedance with lumped elements.

2.4.2 Parallel Coupled Interconnects

Modeling parallel coupled interconnects draws special attention in the circuit design process because bus structures are so common [37]. A general solution for coupled multiconductor systems consists of two steps: decoupling the system into independent interconnects, and then applying single line models to each of these interconnects. The decoupling procedure is illustrated in Fig. 2.12.

Figure 2.12 Decoupling multiple parallel coupled interconnects.

The Telegrapher's equations describing a system of multiple coupled interconnects become

$$\frac{\partial \mathbf{V}}{\partial x} = -(\mathbf{R} + s\mathbf{L})\, \mathbf{I} \tag{2.11}$$

$$\frac{\partial \mathbf{I}}{\partial x} = -s\mathbf{C}\, \mathbf{V} \tag{2.12}$$

where $\mathbf{V}$ and $\mathbf{I}$ are the vectors of voltages and currents along the N coupled interconnects, and $\mathbf{R}$, $\mathbf{L}$ and $\mathbf{C}$ are the matrices of impedance parameters per unit length. The use of (2.11) and (2.12) assumes that the capacitive and inductive coupling among interconnects is restricted to the direction perpendicular to signal propagation, i.e., forward coupling [38] is ignored. For well designed circuits, this simplification is often valid [38].
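For the special case of two identical, symmetrically coupled lines, the decoupling step can be carried out by hand. The sketch below diagonalizes only the per-unit-length capacitance matrix of (2.12), assuming a lossless line in a homogeneous dielectric so that the same modal matrix also diagonalizes the inductance matrix; the $C_g$ and $C_c$ values are hypothetical. The two resulting modes are the even mode (both lines switching together) and the odd mode (opposite switching).

```python
import math

def modal_capacitances(c_g, c_c):
    """Decouple two identical, symmetrically coupled lines.

    Per-unit-length Maxwell capacitance matrix:
        C = [[c_g + c_c, -c_c],
             [-c_c,      c_g + c_c]]
    The orthogonal modal matrix T = (1/sqrt(2)) [[1, 1], [1, -1]] gives a
    diagonal D = T^T C T: even mode c_g, odd mode c_g + 2 * c_c.
    """
    C = [[c_g + c_c, -c_c], [-c_c, c_g + c_c]]
    s = 1.0 / math.sqrt(2.0)
    T = [[s, s], [s, -s]]
    # TC = T^T C, then D = TC T (plain 2x2 matrix products)
    TC = [[sum(T[k][i] * C[k][j] for k in range(2)) for j in range(2)]
          for i in range(2)]
    D = [[sum(TC[i][k] * T[k][j] for k in range(2)) for j in range(2)]
         for i in range(2)]
    return D

D = modal_capacitances(c_g=0.05, c_c=0.10)   # hypothetical values in fF/um
print([[round(x, 6) for x in row] for row in D])  # [[0.05, 0.0], [0.0, 0.25]]
```

Each mode can then be treated with a single line model; for more than two lossy lines no such closed-form modal matrix is generally available, which is the limitation noted in the following paragraph.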
By applying a modal analysis [37], a coupled multiconductor system can be decoupled, i.e., the impedance matrices R + sL and sC in (2.11) and (2.12) can be converted into (much simpler) diagonal matrices. The modal decoupling method, however, is generally not analytically tractable, except for certain special cases, such as two identical interconnects [40], multiple lossless wires [41], wires in a homogeneous dielectric, and wires coupled only to their direct neighbors. In general, decoupling a large number of coupled lossy interconnects with a modal analysis is computationally expensive.

2.5 Design Methodologies for Interconnect

Since interconnect plays an important role in ICs, interconnect design methodologies have been developed at different levels to satisfy specific performance requirements. Interconnect topology optimization methods, in which interconnect trees are constructed, are discussed in Subsection 2.5.1. Wire geometry optimization methods are reviewed in Subsection 2.5.2. Circuit level interconnect design methodologies, namely buffer insertion, shielding techniques and net-ordering/wire swizzling, are described in Subsections 2.5.3, 2.5.4 and 2.5.5, respectively.

2.5.1 Constructing an Interconnect Tree

An interconnect tree network is a commonly used structure in ICs, in which signals are transmitted from the root of the tree to each leaf. When the circuit is dominated by the gates, the interconnects can be modeled as a lumped capacitance. A minimum Steiner tree (MST) is generally constructed in this case so that the total wire length required to connect the source and sinks is minimized. The capacitance of the tree is then also minimized, as are the circuit delay and dynamic power. When the circuit is dominated by the interconnect, however, both the interconnect resistance and inductance need to be considered during tree construction. In this case, the delay to each sink is different.
The required arrival time at each sink is also different. The slack at a node is defined as

$$T_{slack} = T_{rat} - T_{delay} \tag{2.13}$$

where $T_{rat}$ is the required arrival time at that node and $T_{delay}$ is the delay from the source to that node. In a properly designed tree, the slack at the source should be maximized for high performance while minimizing the area and power overhead. Examples of tree constructions include the A-tree, the P-tree and the C-tree. In an A-tree, the Manhattan distance from the source to each sink is minimized, and, subject to this constraint, the total wire length is also minimized. An example of an A-tree is illustrated in Fig. 2.13. When constructing a P-tree, the solution space is limited to a set of topologies induced by a permutation of the sinks, and the optimal solution is chosen from this space based on the delay or the delay-area product. In the C-tree, the sinks are first clustered according to their spatial, temporal and polarity properties; tree structures are then built within and among these clusters.

Figure 2.13 An example of an A-tree.

2.5.2 Wire Sizing, Shaping, and Spacing

Given a metal layer in a specified technology, the thickness of the wires and of the inter-layer dielectric (ILD) is fixed. The wire width and spacing, however, can be varied to satisfy different design criteria. By explicitly characterizing the relationship between the interconnect impedance and the wire geometry, tradeoffs among the delay, bandwidth and power of global interconnects can be made. In [52], the effects of inductance are included in the wire width optimization process to lower the power dissipation.

Figure 2.14 Shaping interconnect to minimize delay.

The optimal shape of an RC interconnect that minimizes the Elmore delay is known to be an exponential taper, as shown in Fig. 2.14. Wire tapering increases the wire width near the driver and decreases it near the load.
Since the near end resistance sees more downstream capacitance than the far end resistance, assigning less resistance to the near end than to the far end reduces the total RC delay. In [44], the optimal shape of an RC interconnect is also shown to be exponential. Exponential shaping, however, is more difficult to implement than uniformly sized wires.

2.5.3 Repeater Insertion

The delay of an RC interconnect is $0.377RCl^2$, which is proportional to the square of the wire length $l$. By splitting the interconnect into $k$ segments with repeaters, the interconnect delay term is reduced to $0.377RCl^2/k$. These repeaters, however, introduce additional gate delay, so the optimal number and size of the repeaters must be determined to achieve the minimum delay [20]. As signals propagate along the interconnect, the repeaters regenerate sharper transition edges, increasing the bandwidth of the interconnect. Dividing the interconnect into segments also reduces the coupling between interconnects, because the coupling length between neighboring lines is shorter. Inserting repeaters in long interconnects, however, introduces an area and power penalty, so an efficient repeater insertion methodology requires a tradeoff among the different design criteria.

Figure 2.15 Staggering repeaters to reduce the worst case delay and crosstalk noise.

In [44], a repeater staggering technique is proposed to reduce the worst case delay and crosstalk noise in bus structures. As shown in Fig. 2.15, the repeaters in adjacent wires are interleaved. When each repeater is placed midway between two repeaters in the adjacent wires, worst case capacitive coupling can persist for at most half the wire length; for the other half, the capacitive coupling is the best case. The worst case delay as well as the delay uncertainty can therefore be reduced. One advantage of this technique is that it requires no additional area.
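The tradeoff between wire delay and repeater delay can be made concrete with a simple delay model. The sketch below is an assumption, not the specific model of [20]: it follows the widely used Bakoglu-style formulation in which each of the k stages is a repeater h times a minimum inverter (output resistance $r_0/h$, input capacitance $h c_0$) driving a wire segment, with 0.693 for lumped RC terms and the 0.377 coefficient from the text for the distributed wire term. All parameter values are hypothetical.

```python
import math

def repeated_wire_delay(k, h, r_int, c_int, r0, c0):
    """Delay of a wire split into k equal segments, each driven by a repeater
    of size h (output resistance r0/h, input capacitance h*c0)."""
    r_seg, c_seg = r_int / k, c_int / k
    stage = (0.693 * (r0 / h) * (c_seg + h * c0)          # repeater driving segment + next repeater
             + r_seg * (0.377 * c_seg + 0.693 * h * c0))  # distributed wire + next repeater's input
    return k * stage

def optimal_k_h(r_int, c_int, r0, c0):
    """Continuous optimum of the model above, from dT/dk = 0 and dT/dh = 0."""
    k = math.sqrt(0.377 * r_int * c_int / (0.693 * r0 * c0))
    h = math.sqrt(r0 * c_int / (r_int * c0))
    return k, h

# Hypothetical values: 10 mm wire at 50 ohm/mm and 0.2 pF/mm,
# minimum inverter with r0 = 5 kohm and c0 = 2 fF.
r_int, c_int, r0, c0 = 500.0, 2e-12, 5e3, 2e-15
k_opt, h_opt = optimal_k_h(r_int, c_int, r0, c0)
best_k = min(range(1, 31),
             key=lambda k: repeated_wire_delay(k, h_opt, r_int, c_int, r0, c0))
t1 = repeated_wire_delay(1, h_opt, r_int, c_int, r0, c0)
tk = repeated_wire_delay(best_k, h_opt, r_int, c_int, r0, c0)
print(f"k_opt = {k_opt:.2f}, h_opt = {h_opt:.0f}, best integer k = {best_k}")
print(f"single stage: {t1 * 1e12:.1f} ps, {best_k} stages: {tk * 1e12:.1f} ps")
```

Under these assumptions, seven repeaters of 100 times minimum size roughly halve the delay of the 10 mm line, while every added repeater contributes area and switching power, which is the penalty noted above.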
Staggering the repeaters also averages out the inductive coupling among the wires. As shown in Fig. 2.15, for two simultaneously switching adjacent wires, the currents flow in the same direction for half the wire length and in opposite directions for the other half. The inductive coupling due to currents flowing in different directions in the neighboring wire can therefore be cancelled. In [45], the optimum position of staggered repeaters is determined for RC interconnect to achieve the minimum worst case delay.

Figure 2.16 Buffered interconnect tree.

Another significant application of repeater insertion is the buffered tree. Repeaters inserted in an interconnect tree are also called buffers, and buffer insertion in tree structures is an important design tool for interconnect optimization. Van Ginneken presented a dynamic programming algorithm to insert buffers in a Steiner tree to minimize the Elmore delay. Van Ginneken's algorithm is composed of two phases. The first phase is a bottom-up process in which all of the possible buffer insertion candidates are determined for each node in the tree. In this process, suboptimal candidates are eliminated so that the number of candidates does not increase exponentially. After the candidates at the root are determined, the candidate with the maximum slack is chosen. The second phase traces back the computations of the first phase from this candidate and places buffers at the appropriate locations. Various extensions of this algorithm have been presented over the last decade that consider low power, blockage constraints, and more accurate delay models. In a properly designed buffered tree, as shown in Fig. 2.16, buffers should be inserted in the following situations:
1. Splitting long interconnect (buffers 1 and 2);
2. Isolating large capacitances from the critical path (buffer 3);
3. Cascading buffers to drive large capacitances (buffers 4, 5, and 6);
4. Reversing the signal polarity if necessary (inverter 7).
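The two phases of Van Ginneken's algorithm can be sketched for the simplest tree, a two-pin line with candidate buffer locations at its internal nodes. Each candidate is a pair (downstream load, required time). The bottom-up pass moves candidates across wires, offers a buffered alternative at each node, and discards dominated candidates. Here the buffer positions are carried inside each candidate instead of being recovered by a separate trace-back pass, which is a simplification. Branch merging, inverters, and buffer libraries are omitted. The sketch assumes a single non-inverting buffer type and the Elmore delay model; all names and values are hypothetical. A brute-force search over every buffer subset is included to check the pruned result on a small instance.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

# Candidate: (load seen looking downstream, required time, buffer nodes).
Candidate = Tuple[float, float, Tuple[int, ...]]
Buffer = Tuple[float, float, float]  # (output resistance, input cap, intrinsic delay)


def prune(cands: List[Candidate]) -> List[Candidate]:
    """Keep only non-dominated candidates: drop any candidate for which another
    has no more load and at least as much required time."""
    kept: List[Candidate] = []
    best_q = float("-inf")
    for cand in sorted(cands, key=lambda x: (x[0], -x[1])):
        if cand[1] > best_q:
            kept.append(cand)
            best_q = cand[1]
    return kept


def through_wire(cands, r, c):
    """Move candidates upstream across one wire segment (Elmore, pi model)."""
    return [(load + c, q - r * (c / 2 + load), pos) for load, q, pos in cands]


def with_buffer(cand: Candidate, node: int, buf: Buffer) -> Candidate:
    """Insert a buffer at node: it shields the downstream load."""
    r_b, c_b, t_b = buf
    load, q, pos = cand
    return (c_b, q - t_b - r_b * load, pos + (node,))


def van_ginneken_path(segments: Sequence[Tuple[float, float]], sink_cap: float,
                      sink_rat: float, buf: Buffer, r_driver: float):
    """segments[i] = (R, C) of the wire from node i to node i+1. Node 0 is the
    driver, node len(segments) the sink; buffers may go at nodes 1..n-1."""
    cands: List[Candidate] = [(sink_cap, sink_rat, ())]
    for i in range(len(segments) - 1, -1, -1):      # bottom-up: sink to driver
        cands = through_wire(cands, *segments[i])    # candidates now at node i
        if i > 0:
            cands = prune(cands + [with_buffer(c, i, buf) for c in cands])
    load, q, pos = max(cands, key=lambda x: x[1] - r_driver * x[0])
    return q - r_driver * load, sorted(pos)


def slack_for(segments, sink_cap, sink_rat, buf, r_driver, buffers) -> float:
    """Source slack for one fixed set of buffer nodes."""
    cand: Candidate = (sink_cap, sink_rat, ())
    for i in range(len(segments) - 1, -1, -1):
        (cand,) = through_wire([cand], *segments[i])
        if i in buffers:
            cand = with_buffer(cand, i, buf)
    return cand[1] - r_driver * cand[0]


def brute_force(segments, sink_cap, sink_rat, buf, r_driver) -> float:
    """Exhaustive search over all buffer subsets (exponential; small n only)."""
    nodes = range(1, len(segments))
    return max(slack_for(segments, sink_cap, sink_rat, buf, r_driver, set(s))
               for m in range(len(segments)) for s in combinations(nodes, m))


# Hypothetical 8-segment line and buffer.
SEGS = [(200.0, 200e-15)] * 8
BUF = (200.0, 5e-15, 15e-12)
best_slack, placed = van_ginneken_path(SEGS, 10e-15, 1e-9, BUF, r_driver=200.0)
exhaustive = brute_force(SEGS, 10e-15, 1e-9, BUF, 200.0)
print(f"DP: {best_slack * 1e12:.1f} ps, buffers at {placed}")
print(f"brute force agrees: {math.isclose(best_slack, exhaustive)}")
```

Pruning is exact under this model. Moving upstream, a candidate with less load and more required time can never end up worse, so only non-dominated candidates need to be kept.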
Note that interconnect tree construction, buffer insertion, and wire sizing can be performed simultaneously in order to achieve an optimal solution.

2.5.4 Shielding Techniques

Shielding techniques are widely used in ICs to reduce capacitive and inductive coupling. A shield line, generally connected to the power or ground grid, is inserted between signal lines. The effective capacitance of the interconnect then becomes almost fixed and no longer depends upon the signal switching activity. With shielding, the normalized peak crosstalk noise can be reduced to less than 5% of Vdd for RC interconnect with lengths of up to 2 mm. Inductive coupling can also be reduced by inserting a shield line, though less effectively than capacitive coupling, because magnetic coupling acts over long ranges. The shield line provides a nearby current return path, reducing the self and mutual inductance of the signal lines. Because of the importance of the on-chip clock signal, the clock distribution network in a high speed circuit is generally shielded on both sides in the same layer. Additional parallel shielding in the N−2 layer has been reported in [46] to further prevent inductive coupling from the lower layers. The primary drawback of shielding is its overhead in metal resources.

2.5.5 Net-Ordering and Wire Swizzling

Interconnect coupling is closely related to the signal switching activity. For example, simultaneous opposite switching on two adjacent RC lines produces the worst case delay. By ordering the nets so that sensitive nets are not adjacent to each other, the total capacitive coupling among the nets can be minimized. Examples of net-ordering and wire swizzling are shown in Fig. 2.17.

Figure 2.17 Examples of net-ordering and wire swizzling.

The net-ordering technique, however, is less effective in reducing long range inductive coupling.
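Two claims above can be illustrated with a first-order Miller-coupling-factor model. The first is that opposite switching on adjacent lines is the worst case. The second is that a shield line makes the effective capacitance independent of switching activity. In this model, each coupling capacitance is weighted by 0 when the neighbor switches in the same direction as the victim, 1 when the neighbor is quiet, and 2 when it switches in the opposite direction. This model is a common simplification and is not taken from the works cited in this section. The capacitance values below are hypothetical.

```python
def effective_coupling_cap(c_ground: float, c_couple: float,
                           victim: int, neighbours: list) -> float:
    """Effective switched capacitance of a victim wire, first-order
    Miller-coupling-factor model. Transitions: +1 rising, -1 falling,
    0 quiet (a shield line is always quiet)."""
    if victim not in (1, -1):
        raise ValueError("the victim must switch (+1 or -1)")
    total = c_ground
    for n in neighbours:
        # same direction -> factor 0, quiet -> 1, opposite direction -> 2
        total += (1 - n * victim) * c_couple
    return total


CG, CC = 20.0, 30.0  # hypothetical ground and coupling capacitance, fF
print("best case   :", effective_coupling_cap(CG, CC, +1, [+1, +1]))
print("worst case  :", effective_coupling_cap(CG, CC, +1, [-1, -1]))
print("both shields:", effective_coupling_cap(CG, CC, +1, [0, 0]))
```

The spread between the best and worst cases is the delay uncertainty that shielding removes. With quiet shields on both sides, the victim always sees the same capacitance, at the cost of two extra tracks.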
In [47], net-ordering and shield insertion are performed simultaneously to minimize both capacitive and inductive coupling. In wire swizzling, the wires are split into several segments and the order of the wires is changed in each segment. The capacitive coupling then averages out for each wire, reducing both the worst case delay and the delay uncertainty. For a group of k wires, the number of permutations required to realize all possible adjacencies is k/2. For the example shown in Fig. 2.17, k = 4 and two permutations are required: 1234 and 2413. In [48], it is also shown that the mutual inductance in a bus structure can be reduced by wire swizzling.

Chapter 3 Buffer Insertion

Over the past 10 years, the source of critical signal delays has undergone a major transition. With the scaling of active device feature sizes into the deep submicron (DSM) regime, on-chip interconnect has become the primary bottleneck to signal flow within high complexity, high speed integrated circuits (ICs). The smaller feature size of DSM technology nodes reduces the delay of the active devices. The delay due to the passive interconnects, however, has increased rapidly, as described by the 2005 International Technology Roadmap for Semiconductors (ITRS). Fig. 3.1 illustrates this transition: for feature sizes greater than 250 nm, IC delay is dominated by gates, while at smaller feature sizes the interconnects become the primary source of delay. As noted in the figure, the disparity between the delays of the interconnect and the active devices grows with each successive technology node. The local wire delay decreases with feature size because the distance among the active devices shrinks. Special attention must, however, be paid to global lines, since the overall speed of current ICs is most often limited by long global interconnects.
Figure 3.1 Comparison of interconnect delay to gate delay.

In this chapter, the concept of buffer (repeater) insertion in interconnects is discussed. This topic has been researched extensively, and much work has been done on optimizing interconnect delay with buffer insertion.

3.1 Background

As VLSI technology moves into the nanoscale regime, interconnect delay becomes a dominant constraint in circuit design. A great amount of effort has been made to reduce interconnect delay, and buffer insertion appears to be a very effective technique. It is observed in [13] that a large number of buffers are needed with current IC technology. In two recent IBM ASIC designs, 25% of the gates are buffers [14]. Interconnect design has become a dominant issue in high-speed ICs. With the decreased feature size of CMOS circuits, on-chip interconnect now dominates both circuit delay and power dissipation. Many algorithms have been proposed to determine the optimum wire size that minimizes a cost function such as the delay [49]. According to [2], the number of long interconnects doubles every three years, further increasing the importance of on-chip interconnect. The behavior of inductive interconnect can no longer be neglected, particularly in long, low-resistance interconnect lines [3]. As on-chip inductance becomes important, some wire optimization algorithms have been enhanced to consider RLC impedances [4]. Uniform repeater insertion is an effective technique for driving long interconnects. Based on a distributed RC interconnect model, a repeater insertion technique to minimize signal propagation delay was introduced in [5]. When driving long resistive interconnects, a uniform repeater structure gives a lower total delay than a tapered buffer structure. Buffer tapering, on the other hand, is more efficient for driving large capacitive loads [6, 7].
Different techniques have been developed to enhance the model of a repeater system so that it considers a variety of design factors [8, 14]. The drain/source capacitance of each repeater and multistage repeaters are considered in [15]. Noise-aware techniques for repeater insertion and wire sizing are described in [16–19]. In [20, 22], signal integrity, interconnect reliability, and manufacturability issues are discussed. The work described in [23] assumes that widening the interconnect does not reduce the signal delay when the thickness, spacing, and height above the substrate are kept fixed, because the resistance decreases while the capacitance increases. This assumption, however, is not accurate. Other factors, such as the coupling capacitance, the driver size, and the load capacitance, also affect the total delay. Furthermore, with increasing inductive impedances, the trend in the propagation delay with changing line width depends upon the number and the size of the inserted repeaters. For an RC line, repeater insertion outperforms wire sizing [24]. It is shown in this chapter that this behavior does not hold for an RLC line. For RLC lines, the minimum signal propagation delay always decreases with increasing line width if an optimum repeater system is used. With the increasing demand for low-power ICs, different strategies have been developed to minimize power in the repeater insertion process. Power dissipation and area overhead have been considered in previous work [25–30]. The line inductance, however, has yet to be considered when optimizing the width of a wire driven by a repeater system. As shown in Fig. 3.2, for wider interconnect the minimum delay for a signal to propagate along an RLC line decreases while the power dissipation increases [31].

Figure 3.2 Minimum signal propagation delay and transient power dissipation as a function of line width for a repeater system.
3.2 Repeater/buffer insertion process: An overview

The primary objective of a uniform repeater insertion system is to minimize the time for a signal to propagate through a long interconnect. Uniform repeater insertion techniques divide the interconnect into equal sections and employ equal size repeaters to drive each section, as shown in Fig. 3.3. In some practical situations, the optimum repeater locations cannot be achieved due to physical space constraints. Changing the repeater size can compensate for such a deviation from the ideal placement. Bakoglu and Meindl developed closed-form expressions for the optimum number and size of repeaters to achieve the minimum signal propagation delay in an RC interconnect [5]. Adler and Friedman characterized a timing model for a CMOS inverter driving an RC load [32, 33] and used this model to enhance the accuracy of the repeater insertion process in RC interconnects. Alpert considered the interconnect width as a design parameter [24] and showed that, for RC lines, repeater insertion outperforms wire sizing. The delay can, however, be greatly affected by the line inductance, particularly in low-resistance materials with fast signal transitions. Ismail and Friedman extended previous research in repeater insertion by considering the line inductance [34]. They showed that on-chip inductance can decrease the delay, area, and power of the repeater insertion process as compared to an RC line model [35].

Figure 3.3 Uniform repeater system driving a distributed RC interconnect.

Figure 3.4 Wire sizing in a repeater insertion system.

Interconnect sizing within a repeater system affects two primary design parameters, the number of repeaters and the optimum size of each repeater, as shown in Fig. 3.4. This chapter investigates the tradeoffs in sizing long inductive interconnect driven by an optimum repeater system.
Design criteria are developed to determine the optimum width while considering different design objectives, such as the delay, power, and area.

3.3 Propagation delay

The interconnect resistance decreases with increasing line width, which increases $L_{int}/R_{int}$, the ratio between the line inductance and resistance. An increase in $L_{int}/R_{int}$ decreases the number of repeaters that should be inserted to achieve the minimum propagation delay. For an RLC line, the minimum signal propagation delay decreases with wider wires until no repeaters should be used. Wire sizing therefore outperforms repeater insertion in RLC lines. Expressions for the optimum number of repeaters $k_{opt-RC}$ and the optimum repeater size $h_{opt-RC}$ are [34]

$$k_{opt-RC}(W_{int}) = \sqrt{\frac{R_{int}(W_{int})\,C_{int}(W_{int})}{2.3\,R_0 C_0}} \cdot \frac{1}{\left[1 + 0.16\,T_{L_{int}/R_{int}}(W_{int})^3\right]^{0.24}} \tag{3.1}$$

$$h_{opt-RC}(W_{int}) = \sqrt{\frac{R_0\,C_{int}(W_{int})}{R_{int}(W_{int})\,C_0}} \cdot \frac{1}{\left[1 + 0.16\,T_{L_{int}/R_{int}}(W_{int})^3\right]^{0.3}} \tag{3.2}$$

where

$$T_{L_{int}/R_{int}}(W_{int}) = \sqrt{\frac{L_{int}(W_{int})/R_{int}(W_{int})}{R_0 C_0}} \tag{3.3}$$

$C_0$ and $R_0$ are the input capacitance and output resistance of a minimum size repeater, respectively. $R_{int}(W_{int})$, $C_{int}(W_{int})$, and $L_{int}(W_{int})$ are the interconnect line resistance, capacitance, and inductance as functions of the interconnect width. For a copper interconnect line with a low-k dielectric, $R_0$ = 2 kΩ and $C_0$ = 1 fF, $k_{opt-RC}$ is determined from (3.1). The optimum number of repeaters $k_{opt-RC}$ for different line lengths l is illustrated in Fig. 3.5. As the figure shows, for an RLC line the optimum number of repeaters that minimizes the signal propagation delay decreases with increasing line width for all line lengths. The number of repeaters reaches zero (only one driver at the beginning of the line) at interconnect widths of 3 µm and 4 µm for l = 5 mm and 10 mm, respectively. For widths greater than 4 µm, the wire should be treated as a single segment. In other words, above a certain width for each line length, a repeater system should not be used. The line capacitance per unit length increases with line width.
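Equations (3.1)–(3.3) can be evaluated directly once the line totals are known. The sketch below implements them with the constants as printed above, using $R_0$ = 2 kΩ and $C_0$ = 1 fF from the text. The width-dependent models $R_{int}(W)$, $C_{int}(W)$, and $L_{int}(W)$ are not given in this section, so the line totals and inductance values below are hypothetical. The sketch only shows the trend that a larger $L_{int}/R_{int}$ reduces both the optimum number and the optimum size of the repeaters.

```python
import math


def t_lr(l_int: float, r_int: float, r0: float, c0: float) -> float:
    """Eq. (3.3): T_{L/R} = sqrt((L_int / R_int) / (R0 * C0))."""
    return math.sqrt((l_int / r_int) / (r0 * c0))


def k_opt(r_int, c_int, l_int, r0, c0):
    """Eq. (3.1): optimum number of repeaters (real-valued)."""
    t = t_lr(l_int, r_int, r0, c0)
    return math.sqrt(r_int * c_int / (2.3 * r0 * c0)) / (1 + 0.16 * t ** 3) ** 0.24


def h_opt(r_int, c_int, l_int, r0, c0):
    """Eq. (3.2): optimum repeater size, in multiples of the minimum repeater."""
    t = t_lr(l_int, r_int, r0, c0)
    return math.sqrt(r0 * c_int / (r_int * c0)) / (1 + 0.16 * t ** 3) ** 0.3


R0, C0 = 2e3, 1e-15              # minimum repeater, as quoted in the text
R_INT, C_INT = 1.25e3, 330e-15   # hypothetical totals for one line
for l_int in (0.0, 2e-9, 5e-9):  # hypothetical total line inductance (H)
    print(f"L = {l_int * 1e9:.0f} nH: k = {k_opt(R_INT, C_INT, l_int, R0, C0):.2f}, "
          f"h = {h_opt(R_INT, C_INT, l_int, R0, C0):.2f}")
```

With $L_{int} = 0$, the correction factors equal one and the expressions reduce to the RC forms.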
As the number of inserted repeaters decreases with wider lines, each repeater drives a longer line section. An increase in the section length and width increases the capacitance driven by each repeater, so a larger repeater is required to keep the overall delay low. As shown in Fig. 3.6, the optimum repeater size $h_{opt-RC}$ is an increasing function of line width. The minimum signal propagation delay of an optimum repeater system decreases with increasing line width as the total gate delay decreases. For an interconnect line, the total signal propagation delay is

$$t_{pd-total}(W_{int}) = k_{opt-RC}(W_{int}) \cdot t_{pd-section}(W_{int}) \tag{3.4}$$

where $t_{pd-section}(W_{int})$ is the signal delay of each RLC section as a function of the interconnect width (all quantities below are evaluated at $W_{int}$):

$$t_{pd-section} = \frac{e^{-2.9\,\zeta^{1.35}}}{\omega_n} + 0.74\left(R_{tr}C_{section} + R_{section}C_L + R_{tr}C_L + 0.5\,R_{section}C_{section}\right) \tag{3.5}$$

$$\zeta = \frac{\omega_n}{2}\left(0.5\,R_{section}C_{section} + C_{section}R_{tr} + C_L\left(R_{section} + R_{tr}\right)\right) \tag{3.6}$$

$$\omega_n = \frac{1}{\sqrt{L_{section}\left(C_{section} + C_L\right)}} \tag{3.7}$$

$$C_L(W_{int}) = C_{section}(W_{int}) + h_{opt-RC}(W_{int})\,C_0 \tag{3.8}$$

$$R_{tr}(W_{int}) = \frac{R_0}{h_{opt-RC}(W_{int})} \tag{3.9}$$

$$R_{section}(W_{int}) = \frac{R_{line}(W_{int})}{k_{opt-RC}(W_{int})} \tag{3.10}$$

$$L_{section}(W_{int}) = \frac{L_{line}(W_{int})}{k_{opt-RC}(W_{int})} \tag{3.11}$$

$$C_{section}(W_{int}) = \frac{C_{line}(W_{int})}{k_{opt-RC}(W_{int})} \tag{3.12}$$

Figure 3.5 Optimum number of repeaters for minimum propagation delay for different line widths.

Figure 3.6 Optimum repeater size for minimum propagation delay for different line widths.

The minimum delay obtained from (3.4) is shown in Fig. 3.7 as a function of interconnect width. An increase in the inductive behavior of the line and a reduction in the number of repeaters both decrease the minimum signal propagation delay that a repeater system can achieve. The signal delay for different line lengths is shown in Fig. 3.8. The lower limit on the propagation delay decreases with increasing line width until the number of repeaters reaches zero. For a system of repeaters, there is no optimum width at which the total propagation delay is minimum. Rather, the delay is a continuously decreasing function of line width. An RLC line with no repeaters produces a smaller signal propagation delay than with any number of repeaters of any size. For RLC interconnect, wire sizing outperforms repeater insertion, producing a smaller signal propagation delay. This trend is important when developing a wire sizing methodology for a repeater system.

3.4 Power dissipation

The power characteristics of a repeater insertion system are discussed in this section. The work described in [25–30] considers power and area as design constraints, but does not consider the line inductance. Section 3.4.1 discusses the factors that affect the short-circuit power of an interconnect driven by a repeater system, including the effect of the line inductance. The dependence of the dynamic power on wire size is described in Section 3.4.2. The total transient power dissipation characteristics are summarized in Section 3.4.3.

Figure 3.7 Minimum signal propagation delay as a function of interconnect width (l = 5 mm).

3.4.1 Short-circuit power dissipation

Short-circuit current flows when both transistors within an inverting repeater are simultaneously on. Thin interconnect lines dissipate less dynamic power and more short-circuit power, and for thin resistive lines the number of repeaters can be large. In this work, the short-circuit power dissipated in all of the repeaters along a line is considered. Short-circuit power depends on both the input signal transition time and the load characteristics.
A simple and accurate expression for the short-circuit power dissipation of a repeater driving an RC load has been presented in [32]:

$$P_{sc-section} = \frac{1}{2}\,I_{peak}\,t_{base}\,V_{dd}\,f \tag{3.13}$$

where $I_{peak}$ is the peak current that flows from $V_{dd}$ to ground, $t_{base}$ is the time period during which both transistors are on, $V_{dd}$ is the supply voltage, and f is the switching frequency. Tang used this expression to characterize the short-circuit power of an RC load [40]. A closed-form expression for the signal transition time at the far end of an RC line has been described in [41–43]. Increasing the line width has two competing effects on the short-circuit power. As described in [43], the short-circuit power decreases when a line is underdamped. For wide interconnect, the short-circuit power increases as the line capacitance becomes dominant. Furthermore, reducing the number of repeaters lengthens each section, which raises the section impedance and hence the short-circuit power of each section.

Figure 3.8 Minimum signal delay as a function of interconnect width for different line lengths.

The total short-circuit power of a repeater system is

$$P_{sc-total} = k_{opt-RC} \cdot P_{sc-section} \tag{3.14}$$

3.4.2 Dynamic power dissipation

The dynamic power is the power required to charge and discharge the various device and interconnect capacitances. The total dynamic power is the sum of the $CV^2f$ power of the line capacitance and of the repeaters:

$$P_{dyn-total} = P_{dyn-line} + P_{dyn-repeaters} \tag{3.15}$$

$$P_{dyn-repeaters} = k_{opt-RC} \cdot h_{opt-RC} \cdot C_0 \cdot V_{dd}^2 \cdot f \tag{3.16}$$

$$P_{dyn-line} = C_{int} \cdot V_{dd}^2 \cdot f \tag{3.17}$$

where $P_{dyn-repeaters}$ depends on both the number and the size of the repeaters. As the number of repeaters decreases, the repeater size increases. The dynamic power dissipated by the line increases with greater line capacitance (as the line width is increased). The dynamic power of the repeaters, however, decreases since fewer repeaters are used with wider lines. As shown in Fig. 3.9, the total dynamic power is a minimum for thin interconnect. The effect of sizing the interconnect on the total transient power dissipation is discussed in the next subsection.

Figure 3.9 Dynamic power dissipation as a function of interconnect width for l = 20 mm.

3.4.3 Total power dissipation

To develop an appropriate criterion for the optimal interconnect width between repeaters, the total transient power dissipation of the system must be characterized. The total transient power can be described as

$$P_{total}(W_{int}) = V_{dd}\,f\left[k_{opt-RC}(W_{int})\left(\tfrac{1}{2}I_{peak}(W_{int})\,t_{base}(W_{int}) + h_{opt-RC}(W_{int})\,V_{dd}\,C_0\right) + V_{dd}\,C_{int}(W_{int})\right] \tag{3.18}$$

All of the terms in (3.18) are functions of the line width except $V_{dd}$, $C_0$, and f. As described in Sections 3.4.1 and 3.4.2, both transient power components decrease with increasing line width, so the total power decreases until the line capacitance becomes dominant. For an RLC interconnect, fewer repeaters are necessary to drive a line while achieving the minimum propagation delay [34]. For an inductive interconnect, the line capacitance is typically larger than the input capacitance of the repeaters. Increasing the width reduces the power dissipated by the repeaters and increases the power dissipated by the line. The reduction in repeater power outweighs the increase in interconnect power until the line capacitance dominates the line impedance. Beyond a certain width, the total power increases with increasing line width. The total power dissipation as a function of line width for different interconnect lengths is shown in Fig. 3.10. As the line width increases from the minimum width (0.1 µm in the example technology), the total power dissipation is reduced. A minimum transient power dissipation therefore occurs with thin interconnect (see Fig. 3.10).

Figure 3.10 Total transient power dissipation as a function of interconnect width.
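The power model of (3.13)–(3.18) is a simple sum once the repeater count, repeater size, and short-circuit waveform parameters are known. The sketch below evaluates it term by term. In the chapter, $I_{peak}$ and $t_{base}$ come from the transition-time models cited in Section 3.4.1. Here they are supplied directly as hypothetical inputs, and the whole operating point is illustrative only; it is not taken from the thesis data.

```python
def total_power(k: float, h: float, c0: float, c_int: float, vdd: float,
                f: float, i_peak: float, t_base: float) -> float:
    """Total transient power of a repeater system, Eq. (3.18)."""
    p_sc_total = k * 0.5 * i_peak * t_base * vdd * f     # Eqs. (3.13), (3.14)
    p_dyn_repeaters = k * h * c0 * vdd ** 2 * f          # Eq. (3.16)
    p_dyn_line = c_int * vdd ** 2 * f                    # Eq. (3.17)
    return p_sc_total + p_dyn_repeaters + p_dyn_line     # Eq. (3.15) + short-circuit


# Hypothetical operating point: 1.0 V, 250 MHz, 5 repeaters of 40x minimum size,
# 1 pF of line capacitance, 1 mA peak short-circuit current lasting 10 ps.
p = total_power(k=5, h=40, c0=1e-15, c_int=1e-12, vdd=1.0, f=250e6,
                i_peak=1e-3, t_base=10e-12)
print(f"P_total = {p * 1e3:.4f} mW")
```

At this operating point the line term dominates, which matches the observation that the line capacitance sets the power at large widths.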
The minimum transient power dissipation is obtained from

$$\frac{\partial P_{total}}{\partial W_{int}} = 0 \tag{3.19}$$

where $\partial P_{total}/\partial W_{int}$ is a nonlinear function of $W_{int}$. Numerical methods are used to obtain values of $W_{int}$ for specific interconnect and repeater parameters. Over the practical range of interconnect widths beyond this minimum, the total transient power increases with width, as shown in Fig. 3.10. For longer lines, the total power dissipation rises more rapidly with increasing line width as the interconnect capacitance becomes dominant.

3.5 Area of the repeater system

For a specific interconnect width within a repeater system, the optimum number and size of the repeaters can be determined. Previous studies on repeaters have considered the silicon area while ignoring the metal layer resources [25–30]. Long global interconnects, however, are typically wide and require shielding. To develop appropriate criteria for the area overhead, both the transistors and the interconnect must be characterized. The area of the interconnect metal can be described as

$$A_{line}(W_{int}) = W_{int}\,l \tag{3.20}$$

The interconnect metal area is illustrated in Fig. 3.11 as a function of the interconnect width. For CMOS inverters used as repeaters, the total silicon area of the active repeaters is

$$A_{repeater}(W_{int}) = 3\,k_{opt-RC}(W_{int})\,h_{opt-RC}(W_{int})\,L_n^2 \tag{3.21}$$

where $L_n$ is the feature size. The PMOS transistor of each repeater is assumed to be twice the size of the NMOS transistor to achieve a symmetric transition, which gives the factor of 3 in (3.21). For an RLC line, fewer repeaters are needed to minimize the propagation delay, reducing the silicon area as shown in Fig. 3.12. The active repeaters and the passive interconnects use different layers, so the area overheads of the two elements are independent, particularly for interconnects routed on the upper layers.
The weighted product (3.22) is used as a criterion to consider both area parameters in sizing the interconnect,

$$A_{product}(W_{int}) = A_{repeater}(W_{int})^{w_r} \cdot A_{line}(W_{int})^{w_l} \tag{3.22}$$

where $w_r$ and $w_l$ are the weights of the two cost functions. For $w_r = w_l = 1$, the area product of the system increases with interconnect width, as shown in Fig. 3.13. The repeater area falls with increasing interconnect width, but the growth in interconnect area outweighs it, so the overall area of the repeater system increases.

Figure 3.11 Interconnect area as a function of interconnect width for different line lengths.

Figure 3.12 Total area of the repeaters as a function of the interconnect width for different line lengths.

Figure 3.13 Product of interconnect and transistor area as a function of the interconnect width for different line lengths.

3.6 Design criteria for interconnect within a repeater system

In this section, different design criteria to size interconnect within a repeater system are developed. The optimization criteria can be applied to different repeater systems. Section 3.6.1 considers a constrained system, and Section 3.6.2 discusses an unconstrained system.

3.6.1 Constrained systems

For a constrained system, there is a delay target (minimum speed or maximum delay) and/or a limit on the power dissipation. The delay target determines a lower limit on the line width, while the maximum power dissipation determines an upper limit. If the minimum width obtained from (3.4) is greater than the maximum width obtained from (3.18), both limits cannot be satisfied simultaneously and one of the design constraints must be relaxed. If the minimum width is lower than the maximum width, both constraints can be satisfied. In a constrained system, the transistor or metal area may also have an upper limit.
The two area components change differently with the width, so there is a tradeoff between them.

3.6.2 Unconstrained systems

For an RLC line, there are four criteria to size interconnect in an unconstrained system. The first criterion is minimum power at the expense of speed. The optimum solution for this criterion is obtained from (3.19). The second criterion is minimum delay. Since no optimum interconnect width exists for minimum propagation delay, the practical limit is either the maximum repeater size or the width at which no repeaters are needed, whichever is the tighter constraint. The constraint in this case is either the maximum repeater size or the maximum line width. The optimum number of repeaters for a target line width is determined from [34]. If this is not possible, no repeaters should be used, and the design problem reduces to choosing the width of a single section of interconnect [31]. The third and fourth criteria are presented in the following subsections. Section 3.6.2.1 describes the power-delay product (PDP) as a criterion to size an interconnect within a repeater system. Section 3.6.2.2 introduces the power-delay-area product (PDAP) as an alternative design criterion.

3.6.2.1 Power-delay-product design criterion

The PDP criterion addresses both the power dissipation and the speed, with no constraint on the area. As discussed in Sections 3.3 and 3.4, the minimum signal propagation delay of an RLC interconnect driven by a repeater system decreases with increasing line width. The total transient power, by contrast, has a global minimum at a narrow width and increases with line width beyond it. At line widths smaller than the width for minimum power, both the power and the delay increase. An upper limit on the line width is reached where the minimum propagation delay of a repeater system is attained.
Beyond that limit, a single-segment sizing criterion should be used to optimize the width according to a cost function (e.g., delay [1] or power [41–43]). Between these two limits, a tradeoff exists between the power dissipation and the signal propagation delay. A single expression for the PDP as a function of the interconnect width is

$$PDP(W_{int}) = P_{total}(W_{int})^{w_p} \cdot t_{pd-total}(W_{int})^{w_d} \tag{3.23}$$

where $w_p$ and $w_d$ are the weights of the cost functions and express which design objective is valued more highly. A local minimum of the PDP exists for each line length. The minimum power-delay product is obtained by numerically solving the nonlinear equation

$$\frac{\partial PDP}{\partial W_{int}} = 0 \tag{3.24}$$

3.6.2.2 Power-delay-area-product design criterion

The criterion described in Section 3.6.2.1 does not include the area of the system as a design parameter. The PDAP criterion is introduced to include it, addressing both the power dissipation and the speed while also considering the area. A single expression for the PDAP as a function of the interconnect width is

$$PDAP(W_{int}) = P_{total}(W_{int})^{w_p} \cdot t_{pd-total}(W_{int})^{w_d} \cdot A_{repeater}(W_{int})^{w_r} \cdot A_{line}(W_{int})^{w_l} \tag{3.25}$$

A local minimum of the PDAP exists for each line length. The minimum PDAP is obtained by numerically solving the nonlinear equation

$$\frac{\partial PDAP}{\partial W_{int}} = 0 \tag{3.26}$$

3.7 Application of interconnect design methodology

The four criteria are applied to a 65 nm CMOS technology to determine the optimum solution for different line lengths. No limit on the maximum buffer size is assumed. To characterize the line inductance in terms of the geometric dimensions, an interconnect line shielded by two ground lines is assumed.
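The text states only that (3.19), (3.24), and (3.26) are solved numerically, without naming a method. One option is a golden-section search, which minimizes a unimodal cost over a width interval without needing derivatives. The sketch below applies it to $P_{total}$, the PDP (3.23), and the PDAP (3.25), with the area terms from (3.20) and (3.21). The toy width models are hypothetical stand-ins chosen only to mimic the trends described above, with widths in µm: falling repeater power plus rising line power, a delay that decreases with width, and fewer but larger repeaters for wider lines. Real use would substitute the technology-specific models behind (3.1)–(3.18).

```python
import math


def golden_section_min(fn, lo, hi, tol=1e-9):
    """Minimize a unimodal function of one variable on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = fn(c), fn(d)
    while b - a > tol:
        if fc < fd:                  # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = fn(c)
        else:                        # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = fn(d)
    return (a + b) / 2


# Hypothetical toy models of width w (um); not the thesis' technology models.
L_UM, LN_UM = 5000.0, 0.065          # line length and feature size, um


def power_mw(w):
    return 2.0 / w + 0.5 * w + 1.0   # repeater power falls, line power rises


def delay_ns(w):
    return 0.05 + 0.2 / w            # delay decreases with width


def a_line(w):
    return w * L_UM                  # Eq. (3.20)


def a_rep(w):
    k, h = 8.0 / (1.0 + w), 20.0 * (1.0 + 0.5 * w)   # fewer, larger repeaters
    return 3.0 * k * h * LN_UM ** 2                   # Eq. (3.21)


def pdp(w, wp=1.0, wd=1.0):
    return power_mw(w) ** wp * delay_ns(w) ** wd                   # Eq. (3.23)


def pdap(w, wp=1.0, wd=1.0, wr=1.0, wl=1.0):
    return pdp(w, wp, wd) * a_rep(w) ** wr * a_line(w) ** wl       # Eq. (3.25)


W_LO, W_HI = 0.1, 20.0
w_p = golden_section_min(power_mw, W_LO, W_HI)
w_pdp = golden_section_min(pdp, W_LO, W_HI)
w_pdap = golden_section_min(pdap, W_LO, W_HI)
print(f"min power at {w_p:.3f} um, min PDP at {w_pdp:.3f} um, "
      f"min PDAP at {w_pdap:.3f} um")
```

Golden-section search assumes a single minimum in the interval. The text notes that a local minimum exists for each line length, so it is reasonable to bracket that minimum first, for example with a coarse sweep over width.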
The interconnect line used has a sheet resistance $R_\square$ = 250 mΩ/□, a capacitance per unit length at minimum width $C_{W_{min}}$ = 66 fF/mm, and an inductance per unit length at minimum width $L_{W_{min}}$ = 1 nH/mm. For a repeater system with $C_0$ = 1 fF and $w_p = w_d = 1$, the optimum solution for each criterion is listed in Table 3.1. The propagation delay and power dissipation are determined using a clock signal, modeled as a ramp input with a 20 ps transition time, at a frequency of 250 MHz.

l = 5 mm                       Minimum power   No repeater   Minimum PDP
Wint (µm)                      0.8             2.1           2.1
Number of repeaters            1               0             0
Repeater size (× minimum)      43.3            61.2          61.2
Minimum delay (ns)             0.157           0.051         0.051
Delay increase (times)         2               1             1
Power (mW)                     1.73            1.98          1.98
Power increase (%)             0               14.5          14.5

l = 15 mm                      Minimum power   No repeater   Minimum PDP
Wint (µm)                      0.8             20            3.9
Number of repeaters            5               0             1
Repeater size (× minimum)      43.2            225.6         80.7
Minimum delay (ns)             3.87            0.19          0.43
Delay increase (times)         19.36           1             1.26
Power (mW)                     5.2             21.31         7.58
Power increase (%)             0               310           45.7

Table 3.1 Uniform repeater system for different optimization criteria

The optimum line width for each design criterion is listed in the first row for each line length. The optimum number and size of the repeaters for each line width are listed in the second and third rows. The table also lists the increase in minimum propagation delay for the minimum-power and minimum-PDP solutions relative to the no-repeater solution. It likewise lists the percentage increase in total transient power dissipation relative to the minimum-power solution. For an l = 5 mm line, the optimum interconnect width is the same for minimum PDP and for no repeaters. This width increases power by 14.5% relative to the minimum-power width while reducing the propagation delay by 68%. For short interconnects, few repeaters are necessary to produce the minimum propagation delay.
For longer interconnects, the increase in line capacitance rapidly increases the power dissipation, while the minimum propagation delay decreases more slowly. For l = 15 mm, the solution that minimizes the PDP increases the delay by a factor of 1.26, rather than the factor of about 20 incurred by the minimum-power solution. It increases the power by 45.7%, rather than the 3.1-fold (310%) increase incurred by the no-repeater solution. Optimizing the interconnect for the minimum power-delay product therefore produces a smaller increase in both power and delay than optimizing either quantity alone. Using the optimum width for minimum PDP reduces the minimum propagation delay by 89% relative to the minimum-power width and the power dissipation by 65% relative to the no-repeater width.

3.8 Need for a better approach

This chapter has discussed various aspects of the buffer insertion technique for interconnect design, including delay reduction, power consumption, and the area consumed by repeaters. Figure 3.7 depicts the increase in propagation delay with decreasing interconnect width. Figure 3.5 shows the sharp increase in the optimum number of buffers required for different interconnect lengths as the interconnect width decreases. Even an optimum number of buffers is therefore not enough to counter the large increase in propagation delay. Figure 3.9 shows that, as the interconnect width decreases, the total dynamic power and the interconnect power decrease while the power consumed by the repeaters increases. This proves to be a major factor in power optimization. One of the major limitations of buffer insertion is the increasing number of repeaters in a system. Fig. 1.2 in Chapter 1 shows the increasing percentage of buffered nets at every technology node, and Fig. 1.3 shows the increase in buffer cells with each new technology. Buffers therefore occupy a major portion of the total area of the system.
Similar results are shown in Figure 3.12, which shows the exponential increase in the area consumed by buffers for different interconnect lengths. None of these factors favors buffer insertion, so a major breakthrough is needed in handling interconnects. Keeping in mind the problems that buffer insertion faces today and will face in future technologies, this thesis proposes an alternative to the buffer and analyzes the results. In the new approach the buffer is replaced by a Schmitt trigger, and all of the factors mentioned above are analyzed in the next chapter.

Chapter 4 Schmitt Trigger as an Alternative to the Buffer

4.1 Classical Schmitt Trigger

The classical Schmitt trigger is implemented using an op-amp with two resistors that provide regenerative (positive) feedback [61]. Schmitt triggers are typically built around comparators connected with positive feedback instead of the usual negative feedback. For this circuit the switching occurs near ground, with the amount of hysteresis controlled by the resistances R1 and R2. The circuit is shown in Figure 4.1.

Figure 4.1 Schmitt trigger implementation with comparator

The comparator outputs the highest voltage it can, +VS, when the non-inverting (+) input is at a higher voltage than the inverting (−) input, and switches to the lowest output voltage it can, −VS, when the non-inverting input drops below the inverting input. For very negative inputs the output is low, and for very positive inputs the output is high, so this is an implementation of a "non-inverting" Schmitt trigger.

For instance, if the Schmitt trigger is currently in the high state, the output is at the positive power-supply rail (+VS). V+ is then set by a voltage divider between Vin and +VS, and the comparator switches when V+ = 0 (ground). Current conservation at the V+ node shows that this requires

$$\frac{V_{in}}{R_1} = -\frac{V_S}{R_2} \qquad (4.1)$$

so Vin must drop below $-\frac{R_1}{R_2} V_S$ to make the output switch.
Once the comparator output has switched to −VS, the threshold for switching back to high becomes $+\frac{R_1}{R_2} V_S$. The circuit therefore creates a switching band centered on zero, with trigger levels $\pm\frac{R_1}{R_2} V_S$. The input voltage must rise above the top of the band, and then fall below the bottom of the band, for the output to switch on and then back off. If R1 is zero or R2 is infinite (i.e., an open circuit), the band collapses to zero width and the circuit behaves as a standard comparator. In the output characteristic, the threshold T is given by $T = \frac{R_1}{R_2} V_S$ and the maximum output value M is the power-supply rail.

4.2 Hysteresis in Schmitt Trigger

Hysteresis is a phenomenon in which two (or more) physical quantities bear a relationship that depends on prior history. More specifically, the response Y takes different values for an increasing input X than for a decreasing X. If X is cycled over an appropriate range, the plot of Y versus X forms a closed curve referred to as the hysteresis loop, and the response Y appears to lag the input X. Hysteresis occurs in many fields of science, and the Schmitt trigger also has this property. The hysteresis of a Schmitt trigger results from its two threshold voltages: once the input crosses one threshold, the output keeps its new level until the input crosses the other threshold. This dependence on the previous state produces the hysteresis curve of the Schmitt trigger. Figure 4.2 shows the hysteresis in the classical Schmitt trigger.

4.3 CMOS Schmitt Trigger

The CMOS Schmitt trigger, along with its transfer characteristics, is shown in Figure 4.3. In bipolar technology, p-n-p transistors are much slower than their n-p-n counterparts [61], and no bipolar prototype for the whole circuit of Fig. 4.3(a) is known; a bipolar Schmitt trigger instead uses an n-p-n differential pair loaded with a resistor. The circuit of Fig. 4.3(a) consists of two similar subcircuits (M1, M2, M3 and M4, M5, M6).
Each of them is a highly nonlinear load for the other. However, as shown subsequently, at each transition point one subcircuit can be considered a linear resistive load for the other. In the circuit of Fig. 4.3(a), the bottom circuit M1, M2, M3 (called here the N-subcircuit) is loaded by the top circuit M4, M5, M6 (the P-subcircuit); because of the circuit symmetry, the converse statement is also valid. To obtain the voltage-current characteristics of these nonlinear loads, one can take, for example, the N-subcircuit, apply a voltage source VO, and calculate the source current IO, assuming a constant voltage VG at the gates of M1 and M2 [Fig. 4.4(a)].

Figure 4.2 Hysteresis in conventional Schmitt trigger.
Figure 4.3 CMOS Schmitt trigger and its transfer characteristics.
Figure 4.4 N-subcircuit driven by a voltage source: (a) circuit; (b) current-voltage characteristic; (c) superposition of N- and P-subcircuit characteristics.

When the voltage VO is very small, transistor M3 is off, and M1 and M2 operate in the triode mode. The current IO is equal to

$$I_O = 2k_1 (V_G - V_{TN}) V_N \qquad (4.2)$$

if we consider transistor M1, or

$$I_O = 2k_2 (V_G - V_N - V_{TN})(V_O - V_N) \qquad (4.3)$$

if we consider M2. Here $k_i = 0.5\,\mu_n C_{ox} (W/L)_i$, as usual, and VTN is the threshold voltage of the n-channel transistors; for p-channel transistors, µp and VTP are used instead. It is assumed in (4.2) and (4.3) that VG > VTN. For the triode mode of operation VN ≪ VTN, and (4.3) can be simplified to

$$I_O = 2k_2 (V_G - V_{TN})(V_O - V_N) \qquad (4.4)$$

From (4.2) and (4.4) one obtains

$$V_N = \frac{k_2}{k_1 + k_2} V_O \qquad (4.5)$$

and

$$I_O = \frac{2 k_1 k_2}{k_1 + k_2} (V_G - V_{TN}) V_O \qquad (4.6)$$

From (4.6) one finds

$$R_{LN} = \left[\frac{\partial I_O}{\partial V_O}\right]^{-1} = \frac{k_1^{-1} + k_2^{-1}}{2 (V_G - V_{TN})} \qquad (4.7)$$

It is seen from (4.5) and (4.7) that, in this region of operation, transistors M1 and M2 may be considered as a series connection of two resistors. When VO increases, M2 enters saturation (pinch-off).
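The triode-region results (4.5) to (4.7) can be checked numerically. The sketch below uses hypothetical device values and illustrative function names. It evaluates VN and IO from (4.5) and (4.6), confirms that the same current flows through M1 according to (4.2) and through M2 according to (4.4), and checks that R_LN from (4.7) equals the sum of the two triode resistances 1/[2k_i(VG − VTN)], which is the series-resistor interpretation given above.

```python
def triode_node_voltage(v_o, k1, k2):
    """V_N from (4.5): M1 and M2 act as a resistive divider."""
    return k2 / (k1 + k2) * v_o


def triode_current(v_o, v_g, v_tn, k1, k2):
    """I_O from (4.6)."""
    return 2 * k1 * k2 / (k1 + k2) * (v_g - v_tn) * v_o


def load_resistance(v_g, v_tn, k1, k2):
    """R_LN from (4.7)."""
    return (1 / k1 + 1 / k2) / (2 * (v_g - v_tn))


# Hypothetical device values, chosen only for illustration.
K1, K2 = 200e-6, 100e-6   # A/V^2, k = 0.5 * mu_n * Cox * W/L
VG, VTN = 1.0, 0.3        # V
VO = 0.01                 # small V_O keeps M1 and M2 in the triode region

vn = triode_node_voltage(VO, K1, K2)
i_m1 = 2 * K1 * (VG - VTN) * vn          # (4.2): current through M1
i_m2 = 2 * K2 * (VG - VTN) * (VO - vn)   # (4.4): current through M2
i_total = triode_current(VO, VG, VTN, K1, K2)

# Series-resistor view: each triode transistor ~ 1 / (2 k (V_G - V_TN)).
r1 = 1 / (2 * K1 * (VG - VTN))
r2 = 1 / (2 * K2 * (VG - VTN))
r_ln = load_resistance(VG, VTN, K1, K2)

# Small-signal slope dI_O/dV_O, which (4.7) says is 1 / R_LN.
dv = 1e-4
slope = (triode_current(VO + dv, VG, VTN, K1, K2) - i_total) / dv

print(f"V_N = {vn * 1e3:.3f} mV, I_O = {i_total * 1e6:.4f} uA")
print(f"R_LN = {r_ln:.1f} ohm, r1 + r2 = {r1 + r2:.1f} ohm, slope * R_LN = {slope * r_ln:.6f}")
```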
Then IO is determined, depending on the transistor considered, either by

$$I_O = 2k_1 \left[V_G - V_{TN} - \frac{V_N}{2}\right] V_N \qquad (4.8)$$

or by

$$I_O = k_2 (V_G - V_N - V_{TN})^2 \qquad (4.9)$$

From (4.8) and (4.9) one finds that

$$V_N = (V_G - V_{TN})\left(1 - \sqrt{\frac{k_1}{k_1 + k_2}}\right) \qquad (4.10)$$

which does not depend on VO. This means [Fig. 4.4(b)] that when the voltage VO reaches the value

$$V_{OS} = V_G - V_{TN} \qquad (4.11)$$

the current IO becomes constant, equal to

$$I_N = \frac{k_1 k_2}{k_1 + k_2} (V_G - V_{TN})^2 \qquad (4.12)$$

A further increase of VO, however, gradually introduces some changes. When VO reaches

$$V_{OT} = V_G - (V_G - V_{TN})\sqrt{\frac{k_1}{k_1 + k_2}} \qquad (4.13)$$

transistor M3 turns on, VN starts to increase again, and the current IO decreases. When VO becomes equal to

$$V_{OC} = V_G + (V_G - V_{TN})\sqrt{\frac{k_1}{k_3}} \qquad (4.14)$$

transistor M2 is completely turned off and IO becomes zero. At this instant VN equals VG − VTN and M1 enters saturation. Transistor M1 carries the current

$$I_N = k_1 (V_G - V_{TN})^2 \qquad (4.15)$$

which is completely intercepted by M3. A further increase of VO up to VDD brings no changes and completes the current-voltage characteristic of the N-subcircuit.

4.4 Low Voltage Schmitt Trigger

With shrinking technology, power consumption is increasing in CMOS devices, and hence low-voltage, low-power Schmitt trigger designs have been proposed. Fig. 4.5 shows a 1 V Schmitt trigger circuit. In this design, a dynamic body bias is applied to a simple CMOS inverter, so that the threshold voltages of the two MOSFETs, and hence the switching voltage, can be changed. The operation of the circuit of Fig. 4.5 can be described as follows. First, the bias voltages Vbias,p and Vbias,n are set externally to (−|Vthp3| + 0.1) V and (|Vthn3| − 0.1) V, respectively.
This ensures that the drain voltage magnitudes of the MOSFETs Mp3 and Mn3 (the body voltage magnitudes of the MOSFETs Mp1 and Mn1) will be +0.1 V minimum and 0.1 V maximum, respectively, when the transistor is conducting. This limits the forward body bias in transistors Mn1 and Mp1 to 0.4 V; a forward bias greater than 0.4 V may trigger latch-up in a CMOS circuit. When a low signal is applied to Vin, Vout2 goes low. Vout2 then provides zero forward body bias to the transistors Mn1 through Mn3, operating in the linear region, and a forward bias of 0.4 V to Mp1 through Mp3, operating in the saturation region. The substrate of transistor Mn1 is biased at −0.5 V and its threshold voltage corresponds to the zero-substrate-bias value, Vtho,n1, while the substrate of transistor Mp1 is biased at +0.1 V, with its threshold voltage corresponding to the +0.4 V forward-bias value, Vth,p1. Transistor Mp1 remains on and Mn1 remains off until Vin increases to a certain voltage Vhl, at which the output Vout1 switches from high to low and Vout2 switches from low to high. Because Mn1 has zero body bias, its threshold voltage Vtho,n1 is higher than its forward-body-bias value, so a higher input voltage is needed to turn Mn1 on. For a ramp input, this results in a time delay t1 as Vout1 goes low and Vout2 goes high to VDD. At the end of the switching transient, this provides a 0.4 V forward body bias to Mn1 through transistor Mn3, operating in saturation, and zero body bias to Mp1 through transistor Mp3, operating in the linear region. Transistor Mp1 is now off and Mn1 remains on until Vin decreases to a certain voltage Vlh, at which Vout1 switches from low to high and Vout2 switches from high to low. Because Mn1 now has a forward substrate body bias, a lower input voltage is needed to turn it off. This results in a time delay t2 for a ramp input.
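The role of the two switching voltages can be illustrated with a purely behavioral sketch; it is not a transistor-level model of the circuit in Fig. 4.5. A hypothetical element switches high when its input reaches an upper threshold (playing the role of Vhl) and switches low only when the input falls to a lower one (playing the role of Vlh). The threshold values, the ripple amplitude and the function names are assumptions chosen for illustration. When the element is fed a slowly rising ramp carrying a small high-frequency ripple, a single threshold at VDD/2 toggles several times, while the dual-threshold element switches once because the ripple is smaller than the gap between its thresholds.

```python
import math


def single_threshold(samples, v_th):
    """Buffer-like decision: output high whenever the input exceeds v_th."""
    return [v > v_th for v in samples]


def dual_threshold(samples, v_up, v_down, start_high=False):
    """Behavioral dual-threshold element: goes high when the input reaches
    v_up and goes low only when it falls to v_down (v_down < v_up)."""
    if not v_down < v_up:
        raise ValueError("v_down must be below v_up")
    state, out = start_high, []
    for v in samples:
        if not state and v >= v_up:
            state = True
        elif state and v <= v_down:
            state = False
        out.append(state)
    return out


def count_transitions(levels):
    """Number of output changes between consecutive samples."""
    return sum(1 for a, b in zip(levels, levels[1:]) if a != b)


# Slow 0 -> 1 ramp (normalized to VDD) plus a 0.1 VDD high-frequency ripple.
N = 1000
vin = [i / (N - 1) + 0.1 * math.sin(2 * math.pi * i / 20) for i in range(N)]

single_out = single_threshold(vin, 0.5)
dual_out = dual_threshold(vin, v_up=0.65, v_down=0.35)
print("single threshold at 0.5 VDD:", count_transitions(single_out), "transitions")
print("thresholds 0.65 / 0.35 VDD: ", count_transitions(dual_out), "transitions")
```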
The different switching voltages, and hence switching times, for rising and falling inputs cause the hysteresis. Vout1 is buffered by the Mp2-Mn2 inverter, which provides high fan-out capability; the output is therefore taken at the Vout2 terminal. The 0.4 V Schmitt trigger circuit is shown in Figure 4.6 and its output voltage curve in Figure 4.7.

4.5 CMOS buffer

A buffer is designed with two cascaded CMOS inverters, as shown in Figure 4.8. The buffer is designed with minimal lambda parameters for 65 nm technology, keeping Wp = 3Wn to ensure equal rise and fall times. The second inverter is four times the size of the first to provide the required current-driving capability.

Figure 4.5 1 V CMOS Schmitt trigger circuit
Figure 4.6 0.4 V CMOS Schmitt trigger circuit derived from 1 V Schmitt trigger
Figure 4.7 Measured hysteresis characteristics of 0.4 V CMOS Schmitt trigger circuit, and measured input-output waveform characteristics: (a) measured hysteresis characteristic of 0.4 V CMOS Schmitt trigger circuit; (b) measured input-output (Vin-Vout2) waveform characteristics
Figure 4.8 CMOS buffer.

4.6 Schmitt trigger as an alternative to buffer insertion

It has been discussed in Chapter 3 that buffers are used for signal restoration and delay reduction. The most basic form of an interconnect is a linear interconnect with no neighbors. For a single linear interconnect, RC delay is the major factor determining the signal propagation delay, so the buffers have to handle only signal delay. In this thesis, the Schmitt trigger is studied as a replacement for the buffer. Initially the focus is on a linear interconnect and on the effect of replacing the buffer with a Schmitt trigger. Further, since interconnects are also organized in groups to act as address or data buses, the effect of the Schmitt trigger in buses is also studied; in particular, any possible advantage of the Schmitt trigger in mitigating signal crosstalk is studied in detail. As an example of a bus, a typical 4-bit bus is shown in Figure 4.9.
Buffers are placed at regular distances between the transmitter and the receiver, and the interconnect segments between them are modeled as RC elements. The total delay between transmitter and receiver is the sum of the RC delays of all the segments and the switching times of the buffers.

Figure 4.9 4 bit bus with buffers to restore signals.

In deep-submicron technology, parasitics play a noticeable role in determining the delay and waveform shape, and the RC (Elmore) delay [42] becomes the main component of the total delay. Crosstalk from adjacent signals increases the switching activity and, in turn, the delay. Delay reduction in buses has been addressed with various bus coding techniques [62, 63, 64, and 66]. In all these techniques, extra hardware is added before the transmitter and the data bits are encoded to minimize switching activity [67] and thus delay; some delay is saved, but at the cost of extra hardware. We use a Schmitt trigger in place of the conventional buffer for the following reasons:

• The Schmitt trigger can act as a signal-restoring circuit; this is the main reason we have investigated the Schmitt trigger as an alternative to the buffer as a data-restoring element in interconnects.
• The switching time of the buffer and of the Schmitt trigger is the same for a given DSM technology, but the lower threshold of the Schmitt trigger reduces the rise-time delay and hence the total delay. Although the saving in rise-time delay is only a few ps, it becomes very significant for slowly rising signals.
• With the introduction of the Schmitt trigger, bus coding techniques can be dispensed with, eliminating their extra hardware and the power consumed by those transistors.
• A low-threshold buffer cannot be used simply to lower the triggering voltage level, as it would produce a non-uniform duty cycle, which is never desired in data transmission.
A low-threshold Schmitt trigger, in contrast, does not hamper the duty cycle of the waveform because of its dual threshold.
• The higher noise margin of the Schmitt trigger allows the circuit to handle larger noise glitches, making the proposed approach more efficient. With this noise margin, all six types of crosstalk noise are removed quite effectively.
• Reduced noise glitches result in lower power consumption and hence help reduce the total power consumed.
• The Schmitt trigger has 15% more cell area when fabricated, but the major reduction in power and delay and the better noise handling justify the extra area.

4.7 Conclusions

In this chapter, the Schmitt trigger, a circuit element with hysteresis and two thresholds for switching between logic levels, has been studied as an alternative to the buffers used in conventional approaches. It has been observed that the Schmitt trigger outperforms the buffer on the following points:

• The programmable dual-threshold property of the Schmitt trigger allows the designer to set lower thresholds for fast signal switching.
• Lower thresholds are also beneficial during switching, as they do not allow all the transistors to be in the active or saturation mode.
• The noise immunity of the Schmitt trigger is higher than that of the buffer because of the band between its two thresholds (its hysteresis).

The next chapter presents and discusses the simulation results obtained with this replacement approach.

Chapter 5 Results and Discussion

5.1 NTRS 1997 predictions

Interconnect design has become a dominant issue in high-speed integrated circuits (ICs). With the decreased feature size of CMOS circuits, on-chip interconnect now dominates both circuit delay and power dissipation. The number of long interconnects doubles every three years [68], further increasing the importance of on-chip interconnect.
The 1997 National Technology Roadmap for Semiconductors (NTRS ’97) [69] proposes aggressive goals for chip performance as CMOS devices approach 40 nm minimum feature sizes. Table 5.1 lists some pertinent factors adapted from [69] to reflect a technology shrink of 0.7 per generation and a corresponding doubling of clock frequency every two generations. Although the clock frequency for the 250 nm generation in Table 5.1 is lower than the NTRS ’97 value of 750 MHz, it rises nearly to the 3000 MHz value predicted for across-chip clock frequency in high-performance processor chips in the 40 nm generation. CV/I data for nMOSFETs from [70] show that device scaling will provide a comparable decrease in gate delay of 0.7 per generation. MOSFETs can therefore meet these higher clock-frequency requirements, but chip performance will be limited by "long, lossy lines", to use Davidson's terminology [71]. Since the RC time constants of interconnects remain the same when they are scaled comparably, interconnects increasingly dominate delay and cycle time as devices are scaled. To raise clock frequencies, the effects of long, lossy wires must be reduced, and only two approaches are possible: reduce length or reduce loss. Interconnect length can be reduced by confining high-speed clocking to a limited area or by using repeaters to chop long wires into a series of short wires. Interconnect loss can be reduced by changing materials to improve resistivity or by changing interconnect cross sections. If the dimensions of an interconnect cross section are doubled, its resistance per unit length drops by a factor of four while its capacitance per unit length (Cint) remains the same.
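The factor-of-four claim follows directly from the resistance of a rectangular wire, R = ρL/(Wt). The sketch below assumes an effective copper resistivity of 2.2 µΩ·cm (2.2e-8 Ω·m), a value not stated in this thesis, and uses an illustrative function name. With that assumption, the 180 nm, 2 mm dimensions of Table 5.2 (W = 0.28 µm, t = 0.45 µm) give about 349 Ω, the value listed in Table 5.3. Doubling both W and t then cuts R by four; because Cint per unit length is unchanged, as stated above, the line's RC product drops by the same factor.

```python
RHO_CU = 2.2e-8  # ohm*m; assumed effective copper resistivity (not from the thesis)


def wire_resistance(length_m, width_m, thickness_m, rho=RHO_CU):
    """Resistance of a rectangular wire: R = rho * L / (W * t)."""
    return rho * length_m / (width_m * thickness_m)


# 180 nm, 2 mm line from Table 5.2: W = 0.28 um, t = 0.45 um.
r_180_2mm = wire_resistance(2e-3, 0.28e-6, 0.45e-6)
# Same length with both cross-section dimensions doubled.
r_doubled = wire_resistance(2e-3, 2 * 0.28e-6, 2 * 0.45e-6)

print(f"R = {r_180_2mm:.1f} ohm; doubled cross section: {r_doubled:.1f} ohm")
print(f"resistance ratio = {r_180_2mm / r_doubled:.2f}")
```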
Table 5.1 Projected advances in CMOS chip performance

Year   Technology (nm)   Fc (MHz)   CV/I [82] (ps)   Die area (mm²)   Chip edge length (mm)   Logic transistor density (M/100 mm²)
1997   250               500        7                300              17                      1.8
1999   180               700        5                340              19                      3.6
2003   130               1000       3.5              432              21                      7.2
2006   90                1400       2.5              520              23                      14.4
2009   60                2000       1.8              620              25                      28.8
2012   40                2800       1.2              750              27                      57.6

It should be noted that the effects of long, lossy lines on delay are exacerbated by the NTRS ’97 projections of increased die area shown in Table 5.1. The corresponding chip edge lengths are also shown, assuming a square die. Increased areas allow many more transistors on a chip, and a consistent set of logic transistor densities, doubling every generation, is also shown in Table 5.1. (These values are considerably smaller than those NTRS ’97 assumes, decreasing from about half the NTRS value at 250 nm to about a third at 40 nm.) An advantage of CMOS technology has been the ability to increase processor performance by using more transistors instead of faster circuits.

5.2 Signal Propagation on a Linear Interconnect

Various interconnect models and their representations have been discussed in Chapter 2, Section 2.4, including the L-, T- and Π-shaped structures depicted in Figure 2.9. When long interconnects are modeled as RC networks, they are divided into smaller sections cascaded one after another; a typical RC interconnect model is shown in Figure 5.1. Each RLC element introduces its own delay and glitches in the output waveform. All simulations in this work use the latest technology parameter model files from the Predictive Technology Model (PTM) website (http://www.eas.asu.edu/~ptm/latest.html). The structure used for each simulation is shown in Figure 5.2, and the corresponding values of width, spacing between adjacent lines, thickness and height above the ground plane have been used in the simulations. The interconnect dimensions used for the simulations are given in Table 5.2.
The corresponding values of resistance (R), inductance (L), coupling capacitance (Ccouple) and ground capacitance (Cground) are provided in Table 5.3.

5.2.1 Types of interconnects

For simulation purposes, different types of linear interconnect are considered. The three major types are:

• Local interconnect
• Intermediate interconnect
• Global interconnect

Figure 5.1 An RC interconnect
Figure 5.2 Interconnect structure used for simulations

Table 5.2 Interconnect dimensions

Tech     L (mm)   t (µm)   h (µm)   K     W (µm)   S (µm)
180 nm   2        0.45     0.65     3.5   0.28     0.28
180 nm   5        0.65     0.65     3.5   0.35     0.35
180 nm   10       1.25     0.65     3.5   0.80     0.80
130 nm   2        0.45     0.45     3.2   0.20     0.20
130 nm   5        0.45     0.45     3.2   0.28     0.28
130 nm   10       1.20     0.45     3.2   0.60     0.60
90 nm    2        0.30     0.30     2.8   0.15     0.15
90 nm    5        0.45     0.30     2.8   0.20     0.20
90 nm    10       1.20     0.30     2.8   0.50     0.50
65 nm    2        0.20     0.20     2.2   0.10     0.10
65 nm    5        0.35     0.20     2.2   0.14     0.14
65 nm    10       1.2      0.2      2.2   0.45     0.45

Local interconnects are the shortest interconnects, typically used to connect consecutive logic blocks. When designing these interconnects, the width is kept at half the height and thickness of the interconnect, and the minimum spacing between two local interconnects is at least equal to the line width. Interconnects up to 2 mm long are considered in this category. Intermediate interconnects are longer interconnects, typically used to connect logic blocks placed far apart; they can be around 5 mm long. When designing these interconnects, the typical ratio of width, thickness and height is 4:9:6. Global interconnects are the longest interconnects on the chip and are mainly used to deliver power supplies to different parts of the chip. These interconnects can be as long as 10 mm, and their ratio of width, thickness and height is 4:8:3. In the first test case, RC elements of length 2 mm, 5 mm and 10 mm for 180 nm technology are considered.
A fast-rising signal with an operating frequency of 500 MHz is fed at the input end, and the output observed at the far end for each length is shown in Figure 5.3. In Figure 5.3, waveform 1 is the input signal, with an operating voltage of 1.8 V and a frequency of 500 MHz; the interconnects are modeled as RC elements only in this simulation. Waveform 2 is the output appearing at the end of the 2 mm RC interconnect. The output is delayed and its edges are rounded by the RC (Elmore) delay: although it keeps the same frequency as the input signal, it arrives late at the output. Waveform 3 corresponds to a 5 mm interconnect at the same technology node with the same input signal; because of the larger R and C, the signal is more delayed and more deformed. It takes almost 40% of the clock cycle to reach 50% of Vdd, and similarly long to fall back to 10% of Vdd when switching back to zero. Such high RC delays are therefore an alarming issue. The situation is worse for the 10 mm line: waveform 4, which corresponds to the 10 mm interconnect, is delayed so much that it does not even reach 50% of Vdd during the complete clock cycle, and the same pattern is observed when switching from high to low. This makes long interconnects almost unusable for data transmission, since the resulting timing mismatch can cause a large amount of data loss.

Table 5.3 Interconnect Resistance, Inductance and Capacitance values

Tech     Length (mm)   R (Ω)    L (nH)   Ccouple (fF)   Cground (fF)
180 nm   2             349      3.6      94.8           33.2
180 nm   5             483      9.72     246            104
180 nm   10            880      16.5     435            214
130 nm   2             488      3.6      122            34
130 nm   5             1242     10.5     312.5          82.5
130 nm   10            2444     21.2     612.5          173.4
90 nm    2             977      3.8      107.2          38.2
90 nm    5             2444     10.6     268            97
90 nm    10            4888     22.4     536.3          194.2
65 nm    2             2200     3.99     107.2          38.2
65 nm    5             5500     10.9     268            97
65 nm    10            11000    23.2     536.2          194.2
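The delay and rounding of these waveforms can be estimated to first order with the Elmore delay [42] of the RC ladder used to model the line (Figure 5.1). The sketch below assumes an L-type ladder of n identical segments (series resistance followed by shunt capacitance), an optional ideal driver resistance and a lumped far-end load. The function name and the choice of total capacitance are assumptions for illustration; the capacitance is taken as Cground + Ccouple from Table 5.3, which treats the neighboring lines as quiet and grounded. As n grows, the ladder's Elmore delay approaches RC/2, the Elmore value of a distributed line. The Elmore delay is known to be an upper bound on the 50% step delay of an RC tree; for a distributed line the 50% delay is closer to 0.38RC.

```python
def elmore_ladder_delay(r_total, c_total, n_segments, r_driver=0.0, c_load=0.0):
    """Elmore delay of an n-segment RC ladder (series R, then shunt C).

    Each node capacitance is weighted by the total resistance between the
    source and that node; the optional load sits at the far end.
    """
    if n_segments < 1:
        raise ValueError("need at least one segment")
    r_seg = r_total / n_segments
    c_seg = c_total / n_segments
    delay = 0.0
    for i in range(1, n_segments + 1):
        delay += (r_driver + i * r_seg) * c_seg
    delay += (r_driver + r_total) * c_load
    return delay


# 180 nm lines from Table 5.3: length (mm) -> (R in ohm, C_ground + C_couple in F)
LINES_180NM = {
    2: (349.0, (33.2 + 94.8) * 1e-15),
    5: (483.0, (104 + 246) * 1e-15),
    10: (880.0, (214 + 435) * 1e-15),
}

for length_mm, (r, c) in LINES_180NM.items():
    t = elmore_ladder_delay(r, c, n_segments=100)
    print(f"{length_mm:>2} mm: RC = {r * c * 1e12:6.1f} ps, Elmore = {t * 1e12:6.1f} ps")
```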
When RLC models are considered, the situation becomes more complicated: the added mutual inductance introduces further distortion into the output signal. The effect of inductance on the same signal is shown in Figure 5.4. The output waveforms in Figure 5.4 show that interconnect inductance adds noise glitches to the output signal; the signal is therefore delayed and, because of the added glitches, consumes more power. The same trends have been observed for smaller feature sizes. Outputs for the same interconnect lengths were simulated for 90 nm and 65 nm technologies; Figures 5.5 and 5.6 show the results for 90 nm and 65 nm technology, respectively.

Figure 5.3 Output end signals on a 2mm, 5mm and 10mm RC interconnect at 180nm technology.
Figure 5.4 Output end signals on a 2mm, 5mm and 10mm RLC interconnect at 180nm technology.
Figure 5.5 Output end signals on a 2mm, 5mm and 10mm RLC interconnect at 90nm technology.

It is thus concluded that, in DSM technologies, linear interconnects face the following problems with each new technology generation:

• RC delay in the output signal
• Noise induced by interconnect inductance
• Extra power consumption

The next section discusses the conventional approach to handling these problems, namely buffer insertion.

5.3 Effect of Buffer Insertion on Delay, Noise and Power Reduction

With the continuing trend of VLSI technology scaling and increasing frequency, interconnect delay has become a significant bottleneck in system performance. This trend results from the increased resistance of interconnects as feature sizes enter the nanometer era. According to International Technology Roadmap for Semiconductors (ITRS) projections, interconnect delay can contribute more than 50% of the total delay when the feature size is below 180 nm. As … arrangements is shown in Figure 5.7.

Figure 5.7 Buffers inserted in an RLC interconnect.
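To see why splitting a line with buffers (Figure 5.7) cuts delay, Bakoglu's classic closed-form repeater model can be used. This is a textbook model, not the simulation setup of this chapter. Each of k equal sections is driven by a repeater h times the minimum size, with output resistance R0/h and input capacitance hC0, and the 0.7 and 0.4 factors are the usual 50% delay coefficients for lumped and distributed RC. Setting the derivatives to zero gives k_opt = sqrt(0.4 Rint Cint / (0.7 R0 C0)) and h_opt = sqrt(R0 Cint / (Rint C0)). The line values below are the 180 nm, 10 mm entries of Table 5.3, taking C as Cground + Ccouple. R0, C0 and the function names are hypothetical.

```python
import math


def repeated_line_delay(k, h, r_int, c_int, r0, c0):
    """Bakoglu-style 50% delay of a line split into k equal sections, each
    driven by a repeater h times minimum size (output resistance r0/h,
    input capacitance h*c0)."""
    r_seg, c_seg = r_int / k, c_int / k
    stage = 0.7 * (r0 / h) * (c_seg + h * c0) + r_seg * (0.4 * c_seg + 0.7 * h * c0)
    return k * stage


def optimum_repeaters(r_int, c_int, r0, c0):
    """Closed-form optimum number (k) and size (h) of repeaters."""
    k_opt = math.sqrt(0.4 * r_int * c_int / (0.7 * r0 * c0))
    h_opt = math.sqrt(r0 * c_int / (r_int * c0))
    return k_opt, h_opt


# Line: 180 nm, 10 mm entry of Table 5.3, C taken as C_ground + C_couple.
R_INT, C_INT = 880.0, (214 + 435) * 1e-15
# Hypothetical minimum-size repeater.
R0, C0 = 10e3, 1e-15

k_opt, h_opt = optimum_repeaters(R_INT, C_INT, R0, C0)
best_k = min(range(1, 51),
             key=lambda k: repeated_line_delay(k, h_opt, R_INT, C_INT, R0, C0))
t_single = repeated_line_delay(1, h_opt, R_INT, C_INT, R0, C0)
t_best = repeated_line_delay(best_k, h_opt, R_INT, C_INT, R0, C0)

print(f"k_opt = {k_opt:.2f}, h_opt = {h_opt:.1f}, best integer k = {best_k}")
print(f"delay: k = 1 -> {t_single * 1e12:.0f} ps, k = {best_k} -> {t_best * 1e12:.0f} ps")
```

The brute-force search over integer k serves only as a check that the closed-form optimum lands near the true minimum of the model.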
The operating voltage for 65 nm technology is 1.2 V and the operating frequency is 1 GHz. A triangular input signal and a fast-rising input signal are considered. The outputs of the RC interconnect and of the buffered interconnects are shown in Figures 5.8 and 5.9.

Figure 5.8 Delay reduction in 2mm interconnect with triangular input.

Figure 5.8 shows a triangular input fed to a 2 mm RC interconnect. Waveform 1 is the input triangular wave and waveform 2 is the output delayed by RC effects. The signal is highly deformed, so buffer insertion is required. Waveform 3 is the output of the buffered interconnect, which is not deformed and reaches the output end earlier than the delayed wave. Buffer insertion is therefore capable of reducing delay in VLSI interconnects.

Figure 5.9 Delay reduction in 2mm interconnect with square wave input.

Further experiments are carried out on longer interconnects of 5 mm and 10 mm. Figures 5.10 and 5.11 show the reduced propagation time for these two lengths, respectively, and confirm that buffer insertion is a useful technique for delay reduction in VLSI interconnects. Simulations have been carried out for different technologies for all the interconnect lengths mentioned above, and detailed results are provided in Table 5.4. The table shows that buffer insertion gives a significant delay reduction at all technology nodes.

5.3.2 Noise and Power reduction using Buffer Insertion

Advances in integrated circuit technology have led to an increase in the switching speeds of digital circuits. This increase is the primary reason why inductance-induced noise (e.g., oscillation, delay, and crosstalk) is beginning to cause chips to fail, and the inductance of on-chip signal lines has consequently attracted great interest. Inductance is associated with a current loop.
In a VLSI chip, when a single signal line switches, numerous current loops are formed through the interconnect substrate and the power and ground lines. A conventional transmission line assumes only one current return path. Conventional transmission-line analysis can nevertheless be used if we assume that there is no transient potential drop on the return paths, so that they can be lumped together as a single terminal. The interconnect circuit and model are shown in Fig. 5.12.

Figure 5.10 Delay reduction in 5mm interconnect with square wave input.
Figure 5.11 Delay reduction in 10mm interconnect with square wave input.

Table 5.4 Propagation delay values for an interconnect of different length with and without buffer insertion

Technology                                   180 nm   130 nm   90 nm     65 nm
Length = 2 mm
Delay without any element insertion (ps)     22.7     41.2     72.5      163.9
Delay in buffered interconnect (ps)          16.5     31.76    53.14     128.76
% reduction with buffer insertion            27.1     23.5     26.54     21.6
Length = 5 mm
Delay without any element insertion (ps)     83.1     271.5    458.2     916.5
Delay in buffered interconnect (ps)          64.5     198.5    338.36    704.2
% reduction with buffer insertion            22.3     27.4     26.2      23.2
Length = 10 mm
Delay without any element insertion (ps)     267.8    1047.2   1833.56   4127.5
Delay in buffered interconnect (ps)          192.1    717.9    1285      2948
% reduction with buffer insertion            28.5     31.4     30.2      29.6

The driver resistance is modeled as a constant linear resistance, denoted Rsource. The receiver can be one of the following: (i) a static gate, (ii) a transmission gate, (iii) a pass transistor, or (iv) a domino gate. The load is modeled as a capacitance (for a static inverter, a domino gate, or a non-conducting pass transistor or transmission gate) or as a resistance (for a conducting pass transistor or transmission gate); it is assumed constant and denoted Cload or Rload. The buffer is designed by cascading two inverters, with the second inverter four times the size of the first.
An inverter removes all glitches whose magnitude is less than Vdd/2. The first inverter therefore gives a clean output at the opposite logic level, and the second inverter brings it back to the original logic level with the noise glitches removed, while the circuit retains its delay-reduction property. Buffer insertion is thus capable of removing unwanted inductive noise glitches in a linear interconnect. If a noise glitch exceeds Vdd/2, however, the signal switches to the opposite logic level and back, resulting in extra switching. In the next simulation, the same input is fed to the buffered interconnect. Each buffer removes the glitches introduced by the preceding RLC elements, providing a clean output at the final output end. The simulations are shown in Figure 5.13.

Figure 5.12
Figure 5.13

Reduced noise glitches result in less current in the circuit, so the same circuit operation is achieved with lower power consumption. A detailed analysis of the reduced power in the interconnect is shown in Table 5.5.

Table 5.5 Power consumption values for an interconnect of different length with and without buffer insertion approach.
Technology                                        180 nm   130 nm   90 nm    65 nm
Length = 2 mm
Power consumption in the interconnect (µW)        111.2    151.5    198.5    245.2
Power consumption in buffered interconnect (µW)   89.6     121.6    153.6    177.6
% reduction with buffer insertion                 19.5     19.9     22.8     27.7
Length = 5 mm
Power consumption in the interconnect (µW)        301.2    410.2    504.6    614.5
Power consumption in buffered interconnect (µW)   231.1    340      417.5    497.28
% reduction with buffer insertion                 23.3     18.1     17.3     19.1
Length = 10 mm
Power consumption in the interconnect (µW)        620.1    921.4    1114.6   1340
Power consumption in buffered interconnect (µW)   531.6    715      907.6    1094.1
% reduction with buffer insertion                 14.4     22.4     19.1     18.4

While buffer insertion is a useful technique, the limitations of buffer insertion discussed in Chapter 4 have led us to look for new algorithms and approaches. The next section studies the effect of using a Schmitt trigger in place of a buffer in linear interconnects.

5.4 Effect of Schmitt trigger on delay, noise and power reduction in Linear Interconnects

In Chapter 4, it was suggested that replacing buffers with Schmitt triggers may be advantageous. This section analyzes and discusses the effect of the Schmitt trigger on interconnect delay, noise and power.

5.4.1 Delay reductions with Schmitt trigger

Consider an RC-delayed waveform rising slowly to the high level. A buffer is triggered at Vdd/2, and its output is generated within the buffer's switching time. A Schmitt trigger, in contrast, has two thresholds that can be controlled by changing the W/L ratios of its transistors, so they can be set well below Vdd/2 while remaining above the Vt of the transistors. With this approach, as soon as the rising signal reaches the set threshold, the output switches to the high level within the switching time of the Schmitt trigger. For comparison with buffer insertion, the same input signal and interconnect model are used for the Schmitt trigger.
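A first-order way to see the saving described above is to approximate the slowly rising far-end waveform by a single-pole RC response, v(t) = Vdd(1 − e^(−t/τ)). Under that assumption, which ignores inductance, the distributed nature of the line and the switching time of the receiving element, the time to reach a fraction f of Vdd is t = −τ ln(1 − f). The sketch below compares the Vdd/2 trip point of a buffer with a lower threshold of Vdd/3, the value mentioned in Section 5.4.2. The function name and the threshold choice are illustrative.

```python
import math


def time_to_fraction(tau, fraction):
    """Time for v(t) = Vdd * (1 - exp(-t / tau)) to reach fraction * Vdd."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must lie strictly between 0 and 1")
    return -tau * math.log(1.0 - fraction)


TAU = 1.0  # normalized RC time constant; the relative saving does not depend on it

t_buffer = time_to_fraction(TAU, 1 / 2)    # single threshold at Vdd/2
t_schmitt = time_to_fraction(TAU, 1 / 3)   # assumed lower threshold at Vdd/3
saving = 100.0 * (t_buffer - t_schmitt) / t_buffer

print(f"Vdd/2 reached at {t_buffer:.3f} tau, Vdd/3 at {t_schmitt:.3f} tau")
print(f"threshold-crossing time saved: {saving:.1f}%")
```

Under this model, lowering the trip point from Vdd/2 to Vdd/3 shortens the threshold-crossing time by about 41% for any τ. Real savings also depend on the actual waveform shape and on the element's own switching time.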
The simulations are shown in Figure 5.14. Waveform 1 is the input, waveform 2 is the waveform delayed by RLC effects, waveform 3 is the output with buffer insertion, and waveform 4 is the output with the Schmitt trigger. The Schmitt trigger results in a smaller propagation delay.

Figure 5.14 Delay reduction using Schmitt trigger approach in 2mm interconnect with square wave input.

The same trend holds for 5mm and 10mm long interconnects, whose simulation results are shown in Figures 5.15 and 5.16 respectively. These figures show that the Schmitt trigger replacement approach is more efficient than buffer insertion. Experiments were therefore conducted for all these interconnect lengths at different technology nodes; detailed results are provided in Table 5.6.

Figure 5.15 Delay reduction using Schmitt trigger approach in 5mm interconnect with square wave input.
Figure 5.16 Delay reduction using Schmitt trigger approach in 10mm interconnect with square wave input.

Table 5.6 Propagation delay values for an interconnect of different length with buffer insertion and delay reduction using Schmitt trigger approach
Length | Quantity | 180nm | 130nm | 90nm | 65nm
2mm | Delay in buffered interconnect (ps) | 16.5 | 31.76 | 53.14 | 128.76
2mm | Delay with Schmitt trigger approach (ps) | 12.3 | 21.65 | 38.1 | 95.6
2mm | % reduction with Schmitt trigger | 25.3 | 31.4 | 28.3 | 25.6
5mm | Delay in buffered interconnect (ps) | 64.5 | 198.5 | 338.36 | 704.2
5mm | Delay with Schmitt trigger approach (ps) | 46.1 | 148.8 | 236.8 | 514
5mm | % reduction with Schmitt trigger | 28.5 | 25 | 30 | 27
10mm | Delay in buffered interconnect (ps) | 192.1 | 717.9 | 1285 | 2948
10mm | Delay with Schmitt trigger approach (ps) | 134.4 | 490.3 | 842.5 | 2098.2
10mm | % reduction with Schmitt trigger | 29.8 | 31.7 | 34.5 | 28.2

5.4.2 Noise and power reduction with Schmitt trigger approach
As discussed in Section 5.3, interconnects suffer from inductive noise.
Besides reducing the signal propagation delay, buffer insertion handles inductive noise well and thereby also reduces power consumption (Table 5.5). Buffers are effective as long as the noise glitches are smaller than Vdd/2; in that case the noisy signal causes no unwanted switch to the opposite logic level. When the glitches are larger, however, the buffer, which triggers at Vdd/2, can switch to the opposite level unintentionally, and every such switch adds to the power consumption. Noise reduction with the Schmitt trigger approach is shown in Figure 5.17. A Schmitt trigger has two threshold voltages, so a noisy signal must cross both thresholds to switch from one logic level to the other; the Schmitt trigger therefore provides a larger noise margin.

Figure 5.17 Noise reduction using Schmitt trigger

Extensive noise analysis was carried out by introducing artificial noise around Vdd/2 of the input signal. The noise is generated by coupling the signal with a high-frequency waveform, which produces glitches at Vdd/2. When this waveform is fed to the buffered interconnect, the buffer switches unintentionally from high to low because of the instability at Vdd/2, adding extra delay to the output signal. Figure 5.18 shows that waveform 3, the output of the buffered interconnect, makes one incomplete transition from low to high, which increases both delay and power consumption. When the same input is applied to the Schmitt trigger alone, the output switches to the opposite logic level once the signal reaches Vdd/3. Since the glitches are not large enough to cross both thresholds of the Schmitt trigger, the output shows no unwanted transitions, as seen in waveform 2 of Figure 5.18.

The power reduction in the Schmitt trigger approach is due to the following reasons.
• From the point of view of device operation, the Schmitt trigger is made of 6 MOS transistors.
We set the lower threshold just above the threshold voltage of the transistor. Consider switching from the low to the high logic level: whenever the signal crosses the lower threshold, the Schmitt trigger switches its output to the high level within its switching time. At the switching point, the 3 NMOS transistors are on while the 3 PMOS transistors are in cut-off. In a buffer, by contrast, all 4 MOS transistors are in saturation at the switching point. This gives a further reduction in static power consumption. Detailed power consumption data for various interconnect lengths and all technology nodes are provided in Table 5.7.

Figure 5.18 Behavior of buffer and Schmitt trigger towards a noisy signal.

Table 5.7 Power consumption values for an interconnect of different length with buffer insertion and reduction using Schmitt trigger approach.
Length | Quantity | 180nm | 130nm | 90nm | 65nm
2mm | Power consumption in buffered interconnect (µW) | 89.6 | 121.6 | 153.6 | 177.6
2mm | Power consumption with Schmitt trigger approach (µW) | 70.8 | 96.8 | 114.6 | 140
2mm | % reduction with Schmitt trigger | 21 | 21.1 | 25.7 | 21.2
5mm | Power consumption in buffered interconnect (µW) | 231.1 | 340 | 407.5 | 497.28
5mm | Power consumption with Schmitt trigger approach (µW) | 181.24 | 266.2 | 299.5 | 392.1
5mm | % reduction with Schmitt trigger | 21.8 | 22.5 | 26.5 | 21.4
10mm | Power consumption in buffered interconnect (µW) | 531.6 | 715 | 907.6 | 1094.1
10mm | Power consumption with Schmitt trigger approach (µW) | 416.8 | 572.3 | 697.1 | 834.1
10mm | % reduction with Schmitt trigger | 23.5 | 20 | 22.1 | 23.3

5.5 Replacement of Buffers in Buses
In deep-submicron technology, minimizing the propagation delay and power consumption on buses is one of the most important design objectives in system-on-chip design. In particular, coupling effects between wires on the bus can cause serious problems such as crosstalk delay, noise and power consumption.
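The noise-rejection argument of Section 5.4.2 (Figure 5.18), on which the replacement of buffers in buses also relies, can be illustrated with idealized, zero-delay models of the two restoring elements. In this sketch the input is normalized to Vdd, the buffer switches at 0.5, and the Schmitt trigger rises at 1/3 and falls at 0.2; these thresholds and the sampled waveform are hypothetical values chosen to mimic a slow ramp with glitches around Vdd/2, not measured data.

```python
def buffer_output(samples, v_th=0.5):
    """Ideal buffer: output is the input compared against a single threshold."""
    return [1 if v > v_th else 0 for v in samples]


def schmitt_output(samples, v_rise, v_fall, initial=0):
    """Ideal Schmitt trigger: switches high above v_rise, low below v_fall."""
    out, outputs = initial, []
    for v in samples:
        if out == 0 and v > v_rise:
            out = 1
        elif out == 1 and v < v_fall:
            out = 0
        outputs.append(out)
    return outputs


def count_transitions(bits, initial=0):
    """Number of logic-level changes, counting from the initial level."""
    count, prev = 0, initial
    for b in bits:
        if b != prev:
            count += 1
        prev = b
    return count


# Hypothetical sampled input (fraction of Vdd): slow ramp with glitches near Vdd/2.
NOISY = [0.0, 0.1, 0.2, 0.3, 0.4, 0.55, 0.45, 0.6, 0.42, 0.58, 0.7, 0.8, 0.9, 1.0]

if __name__ == "__main__":
    buf = buffer_output(NOISY)
    st = schmitt_output(NOISY, v_rise=1 / 3, v_fall=0.2)
    print("buffer transitions:", count_transitions(buf))
    print("Schmitt transitions:", count_transitions(st))
```

Here the buffer output toggles five times while the Schmitt output makes a single clean transition, and it does so one sample earlier because its rising threshold is below Vdd/2. Real gates add switching delay and finite slopes, which this model ignores.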
One of the fastest-growing areas in the computing industry is high-throughput, low-power digital signal processing (DSP) and communication systems. Recent trends show that the systems-on-chip (SOC) used in such systems are becoming increasingly complex as they add functionality under size, performance and power constraints. The basic problems are:
• minimizing the crosstalk delay;
• minimizing the power consumption on the bus.
Bus coding techniques are often used to reduce delay and power in interconnect buses. Lowering the transition (switching) activity on the bit lines of a bus is known to reduce the dynamic bus power consumption significantly.

5.5.1 Signal Propagation in Buses
In a data bus, interconnects are laid side by side very close to each other, and parallel data bits are transmitted on them simultaneously. The values on adjacent interconnects change with every new data word. Every rise or fall on one line affects the adjacent lines through the coupling capacitance between them. At 180nm technology this effect was small because the interconnects were spaced relatively far apart, so the coupling capacitance was low or nearly negligible and crosstalk noise glitches were not prominent in the transmitted signal. The lower interconnect resistance and capacitance at 180nm also result in a lower RC delay, so the signals transmitted on a bus are only slightly delayed, with negligible noise glitches. Figure 5.19 shows the data transmitted on all the bits of a 2mm long 8-bit bus. The situation is different in current technologies: in DSM technologies the feature size shrinks considerably, which increases both the coupling capacitance and the resistance.
These factors greatly increase R and C, and hence the Elmore delay; and because adjacent interconnects are very close, the coupling capacitance is very high, so crosstalk noise has a much larger magnitude. Figure 5.20 shows the data being transmitted on an 8-bit bus.

Figure 5.20 Data transfer on an 8 bit data bus in 65nm technology.

Thus the data signals on buses in DSM technologies are distorted by both delay and crosstalk noise. Various bus coding techniques have been proposed to recover the signal from these effects; they are discussed in the next subsection.

5.5.2 Definitions and Related Work
5.5.2.1 Low Power Coding
The power dissipation in the bus depends on the data transition activity. Codes that reduce the average transition activity are referred to as low-power codes (LPCs). A simple but effective LPC is the bus-invert code: if the current data word differs from the previous data word in more than half of its bits, the data is inverted and an invert bit is sent to the decoder. The effectiveness of bus-invert coding decreases as the bus width increases, so wide buses are partitioned into several sub-buses, each with its own invert bit. Note that bus-invert coding is nonlinear; it has been shown that linear codes do not reduce transition activity.

5.5.2.2 Crosstalk Avoidance Coding
The delay of a wire in the bus depends on the transitions on that wire and on the wires adjacent to it. The worst-case delay of a wire is $(1+4\lambda)\tau_0$, where $\tau_0$ is the delay of a crosstalk-free line and $\lambda$ is the ratio of the coupling capacitance to the ground capacitance. The purpose of crosstalk avoidance coding is to limit the worst-case delay to $(1+2\lambda)\tau_0$. Crosstalk avoidance codes (CACs) reduce the worst-case delay by ensuring that a transition from one codeword to another never causes adjacent wires to transition in opposite directions. This is referred to as the forbidden transition (FT) condition.
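A codebook violates the FT condition only when one codeword has the bit pair 01 and another has 10 on the same two adjacent wires, so a valid codebook can contain at most one of these two patterns at each wire boundary. The sketch below uses this observation to find the largest FT-free codebook for small n by brute force over the choice of banned pattern per boundary. It is our own illustration, not a construction from the cited literature; it reproduces the Fibonacci count $M_{FT}(n) = F_{n+2}$ quoted in this section.

```python
from itertools import product


def fibonacci(n: int) -> int:
    """Fibonacci numbers with F_1 = F_2 = 1."""
    a, b = 1, 1
    for _ in range(n - 1):
        a, b = b, a + b
    return a


def largest_ft_free_codebook(n: int) -> list[str]:
    """Largest set of n-bit codewords with no forbidden transition between any pair.

    For each boundary between wires i and i+1, either '01' or '10' is banned.
    A codebook is FT-free exactly when each boundary shows at most one of the
    two patterns, so the best choice of bans yields a maximum codebook.
    """
    if n < 1:
        raise ValueError("need at least one wire")
    words = ["".join(bits) for bits in product("01", repeat=n)]
    best: list[str] = []
    for banned in product(("01", "10"), repeat=n - 1):
        allowed = [w for w in words
                   if all(w[i:i + 2] != banned[i] for i in range(n - 1))]
        if len(allowed) > len(best):
            best = allowed
    return best


if __name__ == "__main__":
    for n in range(2, 7):
        size = len(largest_ft_free_codebook(n))
        print(f"n = {n}: {size} codewords, F_(n+2) = {fibonacci(n + 2)}")
```

The search costs $2^{n-1}$ candidate ban patterns times $2^n$ words, so it is only meant for checking small wire counts such as the 3-to-4 coders used in partial coding.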
Shielding the wires of a bus by inserting grounded wires between adjacent wires is the simplest way to satisfy the FT condition. A forbidden transition code (FTC) that requires fewer wires than shielding has been proposed; however, no linear code satisfies the FT condition while requiring fewer wires than shielding. The number of valid n-bit codewords satisfying the forbidden transition condition is

$$M_{FT}(n) = F_{n+2},$$

where $F_n$ is the Fibonacci sequence $F_n = F_{n-1} + F_{n-2}$ with initial conditions $F_1 = F_2 = 1$.

The worst-case delay can also be reduced to $(1+2\lambda)\tau_0$ by excluding the bit patterns "010" and "101" from every codeword. This is referred to as the forbidden pattern (FP) condition. The simplest way to satisfy the FP condition is to duplicate every data wire, so that each data bit is transmitted on two adjacent wires. No linear forbidden pattern code (FPC) satisfies the FP condition while requiring fewer wires than duplication. The number of codewords is

$$M_{FP}(n) = 2F_{n+1},$$

where $F_n$ is the Fibonacci sequence. Since $F_{n+2} = F_{n+1} + F_n \le 2F_{n+1}$, the FP condition admits more codewords than the FT condition; however, this increase translates into at most one additional data bit for the same n.

5.5.2.3 Error Control Coding
Error control is possible if the Hamming distance between any two codewords in the codebook is greater than one. If the minimum Hamming distance between any two codewords is two, all single errors appearing on the bus can be detected; if it is three, all single errors can be corrected. Error detection is simpler to implement than error correction but requires retransmission of the data when an error occurs. In systematic codes, a few redundant bits are appended to the input bits to generate the codeword; the Hamming code is an example of a linear systematic error-correcting code.

5.5.2.4 CAC coding Schemes
Coding maps k data (information) bits to n code bits, giving an (n, k) code with code rate k/n.
This mapping can be done either with memory or without memory (memoryless). Codes with memory, however, generally suffer from error propagation at the decoder. Complex techniques, such as those employed in communication systems, are needed to keep error propagation from becoming catastrophic, and they are prohibitively complex for on-chip buses in the foreseeable future. Further, even when error propagation is not a concern, codes with memory tend to need significantly more complex encoders and decoders than memoryless codes. Memoryless codes are therefore preferred. Designing a memoryless code amounts to choosing a subset C of size (cardinality) $2^k$ from the set S of all $2^n$ possible n-bit vectors. The codewords in C, referred to as the codebook, provide delay, power or reliability benefits by satisfying specific constraints. For example, an (n, k, p) CAC reduces the worst-case delay of a bus from $(1+4\lambda)\tau_0$ to $(1+p\lambda)\tau_0$, where $\tau_0$ is the delay of a crosstalk-free bus line, $\lambda$ is the ratio of the coupling capacitance to the bulk capacitance, and p = 1, 2 or 3 is the maximum coupling. For large buses it is impractical to encode all k bits at once, because the hardware complexity of the codec grows exponentially with k. Partial coding is therefore employed: the bus is broken into sub-buses of smaller width, which are encoded into sub-channels. For example, a 32-bit bus can be broken into 3-bit sub-buses, each encoded onto 4 wires (3-to-4 encoding); a 3-bit to 4-wire coder is shown in Figure 5.21. This coding scheme eliminates forbidden transitions from the data. Using this partial coding technique, an array of ten of these simple coders can be used to implement a crosstalk-immune 32-bit bus with 53 wires.
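A memoryless 3-to-4 coder of this kind is just a lookup table plus its inverse. The mapping of Figure 5.21 is not reproduced in the text, so the sketch below uses a hypothetical codebook of our own: the eight 4-bit words that never contain 10 on the first wire pair, 01 on the second or 10 on the third. Because each boundary admits only one direction of change, no pair of codewords can drive adjacent wires in opposite directions. The names `CODEBOOK`, `encode`, `decode` and `has_forbidden_transition` are illustrative only.

```python
# Hypothetical 3-bit -> 4-wire codebook (not the mapping of Figure 5.21).
# Wires are indexed 0..3; the words avoid "10" on wires (0,1), "01" on wires
# (1,2) and "10" on wires (2,3), so they are pairwise free of forbidden transitions.
CODEBOOK = ["0000", "0001", "0100", "0101", "0111", "1100", "1101", "1111"]
DECODE = {word: value for value, word in enumerate(CODEBOOK)}


def encode(value: int) -> str:
    """Map a 3-bit data value (0..7) to a 4-bit codeword."""
    if not 0 <= value < len(CODEBOOK):
        raise ValueError("encoder expects a 3-bit value")
    return CODEBOOK[value]


def decode(word: str) -> int:
    """Inverse mapping; raises KeyError for words outside the codebook."""
    return DECODE[word]


def has_forbidden_transition(old: str, new: str) -> bool:
    """True if some pair of adjacent wires switches in opposite directions."""
    for i in range(len(old) - 1):
        left = (old[i], new[i])           # (before, after) on wire i
        right = (old[i + 1], new[i + 1])  # (before, after) on wire i + 1
        if {left, right} == {("0", "1"), ("1", "0")}:
            return True
    return False


if __name__ == "__main__":
    stream = [5, 2, 7, 0, 3]  # arbitrary 3-bit data sequence
    words = [encode(v) for v in stream]
    print(words)
    assert all(not has_forbidden_transition(a, b) for a, b in zip(words, words[1:]))
    assert [decode(w) for w in words] == stream
```

The check applies only within one sub-bus; the boundary wires of neighbouring 4-wire groups are unconstrained by this codebook, so in a partially coded bus they still need separate treatment (for example a shield), which this sketch does not model.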
Compared with a 63-wire shielded channel (32 data wires plus 31 shields), this amounts to cutting ten wires from the channel at the cost of a handful of gates.

Figure 5.21 A 3 Bit to 4 wire coder

5.5.2.5 Relationship between delay and crosstalk
The delay between two data tuples, $d_t$ (data already available on the bus lines at time t) and $d_{t+1}$ (data to be transmitted on the bus lines at time t+1), is given by Equations (5.1) and (5.2). Equation (5.1) states that for an n-bit bus the delay between two data tuples is the maximum of the line delays over all bit positions, and Equation (5.2) gives the delay on the k-th data line (wire).

$$D(d_t, d_{t+1}) = \max_{1 \le k \le n} D_k(d_t, d_{t+1}) \tag{5.1}$$

$$D_k(d_t, d_{t+1}) = C_G R_S \begin{cases} \left((1+\lambda)\Delta_1 - \lambda\Delta_2\right)\Delta_1, & k = 1,\\ \left((1+2\lambda)\Delta_k - \lambda(\Delta_{k-1} + \Delta_{k+1})\right)\Delta_k, & 1 < k < n,\\ \left((1+\lambda)\Delta_n - \lambda\Delta_{n-1}\right)\Delta_n, & k = n. \end{cases} \tag{5.2}$$

Here $\Delta_k = d^k_{t+1} - d^k_t$, D denotes the delay function, max denotes the maximum value, and $R_S$ and $C_G$ are the total resistance of a wire and the total capacitance between a line and ground respectively. In Equation (5.2) the technology parameter $\lambda$ is the ratio of the coupling capacitance $C_C$ to the ground capacitance $C_G$ ($\lambda = C_C/C_G$), and n is the number of data lines. Table 5.8 shows the crosstalk classes defined in the literature according to the transition activity between adjacent interconnects. For example, for $d_t = 010$ and $d_{t+1} = 100$, Equations (5.1) and (5.2) give $D(d_t, d_{t+1}) = C_G R_S (1+3\lambda)$. For the different 3-bit transitions from $d_t$ to $d_{t+1}$, the delay of the middle line normalized with respect to $C_G R_S$ and the corresponding crosstalk class are given in Table 5.8. The classification of delay into classes has been dealt with in [62, 63].
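Equations (5.1) and (5.2) translate directly into code. The sketch below takes data words as lists of 0/1 bits with line 1 first, returns delays normalized by $C_G R_S$, and uses an illustrative λ = 2 that is our own choice, not a value from the thesis. It reproduces the worked example above and several rows of Table 5.8.

```python
def line_delays(d_t: list[int], d_next: list[int], lam: float) -> list[float]:
    """Normalized delays D_k / (C_G * R_S) of Eq. (5.2) for lines k = 1..n."""
    n = len(d_t)
    if n < 2 or len(d_next) != n:
        raise ValueError("need two words of equal length n >= 2")
    delta = [b - a for a, b in zip(d_t, d_next)]  # +1 rise, -1 fall, 0 quiet
    delays = []
    for k in range(n):
        if k == 0:          # first line: only one neighbour
            drive = (1 + lam) * delta[0] - lam * delta[1]
        elif k == n - 1:    # last line: only one neighbour
            drive = (1 + lam) * delta[k] - lam * delta[k - 1]
        else:               # interior line: two neighbours
            drive = (1 + 2 * lam) * delta[k] - lam * (delta[k - 1] + delta[k + 1])
        delays.append(drive * delta[k])
    return delays


def bus_delay(d_t: list[int], d_next: list[int], lam: float) -> float:
    """Eq. (5.1): the bus delay is the largest line delay."""
    return max(line_delays(d_t, d_next, lam))


if __name__ == "__main__":
    lam = 2.0  # illustrative C_C / C_G (assumed)
    print(bus_delay([0, 1, 0], [1, 0, 0], lam))  # 1 + 3*lam
```

With λ = 2 the example prints 7.0, i.e. $1 + 3\lambda$, matching the value derived in the text; the middle-line results for ↓↑↓, ↑↑↑ and −↑− (1 + 4λ, 1 and 1 + 2λ) match Table 5.8.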
Throughout the chapter the symbols ↑, ↓ and − denote 0 → 1, 1 → 0 and 1 → 1 (or 0 → 0) bit transitions respectively.

Table 5.8 Delay and Crosstalk Classes for various 3-bit combinations (transitions)
Transitions (Δ_{k-1}, Δ_k, Δ_{k+1}) | Normalized delay of line k | Crosstalk class
↑−↑, ↓−↓, ↑−↓, ↓−↑, ↑−−, ↓−−, −−−, −−↑, −−↓ | 0 | 1
↑↑↑, ↓↓↓ | 1 | 2
↑↑−, ↓↓−, −↑↑, −↓↓ | 1+λ | 3
−↑−, −↓−, ↑↓↓, ↑↑↓, ↓↓↑, ↓↑↑ | 1+2λ | 4
−↑↓, −↓↑, ↓↑−, ↑↓− | 1+3λ | 5
↑↓↑, ↓↑↓ | 1+4λ | 6

5.5.2.6 Interconnect Power Model
In general, the power dissipation of a CMOS VLSI circuit has four sources:

$$P_{avg} = P_{static} + P_{dynamic} + P_{leakage} + P_{shortckt} \tag{5.3}$$

Dynamic power accounts for the major share of the overall power dissipation. The dynamic power dissipation is given by

$$P_{dynamic} = \left(X (C_S + C_L) + Y C_C\right) V_{DD}^2 f_c \tag{5.4}$$

where $C_S$ is the self-capacitance, $C_L$ the loading capacitance, $C_C$ the coupling capacitance, $V_{DD}$ the supply voltage and $f_c$ the clock frequency. X and Y are formulated as

$$X = \sum_{i=1}^{W} X_i, \qquad Y = \sum_{i=1}^{W-1} Y_{i,i+1} \tag{5.5}$$

where W is the number of bit lines of the bus.

Self transition activity. X denotes the self transition activity for the self-capacitance $C_S$ and the loading capacitance $C_L$. Let $p^i_{r,s}$ be the transition probability that signal line i of the bus changes from state $r \in \{0,1\}$ to $s \in \{0,1\}$. Then $X_i$ for signal line i is

$$X_i = p^i_{0,1} \tag{5.6}$$

Coupling transition activity. Let $p^{i,j}_{pq,rs}$ denote the coupling transition probability that signal line i of the bus changes from p to r while, at the same time, the adjacent signal line j changes from q to s ($p, q, r, s \in \{0,1\}$). Then $Y_{i,j}$ between signal lines i and j is

$$Y_{i,j} = \alpha\left(p^{i,j}_{00,01} + p^{i,j}_{11,10} + p^{i,j}_{00,10} + p^{i,j}_{11,01}\right) + \beta\left(p^{i,j}_{01,10} + p^{i,j}_{10,01}\right) \tag{5.7}$$

The capacitance ratio γ is defined as $\gamma = C_C/(C_S + C_L)$.
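Equations (5.4) to (5.8) can be estimated from a recorded sequence of bus words by replacing each probability with its observed frequency per clock cycle. A minimal sketch follows. The weights α and β are not given numerically in the text, so the defaults α = 1 and β = 2 are placeholders, and γ is taken as $C_C/(C_S + C_L)$, the ratio for which (5.4) factors as $(C_S + C_L)(X + \gamma Y)V_{DD}^2 f_c$. The function names are our own.

```python
Word = tuple[int, ...]

# Events of Eq. (5.7); key (p, q, r, s): line i goes p -> r while line i+1 goes q -> s.
ALPHA_EVENTS = {(0, 0, 0, 1), (1, 1, 1, 0), (0, 0, 1, 0), (1, 1, 0, 1)}
BETA_EVENTS = {(0, 1, 1, 0), (1, 0, 0, 1)}  # adjacent lines switch in opposite directions


def _cycles(trace: list[Word]) -> int:
    if len(trace) < 2:
        raise ValueError("need at least two bus words")
    return len(trace) - 1


def self_activity(trace: list[Word]) -> float:
    """X = sum of X_i, with X_i the observed frequency of 0 -> 1 on line i (Eq. 5.6)."""
    rises = sum(1 for prev, cur in zip(trace, trace[1:])
                for a, b in zip(prev, cur) if (a, b) == (0, 1))
    return rises / _cycles(trace)


def coupling_activity(trace: list[Word], alpha: float = 1.0, beta: float = 2.0) -> float:
    """Y = sum of Y_{i,i+1} over adjacent line pairs (Eqs. 5.5 and 5.7).
    alpha and beta are placeholder weights, not values from the thesis."""
    total = 0.0
    for prev, cur in zip(trace, trace[1:]):
        for i in range(len(prev) - 1):
            event = (prev[i], prev[i + 1], cur[i], cur[i + 1])
            if event in ALPHA_EVENTS:
                total += alpha
            elif event in BETA_EVENTS:
                total += beta
    return total / _cycles(trace)


def dynamic_power(x, y, c_s, c_l, c_c, vdd, f_c):
    """Eq. (5.4): (X (C_S + C_L) + Y C_C) * V_DD^2 * f_c."""
    return (x * (c_s + c_l) + y * c_c) * vdd ** 2 * f_c


def z_metric(x, y, c_s, c_l, c_c):
    """Eq. (5.8) with gamma = C_C / (C_S + C_L)."""
    return x + (c_c / (c_s + c_l)) * y
```

For the two-line trace (0,0) → (0,1) → (1,0), one line rises in each cycle (X = 1) and the second cycle is an opposite transition, so Y = (α + β)/2 = 1.5 with the placeholder weights.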
The value of γ increases as the aspect ratio of the interconnect increases. Since Equation (5.4) can be rewritten as $P_{dynamic} = (C_S + C_L)(X + \gamma Y)V_{DD}^2 f_c$ with $\gamma = C_C/(C_S + C_L)$, the dynamic power consumption is proportional to

$$Z = X + \gamma Y \tag{5.8}$$

For an encoding graph $G_{en}$ with mapping function f, the total power consumption is measured by the quantity

$$Z = \sum_{(c_i, c_j)} p(c_i, c_j)\, w\left(f(c_i), f(c_j)\right) \tag{5.9}$$

where $p(c_i, c_j)$ is the transition probability from $c_i$ to $c_j$ in the transition probability graph, i.e. the fraction of all transitions in the transition profile that go from $c_i$ to $c_j$, and w is the weight of the transition between the mapped codewords. Transitions from one state to another form a random process, so a transition probability diagram is drawn from the average number of occurrences of each specific transition. The graph is shown in Figure 5.22.

Figure 5.22 Transition Probability Graph

An example of this graph is shown in Figure 5.23, for which Z = 3.45. The aim is to reduce the value of Z.

Figure 5.23 Example of Transition Probability Graph

5.5.3 Comparison with existing bus coding technique
As shown in Section 5.5, the main aim of bus coding techniques is to reduce or remove crosstalk noise by various methods. An encoder is placed at the input stage so that the coded signal transmitted on the data bus does not suffer from crosstalk noise, which reduces the total delay and power consumption. In this thesis, it is proposed instead to replace the signal-restoring buffers in data buses with Schmitt triggers. Since a Schmitt trigger can handle large noise glitches, it does not respond to unwanted glitches caused by crosstalk noise. It is shown that if both the encoder and decoder are removed from the input and output ends and Schmitt triggers are used as signal-restoring elements in place of buffers, an overall delay reduction results.
Owing to the large gap between its two thresholds, the Schmitt trigger removes all the crosstalk noise classes from the input signal and provides a clean output. Moreover, the proposed approach does not require the extra hardware of an encoder and decoder, so it saves area as well as power. The output signal on an 8-bit data bus is shown in Figure 5.24. The delay and power savings are given in Tables 5.9 and 5.10 respectively.

5.6 Conclusion
This chapter discussed the utility of the Schmitt trigger in avoiding crosstalk noise. It has been shown that the dual threshold of the Schmitt trigger helps avoid crosstalk in buses, provided the noise falls within the threshold interval. Where the noise is below the threshold level, the Schmitt trigger can be adjusted to prevent the noise from coupling to adjacent lines.

Figure 5.24 Data signals rectified using Schmitt trigger approach in an 8 bit data bus.

Table 5.9 Propagation delay values for 8 bit buses of different length with buffer insertion and delay reduction using Schmitt trigger approach
Length | Quantity | 180nm | 130nm | 90nm | 65nm
2mm | Delay in buffered interconnect (ps) | 18.5 | 35.76 | 59.14 | 138.76
2mm | Delay with Schmitt trigger approach (ps) | 16.3 | 31.65 | 53.1 | 125.6
2mm | % reduction with Schmitt trigger | 12.1 | 12.5 | 10.1 | 9.5
5mm | Delay in buffered interconnect (ps) | 74.7 | 215.5 | 354.36 | 765.2
5mm | Delay with Schmitt trigger approach (ps) | 66.1 | 187.8 | 304.8 | 700
5mm | % reduction with Schmitt trigger | 12.5 | 13.55 | 14.3 | 9.8
10mm | Delay in buffered interconnect (ps) | 192.1 | 717.9 | 1285 | 2948
10mm | Delay with Schmitt trigger approach (ps) | 181.4 | 638.3 | 1130.5 | 2600.2
10mm | % reduction with Schmitt trigger | 9.8 | 11.7 | 12.5 | 13.2

Table 5.10 Power consumption values for 8 bit buses of different length with buffer insertion and reduction using Schmitt trigger approach.
Length | Quantity | 180nm | 130nm | 90nm | 65nm
2mm | Power consumption in buffered bus (µW) | 99.6 | 141.6 | 183.6 | 197.6
2mm | Power consumption with Schmitt trigger approach (µW) | 89.8 | 126.8 | 160.6 | 170
2mm | % reduction with Schmitt trigger | 11 | 12.1 | 13.7 | 15.2
5mm | Power consumption in buffered bus (µW) | 211.1 | 310 | 387.5 | 467.28
5mm | Power consumption with Schmitt trigger approach (µW) | 181.24 | 266.2 | 299.5 | 392.1
5mm | % reduction with Schmitt trigger | 12.8 | 9.5 | 11.5 | 13.4
10mm | Power consumption in buffered bus (µW) | 581.6 | 785 | 977.6 | 1184.1
10mm | Power consumption with Schmitt trigger approach (µW) | 520.8 | 708.3 | 850.1 | 934.1
10mm | % reduction with Schmitt trigger | 12.5 | 11.6 | 14.1 | 15.3

Chapter 6 Conclusions and Future Work
In this thesis, the Schmitt trigger has been proposed as an alternative to the existing buffer insertion technique for delay, power and noise reduction in linear VLSI interconnects. It has been shown that replacing buffers with Schmitt triggers helps reduce delay and power consumption. A Schmitt trigger has two thresholds, whereas a buffer has a single threshold. This property allows both threshold voltages to be set as desired, so the signal can be made to rise or fall faster by keeping the threshold voltage low. Fewer transistors are in active mode at the time of switching, which gives the Schmitt trigger further power savings compared with the buffer. It has also been shown that the Schmitt trigger helps reduce crosstalk noise in the circuit. Crosstalk noise is a problem in data buses, where the close proximity of interconnects induces noise on neighboring lines and produces glitches. It is concluded that bus coding techniques may not be needed for crosstalk noise reduction if Schmitt triggers are used for signal restoration. The extra hardware required in the form of an encoder and decoder for bus coding techniques is then also not required.
Hence area savings may also be achieved. 6.1 Scope of further work This thesis focused mainly on linear VLSI interconnects and analyze local, inter- mediate and global interconnects only. However, since practical interconnects also have tree and mesh structures, utility of Schmitt trigger in such interconnect structures may be also be explored. 85 Bibliography [1] Chandrakasan, A. P., S. Sheng, and R. W. Brodersen, “Low-power Digital CMOS Design,” IEEE Journal of Solid State Circuits, pp. 473-484, April 1992. [2] Proakis and Manolakis, “Digital Signal Processing, Principles, Algorithms, and Applications, 3/e” Prentice Hall of India 2003 [3] P. Saxena and N. Menezes and P. Cocchini and D.A. Kirkpatrick, Repeater scaling and its impact on CAD, IEEE Transactions on Computer-Aided Design, vol. 23, no. 4, pp. 451463, 2004. [4] P.J. Osler, Placement driven synthesis case studies on two sets of two chips: hierarchical and flat, in Proceedings of International Symposium on Physical Design, San Diego, California, 2004, pp. 190197. [5] Maged M. Ghoneima, Muhammad M. Khellah,“Skewed Repeater Bus: A Low-Power Scheme for On-Chip Buses”, IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMSI: REGULAR PAPERS, VOL. 55, NO. 7, AUGUST 2008. [6] Y.I. Ismail and E.G. Friedman, Effects of inductance on the propagation delay and repeater insertion in VLSI circuits, in Proceedings of the Conference on Design Automation, New Orleans, Louisiana, 1999, pp. 721724. [7] Y.I. Ismail, E.G. Friedman, and J.L. Neves, Repeater insertion in tree structured inductive interconnect, in Proceedings of the International Conference on Computer-Aided Design, San Jose, California, 2001, pp. 420424. [8] Z. Jiang, S. Hu, J. Hu, Z. Li, and W. Shi, A new RLC buffer insertion algorithm, in Proceedings of the International Conference on Computer-Aided Design, San Jose, California, 2006, pp. 553557. [9] L. P. P. P. van Ginneken, Buffer placement in distributed RC-tree network for minimal Elmore delay, in Proc. Int. Symp. 
on Circuits and Systems, 1990, pp. 865868. [10] S. Lin and M. Marek-Sadowska, A fast and efficient algorithm for determining fanout tree in large networks, in Proc. of EDAC, Feb 1991, pp. 539544. 86 [11] H. Zhou, D. F. Wong, I. M. Liu, and A. Aziz, Simultaneous routing and buffer insertion with restrictions on buffer locations, IEEE Trans. on Computer Aided Design of Integrated Circuits and Systems , vol. 19, no. 7, pp. 819824, July 2000. [12] C. C. N. Chu and D. F. Wong. A quadratic programming approach to simultaneous buffer insertion/sizing and wire sizing, IEEE Trans. on Computer Aided Design of Integrated Circuits and Systems, vol. 18, no. 6, pp. 787798, Sept. 1999. [13] J. Lillis, C. K. Cheng and T.-T. Y. Lin, Optimal wire sizing and buffer insertion for low power and a generalized delay model, IEEE Trans. Solid-State Circuits, vol. 31, no. 3, pp. 437447, March 1996. [14] C. J. Alpert and A. Devgan. Wire segmenting for improved buffer insertion, in Proc. ACM/IEEE Design Automation Conf., 1997, pp. 588593. [15] W. Shi and Z. Li, An O(n log n) time algorithm for optimal buffer insertion, in Proc. ACM/IEEE Design Aut [16] W. Shi, Z. Li and C.J. Alpert, Complexity analysis and speedup techniques for optimal buffer insertion with minimum cost, in Proc. Asia and South Pacific Design Automation Conf., 2004, pp. 609614. [17] S. D. Naffziger et al., “The Implementation of a 2-Core, Multi-Threaded Itanium Family Processor,” IEEE Journal of Solid-State Circuits, Vol. 41, No. 1, pp. 197-209, January 2006. [18] International Technology Roadmap for Semiconductors. Semiconductor Industry Association, 2003. [19] H. Veendrick, Deep Submicron CMOS ICs - From Basics to ASICs. Deventer, Netherlands: Kluwer, 1998. [20] H. B. Bakoglu and J. D. Meindl, “Optimal Interconnection Circuits for VLSI,” IEEE Transactions on Electron Devices, Vol. ED-32, No. 5, pp. 903-909, May 1985. [21] N. 
Magen et al., “Interconnect-Power Dissipation in a Microprocessor,” Proceedings of the ACM International Workshop on System Level Interconnect Prediction, pp. 7-13, February 2004. [22] F. Chen and D. Gardner, “Influence of Line Dimensions on the Resistance of Cu Interconnections,” IEEE Electron Device Letters, Vol. 19, No. 12, pp. 508-510, December 1998. [23] A. H. Ajami et al., “Analysis of IR-Drop Scaling with Implications for Deep Submicron P/G Network Designs,” Proceedings of the IEEE International Symposium on Quality Electronic Design, pp. 35-40, March 2003. 87 [24] W. Wu and K. Maex, “Studies on Size Effect of Copper Interconnect lines,” Proceedings of International Conference on Solid-State and Integrated-Circuit Technology, pp. 416-418, October 2001. [25] A. V. Mezhiba and E. G. Friedman, Power Distribution Networks in High Speed Integrated Circuits. MA: Kluwer Academic Publishers, 2004. [26] K. Nabors and J. White, “FastCap: A Multipole Accelerated 3-D Capacitance Extraction Program,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 10, No. 11, pp. 1447-1459, November 1991. [27] J. H. Chern et al., “Multilevel Metal Capacitance Models for CAD Design Synthesis Systems,” IEEE Electron Device Letters, Vol. 13, No. 1, pp. 32-34, January 1992. [28] S. Wong, G. Lee, and D. Ma, “Modeling of Interconnect Capacitance, Delay, and Crosstalk in VLSI,” IEEE Transactions on Semiconductor Manufacturing, Vol. 13, No. 1, pp. 108-111, February 2000. [29] K. Gala et al., “Inductance 101: Analysis and Design Issues,” Proceedings of the IEEE/ACM Design Automation Conference, pp. 329-334, June 2001. [30] B. Krauter and S. Mehrotra, “Layout Based Frequency Dependent Inductance and Resistance Extraction for On-Chip Interconnect Timing Analysis,” Proceedings of the IEEE/ACM Design Automation Conference, pp. 303-308, June 1998. [31] S. 
Sim et al., “A Unified RLC Model for High-Speed On-Chip Interconnects,” IEEE Transactions on Electron Devices, Vol. 50, No. 6, pp. 1501-1510, June 2003. [32] X. Huang et al., “Loop-Based Interconnect Modeling and Optimization Approach for Multigigahertz Clock Network Design,” IEEE Journal of Solid-State Circuits, Vol. 38, No. 3, pp. 457-463, March 2003. [33] S. Yu et al., “Loop-Based Inductance Extraction and Modeling for Multiconductor On-Chip Interconnects,” IEEE Transactions on Electron Devices, Vol. 53, No. 1, pp. 135-145, January 2006. [34] A. Mezhiba and E. G. Friedman, “Frequency Characteristics of High Speed Power Distribution Networks,” Analog Integrated Circuits and Signal Processing, Vol. 35, No. 2/3, pp. 207-214, May/June 2003. [35] T. Dhaene and D. D. Zutter, “Selection of Lumped Element Models for Coupled Lossy Transmission Lines,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 11, No. 7, pp. 805-815, July 1992. [36] S. Sim, K. Lee, and C. Y. Yang, “High-Frequency On-Chip Inductance Model,” IEEE Electron Device Letters, Vol. 23, No. 12, pp. 740-742, December 2002. 88 [37] S. Lin and E. Kuh, “Transient Simulation of Lossy Interconnects Based on the Recursive Convolution Formulation,” IEEE Transactions on Circuits and Systems, Vol. 39, No. 11, pp. 879-892, November 1992. [38] T. Lin, M. W. Beattie, and L. T. Pileggi, “On the Efficacy of Simplified 2D On-Chip Inductance Models,” Proceedings of the IEEE/ACM Design Automation Conference, pp. 757-762, June 2002. [39] G. Lei, G. Pan, and B. K. Gilbert, “Examination, Clarification, and Simplification of Modal Decoupling Method for Multiconductor Transmission Lines,” IEEE Transactions on Microwave Theory and Techniques, Vol. 43, No. 9, pp. 2090-2100, September 1995. [40] L. Yin and L. He, “An Efficient Analytical Model of Coupled On-Chip RLC Interconnects,” Proceedings of the IEEE Design Automation Conference Asia and South Pacific, pp. 385-390, January 2001. [41] F. 
Chang, “Transient Analysis of Lossless Coupled Transmission Lines in a Nonhomogeneous Dielectric Medium,” IEEE Transactions on Microwave Theory and Techniques, Vol. 18, No. 9, pp. 616-626, September 1970. [42] W. C. Elmore, “The Transient Response of Damped Linear Networks,” Journal of Applied Physics, Vol. 19, pp. 55-63, January 1948. [43] L. T. Pillage and R. A. Rohrer, “Asymptotic Waveform Evaluation for Timing Analysis,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 9, No. 4, pp. 352-366, April 1990. [44] M. A. El-Moursy and E. G. Friedman, “Optimum Wire Shaping of an RLC Interconnect,” Proceedings of the IEEE Midwest Symposium on Circuits and Systems, December 2003. [45] M. Ghoneima and Y. Ismail, “Optimum Positioning of Interleaved Repeaters in Bidirectional Buses,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 24, No. 3, pp. 461-469, March 2005. [46] F. Anderson, J. S. Wells, and E. Z. Berta, “The Core Clock System on the Next Generation Itanium Microprocessor,” Proceedings of the IEEE International Solid-State Circuits Conference, pp. 110111, February 2002. [47] L. He and K. M. Lepak, “Simultaneous Shield Insertion and Net ordering for Capacitive and Inductive Coupling Minimization,” Proceedings of the ACM International Symposium on Physical Design, pp. 56-61, 2000. [48] B. Soudan, “The Effects of Swizzling on Inductive and Capacitive Coupling for Wide Signal Busses,” Proceedings of the International Conference on Microelectronics, pp. 300-303, December 2003 89 [49] J.J. Cong, K.-S. Leung, “Optimal wiresizing under Elmore delay model”, IEEE Trans. Comput. Aided Design Integrated Circuits Systems 14 (3) (1995) 321336. [50] P.Sotiriadis and A.Chandrakasan, “Reducing bus delay in sub-micron technology using coding, In Proc. of IEEE Asia and South Pacific Design Automation Conf (ASPDAC01),pp 109-114, 2000. 
[51] P. Sotiriadis, “Interconnect Modeling and Optimization in Deep Submicron Technologies,” Doctoral Dissertation, MIT, May 2002.
[52] L. Li, N. Vijaykrishnan, M. T. Kandemir, and M. J. Irwin, “A Crosstalk Aware Interconnect with Variable Cycle Transmission,” Proceedings of Design Automation and Test in Europe (DATE), pp. 102-107, 2004.
[53] F. J. Taylor, Digital Filter Design Handbook, Marcel Dekker, Inc., New York, 1984.
[54] G. K. Ma and F. J. Taylor, “Multiplier Policies for Digital Signal Processing,” IEEE ASSP Magazine, pp. 6-20, January 1990.
[55] A. G. Dempster and M. D. Macleod, “Use of Minimum-Adder Multiplier Blocks in FIR Digital Filters,” IEEE Transactions on Circuits and Systems II, Vol. 42, No. 9, pp. 569-577, September 1995.
[56] R. Hashemian, “A New Method for Conversion of a 2's Complement to Canonic Signed Digit Number System and Its Representation,” Proceedings of the Asilomar Conference on Signals, Systems and Computers, pp. 904-907, 1997.
[57] K. Dejhan, P. Tooprakai, T. Rerkmaneewan, and C. Soonyeekan, “A High-Speed Direct Bootstrapped CMOS Schmitt Trigger Circuit,” Proceedings of the IEEE International Conference on Semiconductor Electronics (ICSE 2004), 7-9 December 2004.
[58] O. H. Schmitt to H. R. Lang, November 25, 1937, O. H. Schmitt Papers, University Archives, University of Minnesota, Minneapolis, MN, Box SF114.
[59] O. H. Schmitt, “A Thermionic Trigger,” Journal of Scientific Instruments, Vol. 15, No. 1, pp. 24-26.
[60] B. Hart, “Picturing Schmitt's Trigger,” Electronics World, Vol. 105, No. 1764, pp. 1040-1046, 1999.
[61] P. R. Gray and R. G. Meyer, Analysis and Design of Analog Integrated Circuits, 2nd edition, New York: Wiley, 1984.
[62] P. Sotiriadis and A. Chandrakasan, “Low Power Coding Techniques Considering Inter-Wire Capacitances,” Proceedings of the IEEE Custom Integrated Circuits Conference (CICC'00), pp. 507-510, 2000.
[63] M. R. Stan and W. P. Burleson, “Bus-Invert Coding for Low-Power I/O,” IEEE Transactions on VLSI Systems, pp. 49-58, March 1995.
[64] P. Sotiriadis, “Interconnect Modeling and Optimization in Deep Submicron Technologies,” Doctoral Dissertation, Massachusetts Institute of Technology, May 2002.
[65] Y. Zhang et al., “Odd/Even Bus Invert with Two-Phase Transfer for Buses with Coupling,” Proceedings of ISLPED'02, pp. 80-83, Monterey, CA, USA, August 12-14.
[66] J. Natesan and D. Radhakrishnan, “Shift Invert Coding (SINV) for Low Power VLSI,” Proceedings of the EUROMICRO Symposium on Digital System Design (DSD'04), pp. 190-194, 2004.
[67] J. V. R. Ravindra, K. S. Sainarayanan, and M. B. Srinivas, “A Novel Bus Coding Technique for Low Power Data Transmission,” IEEE Symposium on VLSI Design and Test (VDAT-2005), pp. 263-266, August 2005.
[68] P. Ghosh, R. Mangaser, and K. Rose, “Interconnect-Dominated VLSI Design,” Proceedings of the Conference on Advanced Research, pp. 114-122, March 1999.
[69] Semiconductor Industry Association, The National Technology Roadmap for Semiconductors, 1997 Edition.
[70] M. T. Bohr, Proceedings of the IEEE International Electron Devices Meeting, p. 241, 1995.
[71] E. E. Davidson, IEEE Micro, Vol. 18, No. 4, p. 33, 1998.