On-Chip Communication Architectures

On-Chip Communication


Physical Design Trends for Interconnects

ICS 295
Sudeep Pasricha and Nikil Dutt
Slides based on book chapter 11
DSM Interconnect Design

Low Power High Speed Circuit Design Techniques

Power Distribution Techniques

Clock Distribution Techniques

3D Interconnects

 Smaller feature size in DSM technology nodes reduces delay of
active devices
 However, the contribution of passive interconnects to overall delay
has increased rapidly

 Local wire delay decreases with feature size due to a reduction in
distance among active devices
 However, the overall speed of current ICs is most often limited by the
long-distance global interconnects

 With shrinking feature size and larger chip die dimensions,
sheer number of interconnects has increased exponentially
 Interconnect capacitance often dominates total gate load
◦ therefore, a large portion of the total transient power is dissipated
by these on-chip lines
◦ particularly true for those long interconnects that distribute the
clock signals
 that can dissipate 40–50% of total IC power
 Gains achieved in performance with technology scaling
are often accompanied by an increase in power dissipation
◦ e.g., additional interconnect layers enhance circuit speed at the
expense of higher power consumption due to the larger interconnect
capacitance

 Traditionally used RC interconnect models are not
sufficiently accurate to properly capture signal
propagation in lines with fast transition times
 Including inductance (L) in the RC model has become a
necessary modification
◦ RLC models are therefore becoming increasingly common at the
expense of greater computational cost
 Whether to include inductance in the interconnect model
at midrange frequencies (~1–3 GHz) depends on several factors
◦ e.g., the length of the line, the cross-sectional area of the line, etc.
 Under certain conditions, ignoring line inductance may
lead to circuits that are inefficient in both area and power
 In addition to interconnects among the various on-chip
devices, the clock and power distribution networks
require significant metal resources
◦ both must span the entire chip
 Accurately modeling the clock, power, and signal nets is
a difficult task
◦ highly complex structures
 Optimally allocating metal to properly design these
networks presents an even greater challenge
◦ because of complexity in developing interconnect models

DSM Interconnect Design
 Scaling of the lateral dimensions in planar devices, such
as MOS transistors, has produced improvements in
◦ area
◦ power
◦ speed
 Power consumption and signal propagation delays of
long (global) resistive lines have increased
 Accurate on-chip interconnect models are required
◦ to determine the signal characteristics and design requirements
of high speed DSM interconnect

DSM Interconnect Design
 A local line can be modeled as a single lumped capacitor
◦ signal propagation delay is negligible compared to gate delay

 Models of longer lines must include resistive effects

 Long interconnect models often divide the line into
sections (distributed impedance model)
◦ T or П sections
◦ accuracy depends on the number of sections
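
The dependence of accuracy on the number of sections can be illustrated with a small sketch. It assumes simple L-type sections (a series resistor followed by a shunt capacitor) rather than the T or П sections named above, and it uses the Elmore delay as the figure of merit; the function name is hypothetical. With more sections, the lumped estimate falls from Rt·Ct toward the distributed-line value of Rt·Ct/2.

```python
def elmore_delay_ladder(r_total, c_total, n_sections):
    """Elmore delay at the far end of a line split into n equal RC sections.

    Assumption: L-type sections (series R, then shunt C), not T or Pi.
    """
    r_seg = r_total / n_sections
    c_seg = c_total / n_sections
    # Each capacitor charges through all of the resistance upstream of it.
    return sum((i * r_seg) * c_seg for i in range(1, n_sections + 1))


if __name__ == "__main__":
    # Normalized line: Rt = 1, Ct = 1, so the result is in units of Rt*Ct.
    for n in (1, 2, 5, 20, 100):
        print(n, elmore_delay_ladder(1.0, 1.0, n))
```

A single lumped section overestimates the delay of a distributed line by a factor of two, which is why long lines are usually split into several sections.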

DSM Interconnect Design
 Long, wide interconnects operating at frequencies
exceeding a GHz are not accurately characterized by an
RC model
◦ An RLC model is necessary for accuracy

◦ More complex to evaluate

◦ Distributed RLC model (T or П)

DSM Interconnect Design
 Length criteria for including inductance in interconnect models
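
A widely cited form of the length criterion (the Ismail and Friedman figure of merit) says that inductance matters when the line length l satisfies tr / (2·sqrt(L·C)) < l < (2/R)·sqrt(L/C), where R, L, and C are per-unit-length values and tr is the signal transition time. The sketch below assumes that form, which may differ in detail from the expression intended here; the function names and example values are hypothetical.

```python
import math


def inductive_length_window(r_per_m, l_per_m, c_per_m, t_rise):
    """Return (l_min, l_max) in meters between which inductance matters.

    Assumed criterion: t_rise / (2*sqrt(L*C)) < length < (2/R)*sqrt(L/C).
    """
    l_min = t_rise / (2.0 * math.sqrt(l_per_m * c_per_m))
    l_max = (2.0 / r_per_m) * math.sqrt(l_per_m / c_per_m)
    return l_min, l_max


def needs_inductance(length, r_per_m, l_per_m, c_per_m, t_rise):
    """True when the line length falls inside the inductive window."""
    l_min, l_max = inductive_length_window(r_per_m, l_per_m, c_per_m, t_rise)
    return l_min < length < l_max


if __name__ == "__main__":
    # Illustrative values only: 10 ohm/mm, 0.5 nH/mm, 0.2 pF/mm, 50 ps edge.
    print(inductive_length_window(1e4, 5e-7, 2e-10, 50e-12))
```

Below the lower bound the transition is slow compared with the time of flight; above the upper bound the line resistance damps out the inductive behavior.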

DSM Interconnect Design
 Frequency criteria for including inductance in interconnect models
◦ increasing signal frequencies require faster signal transition times
◦ a shorter signal transition time lowers the minimum line length at which
inductance matters, so shorter on-chip interconnects behave inductively
 medium length lines can also behave inductively at high signal frequencies
 Dielectric material criteria for including inductance in
interconnect models
◦ low-k dielectrics can reduce line capacitance to about half of that
obtained with SiO2
◦ copper interconnect has reduced the line resistance by a factor of two to
three as compared to aluminum
◦ these and other new materials increase the need to include line inductance
in interconnect models

Low Power High Speed Circuit Design
 High Speed Circuit Design Goal: Improve performance
 Power and noise are important when considering design
techniques to optimize circuit performance
 Noise affects delay, degrades waveform shape, and most
importantly, creates the possibility of an erroneous
interpretation of the digital signals

CMOS Power Dissipation Overview
 CMOS Power Dissipation

◦ Dynamic power

CMOS Power Dissipation Overview
◦ Short circuit power
 Due to current flow when a
conductive path exists from
the power rail to ground rail

◦ Leakage power
 Due to current that flows between the power terminals in the
absence of any switching activity
 Increasingly dominates overall power
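
The three components can be put into numbers with the usual textbook expressions. This is a sketch under stated assumptions: dynamic power is taken as alpha·C·Vdd²·f, while short-circuit and leakage power are taken as an average current times Vdd. The function names, the activity factor alpha, and the example values are hypothetical and do not come from the slides.

```python
def dynamic_power(alpha, c_load, vdd, f_clk):
    """Switching power: activity factor * load capacitance * Vdd^2 * frequency."""
    return alpha * c_load * vdd ** 2 * f_clk


def short_circuit_power(i_sc_avg, vdd):
    """Average rail-to-rail current during transitions, times Vdd."""
    return i_sc_avg * vdd


def leakage_power(i_leak, vdd):
    """Static current that flows without any switching, times Vdd."""
    return i_leak * vdd


def total_power(alpha, c_load, vdd, f_clk, i_sc_avg, i_leak):
    """Sum of the three CMOS power components."""
    return (dynamic_power(alpha, c_load, vdd, f_clk)
            + short_circuit_power(i_sc_avg, vdd)
            + leakage_power(i_leak, vdd))


if __name__ == "__main__":
    # Illustrative values: 10% activity, 1 nF switched, 1 V, 1 GHz.
    print(total_power(0.1, 1e-9, 1.0, 1e9, i_sc_avg=5e-3, i_leak=2e-3))
```

Because dynamic power scales with Vdd squared, lowering the supply voltage is the most effective single lever, at the cost of speed.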

Wire Sizing
 Width of an interconnect affects the power characteristics
and propagation delay
 Consider a CMOS inverter driving an RC interconnect line
◦ Simple first order model of delay

Wire Sizing
 Increasing the driver transistor width reduces the driver resistance Rtr
◦ decreasing the circuit delay
◦ trading off circuit power and area for higher speed
 Increasing the width of the interconnect to reduce the line resistance Rint
does not significantly reduce the delay caused by the RC interconnect
◦ since decrease in wire resistance is offset by increase in wire capacitance
 Many algorithms have been proposed to determine the
optimum wire size that minimizes a target cost function
◦ minimizing delay
◦ addressing reliability issues by reducing clock skew
 Few approaches address simultaneous driver and wire sizing
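
The driver-width and wire-width observations above can be checked with a first-order delay estimate. The sketch assumes the common lumped approximation t = 0.69·Rtr·(Cint + CL) + 0.38·Rint·Cint + 0.69·Rint·CL, a wire resistance inversely proportional to width, and a wire capacitance with one part proportional to width and a fixed fringe part. All parameter values are illustrative assumptions, and the growth of the driver's own input capacitance is ignored.

```python
def rc_delay(width_um, length_um=5000.0, r_tr=500.0, c_load=50e-15,
             r_sheet=0.05, c_area=0.04e-15, c_fringe=0.08e-15):
    """First-order delay (seconds) of a driver plus RC line plus load.

    r_sheet is in ohms per square; c_area is in F/um^2; c_fringe is in F/um.
    """
    r_int = r_sheet * length_um / width_um               # falls with width
    c_int = (c_area * width_um + c_fringe) * length_um   # rises with width
    return (0.69 * r_tr * (c_int + c_load)
            + 0.38 * r_int * c_int
            + 0.69 * r_int * c_load)


if __name__ == "__main__":
    for w in (0.5, 1.0, 2.0, 4.0):
        print("wire width", w, "um:", rc_delay(w))
    # Halving the driver resistance (a wider driver) at a fixed wire width.
    print("wider driver:", rc_delay(1.0, r_tr=250.0))
```

With these numbers, widening the wire buys only a few percent before the added capacitance makes the delay worse, while a stronger driver gives a much larger gain.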

Wire Sizing
 Tradeoffs exist between
dynamic and short-circuit
power

◦ As the line inductance-to-resistance ratio increases with wider lines,
short-circuit power decreases due to the reduction in signal transition time
◦ For an RC line, short-circuit power remains approximately constant
with increasing width (unlike the RLC line shown in the figure)
 the decrease in interconnect resistance is offset by an increase in capacitance
◦ If the width of the interconnect exceeds a specific limit, short-circuit power
increases
 due to a change in the matching characteristics between driver and interconnect
◦ Dynamic power increases with width since line capacitance is greater
Driver Sizing
 Transistor sizing techniques aim to lower delay
 Wider transistors have a few advantages but also several
disadvantages
◦ + produce more current
◦ + reduce charge time of load capacitance
◦ – have greater physical area
◦ – have larger gate capacitance
◦ – increase circuit area and power
 A careful balance of the current drive and output load is
necessary to enhance circuit performance
 Power optimal transistor size is smaller than the power-
delay optimal transistor size
© 2008 Sudeep Pasricha & Nikil Dutt 20
Tapered Buffers
 An important example of transistor sizing is tapered buffers
◦ Used to drive large capacitive loads
 Consists of chain of inverters of gradually increasing size
 Ratio of size of an inverter to size of the preceding inverter
is the tapering factor β
 Delay of tapered buffer system is less than delay of a single
large inverter

Tapered Buffers
 Dependence of the tapered
buffer propagation delay on
tapering factor

 For minimum delay, optimal number of stages
 Nopt = ln(M)
 where M = CL/Co is the ratio of the load capacitance to the input
capacitance of the initial inverter in the chain
 Area, power, and delay are strongly dependent on β
 Trade-offs are possible by varying the tapering factor
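
A short sketch of the Nopt = ln(M) rule follows. It assumes the classic model in which each stage's delay is proportional to the tapering factor β and inverter self-loading is ignored, so the chain delay is N·β·t0 with β = M^(1/N). The function name and the unit delay t0 are hypothetical.

```python
import math


def tapered_buffer(c_load, c_in, t0=1.0):
    """Size a tapered inverter chain for minimum delay.

    Returns (ideal stage count, integer stage count, tapering factor, delay).
    Assumption: stage delay = beta * t0, with no self-loading.
    """
    m = c_load / c_in
    n_ideal = math.log(m)            # Nopt = ln(M)
    n = max(1, round(n_ideal))       # a real chain needs a whole number
    beta = m ** (1.0 / n)            # equal size ratio between stages
    delay = n * beta * t0
    return n_ideal, n, beta, delay


if __name__ == "__main__":
    m = 1000.0
    print(tapered_buffer(m, 1.0))
    # For comparison: the initial inverter driving the load directly.
    print("single stage delay:", m * 1.0)
```

When N is exactly ln(M), the tapering factor comes out as e, which is the usual statement of the optimum under this model. Larger β values use fewer stages and less area and power, at some cost in delay.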

Repeater Insertion
 An effective strategy for reducing delay of a long
interconnect is to strategically insert buffers along a line
 Repeaters circumvent quadratic increase in interconnect
delay by partitioning the line into smaller and approximately
equal sections

 Sum of section delays is smaller than delay of original path
since delay of each section is reduced
 Decreased interconnect delay is partially offset by the
additional delay of the inserted repeaters
Repeater Insertion
 Optimal number of repeaters

 Optimal size of repeaters

Rt and Ct are the total interconnect resistance and capacitance
R0 and C0 are the input and output repeater resistance and capacitance
Cg0 is the input capacitance of the repeater
Fitting parameters a1 and a2 account for the rise and fall time of the propagating
signal
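
As a concrete stand-in for the optimal number and size of repeaters, the sketch below uses Bakoglu's classic RC expressions, k = sqrt(0.4·Rt·Ct / (0.7·Rmin·Cmin)) and h = sqrt(Rmin·Ct / (Rt·Cmin)). This is a swapped-in, simpler model and not necessarily the expressions intended above. Here r_min and c_min are assumed to be the output resistance and input capacitance of a minimum-size repeater; these names and the example values are hypothetical.

```python
import math


def bakoglu_repeaters(r_total, c_total, r_min, c_min):
    """Return (k, h): number of sections and repeater size multiple."""
    k = math.sqrt(0.4 * r_total * c_total / (0.7 * r_min * c_min))
    h = math.sqrt(r_min * c_total / (r_total * c_min))
    return k, h


def repeated_line_delay(r_total, c_total, r_min, c_min, k, h):
    """50% delay of k equal sections, each driven by an h-times repeater."""
    driver_term = 0.7 * (r_min / h) * (c_total / k + h * c_min)
    wire_term = (r_total / k) * (0.4 * c_total / k + 0.7 * h * c_min)
    return k * (driver_term + wire_term)


if __name__ == "__main__":
    rt, ct, r0, c0 = 1e3, 1e-12, 1e4, 1e-15   # illustrative values
    k, h = bakoglu_repeaters(rt, ct, r0, c0)
    print("k =", k, "h =", h)
    print("delay at optimum:", repeated_line_delay(rt, ct, r0, c0, k, h))
    print("minimum-size driver, no repeaters:",
          repeated_line_delay(rt, ct, r0, c0, 1, 1))
```

The constants 0.7 and 0.4 belong to Bakoglu's model and are not the fitting parameters a1 and a2 mentioned above. In practice k is rounded to a whole number of repeaters.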

Repeater Insertion
 For an RC line, repeater insertion techniques outperform
wire sizing
 In RLC lines, wire sizing outperforms repeater insertion as
the minimum signal propagation delay with no repeaters is
smaller than the minimum delay achievable with repeaters
Power Distribution Techniques
 Power grid consists of a supply, load, and interconnect lines
connecting the supply to the load

 Interconnect lines connecting the power supply to the load are
non-ideal, with a finite resistance and inductance
◦ Rp, Lp and Rg, Lg for the power and ground lines, respectively
 Change in supply voltages at the load terminal is referred to as
power supply noise
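
A first-order estimate of power supply noise follows directly from the Rp, Lp, Rg, Lg model above: each line drops I·R + L·dI/dt. The sketch assumes the same load current flows in the power and ground lines and ignores decoupling capacitance; the function names and example values are hypothetical.

```python
def supply_noise(i_load, di_dt, r_p, l_p, r_g, l_g):
    """Return (drop on the power line, bounce on the ground line) in volts."""
    drop_power = i_load * r_p + l_p * di_dt
    bounce_ground = i_load * r_g + l_g * di_dt
    return drop_power, bounce_ground


def load_voltage(vdd, i_load, di_dt, r_p, l_p, r_g, l_g):
    """Rail-to-rail voltage actually seen at the load terminals."""
    drop_power, bounce_ground = supply_noise(i_load, di_dt, r_p, l_p, r_g, l_g)
    return vdd - drop_power - bounce_ground


if __name__ == "__main__":
    # Illustrative values: 1 A load, 1 A/ns ramp, 10 mohm and 10 pH per line.
    print(load_voltage(1.0, 1.0, 1e9, 0.01, 1e-11, 0.01, 1e-11))
```

The resistive term scales with the current itself, while the inductive term scales with how fast the current changes, so fast switching can dominate the noise even when the average current is modest.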
Power Distribution Techniques
 Power supply noise can adversely affect circuit operation
 Power supply variations reduce rail-to-rail power voltage,
causing gate-to-source voltage across both NMOS and
PMOS transistors to also decrease
◦ lowering the output drive current of these devices
◦ signal delay increases, compared to delay under nominal power
supply voltage
 Power noise affects propagating clock and data signals by
causing an increase in both delay and delay uncertainty
within the data paths
◦ which can severely limit the maximum operating frequency of an IC

Power Distribution Techniques
 Power distribution network should exhibit a minimal
impedance at terminals of the load
◦ to ensure a small variation in the power supply voltage
 Decoupling capacitors ensure correct and reliable operation
◦ distributed across a system, placed at board, package, and on-chip
◦ provide charge when transient current demands on power grid are high
◦ each decoupling capacitor provides transient current to the load,
effectively reducing the local transient noise
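
Two back-of-the-envelope calculations are commonly used for the points above. The sketch assumes a target impedance of Vdd·ripple / Imax for the network and a charge-balance estimate C = I·Δt / ΔV for a decoupling capacitor that must supply the transient current on its own. Both are first-order rules of thumb, and the function names and values are hypothetical.

```python
def target_impedance(vdd, ripple_fraction, i_max):
    """Largest network impedance that keeps the ripple within budget (ohms)."""
    return vdd * ripple_fraction / i_max


def required_decap(i_transient, duration, allowed_droop):
    """Capacitance (farads) that supplies i_transient for duration seconds
    while its voltage droops by no more than allowed_droop volts."""
    return i_transient * duration / allowed_droop


if __name__ == "__main__":
    # Illustrative values: 1 V supply, 5% ripple budget, 10 A peak current.
    print("target impedance:", target_impedance(1.0, 0.05, 10.0), "ohm")
    # 1 A transient lasting 1 ns with a 50 mV droop budget.
    print("decap needed:", required_decap(1.0, 1e-9, 0.05), "F")
```

Short transients are served by on-chip capacitance, while longer ones are covered by package-level and board-level capacitors, which is why the capacitors are distributed across all three levels.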

Power Distribution Techniques
 On-chip current densities can reach several hundred thousand
amperes per sq. cm, leading to electromigration
 Electromigration is the transport of metal atoms under the
force of an electron flux
◦ depletion and accumulation of metal material resulting from atomic
flow can lead to formation of extrusions and voids in metal structures
◦ extrusions and voids can lead to short circuits and open circuit faults

Clock Distribution Techniques
 In a synchronous digital system, the clock signal provides a
time reference for movement of data within that system
 Clock signals are typically
◦ loaded with the greatest fanout
◦ travel over the longest distances
◦ operate at the highest speeds of any signal, either control or data
 Differences in the delay of the clock signals can
◦ severely limit the maximum performance of the entire system
◦ create catastrophic race conditions in which an incorrect data signal
may latch within a register

Clock Distribution Techniques
 Synchronous systems are composed of the following
three delay components:
◦ memory storage elements
◦ logic elements
◦ clocking circuitry and distribution networks
 Minimum allowable clock period TCP(min) between any
two registers in a sequential data path is

Clock Distribution Techniques
 Ideally, clocking events occur simultaneously at all registers
 Difference in the clock signal arrival time between two
sequentially adjacent registers is the clock skew
◦ system-wide or chip-wide clock skew between two registers that are not
sequentially adjacent is meaningless from an analysis viewpoint
 it has no effect on the performance and reliability of a synchronous system
◦ clock skew can be positive or negative

Clock Distribution Techniques
 Under positive clock skew the maximum attainable
operating frequency is decreased
◦ Performance degradation
 Under negative clock skew maximum performance of a
synchronous system can be improved by decreasing
delay of a critical path
 But there is also potential for a race condition
◦ clock skew must be less than the time required for
 data signal to leave initial register, propagate through interconnect and
combinatorial logic, and successfully set up in the final register
◦ if this condition is not met
 data stored in register Rf is overwritten by data that had been stored in
register Ri and has propagated through the combinatorial logic
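
The clock-period and race constraints above can be sketched numerically. The code assumes the common convention Tskew = TCi - TCf (clock arrival at the initial register minus arrival at the final register), a minimum period equal to clock-to-Q delay plus maximum logic delay plus interconnect delay plus setup time plus skew, and a race whenever the magnitude of a negative skew exceeds the minimum path delay. The function and parameter names are hypothetical.

```python
def min_clock_period(t_cq, t_logic_max, t_int, t_setup, t_skew):
    """Minimum clock period between two sequentially adjacent registers.

    Assumption: t_skew = t_ci - t_cf, so positive skew lengthens the period.
    """
    return t_cq + t_logic_max + t_int + t_setup + t_skew


def has_race(t_skew, t_pd_min):
    """True when negative skew lets new data overwrite the final register.

    t_pd_min is the shortest time for data to leave the initial register,
    cross the logic and interconnect, and latch in the final register.
    """
    return t_skew < 0 and abs(t_skew) > t_pd_min


if __name__ == "__main__":
    # Times in nanoseconds, illustrative values only.
    print(min_clock_period(0.1, 1.0, 0.2, 0.1, t_skew=0.15))
    print(min_clock_period(0.1, 1.0, 0.2, 0.1, t_skew=-0.15))
    print(has_race(t_skew=-0.15, t_pd_min=0.3))
    print(has_race(t_skew=-0.40, t_pd_min=0.3))
```

Negative skew shortens the minimum period on the critical path, but only as long as its magnitude stays below the minimum path delay.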

Clock Distribution Techniques
 The most common strategy for distributing on-chip clock
signals is to insert buffers at the clock source and along the
clock path, forming a tree structure
 Buffers, acting as repeaters, are usually distributed
throughout clock network
 Distributed buffers serve the double function of
◦ amplifying clock signals degraded by the distributed interconnect
◦ isolating the local clock nets from the upstream load impedances

Clock Distribution Techniques
 Another approach for distributing clock signals utilizes a
hierarchy of planar symmetric H-tree or X-tree structures to
ensure zero clock skew
◦ by maintaining identical distributed interconnect
and buffer paths from clock signal source to the
clocked registers
◦ conductor widths in H-tree structures are
designed to progressively decrease as signal
propagates to lower levels of the hierarchy
 ensuring that reflections are minimized at branch points
 Drawbacks compared to standard clock trees:
◦ interconnect capacitance and power dissipation are much greater since total
wire length is much longer
◦ difficult to implement in high complexity integrated systems which are
typically irregular in nature
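
The narrowing of H-tree conductors toward lower levels can be read as impedance matching at each branch point. A trunk sees its two branches in parallel, so the reflection vanishes when each branch has twice the trunk's characteristic impedance, which means a narrower line. The sketch assumes ideal lossless transmission-line behavior; the function name and values are hypothetical.

```python
def branch_reflection(z_trunk, z_branch, n_branches=2):
    """Reflection coefficient seen by a trunk that splits into equal branches."""
    z_load = z_branch / n_branches    # identical branches appear in parallel
    return (z_load - z_trunk) / (z_load + z_trunk)


if __name__ == "__main__":
    print(branch_reflection(50.0, 100.0))  # branches at twice the trunk impedance
    print(branch_reflection(50.0, 50.0))   # untapered branches
```

With untapered branches, part of the clock edge is reflected back toward the source at every split.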

3D Interconnects
 3-D interconnects have been proposed as a way to
address increasing line delay and capacitive crosstalk
 Introduction of a third dimension significantly alters
distribution of the interconnect length in ICs
◦ As the number of planes is increased, the length and number of
the global (local) interconnects decrease (increase)

3D Interconnects
 It is possible to partition a 2-D IC into multiple subsections
and stack these sections in the vertical dimension
◦ reduces corner-to-corner interconnect length significantly
◦ several global interconnects in the upper metallization levels can be
transferred to local, smaller aspect ratio metal layers
 reduces the total number of metal levels within a 3-D circuit
◦ an increase in the clock frequency is possible as the worst case data
path delay can be reduced
 assuming a constant number of registers along a sequential data path and
number of metal layers
◦ 3-D ICs consume less power than 2-D ICs as a consequence of the
reduced capacitive load of the global interconnect lines
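
The reduction in corner-to-corner length can be estimated with a simple geometric argument. The sketch assumes a square die of fixed total area split evenly into square planes, Manhattan routing within a plane, and a fixed vertical pitch between planes. The function name and the 50 µm pitch are hypothetical.

```python
import math


def longest_wire_mm(total_area_mm2, n_planes, plane_pitch_mm=0.05):
    """Corner-to-corner Manhattan length for a die folded into n planes."""
    side = math.sqrt(total_area_mm2 / n_planes)       # edge of one plane
    vertical = (n_planes - 1) * plane_pitch_mm        # hops between planes
    return 2.0 * side + vertical


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(n, "planes:", longest_wire_mm(400.0, n), "mm")
```

Under these assumptions the longest in-plane wire shrinks as one over the square root of the number of planes, while the added vertical length stays small.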

3D Interconnects
 3-D interconnect system requires inter-plane
interconnects that connect signals between vertically
stacked devices
 Inter-plane interconnects implemented as through
silicon vias (TSV) or 3-D vias can produce the shortest
path within a 3-D system
◦ as compared to wire bonding, peripheral vertical interconnects,
and solder ball arrays
 Much work is needed to properly characterize and
model the interplane TSV
◦ primary technological innovation required to exploit benefits of
3-D integration

 Complexity of properly designing interconnects in the DSM
regime increases with each successive technology generation
◦ Choice between RC and RLC models is dependent on several factors
 Low power, high speed circuit techniques are essential to
extend battery lifetime and maintain ambient thermal levels
◦ Wire and driver sizing, as well as repeater insertion, are critical
 Clock and power distribution are important applications of the
general interconnect design problem
◦ noise and skew must be considered during design
 Novel techniques such as 3D interconnects in interconnect
design can help alleviate challenges in emerging ICs
◦ e.g., longer line lengths, greater line impedances, increased signal delays

