
1. Timing Optimization Techniques

Timing optimization techniques are as follows (a small sketch of cell sizing and buffering follows this list):
1. Mapping:
Mapping converts primitive logic cells found in a netlist to technology-specific logic
gates found in the library on the timing-critical paths.
2. Unmapping:
Unmapping converts the technology-specific logic gates in the netlist back to primitive
logic gates on the timing-critical paths.
3. Pin Swapping:
Pin-swapping optimization examines the slacks on the inputs of the gates on the worst
timing paths and optimizes the timing by swapping nets attached to the input pins, so
that the net with the least amount of slack is put on the fastest path through the gate,
without changing the function of the logic.
4. Buffering:
Buffers are inserted in the design to drive a load that is too large for a logic cell to
drive efficiently. If a net is too long, the net is broken and buffers are inserted to
improve the transition, which ultimately improves the timing on the data path and
reduces setup violations. To reduce hold violations, buffers are inserted to add
delay on data paths.
5. Cell Sizing:
Cell sizing is the process of assigning a drive strength for a specific cell in the library
to a cell instance in the design. If there is a low-drive-strength cell in the timing-critical
path, it is replaced by a higher-drive-strength cell to reduce the timing violation.
6. Cloning:
Cell cloning is an optimization method that decreases the load on a very heavily loaded
cell by replicating the cell. Replication is done by connecting an identical cell to the
same inputs as the original cell, so the fanout load is divided and the timing improves.
7. Logic Restructuring:
Logic restructuring means rearranging logic to meet timing constraints on the critical
paths of the design.
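
Below is a minimal, illustrative Python sketch (not taken from any real EDA tool) of how an optimization step might apply two of the techniques above, cell sizing and buffering, to a path with negative slack. The library names, drive-strength variants and the net-length threshold are hypothetical.

# Illustrative sketch of cell sizing and buffering on a timing-critical path.
# The library, cell names and thresholds below are hypothetical.

DRIVE_VARIANTS = {"NAND2": ["NAND2_X1", "NAND2_X2", "NAND2_X4"],
                  "INV":   ["INV_X1", "INV_X2", "INV_X4"]}

MAX_NET_LENGTH_UM = 200.0   # beyond this, insert a buffer to fix the transition


def upsize(instance_cell):
    """Cell sizing: swap an instance to the next higher drive strength, if any."""
    base = instance_cell.rsplit("_", 1)[0]
    variants = DRIVE_VARIANTS.get(base, [instance_cell])
    idx = variants.index(instance_cell) if instance_cell in variants else 0
    return variants[min(idx + 1, len(variants) - 1)]


def optimize_path(path_cells, net_lengths_um, slack_ns):
    """Return a list of suggested fixes for a path with negative setup slack."""
    fixes = []
    if slack_ns >= 0:
        return fixes                      # path already meets timing
    for cell in path_cells:
        bigger = upsize(cell)
        if bigger != cell:
            fixes.append(f"size {cell} -> {bigger}")
    for net, length in net_lengths_um.items():
        if length > MAX_NET_LENGTH_UM:    # long net: break it with a buffer
            fixes.append(f"insert buffer on {net} (length {length} um)")
    return fixes


print(optimize_path(["NAND2_X1", "INV_X2"], {"n1": 350.0, "n2": 50.0}, -0.12))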

2. STATIC TIMING ANALYSIS
Static timing analysis is a method of validating the timing performance of a
design by checking all possible paths for timing violations without having to simulate.
No vector generation is required, no functionality check is done.
Why Timing Is Important?
Timing is important because just designing the chip is not enough; we need to
know how fast the chip is going to run, how fast the chip is going to interact with
other chips, how fast the input reaches the output, and so on. Timing analysis is a method
of verifying the timing performance of a design by checking for all possible timing
violations in all possible paths.
What is STA?

In Static Timing Analysis (STA), static delays such as gate delays and net
delays are considered in each path, and these delays are compared against their
required maximum and minimum values. The circuit to be analyzed is broken into
different timing paths consisting of gates, flip-flops and their interconnections. Each
timing path has to process the data within a clock period, which is determined by the
maximum frequency of operation. Cell delays are available in the corresponding
technology libraries. Cell delay values are tabulated based on input transition and
fanout load, and are characterized by SPICE simulation. Net delays are calculated
based on Wire Load Models (WLM) or extracted resistance R and capacitance
C. Wire Load Models (WLM) are available in the technology file. These values are
Table Look Up (TLU) values calculated based on the net fanout and length.
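
The cell-delay tables mentioned above are two-dimensional lookups indexed by input transition and output load. A minimal Python sketch of how such a table might be interpolated is shown below; the index values and delay numbers are invented for illustration and do not come from any real library.

# Bilinear interpolation of a (hypothetical) cell-delay lookup table.
# Rows are indexed by input transition (ns), columns by output load (pF).

import bisect

trans_index = [0.01, 0.05, 0.10]          # input transition breakpoints (ns)
load_index  = [0.001, 0.010, 0.050]       # output load breakpoints (pF)
delay_table = [                           # delay values (ns), invented numbers
    [0.020, 0.035, 0.080],
    [0.028, 0.045, 0.095],
    [0.040, 0.060, 0.120],
]

def lookup_delay(trans, load):
    """Interpolate the cell delay for a given input transition and output load."""
    def bracket(index, x):
        i = min(max(bisect.bisect_left(index, x), 1), len(index) - 1)
        lo, hi = index[i - 1], index[i]
        frac = 0.0 if hi == lo else (x - lo) / (hi - lo)
        return i - 1, frac

    ti, tf = bracket(trans_index, trans)
    li, lf = bracket(load_index, load)
    d00, d01 = delay_table[ti][li], delay_table[ti][li + 1]
    d10, d11 = delay_table[ti + 1][li], delay_table[ti + 1][li + 1]
    top = d00 + (d01 - d00) * lf          # interpolate along the load axis
    bot = d10 + (d11 - d10) * lf
    return top + (bot - top) * tf         # then along the transition axis

print(round(lookup_delay(trans=0.03, load=0.005), 4))   # delay in ns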

The static timing analyzer will report the following delays (or, equivalently, it can perform the following analyses):

 Register to Register delays
 Setup times of all external synchronous inputs
 Clock to Output delays
 Pin to Pin combinational delays
 Different Analysis Modes - Best, Worst, Typical, On Chip Variation (OCV)
 Data to Data Checks
 Case Analysis
 Multiple Clocks per Register
 Minimum Pulse Width Checks
 Derived Clocks
 Clock Gating Checks
 Netlist Editing
 Report_clock_timing
 Clock Reconvergence Pessimism
 Worst-Arrival Slew Propagation
 Path-Based Analysis
 Debugging Delay Calculation and many more......!!

The widespread use of STA can be attributed to several factors:

 The basic STA algorithm is linear in runtime with circuit size, allowing
analysis of designs in excess of 10 million instances.

 The basic STA analysis is conservative in the sense that it will over-estimate
the delay of long paths in the circuit and under-estimate the delay of short
paths in the circuit. This makes the analysis "safe", guaranteeing that the
design will function at least as fast as predicted and will not suffer from
hold-time violations.

 The STA algorithms have become fairly mature, addressing critical timing
issues such as interconnect analysis, accurate delay modeling, false or
multi-cycle paths, etc.

 Delay characterization for cell libraries is clearly defined, forms an effective
interface between the foundry and the design team, and is readily available. In
addition to this, Static Timing Analysis (STA) does not require input
vectors and has a runtime that is linear with the size of the circuit.
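
The linear-runtime claim above comes from the fact that a block-based STA engine propagates worst (latest) arrival times through the circuit graph in a single topological pass, visiting each gate and net once. A minimal Python sketch of that idea, on a hypothetical netlist with invented delays, is shown below.

# Minimal sketch of block-based arrival-time propagation (worst/max analysis).
# The netlist and delays are hypothetical; each edge carries a gate+net delay in ns.

from collections import defaultdict, deque

edges = {                       # fanout edges: node -> [(successor, delay_ns)]
    "FF1/Q": [("U1/Z", 0.10)],
    "U1/Z":  [("U2/Z", 0.07), ("U3/Z", 0.09)],
    "U2/Z":  [("FF2/D", 0.05)],
    "U3/Z":  [("FF2/D", 0.06)],
    "FF2/D": [],
}

def propagate_arrival(start_times):
    """One topological pass: worst (latest) arrival time at every node."""
    indeg = defaultdict(int)
    for src, fanout in edges.items():
        for dst, _ in fanout:
            indeg[dst] += 1
    arrival = dict(start_times)                     # launch points
    ready = deque(n for n in edges if indeg[n] == 0)
    while ready:
        node = ready.popleft()
        for dst, delay in edges[node]:
            cand = arrival.get(node, 0.0) + delay
            arrival[dst] = max(arrival.get(dst, float("-inf")), cand)
            indeg[dst] -= 1
            if indeg[dst] == 0:
                ready.append(dst)
    return arrival

print(propagate_arrival({"FF1/Q": 0.12}))   # 0.12 ns clock-to-Q at the launch flop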

Why do we normally do Static Timing Analysis and not Dynamic
Timing Analysis? What is the difference between them?

Timing analysis can be done in both ways: static as well as dynamic. Dynamic
timing analysis requires a comprehensive set of input vectors to check the timing
characteristics of the paths in the design. Basically, it determines the full behavior of
the circuit for a given set of input vectors. Dynamic simulation can verify the
functionality of the design as well as its timing requirements. For example, if we have
100 inputs then we need to do 2 to the power of 100 simulations to complete the
analysis; the amount of analysis is astronomical compared to static analysis. Static
timing analysis checks every path in the design for timing violations without
checking the functionality of the design. This way, one can do timing and functional
analysis at the same time but separately. It is faster than dynamic timing simulation
because there is no need to generate any kind of test vectors. That is why STA is the
most popular way of doing timing analysis.

Static Timing Analysis


Static Timing Analysis (STA) works with timing models, whereas
Dynamic Timing Analysis (DTA) works with SPICE models. STA has more pessimism
and thus gives the maximum delay of the design. DTA overcomes this difficulty because
it performs full timing simulation. The problem associated with DTA is the
computational complexity involved in finding the input pattern(s) that produce the
maximum delay at the output, and hence it is slow. The static timing analyzer will
report the following delays: register to register delays, setup times of all external
synchronous inputs, clock to output delays, and pin to pin combinational delays. The
clock to output delay is usually just reported as simply another pin to pin
combinational delay. Timing analysis reports are often pessimistic since they use
worst-case conditions.

Note: There is one more type of timing analysis, "manual analysis". But nowadays
almost nothing is 100% manual; everything is more automated and less manual, so we
are not discussing it right now. In this blog (and a few following ones as part of this
series) we will discuss Static Timing Analysis. We will discuss Dynamic Timing Analysis
later on.
Static timing analysis is divided into several parts as per the above-mentioned list.

Dynamic vs Static Timing Analysis


Timing analysis is an integral part of the ASIC/VLSI design flow. Anything else can
be compromised, but not timing! Timing analysis can be static or dynamic. Dynamic
timing analysis verifies the functionality of the design by applying input vectors and
checking for correct output vectors, whereas static timing analysis checks the static
delay requirements of the circuit without any input or output vectors.

Dynamic timing analysis has to be accomplished, and the functionality of the
design must be cleared, before the design is subjected to Static Timing Analysis
(STA). Dynamic Timing Analysis (DTA) and Static Timing Analysis (STA) are not
alternatives to each other. The quality of Dynamic Timing Analysis (DTA) increases
with the number of input test vectors, but increased test vectors increase simulation time.
Dynamic timing analysis can be used for synchronous as well as asynchronous
designs. Static Timing Analysis (STA) can't run on asynchronous designs, and hence
Dynamic Timing Analysis (DTA) is the best way to analyze asynchronous designs.
Dynamic Timing Analysis (DTA) is also best suited for designs having clocks
crossing multiple domains.

Examples of Dynamic Timing Analysis (DTA) tools are ModelSim (from Mentor
Graphics) and VCS (from Synopsys). DTA is also carried out on the post-layout netlist to
verify that the functionality of the design has not changed. The test vectors remain the
same for both.

SPICE Simulation

Device-level timing analysis is carried out using SPICE simulation. SPICE
simulation is essential for full-custom designs to verify the electrical properties
of the design. These are calculated based on the mathematical equations that
represent the electrical properties of devices. Material and some of the electrical
properties of the devices, which are represented by either variables or constants, are
stored in model files; examples are the threshold voltage of a MOSFET, electron density,
etc. SPICE-characterized data is tabulated in technology libraries, which becomes the
basic delay information for Static Timing Analysis.
For example, let us consider an AND gate. Several electrical properties such as input
and output transition, propagation delay, output capacitance, etc. are evaluated by
SPICE simulation. SPICE-simulated data gives maximum accuracy compared to any
other form of simulation. SPICE code is manually written and simulated; hence, for a
larger design, SPICE simulation is a cumbersome job. There are specific tools available
for transistor-level Static Timing Analysis (STA) (e.g. PathMill from Synopsys), with
SPICE simulation being the backbone of all these tools.
Static Timing Analysis:
Static timing analysis is a method of validating the timing performance of a
design by checking all possible paths for timing violations under worst-case
conditions. It considers the worst possible delay through each logic element, but not
the logical operation of the circuit. In comparison to circuit simulation, static timing
analysis is:
Faster - because it does not need to simulate multiple test vectors.
More thorough - because it checks the worst-case timing for all possible logic
conditions, not just those sensitized by a particular set of test vectors.
Once again, note this: static timing analysis checks the design only for proper
timing, not for correct logical functionality.
Static timing analysis seeks to answer the question, "Will the correct data be
present at the data input of each synchronous device when the clock edge arrives,
under all possible conditions?"
In static timing analysis, the word static alludes to the fact that this timing
analysis is carried out in an input-independent manner. It locates the worst-case delay
of the circuit over all possible input combinations. There are huge numbers of logic
paths inside a chip of complex design. The advantage of STA is that it performs
timing analysis on all possible paths (whether they are real or potential false paths).
However, it is worth noting that STA is not suitable for all design styles. It has proven
efficient only for fully synchronous designs.
Since the majority of chip design is synchronous, it has become a mainstay of
chip design over the last few decades.
The way STA is performed on a given circuit:
To check a design for violations, that is, to perform STA, there are three main steps:
 the design is broken down into sets of timing paths,
 the signal propagation delay along each path is calculated,
 and violations of timing constraints are checked inside the design and at
the input/output interface.
The STA tool analyzes ALL paths from each and every start point to each and
every endpoint and compares them against the constraint that (should) exist for each path.
All paths should be constrained; most paths are constrained by the definition of the
period of the clock and the timing characteristics of the primary inputs and outputs of
the circuit.
Before we start all this, we should know a few key concepts of the STA method:
timing path, arrival time, required time, slack and critical path.

The major design challenges of ASIC design consist of microscopic issues
and macroscopic issues. The microscopic issues are ultra-high speeds, power
dissipation, supply rail drop, the growing importance of interconnect, noise, crosstalk,
reliability, manufacturability and clock distribution. The macroscopic issues are
time to market, design complexity, high levels of abstraction, reuse, IP portability,
systems on a chip and tool interoperability.
To meet the design challenge of clock distribution, timing analysis is
performed. Timing analysis estimates when the output of a given circuit becomes
stable. Timing Analysis (TA) is a design automation program which provides an
alternative to the hardware debugging of timing problems. The program establishes
whether all paths within the design meet the stated timing criteria, that is, that data signals
arrive at storage elements early enough for valid gating but not so early as to cause
premature gating. The output of Timing Analysis includes the "slack" at each block to
provide a measure of the severity of any timing problem.

Static Timing Analysis Facts


1. Static Timing Analysis is a technique for analysing timing paths in digital logic by
adding up delays along a timing path (gates and interconnect) and comparing the sum with
constraints (like the clock period) to check whether the path meets the constraint.
2. Static Timing Analysis is popular because it is simple to use and only needs
commonly available inputs like the technology library, netlist, constraints, and parasitics
(R and C).
3. Static Timing Analysis tends to be comprehensive and provides a very high level of
timing coverage. Static timing also honours timing exceptions to exclude the paths
that are either not true paths or are not exercised in the actual design. A good static timing
tool correlates well with timing in silicon.
4. A digital logic design can be broken down into a number of timing paths. A timing path
can be any of the following:
i. a path between the clock pin of a register/latch and the D pin of another register/latch.
ii. a path between a primary input and the D pin of a register or latch.
iii. a path between the clock pin of a register and a primary output.
iv. a path between an input pin and an output pin of a block.
5. Static Timing Analysis is used to check the following:
i. Setup timing
ii. Hold timing
iii. max/min timing between two points on a segment of a timing path
iv. Latch time borrowing
v. Removal and recovery timing on resets
vi. clock gating checks
vii. clock pulse width requirements
viii. min/max transition times
ix. min/max fanout
x. max capacitance
6. Capture edge time of a setup path = launch clock edge time + 1 period.
7. Max timing (setup) equation (see the sketch after this list):
launch clock edge time + clock network delay + clock-to-Q delay + path delay (cell +
interconnect) <= capture edge time of the setup path + clock network delay - clock
uncertainty - setup time - output external delay (only for paths to output ports)
8. Min timing (hold) equation:
launch clock edge time + clock network delay + input external delay + clock-to-Q delay +
path delay (cell + interconnect delays) >= (capture edge time of the corresponding
setup path - 1 clock period) + clock network delay + clock uncertainty + library hold
time - output external delays (only for paths to output ports)
9. Multicycle paths are paths that are allowed to take more than one clock period to
complete. Multicycle paths make setup easy but make hold difficult unless corrected.
10. Recovery time is like setup time on a reset pin.
11. Removal time is like hold time on a reset pin.
12. Clock gates also have setup and hold requirements.
13. Clock Domain Crossing issues are not detected by Static Timing Analysis.
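
A minimal Python sketch of the max (setup) and min (hold) checks in facts 7 and 8 is shown below. The variable names and the numbers are illustrative only; for simplicity the hold check is done against the same launch edge (which is what "capture edge of the corresponding setup path - 1 clock period" reduces to for same-clock, same-edge paths).

# Illustrative setup (max) and hold (min) checks following facts 7 and 8 above.
# All values are in ns and are invented for the example.

def setup_slack(launch_edge, capture_edge, launch_clk_delay, capture_clk_delay,
                clk_to_q, data_path, uncertainty, setup_time):
    arrival  = launch_edge + launch_clk_delay + clk_to_q + data_path
    required = capture_edge + capture_clk_delay - uncertainty - setup_time
    return required - arrival          # >= 0 means the setup check passes

def hold_slack(launch_edge, launch_clk_delay, capture_clk_delay,
               clk_to_q, data_path, uncertainty, hold_time):
    arrival  = launch_edge + launch_clk_delay + clk_to_q + data_path
    required = launch_edge + capture_clk_delay + uncertainty + hold_time
    return arrival - required          # >= 0 means the hold check passes

period = 10.0
print("setup slack:", round(setup_slack(0.0, period, 0.6, 0.5, 0.3, 7.8, 0.2, 0.4), 2))
print("hold slack :", round(hold_slack(0.0, 0.6, 0.5, 0.3, 0.8, 0.2, 0.1), 2))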
Basic Definitions
Clock: a signal in the design with respect to which all other signals are
synchronized. There can be multiple clocks in a design.
Setup Time: setup time is the minimum amount of time the data signal should be held
steady before the clock event so that the data are reliably sampled by the clock. This
applies to synchronous circuits such as the flip-flop.


In short, I can say that it is the amount of time the synchronous input (D) must be
stable before the active edge of the clock. The time for which the input data must be available
and stable before the clock pulse is applied is called the setup time.
Hold time:
Hold time is the minimum amount of time the data signal should be held
steady after the clock event so that the data are reliably sampled. This applies to
synchronous circuits such as the flip-flop. Or, in short, I can say that it is the amount of time
the synchronous input (D) must be stable after the active edge of the clock. The time
after the clock pulse for which the data input is held stable is called the hold time.

3) Slack:
Slack is the difference between the desired arrival time and the actual arrival time of a
signal. The slack of a timing path determines whether the design is working at the
desired frequency.
Positive slack indicates that the design is meeting timing and can still be
improved.
Zero slack means that the design is working exactly at the desired frequency, with no margin.
Negative slack means the design has not achieved the specified timing at the
specified frequency.
Slack always has to be positive; negative slack indicates a timing
violation.
4) Required time:
The time within which data is required to arrive at some internal node of the design.
Designers specify this value by setting constraints.
5) Arrival Time:
The time at which data arrives at the internal node. It incorporates all the net and logic
delays between the reference input point and the destination node.
Setup slack = Required time - Arrival time
Hold slack = Arrival time - Required time
6) Setup Slack:
The amount of margin by which setup requirements are met.
TCL = total combinational delay in a pipelined stage
TRC = RC delay of interconnects
TCQ = clock-to-output delay
Tarrival = arrival time (at the node)
Tcycle,min = minimum achievable clock cycle
To meet the setup requirements, the following equation must be satisfied (for all paths):
Tslack,setup = Tcycle - Tarrival - Tsetup
Here Tarrival = TCL + TRC + TCQ.
7) Hold Slack:
The amount of margin by which hold time requirements are met. For this we need
Tarrival >= Thold, and the margin is
Tslack,hold = Tarrival - Thold = TCL + TRC + TCQ - Thold
A short worked example of these two slack formulas follows below.
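
The sketch below simply restates the two slack formulas above with invented numbers (all delays in ns), using the same T symbols defined in the text.

# Worked example of the slack formulas above; all delays in ns, numbers invented.

T_CL  = 4.2   # total combinational delay in the pipeline stage
T_RC  = 0.8   # interconnect RC delay
T_CQ  = 0.3   # clock-to-output delay of the launch flop
T_setup = 0.4
T_hold  = 0.2
T_cycle = 10.0

T_arrival   = T_CL + T_RC + T_CQ             # data arrival at the capture flop
setup_slack = T_cycle - T_arrival - T_setup  # must be >= 0 for every path
hold_slack  = T_arrival - T_hold             # must be >= 0 for every path

print(f"arrival = {T_arrival:.1f} ns, setup slack = {setup_slack:.1f} ns, "
      f"hold slack = {hold_slack:.1f} ns")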
8) Clock jitter:
Clock jitter is the amount of cycle-to-cycle variation that can occur in a clock's period.
Because clocks are generated by real physical devices such as phase-locked loops,
there is some uncertainty, and a perfect waveform with an exact period of x
nanoseconds cannot be achieved.

9) Source latency:
The delay from the clock origin point to the clock definition point in the
design. It is the insertion delay external to the circuit which we are timing. It applies
only to primary clocks.
10) Network latency:
The delay from the clock definition point to the clock pin of the register. It is
the internal delay for the circuit which we are timing (the delay of the clock tree from
the source of the clock to all of the clock sinks).
11) I/O latency:
If a flop of the block is talking to another flop outside the block, the clock
(network) latency of that flop will be the I/O latency of the block.
12) Clock skew:
Clock skew is the difference in arrival times of the capture edge at two adjacent
flip-flop pairs.

13) Positive skew:
If the capture clock arrives later than the launch clock, it is called positive skew.
14) Negative skew:
If the capture clock arrives earlier than the launch clock, it is called negative skew.

15) Local skew:
Local skew is the difference in clock arrival times at two adjacent (related) sequential
elements.
16) Global skew:
Global skew is defined as the difference between the maximum insertion delay and the
minimum insertion delay of any flops. It is also defined as the difference between the
shortest clock path delay and the longest clock path delay reaching two sequential elements.
17) Boundary skew:
Boundary skew is defined as the difference between the maximum insertion delay and the
minimum insertion delay of boundary flops.
18) Useful skew:
If the clock is skewed intentionally to resolve violations, it is called useful skew.
19) Recovery and Removal Time:
These are timing checks for asynchronous signals, similar to the setup and hold
checks.
Recovery time is the minimum amount of time required between the release of
an asynchronous signal from the active state and the next active clock edge.
Example: the time between the reset and clock transitions for a flip-flop.
If the active edge occurs too soon after the release of the reset, the state of the
flip-flop can be unknown.

Removal time specifies the minimum amount of time between an active clock
edge and the release of an asynchronous control signal. The following diagram
illustrates recovery and removal times for an active low reset signal (RESET_N) and
positive edge triggered CLOCK.

Timing Paths
The different kinds of paths checked during timing analysis of a design are as
follows:
1. Input port/pin -> Sequential element (register)
2. Sequential element (register) -> Sequential element (register)
3. Sequential element (register) -> Output pin/port
4. Input port/pin -> Output pin/port

The static timing analysis tool performs the timing analysis in the following way:
1. The STA tool breaks the design down into a set of timing paths.
2. It calculates the propagation delay along each path.
3. It checks for timing violations (against the constraints, e.g. the clock) on
the different paths and also at the input/output interface.
Timing Analysis is performed by splitting the design into different paths
based on:
 Start Points
 End points
Start points include: a clock, a primary input port, a sequential cell, a clock
input pin of a sequential cell, a data pin of a level-sensitive latch, or a pin that has an
input delay specified.
End points include: a clock, a primary output port, a sequential cell, a data
input pin of a sequential cell, or a pin that has an output delay specified.
Calculation of the propagation delay along each path:
STA calculates the delay along each timing path by determining the gate delay and the
net delay.
1. Gate delay: the amount of delay from the input to the output of a logic gate. It is
calculated based on two parameters:
 Input transition time
 Output load capacitance
2. Net delay: the amount of delay from the output of a gate to the input of the next
gate in a timing path. It depends on the following parameters:
 Parasitic capacitance
 Resistance of the net
During STA, the tool calculates the timing of the path by calculating:
1. the delay from the input to the output of the gate (gate delay), and
2. the output transition time (which in turn depends on the input transition time
and the output load capacitance).
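
The sketch below adds gate and net delays along one timing path, as described above. The gate-delay model is a deliberately crude linear approximation (intrinsic delay plus a load-dependent term), not a real library model, and all numbers are invented.

# Sketch: the delay of one timing path as the sum of gate delays and net delays.
# Gate delay is modelled (very crudely) as intrinsic + k * load; values invented.

stages = [
    # (intrinsic_ns, load_coeff_ns_per_pF, load_pF, net_rc_delay_ns)
    (0.030, 0.50, 0.012, 0.015),   # launch flop clk->Q driving net n1
    (0.020, 0.40, 0.008, 0.010),   # NAND2 gate driving net n2
    (0.025, 0.45, 0.020, 0.030),   # INV gate driving net n3 into the capture flop
]

def path_delay(stages):
    total = 0.0
    for intrinsic, k, load, net_delay in stages:
        gate_delay = intrinsic + k * load   # depends on output load (and, in a real
                                            # library, on the input transition too)
        total += gate_delay + net_delay
    return total

print(f"path delay = {path_delay(stages):.3f} ns")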
Timing Exceptions
Timing exceptions are constraints which don't follow the default single-cycle behaviour
during timing analysis. The different kinds of timing exceptions are listed below (a small
sketch of how an analysis flow might apply them follows this list).
1. False path: if a path does not affect the output and does not contribute to the
delay of the circuit, then that path is called a false path.
2. Multicycle path: multicycle paths in a design are paths that require more than
one clock cycle. Therefore they require special multicycle setup and hold time
calculations.
3. Min/Max path: this path must meet a delay constraint with a specific value; it is
not an integer number of cycles like the multicycle path. For example, the delay from one
point to another: max 1.87 ns, min 1.67 ns.
4. Disabled timing arcs: an input-to-output arc in a gate is disabled. For example, for a
3-input AND gate with inputs (a, b, c) and output (out), you can disable the path from
input 'a' to output 'out' using a disable-timing-arc constraint.
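
Below is a hedged Python sketch of how an analysis flow might apply these exceptions when checking paths: false paths are skipped, multicycle paths get N periods, and min/max paths get an explicit bound. The path names and the constraint format are invented for illustration and are not the syntax of any real tool.

# Illustrative handling of timing exceptions; constraint format is invented.

period = 10.0
exceptions = {
    "ff_a->ff_b":  {"type": "false_path"},
    "ff_c->ff_d":  {"type": "multicycle", "cycles": 2},
    "in_x->out_y": {"type": "max_delay", "max": 1.87},
}

paths = {"ff_a->ff_b": 12.3, "ff_c->ff_d": 15.1, "ff_e->ff_f": 9.2, "in_x->out_y": 1.5}

for name, delay in paths.items():
    exc = exceptions.get(name, {"type": "single_cycle"})
    if exc["type"] == "false_path":
        print(f"{name}: skipped (false path)")
        continue
    if exc["type"] == "multicycle":
        limit = period * exc["cycles"]      # N clock periods allowed
    elif exc["type"] == "max_delay":
        limit = exc["max"]                  # explicit delay bound
    else:
        limit = period                      # default single-cycle constraint
    status = "meets" if delay <= limit else "violates"
    print(f"{name}: delay {delay} ns vs limit {limit} ns -> {status}")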
Clock Path:
Please check the following figure:

In the above figure it is very clear that a clock path starts from the input
port/pin of the design which is specific to the clock input, and the end point is the
clock pin of a sequential element. Between the start point and the end point there
may be many buffers/inverters/clock dividers.
Clock Gating Path:

A clock path may pass through a "gating element" to achieve additional
advantages. In this case, the characteristics and definition of the clock change
accordingly. We call this type of clock path a "gated clock path". In the following
figure you can see that

the LD pin is not part of any clock, but it is used for gating the original CLK signal.
Such paths are neither part of a clock path nor of a data path, because as per the
start point and end point definitions of those paths, they are different. So such paths
are part of the clock gating path.
Asynchronous path:
An asynchronous path is a path from an input port to an asynchronous set or clear pin of a
sequential element. See the following figure for a clear understanding.

As you know, the functionality of a set/reset pin is independent of the clock edge.
These are level-triggered pins and can take effect at any time. So, in other words,
we can say that this path is not synchronous with the rest of the circuit, and that is
the reason we call such a path an asynchronous path.
As we discussed in our last blog (about the basics of timing analysis), there are two
types of timing analysis. Now let us look at the different types of timing paths.
Timing paths: timing paths can be divided as per the type of signals they carry (e.g. clock
signals, data signals, etc.).
Types of paths for timing analysis:
 Data Path
 Clock Path
 Clock Gating Path
 Asynchronous Path
Each timing path has a "start point" and an "end point". The definitions of start
point and end point vary with the type of timing path. For example, for the data path the
start point is a place in the design where data is launched by a clock edge. The data is
propagated through combinational logic in the path and then captured at the endpoint
by another clock edge.
The start point and end point are different for each type of path. It is very
important to understand this clearly in order to understand and analyze a timing analysis
report and fix timing violations.
Data path
 Start Point
Input port of the design (because the input data can be launched from
some external source).
Clock pin of the flip-flop/latch/memory (sequential cell)
 End Point
Data input pin of the flip-flop/latch/memory (sequential cell)
Output port of the design (because the output data can be captured by some external
sink)
Clock Path
 Start Point
Clock input port
 End Point
Clock pin of the flip-flop/latch/memory (sequential cell)
Clock Gating Path
 Start Point
Input port of the design
 End Point
Input port of the clock-gating element

Asynchronous path
 Start Point
Input port of the design
 End Point
Asynchronous set/clear pin of the flip-flop/latch/memory (sequential cell)
Data Paths:
If we use all the combinations of the two types of start point and the two types of end point,
we can say that there are four types of timing paths on the basis of start and end points.
 Input pin/port to REGISTER (flip-flop).
 Input pin/port to Output pin/port.
 REGISTER (flip-flop) to REGISTER (flip-flop)
 Register (flip-flop) to Output pin/port
Please see the following fig:

PATH1 - starts at an input port and ends at the data input of a sequential element
(input port to register).
PATH2 - starts at the clock pin of a sequential element and ends at the data input of a
sequential element (register to register).
PATH3 - starts at the clock pin of a sequential element and ends at an output port
(register to output port).
PATH4 - starts at an input port and ends at an output port (input port to output port).
Other types of Paths:

There are a few more types of paths which we usually encounter in timing analysis reports.
They are subsets of the above-mentioned paths with some specific characteristics. Since we
are discussing timing paths, it is worth discussing these here as well.
A few of these are:
 Critical path
 False path
 Multi-cycle path
 Single-cycle path
 Launch path
 Capture path
 Longest path (also known as worst path, late path, max path or maximum
delay path)
 Shortest path (also known as best path, early path, min path or minimum
delay path)
Critical Path:
In short, I can say that the path which has the longest delay is the critical path.
Critical paths are timing-sensitive functional paths: because the timing of these
paths is critical, no additional gates are allowed to be added to the path, to prevent
increasing the delay of the critical path. Timing-critical paths are those paths that do not
meet your timing. What normally happens is that after synthesis the tool will give you
a number of paths which have negative slack. The first thing you would do is make
sure those paths are not false or multicycle, since in that case you can just ignore them.
Taking a typical example (in a very simple way), the STA tool will add the delay
contributed by all the logic connecting the Q output of one flop to the D input of the
next (including the CLK->Q delay of the first flop), and then compare it against the defined
clock period of the CLK pins (assuming both flops are on the same clock, and taking
into account the setup time of the second flop and the clock skew). This sum should be
strictly less than the clock period defined for that clock. If the delay is less than the
clock period, then the path "meets timing". If it is greater, then the path "fails
timing". The critical path is the path, out of all the possible paths, that either exceeds
its constraint by the largest amount or, if all paths pass, the one that comes
closest to failing. A tiny sketch of that selection follows.
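
The sketch below picks the critical path as the one with the smallest (most negative) slack, exactly as described above. The path names, delays and required times are invented example data.

# Find the critical path: the path with the smallest (most negative) slack.
# Delays and constraints are invented example data (ns).

paths = [
    # (name, arrival = data path delay incl. clk->Q, required = period - setup - skew)
    ("ff1->ff2", 8.7, 9.4),
    ("ff3->ff4", 9.9, 9.4),
    ("ff5->ff6", 6.1, 9.4),
]

report = [(name, required - arrival) for name, arrival, required in paths]
critical = min(report, key=lambda item: item[1])

for name, slack in sorted(report, key=lambda item: item[1]):
    print(f"{name}: slack {slack:+.2f} ns")
print("critical path:", critical[0])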

False Path:
False paths physically exist in the design but are logically/functionally never exercised,
meaning no data is actually transferred from the start point to the end point. There may be
several reasons for such paths being present in the design, and sometimes we have to
explicitly define/create a few false paths within the design,
e.g. for setting the relationship between two asynchronous clocks.
The goal in static timing analysis is to do timing analysis on all "true" timing
paths; the false paths are excluded from timing analysis. Since false paths are not
exercised during normal circuit operation, they typically don't meet the timing
specification; considering false paths during timing closure can result in timing
violations, and the procedure to fix them would introduce unnecessary complexity into the
design. There may be a few paths in your design which are not critical for timing, or which
mask other paths that are important for timing optimization, or which never occur in a
normal situation. In such cases, to reduce the run time and improve the timing
results, we sometimes have to declare such paths as false paths, so that the timing
analysis tool ignores them and does the proper analysis with respect to the other paths,
and does not concentrate on such paths during optimization.
One example of this is a path between two multiplexed blocks that are never
enabled at the same time. You can see the following picture for this.

Here you can see that False Path 1 and False Path 2 cannot occur at the same
time, but during optimization they can affect the timing of other paths. So in such a
scenario we have to define one of the paths as a false path. The same thing can be explained
in another way (note: taken from a forum post). As we know, not all
paths that exist in a circuit are "real" timing paths. For example, let us assume that one
of the primary inputs to the chip is a configuration input; on the board it must be tied
either to VCC or to GND.
Since this pin can never change, there are never any timing events on that
signal. As a result, all STA paths that start at this particular startpoint are false. The
STA tool (and the synthesis tool) cannot know that this pin is going to be tied off, so
it needs to be told that these STA paths are false, which the designer can do by telling
the tool using a "false_path" directive. When told that the paths are false, the STA
tool will not analyze them (and hence will not compare them to a constraint, so these paths
cannot fail), nor will a synthesis tool do any optimizations on those particular paths to make
them faster; synthesis tools try to improve paths until they "meet timing", and since the
paths are false, the synthesis tool has no work to do on them.
Thus, a path should be declared false if the designer KNOWS that the path in
question is not a real timing path, even though it looks like one to the STA tool. One
must be very careful with declaring a path false. If you declare a path false, and there
is ANY situation where it is actually a real path, then you have created the potential
for a circuit to fail, and for the most part you will not catch the error until the chip is
on a board and (not) working. Typically, false paths exist from configuration inputs
like the one described above; from "test" inputs that are only used in the testing
of the chip and are tied off in normal mode (however, there may still be some static
timing constraints for the test mode of the chip); and from asynchronous inputs to the chip
(for which you must have some form of synchronizing circuit on the input). This is not an
exhaustive list, but it covers the majority of legitimate false paths. So we can say that
false paths should NOT be derived from running the STA tool (or synthesis tool); they
should be known by the designer as part of the definition of the circuit, and
constrained accordingly at the time of initial synthesis.
Multi Cycle Path:
A multi cycle path is a timing path that is designed to take more than one
clock cycle for the data to propagate from the start point to the endpoint.
A multi-cycle path is a path that is allowed multiple clock cycles for propagation.
Again, it is a path that starts at a timing start point and ends at a timing
endpoint. However, for a multi-cycle path, the normal constraint on this path is
overridden to allow for the propagation to take multiple clocks.
In the simplest example, the start point and endpoint are flops clocked by the
same clock. The normal constraint is therefore applied by the definition of the clock:
the sum of all delays from the CLK arrival at the first flop to the arrival at the D pin of the
second flop should take no more than one clock period, minus the setup time of the
second flop, adjusted for clock skew. By defining the path as a multicycle path
you tell the synthesis or STA tool that the path has N clock cycles to propagate,
so the timing check becomes "the propagation must be less than N x clock_period,
minus the setup time and clock skew". N can be any number greater than 1.
A few examples:
One is when you are crossing between two closely related clocks, e.g. from a
30MHz clock to a 60MHz clock, assuming the two clocks come from the same clock
source (i.e. one is the divided clock of the other) and the two clocks are in phase.
The normal constraint in this case is from the rising edge of the 30MHz clock to the
nearest edge of the 60MHz clock, which is about 16.7ns later.
However, if you have a signal in the 60MHz domain that indicates the phase
of the 30MHz clock, you can design a circuit that allows the full 33ns for the clock
crossing; then the path from flop30 to flop60 is an MCP (again with N=2).

The generation of the signal 30MHZ_is_low is not trivial, since it must come from a
flop which is clocked by the 60MHz clock but shows the phase of the 30MHz clock.
Another place would be when you have different parts of the design that run at
different, but related, frequencies.
Again, consider a circuit that has some logic running at 60MHz and some
running on a divided clock at 30MHz.
Instead of actually defining two clocks, you can use only the faster clock and
have a clock enable that prevents the flops in the slower domain from updating every
other cycle. Then all the paths from the "30MHz" flops to the "30MHz" flops can be
MCPs. This is often done since it is usually a good idea to keep the number of different
clock domains to a minimum.
Single-Cycle Path: a single-cycle path is a timing path that is designed to take only
one clock cycle for the data to propagate from the start point to the end point.
Launch Path and Capture Path:
These are inter-related, so I am describing both in one place. When a flip-flop to
flip-flop path such as UFF1 to UFF3 is considered, one of the flip-flops launches the
data and the other captures the data. So here UFF1 is referred to as the "launch flip-flop"
and UFF3 as the "capture flip-flop".

The launch and capture terminology always refers to a particular flip-flop to
flip-flop path. That is, for this particular path (UFF1->UFF3), UFF1 is the launch flip-flop
and UFF3 is the capture flip-flop. Now if there is any other path starting from UFF3 and
ending at some other flip-flop (let's say UFF4), then for that path UFF3 becomes the
launch flip-flop and UFF4 the capture flip-flop.
The name "launch path" refers to a part of the clock path. The launch path is the
launch clock path, which is responsible for launching the data at the launch flip-flop.
Similarly, the capture path is also a part of the clock path. The capture path is the capture
clock path, which is responsible for capturing the data at the capture flip-flop.
This can be clearly understood from the following figure.

Here UFF0 is referred to as the launch flip-flop and UFF1 as the capture flip-flop for
the data path between UFF0 and UFF1. So the start point for this data path is UFF0/CK and
the end point is UFF1/D. One thing I want to add here (which I will describe later in my
next blog, but it is easy to understand here): the launch path and the data path together
constitute the arrival time of the data at the input of the capture flip-flop.
The capture clock period and its path delay together constitute the required time of the
data at the input of the capture register.
Note: it is very clear that capture and launch paths correspond to a data path.
That is, the same clock path can be a launch path for one data path and a capture path
for another data path. This will be clear from the following figure (the source of the figure
is Synopsys).

Here you can see that for Data path1 the clock path through the BUF cell is a capture path,
but for Data path2 it is a launch path.
Longest and Shortest Path:
Between any two points, there can be many paths.
The longest path is the one that takes the longest time; this is also called the worst path,
late path or max path.
The shortest path is the one that takes the shortest time; this is also called the
best path, early path or min path.

In the above figure, the longest path between the two flip-flops is through the cells
UBUF1, UNOR2 and UNAND3.
The shortest path between the two flip-flops is through the cell UNAND3.

2. Time Borrowing
In an ASIC there are mainly two types of sequential components: flip-flops and
latches. Here we will discuss latch-based timing analysis. Before
this, we should understand the basic differences between latch-based design and
flip-flop-based design. Edge-triggered flip-flops change state at the clock edges,
whereas latches change state as long as the clock (enable) pin is active. The delay of a
combinational logic path in a design using edge-triggered flip-flops cannot be longer
than the clock period, except for those paths specified as false paths or multicycle
paths. So the performance of such a circuit is limited by the longest path of the design. In a
latch-based design, a longer combinational path can be compensated by shorter path
delays in the subsequent logic stages. So for higher-performance circuits, designers are
turning to latch-based design. It is true that in latch-based design it is more difficult to
control the timing, because of the multi-phase clocks used and the lack of "hard" clock
edges at which events must occur.
The technique of borrowing time from the shorter paths of the subsequent
logic stages for a longer path is called time borrowing or cycle stealing.

There are four latches (positive level sensitive). L1 and L3 are controlled by PH1,
and L2 and L4 are controlled by PH2. G1, G2, G3 and G4 are combinational logic
paths. For now, assume the library setup time is zero for the latches and there is zero delay
in the latch data path in transparent mode.
Now, if the design used edge-triggered flip-flops, the clock period
would have to be at least 8 ns because the longest path, through G1, is 8 ns. As the clock pulse
is 5 ns, there would be a violation at L2. On the other hand, if the design uses latches, the L2
latch is transparent for another 5 ns, and since the eighth (8th) ns is within the enabled period
of L2, the signal along path1 can pass through L2 and continue on path2. Since the
delay along path2 is 2 ns, which is short enough to compensate for the overdue delay
of path1, this design will work properly. In other words, we can say that path1 borrows
some time (3 ns) from path2. Since the sum of path1 and path2 is 10 ns,
which is the required time at L3, there will be no violation at either of the latches.
For the same reason, path3 can borrow some time (1 ns) from path4 without any
timing violation.
Note: the latch-based design completes the execution of the four logic stages in 20 ns,
whereas an edge-triggered design would need 32 ns.

I just want to convey here that this time borrowing can span multiple stages.
In other words, for a latch-based design, each path must
start at a time when its driving latch is enabled and end at a time when its driven latch
is enabled.
A few important things: time borrowing occurs within the same cycle, meaning the
launching and capturing latches use the same phase of the same clock. When the
clocks of the launching and capturing latches are out of phase, time borrowing does not
happen; it is usually disabled by EDA tools. Time borrowing typically only
affects the setup slack calculation, since time borrowing slows data arrival times. Since
the hold slack calculation uses the fastest data, time borrowing typically does not affect
the hold slack calculation.
A few important terms:
Maximum borrow time:
The maximum borrow time is the clock pulse width minus the library setup time
of the latch. Usually, to calculate the maximum allowable borrow time, you start with the
clock pulse width and then subtract the clock latency, the clock reconvergence pessimism
removal, and the library setup time of the endpoint latch.
Negative borrow time:
If the arrival time minus the clock edge is a negative number, the amount of
time borrowing is negative (in other words, there is no borrowing).
This amount is known as negative borrow time. A small numeric sketch of the borrowing
arithmetic for the example above follows.
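
The sketch below restates the borrowing arithmetic of the latch example above in Python, using the same 5 ns clock pulse and the 8 ns / 2 ns paths, with the latch setup time and latch data-path delay taken as zero as the text assumes.

# Time-borrowing arithmetic for the latch example above (all delays in ns).
# Latch setup time and latch data-path delay are taken as zero, as in the text.

pulse_width  = 5.0                 # each latch is transparent for 5 ns
latch_setup  = 0.0
max_borrow   = pulse_width - latch_setup   # maximum allowable borrow time

def borrow_at_latch(arrival, open_edge):
    """Time borrowed at a transparent latch; negative means no borrowing."""
    borrow = arrival - open_edge
    ok = borrow <= max_borrow
    return borrow, ok

# Stage 1: path1 (8 ns) from L1 into L2, whose transparent window opens at 5 ns.
b1, ok1 = borrow_at_latch(arrival=8.0, open_edge=5.0)
print(f"L2 borrows {b1} ns (within limit: {ok1})")

# Stage 2: path2 (2 ns) from L2 to L3; data leaves L2 at 8 ns (when it arrived).
arrival_L3 = 8.0 + 2.0
b2, ok2 = borrow_at_latch(arrival=arrival_L3, open_edge=10.0)
print(f"L3 borrows {max(b2, 0.0)} ns (no borrowing needed: {b2 <= 0})")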

3. Setup and Hold Time Basics
It's been a long time that people have been asking for the setup and hold time blog. Finally
the time has come for that. :)
We will discuss this concept in the following manner:
1. What is setup and hold time?
2. Definition of setup and hold.
3. Setup and hold violations.
4. How to calculate setup and hold violations in a design?
I have seen that lots of people are confused about this concept, and the reasons
are:
1. They know the definitions but don't know the origin of, or the concept behind, setup
and hold timing.
2. They know the formulas for calculating setup and hold violations but don't know how
these formulas come about.
3. They get confused by some of the terminology, like capture path delay, launch
path delay, previous clock cycle, current clock cycle, data path delay, slew, setup
slew, hold slew, the min and max concept, slowest path and fastest path, min and max
corner, best and worst case, etc. used during the explanation of setup and hold
timing/violations.
What is setup and hold time?
To understand the origin of the setup and hold time concepts, first understand
them with respect to a system as shown in the figure. An input DIN and an external clock CLK
are buffered and pass through combinational logic before they reach the synchronous
input and the clock input of a D flip-flop (positive edge triggered). Now, to capture the
data correctly at the D flip-flop, the data should be present at the time of the positive edge of
the clock signal at the C pin (to know the details, just read up on the basics of the D flip-flop).
Note: here we are assuming the D flip-flop is ideal, so its setup and hold times are zero.

Setup Time and Hold Time - The Story of the Poor Flip-Flop

It is always interesting to talk about setup and hold!! Don't think that if anybody asks
questions related to setup time and hold time, he or she doesn't know about setup and
hold. He or she may know everything about setup time and hold time, and at times it still
confuses. The terms "setup" and "hold" are the kind of words in this VLSI/ASIC design
world which only create continuous questions and are hard to explain in words,
at least as far as I myself am concerned! I remember, during my MTech days, my professor
used to say "the whole VLSI world is standing on two pillars, setup time and hold
time". It would be more realistic to say that he used to scold us with it!! The question of why
there is setup and hold in a flip-flop always lingered in my mind. Being a digital design
engineer, I should be able to go beneath the transistor and convince myself of the existence of
setup delay and hold delay. I know about the metastable state of the flip-flop, and about the
charging and discharging of capacitors in CMOS, upon which all the gates and flip-flops are
built. When I say "I know metastability", I may only know its standard definition as per the
data book. If I venture into answering "why metastability", I believe I must be
able to understand setup time and hold time.
Let me try to dig into it myself. What do I know? A flip-flop is a combination of two latches,
and a latch is level triggered. One is positive level triggered and the other is negative level
triggered. If so, whatever data is sent to the two latches will be launched or captured on
different edges. Then why metastability? Why setup time? Why hold time?
So how do two level-triggered latches form an edge-triggered flop? Let me get into the
latch. After all, how does it work? Say one input is given... then when can I expect the
output data? Is it immediate? Or does it take some time?
If I remember the working of a simple SR latch from theory classes and textbooks, I
know that a latch output doesn't stabilize immediately. The output changes to
intermediate values of 0 (or 1), then 1 (or 0), and then finally settles at 0 (or 1). It
takes 2 or 3 loops of data between the NOR (or NAND) gates.
So it takes 2-3 internal data cycles... right... This must happen for both latches of the
flop. Hence this must take some time, maybe nanoseconds or picoseconds, but it
consumes some time!
Now, from the working principle of the master-slave flip-flop, I know that both latches
don't work together, because the flop circuit is arranged such that the slave
follows the master. That is to say, when the master latches the data, the slave sleeps; then the
slave follows the master. Or, in other words, the slave releases the data which was latched
earlier by the master. As I understood earlier, to latch the data the master takes 2-3
cycles. The same is the story for the slave.
Now let me extend my imagination to the next horizon.
For a flop which is designed as edge triggered with basic gates themselves, maybe
NOR or NAND based, or maybe based on a CMOS full-custom circuit, the same 2-3
cycle delay applies as well. All that happens is those 2-3
cycles to stabilize the data which is coming in and going out!
I should analyze the practical conditions of latching the data. Suppose one internal
data cycle is completed in the logic gates, but the data is not yet stabilized within the latch.
If I allow one more input to enter at the same time, what will happen to the data which
was under process? Naturally, the latch may start processing the new input data or may go into
an unknown loop state, which I think I call the metastable state! The poor latch must be
completely confused: should it drop the catching of the present data, or should it try to
catch the new one? I am the boss, and hence I, as the designer of the latch, have instructed the
latch to do both: to process the present data (so that it can catch it and memorize it), and then
look for the new one. As a duty-bound soldier, the latch will try to do both.
The same applies to data that was already latched but is about to leave the latch.
These two timing requirements ultimately constitute setup and hold: hold time
is the time required for data to come out, while setup is for the data to get latched. Hence, I
believe, hold is always related to the launch clock, whereas setup is related to the capture
clock.
So, what I can understand is that I don't need a reference for hold, since it is
already in the flop. That is why, for hold analysis, the clock period is always considered as
0 ns, which virtually turns out to be no clock (or, "hold is not dependent on the clock").
This is not always true: there are exceptional cases where data is not launched at 0 ns
with respect to the capture clock. These kinds of situations should be dealt with separately.
I must always remember that a flop has a latch structure, which means that when one
latch works, the other doesn't do any work. So if I consider a register-to-register path,
when one is launching data, the next one is ready to receive data. That's all! It continues
that way throughout the digital circuit. When the first one is receiving, the next flop is
ready to launch... and so on.
To summarize, it takes one clock cycle to complete a launch or capture.
That is why we always use terms such as present data and previous data when dealing
with data flow through a flip-flop, so that I can understand the delay introduced by the
flop (due to its latch architecture), which I technically term the setup time and hold
time.

As per the definition, data should be stable at the input before the clock pulse ticks at
the clock pin of the flip-flop. I understand from the definition that the data at the input
should have completed the process of 2-3 cycles of interchanging values at the receiving
gate section of the latch to settle down to a known value. If, by any means, the clock is
faster (or the data is slower in its arrival at the input), then the clock can tick at a time when
the data might have completed only 1 or 2 cycles of its interchanging state. Then I am sure any
one of these intermediate values can get latched, which may not be the actual intended original
input data.

For hold, the definition is the time for which data should be stable after the clock edge.
Once the clock edge ticks, the data present within the latch tries to go out. I know this takes
another 2-3 cycles of intermediate values within the latch before it settles to a known value at
the output pin of the flip-flop. Imagine that the output pin is connected to the input of another
flip-flop and there is no combinational circuit in between them; let's assume that the delay
is zero or very small. In this case an intermediate value can immediately be reflected at the
input of the receiving flip-flop, which is a functionally fatal error. Introduce a delay element
which is more than the 2-3 cycle delay time (i.e. the hold time), and the delay element provides
sufficient time for the data to settle to a known value. Looking at these aspects, the
minimum period of the clock can't be less than the sum of the setup time and hold
time; if the clock period becomes less than this, I am sure the flip-flop will fail.
But I should be careful to understand that every capture flop becomes a
launch flop for new data to be launched. So we need to make sure that the combinational
delay is enough so that the newly launched data doesn't kill the data which is already
available within the flop. And hence the hold check is carried out for the clock edge which is
one earlier than (or previous to) the setup check edge. Or, in other words, setup is checked for
the present data which is travelling, and hold for the new (future) data. The present data
should reach the capture flop input before the capture clock reaches there (setup check). The
new data shouldn't reach the capture flop too fast, so that the present data doesn't get corrupted.

Well... after all this literary exercise, I must agree that I don't need all these
jargon terms to implement a practical design. What I need is a basic understanding of setup
time and hold time and how they affect or control the timing of a timing path. It would
be nice if I could fix setup and hold violations by adjusting the rest of the parameters, such
as skew, latency and jitter.

There can be only two conditions.
Case 1: Tpd DIN > Tpd Clk
To capture the data at the same time the clock signal (positive clock edge) reaches
pin C, you have to apply the input data at pin DIN at least Ts(in) = (Tpd DIN) - (Tpd Clk)
before the positive clock edge at pin CLK.
In other words, at the DIN pin the data should be stable for a time Ts(in) before the positive
clock edge at the CLK pin.
This time Ts(in) is known as the setup time of the system.
Case 2: Tpd DIN < Tpd Clk
To capture the data at the same time the clock signal (positive clock edge) reaches
pin C, the input data at pin DIN should not change for Th(in) = (Tpd Clk) - (Tpd DIN)
after the clock edge. If it changes, the positive clock edge at pin C will capture the next data.
In other words, at the DIN pin the data should be stable for a time Th(in) after the positive
clock edge at the CLK pin.
This time Th(in) is known as the hold time of the system.
From the above it looks like both conditions can't exist at the same time, and you are
right. But we have to consider a few more things.
Worst case and best case (max delay and min delay):
Because of environmental conditions, i.e. because of PVT, we have to do this analysis
for the worst case (max delay) and the best case (min delay).
Shortest path and longest path (min delay and max delay):
If the combinational logic has multiple paths, then we have to do this analysis for the
shortest path (min delay) and the longest path (max delay) as well.
So we can say that the above conditions become:
Tpd DIN (max) > Tpd Clk (min)
Setup time = Tpd DIN (max) - Tpd Clk (min)
Tpd DIN (min) < Tpd Clk (max)
Hold time = Tpd Clk (max) - Tpd DIN (min)
For example, if the combinational logic delays are
Data path (max, min) = (5 ns, 4 ns)
Clock path (max, min) = (4.5 ns, 4.1 ns)
then Setup time = 5 - 4.1 = 0.9 ns
and Hold time = 4.5 - 4 = 0.5 ns.
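
The arithmetic of the example above, restated as a tiny Python sketch with the same numbers from the text:

# Worked example from the text: setup/hold time of the system (values in ns).

tpd_din_max, tpd_din_min = 5.0, 4.0    # data path delay (max, min)
tpd_clk_max, tpd_clk_min = 4.5, 4.1    # clock path delay (max, min)

setup_time = tpd_din_max - tpd_clk_min  # data must be stable this long before CLK
hold_time  = tpd_clk_max - tpd_din_min  # data must be stable this long after CLK

print(f"setup time = {setup_time:.1f} ns")   # 0.9 ns
print(f"hold  time = {hold_time:.1f} ns")    # 0.5 ns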
A similar type of explanation can be given for a D flip-flop. There is
combinational logic between C and Q and between D and Q of the flip-flop. There are
different delays in that combinational logic, and based on their max and min values,
a flip-flop has setup and hold times. One circuit of a positive-edge-triggered D flip-flop
is shown below.

There are different ways of making a D flip-flop, e.g. from a JK flip-flop, as a master-slave
flip-flop, or using two D-type latches. Since the internal circuitry is different for each
type of flip-flop, the setup and hold times are different for every flip-flop.
Setup Time:
Setup time is the minimum amount of time the data signal should be held
steady before the clock event so that the data are reliably sampled by the clock. This
applies to synchronous circuits such as the flip-flop. Or, in short, I can say that it is the
amount of time the synchronous input (D) must be stable before the active edge of the
clock.
The time for which the input data must be available and stable before the clock pulse is
applied is called the setup time.
Hold time:
Hold time is the minimum amount of time the data signal should be held
steady after the clock event so that the data are reliably sampled. This applies to
synchronous circuits such as the flip-flop. Or, in short, I can say that it is the amount of
time the synchronous input (D) must be stable after the active edge of the clock.
The time after the clock pulse for which the data input is held stable is called the hold time.

Setup and Hold Violation:


In simple language: if the setup time is Ts for a flip-flop and the data is not stable for
Ts before the active edge of the clock, there is a setup violation at that flip-flop. So if the
data is changing in the non-shaded area (in the above figure) before the active clock
edge, then it is a setup violation.
And if the hold time is Th for a flip-flop and the data is not stable for Th after the active
edge of the clock, there is a hold violation at that flip-flop. So if the data is changing in
the non-shaded area (in the above figure) after the active clock edge, then it is a hold
violation.
Calculation Of Set Up And Hold Time Violations
Till now we have discussed setup and hold violations with respect to a single flip-flop; now let's extend this to two flip-flops. In the following fig there are 2 flip-flops (FF1 and FF2).
A few important things to note here:
Data is launched from FF1/D to FF1/Q at the positive clock edge at FF1/C. At FF2/D, the input data comes from FF1/Q through combinational logic. Data is captured at FF2/D at the positive clock edge at FF2/C.
So I can say that the Launching flip-flop is FF1 and the Capturing flip-flop is FF2.
So the Data path is FF1/C --> FF1/Q --> FF2/D.
For a single-cycle circuit, the signal has to propagate through the data path in one clock cycle. That means if data is launched at time=0ns from FF1, then it should be captured at time=10ns by FF2.
So for Setup analysis at FF2, Data should be stable "Ts" time before the positive edge
at FF2/C. Where "Ts" is the Setup time of FF2.
If Ts=0ns, then data launched from FF1 at time=0ns should arrive at D of FF2 before or at time=10ns. If the data takes too long (greater than 10ns) to arrive (meaning it is not stable before the clock edge at FF2), it is reported as a Setup violation.
If Ts=1ns, then data launched from FF1 at time=0ns should arrive at D of FF2 before or at time=(10ns-1ns)=9ns. If the data takes too long (greater than 9ns) to arrive (meaning it is not stable 1ns before the clock edge at FF2), it is reported as a Setup violation.
For Hold analysis at FF2, Data should be stable for "Th" time after the positive edge at FF2/C, where "Th" is the Hold time of FF2. This means there should not be any change in the input data at FF2/D between the positive clock edge at FF2 at Time=10ns and Time=10ns+Th. To satisfy the Hold condition at FF2 for the data launched by FF1 at 0ns, the data launched by FF1 at 10ns should not reach FF2/D before time 10ns+Th.
If Th=0.5ns, then the data launched from FF1 at time 10ns must not propagate so fast that it reaches FF2 before time (10+0.5)=10.5ns (or say, it should not travel from FF1 to FF2 within 0.5ns). If the data arrives that soon (i.e. within 0.5ns from FF1 to FF2, so the data cannot stay stable at FF2 for 0.5ns after the clock edge at FF2), it is reported as a Hold violation.
With the above explanation I can say 2 important points:
1. Setup is checked at next clock edge.
2. Hold is checked at same clock edge.
Setup Check timing can be more clear for the above Flip-flop combination
with the help of following explanation.
In the above fig you can see that the data launched by FF1/D (at the launch edge) reaches FF2/D after a specific delay (CLK-to-Q delay + combinational logic delay), well before the setup time requirement of flip-flop FF2, so there is no setup violation.
From the fig it is clear that if Slack = Required Time - Arrival Time < 0 (negative), then there is a Setup violation at FF2.
Hold Check timing can be more clear with the help of following circuit and
explanation.
In the above fig you can see that there is a delay between CLK and CLKB because of the delay introduced by the series of buffers in the clock path. Now flip-flop FF2 has a hold requirement, and as per that, the data should stay constant after the capture edge of CLKB at flip-flop FF2.
You can see that the desired data, which is supposed to be captured by CLKB at FF2/D, should be at the logic Zero (0) state and stay constant long enough after the CLKB capture edge to meet the hold requirement. But because of the very short logic delay between FF1/Q and FF2/D, the change at FF1/Q propagates very quickly, and as a result a Hold violation occurs. This type of violation (Hold violation) can be fixed by shortening the delay in the clock line or by increasing the delay in the data path. Setup and Hold violation calculation for a single clock cycle path is very easy to understand, but the complexity increases in the case of multi-cycle paths, gated clocks, flip-flops using different clocks, and latches in place of flip-flops.
Examples Of Setup and Hold time
Till now we have discussed a lot of theory about setup and hold time (with and without examples).
Now it's time to discuss the practical side: in a circuit, how will you calculate the setup and hold values?
How will you analyze setup and hold violations in a circuit?
If you have to improve the timing of a circuit, what can you do?
First we will solve a few examples which will give you a basic idea about these formulas, then at the end I will summarize all of them in one place.
I saw a lot of confusion with respect to setup and hold timing calculation.
Actually there are two things.
Timing Specification of a Block/Circuit/Library:
You have a block with input A and output Y. Some combinational logic is
there between A and Y. Now you have to calculate the following parameters for that block:
 Setup Time Value at input A
 Hold Time value at input A.
 Maximum operating Clock Frequency or Time Period for that block.
 Clock To Y delay value
 Input A to Output Y delay value.
Timing Violation of a circuit:
You have to operate a circuit at a particular clock frequency and now you have
to find out whether this circuit has any setup or Hold Violation.
So in the second case all the parameters are given, and you have to find out whether the circuit has any violation or not.
In the first case you have to find out all the parameters, keeping in mind that there should not be any violation.
Let's discuss these in reverse order.
Problem1: In the following Circuit, Find out whether there is any Setup Or Hold
Violation?
Solution:
Hold Analysis:
When a hold check is performed, we have to consider two things-
 Minimum Delay along the data path.
 Maximum Delay along the clock path.
If the difference between the data path and the clock path is negative, then a timing violation has occurred. (Note: there are a few exceptions to this, which we will discuss some other time.)
Data path is: CLK -> FF1/CLK -> FF1/Q -> Inverter -> FF2/D
Delay in Data path = min(wire delay to the clock input of FF1) + min(Clk-to-Q delay of FF1) + min(cell delay of inverter) + min(2 wire delays: "Q of FF1 to inverter" and "inverter to D of FF2")
Td = 1+9+6+(1+1) = 18ns.
Clock path is: CLK-> buffer -> FF2/CLK
Clock path Delay = max(wire delay from CLK to Buffer input) + max(cell delay of
Buffer) + max(wire delay from Buffer output to FF2/CLK pin) + (hold time of FF2)
Tclk = 3+9+3+2 = 17 ns.
Hold Slack = Td - Tclk = 18ns -17ns = 1ns
Since Hold Slack is positive-> No hold Violation.
Note: If the hold time had been 4 ns instead of 2 ns, then there would have been a
hold violation.
Td=18ns and Tclk = 3+9+3+4=19ns
So Hold Slack=Td - Tclk = 18ns - 19ns = -1ns (Violation).
Setup Analysis:
When a setup check is performed, we have to consider two things-
 Maximum Delay along the data path.
 Minimum Delay along the clock path.
If the difference between the clock path and the data path is negative, then a timing
violation has occurred. (Note: there are a few exceptions to this, which we will discuss some other time.)
Data path is: CLK->FF1/CLK ->FF1/Q ->Inverter ->FF2/D
Delay in Data path = max(wire delay to the clock input of FF1) + max(Clk-to-Q delay of FF1) + max(cell delay of inverter) + max(2 wire delays: "Q of FF1 to inverter" and "inverter to D of FF2")
Td = 2+11+9+(2+2) = 26ns
Note: The first part of the clock path delay (during setup calculation) is the clock period, which has been set to 15 ns. Hope you remember, I mentioned earlier very clearly that Setup is checked at the next clock edge. That's the reason we have to include the clock period in the clock path delay.
Clock path is: CLK-> buffer -> FF2/CLK
Clock path Delay = (Clock period) + min(wire delay from CLK to Buffer input) +
min(cell delay of Buffer) + min(wire delay from Buffer
output to FF2/CLK pin) - (Setup time of FF2)
=Tclk = 15+2+5+2-4=20ns
Setup Slack = Tclk - Td = 20ns - 26ns = -6ns.
Since Setup Slack is negative -> Setup violation.
Note: A bigger clock period or a smaller maximum delay of the inverter would solve this setup violation in the circuit.
E.g. if the clock period is 22ns, then
Tclk = 22+2+5+2-4 = 27ns AND Td = 26ns
Setup Slack = Tclk - Td = 27-26 = 1ns (No Violation).
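Just to make the recipe concrete, here is a tiny Python sketch of the same two checks. It simply re-encodes the delay numbers given in Problem 1 (nothing here comes from a real tool, it is only the hand calculation above):

# Hold check: min delay along the data path vs. max delay along the clock path
data_min = 1 + 9 + 6 + (1 + 1)        # wire + Clk-to-Q + inverter + wire delays = 18 ns
clk_max = 3 + 9 + 3 + 2               # wire + buffer + wire + hold time of FF2 = 17 ns
print("hold slack  =", data_min - clk_max)     # 1 ns  -> no hold violation

# Setup check: max delay along the data path vs. min delay along the clock path
data_max = 2 + 11 + 9 + (2 + 2)       # = 26 ns
clk_min = 15 + 2 + 5 + 2 - 4          # clock period + wire + buffer + wire - setup time of FF2 = 20 ns
print("setup slack =", clk_min - data_max)     # -6 ns -> setup violation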
Problem2: In order to work correctly, what should be the Setup and Hold time at
Input A in the following Circuit. Also find out the maximum operating frequency for
this circuit. (Note: Ignore Wire delay). Where Tsu- Setup time; Thd-Hold Time;
Tc2q- Clock-to-Q delay.
Solution:
Step1: Find out the maximum register-to-register delay.
Max Register to Register Delay
= (Clk-to-Q delay of U2) + (cell delay of U3) + (all wire delays, ignored here) + (setup time of U1)
= 5 + 8 + 0 + 3 = 16 ns.
There are 2 register-to-register paths:
U2 -> U3 -> U1 (Delay = 5+8+3 = 16ns)
U1 -> U4 -> U2 (Delay = 5+7+3 = 15ns)
We have to pick the maximum one.
Step2: Find Out Setup Time:
A setup time = Setup time of Flipflop + Max (Data path Delay) - min(Clock path
Delay)
= (Setup time of Flipflop + A2D max delay) - (Clk path min delay)
= Tsu + (Tpd U7 + Tpd U3 + wire delay) - Tpd U8
= 3 + (1+8 ) - 2 = 10 ns.
Note:
Here we are not using the clock period, because we are not calculating a Setup violation; we are calculating the Setup time. Please refer to part 3a for reference.
All the wire delays are neglected. If wire delays are present, we have to consider them as well.
There are 2 Data path
A -> U7 -> U4 -> D of U2 (Data path Delay = 1+7 =8ns )
A -> U7 -> U3 -> D of U1 ( Data path Delay = 1+8 =9ns )
Since for the Setup calculation we need the maximum Data path delay, we have chosen the 2nd one for our calculation.
Step3: Find Out Hold Time:
A hold time = Hold time of Flipflop + max(Clock path Delay) - min(Data path Delay)
= (Hold time of Flipflop + Clk path max delay) - (A2D min delay)
= Thd + Tpd U8 - (Tpd U7 + Tpd U4 + wire delay)
= 4 + 2 - (1+7) = -2 ns
Note: Same explanation as for the Setup time. For the hold time we need the minimum data path delay, so we have picked the first Data path.
Step4: Find out Clock to Out Time:
Clock to Out = Cell delay of U8 + Clk-to-Q delay of FlipFlop+ Cell delay of U5+
Cell delay of U6+ (all wire delay)
= Tpd U8+ U2 Tc2q + U5 Tpd + U6 Tpd
= 2 + 5 + 9 + 6 = 22 ns.
Note:
There are 2 Clock to Out paths: one from flip-flop U1 and the other from U2.
Since in this case the Clk-to-Q path delay for both flip-flops is the same, we can consider either path. But in some other circuit, where the delay is different for the two paths, we should consider the max delay path.
Step5: Find Pin to Pin Combinational Delay (A to Y delay)
Pin to Pin Combinational Delay (A to Y)
= U7 Tpd + U5 Tpd + U6 Tpd
= 1 + 9 + 6 = 16 ns
Step6: Find Out Max Clock Frequency:
Max Clock Freq = 1/ Max (Reg2reg, Clk2Out, Pin2Pin)
= 1/ Max (16, 22, 16) = 1/22ns
≈ 45.5 MHz
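If you prefer, the whole of Problem 2's frequency check can be cross-checked with a few lines of Python. The numbers below are simply the gate delays given in the problem; the variable names reg2reg, clk2out and pin2pin are mine, for illustration only:

# Gate delays from Problem 2 (ns): Tc2q = 5, Tsu = 3, U3 = 8, U4 = 7, U5 = 9, U6 = 6, U7 = 1, U8 = 2
reg2reg = max(5 + 8 + 3,        # U2 -> U3 -> U1
              5 + 7 + 3)        # U1 -> U4 -> U2           = 16 ns
clk2out = 2 + 5 + 9 + 6         # U8 + Tc2q + U5 + U6      = 22 ns
pin2pin = 1 + 9 + 6             # U7 + U5 + U6             = 16 ns

t_limit_ns = max(reg2reg, clk2out, pin2pin)
print("Fmax = %.1f MHz" % (1e3 / t_limit_ns))   # 1000/22 ~ 45.5 MHz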
Problem3: In the above Circuit, Try to improve the timing by adding any "buffer" or
"Register".
Now follow all those steps one by one.
Step1: Max Register to Register Delay
U2 Tc2q + U5 Tpd + U9 Tsu = 5 + 9 + 3 = 17 ns
Note: there are a lot of register-to-register paths:
U8 -> U5 -> U9 (Delay = 5+9+3=17ns)
U8 -> U4 -> U2 (Delay = 5+7+3=15ns)
U8 -> U3 -> U1 (Delay = 5+8+3=16ns)
U1 -> U4 -> U2 (Delay= 5+7+3=15ns)
U1 -> U5 -> U9 (Delay= 5+9+3=17ns)
U2 -> U5 -> U9 (Delay = 5+9+3=17ns)
U2 -> U3 -> U1 (Delay = 5+8+3=16ns)
The maximum delay is 17ns; just pick any one of them.
Step2:
A setup time = Tsu + A2D Tpd max - Clk Tpd min
= Tsu + (Tpd U7) - Tpd U8
= 3 + (1) - 2 = 2 ns
Note: Only One path between A and D of FF(i.e U8)
Step3:
A hold time = Thd + Clk Tpd max - A2D Tpd min
= Thd + Tpd U8 - (Tpd U7)
= 4 + 2 - ( 1) = 5 ns
Note: Only One path between A and D of FF(i.e U8)
Step4: Clock to out:
=Tpd U8+ U9 Tc2q + U6 Tpd
=2+5+6 = 13 ns
Step5: No direct link between A and Y. So Not Applicable.
Step6: Max Clock Freq= 1/ Max (Reg2reg, Clk2Out, Pin2Pin)
= 1/ Max (17, 13)
= 1/17ns ≈ 58.8 MHz
*********************************************************************
*********************************************************************
I hope this much will help you. Now it's time to summarize all the important things and formulas.
Points to remember:
1. Setup is checked at the next clock edge.
2. Hold is checked at the same clock edge.
3. For a Hold check (checking for hold violations):
Minimum delay along the data path.
Maximum delay along the clock path.
4. For a Setup check (checking for setup violations):
Maximum delay along the data path.
Minimum delay along the clock path.
Calculation of Setup Violation Check: Consider the above circuit of 2 FFs connected to each other.
 Setup Slack = Required time - Arrival time (since we want data to arrive before it is required)
Where:
 Arrival time (max) = clock delay FF1 (max) + clock-to-Q delay FF1 (max) + comb. delay (max)
 Required time = clock adjust + clock delay FF2 (min) - setup time FF2
Clock adjust = clock period (since setup is analyzed at the next edge)
Calculation of Hold Violation Check: Consider the above circuit of 2 FFs connected to each other.
 Hold Slack = Arrival time - Required time (since we want data to arrive after it is required)
Where:
 Arrival time (min) = clock delay FF1 (min) + clock-to-Q delay FF1 (min) + comb. delay (min)
 Required time = clock adjust + clock delay FF2 (max) + hold time FF2
 Clock adjust = 0 (since hold is analyzed at the same edge)
Calculation of Maximum Clock Frequency:
Max Clock Freq = 1/ Max (Reg2reg delay, Clk2Out delay, Pin2Pin delay)
Where:
Reg2Reg Delay=Clk-to-Q delay of first FF(max)+comb delay(max)+setup time of
2nd FF.
Clk2Out Delay = Clock delay w.r.t FF (max) + clock-to-Q delay of FF1 (max) +
comb. delay (max)
Pin2Pin delay = Comb delay between input pin to output pin (max)
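These summary formulas map directly onto two small helper functions. The sketch below is only an illustration of the bookkeeping (the function and argument names are mine, not from any STA tool); all delays are assumed to be in the same unit:

def setup_slack(period, clk1_max, c2q_max, comb_max, clk2_min, tsu):
    arrival = clk1_max + c2q_max + comb_max       # latest data arrival at FF2/D
    required = period + clk2_min - tsu            # capture edge is one period later
    return required - arrival                     # negative => setup violation

def hold_slack(clk1_min, c2q_min, comb_min, clk2_max, thd):
    arrival = clk1_min + c2q_min + comb_min       # earliest data arrival at FF2/D
    required = clk2_max + thd                     # same edge, so clock adjust = 0
    return arrival - required                     # negative => hold violation

# Example: 10 ns period, launch clock delay 1.0 (max) / 0.9 (min),
# capture clock delay 1.1 (max) / 1.0 (min), Tc2q 2.0 (max) / 1.5 (min),
# comb. delay 5.0 (max) / 3.0 (min), Tsu = Thd = 0.5
print(setup_slack(10, 1.0, 2.0, 5.0, 1.0, 0.5))   # (10 + 1.0 - 0.5) - (1.0 + 2.0 + 5.0) = 2.5
print(hold_slack(0.9, 1.5, 3.0, 1.1, 0.5))        # (0.9 + 1.5 + 3.0) - (1.1 + 0.5) = 3.8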
6. Timing Path Delay
I have a doubt regarding how delay is calculated along a path.
I think there are two ways:
1) To calculate the max delay and min delay, we keep adding the max delays and min delays of all cells (buffer/inverter/mux) from the start point to the end point respectively.
2) The other way, we calculate the path delay for the rising edge and the falling edge separately: we apply a rising edge at the start point and keep adding cell delays. Cell delay depends upon the input transition and the output fanout, so we end up with two path delay values, one for the rising edge and one for the falling edge. The greater one is considered the max delay and the smaller one the min delay.
Which one is correct?
The short answer is: both are correct and you have to use both. Maybe that sounds confusing, so let me give you a few details.
As I have mentioned, for the Setup and Hold calculation you have to calculate the delay of the timing path (capture path or launch path).
Now in a circuit there are 2 major types of delay.
1. CELL DELAY
Timing delay between an input pin and an output pin of a cell.
Cell delay information is contained in the library of the cell, e.g. the .lib (Liberty) file.
2. NET DELAY
Interconnect delay between a driver pin and a load pin.
To calculate the net delay you generally require 3 important pieces of information:
Characteristics of the driver cell (which is driving the particular net)
Load characteristic of the receiver cell (which is driven by the net)
RC (resistance, capacitance) value of the net (this depends on several factors, which we will discuss later)
Both delays can be calculated in multiple ways. It depends at what stage of the design you require this information, e.g. during pre-layout, post-layout or signoff timing. As per the stage you are at, you can use different ways to calculate these delays. Sometimes you require accurate numbers and sometimes approximate numbers are also sufficient.
Now let's discuss this with the previous background, and then we will discuss a few new concepts.
Now in the above fig, if I ask you to calculate the delay of the circuit, the delay will be:
Delay = 0.5+0.04+0.62+0.21+0.83+0.15+1.01+0.12+0.57 = 4.05ns (if all the delays are in ns)
Now let's add a few more values to this. As we know, every gate and net has a max and a min value, so we can find out the max delay and the min delay (on what basis we calculate these max and min delays, we will discuss after that).
Delay(max)= 0.5+0.04+0.62+0.21+0.83+0.15+1.01+0.12+0.57=4.05ns
Delay(min)= 0.4+0.03+0.6+0.18+0.8+0.1+0.8+0.1+0.5=3.51ns
Till now everyone knows the concept. Now let's see what the meaning of min and max delay is. The delay of a cell or net depends on various parameters. A few of them are listed below.
 Library setup time
 Library delay model
 External delay
 Cell load characteristic
 Cell drive characteristic
 Operating condition (PVT)
 Wire load model
 Effective Cell output load
 Input skew
 Back annotated Delay
If any of these parameters varies, the delay varies accordingly. A few of them are mutually exclusive, and in that case we have to consider the effect of only one parameter at a time. If that's the case, then for STA we calculate the delay in both conditions and then categorize them into the worst (max delay) condition or the best (min delay) condition.
E.g. if a cell has different delays for the rise edge and the fall edge, then we are sure that in the delay calculation we have to use only one value. So as per their values, we can categorize the fall and rise delays of all the cells into the max and min buckets, and finally we come up with a max delay and a min delay.
The way delay is calculated also depends on which tool you are using for STA or delay calculation. Cadence may have a different algorithm from Synopsys, and the same is the case for other vendor tools like Mentor, Magma and so on. But in general the basic concepts always remain the same. I will explain all these parameters in detail later, but right now here is just one example which can help you understand the situation when you have a lot of information about the circuit and you want to calculate the delay.
In the above diagram, you have 2 paths between UFF1 and UFF3. So whenever you are doing setup and hold analysis, these paths will be part of the launch path (arrival time). So let's assume you want to calculate the max and min value of the delay between UFF1 and UFF3.
Delay - Wire Load Model
WIRE LOAD MODEL
What is a Wire Load Model (WLM)?
Wire load models are used to estimate the interconnect wire delay during pre-layout in a design cycle. Wire load information is based on statistics from physical layout parasitics. Information from the statistics is used in both conservative and aggressive tables. The conservative tables are based on the "mean value" plus 3-sigma; the aggressive tables on the "mean value" plus 1-sigma.
WLMs are different for different technologies.
Wire load models are approximated from one technology to another based on scaling factors. Due to these approximations, the accuracy of these models diminishes over multiple technology nodes.
A WLM describes the effect of wire length and fanout on:
 Resistance
 Capacitance
 Area of the nets.
 All attributes (R, C and Area) are given per unit length wire.
 Slope value is used to characterize linear fanout.
 Basically a set of tables
 Net fanout vs load
 Net fanout vs resistance
 Net fanout vs area
One example of such a table is:
In the above circuit the RC value is estimated and represented as per the WLM.
The following are a few snapshots of different wire load model formats.
wire_load("WLM1") {
resistance : 0.0006 ;------>R per unit length
capacitance : 0.0001 ;------> C per unit length
area : 0.1 ;------> Area per unit length
slope : 1.5 ;------> Used for linear extrapolation
fanout_length(1, 0.002) ; ------> at fanout “1” length of the wire is 0.002
fanout_length(2, 0.006);
fanout_length(3, 0.009);
fanout_length(4, 0.015);
fanout_length(5, 0.020);
fanout_length(7, 0.028); ------> at fanout “7” length of the wire is 0.028
fanout_length(8, 0.030);
fanout_length(9, 0.035);
fanout_length(10, 0.040);
}
wire_load("WLM2") {
fanout_length( 1, 1 );
fanout_length( 2, 2 );
fanout_capacitance( 1, 0.002 );
fanout_capacitance( 2, 0.004 );
fanout_capacitance( 3, 0.006 );
fanout_capacitance( 4, 0.008 );
fanout_capacitance( 5, 0.010 );
fanout_capacitance( 6, 0.013 );
fanout_capacitance( 7, 0.015 );
fanout_capacitance( 8, 0.019 );
fanout_capacitance( 9, 0.023 );
fanout_capacitance( 10, 0.027);
fanout_resistance( 1, 0.01 );
fanout_resistance( 2, 0.015 );
fanout_resistance( 3, 0.022 );
fanout_resistance( 4, 0.026 );
fanout_resistance( 5, 0.030 );
fanout_resistance( 6, 0.035 );
fanout_resistance( 7, 0.039 );
fanout_resistance( 8, 0.048 );
fanout_resistance( 9, 0.057 );
fanout_resistance( 10, 0.06 );
fanout_area( 1, 0.11 );
fanout_area( 20, 2.20 );
}
Here:
Area, Resistance and Capacitance are given per unit length of the interconnect.
The slope is the extrapolation slope to be used for data points that are not specified in the fanout-length table.
In general, not all fanouts are mentioned in a given WLM lookup table. For example, in the WLM1 and WLM2 lookup tables above, values are given for fanouts 1, 2, 3, 4, 5, 7, 8, 9, 10. If we want to estimate the values at fanouts in the gaps (e.g. fanout 6) or outside the fanout range specified in the table (e.g. fanout 20), we have to calculate those values using (linear) interpolation and extrapolation.
For WLM1
For Fanout = 20
Since it is more than the maximum fanout available in the table (i.e. 10), we have to perform extrapolation.
Net length = <length of net at fanout 10> + (20-10) x Slope
Resistance = <newly calculated net length at fanout 20> x Resistance value per unit length
Capacitance = <newly calculated net length at fanout 20> x Capacitance value per unit length
Net length = 0.040 + 10 x 1.5 (slope) = 15.04 ----------> length of net with fanout of 20
Resistance = 15.04 x 0.0006 = 0.009024 units
Capacitance = 15.04 x 0.0001 = 0.001504 units
For Fanout = 6
Since it is between 5 and 7, and the corresponding fanout vs. length values are available, we can do the interpolation.
Net length = ( (net length at fanout 5) + (net length at fanout 7) ) / 2
Resistance = <newly calculated net length at fanout 6> x Resistance value per unit length
Capacitance = <newly calculated net length at fanout 6> x Capacitance value per unit length
Net length = (0.020 + 0.028)/2 = 0.048/2 = 0.024 ----------> length of net with fanout of 6
Resistance = 0.024 x 0.0006 = 0.0000144 units
Capacitance = 0.024 x 0.0001 = 0.0000024 units
In a similar way we can calculate the WLM values for any fanout.
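That interpolation/extrapolation is simple enough to script. Below is a minimal Python sketch using the WLM1 numbers above (the function name and table layout are mine; a real tool reads this information directly from the .lib):

# fanout -> net length table from WLM1, plus the slope and per-unit-length R and C
table = {1: 0.002, 2: 0.006, 3: 0.009, 4: 0.015, 5: 0.020,
         7: 0.028, 8: 0.030, 9: 0.035, 10: 0.040}
slope, r_per_len, c_per_len = 1.5, 0.0006, 0.0001

def net_length(fanout):
    if fanout in table:
        return table[fanout]
    known = sorted(table)
    if fanout > known[-1]:                        # beyond the table: extrapolate with the slope
        return table[known[-1]] + (fanout - known[-1]) * slope
    lo = max(f for f in known if f < fanout)      # inside a gap: linear interpolation
    hi = min(f for f in known if f > fanout)
    return table[lo] + (table[hi] - table[lo]) * (fanout - lo) / (hi - lo)

for f in (6, 20):
    length = net_length(f)
    print(f, length, length * r_per_len, length * c_per_len)   # length, R estimate, C estimate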
WLMs are often used in pre-placement optimization to drive speedups of critical
paths. Since timing driven placement plausibly makes nets on critical paths shorter
than average, some optimism may be incorporated into the WLM. Thus, a WLM may
actually consist of more than one lookup table, with each table corresponding to a
different optimism level. There are several ways to incorporate the optimism level. If
we use the WLMs that come from the (ASIC vendor’s) design library, usually there
are several tables from which we can select. We can also increase the optimism level
of a WLM by multiplying all values in the WLM by some factor less than 1. For example, we can use 0.25, 0.5, or 0.75.
WLM Types
For flows that run timing-based logic optimization before placement,
there are three basic types of WLMs that can be used:
1. Statistical WLMs
Are based on averages over many similar designs using the same or similar
physical libraries.
2. Structural WLMs
Use information about neighboring nets, rather than just fanout and module
size information.
3. Custom WLMs
Are based on the current design after placement and routing, but before the current
iteration of preplacement synthesis.
Now the Question is: Where do the wire load models come from?
Normally the semiconductor vendors will develop the models.
ASIC vendors typically develop wire load models based on statistical information
taken from a variety of example designs. For all the nets with a particular fanout, the
number of nets with a given capacitance is plotted as a histogram. A single
capacitance value is picked to represent this fanout value in the wire load model. If a very conservative wire load model is desired, the 90th percentile might be picked (i.e. 90% of the nets in the sample have a capacitance smaller than that value).
In this example 90% of the nets have a capacitance smaller than 0.198pF. So in the WLM table, you will notice fanout_capacitance( 3, 0.198 ).
Similar statistics are gathered for resistance and net area.
Usually the vendor supplies a family of wire load models, each to be used for a different size of design. This is called area-based wire load selection.
A Few Advanced Concepts:
Till now we have discussed that for a particular net you can estimate the RC value as per the WLM. Let me ask you one question: what if your design is hierarchical? Do you think even in that case you can use the same WLM for a particular net which is crossing the hierarchical boundaries?
The short answer is: you can use it, but you will lose accuracy.
To solve this problem, vendors usually supply multiple WLMs.
There are different modes for WLM analysis; the important ones are:
WLM analysis has three modes:
1. Top:
Treat the design as if it has no hierarchy and use the WLM of the top module to calculate delays for all modules.
Any lower-level WLM is ignored.
2. Enclosed:
Use the WLM of the module which completely encloses the net to compute delay for
that net.
3. Segmented:
If a net crosses several modules with different WLMs, then for each portion of the net use the WLM of the module that encloses that portion.
Delay - Interconnect Delay Models
In the previous section we discussed the way tools calculate the max and min delay in a circuit. Now we will discuss some other basics of delay and delay calculation. During your day-to-day work (in the semiconductor field), or in different books, you come across different terminology related to delays. There is a long list of them:
 Input Delay
 Output Delay
 Cell Delay
 Net Delay
 Wire Delay
 Slope Delay
 Intrinsic Delay
 Transition Delay
 Connect Delay
 Interconnect Delay
 Propagation Delay
 Min/Max Delay
 Rising/Falling Delay
 Gate Delay
 Stage delay
Fortunately, out of the above long list a few are just synonyms of each other and a few are interrelated. For example, Net delay is also known as Wire delay or Interconnect delay. Broadly we can divide this long list into 2 types of delay: Net Delay (Wire delay) and Cell Delay. (Note: Stage Delay = Net Delay + Cell Delay.)
So let's discuss these one by one. In digital design, a wire connecting pins of standard cells and blocks is referred to as a NET. A net:
 Has only one driver
 Has a number of fanout cells or blocks
 Can travel on multiple metal layers of the chip
"Net Delay" refers to the total time needed to charge or discharge all of the parasitics (Capacitance / Resistance / Inductance) of a given net.
So we can say that Net delay is a function of:
 Net Resistance
 Net Capacitance
 Net Topology
Now, to calculate the net delay, the wires are modeled in different ways and there are different ways to do the calculation. Practically, when you apply a particular delay model in a design, you have to apply it to all cells in a particular library; you cannot mix delay models within a single library. There are a few recommendations provided by experts, or say experienced designers, regarding the application of a particular delay model in a design, and that depends on:
The technology of the design.
At what stage you are, or at what stage you want to apply the delay model.
How accurately you want to calculate the delay.
Note: Ideally, until the physical wire is present in your design, you cannot calculate the net delay. The reason is that if the wire is not present, you have no idea about the length/width of the wires, so you cannot calculate accurate values of the parasitics or the delay of the wire. But the key word here is "accurate": an approximate value of the delay is still possible before the physical laying of the wire in a design.
There are several delay models. Those which provide a more accurate result take more runtime to do the calculation, and those which are fast provide a less accurate value of the delay. Let's discuss a few of them.
Most popular delay models are -
 Lumped Capacitor Model
 Lumped RC model
 Distributed RC model
 Pi RC network
 T RC network
 RLC model
 Wire Load model
 Elmore Delay model
 Transmission Line Model
Lumped Capacitor Model:
The model assumes that the wire resistance is negligible. The source driver sees a single loading capacitance, which is the sum of the total capacitance of the interconnect and the total loading capacitance at the sink.
In the past (older technologies, 350nm and above), capacitance was dominant, and that's the reason the model contains only capacitance.
 Older technologies had wide wires.
 A larger cross-section area implies less resistance and more capacitance.
 So the wire is modeled only with capacitance.
In the Fig R=0
Lumped RC (Resistance Capacitance) model:
As the feature size decreases to submicron dimensions, the width of the wire reduces and the resistance of the wire is no longer negligible. We have to incorporate the resistance into our model, and that's the reason the lumped RC model (or RC tree) comes into the picture. In the lumped RC model the total resistance of each wire segment is lumped into one single R, and the global capacitance is combined into a single capacitor C.
Distributed RC model:
Distributed means RC is distributed along the length of the wire. The total resistance
(Rt) and capacitance (Ct) of a wire can be
expressed as
Rt = Rp * L
Ct = Cp * L
Where
Cp and Rp are Capacitance and Resistance per unit length.
L is the length of the wire.
Ideally, distributing the resistance and capacitance of the wire into very small portions (say, delta segments) gives better accuracy. To find the total capacitance and resistance we then use differential equations. The distributed RC model provides better accuracy than the lumped RC model, but this type of model is not practical to evaluate directly.
T model:
 Ct is connected halfway along the resistive tree.
 Rt is broken into 2 sections (each being Rt/2).
Pi Model:
 Ct is broken into 2 sections (each being Ct/2), connected on either side of the resistance.
 Rt is in between the capacitances.
For practical purposes, wire models with 5-10 elements/nodes are used to model the wire; they provide a more accurate result. For an N-element section:
For T network:
 Intermediate section of resistance are equal to Rt/N.
 Intermediate section of Capacitance are modeled by Ct/N
 End section of Resistance are equal to Rt/(2N).
 This T Network is represented as TN model.
For Pi network:
 Intermediate section of resistance are equal to Rt/N.
 Intermediate section of Capacitance are modeled by Ct/N
 End section of Capacitance are equal to Ct/(2N).
 This Pi Network is represented as PiN model
Note: Lumped vs. Distributed RC wire:
The following is a comparison between the lumped and the distributed RC network. It will help you understand the use of both types of network in terms of accuracy and runtime. The following is the step response of a lumped vs. a distributed RC line.
RLC model
In the past, since the design frequency was low, the inductive impedance (ωL) was dominated by the resistance (ωL << R), so we did not care about "L". However, if you operate at a higher frequency and use wider wires that reduce the resistance, then we have to take the inductance into account in our modeling.
Net Delay or Interconnect Delay or Wire Delay or Extrinsic Delay or
Flight Time
Net delay is the difference between the time a signal is first applied to the net and the
time it reaches other devices connected to that net.
It is due to the finite resistance and capacitance of the net. It is also known as wire
delay.
Wire delay = function of (Rnet, Cnet+Cpin)
This is measured from the output pin of the driving cell to the input pin of the next cell.
Net delay is calculated using Rs and Cs.
 There are several factors which affect net parasitics:
 Net length
 Net cross-sectional area
 Resistivity of the material used for the metal layers (Aluminum vs. copper)
 Number of vias traversed by the net
 Proximity to other nets (crosstalk)
The post-layout design is annotated with RCs extracted from the layout for better accuracy. Annotated RCs override the information from the WLM.
Interconnect introduces capacitive, resistive and inductive parasitics. All three have multiple effects on the circuit behavior.
1. Interconnect parasitics cause an increase in propagation delay (i.e. they slow down the working speed).
2. Interconnect parasitics increase energy dissipation and affect the power distribution.
3. Interconnect parasitics introduce extra noise sources, which affect the reliability of the circuit (Signal Integrity effects).
Dominant parameters determine the circuit behavior at a given circuit node. Non-dominant parameters can be neglected for interconnect analysis.
 The inductive effect can be ignored if the resistance of the wire is substantial enough; this is the case for long aluminum wires with a small cross section, or if the rise and fall times of the applied signals are slow.
 When the wires are short, the cross section of the wire is large or the
interconnect material used has a low resistivity, a capacitive only model can
be used.
 When the separation between neighboring wires is large or when the wires
only run together for short distance, inter wire capacitance can be ignored, and
all the parasitic capacitance can be modeled as capacitance to ground.
Capacitance
Capacitance can be modeled by the parallel plate capacitor model.
C = (ε / t) · W · L
where
ε = permittivity of the dielectric material (SiO2)
t = thickness of the dielectric material (SiO2)
W = width of the wire
L = length of the wire
ε = εr · εo, where εr = relative permittivity of SiO2
εo = 8.854 x 10^-12 F/m, the permittivity of free space
As technology node shrinks (scaling), to minimize resistance of the wires, it is
desirable to keep the cross section of the wire (WxH) as large as possible. But this
increases area. Small values of W lead to denser wiring and less area overhead. In
advanced process W/H ratio has reduced below unity. Under such circumstances
parallel plate capacitance model becomes inaccurate. The capacitance between the
sidewall of the wires and substrate called fringing capacitance can no longer be
ignored and contributes to the overall capacitance.
Inter-wire capacitance becomes a dominant factor in multilayer interconnect structures. These floating capacitors (not connected to the substrate or ground) form a source of noise (crosstalk).
This effect is more pronounced for wires in the higher interconnect layers, as these are farther away from the substrate.
Generally the higher metal layers (i.e. interconnects) have a higher thickness (i.e. height), and the higher dielectric layers have higher permittivity. Hence these wires display the highest inter-wire capacitance. Hence use them for global signals that are not sensitive to interference (e.g. supply rails), or it is advisable to separate such wires by an amount that is larger than the minimum spacing.
Resistance
Resistance R = (ρ · L) / (H · W) = (ρ · L) / Area
where
L = length
W = width
H = height (thickness)
ρ = resistivity (ohm·m)
Since H is constant for a given technology, we can write R = Rs · (L/W), where Rs = ρ/H (ohm/square) is called the "sheet resistance".
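Both formulas are easy to sanity-check numerically. Here is a minimal Python sketch assuming SI units; the numeric inputs (εr of SiO2, the resistivity of aluminium, the wire dimensions) are typical textbook placeholder values, not data from this document:

EPS0 = 8.854e-12            # F/m, permittivity of free space
EPS_R_SIO2 = 3.9            # relative permittivity of SiO2 (typical textbook value)

def plate_cap(width, length, t_ox):
    """Parallel-plate wire capacitance C = (eps / t) * W * L (fringing ignored)."""
    return (EPS_R_SIO2 * EPS0 / t_ox) * width * length

def wire_res(rho, length, width, height):
    """R = rho * L / (H * W), i.e. sheet resistance Rs = rho / H times L/W squares."""
    rs = rho / height
    return rs * (length / width)

# e.g. a 1 mm long, 0.5 um wide, 0.5 um thick aluminium wire over 1 um of oxide
print(plate_cap(0.5e-6, 1e-3, 1e-6))             # ~1.7e-14 F (about 17 fF)
print(wire_res(2.7e-8, 1e-3, 0.5e-6, 0.5e-6))    # ~108 ohms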
At very high frequencies “skin effect” comes into play such that the resistance
becomes frequency dependent. High frequency currents tend to flow primarily on the
surface of a conductor, with the current density falling off exponentially with depth
into the conductor.
The skin effect is only an issue for wider wires. Since clocks tend to carry the highest-frequency signals on a chip and are also fairly wide to limit resistance, the skin effect is likely to have its first impact on these lines.
Inductance
With the adoption of low-resistance interconnect materials and the increase of switching frequencies into the GHz range, inductance starts to play an important role. Consequences of on-chip inductance include ringing and overshoot effects, reflection of signals due to impedance mismatch, inductive coupling between lines, and switching noise due to (L·di/dt) voltage drops.
Lumped Capacitor Model
As long as the resistive component of the wire is small, and switching frequencies are
in the low to medium range, it is meaningful to consider only the capacitive
component of the wire, and to lump the distributed capacitance into a single
capacitance.
The only impact on performance is introduced by the loading effect of the capacitor
on the driving gate.
Lumped RC Model
If wire length is more than a few millimeters, the lumped capacitance
model is inadequate and a resistive capacitive model has to be adopted.
In the lumped RC model the total resistance of each wire segment is lumped into one single R, and the global capacitance is combined into a single capacitor C.
Analysis of a network with a large number of Rs and Cs becomes complex, as the network contains many time constants (zeros and poles). The Elmore delay model overcomes this problem.
Elmore Delay Model
Properties of the network:
 Has single input node
 All the capacitors are between a node and ground.
 Network does not contain any resistive loops.
“Path resistance” is the resistance from source node to any other node.
“Shared path resistance” is the resistance shared among the paths from the source
node to any other two nodes.
Hence,
Delay contribution of node 1: τd1 = R1·C1
Delay contribution of node 2: τd2 = (R1+R2)·C2
Delay contribution of node 3: τd3 = (R1+R2+R3)·C3
In general:
τdi = R1·C1 + (R1+R2)·C2 + ... + (R1+R2+R3+...+Ri)·Ci
If R1 = R2 = R3 = ... = R and C1 = C2 = C3 = ... = C, then
τdi = RC + 2RC + ... + nRC
Thus the Elmore delay is equivalent to the first-order time constant of the network.
Assume an interconnect wire of length L is partitioned into N identical segments, each of length L/N, with R and C the resistance and capacitance per unit length. Then
τd = (R·L/N)·(C·L/N) + 2·(R·L/N)·(C·L/N) + ... + N·(R·L/N)·(C·L/N)
= (L/N)²·RC·(1 + 2 + ... + N)
= (L/N)²·RC·N(N+1)/2
or, for large N, τd = RC·L²/2
=> The delay of a wire is a quadratic function of its length
=> doubling the length of the wire quadruples its delay
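The Elmore sum is trivial to compute for an RC ladder, and it is a quick way to see the quadratic behavior for yourself. Here is a minimal Python sketch (the function and variable names are illustrative only):

def elmore_ladder(seg_r, seg_c):
    """Elmore delay of an RC ladder: sum over nodes of (total upstream R) * C at that node."""
    delay, r_upstream = 0.0, 0.0
    for r, c in zip(seg_r, seg_c):
        r_upstream += r
        delay += r_upstream * c
    return delay

# Wire of length L cut into N equal segments; r and c are per-unit-length values
r_per_len, c_per_len, L, N = 0.2, 0.1, 10.0, 1000
seg_r = [r_per_len * L / N] * N
seg_c = [c_per_len * L / N] * N
print(elmore_ladder(seg_r, seg_c))     # approaches R*C*L^2/2 = 1.0 for large N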
Advantages
 It is simple
 It is always situated between minimum and maximum bounds
Disadvantages
It is pessimistic and inaccurate for long interconnect wires.
Distributed RC model
The lumped RC model is always pessimistic, and the distributed RC model provides better accuracy than the lumped RC model.
But the distributed RC model is complex and no closed-form solution exists. Hence the distributed RC line model is not suitable for Computer Aided Design tools.
The behavior of the distributed RC line can be approximated by a lumped RC ladder network, such as the Elmore delay model; hence these are extensively used in EDA tools.
Transmission Line Model:
When the frequency of operation increases to such an extent that the rise (or fall) time of the signal becomes comparable to the time of flight over the net, inductive effects start dominating over the RC values.
This inductive effect is modeled by Transmission Line models. The model assumes the signal is a "wave" that propagates over the medium, the "net".
There are two types of transmission line models:
Lossless transmission line model: This is good for Printed Circuit Board level design.
Lossy transmission line model: This model is used for IC interconnect modeling.
Transmission line effects should be considered when the rise or fall time of the input signal is smaller than the time of flight of the transmission line, or when the resistance of the wire is less than the characteristic impedance.
Wire Load Models
Extraction data from already routed designs is used to build a lookup table known as the wire load model (WLM). The WLM gives statistical estimates of R and C based on the "Net Fanout".
For fanouts greater than those specified in a wire load table, a “slope factor” is
specified for linear extrapolation.
wire_load ("5KGATES") {
resistance : 0.000271 ; ------> R per unit length
capacitance : 0.00017 ; ------> C per unit length
slope : 29.4005 ; ------> Used for linear extrapolation
fanout_length (1, 18.38) ------> (fanout = 1, length = 18.38)
fanout_length (2, 47.78)
fanout_length (3, 77.18)
fanout_length (4, 106.58)
fanout_length (5, 135.98)
}
Eg:
Fanout = 7
Net length = 135.98 + 2 x 29.4005 (slope) = 194.78 ------> length of net with fanout of 7
Resistance = 194.78 x 0.000271 = 0.05279 units
Capacitance = 194.78 x 0.00017 = 0.03311 units
Wire load models for synthesis
Wire load modeling allows us to estimate the effect of wire length and fanout on the resistance, capacitance, and area of nets. The synthesizer uses these physical values to calculate wire delays and circuit speeds. Semiconductor vendors develop wire load models based on statistical information specific to the vendor's process. The models include coefficients for area, capacitance, and resistance per unit length, and a fanout-to-length table for estimating net lengths (the number of fanouts determines a nominal length).
Selection of wire load models in the initial stage (before physical design) depends on
the following factors:
1. User specification
2. Automatic selection based on design area
3. Default specification in the technology library
Once the final routing step is over in the physical design stage, wire load models are
generated based on the actual routing in the design and synthesis is redone using those
wire load models.
In hierarchical designs, we have to determine which wire load model to use for nets
that cross hierarchical boundaries. There are three modes for determining which wire
load model to use for nets that cross hierarchical boundaries:
Top:
Apply the same wire load model to all nets as if the design has no hierarchy, using the wire load model specified for the top level of the design hierarchy for all nets in the design and its subdesigns.
Enclosed:
The wire load model of the smallest design that fully encloses the net is applied. If the design enclosing the net has no wire load model, the tool traverses the design hierarchy upward until it finds a wire load model. Enclosed mode is more accurate than top mode when cells in the same design are placed in a contiguous region during layout.
Use enclosed mode if the design has similar logical and physical hierarchies.
Segmented:
The wire load model for each segment of a net is determined by the design encompassing the segment. Nets crossing hierarchical boundaries are divided into segments, and for each net segment the wire load model of the design containing that segment is used. If the design containing a segment has no wire load model, the tool traverses the design hierarchy upward until it finds one.
Interconnect Delay vs. Deep Sub Micron Issues :
Performance of deep submicron ICs is limited by the increasing interconnect loading effect. Long global clock networks account for a large part of the power consumption in chips. Traditional CAD design methodologies are largely affected by interconnect scaling.
The capacitance and resistance of interconnects have increased due to the smaller wire cross sections, smaller wire pitch and longer lengths. This has resulted in increased RC delay. As technology advances, interconnect scaling also increases, and in such a scenario the increased RC delay is becoming a major bottleneck in improving the performance of advanced ICs.
Here the gate delay and the interconnect delay are shown as functions of various technology nodes ranging from 180nm to 60nm. The interconnect delays shown assume a line where repeaters are connected optimally, and they include the delay due to the repeaters. From the graph it can be observed that with the shrinking of technology the gate delay reduces but the interconnect delay increases.
Limits of Cu/low k interconnects
At the 250 nm submicron node, copper with low-k dielectric was introduced to decrease the effect of the increasing interconnect delay. But below the 130 nm technology node, interconnect delays are increasing further despite the introduction of low-k dielectrics. As scaling increases, new physical and technological effects like resistivity and barrier thickness start dominating and the interconnect delay increases. The introduction of repeaters to shorten the interconnect length increases the total area. The vias connecting repeaters to the global layers can cause blockages in the lower metal layers. Thus as technology improves, material limitations will become the dominant factor in the interconnect delay. Increasing the metal layer width would cause an increase in the number of metallization layers.
This can't be a solution to the problem, as it increases complexity, cost and reliability concerns.
Cu/low-k dielectric films are deposited by a special process known as the Damascene process.
The adhesion of Cu to dielectric materials is very poor. Under electric bias, Cu atoms easily drift and cause shorts between metal layers. To avoid this problem, a barrier layer is deposited between the dielectric and the Cu trench. Even though it decreases the effective cross section of the interconnect compared to the drawn dimensions, it improves reliability. The barrier thickness becomes significant at the deep submicron level and the effective resistance of the interconnect rises further. In addition, increasing electron scattering and the self-heating caused by the electron flow in the interconnect, together with the corresponding increase in internal chip temperature, also contribute to increasing interconnect resistance.
Delays in ASIC Design
We encounter several types of delays in ASIC design. They are as follows:
 Gate delay or Intrinsic delay
 Net delay or Interconnect delay or Wire delay or Extrinsic delay or Flight time
 Transition or Slew
 Propagation delay
 Contamination delay
Wire delays or extrinsic delays are calculated using output drive strength, input
capacitance and wire load models. Other delays are intrinsic properties of each and
every gate.
Delays are interdependent on different electrical properties:
 Input capacitance of the logic gate is a function of output state, output loads
and input slew rate.
 Internal timing arcs and output slew rate is a function of switching input(s).
 Capacitance of the wire is dependent on frequency.
 Internal timing arcs are a function of input slew rates.
 Output slew rate is a function of input slew rate on each input.
 Wires exhibit RLC characteristics instead of lumped RC.
Gate Delay
Transistors within a gate take a finite time to switch. This means that a change
on the input of a gate takes a finite time to cause a change on the output. [Magma]
Gate delay =function of (input transition (slew) time, Cnet+Cpin).
Or
Gate delay =function of (input transition (slew) time, Cload).
where Cload=Cnet+Cpin
Cnet-->Net capacitance
Cpin-->pin capacitance of the driven cell
Cell delay is the same as Gate delay.
How is gate delay calculated?
Cell or gate delay is calculated using Non-Linear Delay Models (NLDM). NLDM is highly accurate as it is derived from SPICE characterizations. The delay is a function of the input transition time (i.e. slew) of the cell, the wire capacitance and the pin capacitance of the driven cells. A slow input transition time slows the rate at which the cell's transistors can change state (logic 1 to logic 0, or logic 0 to logic 1), and a large output load Cload (Cnet + Cpin) does the same, thereby increasing the delay of the logic gate. There is another NLDM table in the library to calculate the output transition. The output transition of a cell becomes the input transition of the next cell down the chain.
Output transition of a cell becomes the input transition of the next cell down the
chain.
Table models are usually two-dimensional to allow lookups based on the input
slew and the output load (Cload). A sample table is given below.
timing() {
related_pin : "CKN";
timing_type : falling_edge;
timing_sense : non_unate;
cell_rise(delay_template_7x7) {
index_1 ("0.012, 0.032, 0.074, 0.154, 0.318, 0.644, 1.3");
index_2 ("0.001278, 0.0046008, 0.0112464, 0.0245376, 0.05112, 0.10454, 0.212148");
values ( \
"0.225894, 0.249015, 0.285537, 0.352680, 0.484244, 0.748180, 1.279570", \
"0.231295, 0.254415, 0.290938, 0.358081, 0.489646, 0.753585, 1.284980", \
"0.243754, 0.266878, 0.303398, 0.370542, 0.502105, 0.766044, 1.297440", \
"0.267240, 0.290389, 0.326908, 0.394052, 0.525615, 0.789561, 1.320950", \
"0.307080, 0.330200, 0.366721, 0.433861, 0.565425, 0.829373, 1.360760", \
"0.380552, 0.403875, 0.440426, 0.507569, 0.639136, 0.903084, 1.434500", \
"0.497588, 0.521769, 0.558548, 0.625744, 0.757301, 1.021260, 1.552680");
}
rise_transition(delay_template_7x7) {
index_1 ("0.012, 0.032, 0.074, 0.154, 0.318, 0.644, 1.3");
index_2 ("0.001278, 0.0046008, 0.0112464, 0.0245376, 0.05112, 0.10454, 0.212148");
values ( \
"0.040574, 0.068619, 0.125391, 0.246672, 0.497688, 1.005982, 2.030120", \
"0.040570, 0.068618, 0.125390, 0.246672, 0.497688, 1.005940, 2.030240", \
"0.040565, 0.068616, 0.125389, 0.246650, 0.497770, 1.006180, 2.030120", \
"0.040532, 0.068612, 0.125387, 0.246670, 0.497710, 1.006164, 2.030100", \
"0.040578, 0.068621, 0.125392, 0.246636, 0.497688, 1.006182, 2.030040", \
"0.041763, 0.069211, 0.125662, 0.246758, 0.497726, 1.005930, 2.030000", \
"0.045813, 0.071321, 0.126671, 0.247154, 0.497846, 1.005962, 2.030180");
}
}
index_1 --> input transition values
index_2 --> output load capacitance values
values --> delay values
Situation 1:
Input transition and output load values match the table index values
If both the input transition and the output load values match table index values, then the corresponding delay value is picked up directly from the delay "values" table.
Situation 2:
The output load value doesn't match the table index values
When the actual load capacitance value does not fall directly on one of the load-axis index points, the delay is determined by interpolation from the closest points. Note that to carry out this interpolation the input transition point should match one of the table index values.
Determine the equation for the line segment connecting the two nearest points in the table.
To do this, first we need to find the slope value.
Slope m = (y2-y1)/(x2-x1), where (y2-y1) is the delay segment (generally in ns) on the y-axis and (x2-x1) is the load segment (generally in pF) on the x-axis.
Then solve for the delay at the load point of interest.
The linear equation is:
y = mx + c
where
y --> delay (ns)
m --> slope
x --> load capacitance (pF)
c --> constant, chosen so that the line passes through the two nearest table points
The load point of interest means the load capacitance value for which the delay has to be calculated.
Situation 3:
Both the input transition and output load values don't match the table index values
 If both the input transition and the load capacitance values do not match exactly with the lookup table index values, then bilinear interpolation is used.
 Multiple linear interpolations (~3) are performed on the closest table data points (~4) in the lookup table.
Situation 4:
The output load value doesn't match the table index values and is outside the table boundary
 When the load point is outside the boundary of the index, the delay is extrapolated from the closest known points.
 A lookup value too far out of the range of the given table values could lead to inaccuracy.
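All four situations reduce to a 2-D table lookup with linear/bilinear interpolation (and extrapolation outside the range). Below is a minimal Python sketch of the idea; it mirrors the concept described above, not any vendor's exact algorithm, and the function name is mine:

from bisect import bisect_right

def nldm_lookup(slews, loads, values, slew, load):
    """Bilinear interpolation in an NLDM table: slews = index_1, loads = index_2,
    values[i][j] is the delay at (slews[i], loads[j])."""
    def bracket(axis, x):
        i = min(max(bisect_right(axis, x) - 1, 0), len(axis) - 2)
        t = (x - axis[i]) / (axis[i + 1] - axis[i])   # t < 0 or t > 1 means extrapolation
        return i, t
    i, ti = bracket(slews, slew)
    j, tj = bracket(loads, load)
    top = values[i][j] + (values[i][j + 1] - values[i][j]) * tj        # along the load axis
    bot = values[i + 1][j] + (values[i + 1][j + 1] - values[i + 1][j]) * tj
    return top + (bot - top) * ti                                      # then along the slew axis

# e.g. the first two rows/columns of the cell_rise table shown earlier
slews = [0.012, 0.032]
loads = [0.001278, 0.0046008]
values = [[0.225894, 0.249015],
          [0.231295, 0.254415]]
print(nldm_lookup(slews, loads, values, 0.020, 0.003))   # interpolated cell_rise delay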
Intrinsic delay
 Intrinsic delay is the delay internal to the gate, from the input pin of the cell to the output pin of the cell.
 It is defined as the delay between an input and output pair of a cell when a near-zero slew is applied to the input pin and the output does not see any load. It is caused by the internal capacitance associated with the cell's transistors.
 This delay is largely dependent on the size of the transistors forming the gate, because increasing the size of the transistors increases the internal capacitances.
Transition Delay and Propagation Delay
Transition Delay
Transition delay or slew is defined as the time taken by a signal to rise from 10% (or 20%) to 90% (or 80%) of its maximum value. This is known as the "rise time".
Similarly, "fall time" can be defined as the time taken by a signal to fall from 90% (or 80%) to 10% (or 20%) of its maximum value.
 Transition is the time it takes for the pin to change state.
Setting Transition Time Constraints
The above theoretical definitions are applied to practical designs. The transition time of a net becomes the time required for its driving pin to change logic value (from 10% (20%) to 90% (80%) of its maximum value). The transition times used for delay calculations are based on the timing library (.lib files).
Transition-related constraints can be provided in Design Compiler (the logic synthesis tool from Synopsys) by using the commands below:
1. max_transition: This attribute is applied to each output of a cell. During optimization, Design Compiler tries to make the transition time of each net less than the value of the max_transition attribute.
2. set_max_transition: This command is used to change the maximum transition time restriction specified in a technology library.
“This command sets a maximum transition time for the nets attached to the identified
ports or to all the nets in a design by setting the max_transition attribute on the named
objects.
For example, to set a maximum transition time of 3.2 on all nets in the design adder,
enter the following command:
set_max_transition 3.2 [get_designs adder]
To undo a set_max_transition command, use the remove_attribute command. For
example, enter the following command:
remove_attribute [get_designs adder] max_transition”
(Directly quoted from Design Complier user manual)
Setting Capacitance Constraints
The transition time constraints specified above do not provide a direct way to control the actual capacitance of nets. To control capacitance directly, the command below has to be used:
set_max_capacitance: This command sets the maximum capacitance constraint on input ports or designs.
set_max_capacitance can be used in addition to set_max_transition, as the two commands work independently. This command applies a maximum capacitance limit to an output pin or port of the design, and it can also be used to apply a capacitance limit to any net.
Eg:
set_max_capacitance 4 [get_designs decoder]
To remove the set_max_capacitance command, use the remove_attribute command.
remove_attribute [get_designs decoder] max_capacitance
Propagation Delay
Propagation delay is the time required for a signal to propagate through a gate or net.
 Hence if it is a cell, you can call it "Gate or Cell Delay", and if it is a net, you can call it "Net Delay".
 Propagation delay of a gate or cell is the time it takes for a signal at the input
pin to affect the output signal at output pin.
 For any gate propagation delay is measured between 50% of input transition to
the corresponding 50% of output transition.
There are 4 possibilities:
 Propagation delay from 50% of the input rising to 50% of the output rising.
 Propagation delay from 50% of the input rising to 50% of the output falling.
 Propagation delay from 50% of the input falling to 50% of the output rising.
 Propagation delay from 50% of the input falling to 50% of the output falling.
Each of these delays can have a different value. The maximum and minimum values of this set are very important; the maximum and minimum propagation delay values are considered for timing analysis.
For a net, propagation delay is the delay between the time a signal is first applied to the net and the time it reaches the other devices connected to that net.
Propagation delay is often quoted as the average of the high-to-low and low-to-high delays, i.e. Tpd = (Tphl + Tplh)/2.
PVT vs. Delay
Sources of variation can be:
 Process variation (P)
 Supply voltage (V)
 Operating Temperature (T)
Process Variation
This variation accounts for deviations in the semiconductor fabrication process. Usually process variation is treated as a percentage variation in the performance calculation. Variations in the process parameters can be impurity concentration densities, oxide thicknesses and diffusion depths. These are caused by non-uniform conditions during deposition and/or during diffusion of the impurities. This introduces variations in the sheet resistance and in transistor parameters such as the threshold voltage. There are also variations in the dimensions of the devices, mainly resulting from the limited resolution of the photolithographic process. This causes (W/L) variations in MOS transistors.
Process variations are due to variations in the manufacturing conditions such as temperature, pressure and dopant concentrations. The ICs are produced in lots of 50 to 200 wafers with approximately 100 dice per wafer. The electrical properties in different lots can be very different. There are also slighter differences within each lot, and even within a single manufactured chip. There are variations in the process parameters throughout a whole chip. As a consequence, the transistors have different transistor lengths throughout the chip. This makes the propagation delay different everywhere in a chip, because a smaller transistor is faster and therefore its propagation delay is smaller.
Supply Voltage Variation
The design's supply voltage can vary from the established ideal value during
day-to-day operation. Often a complex calculation (using a shift in threshold voltages)
is employed, but a simple linear scaling factor is also used for logic-level performance
calculations. The saturation current of a cell depends on the power supply, and the delay
of a cell depends on the saturation current; in this way the power supply influences
the propagation delay of a cell. Throughout a chip the power supply is not constant,
and hence the propagation delay varies across the chip. The voltage drop is due to the
nonzero resistance in the supply wires. A higher voltage makes a cell faster and hence the
propagation delay is reduced; the decrease is approximately exponential over a wide
voltage range. The self-inductance of a supply line also contributes to a voltage drop.
For example, when a transistor switches high, it draws a current to charge up the output
load. This time-varying current (flowing for a short period of time) causes an opposing
self-induced electromotive force. The amplitude of the voltage drop is given by
V = L*dI/dt, where L is the self-inductance and I is the current through the line.
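To put a rough, purely illustrative number on this (the values here are assumptions): with a supply-line self-inductance of L = 1 nH and a current that ramps by 10 mA in 0.1 ns,
V = 1 nH * (10 mA / 0.1 ns) = 0.1 V,
which is already a noticeable fraction of a low supply voltage.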
Operating Temperature Variation
Temperature variation is unavoidable in the everyday operation of a design.
Effects on performance caused by temperature fluctuations are most often handled as
linear scaling effects, but some submicron silicon processes require nonlinear
calculations. When a chip is operating, the temperature can vary throughout the chip.
This is due to the power dissipation in the MOS transistors. The power consumption
is mainly due to switching, short-circuit and leakage power. The average
switching power dissipation (approximately given by Paverage = Cload * Vsupply^2 * fclock)
is due to the energy required to charge up the parasitic and load capacitances. The
short-circuit power dissipation is due to the finite rise and fall times: the nMOS and
pMOS transistors may conduct simultaneously for a short time during switching, forming
a direct current path from the power supply to ground. The leakage power consumption is
due to the nonzero reverse leakage and subthreshold currents. The biggest contribution to
the power consumption is the switching. The dissipated power increases the surrounding
temperature.
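As an illustration of the switching-power formula with assumed values: a switched capacitance of Cload = 10 pF, a supply of Vsupply = 1.0 V and a clock of fclock = 500 MHz give
Paverage ≈ 10 pF * (1.0 V)^2 * 500 MHz = 5 mW
for that piece of logic.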
The electron and hole mobility depend on the temperature. The mobility (in Si) decreases
with increasing temperature for temperatures above –50 °C. The temperature at which the
mobility starts to decrease depends on the doping concentration; a starting temperature of
–50 °C holds for doping concentrations below 10^19 atoms/cm3, and for higher doping
concentrations the starting temperature is higher. When the electrons and holes move more
slowly, the propagation delay increases. Hence, the propagation delay increases with
increasing temperature. There is also a second temperature effect: the threshold voltage
of a transistor depends on the temperature. A higher temperature decreases the threshold
voltage, and a lower threshold voltage means a higher current and therefore a better delay
performance. This effect depends strongly on the power supply, threshold voltage, load and
input slope of a cell. There is a competition between the two effects, and generally the
mobility effect wins.

The following figure shows the PVT operating conditions.

The best and worst design corners are defined as follows:


Best case: fast process, highest voltage and lowest temperature
Worst case: slow process, lowest voltage and highest temperature
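In PrimeTime these corners are typically selected with set_operating_conditions; the analysis type is a real option, but the condition names below (BEST, WORST) are placeholders that depend on the library being used:
pt_shell> set_operating_conditions -analysis_type bc_wc -min BEST -max WORST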
On Chip Variation
On-chip variation refers to minor differences between different parts of the chip within one
operating condition. With on-chip variation (OCV), delays vary across a single die due to:
 Variations in the manufacturing process (P)
 Variations in the voltage (due to IR drop)
 Variations in the temperature (due to local hot spots etc)
This needs to be modeled by scaling coefficients (derating factors). Delays have uncertainty
due to the variation of process (P), voltage (V) and temperature (T) across large
dies. On-chip variation analysis allows you to account for the delay variations due to PVT
changes across the die, providing more accurate delay estimates.

Timing Analysis With On Chip Variation
For cell delays, the on chip variation is between 5 percent above and 10 percent below
the SDF back annotated values.

For net delays, the on chip variation is between 2 percent above and 4 percent below
the SDF back annotated values.

For cell timing checks, the on-chip variation is 10 percent above the SDF values for
setup checks and 20 percent below the SDF values for hold checks.

In PrimeTime, OCV deratings are implemented using the following commands:
pt_shell> read_sdf -analysis_type on_chip_variation my_design.sdf

pt_shell> set_timing_derate -cell_delay -early 0.90
pt_shell> set_timing_derate -cell_delay -late 1.05

pt_shell> set_timing_derate -net_delay -early 0.96
pt_shell> set_timing_derate -net_delay -late 1.02

pt_shell> set_timing_derate -cell_check -early 0.80
pt_shell> set_timing_derate -cell_check -late 1.10

In the traditional deterministic STA (DSTA), process variation is modeled by
running the analysis multiple times, each at a different process condition. For each
process condition, a so-called corner file is created that specifies the delay of the gates
at that process condition. By analyzing a sufficient number of process conditions, the
delay of the circuit under process variation can be bounded. The uncertainty in the
timing estimate of a design can be classified into three main categories.

Modeling and analysis errors: Inaccuracy in device models, in the extraction and
reduction of interconnect parasitics, and in the timing analysis algorithms.

Manufacturing variations: Uncertainty in the parameters of fabricated devices
and interconnects from die to die and within a particular die.

Operating context variations: Uncertainty in the operating environment of a
particular device during its lifetime, such as temperature, supply voltage, mode of
operation and lifetime wear-out.
For instance, the STA tool might utilize a conservative delay noise algorithm resulting
in certain paths operating faster than expected. Environmental uncertainty and
uncertainty due to modeling and analysis errors are typically modeled using worst
case margins, whereas uncertainty in process is generally treated statistically.

Taxonomy of Process Variations
As process geometries continue to shrink, the ability to control critical device
parameters is becoming increasingly difficult and significant variations in device
length, doping concentrations and oxide thicknesses have resulted [9]. These process
variations pose a significant problem for timing yield prediction and require that static
timing analysis models the circuit delay not as a deterministic value, but as a random
variable.
Process variations can be either systematic or random.
Systematic variation: Systematic variations are deterministic in nature and are
caused by the structure of a particular gate and its topological environment. The
systematic variations are the component of variation that can be attributed to a layout
or manufacturing equipment related effects. They generally show spatial correlation
behavior.

Random variation: Random or non-systematic variations are unpredictable in nature
and include random variations in the device length, discrete doping fluctuations and
oxide thickness variations. Random variations cannot be attributed to a specific
repeatable governing principle. The radius of this variation is comparable to the sizes
of individual devices, so each device can vary independently.

Process variations can be classified as follows:
Inter-die or die-to-die variation: Inter-chip variations are variations that occur from one
die to the next, meaning that the same device on a chip has different features among
different die of one wafer, from wafer to wafer and from wafer lot to wafer lot. Die-to-die
variations have a variation radius larger than the die size, including within-wafer,
wafer-to-wafer, lot-to-lot and fab-to-fab variations.

Intra-die or within-die variation: Intra-die variations are the variations in device
features that are present within a single chip, meaning that a device feature varies
between different locations on the same die. Intra-chip variations exhibit spatial
correlations and structural correlations.

Front-end variation: Front-end variations mainly refer to the variations present at the
transistor level. The primary components of front-end variation are transistor
gate length and gate width, gate oxide thickness, and doping-related variations. These
physical variations cause changes in the electrical characteristics of the transistors
which eventually lead to variability in the circuit performance.

Back-end variation: Back-end variations refer to the variations on the various levels of
interconnecting metal and dielectric layers used to connect the numerous devices that form
the required logic gates. In practice, device features vary among the devices on a chip
and the likelihood that all devices have a worst-case feature is extremely small. With
increasing awareness of process variation, a number of techniques have been
developed which model random delay variations and perform STA.
These can be classified into full chip analysis and path based analysis approaches:

Full Chip Analysis: Full-chip analysis models the delay of a circuit as a random
variable and endeavors to compute its probability distribution. The proposed methods
are heuristic in nature and have a very high worst case computational complexity.
They are also based on very simple delay models, where the dependence of gate delay
due to slope variation at the input of the gate and load variation at the output of the
gate is not modeled. When run time and accuracy are considered, full chip STA is not
yet practical for industrial designs.

Path Based STA:
Path-based STA provides statistical information on a path-by-path basis. It accounts
for intra-die process variations and hence eliminates the pessimism in deterministic
timing analysis based on case files.
It is a more accurate measure of which paths are critical under process
variability, allowing more correct optimization of the circuit. This approach does not
include the load dependence of the gate delay due to variability of fan out gates and
does not address spatial correlations of intra die variability. To compute the intra die
path delay component of process variability, first the sensitivity of gate delay, output
slope and input load with respect to slope, output load and device length are
computed. Finally, when considering sequential circuits, the delay variation in the
buffered clock tree must be considered.
In general, the fully correlated assumptions will underestimate the variation in the
arrival times at the leaf nodes of the clock tree which will tend to overestimate circuit
performance.

Clock Definitions

Rising and falling edge of the clock


For a +ve edge triggered design +ve (or rising) edge is called ‘leading edge’ whereas
–ve (or falling) edge is called ‘trailing edge’.
For a -ve edge triggered design –ve (or falling) edge is called ‘leading edge’ whereas
+ve (or rising) edge is called ‘trailing edge’.

Minimum pulse width of the clock can be checked in PrimeTime by using commands
given below:
set_min_pulse_width -high 2.5 [all_clocks]
set_min_pulse_width -low 2.0 [all_clocks]

These checks are generally carried out for post layout timing analysis. Once these
commands are set, PrimeTime checks for high and low pulse widths and reports any
violations.
Capture Clock Edge
The clock edge at which data is captured at the destination flip-flop is known as the
capture edge.
Launch Clock Edge
The launch edge is the clock edge at which data is launched from the previous (source)
flip-flop; that data is then captured at this flip-flop on the capture edge.

Skew
Skew is the difference in the arrival of the clock at two consecutive pins of sequential
elements. Clock skew is the variation in the arrival time of the clock at the
destination points in the clock network, i.e. the difference in the arrival of the clock
signal at the clock pins of different flops.
Two types of skews are defined: Local skew and Global skew.
Local skew
Local skew is the difference in the arrival of clock signal at the clock pin of related
flops.
Global skew
Global skew is the difference in the arrival of the clock signal at the clock pins of
non-related flops. It is also defined as the difference between the shortest clock path
delay and the longest clock path delay reaching two sequential elements.

Skew can be positive or negative. When data and clock are routed in the same direction
the skew is positive; when data and clock are routed in opposite directions the skew is
negative.
Positive Skew
If the capture clock arrives later than the launch clock, the skew is positive; clock and
data both travel in the same direction.
Positive skew can lead to hold violations.
Positive skew improves setup margin.

Negative Skew
If the capture clock arrives earlier than the launch clock, the skew is negative; clock
and data travel in opposite directions. Negative skew can lead to setup violations, while
it improves hold margin.
(Effects of skew on setup and hold are discussed in detail later.)
Uncertainty
Clock uncertainty is the time difference between the arrivals of clock signals at
registers in one clock domain or between domains.
Prelayout and Post-layout Uncertainty
Pre-CTS uncertainty covers clock skew, clock jitter and a margin. After CTS, skew is
calculated from the actual propagated clock, so the uncertainty can be reduced to some
margin for skew plus jitter.
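As a minimal sketch of how this is constrained (the clock name and the numbers are placeholders chosen only for illustration), a pre-CTS uncertainty covering skew, jitter and margin might be set as:
set_clock_uncertainty -setup 0.3 [get_clocks CLK]
set_clock_uncertainty -hold 0.1 [get_clocks CLK]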

Clock latency
Latency is the sum of the clock source delay and the clock network delay.
Clock source delay is the time taken to propagate from ideal waveform origin point to
clock definition point. Clock network latency is the delay from clock definition point
to register clock pin.
Pre CTS Latency and Post CTS Latency
Latency is the summation of the source latency and the network latency. The pre-CTS
estimated latency is used during synthesis, and after CTS the propagated latency is
used.
Source Delay or Source Latency
It is also known as source latency. It is defined as "the delay from the clock origin
point to the clock definition point in the design", i.e. the delay from the clock source
to the beginning of the clock tree (the clock definition point) – the time a clock signal
takes to propagate from its ideal waveform origin point to the clock definition point in
the design.
Network Delay (latency) or Insertion Delay
It is also known as insertion delay or network latency. It is defined as "the delay from
the clock definition point to the clock pin of the register", i.e. the time the clock
signal (rise or fall) takes to propagate from the clock definition point to a register
clock pin.
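A minimal constraint sketch before CTS (the clock name and delay values are placeholders): the two latencies described above might be estimated as
set_clock_latency -source 1.0 [get_clocks CLK]
set_clock_latency 0.8 [get_clocks CLK]
where the first command models the source latency and the second the network latency; after CTS the network latency is normally taken from the propagated clock instead.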
Figure below shows example of latency for a design without PLL.

The latency definitions for designs with PLL are slightly different.
Figure below shows latency specifications of such designs. The latency from the
PLL output to the clock input of the generated-clock circuitry becomes source latency.
From that point onwards, the delay until the generated clock reaches the flops is the
network latency. Note that part of the network latency – the clock-to-Q delay of the
flip-flop of the divide-by-2 circuit in the given example – is a known value.

Jitter
Jitter is the short-term variation of a signal with respect to its ideal position in time.
Jitter is the variation of the clock period from edge to edge; the period can vary by +/-
the jitter value. From cycle to cycle the period and duty cycle can change slightly due to
the clock generation circuitry. Jitter can also be generated by a PLL, known as PLL jitter.
Possible jitter values should be considered for proper PLL design. Jitter can be
modeled by adding uncertainty regions around the rising and falling edges of the
clock waveform.
Sources of Jitter
Common sources of jitter include:
 Internal circuitry of the phase-locked loop (PLL)
 Random thermal noise from a crystal
 Other resonating devices
 Random mechanical noise from crystal vibration
 Signal transmitters
 Traces and cables
 Connectors

 Receivers

Multiple Clocks
If more than one clock is used in a design, the clocks can be defined to have different
waveforms and frequencies; these are known as multiple clocks. The logic
triggered by each individual clock is known as a "clock domain".
If the clocks have different frequencies, there must be a base period over which all
waveforms repeat; the base period is the least common multiple (LCM) of all clock
periods.
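For example, two clocks might be defined as below (the port names and the 10 ns and 15 ns periods are assumed purely for illustration); their base period would then be the LCM of the periods, 30 ns:
create_clock -name CLK1 -period 10 [get_ports clk1]
create_clock -name CLK2 -period 15 [get_ports clk2]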

Asynchronous Clocks
In multiple clock domains, if these clocks do not have a common base period then
they are called asynchronous clocks. Clocks generated from two different crystals or
PLLs are asynchronous clocks. Different clocks with different frequencies
generated from a single crystal or PLL are not asynchronous but synchronous
clocks.

Gated clocks
Clock signals that pass through some gate other than buffers and inverters are
called gated clocks. These clock signals are under the control of the gating logic.
Clock gating is used to turn off the clock to some sections of the design to save power.
Generated clocks
Generated clocks are the clocks that are generated from other clocks by a circuit
within the design such as divider/multiplier circuit. Static timing analysis tools such
as PrimeTime will automatically calculate the latency (delay) from the source clock to
the generated clock if the source clock is propagated and you have not set source
latency on the generated clock.

'Clock' is the master clock, and a new clock is generated from the F1/Q output. The master
clock is defined with the 'create_clock' constraint. Unless the new clock is defined as a
'generated clock', timing analysis tools will not treat it as one; to accomplish this, the
"create_generated_clock" command is used. The 'CLK' pin of F1 is then treated as the clock
definition point for the new generated clock. Hence the clock path delay up to F1/CLK
contributes to source latency, whereas the delay from F1/CLK onwards contributes to
network latency.
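A minimal constraint sketch for the divide-by-2 example above, assuming the master clock enters at a port named clk with an illustrative 10 ns period:
create_clock -name CLK -period 10 [get_ports clk]
create_generated_clock -name CLK_DIV2 -source [get_ports clk] -divide_by 2 [get_pins F1/Q]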
Virtual Clocks
Virtual clock is a clock that is not logically connected to any port of the design and does
not physically exist in it. A virtual clock is used when a block does not contain a
port for the clock that an I/O signal is coming from or going to. Virtual clocks are
used during optimization; they do not really exist in the circuit but only in memory, and
they serve as a reference for specifying input and output delays relative to a clock. This
means there is no actual clock source in the design. Assume the block to be synthesized is
"Block_A"; the clock signal "VCLK" would be a virtual clock, and the input delay and output
delay would be specified relative to this virtual clock.
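A minimal sketch of such constraints (the period, the delay values and the port names in1/out1 are placeholders):
create_clock -name VCLK -period 10
set_input_delay 2.0 -clock VCLK [get_ports in1]
set_output_delay 3.0 -clock VCLK [get_ports out1]
Because no source object is given to create_clock, VCLK exists only as a reference for the I/O delays.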

Fixing Transition Violation
Transition violations can be fixed by different methods depending on the design situation
(example ECO commands are sketched after this list). They include:
 Up sizing the driver cell
 Decreasing the net length by moving cells nearer or reducing long routed net
 By adding buffers
 By using existing spare cells as buffers
 By splitting loads through buffers to reduce the fan out number (number of
driven cells)
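In a PrimeTime ECO flow, the cell-sizing and buffering fixes above might look like the following sketch; the instance names, pin name and buffer library cells here are placeholders, not from any particular library:
pt_shell> size_cell U123 BUFX8
pt_shell> insert_buffer [get_pins U45/A] BUFX4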
First we need to analyze the root causes of the violations. Obstructions such as macros,
routing blockages, or the fixed status of cells might have caused long routed nets
or detours, increasing the load on the connected driver. Recently I had a chance to
work on a metal-only ECO, in which no base-layer change is carried out: the
new ECO cells added have the same base layers, and a special ECO cell library was
used for this purpose.

Here new cells have to be added wherever space is available, without moving any of the
existing placed cells. Naturally, placement of the new cells was based not on the
connectivity of the cells but on the availability of placement locations. This
caused long routed nets and, in turn, large transition violations.

In this case, how did I fix the transitions?

Newly added cells were moved as close as possible to the nearest available
locations, based on their connectivity with the existing cells. This reduced the overall
transition violations.
Wherever possible, driver cells were upsized. There were several situations in which
further upsizing was not possible, and there I adopted a different technique to solve the
transition violations. We had a good number of SPARE cells available in the design,
including spare AND and OR gates. The inputs of a spare AND gate are tied to logic 1 and
those of a spare OR gate are tied to logic 0, so by using any one of the inputs we can use
these spare cells as buffers, and that is what I did: I found a spare cell near the
violating net and inserted it as a buffer. Since the spare cells used were of high drive
strength, this method yielded good results, reducing the transitions to a large extent.
Wherever I could find enough setup margin I could also insert high-drive-strength buffers
to tackle the transition violations.
Maximum Clock Frequency
This is a common interview question: what is the maximum clock frequency for a particular
circuit? Often the interviewer provides some data and asks the same question. Many of us
know the direct formula and, after applying it, can arrive at the final answer, but if
someone twists the question we sometimes become confused.
Here I will discuss the same topic from a basic point of view. It has 3 major sections.
1. In the 1st section, we will discuss different definitions with respect to sequential and
combinational circuits.
2. The 2nd section contains the basics of "Maximum Clock Frequency". I will explain
why and how you can calculate the maximum clock frequency.
3. I will take a few examples and try to solve them, covering at least 2-4 examples
ranging from easy to difficult.
Nowadays all chips contain both combinational and sequential circuits, so
before we move forward we should know the definition of "propagation delay" in
both types of circuits. Please read it once, because it will help you to understand the
"Maximum Clock Frequency" concepts.
Propagation Delay in the Combinational circuits:
Let’s consider a “NOT” gate and Input/output waveform as shown in the figure.

From the above figure, you can define


Rise Time (tr): The time required for a signal to transition from 10% of its maximum
value to 90% of its maximum value.
Fall Time (tf): The time required for a signal to transition from 90% of its maximum
value to 10% of its maximum value.

Propagation Delay (tpLH, tpHL): The delay measured from the time the input is at
50% of its full swing value to the time the output reaches its 50% value.
I want to rephrase the above definition as:
 This value indicates the amount of time needed to reflect a permanent change
at an output, if there is any change in logic of input.
 Combinational logic is guaranteed not to show any further output changes in
response to an input change after tpLH or tpHL time units have passed.
So, when an input X changes, the output Y is not going to change instantaneously. The
inverter output maintains its initial value for some time and only then changes. After the
propagation delay (tpLH or tpHL, depending on the type of change, low-to-high or
high-to-low), the inverter output is stable and is guaranteed not to change again until
another input change occurs (here we are not considering any SI/noise effects).
Propagation Delay in the Sequential circuits:
In sequential circuits, timing characteristics are specified with respect to the clock
input. You can relate it to combinational circuits in this way: in a combinational circuit
every timing characteristic/parameter is referenced to a data input change, whereas in a
sequential circuit the data input change matters but a change in the clock value has
higher precedence. For example, in a positive-edge-triggered flip-flop the output value
changes only on a positive clock edge, even if the input data changed a long time ago.
Since flip-flops only change value in response to a change in the clock, timing
parameters are specified in relation to the rising (for positive edge-triggered) or
falling (for negative edge-triggered) clock edge.
Let’s consider the positive-edge flip-flop as shown in figure.

Propagation delay, tpHL and tpLH, has the same meaning as in a combinational
circuit – but beware, propagation delays usually will not be equal for all input-to-output
pairs.
Note: In the case of a flip-flop there is only one propagation delay, i.e. tClk-Q
(clock→Q delay), but in the case of latches there can be two propagation delays:
tClk-Q (clock→Q delay) and tD-Q (data→Q delay). Latch delays we will discuss later.
So again let me rephrase the above definition:
 This value indicates the amount of time needed for a permanent change at the
flip-flop output (Q) with respect to a change in the flip flop-clock input (e.g.
rising edge).
 When the clock edge arrives, the D input value is transferred to output Q.
After tClk−Q (here equivalent to tpLH), the output is guaranteed not to change
value again until another triggering clock edge (e.g. rising edge) arrives together
with a corresponding input change.
Setup time (tsu) - This value indicates the amount of time before the clock edge that
data input D must be stable.
Hold time (th) - This value indicates the amount of time after the clock edge that data
input D must be held stable.
The circuit must be designed so that the D flip-flop input signal arrives at least "tsu"
time units before the clock edge and does not change until at least "th" time units
after the clock edge. If either of these restrictions is violated for any of the flip-flops
in the circuit, the circuit will not operate correctly. These restrictions limit the
maximum clock frequency at which the circuit can operate.
The Maximum Clock Frequency for a circuit:
You may be asking why there is a need to explain combinational-circuit propagation
delay here: a combinational circuit is always independent of the clock, so why bring it up?
The point is that the combinational circuit plays a very important role in deciding the
clock frequency of the circuit. Let's first discuss an example and try to calculate the
circuit frequency, and then we will discuss the rest of the details.

Now let’s understand the flow of data across these Flip-flops.
Let’s assume data is already present at input D of flip-flop A and it’s in the stable
form.
 Now Clock pin of FF (Flip-Flop) A i.e Clk has been triggered with a positive
clock edge (Low to high) at time “0ns”.
 As per the propagation delay of the sequential circuit (tclk-Q), it will take at
least 10ns for a valid output data at the pin X.
Remember: if you capture the output before 10ns, then no one can guarantee an
accurate/valid value at pin X.
This data then propagates through the inverter F. Since the propagation delay of
"F" is 5ns, you can observe a valid output at pin Y only after
10ns + 5ns = 15ns (with reference to the positive clock edge: 10ns of FF A and 5ns
of the inverter).
Practically, this is where a more complex combinational circuit sits
between the 2 FFs. So in a more complex design, if a single path is present
between X and Y, then the total time taken by the data to travel from X to Y is equal
to the sum of the propagation delays of all the combinational circuits/devices. (I will
explain this in more detail in the next section with more examples.) Once valid
data reaches pin Y, this data is supposed to be captured by FF B at the next positive
clock edge (in a single-cycle circuit). We generally try to design all circuits in
such a way that they operate in a single clock cycle; multi-cycle circuits are a
special case and we are not going to discuss them right now. For properly capturing
the data at FF B, the data should be present and stable 2ns (setup time) before the next
clock edge (as part of the setup definition).
So it means that between 2 consecutive positive clock edges there should be a minimum
time difference of 10ns + 5ns + 2ns = 17ns. We can say that for this circuit the
minimum clock period should be 17ns (if we want to operate the circuit in a single
clock cycle and accurately).
Now we can generalize this
Minimum Clock Period = tclk-Q (A) + tpd (F) + ts (B)
And “Maximum Clock Frequency = 1/(Min Clock Period)”
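Plugging in the values from this example gives 10ns + 5ns + 2ns = 17ns as the minimum clock period, i.e. a maximum clock frequency of 1/17ns ≈ 58.8 MHz.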
Now at least we have some idea of how to calculate the maximum clock frequency or
minimum clock period. Even if we forget the formula, we can derive it ourselves and
justify the logic behind it. Let me use the same concept on a few more complex, more
practical circuits.

Examples to calculate the “Maximum Clock Frequency”
for different circuits.
Example 1: Multiple FF’s Sequential Circuit
In a typical sequential circuit design there are often millions of flip-flop to flip-flop
paths that need to be considered in calculating the maximum clock frequency. This
frequency must be determined by locating the longest path among all the flip-flop
paths in the circuit.
Consider the following circuit.

There are three flip-flop to flip-flop paths (flop A to flop B, flop A to flop C, flop B to
flop C). Using an approach similar to the one explained in the last section, the delays
along the three paths are:
TAB = tClk−Q(A) + ts(B) = 9 ns + 2 ns = 11 ns
TAC = tClk−Q(A) + tpd(Z) + ts(C) = 9 ns + 4 ns + 2 ns = 15 ns
TBC = tClk−Q(B) + tpd(Z) + ts(C) = 10 ns + 4 ns + 2 ns = 16 ns
Since the TBC is the largest of the path delays, the minimum clock period for
the circuit is Tmin = 16ns and the maximum clock
frequency is 1/Tmin = 62.5 MHz.
Example 2: Circuit with min and max delay Specification.
Let's consider the following circuit. This circuit is similar to the normal FF circuitry;
the only differences are that every specification has two values (min and max), and that
there is a combinational circuit in the clock path as well.
Note: if you are wondering why there are min and max values, or where these values
come from, you will have to refer to another post.
Now let’s understand the flow/circuit once again.
 Every interconnect wire also has some delay, so you can see that clock CLK will
take some time to reach the clock pin of FF1.
 That means, with reference to the original clock edge (assume it occurs at 0ns),
the clock edge will take a minimum of 1ns and a maximum of 2ns to reach the clock
pin of FF1.
 So in the similar fashion, if we will calculate the total minimum delay and
maximum delay.
 In data path : max delay = (2+11+2+9+2)ns=26ns
 In data path : min delay = (1+9+1+6+1)ns=18ns
 In clock path: max delay= (3+9+3)ns=15ns
 In clock path: min delay = (2+5+2)ns = 9ns.
In the previous examples there were no delays in the clock path, so it was easy to figure
out the minimum clock period, but in this example we have to consider the delay in the
clock path as well.
 So for the minimum clock period, we just want to make sure that at FF2 the data
is present at least "tsetup" time before the positive clock edge (since it is a
positive-edge-triggered flip-flop).
 The clock edge can reach FF2 after 9ns/15ns (min/max) with reference to the
original clock edge.
 The data will take 18ns/26ns (min/max) with reference to the
original clock edge.
 So clock period in all the 4 combinations are:
Clock period (T1)
= (Max data path delay) - (max clock path delay) + tsetup
= 26 - 15 + 4 = 15ns
Clock period (T2)
= (Min data path delay) - (max clock path delay) + tsetup
= 18 - 15 + 4 = 7ns
Clock period (T3)
= (Max data path delay)-(min clock path delay)+tsetup
=26-9+4=21ns
Clock period (T4)
= (Min data path delay)-(min clock path delay)+tsetup
=18-9+4=11ns.
Since we want this circuit to work in every scenario (all combinations of
data and clock path delay), we have to calculate the period on that basis.
Looking at all the clock periods above, you can easily figure out that if the
clock period is less than 21ns, then one or more of the
scenarios/cases/combinations fail.
So we can conclude that for the entire circuit to work properly,
Minimum Clock Period = Clock period (T3) = (Max data path delay) - (min clock
path delay) + tsetup = 26 - 9 + 4 = 21ns
So in general:
Minimum Clock Period = (Max data path delay)-(min clock path delay) + t setup
And "Maximum Clock Frequency = 1/(Min Clock Period)”
Example 3: Circuit with multiple Combinational paths between 2 FFs:

The same approach applies to this example, so I am not going to explain it in much
detail. If you have multiple paths between the two flip-flops then, as we have done in the
previous examples, calculate the delays of each path, then calculate the time period and
see which one satisfies all the conditions. Put directly, we can calculate the clock
period on the basis of the path with the largest delay:
Min Clock Time Period = Tclk-q (of UFF1) + max(delay of Path1, delay of Path2)
+ Tsetup (of UFF3).
Example 4: Circuit with Different kind of Timing paths:

Since this circuit has different kinds of timing paths, you should first know about timing
paths; for that you can refer to the (Post link) post. After reading about timing paths,
you can easily figure out that in the above circuit there are 4 types of data paths and
2 clock paths.
Data path:
1. REGISTER to REGISTER Path
 U2 -> U3 ->U1 (Delay=5+8=13ns)
 U1 -> U4 -> U2 ( Delay=5+7=12ns)
2. Input pin/port to Register(flip-flop)
 U7 -> U4 -> U2 ( Delay=1+7=8ns)
 U7 -> U3 -> U1 ( Delay=1+8=9ns)
3. Input pin/port to Output pin/port
 U7 -> U5 -> U6 (Delay=1+9+6=16ns)
4. Register (flip-flop) to Output pin/port
 U1 -> U5 -> U6 (Delay=5+9+6=20ns)
 U2 -> U5 -> U6 (Delay=5+9+6=20ns)
Clock path:
 U8 -> U1 (Delay = 2ns)
 U8 -> U2 (Delay =2ns)
Now a few important points: this is not a full-chip circuit. In general, the
recommendation is to use REGISTERS at every input and output port.
For the time being, however, we will discuss this circuit as if it were a full-chip
circuit, and you will see how much analysis you have to do in this case. In the next
example I will add FFs (registers) at the input and output ports and then you will see
the difference.
Now let's study this circuit in more detail.
 In this circuit, we have to analyze how much time an input applied at port A
will take to reach the output port Y. This will help us find the required clock
period.
 Output pin Y is driven by a 3-input NAND gate, so if we want a stable
output at Y, we have to make sure that all 3 inputs of the NAND gate have
stable data.
 One input of the NAND gate is connected to input pin A through U7. The
time taken by the data to reach the NAND gate is 1ns (gate delay of U7).
Second input pin of the NAND gate is connected to the output pin Q of flip-flop U2.
 Time taken by the data present at input D of FF U2 to reach the NAND
gate:
 2ns (delay of U8) + 5ns (Tc2q of FF U2) = 7ns.
 Third input pin of the NAND gate is connected to the output pin Q of flip-flop
U1.
 Time taken by the data present at input D of FF U1 to reach the NAND
gate:
 2ns (delay of U8) + 5ns (Tc2q of FF U1) = 7ns
You may wonder why the delay of U8 comes into the picture: with
reference to the clock edge at the CLK pin, we can receive the data at the NAND pin only
after 7ns (don't ask why we can't take the reference as negative).
You may also ask why we have not considered the setup time of the FF in this
calculation. If an FF were present in place of the NAND gate, we would consider the
setup time. We never consider the setup and Tc2q (Clk-2-Q) values of the same FF in the
delay calculation at the same time, because when we consider the Clk-2-Q delay we
assume that the data is already present at input pin D of that FF.
So Time required for the data to transfer from input (A) to output (Y) Pin is the
maximum of:
 Pin2Pin Delay = U7+U5+U6 = 1+9+6=16ns
 Clk2Out (through U1) delay = U8 +U1+U5+U6=2+5+9+6=22ns
 Clk2Out (through U2) delay = U8 +U2+U5+U6=2+5+9+6=22ns.
So out of this Clk2Out Delay is Maximum.
From the above study, you can conclude that the data can be stable at
the NAND gate after 7ns and that the maximum delay is 22ns. You might assume that this
information is sufficient for calculating the maximum clock frequency or minimum time
period, but that's not the case: our analysis is only half done.
As we have done in our previous example, we have to consider the path between 2
flip-flops also.
So the paths are:
From U1 to U2 (Reg1Reg2)
Path delay= 2ns (Delay of U8) + 5ns (Tclk2Q of U1)+7ns (Delay of
U4)+3ns (Setup of U2) – 2ns (Delay of U8)=17ns-2ns=15ns

From U2 to U1 (Reg2Reg1)
Path delay = 2ns (Delay of U8) + Tclk2Q of U2 (5ns) + Delay of U3 (8ns)
+setup of U1 (3ns) – Delay of U8
(2ns) =18ns -2ns = 16ns.
Note:
 I am sure you will ask why I subtracted the "Delay of U8" in the above
calculation: because the delay of U8 is common to both the launch and capture
paths (in case you want to know what the launch and capture paths are, please follow
this post), we are not supposed to add this delay in our calculation. Just to make it
clear, I first added it as per the previous logic and then subtracted it again.
So now if you want to calculate the maximum clock frequency then you have to
consider all the delay which we have discussed above.
So Max Clock Freq = 1/ Max (Reg1Reg2, Reg2Reg1, Clk2Out_1, Clk2Out_2,
Pin2Pin)
= 1/ Max (15, 16, 22, 22, 16)
=1/22 =45.5MHZ.

Example 5: Circuit with Different kind of Timing paths with Register at Input
and output ports:

In this example, we have just added 2 FFs: U8 at the input pin and U9 at the output pin.
Now for this circuit, if we want to calculate the maximum clock frequency, the approach
is similar to example 1.
There are 7 Flip flop to flip flop paths
1. U8 -> U4 -> U2
Delay = 5ns+7ns+3ns=15ns
2. U8 -> U3 -> U1
Delay = 5ns+8ns+3ns=16ns
3. U8 -> U5 -> U9
Delay = 5ns+9ns+3ns=17ns
4. U1 -> U4 -> U2
Delay = 5ns +7ns +3ns = 15ns
5. U1 -> U5 -> U9
Delay= 5ns+9ns+3ns=17ns
6. U2 -> U5 -> U9
Delay=5ns+9ns+3ns=17ns
7. U2 -> U3 -> U1
Delay=5ns+8ns+3ns=16ns
Since the maximum path delay is 17ns,
the minimum clock period for the circuit should be Tmin = 17 ns
and the maximum clock frequency is 1/Tmin = 58.8 MHz.
SETUP AND HOLD VIOLATIONS
EDA tools usually take care of these violations, but you still have to provide the input
(the proper switches) to fix them. In other words, "timing/routing tools are intelligent
enough to solve most timing violations, but tools will never be more intelligent than the
human brain".
There are different ways to fix these issues, and every way has its own rationale. So
designers should know what exactly the reason for an issue is and what the different
methods (priority-wise), or at least the different EDA switches, are for fixing those
violations.
In this series we will discuss the following things one by one.
Basics of fixing the SETUP and HOLD violations:
 more examples, very little theory;
 a few shortcuts/formulas/tricks to find out whether these violations are fixable
or not, and if fixable, a rough idea of where and how.
Different ways to fix them:
 the basics, or say the physics/engineering, behind using each method;
 which method is good and in what scenario you can use it.
Example 1:

Let’s discuss the flow of the data from FF1 to FF2


 Data is going to launch from FF1 at +ve Clock Edge at 0ns and it will reach
to FF2 after 0.5ns (combinational logic delay only).
 This data is going to capture at FF2 at +ve Clock Edge at 10ns.
 As per the Setup definition, data should be stable 2ns (Setup time of FF2) at
FF2 before the +ve Clock Edge (which is at 10ns)
In the above case the data becomes stable 9.5ns before the clock edge at 10ns
(10ns – 0.5ns). That means it satisfies the setup condition: NO SETUP VIOLATION.
 At the FF1 – second set of data is going to launch at t=10ns and it will
reach the FF2 in another 0.5ns, means at t=10.5ns.
 This second set of data is going to update/override the first set of data.
 As per the Hold Definition, data should be stable till 1ns (Hold time of FF2)
at FF2 after the clock edge (which is at t=10ns)

In the above case the first set of data is going to be overridden by the second set of data
at 10.5ns (i.e. just 0.5ns after the +ve clock edge at FF2). This means it is not
satisfying the hold condition: HOLD VIOLATION.
To fix this Hold violation – we have to increase the delay of the Data path so that the
second set of data should not reach before t=11ns (10ns+1ns). That means the
minimum delay of the Combinational Logic Path should be 1ns for NO HOLD
VIOLATION.
That means if you want to fix the HOLD violation, you can increase the Delay of the
Data path by any method.
But it doesn’t mean that you can increase the Delay by any Value. Let’s assume that
you have increased the delay of combinational path by adding extra buffer (with delay
of 8.5ns).
Now new specifications are

As per the Setup definition, data should be stable 2ns (Setup time of FF2)
before the Clock Edge (at FF2 which is at 10ns) and with the updated specification –
data will be stable at t=9ns, just 1ns before the Clock edge at t=10ns at FF2. That
means it is not satisfying the Setup condition. SETUP VIOLATION.
 Since Data path delay is more than 1ns, there is NO HOLD VIOLATION
(just we have discussed few paragraph above)
 So it means that if we want to fix the setup violation, the delay of the
combinational path should not be more than 8ns (10ns – 2ns); that is, 8ns is
the maximum value of the delay of the combinational logic path for NO
SETUP VIOLATION.
 So we can generalize this – For NO HOLD and SETUP VIOLATION, the
delay of the path should be in between 1ns and 8ns. OR
For Violation free Circuit:
Min delay of Combinational path > Hold time of Capture FF.
Max delay of Combinational path < Clock Period - Setup time of Capture FF.

Flow of the data from FF1 to FF2:
 Data is going to launch from FF1 at Clock Edge at 0ns and it will reach to FF2
after 0.5ns (combinational logic delay only).
 This data is going to capture at FF2 at Clock Edge at 10ns.
 As per the Setup definition, data should be stable 6ns (Setup time of FF2)
before the Clock Edge (which is at 10ns)
 In the above case – data become stable 9.5ns before the Clock edge at 10ns
(10ns – 0.5ns). That means it satisfy the Setup condition. NO SETUP
VIOLATION.
At FF1, the second set of data is going to launch at t=10ns and it will reach FF2
in another 0.5ns, i.e. at t=10.5ns.
This second set of data is going to update/override the first set of data.
As per the Hold Definition, data should be stable till 5ns (Hold time of FF2) after the
clock edge (which is at t=10ns) at FF2
In the above case the first set of data is going to be overridden by the second set of
data at 10.5ns (just 0.5ns after the clock edge at FF2). This means it is not
satisfying the hold condition: HOLD VIOLATION.
To fix this Hold violation – (As per the previous example) we may increase
the delay of the Data path, so that the second set of data should not reach before
t=15ns (10ns+5ns). That means the minimum delay of the Combinational Logic Path
should be 5ns for NO HOLD VIOLATION.
But now if you verify the setup condition once again (with a
combinational delay of 5ns, which we assumed for fixing the hold violation), you
find that the data becomes stable only 10ns - 5ns = 5ns
before the clock edge at t=10ns. But as per the setup condition the data should be stable
6ns before the edge, so the setup condition is no longer satisfied: SETUP
VIOLATION.
So in this scenario, we can’t fix the setup and hold violation at the same time by
adjusting the delay in the combinational logic.
You can also see it directly with the help of minimum and maximum value of
combinational delay.
Min delay > Hold time of Capture FF (means 5ns)
Max Delay < Clock Period – Setup time of capture FF (Means 10ns – 6ns = 4ns)
So Min delay > 5ns and Max Delay < 4ns which is not possible.
Now the point is how to fix these violations. Actually this is a non-fixable issue unless
you change the clock frequency or replace the FF with one having smaller setup/hold
values. Let me explain this.
The min delay depends only on the hold time, which is fixed for a particular FF. The max
delay depends on two parameters, clock period and setup time, where the setup
time is fixed for a particular FF. So if you can change to an FF with smaller setup/hold
values, you can fix this issue; if that is not possible, then we have to
change the clock period.
In case we are changing the clock period:
Keep Min delay >= 5ns (no HOLD violation).
The setup violation is by 6ns - 5ns = 1ns (6ns = setup time and 5ns = combinational
delay). What if we increase the clock period by 1ns,
i.e. the new clock period is at least 11ns?
So for a clock period of 11ns:
Max delay <= Clock period (11ns) – Setup time (6ns) = 5ns.
Now Max Delay = Min Delay = 5ns (neither hold nor setup violation).
We can generalize-
For Violation Free Circuit
Clock Period >= Setup time + Hold time.
Summary:
Min delay of Combinational path > Hold time of Capture FF.
Max delay of Combinational path < Clock Period - Setup time of Capture FF.
Clock Period >= Setup time + Hold time.

Clock Period Condition: (Satisfied)
Setup time +Hold time = 5ns
Clock period = 10ns
Clock Period > Setup time +Hold time (10> 5)
Min delay / Hold Condition: (Satisfied)
Combinational Delay (5ns) > Hold time.
Means - NO HOLD VIOLATION
Max Delay / Setup Condition: (Satisfied)
Combinational delay (5ns) < Clock period (10ns) – Setup (3ns)
Means - NO SETUP VIOLATION.

Clock Period Condition: (Satisfied)


Setup time +Hold time = 4ns+3ns = 7ns
Clock period = 10ns
Clock Period > Setup time + Hold time (10 > 7)

Min delay / Hold Condition: (Satisfied)


Combinational Delay (8ns) > Hold time (3ns)
Means - NO HOLD VIOLATION

Max Delay / Setup Condition: (Not Satisfied)


Combinational delay (8ns) Is Not Less Than “Clock period (10ns) – Setup (4ns)”
Means - SETUP VIOLATION.
Since we can't change this combinational delay, nor the setup time of the FF, we have to
think of something else: since we can't touch the data path, we can try the clock
path.

Flow of the data from FF1 to FF2:
Let’s assume that you have added one buffer of T_capture delay in the clock path
between the FF1 and FF2.
Data is going to launch from FF1 at Clock Edge at 0ns and it will reach to FF2 after
8ns (combinational logic delay only).
This data is going to capture at FF2 at Clock Edge at 10ns+T_capture. (because of
Delay added by Buffer).
As per the Setup definition, data should be stable 4ns (Setup time of FF2) before the
Clock Edge at FF2 and in the above case clock edge is at t=T_capture+10ns.
So, for No Setup violation:
=> 8ns (Combinational Delay) < T_capture+10ns (clock period) – 4ns (Setup Time of
FF2)
=> 12ns – 10ns < T_capture
=> T_capture > 2ns.
Let’s assume if my T_capture = 3ns. Then NO SETUP VIOLATION.
Now, recheck the Hold violation.
At the FF1 – second set of data is going to launch at t=10ns and it will
reach the FF2 in another 8ns, means at t=18ns.This second set of data is going to
update/override the first set of data present at FF2.
As per the Hold Definition, data should be stable till 3ns (Hold time of FF2)
after the clock edge at FF2 (Which is at t=10ns+3ns=13ns – where 3ns is the
T_capture).
That means Data should be remain stable till t=13ns+3ns=16ns.
In the above case the second set of data overrides the first only at
t=18ns, so the first set of data remains stable well beyond the required 16ns. Hence
NO HOLD VIOLATION.
Let me generalize this concept:
I am sure a few people may ask what will happen if we add the
buffer in the launch clock path instead. Let's discuss that; please consider the following
diagram, in which the launch clock path has a buffer with a delay of "T_launch" and the
capture clock path has another buffer of delay "T_capture".

Let’s understand the data flow from FF1 to FF2
 Data is going to launch from FF1 at Clock Edge at T_launch and it will reach
to FF2 after Td (combinational logic delay only) that means t= “Td +
T_launch”.
 This data is going to capture at FF2 at Clock Edge at “Clk_period +
T_capture”
 As per the Setup definition, data should be stable “T_setup” (Setup time of
FF2) time before the Clock Edge at FF2
Means data should reach at FF2 before t= “Clk_period + T_capture – T_setup”.
So For NO SETUP VIOLATION:
=> T_launch + Td < Clk_period + T_capture – T_setup
=> Td < Clk_Period + (T_capture - T_launch) – T_setup

At the FF1 – second set of data is going to launch at t= “Clk_Period + T_launch” and
it will reach the FF2 in another Td, means at t=” Clk_Period + Td + T_launch”.
This second set of data is going to update/override the first set of data present at FF2.
As per the Hold Definition, data should be stable till “T_hold” (Hold time of FF2)
time after the Clock edge (which is at t= “Clk_Period + T_capture”). Means Next set
of data should not reach FF2 before t= “Clk_Period + T_capture + T_hold”
So For NO HOLD VIOLATION:
=> Clk_Period + Td + T_launch > Clk_Period + T_capture + T_hold
=> Td > (T_capture - T_launch ) + T_hold.

On the basis of the previous discussion, let's start by checking a few conditions directly.
Clock Period Condition: (Satisfied)
Setup time +Hold time = 5ns
Clock period = 10ns
Clock Period > Setup time +Hold time (10> 5)

Min delay / Hold Condition: (Satisfied)


Combinational Delay (11ns) > Hold time.
Means - NO HOLD VIOLATION

Max Delay / Setup Condition:


Combinational delay (11ns) Is Not Less Than “Clock period (10ns) – Setup (3ns)”
Means - SETUP VIOLATION.
Since adding delay in the data path is not going to fix this violation, and we can't
reduce the combinational delay, we will, as discussed above, try the
clock path. As before, if T_capture is the delay of the buffer inserted
between CLK and the capture FF, and T_launch is the delay of the buffer
inserted between CLK and the launch FF, then

Max Delay /Setup condition is :
Td < Clock Period + (T_capture - T_launch) – T_setup
=> 11ns < 10ns – 3ns + (T_capture - T_launch)
=> 11ns < 7ns + (T_capture - T_launch)
=> 4ns < (T_capture - T_launch)
Now we can choose any combination of T_capture and T_launch such that
their difference is greater than 4ns.
Note: Remember, if you are fixing a violation by increasing or
decreasing the delay in the clock path, always prefer not to play too much with
this path. I prefer not to use T_launch in this case (for setup fixing, I avoid using
T_launch).
So let’s assume T_launch =0ns and T_capture = 5ns
Then
11ns < 7ns + 5ns means no Setup Violation.
Check once again the Hold condition.
Min delay / Hold Condition:
Td > (T_capture - T_launch ) + T_hold
=> 11ns > (T_capture - T_launch ) + T_hold
=> 11ns > 5ns + 2ns
=> 11ns > 7ns – Means No Hold Violation.

Let’s check the conditions directly.


Clock Period Condition (Satisfied):
Setup time +Hold time = 8ns
Clock period = 10ns
Clock Period > Setup time +Hold time (10ns > 8ns )
Means we can fix violations, if there is any.

Max Delay/ Setup Condition (Satisfied):
Td < Clk_Period + (T_capture - T_launch) – T_setup
Combinational Delay = 2ns
There is no delay in the clock path till now, so T_capture=T_launch=0ns
=> Td (2ns) < Clk_period (10ns) + 0ns – T_setup (3ns)
=> 2ns < 7ns – Means NO SETUP Violations

Min Delay / Hold Condition (Not Satisfied):


Td > (T_capture - T_launch ) + T_hold
Combinational Delay = 2ns
There is no delay in the clock path till now, so T_capture=T_launch=0ns
=> Td (2ns) is not greater than 0ns + T_hold (5ns)
Means HOLD VIOLATION.

Since we can't make changes in the data path, we have to touch the
clock path.
For Hold fixing -
=> Td > (T_capture - T_launch ) + T_hold
=> 2ns > (T_capture - T_launch ) + 5ns
=> -3ns > (T_capture - T_launch )

For satisfying the above equation, T_launch must be larger than T_capture by more than
3ns.
We can choose any combination of T_capture and T_launch that meets this.
Note: Remember, if you are fixing a violation by increasing or decreasing
the delay in the clock path, always prefer not to play too much with this
path.
I prefer not to use T_capture in this case (for hold fixing, I avoid using
T_capture).
So let’s assume T_capture =0ns and T_launch = 4ns
Then

T_launch + Td > 5ns (hold time)


=> 4ns +2ns > 5ns NO HOLD Violation.
Check once again the Setup Condition:

Td < Clock Period + (T_capture - T_launch) – T_setup
=> 2ns < 10ns + 0ns -4ns – 3ns
=> 2ns < 3ns Means No Setup Violation.

Note: (T_capture - T_launch) is also known as CLOCK SKEW, which is explained elsewhere in
this document. Right now it is mentioned just for your
information.

Note: this is the same example which we discussed in part-6a. Let's check all
the conditions one by one.

Clock Period Condition (Not Satisfied):


Setup time +Hold time = 11ns
Clock period = 10ns
Clock period is not greater than setup time + hold time.
This means we can't fix the violations, if there are any. But we will still try all the
other conditions, just to prove that the above condition must be true for fixing
the violations.

Max Delay/ Setup Condition (Satisfied):


Td < Clk_Period + (T_capture - T_launch) – T_setup
Combinational Delay = 0.5ns
There is no delay in the clock path till now, so T_capture=T_launch=0ns
=> Td (0.5ns) < Clk_period (10ns) + 0ns – T_setup (6ns)
=> 0.5ns < 4ns – Means NO SETUP Violations

Min Delay / Hold Condition (Not Satisfied):


Td > (T_capture - T_launch ) + T_hold
Combinational Delay = 0.5ns
There is no delay in the clock path till now, so T_capture=T_launch=0ns
=> Td (0.5ns) is not greater than 0ns + T_hold (5ns)
Means HOLD VIOLATION

If you want to fix the hold violation, we have already seen that it can't be fixed by
increasing/decreasing the delay in the data path; even if it could be, a
setup violation would then occur.
Let's try with T_capture or T_launch, i.e. by adding delay in the clock circuit.
As per the above equations/conditions and the corresponding values:

Max Delay/ Setup Condition :


Td < Clock Period + (T_capture - T_launch) – T_setup
=> Td < 10ns -6ns + (T_capture - T_launch)
=> Td < 4ns + (T_capture - T_launch)

Min Delay / Hold Condition:


Td > (T_capture - T_launch ) + T_hold
=> Td > (T_capture - T_launch ) + 5ns
Remember that all 3 variables Td, T_capture, T_launch are positive numbers.
Possible values of (T_capture - T_launch) = +/-A (where A is a positive number)

Case 1: (T_capture - T_launch) = +A


=> Td < 4ns+A - Condition (a)
=> Td> 5ns+A – Condition (b)
Satisfying both the conditions (“a” and “b” ) not possible for any +ive value of A.

Case 2: (T_capture - T_launch) = -A


=> Td < 4ns - A => Td + A < 4ns - Condition (a)
=> Td > 5ns - A => Td + A > 5ns - Condition (b)
Satisfying both the conditions ("a" and "b") is not possible for any positive value of A.
That means I have shown that if the following condition is not satisfied,
then you can't fix this type of violation by increasing/decreasing delay in either the
data path or the clock path.

Clock Period Condition:


Clock period > Setup time + Hold Time
For fixing any type of violation (without changing Clock period) - This condition
should be satisfied.
Max Delay/ Setup Condition:
Td < Clk_Period + (T_capture - T_launch) – T_setup
For Fixing the Setup Violation – Always prefer T_capture over T_launch

Min Delay / Hold Condition:


Td > (T_capture - T_launch ) + T_hold
For Fixing the hold Violation – Always prefer T_launch over T_capture.

Methods to Increase/Decrease the Delay of a Circuit (Effect of Wire Length on Slew)
Till now we have discussed the ideal scenario for a few of the cases, such as no
clock-to-Q delay and no net delay. Now we will consider those parameters as well.
First, understand (or revise) the different types or forms of delay in a circuit.
In FFs:
 Clock-to-Q delay: the propagation delay of a sequential flip-flop.
 Time taken to charge and discharge the output load (capacitance) at pin Q:
rise time and fall time delay.
Combinational Circuit:
Cell delay
 Delay contributed by Gate itself.
 Typically defined as 50% input pin voltage to 50% output voltage.
 Usually a function of Both Output Loading and Input Transition time.
 Can be divide into propagation delay and transition delay.
 Propagation delay is the time from input transition to completion of a
specific % (e.g 10%) of the output transition.
 Propagation delay is function of output loading and input transition time.
 Transition Delay is the time for an output pin to change the stage.
 Transition delay is function of capacitance at the output pin and can also be
a function of input transition time.
 Time taken to charge and discharge the output load (capacitance) of the
Cell output.

Net Delay:
 RC delay.
 A long wire has more delay than a short wire.
 More coupling means more delay.
 Now we will discuss different techniques to increase or decrease the delay in
the design. We will also discuss the basics behind each technique, which will
help us understand why we use any particular one.
 We have to see what best we can do to remove these violations or, as explained
earlier, how we can increase or decrease the delay of the clock or data path in
the design. If I asked you, you could probably tell me ten ways to do so.
 But I don't want to explain it that way. Let's start one by one with the basics,
and at the end I will summarize all those points.
Let's talk about the Transition delay first. There are two types of transition delays,
Rise delay and Fall delay. In terms of definition:
Rise Time Delay (tr): The time required for a signal to transition from 10% of its
maximum value to 90% of its maximum value.
Fall Time Delay (tf): The time required for a signal to transition from 90% of its
maximum value to 10% of its maximum value.
Basically these times (rise time and fall time) are related to the capacitance
charging and discharging time.
So when the capacitance is charging because of a change in the input voltage, the
time taken by the capacitance to go from 10% to 90% of its maximum value is known
as the rise time. Since this time introduces a delay in the circuit in comparison to the
ideal scenario (where the capacitance charging time is zero, i.e. it can charge
instantly), it is also known as the Rise Time Delay.
Similarly, during the discharging of the capacitance from 90% to 10% of its
maximum value, one more delay is added, known as the Fall Time Delay.
The following figure is just an example of rise time and fall time.
Note: Transition time is also known as Slew.
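
As a rough illustration of how the slew relates to R and C (my own sketch, assuming a simple first-order RC charging model rather than any real cell model), the 10%-90% rise time works out to about 2.2*RC:

import math

def rise_time_10_90(r_ohms, c_farads):
    # V(t) = Vdd*(1 - exp(-t/RC)); time from 10% to 90% = RC*ln(0.9/0.1) = ln(9)*RC ~ 2.2*RC
    return math.log(9.0) * r_ohms * c_farads

# Example: 1 kOhm driving 50 fF  ->  ~0.11 ns
print(rise_time_10_90(1e3, 50e-15) * 1e9, "ns")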

So we can say that the capacitance (and the associated resistance) is the culprit. And if
we can play with the capacitance/resistance, we can increase or decrease the Transition
Delay.
Now, whenever we are talking about any signal which is changing its state from "0"
to "1" or from "1" to "0", we are sure that it can't be ideal (ideal meaning it changes
its state in zero time).
Every "state changing signal" has a slew number (the common name for rise time and
fall time) associated with it at any given point of time.
In the below figure you can observe how the step waveform (consider this the ideal
one) degrades from the start to the end of the wire (the color coding can help you
understand), and this results in a considerable amount of delay for long wires. That
means if the wire length is less, the degradation of the waveform will be less, meaning
less effective delay, and vice versa. We can conclude from this:
"If we want to increase the delay, we can increase the wire length, and vice versa".

I am sure you can cross-question me about why this degradation is happening.
The simple answer is: you can model a wire as a series resistance and capacitance (RC)
network. For more detail please refer to the post Interconnect Delay Models.
Note: This delay is also known as Net delay / Wire delay / Interconnect delay.
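
A rough sketch of the idea (my own illustration, assuming a simple Elmore / lumped-RC model of the wire, not taken from the referenced post): splitting a wire into N identical RC segments shows the delay growing roughly quadratically with wire length.

def elmore_delay(r_seg, c_seg, n_segments):
    # Elmore delay of an n-segment RC ladder: each capacitor k is charged through
    # the resistance of the first k segments, so delay = sum_k (k*r_seg)*c_seg.
    return sum(k * r_seg * c_seg for k in range(1, n_segments + 1))

print(elmore_delay(10, 5e-15, 10))   # shorter wire
print(elmore_delay(10, 5e-15, 20))   # ~2x longer wire -> roughly 4x the delay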

Methods for Increasing/Decreasing the Delay of a Circuit (Effect of Size of the Transistor on the Slew)
Size of transistor:
There are two parameters, Width and Length, by which you can decide the size of the
transistor. For a particular technology, the channel length is almost constant, so it is
the Width that decides the size of the transistor. The below figure will refresh your
memory about which parameter I am talking about.

If you want to increase the width of the transistor, then you have two options.
One: just increase the width directly. Two: connect multiple transistors
in parallel in such a way that their effective impact remains the same. For example, if
you want to manufacture a transistor with a width of 20um and a length of 0.2um, then
it's similar (not exactly the same) to having four transistors connected in parallel, each
with a width of 5um and a length of 0.2um. I am not going to discuss here the
difference between the two layout representations.
The below figure will refresh your memory.

Now, since we are talking about the transition time / transition delay / slew, we know
that it depends on the capacitance and resistance. So before we start to discuss how the
width (i.e. the size of the transistor) impacts the transition delay, we should know
what capacitances are associated with the transistor. The below diagram helps you with
that.

How the capacitances are calculated

Note: the above formulas are taken from the book "CMOS Logic Circuit Design" by
J.P. Uyemura (2002 edition).
Now from the above, you can see that the Gate Capacitance (which has three
components: Gate to Base, Gate to Source and Gate to Drain) depends on the
Width of the channel (W). It means that if you increase the width, the Gate Capacitance
will increase, and vice-versa.
The Source and Drain Capacitances have multiplying factors As and Ad (which are
equivalent to WxLs and WxLd). It means the source and drain capacitances also
increase with the width of the channel, and vice-versa.
Now let's talk about the Resistance. The below resistance formula is for an NMOS;
you can derive a similar formula for a PMOS (just replace subscript "n" with "p").

Here the resistance is inversely proportional to the width of the transistor.
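
A small back-of-the-envelope sketch of these two dependencies (the per-width constants r_unit and c_unit below are hypothetical, not library data): the on-resistance falls as 1/W while the device's own capacitance grows with W.

def device_rc(width_um, r_unit=10e3, c_unit=2e-15):
    # r_unit and c_unit are illustrative per-width constants, not real process values.
    r_on = r_unit / width_um      # on-resistance scales as 1/W
    c_self = c_unit * width_um    # gate + source/drain capacitance scales with W
    return r_on, c_self

for w in (0.5, 1.0, 2.0, 4.0):
    r, c = device_rc(w)
    print(f"W={w}um  Ron={r/1e3:.1f}kOhm  Cself={c*1e15:.1f}fF")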


Effect of Device Size on the Slew (Transition Time) and Propagation Delay:
I can't describe the effect of transistor size on the slew in a single line because it's not
straightforward (I know you might doubt my statement). There are some other factors
which we have to consider. I hope the paragraphs below help you understand the same.

 Capacitance Cgd12 is the gate-to-drain overlap capacitance of M1 and M2 of the
driving gate A.
 Cdb1 and Cdb2 are the diffusion capacitances due to the reverse-biased pn-junctions.
 Cw is the wiring capacitance (parallel-plate, fringe, and interwire) that depends on the
length and width of the connecting wire. It is a function of the fanout of the
gate and the distance to those gates.
 Cg3 and Cg4 are the gate capacitances of the fanout (driven) gate.
If we increase the size of the transistor (the width of the transistor), its current
carrying capability increases. In other words, "the larger the size of a transistor, the
larger the driving capability (the ability to source or sink current) of the transistor".
Thus a larger transistor would normally make its output transition faster (when the
output load is constant). The output load of a driving gate consists of the source/drain
capacitance of the driving gate, the routing capacitance of the wire, and the gate
capacitance of the driven gate.
The larger the output load, the longer the time to charge or discharge it.
This would increase the transition (rise or fall) time and the propagation delay.
Let me summarize a few important points.
On increasing the size of Gate A:
 On-resistance decreases (R is inversely proportional to W).
 This means a larger driving capability (ability to source or sink current).
 This decreases the time to charge the output load (capacitance), which consists of
the source/drain capacitance of the driving gate, the routing capacitance of the
wire, and the gate capacitance of the driven gate. **
Meaning the output transition time of Gate A and the input transition time for Gate B
decrease.
I am sure you have noticed that I have marked the third point with ** because
there are terms and conditions. :)
On increasing the size of Gate A, the source/drain capacitance also
increases, and it is part of the output load of Gate A.
This means it is going to increase the output load. So, as mentioned in the third point,
the improvement is only possible when the S/D capacitance of the driving gate does
not dominate the rest of the capacitance, which is the case only when either the
"net capacitance is large" (the wire is long), or the "size of the driven gate (Gate B)
is large" (which increases the gate capacitance of Gate B), or both.
So for minimizing the propagation delay, a fast gate/cell is required, which is only
possible by:
1. Keeping the output capacitance CL small (it decreases the charging and
discharging time). For this:
 Minimize the area of the drain pn junctions (decrease W).
 Minimize the interconnect capacitance (decrease the wire/net length).
 Avoid large fan-out, i.e. minimize the gate capacitance of the driven cell
(decrease W of the driven cell).
2. Decreasing the equivalent resistance of the transistors:
 Decrease L (for a particular technology node it's almost constant).
 Increase W.
But this increases the pn junction area and hence CL.
So if we want to use the size of the transistor as one of the parameters to
increase/decrease the propagation/transition delay, then we should have an
understanding of the design; it also depends on the properties of the driven cell
and on the net length.
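
The trade-off summarized above can be sketched numerically. In the toy model below (all constants are illustrative assumptions, not library values), upsizing the driver by a factor S lowers its resistance to R0/S but adds S*Cself of self-loading; the delay keeps improving only while the wire and driven-gate capacitances dominate the load.

def stage_delay(size, r0=10e3, c_self=1e-15, c_wire=5e-15, c_gate=4e-15):
    # Upsizing by 'size' lowers the driver resistance but adds its own self-loading.
    r_drv = r0 / size
    c_load = size * c_self + c_wire + c_gate
    return 0.69 * r_drv * c_load   # ~RC delay to 50% of the swing

for s in (1, 2, 4, 8, 16):
    print(f"size x{s:<2}  delay = {stage_delay(s) * 1e12:.1f} ps")
# The delay improves while c_wire + c_gate dominate, and flattens out once the
# driver's own size*c_self starts to dominate the output load.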

A few last points:
1. "Delay reduces with increase in input transition and constant load capacitance."
2. "Delay increases with increase in output capacitance and constant input transition."
This is because on increasing the output capacitance, the charging and discharging
time will increase.
ELECTROMIGRATION
Electromigration (EM) is generally considered to be the result of momentum
transfer from the electrons due to high current density. Atoms get displaced from
their original position, causing voids (opens) and hillocks (shorts) in the metal layer.
Joule heating also accelerates EM because higher temperatures cause a higher
number of metal ions to diffuse. Under extreme Joule heating, melting can occur.
[Figures: EM causing opens; EM causing shorts; Cell EM]
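
For a flavor of how an EM check works in practice, here is a minimal sketch (the current-density limit and wire dimensions are hypothetical numbers, not from any real technology file): the average current density in a wire segment is compared against a maximum allowed value.

def em_check(i_avg_amp, width_um, thickness_um, j_max_ma_per_um2=1.0):
    # Average current density in mA/um^2 compared against a hypothetical limit.
    j = (i_avg_amp * 1e3) / (width_um * thickness_um)
    return j <= j_max_ma_per_um2, j

ok, j = em_check(i_avg_amp=2e-3, width_um=0.4, thickness_um=0.2)
print(f"J = {j:.1f} mA/um^2 ->", "OK" if ok else "EM violation")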

DIFFERENT TYPES OF FILE FORMATS AND
THEIR MEANINGS IN VLSI..

There are different types of files generated during a design cycle, or data
received from the library vendor/foundry. A few of them have a specific extension;
just by knowing the extension, you can easily identify the type of content in that file.
File extensions:

*.ddc - Synopsys internal database format. This format is recommended by
Synopsys to hand off gate-level netlists.

*.v - Verilog source file. Normally it's a source file you write. Design
Compiler and IC Compiler can use this format for the gate-level netlist.

*.vg, .g.v - Verilog gate-level netlist file. Sometimes people use these file
extensions to differentiate source files from gate-level netlists.

*.svf - Automated setup file. This file helps Formality process design changes
caused by other tools used in the design flow. Formality uses this file to assist the
compare point matching and verification process. This information facilitates
alignment of compare points in the designs that you are verifying. For each
automated setup file that you load, Formality processes the content and stores the
information for use during the name-based compare point matching period.

*.vcd - Value Change Dump format. This format is used to save signal
transition trace information. It is a text format, therefore the trace file
can get very large quickly. There are tools like vcd2vpd, vpd2vcd,
and vcd2saif that switch back and forth between the different formats.

*.vpd - VCD Plus. This is a proprietary compressed binary trace format from
Synopsys. This file format is used to save signal transition trace information as
well.

*.saif - Switching Activity Interchange Format. It's another format to save
signal transition trace information. SAIF files support signals and ports for
monitoring as well as constructs such as generates, enumerated types, records,
array of arrays, and integers.

*.tcl - Tool Command Language (Tcl) scripts. Tcl is used to drive Synopsys
tools.
*.sdc - Synopsys Design Constraints. SDC is a Tcl-based format. All commands
in an SDC file conform to the Tcl syntax rules. You use an SDC file to
communicate the design intent, including timing and area requirements between
EDA tools.
An SDC file contains the following information: SDC version, SDC units, design
constraints, and comments.
*.lib - Technology Library source file. Technology libraries contain information
about the characteristics and functions of each cell provided in a semiconductor
vendor’s library.
Semiconductor vendors maintain and distribute the technology libraries. In
our case the vendor is Synopsys. Cell characteristics include information such as
cell names, pin names, area, delay arcs, and pin loading. The technology library
also defines the conditions that must be met for a functional design (for example,
the maximum transition time for nets). These conditions are called design rule
constraints. In addition to cell information and design rule constraints,
technology libraries specify the operating conditions and wire load models
specific to that technology.
*.db - Technology Library. This is a compiled version of *.lib in Synopsys
database format.

*.plib - Physical Library source file. Physical libraries contain process
information and physical layout information of the cells. This information is
required for floor planning, RC estimation and extraction, placement, and
routing.
*.pdb - Physical Library. This is a compiled version of *.plib in Synopsys
database format.

*.slib - Symbol Library source file. Symbol libraries contain definitions of the
graphic symbols that represent library cells in the design schematics.
Semiconductor vendors maintain and distribute the symbol libraries. Design
Compiler uses symbol libraries to generate the design schematic.
You must use Design Vision to view the design schematic. When you generate
the design schematic, Design Compiler performs a one-to-one mapping of cells
in the netlist to cells in the symbol library.

*.sdb - Symbol Library. This is a compiled version of *.slib in Synopsys
database format.

*.sldb - DesignWare Library. This file contains information about DesignWare
libraries.

*.def - Design Exchange Format. This format is often used in Cadence tools to
represent physical layout. Synopsys tools normally use Milkyway format to save
designs.

*.lef - Library Exchange Format. Standard cells are often saved in this format.
Cadence tools also often use this format. Synopsys tools normally use Milkyway
format for standard cells.

*.rpt - Reports. This is not a proprietary format; it's just a text format in which
the tools save generated reports when you use the automated makefiles and
scripts.

*.tf - Vendor Technology File. This file contains technology-specific
information such as the names and characteristics (physical and electrical) of each
metal layer, and the design rules. This information is required to route a design.

*.itf - Interconnect Technology File. This file contains a description of the
process cross-section and connectivity. It also describes the thicknesses
and physical attributes of the conductor and dielectric layers.

*.map - Mapping file. This file aligns names in the vendor technology file with
the names in the process *.itf file.

*.tluplus - TLU+ file. These files are generated from the *.itf files. TLUPlus
models are a set of models containing advanced process effects that can be used
by the parasitic extractors in Synopsys place-and-route tools for modeling.
*.spef - Standard Parasitic Exchange Format. File format to save parasitic
information extracted by the place and route tool.

*.sbpf - Synopsys Binary Parasitic Format. A Synopsys proprietary
compressed binary format of the *.spef. Size of the file shrinks quite a bit using
this format.

*.mw - Milkyway database. The Milkyway database consists of libraries that
contain information about your design. Libraries contain information about
design cells, standard cells, macro cells, and so on. They contain physical
descriptions, such as metal, diffusion, and polygon geometries.
Libraries also contain logical information (functionality and timing
characteristics) for every cell in the library. Finally, libraries contain technology
information required for design and fabrication.
Milkyway provides two types of libraries that you can use: reference
libraries and design libraries. Reference libraries contain standard cells and hard
or soft macro cells, which are typically created by vendors. Reference libraries
contain physical information necessary for design implementation.
Physical information includes the routing directions and the placement
unit tile dimensions, which is the width and height of the smallest instance that
can be placed. A design library contains a design cell. The design cell might
contain references to multiple reference libraries (standard cells
and macro cells). Also, a design library can be a reference library for another
design library.
The Milkyway library is stored as a UNIX directory with
subdirectories, and every library is managed by the Milkyway Environment. The
top-level directory name corresponds to the name of the Milkyway library.
Library subdirectories are classified into different views containing the
appropriate information relevant to the library cells or the designs. In a Milkyway
library there are different views for each cell, for example, NOR1.CEL and
NOR1.FRAM. This is unlike a .db formatted library where all the cells are in a
single binary file. With a .db library, the entire library has to be read into
memory. In the Milkyway Environment, the Synopsys tool loads the library data
relevant to the design as needed, reducing memory usage.
The most commonly used Milkyway views are CEL and FRAM. CEL is the full
layout view, and FRAM is the abstract view for place and route
operations.

simv - Compiled simulator. This is the output of vcs. In order to simulate, run
the simulator by ./simv at the command line.

alib-52 - Characterized target technology library. A pseudo library which has
mappings from Boolean functional circuits to actual gates from the target library.
This library provides Design Compiler with greater flexibility and a larger
solution space to explore tradeoffs between area and delay during optimization.

DIFFERENT TYPES OF CELLS IN VLSI PHYSICAL DESIGN

Tap cell, Decap cell and end cap cells


Well Tap Cells
These library cells connect the power and ground rails to the n-wells and the
substrate, respectively. By placing well taps at regular intervals throughout the
design, the n-well potential is held constant for proper electrical functioning.
The placer places the cells in accordance with the specified distances and
automatically snaps them to legal positions (which are the core sites).
End Cap Cells
These library cells do not have signal connectivity. They connect only to
the power and ground rails once power rails are created in the design. They also
ensure that gaps do not occur between the well and implant layers. This prevents
DRC violations by satisfying well tie-off requirements for the core rows. Each
end of the core row, left and right, can have only one end cap cell specified.
However, you can specify a list of different end caps for inserting
horizontal end cap lines, which terminate the top and bottom boundaries of
objects such as macros. A core row can be fragmented (contains gaps), since
rows do not intersect objects such as power domains. For this, the tool places end
cap cells on both ends of each unfragmented segment.
Decap cells:
Decap cells are temporary capacitors added in the design between the power and
ground rails to counter functional failures due to dynamic IR drop.
Dynamic IR drop happens at the active edge of the clock, at which a high
percentage of sequential and digital elements switch. Due to this simultaneous
switching, a high current is drawn from the power grid for a small duration. If the
power source is far away from a flop, the chances are that this flop can go into a
metastable state due to IR drop. To overcome this, decaps are added. At an
active edge of the clock, when the current requirement is high, these decaps discharge
and provide a boost to the power grid. One caveat in the usage of decaps is that they
add to the leakage current. Decaps are placed as fillers; the closer they are to the
flops (sequential elements), the better. Decap cells are typically poly gate
transistors where the source and drain are connected to the ground rail, and the gate
is connected to the power rail.
When there is instantaneous switching activity, the required charge
moves from intrinsic and extrinsic local charge reservoirs as opposed to the voltage
sources. Extrinsic capacitances are the decap cells placed in the design. Intrinsic
capacitances are those present naturally in the circuit, such as the grid
capacitance, the variable capacitance inside nearby logic, and the neighborhood
loading capacitance exposed when the P or N channels are open.
One drawback of decap cells is that they are very leaky, so the more decap
cells, the more leakage. Another drawback, which many designers ignore, is the
interaction of the decap cells with the package RLC network. Since the die is
essentially a capacitor with very small R and L, and the package is a huge RL
network, the more decap cells placed, the more chance of tuning the circuit into
its resonance frequency. That would be trouble, since both VDD and GND will
be oscillating. I have seen designs fail because of this. Designers typically place
decap cells near high activity clock buffers, but I recommend a decap
optimization flow where tools study charge requirements at every moment in
time and figure out how much decap to place at any node. This should be done
while taking package models into account to ensure the resonance frequency is not
hit.
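
A first-order way to size the local decap (a rough sketch under the usual charge-balance assumption, with purely illustrative numbers) is to require that the charge drawn during the switching event, I*dt, does not droop the rail by more than the allowed dV:

def decap_needed(i_switch_amp, duration_s, allowed_droop_v):
    # C = Q / dV, where Q = I * dt is the charge drawn during the switching event.
    return i_switch_amp * duration_s / allowed_droop_v

c = decap_needed(i_switch_amp=50e-3, duration_s=100e-12, allowed_droop_v=0.05)
print(f"~{c * 1e12:.0f} pF of local decap needed")   # ~100 pF for these numbers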

Designing a robust clock tree structure

Clock tree synthesis (CTS) is at the heart of ASIC design and clock
tree network robustness is one of the most important quality metrics of SoC design.
With the technology advancements that have happened over the past decade and a half,
clock tree robustness has become an even more critical factor affecting SoC performance.
Conventionally, engineers focus on designing a symmetrical clock tree with
minimum latency and skew. However, with the current complex
design needs, this is not enough.
Today, SoCs are designed to support multiple features. They have
multiple clock sources and user modes, which makes the clock tree architecture
complex. Merging test clocking with functional clocking, together with lower technology
nodes, adds to this complexity. Due to the increase in derate numbers and additional
timing signoff corners, timing margins are shrinking.
To meet the current requirements, designs are needed that are timing friendly and
provide minimum power dissipation. This article describes the factors
which a designer should consider while defining the clock tree architecture. It presents
some real design examples that illustrate how current EDA tools or conventional
methodologies to design clock trees are not sufficient in all cases. A designer has to
understand the nitty-gritty of clock tree architecture to be able to guide an EDA
tool to build a more efficient clock tree. First, the basics of CTS and the requirements for
a good clock tree are presented.
Clock tree quality parameters
The primary requirements for ideal synchronous clocks are:
1. Minimum Latency – The latency of a clock is defined as the total time that a
clock signal takes to propagate from the clock source to a specific register clock pin
inside the design. The advantages of building a clock with minimum latency are
obvious – fewer clock tree buffers, reduced clock power dissipation, less routing
resources and relaxed timing closure.
2. Minimum skew – The difference in arrival time of a clock at flip-flops is
defined as skew. Minimum skew helps with timing closure, especially hold timing
closure.
However, a word of caution: targeting an overly aggressive minimum
skew can be counterproductive, because it may not help in meeting hold timing but it
can end up creating other problems, like increasing the overall clock latency and
increasing the uncommon paths between registers in order to achieve the minimum skew.
(A small sketch after this list shows how latency and skew are measured from register
arrival times.)

3. Duty Cycle – Maintaining a good duty cycle for the clock network is another
important requirement. Many sequential devices, like flash, require minimum pulse
width on the input clock to ensure error-free operation. Moreover, many IO interfaces
like DDR and QSPI can work on both edges of the clock. A clock tree must be designed
with these considerations in mind, and symmetrical cells having similar rise/fall delays
should be used to build the clock tree.
4. Minimum Uncommon path - The logically connected registers must have
minimum uncommon clock path. Timing derates are applied to the clock path to
model process variations on the die. Using a standard timing derates methodology,
derates are applied only on uncommon path of launch and capture clock path because
it is unlikely that common clock paths can have different process variations in launch
and capture cycle. This concept is also called CRPR adjustment. The important
concept is that a clock path should have minimum uncommon path between two
connected registers.

5. Signal Integrity – Clock signals are more prone to signal integrity problems
because of high switching activity. To avoid the effect of noise and to avoid EM
violations, clock trees should be constructed using a DWDS (double width, double
spacing) rule. Increased spacing will help in minimizing the noise effect. Similarly,
increased width will help to avoid EM violations.
6. Minimum Power Dissipation – This is one of the most important quality
parameters of a clock tree. At the architecture level, clock gating is done at multiple
levels to save power, and certain things are expected to be done while building clock
trees, such as maintaining good clock transition, minimum latency, etc.
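
As referenced in the skew discussion above, here is a tiny sketch (hypothetical per-register arrival times, in ns) of how latency and skew are measured once a clock tree is built:

arrival = {"reg_a": 1.82, "reg_b": 1.91, "reg_c": 1.78, "reg_d": 1.95}  # ns, hypothetical

latency = max(arrival.values())                       # worst-case insertion delay
skew = max(arrival.values()) - min(arrival.values())  # global skew
print(f"max latency = {latency:.2f} ns, skew = {skew * 1e3:.0f} ps")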
EDA tool role in clock tree synthesis

Today, a lot of R&D has been done on EDA tools to design an ideal clock tree. The
CTS engines of these tools support most of the SoC requirements to design a robust
clock tree. These tools even generate clock spec definitions from the SDC (timing
constraint) files. A typical clock spec file includes:
 All clock sources information
 Synchronous/Asynchronous relationships between various clocks
 Through pins
 Exclude pins
 Clock pulling pushing information
 Leaf Pin
Going one level down in SoC to design an ideal clock tree
For most SoCs, the existing EDA tools are sufficient for the CTS engine to
generate an ideal clock tree. However, this is not always the case. The approach
presented above is suitable for SoCs or IPs which have few clock sources and
a simple clock architecture with minimum muxing of multiple clocks.
Today's microcontrollers generally don't have such a simple clock
architecture. Microcontrollers designed for the automotive world have multiple IPs
integrated into a single SoC. For example, a single SoC may have multiple cores and IO
peripherals like SPI, DSPI, LIN and DDR interfaces for multiple automotive control
applications. Considering human safety in automotive SoCs, testing requirements are
also very stringent in terms of test coverage, such as at-speed and stuck-at. This leads
to a very complex clocking architecture, because it requires multiple clock sources
(both on-SoC clock sources such as PLLs and IRC oscillators, and off-SoC clock sources
like EXTAL) and clock dividers in order to supply the required clock frequency to
multiple IPs within an SoC.
In such cases, CTS engines cannot be relied upon to build a clock tree. Due
to the complicated muxing of various clock sources in multiple functional and test
modes, EDA tools are sometimes not able to build the clock tree properly, often
resulting in increased latency, skew mismatch and huge uncommon clock path
problems.
In the next section, some real design case studies are used to illustrate how current
EDA tools might fail to build the clock tree as expected by the designer, and how a
backend engineer can help design a robust clock tree, either by providing proactive
feedback to the architecture designers, by improving the clock structure at the RTL
level itself, or by using better implementation techniques.
Case study 1 - Clock logic cloning
Suppose a clock tree is required for the following logic.
In functional mode there is one master clock source func and one generated clock
source gen_clk1. In test mode there is one test clock, tck1. In functional mode
register set 2 is clocked by gen_clk1 but in test mode, test clock tck1 is used instead.

The conventional way to define the clock tree spec for this design fragment would be
to define the master clock sources (func and tck1) and the generated clock (gen_clk1),
and to define a through pin for the generated clock source so as to balance the latency
of the master clock and the total latency of the generated clock (source latency to the
register clock pin plus latency from the flop output to register group 3). Defining a
through pin for the generated clock source ensures that a CTS engine does not
consider the generated clock flop as a sink pin and instead traces the clock path
through the CK->Q arc of the flop.

Assume that the latency of func clock while in functional mode is constrained by
register set2 (highlighted in red in figure 2). This will force a CTS engine to build the
generated clock source flop 2 with minimum latency. This will only be possible if
the minimum buffering is done from func clock source to mux1 input D0 as well as
from mux1 output to generated clock source gen_clk1. In order to balance the
latency of register set 1 and register set 2, the tool will be forced to insert buffering
between mux1 output and register set 1 clock pins. This implementation is correct in
functional mode but will cause problems in test mode.

Test mode CTS: The architecture of the design is such that test clock tck1 to all
register sets 1-3 can be built with very low latency. However due to functional
mode clocking constraints, as explained above, clock latency for the test clock
will be high. The latency for test mode will be constrained by register set 1 as
shown in above diagram by green line. This cannot be avoided because of the
need to balance register set1 and register set 2 latency in functional mode and
buffering can only be done after mux1 output because it is not possible to
increase the latency of the generated clock source. The consequence of this is that
there is no option other than to increase the latency of register set 2 and register
set 3 in test modes. This is a serious problem because the latency of the test clock
is increased because of functional mode clocking constraints. As discussed above
in robust clock tree guidelines, an increase in clock latency can lead to multiple
problems. This problem cannot be solved using advanced features like MMMC
(Multi-Mode Multi-Corner) CTS of current EDA tools.
Solution: The solution to the problem lies in cloning the clock logic as shown in
figure 3. EDA tools generally do not implement cloning of non-buffer logic in
the clock path network. The problem can be solved if there is a separate
dedicated clock mux for the generated clock source flop. The limitation of
placing clock buffers after the mux output has been removed, and for register set
1 the clock buffering to balance latency in functional mode can be done between the
func clock source and the cloned mux input D0. Since there is now only the bare
minimum buffering between the cloned mux output and register set 1, the
latency numbers in test mode are not limited by register set 1 and it is possible to
achieve minimum latency in test mode.

Case study 2 - Clock muxing of two synchronous clocks

In this example clock1 and clock2 are synchronous to each other. The
assumption is that the minimum latency of both clock1 and clock2 is not limited
by register groups 1-3, but by some other register group (not shown in the diagram).
A typical behavior of most CTS engines would be to insert clock buffers
after the mux output to register group 2 in order to save overall clock buffers.
However, this will introduce the problem of a larger uncommon path between register
group 1 and register group 2, as well as between register group 2 and register
group 3. A CTS engine is not intelligent enough to understand that the
architecture ensures that the mux select will not toggle on the fly, and that there cannot
be a case of launch on clock1 and capture on clock2 for register group 2. An
alternate approach for CTS for these types of cases is shown in figure 4(b), where
the clock buffers have been moved for both clock1 and clock2
before the clock mux in order to have a greater common clock path between
register sets 1-2 and register sets 2-3. Note that this is under the same assumption
that the latency of clocks clk1 and clk2 is limited by some register group other
than 1-3 and that the extra clock buffers were placed by the CTS engine after the clock
mux to balance skew requirements.
Case study 3 - Centralized vs decentralized clocking scheme
There is debate among designers about how to manage clock muxing and
clock divider logic for the SoC. Proponents of a centralized clocking scheme
argue that doing all clock muxing at a single place helps managing things in a
better way, while opponents question this approach citing timing issues that crop
up due to centralized muxing. Both possibilities will be considered.
Assume there are three IPs and one clock of 200 MHz frequency. The
design requirement says that both IP1 and IP2 require two synchronous 200 MHz
and 100 MHz clocks. Moreover, IP1 and IP2 can handshake data synchronously
both at 200 MHz and 100 MHz. Now, there are two options to implement a
clocking scheme that meets this requirement. First is to divide the 200 MHz
clock to generate the 100 MHz clock inside a centralized clocking module and
then provide both 200 MHz and 100 MHz clocks to both IPs. The second option
is to divide the 200 MHz clock separately in both IPs. In this scenario, option one
is the better option because IP1 and IP2 both need divided clocks and they are
exchanging data synchronously as well. If division is done independently in both
IPs, there is duplication of the divider logic and there is a chance of phase
mismatch in the divided clocks, and additional logic may be required to solve this
problem. In this case, a centralized clocking scheme is better than a decentralized
one, even though there may be some problem of uncommon path between the
200 MHz and 100 MHz clocks.

For IP3, a decentralized clocking scheme is the best approach. IP3 requires
200MHz, 100MHz and 50 MHz clocks and IP3 is exchanging data only with the
external world and not with any other IP within the SoC. In this case there is no
point placing the dividers in one centralized clocking block because it will
introduce uncommon path between all divided
clocks. The better option would be to divide the 200 MHz clock inside IP3 to
generate 100MHz and 50MHz clocks.

Figure 5: centralized and decentralized clocking


In summary, it might look tempting and convenient to keep all clock muxing and
dividing logic in one place, but in some cases it might introduce timing closure
problems.
The better approach is to analyze the impact of centralized/decentralized clocking
on a case-by-case basis and to take the appropriate decision after that analysis.

Case study 4 - Power vs. timing
A clock tree designer often has to choose between power and timing. One such
example is shown in figure 6. Different CTS engines can behave differently. The
first CTS solution prefers power saving over uncommon path, because when clock
gating is done, the maximum number of clock buffers stop toggling. The
second solution favors timing over power, as both register groups now have the
minimum uncommon path. A CTS designer must choose their preference
between power and timing on a case-by-case basis. Whatever the tool algorithm,
the clock spec can be modified to force the CTS tool to build the required
structure.

Case study 5 – Back-to-back clock gating cells


Many times, due to third-party IPs and logic synthesis clock gating insertion,
back-to-back clock gating cells may have been created. Because of this, the clock
latency to that register group can increase, since a clock gating cell typically has a
higher delay than a clock buffer. This can be rectified either by merging these
clock gating cells at the RTL level or, if that is not possible because of integration
and third-party IP issues, it can also be done during logic synthesis.
Most EDA tools doing logic synthesis have a feature to merge such back-to-back
clock gating cells, but the default is not to merge these clock gating cells in order
to preserve the RTL implementation. This feature can be used on a case-by-case
basis.

Recommendations and guidelines/experiments for designing clock trees
For a new design, when the clock tree is being constructed for the first time, it is
important to know the optimum latency and skew numbers. Some suggested
experiments for this include:
1. Build a clock tree with no skew balancing requirements. This will force the
CTS engine to build a clock tree to all registers at its lowest latency possible
without caring about skew balancing. The clock path of the register group having
the highest latency should be analyzed in detail because when that clock tree is
going to be built with skew requirements specified, this register group will
determine the latency of that whole clock group. Explore architectural
improvements that can be made to reduce the latency for this register group. This
exercise should be repeated for subsequent highest latency register groups until
no further latency improvements can be made.
2. After minimum latency has been established, skew numbers should be
targeted. Two or three experiments, with different skew numbers, should be
performed to see if overall latency is increasing in order to meet clock skew
requirements. Inappropriate clock buffer selection could be an issue. Skew
numbers should be double-checked. Very low skew numbers might look tempting,
but too aggressive skew numbers can increase the overall clock latency and can
increase peak power dissipation due to all flops toggling at the same time.

3. Another suggested way to target the uncommon path problem is to compare
timing reports between the pre-CTS and post-CTS stages. Ideally, the timing
status of a design should remain the same between the pre-CTS and post-CTS stages,
because the projected deterioration in the timing profile is already taken care of at
pre-CTS by extra clock skew and derate uncertainty. If timing violations are seen
after the post-CTS stage and a clock tree with respectable skew numbers has been
built, the culprit is probably a huge uncommon path between the launch and capture
registers. Root cause analysis of the uncommon path should be done to determine
if architectural improvements can be made to reduce the uncommon path.
Conclusion
The case studies, guidelines and experiments are neither compulsory
nor exhaustive enough to cover all aspects of an ideal clock tree. Moreover, there
are a lot of other issues such as signal integrity and clock gate ratio which have
not been considered. These could be important, particularly in smaller technology
nodes. This article should serve as an eye opener to change the perception of
how CTS activity is generally treated in our design cycle. With timing margins
shrinking, it has become very important to scrutinize the clock tree
architecture thoroughly and look for every single possibility of improving the clock
structure.

Library Exchange Format (LEF)
Library Exchange Format (LEF) is a specification for representing the physical
layout of an integrated circuit in an ASCII format. It includes design rules and
abstract information about the cells. LEF is used in conjunction with Design
Exchange Format (DEF) to represent the complete physical layout of an
integrated circuit while it is being designed.
It is an ASCII data format used to describe a standard cell library. It includes the
design rules for routing and the abstracts of the cells, with no information about the
internal netlist of the cells.
A LEF file contains the following sections:
Technology: layer, design rules, via definitions, metal capacitance
Site: Site extension
Macros: cell descriptions, cell dimensions, layout of pins and blockages,
capacitances.
The technology is described by the Layer and Via statements. The following
attributes may be associated with each layer:
Type: Layer type can be routing, cut (contact), masterslice (poly, active),
Overlap.
Width/pitch/spacing rules
direction
Resistance and Capacitance per unit square
Antenna Factor.
Layers are defined in process order from bottom to top
Poly masterslice
CC cut
Metal1 routing
Via cut
Metal2 routing
Via2 cut
Metal3 routing

Cut Layer definition
LAYER layerName
TYPE CUT ;
SPACING
Specifies the minimum spacing allowed between via cuts on the same net or
different nets. This value can be overridden by the SAMENET SPACING
statement (we are going to use this statement later)
END layerName
LAYER layerName
TYPE IMPLANT ;
SPACING minSpacing
END layerName
Defines implant layers in the design. Each layer is defined by assigning it a name
and simple spacing and width rules. These spacing and width rules only affect
the legal cell placements.
These rules interact with the library methodology, detailed placement, and filler
cell support.
Masterslice or Overlap Layer definition
LAYER layerName
TYPE {MASTERSLICE | OVERLAP} ;
Defines master slice (nonrouting) or overlap layers in the design. Masterslice
layers are typically polysilicon layers and are only needed if the cell MACROs
have pins on the polysilicon layer.
Routing Layer definition
LAYER layerName
TYPE ROUTING ;
DIRECTION {HORIZONTAL | VERTICAL} ;
PITCH distance;
WIDTH defWidth;
OFFSET distance ;
SPACING minSpacing;
RESISTANCE RPERSQ value ;

Specifies the resistance for a square of wire, in ohms per square. The resistance
of a wire can be defined as RPERSQ x (wire length / wire width).
CAPACITANCE CPERSQDIST value ;
Specifies the capacitance for each square unit, in picofarads per square micron.
This is used to model the wire-to-ground capacitance.
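
A small sketch of how these per-square values are applied (the numeric values below are placeholders, not real technology data): resistance uses RPERSQ x (length/width) and capacitance uses CPERSQDIST x area.

def wire_parasitics(length_um, width_um, rpersq_ohm=0.08, cpersqdist_pf=2e-4):
    # Placeholder per-square values; real numbers come from the LEF layer statement.
    r = rpersq_ohm * (length_um / width_um)      # ohms
    c = cpersqdist_pf * (length_um * width_um)   # pF
    return r, c

r, c = wire_parasitics(length_um=100, width_um=0.2)
print(f"R = {r:.1f} ohm, C = {c * 1e3:.2f} fF")   # 500 squares -> 40 ohm, 4 fF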
Manufacturing Grid
MANUFACTURINGGRID value ;
Defines the manufacturing grid for the design. The manufacturing grid is used for
geometry alignment. When specified, shapes and cells are placed in locations that
snap to the manufacturing grid.
Via
VIA viaName
DEFAULT
TOPOFSTACKONLY
FOREIGN foreign CellName [pt [orient]] ;
RESISTANCE value ;
{LAYER layer Name ;
{RECT pt pt ;} ...} ...
END viaName
Defines vias for usage by signal routers. Default vias have exactly three layers
used:
A cut layer, and two layers that touch the cut layer (routing or master slice). The
cut layer rectangle must be between the two routing or master slice layer
rectangles.
Via Rule Generate
VIARULE viaRuleName GENERATE
LAYER routingLayerName ;
{ DIRECTION {HORIZONTAL | VERTICAL} ;
OVERHANG overhang ;
METALOVERHANG metalOverhang ;
| ENCLOSURE overhang1 overhang2 ;}
LAYER routingLayerName ;
{ DIRECTION {HORIZONTAL | VERTICAL} ;
OVERHANG overhang ;

METALOVERHANG metalOverhang ;
| ENCLOSURE overhang1 overhang2 ;}
LAYER cutLayerName ;
RECT pt pt ;
SPACING xSpacing BY ySpacing ;
RESISTANCE resistancePerCut ;
END viaRuleName
Defines formulas for generating via arrays. Use the VIARULE GENERATE
statement to cover special wiring that is not explicitly defined in the VIARULE
statement.
Same-Net Spacing
SPACING
SAMENET layerName layerName minSpace [STACK] ; ...
END SPACING
Defines the same-net spacing rules. Same-net spacing rules determine minimum
spacing between geometries in the same net and are only required if same-net
spacing is smaller than different-net spacing, or if vias on different layers have
special stacking rules.
These specifications are used for design rule checking by the routing and
verification tools.
Spacing is the edge-to-edge separation, both orthogonal and diagonal.
Site
SITE siteName
CLASS {PAD | CORE} ;
[SYMMETRY {X | Y | R90} ... ;] (will discuss this later in macro definition)
SIZE width BY height ;
END siteName
Macro
MACRO macroName
[CLASS
{ COVER [BUMP]
| RING
| BLOCK [BLACKBOX]
| PAD [INPUT | OUTPUT |INOUT | POWER | SPACER | AREAIO]
| CORE [FEEDTHRU | TIEHIGH | TIELOW | SPACER | ANTENNACELL]
| ENDCAP {PRE | POST | TOPLEFT | TOPRIGHT | BOTTOMLEFT |
BOTTOMRIGHT}
}
;]
[SOURCE {USER | BLOCK} ;]
[FOREIGN foreignCellName [pt [orient]] ;] ...
[ORIGIN pt ;]
[SIZE width BY height ;]
[SYMMETRY {X | Y | R90} ... ;]
[SITE siteName ;]
[PIN statement] ...
[OBS statement] ...
Macro Pin Statement
PIN pinName
FOREIGN foreignPinName [STRUCTURE [pt [orient] ] ] ;
[DIRECTION {INPUT | OUTPUT [TRISTATE] | INOUT | FEEDTHRU} ;]
[USE { SIGNAL | ANALOG | POWER | GROUND | CLOCK } ;]
[SHAPE {ABUTMENT | RING | FEEDTHRU} ;]
[MUSTJOIN pinName ;]
{PORT
[CLASS {NONE | CORE} ;]
{layerGeometries} ...
END} ...
END pinName]
Macro Obstruction Statement
OBS
{ LAYER layerName [SPACING minSpacing | DESIGNRULEWIDTH value] ;
RECT pt pt ;
POLYGON pt pt pt pt ... ;
END

