Sta 9 1
STA
Lecturer: Gil Rahav
Semester B, EE Dept., BGU
Freescale Semiconductors Israel
[Figure: design verification flow: Testbench and Functional Simulation, Synthesis, Equivalence Checking in the gate-level domain, Sign Off]
Static verification:
- Verifies timing and functionality (STA and equivalence checking)
- Is exhaustive
- Uses formal, mathematical techniques instead of vectors
- Does not use dynamic logic simulation
[Figure: STA preparation flow, repeated for every corner and mode: read required files (including SDF), then validate inputs. If there are errors or warnings, fix the data and repeat; otherwise continue. Result: ready to perform STA on a gate-level synchronous design using SDF.]
# Comment scripts
# Include all libraries - technology and IP model libraries
set link_path "* my_tech_lib.db memory_lib.db"
# Read all gate-level design files
read_verilog my_full_chip.v
# Read libraries and link the design
link_design MY_FULL_CHIP
Read, Constrain
# Set up bc_wc analysis with 2 SDF corners. Wait for checks later
read_sdf -analysis_type bc_wc -max_type sdf_max -min_type sdf_min
# Apply chip-level constraints for pre- or post-layout analysis
source MY_FULL_CHIP_CONST.tcl
Validate inputs:
- report_design: analysis type (single, bc_wc, or on_chip_variation)
- report_clock: clocks
- report_annotated_delay, report_annotated_check: complete SDF annotation
- check_timing: complete constraints
Type of Check          Total    Met            Violated      Untested
-----------------------------------------------------------------------
setup                   6724    2366 ( 35%)      0 (  0%)    4358 ( 65%)
hold                    6732    2366 ( 35%)      0 (  0%)    4366 ( 65%)
recovery                 362     302 ( 83%)      0 (  0%)      60 ( 17%)
removal                  354     302 ( 85%)      0 (  0%)      52 ( 15%)
min_pulse_width         4672    4310 ( 92%)      0 (  0%)     362 (  8%)
clock_gating_setup        65      65 (100%)      0 (  0%)       0 (  0%)
clock_gating_hold         65      65 (100%)      0 (  0%)       0 (  0%)
out_setup                138     138 (100%)      0 (  0%)       0 (  0%)
out_hold                 138      74 ( 54%)     64 ( 46%)       0 (  0%)
-----------------------------------------------------------------------
All Checks             19250    9988 ( 52%)     64 (  0%)    9198 ( 48%)
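The "All Checks" row of the coverage report is simply the column sums of the individual check types, with each check counted as exactly one of met, violated or untested. The short Python sketch below recomputes that row from the per-type counts transcribed from the report; it also shows why 64 violations appear as 0% (64/19250 is about 0.3%). The function name `summarize` is illustrative only.

```python
# Rows transcribed from the coverage report: (check, total, met, violated, untested)
ROWS = [
    ("setup", 6724, 2366, 0, 4358),
    ("hold", 6732, 2366, 0, 4366),
    ("recovery", 362, 302, 0, 60),
    ("removal", 354, 302, 0, 52),
    ("min_pulse_width", 4672, 4310, 0, 362),
    ("clock_gating_setup", 65, 65, 0, 0),
    ("clock_gating_hold", 65, 65, 0, 0),
    ("out_setup", 138, 138, 0, 0),
    ("out_hold", 138, 74, 64, 0),
]


def summarize(rows):
    """Sum each column and express met/violated/untested as rounded percentages."""
    total = sum(r[1] for r in rows)
    met = sum(r[2] for r in rows)
    violated = sum(r[3] for r in rows)
    untested = sum(r[4] for r in rows)
    # Every check falls into exactly one of the three categories.
    assert met + violated + untested == total

    def pct(n):
        return round(100 * n / total)

    return total, (met, pct(met)), (violated, pct(violated)), (untested, pct(untested))


print(summarize(ROWS))
```

A large untested share (48% here) usually deserves as much attention as the violations, since untested checks are not being verified at all.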
[Figure: example circuit (FF1, FF2, FF4, cells U2 to U6, clock CLK) annotated with maximum path delays]
Bottleneck Analysis
Identify cells involved in multiple violations. Use the results to determine which cells to buffer or upsize.
report_bottleneck
[Figure: report_bottleneck result flagging cell U2/U104: "This cell is involved in 100 violations!"]
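The underlying idea of bottleneck analysis can be sketched as counting how many violating paths pass through each cell. This is a simplified illustration under stated assumptions: the path lists are hypothetical, and report_bottleneck's own cost metric is not reproduced here.

```python
from collections import Counter


def bottleneck_cells(violating_paths, min_count=2):
    """Count how many violating paths pass through each cell.

    Each path is a list of cell instance names; a cell is counted at most
    once per path. Returns (cell, count) pairs with count >= min_count,
    most frequent first.
    """
    counts = Counter()
    for path in violating_paths:
        counts.update(set(path))
    return [(cell, n) for cell, n in counts.most_common() if n >= min_count]


# Hypothetical violating paths (combinational cells only)
paths = [
    ["U2/U104", "U3"],
    ["U2/U104", "U5"],
    ["U2/U104", "U6"],
    ["U3", "U5"],
]
print(bottleneck_cells(paths))
```

Fixing the most shared cell (here U2/U104) first tends to clear several violations with one change.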
For a post-layout clock tree, use set_propagated_clock <clock_object_list>, or set timing_all_clocks_propagated true.
[Figure: ideal clock edges at 15 and 30 compared with propagated clock edges at 5.5, 20.5 and 35.5]
Back-Annotation - Parasitics
Reduced and Distributed Parasitic Files
Reduced format annotates an RC pi model and computes the effective capacitance.
[Figure: the driver's RC load reduced to a pi model (C1, R, C2), from which the effective capacitance is computed]
Distributed format enables PrimeTime to annotate each physical segment of the routed netlist (the most accurate form of RC back-annotation).
[Figure: distributed RC network (R1 to R3, C1 to C4) connecting cells U1, U2 and U3]
Timing model types:
- Quick Timing Model (QTM)
- Extracted Timing Model (ETM)
- Interface Logic Model (ILM)
- Stamp Model
[Figure: choosing the appropriate model for each use case: Quick Timing Models, ETMs, ILMs / ETMs, ILMs, Stamp Models]
[Figure: example block with ports OPERATION[1:0], VALUE[1:12], CLOCK, OUTPUT_VALUE[1:12] and OVERFLOW, with internal gates (NR3, ND3, IVA), registers and annotated delays]
- QTM is a set of interactive PrimeTime commands, not a language.
- Like all PrimeTime commands, QTM commands can be saved in a script.
- A QTM model can be saved in db or Stamp format.
Extracted timing models (ETMs):
- Contain timing arcs between external pins
- Have internal pins only for generated/internal clocks
- Are written out in Stamp, .lib, or db formats
- Are context independent
- Support exceptions and latches
- Provide huge performance improvements
[Figure: a design with ports A, B, CLK, X and Y and its ETM; the same design and its ILM, which keeps only the interface logic]
Stamp Modeling
Stamp models are generally created for transistor-level designs, where there is no gate-level netlist. Stamp timing models are usually created by core or technology vendors, as a compiled db. Capabilities include the ability to model:
- pin-to-pin timing arcs
- setup and hold data
- pin capacitance and drive
- mode information
- tri-state outputs
- internally generated clocks
Stamp models co-exist with the Library Compiler .lib models.
[Figure: top-level design with Block1 modeled as an ILM and Block3 as an ETM]
Using ILMs and ETMs to address capacity and timing issues in multimillion-gate designs.
All registers must reliably capture data at the desired clock edges.
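[Figure: FF1 launches data through U2 and U3 to FF2/D. The ideal clock Clk has edges at 0 ns and 4 ns, but FF1/clk rises at 1.1 ns and 5.1 ns.]
Where does this 1.1 ns shift come from? Why is the shift different here?
[Figure: FF2/clk rises at 1 ns and 5 ns; the setup check is made at FF2.]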
PrimeTime Terminology
[Figure: data arrival: data launched at FF1/clk (1.1 ns, 5.1 ns) travels through U2 and U3 to FF2/D. Data required: FF2/clk (1 ns, 5 ns) minus the setup time of FF2. The gap between the two is the slack.]
Slack is the difference between the data required time and the data arrival time (for a setup check, slack = data required - data arrival).
The Header
  Endpoint: FF2 (rising edge-triggered flip-flop clocked by Clk)
  Path Group: Clk
  Path Type: max
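[Figure: path from FF1 through U2 and U3 to FF2; clock latency is calculated, and cell and net delays come from SDF]
Data arrival
[Figure: rising transitions (r) propagate from FF1/CLK through U2 and U3 to FF2/D; Clk edges at 0, 2 and 4 ns]
Data required
[Figure: capture at FF2 with a 1.0 ns clock network delay and a 0.21 ns setup time; Clk edges at 0, 2 and 4 ns]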
Summary - Slack
report_timing
Startpoint: FF1 (rising edge-triggered flip-flop clocked by Clk)
Endpoint: FF2 (rising edge-triggered flip-flop clocked by Clk)
Path Group: Clk
Path Type: max

Point                                   Incr       Path
---------------------------------------------------------
clock Clk (rise edge)                   0.00       0.00
clock network delay (propagated)        1.10 *     1.10
FF1/CLK (fdef1a15)                      0.00       1.10 r
FF1/Q (fdef1a15)                        0.50 *     1.60 r
U2/Y (buf1a27)                          0.11 *     1.71 r
U3/Y (buf1a27)                          0.11 *     1.82 r
FF2/D (fdef1a15)                        0.05 *     1.87 r
data arrival time                                  1.87

clock Clk (rise edge)                   4.00       4.00
clock network delay (propagated)        1.00 *     5.00
FF2/CLK (fdef1a15)                                 5.00 r
library setup time                     -0.21 *     4.79
data required time                                 4.79
---------------------------------------------------------
data required time                                 4.79
data arrival time                                 -1.87
---------------------------------------------------------
slack (MET)                                        2.92
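The arithmetic behind the setup report can be replayed in a few lines. This Python sketch takes the increments read off the report_timing output above (clock edge, propagated clock latency, cell/net increments and library setup time) and recomputes arrival, required and slack. The function name is illustrative, not a PrimeTime API.

```python
def setup_slack(launch_edge, launch_latency, data_delays,
                capture_edge, capture_latency, setup_time):
    """Setup check: data must arrive before (capture clock edge - setup time).

    All values in ns. Returns (arrival, required, slack).
    """
    arrival = launch_edge + launch_latency + sum(data_delays)
    required = capture_edge + capture_latency - setup_time
    return arrival, required, required - arrival


# Increments from the report: FF1/Q, U2/Y, U3/Y, FF2/D
arrival, required, slack = setup_slack(
    launch_edge=0.00, launch_latency=1.10,
    data_delays=[0.50, 0.11, 0.11, 0.05],
    capture_edge=4.00, capture_latency=1.00, setup_time=0.21)
print(f"arrival={arrival:.2f} required={required:.2f} slack={slack:.2f}")
# arrival=1.87 required=4.79 slack=2.92
```

A positive result means the check is MET; a negative one would be reported as VIOLATED.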
Slack
[Figure: FF2/D must remain STABLE around each FF2/clk edge (1 ns, 5 ns) while new data launched at FF1/clk (1.1 ns, 5.1 ns) arrives; the hold and setup windows are marked at FF2/clk; Clk edges at 0 ns and 4 ns]
PrimeTime Terminology
[Figure: data arrival from FF1 through U2 and U3 to FF2 (FF1/clk edges at 1.1 ns and 5.1 ns) against data required at FF2/clk (1 ns and 5 ns); Clk edges at 0 ns and 4 ns]
[Figure: the same setup and hold checks at FF2/clk (1 ns, 5 ns) when FF1/clk is delayed to 2.9 ns and 6.9 ns; FF2/D must remain STABLE for the hold check]
Library
Timing Models
Timing models are cells with many timing arcs:
- Flip-flop with setup and hold timing checks
- Delay cell included along the data arrival time
[Figure: FF1 driving the address input of a RAM timing model; both clocked by clk]
Point                                                         Incr       Path
------------------------------------------------------------------------------
clock SYS_CLK (rise edge)                                     0.000      0.000
clock network delay (propagated)                              2.713 *    2.713
I_ORCA_TOP/I_PCI_WRITE_FIFO/count_int_reg[0]1/CP (sdcrq1)     0.000      2.713 r
I_ORCA_TOP/I_PCI_WRITE_FIFO/count_int_reg[0]1/Q (sdcrq1)      0.678 *    3.390 r
I_ORCA_TOP/I_PCI_WRITE_FIFO/PCI_WFIFO_RAM/A1[0] (ram32x32)    0.008 *    3.398 r
data arrival time                                                        3.398

clock SYS_CLK (rise edge)                                     0.000      0.000
clock network delay (propagated)                              2.711 *    2.711
I_ORCA_TOP/I_PCI_WRITE_FIFO/PCI_WFIFO_RAM/CE1 (ram32x32)                 2.711 r
library hold time                                             0.282 *    2.992
data required time                                                       2.992
------------------------------------------------------------------------------
data required time                                                       2.992
data arrival time                                                       -3.398
------------------------------------------------------------------------------
slack (MET)                                                              0.406
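[Figure: checks on the asynchronous clear pin FF2/ClrN: recovery (data arrival vs. data required) and the min check (min data arrival vs. min data required), relative to FF2/clk edges at 1 ns and 5 ns; Clk edges at 0 ns and 4 ns]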
Estimated RCs are represented as a wire load model.
[Figure: wire load model table of estimated net values; example pin capacitance Cpin = 0.00312 pF]
Net Delay = f(Rnet, Cnet + Cpin)
Post-layout Rs and Cs are extracted as a parasitics file.
[Figure: extracted net parasitics, e.g. 0.005 pF and 0.045 pF]
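The relation Net Delay = f(Rnet, Cnet + Cpin) leaves f unspecified. A common first-order choice, used here purely as an assumption, is the Elmore delay of a uniformly distributed RC wire driving a lumped pin load: the wire contributes Rnet*Cnet/2 and the far-end pin sees the full wire resistance, contributing Rnet*Cpin. Production STA tools use more accurate reduced or distributed models, as discussed above; the resistance value below is hypothetical.

```python
def elmore_net_delay(r_net_kohm, c_net_pf, c_pin_pf):
    """Elmore delay (ns) of a uniform distributed RC wire driving a lumped pin load.

    The distributed wire capacitance contributes R*C/2; the pin load at the
    far end contributes R*Cpin. Units: kOhm * pF = ns.
    """
    return r_net_kohm * (c_net_pf / 2.0 + c_pin_pf)


# Hypothetical net: Rnet = 0.1 kOhm, Cnet = 0.045 pF, Cpin = 0.00312 pF
delay_ns = elmore_net_delay(0.1, 0.045, 0.00312)
print(f"{delay_ns * 1000:.3f} ps")  # 2.562 ps
```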
Clock Network
[Figure: clock network from Clk to the clk pins of two flip-flops]
Ideal Clocks
Point                                   Incr       Path
---------------------------------------------------------
clock Clk (rise edge)                   0.00       0.00
clock network delay (propagated)        1.10 *     1.10
FF1/CLK (fdef1a15)                      0.00       1.10 r
FF1/Q (fdef1a15)                        0.40 *     1.50 f
U2/Y (buf1a27)                          0.05 *     1.55 f
U3/Y (buf1a27)                          0.05 *     1.60 f
FF2/D (fdef1a15)                        0.01 *     1.61 f
data arrival time                                  1.61

clock Clk (rise edge)                   0.00       0.00
clock network delay (propagated)        1.00 *     1.00
FF2/CLK (fdef1a15)                                 1.00 r
library hold time                       0.10 *     1.10
data required time                                 1.10
---------------------------------------------------------
data required time                                 1.10
data arrival time                                 -1.61
---------------------------------------------------------
slack (MET)                                        0.51
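For the hold report, the capture edge is the same clock edge as the launch (0 ns), and the hold time is added to the capture time rather than subtracted. This Python sketch replays the hold report; the hold time is taken as +0.10 ns, which is what makes the required time come to the 1.10 ns shown. The function name is illustrative only.

```python
def hold_slack(launch_edge, launch_latency, data_delays,
               capture_edge, capture_latency, hold_time):
    """Hold check: data must not arrive before (capture clock edge + hold time).

    All values in ns. Returns (arrival, required, slack).
    """
    arrival = launch_edge + launch_latency + sum(data_delays)
    required = capture_edge + capture_latency + hold_time
    return arrival, required, arrival - required


# Increments from the hold report: FF1/Q, U2/Y, U3/Y, FF2/D
arrival, required, slack = hold_slack(
    launch_edge=0.00, launch_latency=1.10,
    data_delays=[0.40, 0.05, 0.05, 0.01],
    capture_edge=0.00, capture_latency=1.00, hold_time=0.10)
print(f"arrival={arrival:.2f} required={required:.2f} slack={slack:.2f}")
# arrival=1.61 required=1.10 slack=0.51
```

Note the sign convention: hold slack is arrival minus required, the opposite of the setup case.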
[Figure: for data at FF2.D, the hold check is at the FF2.clk edge at 1 ns and the setup check at 5 ns; Clk edges at 0, 2 and 4 ns]
[Figure: path from a flip-flop clocked by Clk1 to one clocked by Clk2; Clk2 edges at 0, 3, 6, 9 and 12 ns, with the hold and setup checks marked]
[Figure: data arrival from FF1 through U2 and U3; data required at the output port; Clk edges at 0, 2 and 4 ns]
You specify the path required time at the output ports of the design.
[Figure: data arrival from FF1 through U2 and U3 to the output port; Clk edges at 0, 2 and 4 ns]
Point                                   Incr       Path
---------------------------------------------------------
clock Clk (rise edge)                   0.00       0.00
clock network delay (propagated)        1.10 *     1.10
FF1/CLK (fdef1a15)                      0.00       1.10 r
FF1/Q (fdef1a15)                        0.50 *     1.60 r
U2/Y (buf1a27)                          0.11 *     1.71 r
U3/Y (buf1a27)                          0.11 *     1.82 r
M (out)                                 0.05 *     1.87 r
data arrival time                                  1.87

clock Clk (rise edge)                   4.00       4.00
clock network delay (propagated)        0.00 *     4.00
output external delay                  -0.21 *     3.79
data required time                                 3.79
---------------------------------------------------------
data required time                                 3.79
data arrival time                                 -1.87
---------------------------------------------------------
slack (MET)                                        1.92
[Figure: nochange check between Clk3 and Clk4]
[Figure: clock gating: Clk1 gated by ClkEn in cell U1 produces Clk2, checked with clock_gating_setup and clock_gating_hold; timing model]
Timing checks: specified by the user
Timing checks: specified by the vendor
STA part 2
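[Figure: what STA covers (SPICE-accurate timing of clocks, components, timing specs, input transitions and interconnect) and when it is run in the flow (RTL, place, route, extraction), taking into account process, DFM, architecture and IR drop]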
P - Process (Wide, Narrow, Tall, Short, K)
T - Temperature
Propagation Delay
Delay Calculation
Path Delay Calculations
Worst arrival time of signal at input pin of capture flop = ? Best arrival time of signal at input pin of capture flop = ?
Timing Paths
Sequential Delay
Combinational Delay
Boundary Settings
- Input transition time
- Output loading
- Logic settings
Clocks
Synchronous Designs
[Figure: example synchronous designs Ex-I to Ex-IV]
- Default single cycle of operation
- Launch edge and capture edge
- Properties: period, waveform, rise/fall transition time, skew or uncertainty
- Generated clocks: derived from a master, synchronous by definition, with a definite edge relationship
[Figure: two clock paths with delays d1 and d2, where d1 != d2]
Virtual Clocks
- Virtual clocks do not have any physical existence.
- Virtual clocks are used as a reference for module input and output delays.
- Virtual clocks are local to the module design.
[Figure: virtual clock with a 10 ns period]
Properties: period, waveform
Global Constraints
- Specifying a min-max capacitance range: ensures that the cells used in the design work within library characterization limits.
- Specifying max transition: ensures that propagated transitions do not give rise to bad propagation delays.
- Specifying driver and load on ports: ensures that a standard load value is modeled at ports.
- Specifying input and output delays at ports.
Check Types
- Setup
- Hold
- Recovery
- Removal
- Clock Gating
- Min Pulse Width
- Data-to-Data
Setup time and hold time are properties of the sequential element circuit. They must be honoured to guarantee the expected operation of the design.
Setup: data launched by the launch edge of FF1 is captured by the intended capture edge of FF2. Data launched by the launch edge of FF1 should arrive at the data input of FF2 no later than the capture edge time minus the setup time of FF2.
Hold: data launched by the launch edge of FF1 should not be captured by an edge preceding the intended capture edge of FF2, or data launched by the edge following the launch edge of FF1 should not be captured by the intended capture edge of FF2. Data should reach the data input of FF2 no earlier than the capture edge time plus the hold time of FF2.
[Figure: data-to-data check between D1 and D2]
Timing Exceptions
False Paths: timing paths that are invalid
- Paths between asynchronous clocks
- Paths that are static for a particular timing mode
Multicycle Paths: non-default cycle operation
Logic Setting: pins or nets that are tied to 1/0 for a particular timing mode
Disable Timing: timing arcs that are disabled
Advanced Topics
Timing Models
- Extracted Timing Models
- Interface Logic Models
- Quick Timing Models
Problem
Given corner data below, which combinations are expected to lead to worst and best gate delays?
Process        Slow     Typical    Fast
Voltage        0.9V     1.0V       1.1V
Temperature    -20C     27C        105C
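One way to reason about the problem is to enumerate all 27 PVT combinations and rank them with a delay model. The Python sketch below does this with a toy model whose coefficients are hypothetical. It assumes that gate delay grows with a slower process, a lower supply voltage and a higher temperature. That assumption ignores temperature inversion, which at low supply voltages in advanced nodes can make cold the slow corner, so the ranking holds only under this model.

```python
from itertools import product

# Hypothetical relative process factors: a slower process gives a larger delay.
PROCESS_FACTOR = {"Slow": 1.2, "Typical": 1.0, "Fast": 0.85}
VOLTAGES = [0.9, 1.0, 1.1]       # volts
TEMPERATURES = [-20, 27, 105]    # degrees C


def relative_delay(process, voltage, temp_c, k_temp=0.002, alpha=1.3):
    """Toy gate-delay model (assumption, not library data).

    Delay increases monotonically with a slower process, lower voltage and
    higher temperature. Temperature inversion is deliberately ignored.
    """
    return (PROCESS_FACTOR[process]
            * (1.0 / voltage) ** alpha
            * (1.0 + k_temp * (temp_c - 27)))


corners = list(product(PROCESS_FACTOR, VOLTAGES, TEMPERATURES))
worst = max(corners, key=lambda c: relative_delay(*c))
best = min(corners, key=lambda c: relative_delay(*c))
print("worst (max delay):", worst)
print("best (min delay):", best)
```

Because every factor in this model is positive and monotonic, the extremes always land on the corners of the table: slow process, low voltage and high temperature for the worst case, and the opposite for the best case.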
STA part 3
Overview
In this era of high performance electronics, timing continues to be a top priority and designers are spending increased effort addressing IC performance.
STA advantage
- Speed (orders of magnitude faster than dynamic simulation)
- Capacity to handle a full chip
- Exhaustive timing coverage
- Vectors are not required
STA disadvantage
- It is pessimistic (too conservative)
- It reports false paths
Flow Inputs:
- Gate-level Verilog
- Constraints (SDC)
- Extracted nets (SPEF)
- Libraries (Liberty format, .lib)
Timing Closure
Timing closure is the ability to detect and fix timing problems in the design flow as early as possible. This is done by checking the correctness of intermediate results through static timing analysis (STA) and also by dynamic timing simulation with SDF back-annotation. In case of failure (the timing goals have not been achieved), the timing constraints must be modified through well-defined loops, re-synthesis and, in the worst case, re-design.
NLDM
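Cell delay (non-linear) = f(CL, Sin) and Sout = f(CL, Sin), where CL is the output load capacitance, Sin is the input slew (transition time) and Sout is the output slew.
Values between table entries are interpolated; the interpolation error is usually below 10% relative to SPICE.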
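An NLDM lookup can be sketched as bilinear interpolation in a 2-D table indexed by input slew and output load. Bilinear interpolation is used here as a common, simple choice; whether a particular STA tool interpolates exactly this way is an assumption, and the table values below are hypothetical.

```python
from bisect import bisect_right


def _bracket(axis, x):
    """Indices (i, i+1) of the axis points bracketing x.

    Points outside the table are extrapolated from the nearest segment.
    """
    i = bisect_right(axis, x) - 1
    i = max(0, min(i, len(axis) - 2))
    return i, i + 1


def nldm_lookup(slews, loads, table, s_in, c_load):
    """Bilinear interpolation in an NLDM-style table.

    table[i][j] is the value (delay or output slew) at input slew slews[i]
    and output load loads[j].
    """
    i0, i1 = _bracket(slews, s_in)
    j0, j1 = _bracket(loads, c_load)
    ts = (s_in - slews[i0]) / (slews[i1] - slews[i0])
    tc = (c_load - loads[j0]) / (loads[j1] - loads[j0])
    v00, v01 = table[i0][j0], table[i0][j1]
    v10, v11 = table[i1][j0], table[i1][j1]
    return ((1 - ts) * (1 - tc) * v00 + (1 - ts) * tc * v01
            + ts * (1 - tc) * v10 + ts * tc * v11)


# Hypothetical cell-delay table (ns): rows = input slew (ns), columns = load (pF)
SLEWS = [0.05, 0.2, 0.5]
LOADS = [0.005, 0.02, 0.05]
DELAY = [[0.10, 0.15, 0.25],
         [0.13, 0.18, 0.28],
         [0.20, 0.25, 0.35]]

d = nldm_lookup(SLEWS, LOADS, DELAY, s_in=0.1, c_load=0.01)
print(f"{d:.4f}")  # 0.1267
```

The same function serves for the output-slew table (Sout), which is then fed forward as Sin of the next stage.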
Delay Calculation
Path Delays
When delay paths are added, the following factors affect the delays:
Slew propagation: ideally, slew propagation should be timing-path specific. However, STA does not do this; it uses either worst_slew or worst_arrival.
- worst_slew uses the slowest transition among the signals arriving at a multi-input cell as the input to its output calculation (the fastest transition in min-delay mode). This is the CTE default, pessimistic behavior.
- worst_arrival uses the transition of the input signal that arrives the latest (the earliest in min-delay mode).
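The difference between the two slew-propagation modes can be shown on a two-input cell. In this Python sketch, input A arrives late with a sharp edge and input B arrives early with a slow edge; the pin names and numbers are hypothetical. Under worst_slew the output uses B's slow slew even though the latest arrival comes from A, which is exactly the pessimism described above.

```python
from dataclasses import dataclass


@dataclass
class InputEvent:
    pin: str
    arrival: float  # ns
    slew: float     # ns


def propagated_slew(events, mode="worst_slew", min_delay=False):
    """Slew propagated to the output of a multi-input cell.

    worst_slew:    slowest input transition (fastest in min-delay mode).
    worst_arrival: slew of the latest-arriving input (earliest in min-delay mode).
    """
    pick = min if min_delay else max
    if mode == "worst_slew":
        return pick(events, key=lambda e: e.slew).slew
    if mode == "worst_arrival":
        return pick(events, key=lambda e: e.arrival).slew
    raise ValueError(f"unknown mode: {mode}")


events = [InputEvent("A", arrival=1.20, slew=0.05),
          InputEvent("B", arrival=0.90, slew=0.30)]
print(propagated_slew(events, "worst_slew"))     # 0.3
print(propagated_slew(events, "worst_arrival"))  # 0.05
```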
Analysis Modes
Semiconductor device parameters can vary with conditions such as fabrication process, operating temperature, and power supply voltage. The STA tool supports three analysis modes:
- Single operating condition: a single set of delay parameters is used for the whole circuit, based on one set of process, temperature, and voltage conditions.
- Min-Max (BC-WC) operating condition: simultaneously checks the circuit for the two extreme operating conditions, minimum and maximum. For setup checks, it uses maximum delays for all paths. For hold checks, it uses minimum delays.
- On-chip-variation mode: a conservative analysis that allows both minimum and maximum delays to apply to different paths at the same time. For a setup check, it uses maximum delays for the launch clock path and data path, and minimum delays for the capture clock path. For a hold check, it uses minimum delays for the launch clock path and data path, and maximum delays for the capture clock path.
setAnalysisMode -bcWc
setAnalysisMode -setup
setOpCond -min Best -minLibrary fast.lib -max Worst -maxLibrary slow.lib

setAnalysisMode -onChipVariation
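The min/max selection rules of the BC-WC and on-chip-variation modes can be made concrete with one single-cycle path. In this Python sketch each segment (launch clock, data, capture clock) has a (min, max) delay; all values are hypothetical. The sketch ignores clock reconvergence pessimism removal, which real tools apply when the launch and capture clock paths share a common segment.

```python
MIN, MAX = 0, 1


def check_slacks(mode, period, launch_clk, data, capture_clk, t_setup, t_hold):
    """Return (setup_slack, hold_slack) in ns for a single-cycle path.

    launch_clk, data and capture_clk are (min_delay, max_delay) tuples.
    mode is "bc_wc" or "ocv".
    """
    if mode == "bc_wc":
        setup_sel = (MAX, MAX, MAX)   # launch clock, data, capture clock
        hold_sel = (MIN, MIN, MIN)
    elif mode == "ocv":
        setup_sel = (MAX, MAX, MIN)
        hold_sel = (MIN, MIN, MAX)
    else:
        raise ValueError(f"unknown mode: {mode}")

    lc, d, cc = setup_sel
    setup = (period + capture_clk[cc] - t_setup) - (launch_clk[lc] + data[d])
    lc, d, cc = hold_sel
    hold = (launch_clk[lc] + data[d]) - (capture_clk[cc] + t_hold)
    return setup, hold


path = dict(period=4.0, launch_clk=(0.95, 1.10), data=(0.60, 0.77),
            capture_clk=(0.85, 1.00), t_setup=0.21, t_hold=0.10)
for mode in ("bc_wc", "ocv"):
    s, h = check_slacks(mode, **path)
    print(f"{mode}: setup slack {s:.2f}, hold slack {h:.2f}")
```

On this path OCV costs 0.15 ns of setup slack and 0.15 ns of hold slack compared with BC-WC, because the capture clock is taken at the opposite extreme from the launch side.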
Derating
Minimum and maximum delays can be adjusted by specified factors to model the effects of operating conditions. This adjustment of calculated delays is called derating. Derating affects the delay and slack values reported by report_timing.
setTimingDerate -max -early 0.8 -late 1.0
setTimingDerate -min -early 1.0 -late 1.1
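The effect of derating on a setup check can be sketched as scaling delays before computing slack. In this Python sketch a "late" factor slows the launch clock and data path and an "early" factor speeds up the capture clock. The factors 1.05 and 0.95 are hypothetical, not the values in the commands above, and the exact mapping of early/late factors to path segments is tool-specific. The library setup time is left underated here.

```python
def derated_setup_slack(period, launch_clk, data, capture_clk, t_setup,
                        late=1.05, early=0.95):
    """Setup slack (ns) after derating.

    Launch clock and data delays are multiplied by `late` (made slower),
    the capture clock delay by `early` (made faster). t_setup is not derated.
    """
    arrival = (launch_clk + data) * late
    required = period + capture_clk * early - t_setup
    return required - arrival


# Nominal delays taken from the setup example (launch clock 1.10, data 0.77, capture clock 1.00)
print(f"{derated_setup_slack(4.0, 1.10, 0.77, 1.00, 0.21, late=1.0, early=1.0):.4f}")  # 2.9200
print(f"{derated_setup_slack(4.0, 1.10, 0.77, 1.00, 0.21):.4f}")                       # 2.7765
```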
Timing exceptions
Timing exceptions include the following:
- False Path: use the set_false_path command to specify a logic path that exists in the design but should not be analyzed. Setting a false path removes the timing constraints on the path.
- Multicycle Path: use the set_multicycle_path command to specify the number of clock cycles required to propagate data from the start to the end of the path.
- Min/Max Delay: use the set_max_delay and set_min_delay commands to override the default setup and hold constraints with specific maximum and minimum time values.
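The effect of a multicycle path on where the checks are made can be sketched for a same-clock path launched at t = 0. With standard SDC semantics, set_multicycle_path -setup N moves the setup capture edge to N periods after launch, and the hold check moves with it to one period before the setup edge. set_multicycle_path -hold M then pulls the hold check back by M periods. The Python function below only computes those edge times; its name is illustrative.

```python
def multicycle_edges(period, setup_mult=1, hold_mult=0):
    """Return (setup_capture_edge, hold_check_edge) in ns for a same-clock path
    launched at t = 0.

    setup_mult models `set_multicycle_path -setup N` (default 1);
    hold_mult models `set_multicycle_path -hold M` (default 0).
    """
    setup_edge = setup_mult * period
    hold_edge = setup_edge - period - hold_mult * period
    return setup_edge, hold_edge


print(multicycle_edges(4.0))                           # default single cycle
print(multicycle_edges(4.0, setup_mult=2))             # -setup 2 only
print(multicycle_edges(4.0, setup_mult=2, hold_mult=1))  # -setup 2 plus -hold 1
```

The second case shows why a setup multicycle is usually paired with a hold multicycle of N-1: without it, the hold check is made at 4 ns instead of 0 ns and becomes very hard to meet.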
Recovery/Removal check
These are timing checks related to the asynchronous input pins of a flip-flop. Although a flip-flop is set or cleared asynchronously, its release from the reset state is synchronous. A recovery timing check specifies the minimum amount of time allowed between the release of an asynchronous signal from its active state and the next active clock edge. A removal timing check specifies the minimum amount of time between an active clock edge and the release of an asynchronous control signal.
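The two checks can be sketched as slacks measured from the release (deassertion) of an asynchronous clear to the surrounding clock edges. This is a simplified Python sketch: clock rising edges are assumed at clk_offset + k*period, removal is measured from the preceding edge, and all numbers are hypothetical.

```python
import math


def recovery_removal_slacks(release_time, period, t_recovery, t_removal,
                            clk_offset=0.0):
    """Slacks (ns) for the release of an asynchronous clear/preset.

    recovery: release must happen at least t_recovery before the next edge.
    removal:  release must happen at least t_removal after the previous edge.
    Negative values indicate violations.
    """
    k = math.floor((release_time - clk_offset) / period)
    prev_edge = clk_offset + k * period
    next_edge = prev_edge + period
    recovery = (next_edge - t_recovery) - release_time
    removal = release_time - (prev_edge + t_removal)
    return recovery, removal


# Reset released at 3.2 ns, clock period 4 ns, recovery 0.3 ns, removal 0.2 ns
print(recovery_removal_slacks(3.2, 4.0, 0.3, 0.2))
# Releasing at 3.9 ns is too close to the edge at 4 ns: recovery is violated
print(recovery_removal_slacks(3.9, 4.0, 0.3, 0.2))
```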
Case Analysis
Case analysis allows timing analysis to be performed using logic constants or logic transitions (rising or falling) on ports or pins, to limit the signal propagated through the design. Case analysis is a path-pruning mechanism and is most commonly used for timing the device in a given operational configuration or functional mode. For example, case analysis can be used to compare normal circuit operation against scan or BIST operation.
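The path-pruning effect of case analysis can be sketched on a 2:1 mux whose select pin S chooses between a functional input A (S = 0) and a scan input B (S = 1). The arc table and pin roles in this Python sketch are hypothetical. Setting S to a constant, as set_case_analysis would, removes the arc from the deselected input. It also removes the arc from S itself, since a constant pin never switches.

```python
# Hypothetical timing arcs of a 2:1 mux: (from_pin, to_pin, select value enabling the arc).
# None means the arc does not depend on the select value.
MUX_ARCS = [("A", "Y", 0), ("B", "Y", 1), ("S", "Y", None)]


def active_arcs(arcs, case_values, select_pin="S"):
    """Return the arcs that are still timed after applying constant case values."""
    active = []
    for src, dst, sel in arcs:
        if src in case_values:
            # A pin held at a constant never switches, so its arc launches no event.
            continue
        if sel is not None and case_values.get(select_pin, sel) != sel:
            # The select constant blocks this data input.
            continue
        active.append((src, dst))
    return active


print(active_arcs(MUX_ARCS, {}))        # no case analysis: all arcs timed
print(active_arcs(MUX_ARCS, {"S": 0}))  # functional mode: only A -> Y
print(active_arcs(MUX_ARCS, {"S": 1}))  # scan mode: only B -> Y
```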
Timing Models
Timing extraction plays an important role in the hierarchical top-down flow and the bottom-up IP authoring flow by reducing the complexity of timing verification and by providing a level of abstraction that hides the implementation details of IP blocks. The three most desired features in timing extraction are accuracy, efficiency, and usability. The model must preserve the timing behavior of the original circuit and produce accurate results. Three types of models can be generated:
- Quick Timing Model (QTM)
- Extracted Timing Model (ETM)
- Interface Logic Model (ILM)
QTM
A QTM is a temporary model used early in the design cycle for a block that has no netlist available. Creating a QTM is faster than writing an ad-hoc model. The model contains both min and max timing arcs for setup and hold checks. It is used to check consistency between block constraints and to update boundary constraints (after each iteration of synthesis). The netlist used for QTM generation can be easily generated (low-effort RTL mapping), since the existence or absence of a timing arc is independent of the logic/physical design.
Inputs:
- Constraints (SDC)
- Configuration file
- Header file
The QTM model is generated using Black Box commands. This command set allows you to define timing arcs and electrical data (e.g. output driver, input load).
ILM
ILMs embody a structural approach to model generation, where the original gate-level netlist is replaced by another gate-level netlist that contains only the interface logic of the original netlist. Interface logic contains all circuitry leading from I/O ports to edge-triggered registers, called interface registers. The clock tree leading to the interface registers is preserved in an ILM. Logic that is contained only in register-to-register paths on a block is not in an ILM.
ETM
Extracted timing models differ from ILMs in that the interface logic for a block is replaced by context-independent timing relationships between pins on a library cell. The extracted library cell contains timing arcs between external pins. Internal pins are introduced only when clocks are defined on internal pins of the design.