DDR
The primary benefit of this type of memory is its ability to read or write two words of data over the
wide parallel data bus every clock cycle: one word on the rising edge of the clock strobe and a
second word on the falling edge. Hence the name Double Data Rate (DDR) memory.
The strobe signal is not a free-running clock; it is transmitted only along with the relevant active data.
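The following Python sketch makes the double-data-rate idea concrete by mapping a burst of words onto rising and falling strobe edges. It is a minimal behavioural model, not a description of the PHY. The function names are hypothetical, and electrical timing, DQS preamble, and postamble are all ignored. It shows three things: each cycle carries two words, the strobe toggles only in cycles that carry data, and a 400 MHz clock yields 800 Msps.

```python
from typing import List, Tuple

Slot = Tuple[int, str, int]  # (clock cycle, strobe edge, data word)


def ddr_schedule(words: List[int]) -> List[Slot]:
    """Map a burst of data words onto strobe edges, two words per cycle.

    Even-indexed words go out on the rising edge and odd-indexed words on
    the falling edge of the same cycle.
    """
    slots: List[Slot] = []
    for i, word in enumerate(words):
        cycle, phase = divmod(i, 2)
        slots.append((cycle, "rise" if phase == 0 else "fall", word))
    return slots


def strobe_active_cycles(slots: List[Slot]) -> List[int]:
    """Cycles in which the strobe toggles. Because the strobe is not a
    free-running clock, only cycles that carry data appear here
    (preamble and postamble are ignored in this model)."""
    return sorted({cycle for cycle, _, _ in slots})


def data_rate_msps(clock_mhz: float) -> float:
    """Two transfers per clock cycle, one per edge."""
    return 2 * clock_mhz


if __name__ == "__main__":
    burst = [0xA1, 0xB2, 0xC3, 0xD4]
    slots = ddr_schedule(burst)
    for slot in slots:
        print(slot)
    print(strobe_active_cycles(slots))  # [0, 1]
    print(data_rate_msps(400))  # 800
```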
Using the configurable set of hard macros, the following configuration options are available without
any area, power, or speed penalty:
● Number of strobes (DQS): differential or single-ended, one set per data byte.
● Number of CS, WE, and ODT signals, to support rank topology and multipoint ordering.
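The options above can be pictured as a small configuration record. The Python sketch below assumes a hypothetical `PhyConfig` whose field names are illustrative only and are not taken from the macro library. It derives strobe and pin counts from the choices listed: one strobe set per data byte, and two pins per strobe when it is differential.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhyConfig:
    """Hypothetical configuration record for a DDR2 PHY assembled from
    configurable hard macros. Field names are illustrative only."""
    data_bytes: int         # data bus width in bytes (8 DQ bits each)
    differential_dqs: bool  # differential or single-ended strobes
    num_cs: int             # chip selects, sized for the rank topology
    num_we: int             # write-enable outputs
    num_odt: int            # on-die termination controls

    def __post_init__(self) -> None:
        # Reject configurations that cannot describe a working interface.
        if self.data_bytes < 1:
            raise ValueError("data_bytes must be at least 1")
        for name in ("num_cs", "num_we", "num_odt"):
            if getattr(self, name) < 1:
                raise ValueError(f"{name} must be at least 1")

    @property
    def dq_pins(self) -> int:
        return 8 * self.data_bytes

    @property
    def num_dqs(self) -> int:
        # One strobe set per data byte.
        return self.data_bytes

    @property
    def dqs_pins(self) -> int:
        # A differential strobe needs a true/complement pin pair.
        return self.num_dqs * (2 if self.differential_dqs else 1)


if __name__ == "__main__":
    cfg = PhyConfig(data_bytes=8, differential_dqs=True, num_cs=2, num_we=1, num_odt=2)
    print(cfg.dq_pins, cfg.num_dqs, cfg.dqs_pins)  # 64 8 16
```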
● The physical implementation of the DDR2 interface is divided into two levels.
– A high-level integration, in which a PHY is constructed from already-built hard
macro-cells placed adjacent to one another, providing the best power
connections and signal integrity.
– A lower-level implementation: the creation of the firm macro-cells
themselves.
● The task is to implement a configurable firm macro-cell that meets the following requirements:
– The exact physical dimensions dictated by the I/Os and abutment macros.
– The tight timing requirements imposed by the DDR2 protocol.
– The design rules introduced by both the Structured ASIC and cell-based
technology.
– High test coverage, using design for test (DFT) structures that do not impact the
required performance.
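Abutment-based integration and the exact-dimension requirement can be illustrated together in one dimension. The Python sketch below assumes a single row of macro-cells with known widths in micrometres. The helper names are hypothetical and are not part of any EDA tool. The code places the macros edge to edge and then checks that the row exactly fills the slot reserved for it, leaving no gap and no overlap.

```python
import math
from typing import List, Tuple

Span = Tuple[float, float]  # (left edge, right edge) in micrometres


def abut_row(widths_um: List[float], origin_x_um: float = 0.0) -> List[Span]:
    """Place macros edge to edge in one row, starting at origin_x_um."""
    spans: List[Span] = []
    x = origin_x_um
    for w in widths_um:
        if w <= 0:
            raise ValueError("macro width must be positive")
        spans.append((x, x + w))
        x += w  # the next macro starts exactly where this one ends
    return spans


def row_matches_slot(spans: List[Span], slot_left_um: float,
                     slot_right_um: float, tol_um: float = 1e-6) -> bool:
    """True if the abutted row starts and ends on the slot edges, i.e. it
    leaves no gap and does not overlap the neighbouring I/Os."""
    if not spans:
        return False
    return (math.isclose(spans[0][0], slot_left_um, abs_tol=tol_um)
            and math.isclose(spans[-1][1], slot_right_um, abs_tol=tol_um))


if __name__ == "__main__":
    row = abut_row([40.0, 60.0, 40.0])
    print(row)  # [(0.0, 40.0), (40.0, 100.0), (100.0, 140.0)]
    print(row_matches_slot(row, 0.0, 140.0))  # True
    print(row_matches_slot(row, 0.0, 150.0))  # False
```

A real floorplan is two-dimensional and must also respect placement sites and power-rail alignment, which this sketch ignores.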
Timing thoughts
● Operating at a data transfer rate of 800 Msps leaves little
timing budget: a total theoretical data window of less than
200 ps is left for correctly capturing the data. This small
window shrinks further due to the following parameters:
– SDRAM device skew (tDQSQ)
– Board trace skew
– DLL jitter
– Asymmetry of the I/O rise/fall times
– Setup/hold timing requirement
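The budgeting arithmetic amounts to starting from a capture window and subtracting each loss term in turn. The Python sketch below uses hypothetical function names. In its usage lines, the 200 ps starting window echoes the bound quoted above, but every loss value is made up purely to exercise the arithmetic. None of them are DDR2 datasheet or board figures.

```python
from typing import Dict


def remaining_window_ps(window_ps: float, losses_ps: Dict[str, float]) -> float:
    """Shrink a capture window by every loss term (all values in ps)."""
    return window_ps - sum(losses_ps.values())


def budget_closes(window_ps: float, losses_ps: Dict[str, float],
                  margin_ps: float = 0.0) -> bool:
    """True if at least margin_ps of the window survives all losses."""
    return remaining_window_ps(window_ps, losses_ps) >= margin_ps


if __name__ == "__main__":
    # Made-up loss values, only to exercise the arithmetic.
    losses = {
        "sdram_skew_tDQSQ": 50.0,
        "board_trace_skew": 30.0,
        "dll_jitter": 30.0,
        "rise_fall_asymmetry": 20.0,
        "setup_hold": 50.0,
    }
    print(remaining_window_ps(200.0, losses))      # 20.0
    print(budget_closes(200.0, losses, 10.0))      # True
    print(budget_closes(200.0, losses, 25.0))      # False
```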
Floorplan and Cluster Placement
● The DDR2 PHY has strict physical dimensions, and the design is constructed from several
different and repeatable modules. The designer knows the optimum location of each module
inside the fabric. By following a few simple steps, it is possible to allocate groups of cells to a cluster
and to force the tool to place the cells related to each cluster in a desired location. These steps
are:
– Identify a set of cells that have a close relationship.
– Collect the dimensions of the library cells in that group.
– Define a cluster attribute.
– Specify the best location of the specific cluster in the fabric, making sure the dimensions
of the cluster are large enough to include all relevant cells.
– Link all the cells in that group to the specific cluster.
– Execute “fix cell” once the structured placement has placed the cells, so that their locations are locked.
● set cluster [ data create cluster region $m central_cluster "336u 0u 252u 156u" ]
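The sizing check behind "making sure the dimensions of the cluster are large enough" can be sketched in Python. This sketch is not the placement tool's API. `Region`, `group_area`, `region_fits`, and the 70% utilization default are all hypothetical, and the region is expressed as x, y, width, and height rather than in the tool's own region syntax. The check compares only areas, so real placement legality (rows, sites, blockages) is outside its scope.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Region:
    """Rectangular cluster region in micrometres (hypothetical format,
    not the placement tool's own region syntax)."""
    x: float
    y: float
    width: float
    height: float

    @property
    def area(self) -> float:
        return self.width * self.height


def group_area(instances: Dict[str, str], lib_area_um2: Dict[str, float]) -> float:
    """Sum the library-cell areas of every instance linked to the cluster.

    instances maps instance name -> library cell name.
    """
    return sum(lib_area_um2[lib_cell] for lib_cell in instances.values())


def region_fits(region: Region, instances: Dict[str, str],
                lib_area_um2: Dict[str, float],
                max_utilization: float = 0.7) -> bool:
    """True if the group fits in the region at or below max_utilization."""
    return group_area(instances, lib_area_um2) <= max_utilization * region.area


if __name__ == "__main__":
    lib = {"BUF": 20.0, "DFF": 25.0}  # made-up cell areas in um^2
    group = {"u_buf0": "BUF", "u_buf1": "BUF", "u_ff0": "DFF"}
    region = Region(x=0.0, y=0.0, width=10.0, height=10.0)
    print(group_area(group, lib))                  # 65.0
    print(region_fits(region, group, lib))         # True
    print(region_fits(region, group, lib, 0.6))    # False
```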
CTS
● Clock Mesh, Zero Skew
To meet the timing requirements of the DDR2 interface, a zero-skew clock topology is preferred.
One effective way to achieve zero skew on a relatively narrow clock tree is to form a clock mesh.
● A clock mesh is constructed by connecting two or more driver cells in parallel (all driver inputs shorted
together and all driver outputs shorted together) to drive a wide metal bus, which yields extremely low skew
(close to zero).
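A first-order way to see why several drivers can be shorted together to drive a wide, heavily loaded bus is to treat each driver as a resistor. N identical drivers in parallel present R/N, which lowers the RC delay into the mesh load. The Python sketch below implements only that lumped, single-pole approximation, and the function names and example values are hypothetical. It deliberately ignores the distributed wire RC of the mesh, which is exactly why accurate mesh timing calls for SPICE.

```python
import math


def parallel_drive_resistance(r_driver_ohm: float, n_drivers: int) -> float:
    """Effective output resistance of n identical drivers whose inputs and
    outputs are shorted together, modelled as resistors in parallel."""
    if n_drivers < 1:
        raise ValueError("need at least one driver")
    return r_driver_ohm / n_drivers


def lumped_rc_delay_ps(r_ohm: float, c_ff: float) -> float:
    """50% point of a single-pole RC step response, ln(2) * R * C.

    Ohm * fF = 1e-15 s = 1e-3 ps, hence the 1e-3 factor.
    """
    return math.log(2) * r_ohm * c_ff * 1e-3


if __name__ == "__main__":
    # Made-up values: four 400-ohm drivers into a 2 pF mesh load.
    r_eff = parallel_drive_resistance(400.0, 4)
    print(r_eff)                                        # 100.0
    print(round(lumped_rc_delay_ps(r_eff, 2000.0), 1))  # 138.6
```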
● One major drawback of this approach is that common EDA tools cannot calculate the timing delay of a mesh
accurately: because all driver inputs are shorted together, and all driver outputs are shorted together as well,
the timing engine cannot provide the correct path delay. Accurate delays require dedicated circuit simulation with
a standalone analog simulator such as SPICE.
– Moreover, during the physical implementation of such a structure, all timing calculations have to be delegated
to an external SPICE engine and the results fed back into the flow.
● Inside the firm macro-cells there are several clock-mesh implementations. For each one the following steps are
performed:
– Identify all cells that belong to the same clock and for which a zero skew is required.
– Extract the exact physical location of such cells.
– Generate an accurate netlist, including parasitic values and input loads, for the SPICE simulator.
– Analyze the structure and form a clock-mesh circuit using symmetric driver cells.
– Update the netlist inside the generic EDA flow with the new clock-mesh structure.
– Perform structured placement of all cells in the clock mesh.
– Extract the parasitics of the netlist again, including the clock mesh.
– Simulate the clock mesh using SPICE to obtain:
   – The exact path delay from the root to each cell's clock pin.
   – The exact slew at each clock input pin.
– Update the actual path delay and transition for all leaf pins.
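The final feedback step can be sketched as collecting the simulated delay and slew for every leaf pin and checking the resulting skew. The Python below is a minimal sketch under stated assumptions. The SPICE results are assumed to be already parsed into a dictionary of pin to (delay, slew) in picoseconds. The function names, pin names, and numbers are hypothetical. The code does not issue any real timing-tool command to apply the annotations.

```python
from typing import Dict, Iterable, Tuple


def mesh_skew_ps(delays_ps: Dict[str, float]) -> float:
    """Skew across the mesh: latest minus earliest root-to-leaf delay."""
    if not delays_ps:
        raise ValueError("no leaf delays")
    return max(delays_ps.values()) - min(delays_ps.values())


def build_annotations(leaf_pins: Iterable[str],
                      spice_results: Dict[str, Tuple[float, float]]
                      ) -> Dict[str, Dict[str, float]]:
    """Collect per-pin delay and slew values to feed back into timing.

    spice_results maps pin -> (delay_ps, slew_ps). Fails loudly if any
    leaf pin has no simulated value, rather than silently keeping the
    timing engine's own (inaccurate) estimate.
    """
    pins = list(leaf_pins)
    missing = [p for p in pins if p not in spice_results]
    if missing:
        raise KeyError(f"no SPICE result for: {missing}")
    return {p: {"delay_ps": spice_results[p][0], "slew_ps": spice_results[p][1]}
            for p in pins}


if __name__ == "__main__":
    # Made-up SPICE numbers for two hypothetical flip-flop clock pins.
    results = {"dq0_ff/CK": (412.0, 35.0), "dq1_ff/CK": (415.0, 36.0)}
    ann = build_annotations(results.keys(), results)
    print(mesh_skew_ps({p: a["delay_ps"] for p, a in ann.items()}))  # 3.0
```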