
Global Standards for the Microelectronics Industry

LPDDR5
System Training

Copyright © 2019
LPDDR5 Workshop [Raj Mahajan, Tsun Ho Liu]
LPDDR5 Interface Training Agenda
• Overview
• Address/command interface
• WCK2CK leveling
• WCK DCA training
• Read gate training
• Data interface training – read and write

Introduction
• The new LPDDR5 SDRAM interface pushes data rates to 6400 Mbps
• Boot up trainings are required to operate a parallel interface at such
high data rates
• This presentation will focus on the boot trainings required to operate
at up to 6400 Mbps
• The following will not be covered:
  • Re-training / drift tracking
  • Command Bus Training with DVFSQ

LPDDR5 training overview
• Critical timing relationships in LPDDR5 and their data rates in an LPDDR5-6400 system
• LPDDR5-6400 bit rates will be used as examples throughout this presentation
Training                    Data rate / freq                Training target
1. Command bus training     CK-CS: 800 Mbps                 SoC: CA/CS delay
                            CK-CA: 1600 Mbps                DRAM: Vref(CA)
2. WCK2CK leveling          CK-WCK: CK 800 MHz,             SoC: WCK delay
                            WCK 3200 MHz
3. WCK duty cycle training  WCK: 3200 MHz                   DRAM: DCA code
4. Read gate training       RDQS: 3200 MHz                  SoC: read gate delay
5. Read data training       RDQS-DQ/DMI: 6400 Mbps          SoC: Rx delay, Vref(DQ)
6. Write data training      WCK-DQ/DMI/RDQS_t: 6400 Mbps    SoC: Tx delay
                                                            DRAM: Vref(DQ)

[Figure: SoC/DRAM interface block diagram (CK/CK#, CS[1:0], CA[6:0], WCK/WCK#, RDQS_t/c, DQ+) with markers 1 to 6 showing the circuit adjusted by each training step]
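To put the rates in the overview in perspective, the short sketch below converts each bit rate into its unit interval (the time available per bit). The helper name `unit_interval_ps` is made up for this illustration; the rates are the LPDDR5-6400 examples from the table.

```python
def unit_interval_ps(rate_mbps: float) -> float:
    """Return the unit interval (one bit time) in picoseconds for a rate in Mbps."""
    # 1 Mbps corresponds to one bit per microsecond, i.e. 1e6 ps per bit.
    return 1e6 / rate_mbps


for name, rate in [("CS", 800), ("CA", 1600), ("DQ/DMI", 6400)]:
    print(f"{name}: {rate} Mbps -> {unit_interval_ps(rate):.2f} ps per bit")
# CS: 800 Mbps -> 1250.00 ps per bit
# CA: 1600 Mbps -> 625.00 ps per bit
# DQ/DMI: 6400 Mbps -> 156.25 ps per bit
```

At 6400 Mbps the whole bit time is roughly 156 ps, which is why per-pin delay and Vref training are needed on the data interface.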

Command Bus Training
LPDDR5 Training: CA and CS

LPDDR5 CS and CA
• The LPDDR5 interface decouples address/command clocking from the much faster data interface
• CK (clock for address/command) runs up to 800 MHz, while the data strobe can reach 3200 MHz
• Command bus training is used to train the SoC delay of CS and CA and the DRAM Vref for the CA receivers
[Figure: SoC/DRAM interface block diagram; CS and CA TX delays marked 1]

LPDDR5 CS and CA training
CS training
• CS is 800 Mbps, VSS-terminated or unterminated
• CS should be trained for delay to center it on rising edges of CK
CA training
• CA is 1600 Mbps, VSS-terminated or unterminated
• CA Vref should be trained to remove uncertainty in sampling level due to impedance uncertainties
• CA should also be trained for delay to center it on rising edges of CK

LPDDR5 Command Bus Training (“CBT”)
• Conceptually similar to LPDDR4 Command Bus Training
• 2 available modes:
  • Mode 1 uses WCK and DQ[7:0] to train delay
  • Mode 2 also requires the DMI pin and provides a means to train Vref without exiting the training mode
• In these modes:
  • Data sent on CS and CA is captured on one edge of CK
  • Sampled values are returned statically on DQ pins
[Figure: SoC/DRAM interface block diagram; CS and CA TX delays marked 1]

LPDDR5 Setup for Command Bus Training
• Prior to entering Command Bus Training:
  • Program all pertinent settings (latencies, termination, Vref, etc.) for one inactive FSP
  • Set VRCG to enable rapid changes in DRAM Vref level
  • Send MRW-1 and MRW-2 “CBT Entry” commands with DQ[7] LOW to enter the training mode
  • Setting DQ[7] HIGH and toggling WCK will change the active FSP
  • Then change the clock frequency and begin training
• To exit the training mode:
  • Switch DQ[7] LOW to return to the original “known good” FSP
  • With DQ[7] LOW, send MRW-1 and MRW-2 “CBT Exit” commands at low speed

LPDDR5 CBT
Mode 1 Training
• Write MRs to configure one of the unused FSPs
• Enter Mode 1 training and switch to high frequency
  • New FSP will become active
• Adjust delays and send commands with CS and CA to train them
  • Responses will be provided on DQ[6:0]
• If training Vref, exit mode 1 training, change Vref, and re-enter mode 1 training
Mode 2 Training
• Write MRs to configure one of the unused FSPs
• Enter Mode 2 training and switch to high frequency
  • New FSP will become active
• Adjust delays and Vref(CA) and send commands with CS and CA to train them
  • Setting DMI[0] LOW allows host to provide Vref(CA) setting on DQ[6:0]
  • Responses will be provided on DQ[6:0]
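The "adjust delays and send commands" step is typically a sweep that looks for the range of delay codes at which the DRAM's response matches what was sent, then parks the delay in the middle of that range. The sketch below shows one possible way to do this, not the presenters' algorithm. The callback `passes(code)` is hypothetical: it stands for "program CS/CA delay `code`, drive a pattern, and compare the value returned on DQ[6:0]". The 64-step delay range is likewise an assumption.

```python
from typing import Callable, Optional, Tuple


def find_pass_window(passes: Callable[[int], bool],
                     num_steps: int) -> Optional[Tuple[int, int]]:
    """Return (first, last) of the longest run of passing delay codes, or None."""
    best = None
    start = None
    # One extra iteration (code == num_steps) closes a run that reaches the end.
    for code in range(num_steps + 1):
        ok = code < num_steps and passes(code)
        if ok and start is None:
            start = code
        elif not ok and start is not None:
            run = (start, code - 1)
            if best is None or run[1] - run[0] > best[1] - best[0]:
                best = run
            start = None
    return best


def center_of(window: Tuple[int, int]) -> int:
    """Return the delay code in the middle of a passing window."""
    return (window[0] + window[1]) // 2


# Toy stand-in for hardware: codes 10..30 sample CA correctly.
def fake_cbt_check(code: int) -> bool:
    return 10 <= code <= 30


window = find_pass_window(fake_cbt_check, 64)
print(window, center_of(window))  # (10, 30) 20
```

In Mode 2 the same sweep could be repeated at each Vref(CA) setting without leaving the training mode; in Mode 1 each Vref change requires an exit and re-entry, as the slide notes.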

WCK2CK Leveling Training
LPDDR5 Training: Aligning WCK to CK

WCK2CK Leveling
• Write leveling aligns WCK strobes to data for each byte and each rank
• Host must adjust WCK delay to align WCK rising edge to CK
[Figure: SoC/DRAM interface block diagram; WCK TX delay marked 2]

WCK2CK Leveling
• Writing MR18 OP[6]=1 puts the LPDDR5 DRAM into write leveling mode
• In this mode, host should toggle WCK for 8 pulses at a time and a response indicating alignment to CK will be provided on DQ
  • 0 indicates WCK is earlier than CK
  • 1 indicates WCK is later than CK
• SoC should adjust WCK phase delay until alignment is reached
[Figure: SoC/DRAM interface block diagram; WCK TX delay marked 2]
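The leveling loop described above amounts to searching for the WCK delay at which the DQ response flips from 0 (WCK early) to 1 (WCK late). A minimal sketch follows, assuming a hypothetical callback `sample_dq(code)` that applies the WCK delay code, toggles WCK for 8 pulses, and returns the 0/1 response. The 64-step range and the toy response are illustrative only.

```python
from typing import Callable, Optional


def level_wck(sample_dq: Callable[[int], int], num_steps: int) -> Optional[int]:
    """Return the first WCK delay code at which the response flips from 0 to 1."""
    previous = sample_dq(0)
    for code in range(1, num_steps):
        current = sample_dq(code)
        if previous == 0 and current == 1:
            return code  # boundary between "WCK early" and "WCK late"
        previous = current
    return None  # no 0 -> 1 transition found in the swept range


# Toy stand-in for hardware: WCK becomes "later than CK" from code 17 onward.
aligned_code = level_wck(lambda code: 1 if code >= 17 else 0, 64)
print(aligned_code)  # 17
```

A real implementation would run this per byte and per rank and would usually sample each code several times to filter jitter near the transition.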


WCK2CK Leveling Example

Multi-rank Sync-ing
Sync to both ranks
• In multi-rank systems in which performance is a higher priority than power, users may wish to sync WCK to both ranks and keep it always running
• This eats into timing margins, as the leveling requirements may be slightly different at each of the 2 ranks
  • Difference may be up to 100 ps, removing up to 50 ps of accuracy from each rank
  • Timing budgets for CK-WCK alignment must be carefully managed in this case
• To support this, train as described in previous slide and average the results
Sync to one rank at a time
• Multi-rank systems can also be supported by sync’ing to only one rank at a time
• In this case, no averaging should be done; leveling results should be independent for each byte in each rank
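The 100 ps / 50 ps figures follow directly from averaging: a shared WCK delay set midway between the two per-rank results is off by half the difference at each rank. A small worked example follows; the function name and the per-rank delay values are made up for illustration.

```python
from typing import Tuple


def average_leveling(rank0_ps: float, rank1_ps: float) -> Tuple[float, float]:
    """Return (shared WCK delay, residual alignment error seen by each rank)."""
    shared = (rank0_ps + rank1_ps) / 2
    residual = abs(rank0_ps - rank1_ps) / 2
    return shared, residual


# Two ranks whose ideal WCK delays differ by 100 ps.
shared, residual = average_leveling(400.0, 500.0)
print(shared, residual)  # 450.0 50.0
```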

WCK Duty Cycle Training
LPDDR5 Training

WCK Duty Cycle Training
• WCK duty cycle performance is critical to several aspects of performance:
  • RDQS duty cycle
  • Odd/even read DQ launch
  • Odd/even write DQ capture
• As such, LPDDR5 DRAM have built-in facilities to support correction of duty cycle:
  • Duty Cycle Adjuster (“DCA”) to control duty cycle
  • Duty Cycle Monitor (“DCM”) to observe duty cycle
  • Ability to reverse the inputs to the duty cycle monitor (“flip” the monitor) in order to correct for asymmetry in the monitor itself
[Figure: SoC/DRAM interface block diagram; DRAM-side DCA on the WCK receiver marked 3, DCM shown on the DRAM]

Duty Cycle Training
DCM measurement
• Issue CAS command with WCK2CK Fast Sync
• Run WCK at full rate
• Set MR26 OP[0]=1 to initiate DCM operation
• Wait tDCMM for measurement, then flip DCM by setting MR26 OP[1]=1
• Wait tDCMM, then set MR26 OP[0]=0 to complete DCM measurement
• Read results for upper and lower bytes from both flip settings from MR26 OP[5:2]
DCA sweep
• Adjust WCK duty cycle for both bytes by writing MR30
• Repeat the DCM measurement described above
• After sweeping DCA and identifying the optimal setting, program MR30 for mission mode operation
• In 2 rank systems, do this once for each rank
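One possible way to turn the sweep into a DCA choice is sketched below. Everything about the result encoding here is an assumption for illustration: `measure(code)` is a hypothetical callback that programs the DCA code, runs the DCM sequence above, and returns two bits for one byte, `(unflipped, flipped)`, where 1 is taken to mean "the monitor saw the high phase as longer". Monitor asymmetry then shows up as a band of codes where the two results disagree, and the sketch picks the middle of that band. The real bit meanings and code range must come from the device documentation.

```python
from typing import Callable, Optional, Sequence, Tuple


def pick_dca_code(measure: Callable[[int], Tuple[int, int]],
                  codes: Sequence[int]) -> Optional[int]:
    """Pick a DCA code from a sweep of (unflipped, flipped) DCM results."""
    results = [(code, *measure(code)) for code in codes]
    # Codes where the flipped and unflipped monitor disagree bracket 50% duty.
    disagree = [code for code, unflipped, flipped in results if unflipped != flipped]
    if disagree:
        return disagree[len(disagree) // 2]
    # No visible monitor offset: take the first code where the result changes.
    for (_, prev, _), (code, cur, _) in zip(results, results[1:]):
        if prev != cur:
            return code
    return None


# Toy monitor with an offset: the two orientations switch at codes 7 and 10.
def toy_dcm(code: int) -> Tuple[int, int]:
    return (int(code >= 7), int(code >= 10))


print(pick_dca_code(toy_dcm, range(16)))  # 8
```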

Read Gate Training
LPDDR5 Training

Read Gate
• The SoC PHY requires some mechanism to determine when to observe RDQS and DQ from DRAM; call this a “read gate”
• Train the time from read command launch to response arriving at PHY
[Figure: SoC/DRAM interface block diagram; read gate logic on the SoC RDQS receiver marked 4 (shown as a red box in the original slide)]

Read Gate Training
• It is useful to be able to train read gate before training read data or write data
• LPDDR5 provides 3 useful functions to that end:
  • RDQS toggle mode provides a continuous RDQS from LPDDR5 DRAM to host. This mode is entered by writing MR46 OP[0]=1.
  • Enhanced RDQS training mode maintains RDQS_t=0/RDQS_c=1 between read bursts. This mode is entered by writing MR46 OP[1]=1.
  • DQ calibration training patterns. Patterns are programmable via MRWs (to MR31 through MR34), so the DQ bus is not needed to program them.
• There are many possible approaches to training the read gate, but generally an ability to sample RDQS within the PHY without using DQ data is useful
  • With that, the PHY need only sweep the sampling mechanism timing to determine RDQS arrival timing and set the read gate delay accordingly
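As one illustration of "sweep the sampling mechanism timing", the sketch below assumes enhanced RDQS training mode (RDQS_t held low between bursts) and a hypothetical callback `sample_rdqs(code)` that issues a read and returns the RDQS_t level seen at sampling delay `code`. The `margin` parameter, which opens the gate slightly before the first rising edge, is also an assumption; real PHYs differ in how they position the gate.

```python
from typing import Callable, Optional


def find_read_gate_delay(sample_rdqs: Callable[[int], int],
                         num_steps: int,
                         margin: int = 2) -> Optional[int]:
    """Return a gate delay placed `margin` steps before the first RDQS rising edge."""
    previous = sample_rdqs(0)
    for code in range(1, num_steps):
        current = sample_rdqs(code)
        if previous == 0 and current == 1:
            return max(code - margin, 0)
        previous = current
    return None  # RDQS never rose inside the swept range


# Toy RDQS: low until code 40, then toggling with a half-period of 8 codes.
def toy_rdqs(code: int) -> int:
    return 0 if code < 40 else ((code - 40) // 8 + 1) % 2


print(find_read_gate_delay(toy_rdqs, 128))  # 38
```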

Read Gate Training Mode Examples
[Figure: RDQS Toggle Mode (entry example)]
[Figure: Enhanced RDQS Training Mode (read during this mode example)]

Write and Read Data Training
LPDDR5 Training

Data Training
• Data training will ensure adequate timing margins for write and read interfaces
• Read data training consists of training host Vref, equalization (if supported) and delay
  • DQ and DMI are trained to RDQS
  • DQ calibration patterns allow reads to be trained before writes
• Write data training consists of DRAM DFE and Vref per byte and host bit delay
  • DQ, DMI, and RDQS_t (linkECC) are trained to WCK
[Figure: SoC/DRAM interface block diagram; SoC DQ RX delay marked 5, SoC DQ TX delay marked 6]

Read Data Training
• Write MR31 through MR34 to set desired DQ training patterns
• Issue Read DQ Calibration (RDC) commands to read the calibration patterns and train:
  • SoC receive delays
  • SoC receive Vref
  • Other SoC receive characteristics, such as equalization
[Figure: SoC/DRAM interface block diagram; SoC DQ RX delay marked 5]
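Training receive delay and Vref together is commonly done as a two-dimensional sweep: for each Vref setting, find the widest run of passing delay codes, then keep the Vref with the widest window and the delay at its center. The sketch below shows that idea and is not the presenters' procedure. `passes(delay, vref)` is a hypothetical callback standing for "program the settings, issue RDC, compare against the expected pattern", and the diamond-shaped toy eye is invented for the demo.

```python
from typing import Callable, Iterable, Optional, Tuple


def widest_window(flags: Iterable[bool]) -> Tuple[int, int]:
    """Return (start_index, length) of the longest run of True values."""
    best_start, best_len, start = 0, 0, None
    # A trailing False closes a run that reaches the end of the sweep.
    for i, ok in enumerate(list(flags) + [False]):
        if ok and start is None:
            start = i
        elif not ok and start is not None:
            if i - start > best_len:
                best_start, best_len = start, i - start
            start = None
    return best_start, best_len


def train_read_eye(passes: Callable[[int, int], bool],
                   delays: Iterable[int],
                   vrefs: Iterable[int]) -> Optional[Tuple[int, int, int]]:
    """Return (vref, center_delay, window_width) for the widest passing window."""
    delays = list(delays)
    best = None
    for vref in vrefs:
        start, length = widest_window(passes(d, vref) for d in delays)
        if length and (best is None or length > best[2]):
            best = (vref, delays[start + (length - 1) // 2], length)
    return best


# Toy eye: a diamond centered at delay 16, Vref 8.
def toy_eye(delay: int, vref: int) -> bool:
    return abs(delay - 16) + 2 * abs(vref - 8) <= 10


print(train_read_eye(toy_eye, range(32), range(16)))  # (8, 16, 21)
```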

Write Data Training
• Train DQ output delays
• Optionally, other SoC output characteristics may also be trained here
• With reads trained previously, writes may be trained
• LPDDR5 includes a FIFO that may be used for training with less protocol overhead than DRAM
  • No activate, precharge, or refreshes required
  • FIFO is 8 x BL16 deep
• Alternately, DRAM memory may also be used instead of the FIFO
  • Enables arbitrarily long training patterns for even more stressful training than the FIFO allows
[Figure: SoC/DRAM interface block diagram; SoC DQ TX delay marked 6]
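A write-training iteration with the FIFO boils down to: build a pattern that fits in 8 bursts of BL16, write it, read it back, and see which DQ bits miscompared. The sketch below models one byte lane as a list of 8-bit beats. The pattern generator and the idea of reporting a per-bit failure mask are illustrative choices, and the actual Write FIFO / Read FIFO command handling is left out because it is controller-specific.

```python
import random
from typing import List

FIFO_BURSTS = 8     # from the slide: the FIFO is 8 x BL16 deep
BURST_LENGTH = 16


def make_pattern(seed: int = 0) -> List[int]:
    """Build a pseudo-random pattern of 8-bit beats that exactly fills the FIFO."""
    rng = random.Random(seed)
    return [rng.randrange(256) for _ in range(FIFO_BURSTS * BURST_LENGTH)]


def failing_bits(written: List[int], read_back: List[int]) -> int:
    """Return a bit mask of DQ bits that miscompared on at least one beat."""
    mask = 0
    for w, r in zip(written, read_back):
        mask |= w ^ r
    return mask


# Toy example: DQ3 is captured low on the first beat.
print(hex(failing_bits([0xFF, 0x00], [0xF7, 0x00])))  # 0x8
```

A per-bit mask like this lets the host keep sweeping the TX delay of only the bits that still fail, which fits the "host bit delay" training target mentioned earlier.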

Write Data Training: DMI and RDQS_t
Training DMI pin requires special consideration
• Option 1: Using LPDDR5’s training FIFO, DMI pin can be trained at the same time as DQ pins
  • Write data on DMI is written to FIFO, and these data can be read out by Read FIFO command
• Option 2: Using main memory, DMI pin can be trained after DQ
  • In this case, failures on DMI sampling with complex patterns may be difficult to discern from failures in other DQ bits
Training RDQS_t (parity) also requires special considerations
• Option 1: Using LPDDR5’s training FIFO with WCK-RDQS_t training mode (MR46 OP[2] = 1)
  • Write data on RDQS_t is written to FIFO, and these data can be read out via DMI pin by Read FIFO command
  • RDQS_t cannot be trained at the same time as DQ/DMI. If both DMI and RDQS_t are used in a system, 2 iterations are required, once to train with DQ/DMI and another to train with RDQS_t
• Option 2: Using LPDDR5’s Read/Write-based WCK-RDQS_t training mode (MR26 OP[7] = 1)
  • This mode is available only when DRAM supports it (MR26 OP[6] = 1)
  • RDQS_t behaves like DMI pin, and DMI input is ignored. DRAM inverts write data on DQ inputs when RDQS_t is sampled High.

DRAM DFE Training
• LPDDR5 includes support for Decision Feedback Equalization (DFE)
• The DFE is 1 tap: equalization is based on the previous bit sent
• The 1 tap has 8 possible settings (3 bits programmability), independently programmable for each rank and byte
• Use of DFE is optional
• Training procedure:
  • Set the DFE quantity in MR24
  • Perform writes to DRAM and read back
    • DRAM memory or training FIFO may be used to do this
  • Adjust DFE quantity in MR24 and repeat training patterns
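With only 8 settings per rank and byte, the procedure above can be an exhaustive sweep. The sketch below assumes a hypothetical callback `eye_width(setting)` that programs the DFE quantity, reruns the write/read-back training, and returns some measure of margin (here, the passing window width in delay steps). Choosing "widest eye, lowest setting on ties" is one reasonable policy, not one stated in the slides.

```python
from typing import Callable, Tuple


def pick_dfe_setting(eye_width: Callable[[int], int],
                     num_settings: int = 8) -> Tuple[int, int]:
    """Return (setting, width) for the DFE setting with the widest measured eye."""
    widths = [eye_width(setting) for setting in range(num_settings)]
    # max() keeps the first maximum, so ties resolve to the lowest setting.
    best = max(range(num_settings), key=lambda s: widths[s])
    return best, widths[best]


# Toy measurement: margin improves up to setting 2, then degrades.
toy_widths = [10, 12, 15, 15, 13, 11, 9, 7]
print(pick_dfe_setting(lambda s: toy_widths[s]))  # (2, 15)
```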

Read Data Refinement (Optional)
• Calibration training patterns restrict the complexity of data patterns that can be used for training
• After write training is completed, additional read training with more complex data patterns is possible
• The LPDDR5 FIFO or the DRAM array may be used for refining read training
[Figure: SoC/DRAM interface block diagram; SoC DQ RX delay marked 5]

LPDDR5 training mode summary
• User can select appropriate training mode to optimize performance in LPDDR5 system

Training                    Training mode / command                     MR: mode selection  Support indicator  Note
1. Command bus training     CBT mode 1                                  MR13 OP[6] = 1      Supported          Mode 1 is for DMI-less system
                            CBT mode 2                                  MR13 OP[6] = 0
2. WCK2CK leveling          WCK2CK leveling mode                        MR18 OP[6] = 1      Supported
3. WCK duty cycle training  MRW: DCM start                              MR26 OP[0] = 1      Supported
4. Read gate training       Enhanced RDQS training mode                 MR46 OP[1] = 1      Supported
                            RDQS toggle mode                            MR46 OP[0] = 1
5a. Write data training     Training FIFO for DQ/DMI                    MR46 OP[2] = 0      Supported
                            Training FIFO for RDQS_t                    MR46 OP[2] = 1      Supported
                            Read/Write-based WCK-RDQS_t training mode   MR26 OP[7] = 1      MR26 OP[6]
5b. DRAM DFE training       MRW MR24 (DFE quantity)                     no mode select      MR24 OP[7]
6. Read data training       RDC command                                 no mode select      Supported          MR20, 31-34 define RDC data pattern
                            Training FIFO

Periodic Retraining
• Some LPDDR5 DRAM timing parameters can drift over time with voltage and temperature
  • tWCK2DQO: Read response timing for RDQS + DQ
  • tWCK2DQI: Write WCK-to-DQ offset
• Consequently, periodic updates to the following trainings will be necessary to track temperature and low-frequency voltage changes:
  • Write data training to track tWCK2DQI
  • Read gate training to track tWCK2DQO

Thank You
• Questions?
