LPDDR5 System Training
Copyright © 2019
LPDDR5 Workshop [Raj Mahajan, Tsun Ho Liu]
LPDDR5 Interface Training Agenda
• Overview
• Address/command interface
• WCK2CK leveling
• WCK DCA training
• Read gate training
• Data interface training – read and write
Introduction
• The new LPDDR5 SDRAM interface pushes data rates to 6400 Mbps
• Boot-time training is required to operate a parallel interface at such high data rates
• This presentation focuses on the boot-time training required to operate at up to 6400 Mbps
• The following will not be covered:
• Re-training / drift tracking
• Command Bus Training with DVFSQ
LPDDR5 training overview
• Critical timing relationships in LPDDR5 and their data rates in an LPDDR5-6400 system
• LPDDR5-6400 bit rates will be used as examples throughout this presentation
Training                   | Data rate / freq                   | Training target
1. Command bus training    | CK - CA: 1600 Mbps                 | DRAM: Vref(CA)
2. WCK2CK leveling         | CK - WCK: CK 800 MHz, WCK 3200 MHz | SoC: WCK delay
3. WCK duty cycle training | WCK 3200 MHz                       | DRAM: DCA code
6. Write data training     | WCK - DQ/DMI/RDQS_t: 6400 Mbps     | SoC: Tx delay; DRAM: Vref(DQ)
[Figure: SoC-to-DRAM interface block diagram showing the CK/CK#, CS[1:0], CA[6:0], WCK/WCK#, RDQS_t/c, and DQ paths with their TX, RX, delay, DCA, and DCM blocks, numbered by training step]
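The rates quoted in this overview follow from the CK frequency by fixed ratios. As a rough illustration that is not part of the original slides, the sketch below derives the per-pin rates and unit intervals for the LPDDR5-6400 example. The 4:1 WCK:CK ratio and the single/double data rate assignments are read off the numbers quoted above, and the function name is hypothetical.

```python
# Illustrative arithmetic only: derive the LPDDR5-6400 example rates from CK.
# Assumes the ratios implied by the overview: WCK = 4 x CK, CS toggles once
# per CK cycle, CA twice per CK cycle, DQ twice per WCK cycle.

def lpddr5_rates(ck_mhz: float, wck_ratio: int = 4) -> dict:
    """Return per-signal rates (MHz or Mbps) and unit intervals in ps."""
    wck_mhz = ck_mhz * wck_ratio
    rates = {
        "CK_MHz": ck_mhz,
        "WCK_MHz": wck_mhz,
        "CS_Mbps": ck_mhz,        # single data rate on CK
        "CA_Mbps": 2 * ck_mhz,    # double data rate on CK
        "DQ_Mbps": 2 * wck_mhz,   # double data rate on WCK
    }
    # Unit interval in picoseconds: 1e6 / (rate in Mbps).
    ui_ps = {
        name: 1e6 / rate
        for name, rate in rates.items()
        if name.endswith("Mbps")
    }
    return {"rates": rates, "ui_ps": ui_ps}


example = lpddr5_rates(800.0)
print(example["rates"])
print(example["ui_ps"])
```

With CK at 800 MHz this gives a DQ unit interval of 156.25 ps, against 625 ps for CA and 1250 ps for CS.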
Command Bus Training
LPDDR5 Training: CA and CS
LPDDR5 CS and CA
[Figure: SoC-to-DRAM interface block diagram, training step 1 highlighted]
• The LPDDR5 interface decouples address/command clocking from the blisteringly fast data interface
  • CK (clock for address/command)
• Command bus training is used to train SoC delay of CS and CA and DRAM Vref for CA receivers
LPDDR5 CS and CA training
CS training
• CS is 800 Mbps, VSS-terminated or unterminated
• CS should be trained for delay to center it on rising edges of CK
CA training
• CA is 1600 Mbps, VSS-terminated or unterminated
• CA Vref should be trained to remove uncertainty in sampling level due to impedance uncertainties
• CA should also be trained for delay to center it on rising edges of CK
LPDDR5 Command Bus Training (“CBT”)
[Figure: SoC-to-DRAM interface block diagram, training step 1 highlighted]
• 2 available modes; Mode 2 also provides a means to train Vref
• In these modes:
  • Data is sent on CS and CA and captured on one edge of CK
  • Sampled values are returned statically on DQ pins
LPDDR5 Setup for Command Bus Training
• Prior to entering Command Bus Training:
  • Program all pertinent settings (latencies, termination, Vref, etc.) for one inactive FSP
  • Set VRCG to enable rapid changes in DRAM Vref level
  • Send MRW-1 and MRW-2 “CBT Entry” commands with DQ[7] LOW to enter the training mode
  • Setting DQ[7] HIGH and toggling WCK will change the active FSP
  • Then change clock frequency and begin training
• To exit the training mode:
  • Switch DQ[7] LOW to return to the original “known good” FSP
  • With DQ[7] LOW, send MRW-1 and MRW-2 “CBT Exit” commands at low speed
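The ordering of the entry and exit steps is easy to get wrong, so a small sketch may help. The `Host` class and its `step` method are hypothetical stand-ins for a memory-controller or PHY driver and only record the order of operations. Placing the return to low clock frequency just before the exit commands is an assumption drawn from the "at low speed" wording above.

```python
# Sketch of the CBT entry/exit ordering described above. Host is a
# hypothetical stand-in for a controller/PHY driver; it only logs steps.

class Host:
    def __init__(self):
        self.log = []

    def step(self, action: str) -> None:
        self.log.append(action)


def command_bus_training(host: Host, train) -> None:
    # Setup at the known-good (low) frequency.
    host.step("program inactive FSP (latencies, termination, Vref)")
    host.step("set VRCG for fast Vref changes")
    host.step("DQ[7] LOW; send MRW-1 + MRW-2 CBT Entry")
    # Switch to the FSP under training.
    host.step("DQ[7] HIGH; toggle WCK (switch active FSP)")
    host.step("change CK to target frequency")
    train(host)  # caller-supplied CS/CA delay and Vref(CA) sweep
    # Return to the known-good FSP before leaving the mode.
    host.step("DQ[7] LOW (return to original FSP)")
    host.step("change CK to low frequency")  # assumed from "at low speed"
    host.step("send MRW-1 + MRW-2 CBT Exit")


host = Host()
command_bus_training(host, lambda h: h.step("sweep CS/CA delay and Vref(CA)"))
for number, action in enumerate(host.log, 1):
    print(number, action)
```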
LPDDR5 CBT
• Mode 1 Training
  • Write MRs to configure one of the unused FSPs
  • Enter Mode 1 training and switch to high frequency
  • New FSP will become active
  • Adjust delays and send commands with CS and CA to train them
  • Responses will be provided on DQ[6:0]
  • If training Vref, exit mode 1 training, change Vref, and re-enter mode 1 training
• Mode 2 Training
  • Write MRs to configure one of the unused FSPs
  • Enter Mode 2 training and switch to high frequency
  • New FSP will become active
  • Adjust delays and Vref(CA) and send commands with CS and CA to train them
  • Setting DMI[0] LOW allows host to provide Vref(CA) setting on DQ[6:0]
  • Responses will be provided on DQ[6:0]
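Both modes boil down to sweeping delay, and optionally Vref(CA), then choosing a setting in the middle of the passing region. The sketch below shows one common way to do that. It is not taken from the slides: `ca_passes` is a hypothetical stand-in for "drive a pattern on CA, read the sampled value back on DQ[6:0], and compare", and the toy eye model uses made-up numbers. In Mode 1 each change of Vref would also involve leaving and re-entering the training mode, which the sketch does not model.

```python
# Minimal sketch: pick the CA delay / Vref(CA) point with the widest passing
# window. The eye model below is a toy, not measured data.

def longest_pass_window(results):
    """Return (start, length) of the longest run of True values."""
    best_start, best_len, run_start = 0, 0, None
    for i, ok in enumerate(list(results) + [False]):  # sentinel ends a final run
        if ok and run_start is None:
            run_start = i
        elif not ok and run_start is not None:
            if i - run_start > best_len:
                best_start, best_len = run_start, i - run_start
            run_start = None
    return best_start, best_len


def train_ca(ca_passes, delay_codes, vref_codes):
    """Sweep delay at each Vref; return (vref, centre_delay, window_len)."""
    best = None
    for vref in vref_codes:
        passes = [ca_passes(delay, vref) for delay in delay_codes]
        start, length = longest_pass_window(passes)
        if length and (best is None or length > best[2]):
            best = (vref, delay_codes[start + (length - 1) // 2], length)
    return best


def toy_eye(delay, vref):
    # Diamond-shaped eye centred at delay code 20, Vref code 32 (made up).
    return abs(delay - 20) + 2 * abs(vref - 32) < 12


result = train_ca(toy_eye, list(range(64)), list(range(24, 41)))
print(result)
```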
WCK2CK Leveling Training
LPDDR5 Training: Aligning WCK to CK
WCK2CK Leveling
[Figure: SoC-to-DRAM interface block diagram, training step 2 highlighted]
• [...] strobes to data for each byte
• WCK2CK leveling is used to align WCK rising edge to CK
WCK2CK Leveling
[Figure: SoC-to-DRAM interface block diagram, training step 2 highlighted]
• Place the LPDDR5 DRAM into write leveling mode
• The DRAM returns a response indicating alignment of WCK to CK:
  • 0 indicates WCK is earlier than CK
  • 1 indicates WCK is later than CK
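Given the 0/1 response, the SoC can sweep its WCK delay until the response flips. The sketch below is one simple way to do this and is not taken from the slides. `sample_leveling` is a hypothetical stand-in for "send WCK pulses at this delay code and read back the DRAM's response", and the toy DRAM model places the CK edge at an arbitrary code.

```python
# Sketch of a WCK2CK leveling sweep over SoC WCK delay codes.

def level_wck(sample_leveling, max_code: int) -> int:
    """Return the first delay code where the response flips 0 -> 1.

    0 means WCK is earlier than CK and 1 means WCK is later than CK, so the
    0 -> 1 boundary is where the WCK rising edge meets CK.
    """
    previous = sample_leveling(0)
    for code in range(1, max_code + 1):
        current = sample_leveling(code)
        if previous == 0 and current == 1:
            return code
        previous = current
    raise RuntimeError("no 0 -> 1 transition found in the delay range")


def toy_dram(code: int, ck_edge_code: int = 37) -> int:
    # Made-up device: WCK becomes "later than CK" from code 37 onward.
    return 1 if code >= ck_edge_code else 0


aligned_code = level_wck(toy_dram, 63)
print(aligned_code)
```

A real sweep would normally repeat each sample several times to filter jitter near the boundary.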
Multi-rank Syncing
• In multi-rank systems in which performance is a higher priority than power, users may wish to sync WCK to both ranks and keep it always running
  • This eats into timing margins, as the leveling requirements may be slightly different at each of the 2 ranks
  • Difference may be up to 100 ps, removing up to 50 ps of accuracy from each rank
  • Timing budgets for CK-WCK alignment must be carefully managed in this case
  • To support this, train as described in the previous slide and average the results
• Multi-rank systems can also be supported by syncing to only one rank at a time
  • In this case, no averaging should be done; leveling results should be independent for each byte in each rank
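The 100 ps and 50 ps figures are just the arithmetic of averaging two results. A short sketch, with made-up delay values in picoseconds and a hypothetical function name, makes the trade-off explicit:

```python
# Illustration of the averaging trade-off for an always-running WCK.

def shared_wck_delay(per_rank_ps):
    """Average the leveling results of all ranks for one byte.

    Returns the shared delay and the worst-case error versus any rank.
    """
    shared = sum(per_rank_ps) / len(per_rank_ps)
    worst_error = max(abs(shared - delay) for delay in per_rank_ps)
    return shared, worst_error


# Two ranks whose ideal WCK delays differ by 100 ps (example values).
shared, error = shared_wck_delay([400.0, 500.0])
print(shared, error)
```

Here the shared setting sits 50 ps away from the ideal point of each rank, and that error has to come out of the CK-WCK timing budget.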
WCK Duty Cycle Training
LPDDR5 Training
WCK Duty Cycle Training
[Figure: SoC-to-DRAM interface block diagram showing the DCA and DCM blocks]
• WCK duty cycle affects several aspects of performance:
  • RDQS duty cycle
  • Odd/even write DQ capture
• LPDDR5 provides facilities to support correction of duty cycle:
  • Duty Cycle Adjuster (“DCA”) to control duty cycle
  • Duty Cycle Monitor (“DCM”) to observe duty cycle
  • Ability to reverse inputs to the duty cycle monitor (or “flip” the monitor) in order to correct for asymmetry in the monitor itself
Duty Cycle Training
• Measure with the DCM:
  • Issue CAS command with WCK2CK Fast Sync
  • Run WCK at full rate
  • Set MR26 OP[0]=1 to initiate DCM operation
  • Wait tDCMM for measurement, then flip DCM by setting MR26 OP[1]=1
  • Wait tDCMM, then set MR26 OP[0]=0 to complete DCM measurement
  • Read results for upper and lower bytes from both flip settings from MR26 OP[5:2]
• Adjust with the DCA:
  • Adjust WCK duty cycle for both bytes by writing MR30
  • Repeat the DCM measurement described above
  • After sweeping DCA and identifying optimal setting, program MR30 for mission mode operation
  • In 2 rank systems, do this once for each rank
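One plausible way to turn the two flip readings into an "optimal setting" is to find the DCA code at which each reading toggles and take the midpoint, so that the monitor's own offset cancels. The slides do not spell out this calculation, so treat the sketch below as an assumption. The DCA code range, step size, and monitor model are invented for illustration, and only one byte is handled.

```python
# Toy DCA sweep using the DCM flip, for a single byte.

def dcm_measure(duty_error, monitor_offset):
    """Return (flip0, flip1) monitor bits.

    flip0 is 1 when the monitor sees the high phase as longer; flip1 is the
    same comparison with the monitor inputs reversed.
    """
    flip0 = 1 if duty_error + monitor_offset > 0 else 0
    flip1 = 1 if -duty_error + monitor_offset > 0 else 0
    return flip0, flip1


def crossing(codes, bits):
    """Midpoint between the two adjacent codes where the bit toggles."""
    for i in range(1, len(codes)):
        if bits[i] != bits[i - 1]:
            return (codes[i - 1] + codes[i]) / 2
    raise RuntimeError("monitor bit never toggled over the DCA range")


def train_dca(duty_error_at, monitor_offset, codes):
    results = [dcm_measure(duty_error_at(code), monitor_offset) for code in codes]
    cross0 = crossing(codes, [r[0] for r in results])
    cross1 = crossing(codes, [r[1] for r in results])
    # Averaging the two crossings cancels the monitor's own offset.
    return round((cross0 + cross1) / 2)


codes = list(range(-7, 8))
# Made-up device: -3% duty error at code 0, +1% per DCA step, 2% monitor offset.
best_code = train_dca(lambda code: -3.0 + 1.0 * code, 2.0, codes)
print(best_code)
```

In this toy model the chosen code is the one with zero duty error, even though neither flip reading on its own toggles there.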
Read Gate Training
LPDDR5 Training
Read Gate
[Figure: SoC-to-DRAM interface block diagram]
• Train the time from read command launch to response
Read Gate Training
• It is useful to be able to train read gate before training read data or write
data
• LPDDR5 provides 3 useful functions to that end:
• RDQS toggle mode provides a continuous RDQS from LPDDR5 DRAM to host. This
mode is entered by writing MR46 OP[0]=1.
• Enhanced RDQS training mode maintains RDQS_t=0/RDQS_c=1 between read
bursts. This mode is entered by writing MR46 OP[1]=1.
• DQ calibration training patterns. Patterns are programmable via MRWs to MR31 through MR34, so the DQ bus is not needed to program them.
• There are many possible approaches to training the read gate, but generally
an ability to sample RDQS within the PHY without using DQ data is useful
• With that, the PHY need only sweep the sampling mechanism timing to determine
RDQS arrival timing and set the read gate delay accordingly
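As a rough sketch of the sweep described in the last bullet, the code below steps a sampling point forward until RDQS_t is first seen high, then opens the gate a fixed margin earlier. It relies on RDQS_t being held low before the burst, as in the enhanced RDQS training mode. The sampling hook `sample_rdqs`, the toy RDQS model, and every number are hypothetical.

```python
# Sketch of a read-gate sweep that samples RDQS_t without using DQ data.

def rdqs_model(t_ps, arrival_ps=1830, period_ps=312):
    """Toy RDQS_t: held low before arrival, then toggling at 50% duty."""
    if t_ps < arrival_ps:
        return 0
    return 1 if (t_ps - arrival_ps) % period_ps < period_ps // 2 else 0


def train_read_gate(sample_rdqs, step_ps, max_ps, margin_ps):
    """Return (arrival_ps, gate_open_ps) from a linear sweep."""
    for t_ps in range(0, max_ps + 1, step_ps):
        if sample_rdqs(t_ps):
            # First high sample marks the first RDQS_t rising edge.
            return t_ps, t_ps - margin_ps
    raise RuntimeError("RDQS_t never sampled high in the sweep range")


arrival, gate_open = train_read_gate(rdqs_model, step_ps=10, max_ps=3000, margin_ps=78)
print(arrival, gate_open)
```

The sweep resolution bounds the accuracy of the result, so the step size should be small compared with the RDQS period.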
Read Gate Training Mode Examples
[Timing diagram: RDQS Toggle Mode (entry example)]
[Timing diagram: Enhanced RDQS Training Mode (read during this mode example)]
Write and Read Data Training
LPDDR5 Training
Data Training
[Figure: SoC-to-DRAM interface block diagram]
• [...] interfaces
• DQ, DMI, and RDQS_t (linkECC) are trained to WCK
Read Data Training
[Figure: SoC-to-DRAM interface block diagram]
• [...] calibration patterns
Write Data Training
[Figure: SoC-to-DRAM interface block diagram]
• [...] used for training with less protocol [...]
• Enables arbitrarily long training [...]
Write Data Training – DMI and RDQS_t
• Training DMI pin requires special consideration
  • Option 1: Using LPDDR5’s training FIFO, DMI pin can be trained at the same time as DQ pins
    • Write data on DMI is written to FIFO, and these data can be read out by Read FIFO command
  • Option 2: Using main memory, DMI pin can be trained after DQ
    • In this case, failures on DMI sampling with complex patterns may be difficult to discern from failures in other DQ bits
• Training RDQS_t (parity) also requires special considerations
  • Option 1: Using LPDDR5’s training FIFO with WCK-RDQS_t training mode (MR46 OP[2] = 1)
    • Write data on RDQS_t is written to FIFO, and these data can be read out via DMI pin by Read FIFO command
    • RDQS_t cannot be trained at the same time as DQ/DMI. If both DMI and RDQS_t are used in a system, 2 iterations are required, once to train with DQ/DMI and another to train with RDQS_t
  • Option 2: Using LPDDR5’s Read/Write-based WCK-RDQS_t training mode (MR26 OP[7] = 1)
    • This mode is available only when DRAM supports it (MR26 OP[6] = 1)
    • RDQS_t behaves like DMI pin, and DMI input is ignored. DRAM inverts write data on DQ inputs when RDQS_t is sampled High.
DRAM DFE Training
• LPDDR5 includes support for Decision Feedback Equalization (DFE)
• The DFE is 1 tap: equalization is based on the previous bit sent
• The 1 tap has 8 possible settings (3 bits programmability), independently programmable for each rank and byte
• Use of DFE is optional
• Training procedure:
  • Set the DFE quantity in MR24
  • Perform writes to DRAM and read back
  • DRAM memory or training FIFO may be used to do this
  • Adjust DFE quantity in MR24 and repeat training patterns
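The procedure amounts to an exhaustive search over 8 settings for each rank and byte. A minimal sketch follows, assuming a hypothetical `measure_margin` callback that programs the DFE quantity, runs the write and read-back patterns, and returns some figure of merit such as passing window width. The toy margin model is invented.

```python
# Sketch of a per-rank, per-byte DFE sweep over the 8 tap settings.

def train_dfe(measure_margin, ranks=(0, 1), byte_lanes=(0, 1), settings=range(8)):
    """Return {(rank, byte): best_setting} by exhaustive search."""
    best = {}
    for rank in ranks:
        for byte in byte_lanes:
            margins = {s: measure_margin(rank, byte, s) for s in settings}
            # Highest margin wins; ties resolve to the lowest setting.
            best[(rank, byte)] = max(sorted(margins), key=margins.get)
    return best


# Made-up device: each rank/byte has a different ideal setting.
IDEAL = {(0, 0): 2, (0, 1): 3, (1, 0): 0, (1, 1): 5}


def toy_margin(rank, byte, setting):
    return 20 - 3 * abs(setting - IDEAL[(rank, byte)])


chosen = train_dfe(toy_margin)
print(chosen)
```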
Read Data Refinement (Optional)
[Figure: SoC-to-DRAM interface block diagram]
• [...] training
• [...] patterns is possible
• [...] read training
LPDDR5 training mode summary
• User can select appropriate training mode to optimize performance in LPDDR5 system
Training                   | Training mode / command                   | MR: mode selection | Support indicator | Note
1. Command bus training    | CBT mode 1                                | MR13 OP[6] = 1     | Supported         | Mode 1 is for DMI-less system
                           | CBT mode 2                                | MR13 OP[6] = 0     |                   |
2. WCK2CK leveling         | WCK2CK leveling mode                      | MR18 OP[6] = 1     | Supported         |
3. WCK duty cycle training | MRW: DCM start                            | MR26 OP[0] = 1     | Supported         |
4. Read gate training      | Enhanced RDQS training mode               | MR46 OP[1] = 1     | Supported         |
                           | RDQS toggle mode                          | MR46 OP[0] = 1     |                   |
5a. Write data training    | Training FIFO for DQ/DMI                  | MR46 OP[2] = 0     | Supported         |
                           | Training FIFO for RDQS_t                  | MR46 OP[2] = 1     | Supported         |
                           | Read/Write-based WCK-RDQS_t training mode | MR26 OP[7] = 1     | MR26 OP[6]        |
5b. DRAM DFE training      | MRW MR24 (DFE quantity)                   | no mode select     | MR24 OP[7]        |
6. Read data training      | RDC command                               | no mode select     | Supported         | MR20, 31-34 define RDC data pattern
                           | Training FIFO                             |                    |                   |
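For firmware, the mode-selection bits in this summary are usually wrapped in small helpers. The sketch below encodes the MR46 and MR26 bit positions exactly as this deck lists them. They should be checked against the DRAM datasheet before use, and the constant and function names are hypothetical.

```python
# Bit positions as listed in this deck's summary; verify before use.

MR46_RDQS_TOGGLE = 1 << 0     # RDQS toggle mode
MR46_ENHANCED_RDQS = 1 << 1   # Enhanced RDQS training mode
MR46_FIFO_RDQS_T = 1 << 2     # Training FIFO targets RDQS_t instead of DQ/DMI


def mr46_value(rdqs_toggle=False, enhanced_rdqs=False, fifo_rdqs_t=False) -> int:
    """Compose the MR46 operand from the three training-mode flags."""
    value = 0
    if rdqs_toggle:
        value |= MR46_RDQS_TOGGLE
    if enhanced_rdqs:
        value |= MR46_ENHANCED_RDQS
    if fifo_rdqs_t:
        value |= MR46_FIFO_RDQS_T
    return value


def dcm_results(mr26: int) -> int:
    """Extract the four DCM result bits OP[5:2] from an MR26 read value."""
    return (mr26 >> 2) & 0xF


print(bin(mr46_value(enhanced_rdqs=True)))
print(bin(dcm_results(0b00101101)))
```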
Periodic Retraining
• Some LPDDR5 DRAM timing parameters can drift over time with
voltage and temperature
• tWCK2DQO : Read response timing for RDQS + DQ
• tWCK2DQI : Write WCK-to-DQ offset
• Consequently, periodic updates to the following trainings will be
necessary to track temperature and low-frequency voltage changes:
• Write data training to track tWCK2DQI
• Read gate training to track tWCK2DQO
Thank You
• Questions?