LPDDR5 System Training

Global Standards for the Microelectronics Industry
LPDDR5
System Training
Copyright © 2019
LPDDR5 Workshop [Raj Mahajan, Tsun Ho Liu]
LPDDR5 Interface Training Agenda
• Overview
• Address/command interface
• WCK2CK leveling
• WCK DCA training
• Read gate training
• Data interface training – read and write
Introduction
• The new LPDDR5 SDRAM interface pushes data rates to 6400 Mbps
• Boot-time training is required to operate a parallel interface at such high data rates
• This presentation focuses on the boot-time training steps required to operate at up to 6400 Mbps
• The following will not be covered:
• Re-training / drift tracking
• Command Bus Training with DVFSQ
LPDDR5 training overview
• Critical timing relationships in LPDDR5 and their data rates in an LPDDR5-6400 system
• LPDDR5-6400 bit rates will be used as examples throughout this presentation
Training                      Data rate / freq                        Training target
1. Command bus training       CK – CS: 800 Mbps                       SoC : CA/CS delay
                              CK – CA: 1600 Mbps                      DRAM : Vref(CA)
2. WCK2CK leveling            CK – WCK: CK 800 MHz, WCK 3200 MHz      SoC : WCK delay
3. WCK duty cycle training    WCK 3200 MHz                            DRAM : DCA code
4. Read gate training         RDQS 3200 MHz                           SoC : Read gate delay
5. Read data training         RDQS – DQ/DMI: 6400 Mbps                SoC : Rx delay, Vref(DQ)
6. Write data training        WCK – DQ/DMI/RDQS_t: 6400 Mbps          SoC : Tx delay
                                                                      DRAM : Vref(DQ)
[Figure: SoC–DRAM block diagram showing TX, RX and delay elements on CK/CK#, CS[1:0], CA[6:0], WCK/WCK#, RDQS_t/c and DQ, plus the DCA and DCM blocks in the DRAM; callouts 1 to 6 mark the path trained in each step]
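The rates in the overview table translate directly into the time budgets that each training step has to work within. The short calculation below is only a worked example of that arithmetic for the LPDDR5-6400 figures quoted above; the helper names are illustrative and not part of any LPDDR5 tool or specification.

```python
# Unit-interval and clock-period arithmetic for the LPDDR5-6400 example rates.

def period_ps(freq_mhz: float) -> float:
    """Clock period in picoseconds for a frequency given in MHz."""
    return 1e6 / freq_mhz


def ui_ps(rate_mbps: float) -> float:
    """Unit interval (one bit time) in picoseconds for a per-pin rate in Mbps."""
    return 1e6 / rate_mbps


ck_period = period_ps(800)       # CK at 800 MHz
wck_period = period_ps(3200)     # WCK / RDQS at 3200 MHz
cs_ui = ui_ps(800)               # CS at 800 Mbps
ca_ui = ui_ps(1600)              # CA at 1600 Mbps
dq_ui = ui_ps(6400)              # DQ / DMI at 6400 Mbps
wck_to_ck_ratio = 3200 / 800     # WCK cycles per CK cycle

print(ck_period, wck_period, cs_ui, ca_ui, dq_ui, wck_to_ck_ratio)
# 1250.0 312.5 1250.0 625.0 156.25 4.0
```

At 6400 Mbps a single DQ bit lasts 156.25 ps, which is why the data interface needs per-bit delay and Vref training while CS, with a 1250 ps bit time, is far more forgiving.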
Command Bus Training
LPDDR5 Training: CA and CS
LPDDR5 CS and CA
• The revolutionary LPDDR5 interface decouples address/command clocking from the blisteringly fast data interface
• CK (clock for address/command) runs up to 800 MHz, while the data strobe can reach 3200 MHz
• Command bus training is used to train SoC delay of CS and CA and DRAM Vref for CA receivers
[Figure: SoC–DRAM block diagram, callout 1 on the CS and CA paths]
LPDDR5 CS and CA training
CS training
• CS is 800 Mbps, VSS-terminated or unterminated
• CS should be trained for delay to center it on rising edges of CK
CA training
• CA is 1600 Mbps, VSS-terminated or unterminated
• CA Vref should be trained to remove uncertainty in sampling level due to impedance uncertainties
• CA should also be trained for delay to center it on rising edges of CK
LPDDR5 Command Bus Training (“CBT”)
• Conceptually similar to LPDDR4 Command Bus Training
• 2 available modes:
• Mode 1 uses WCK & DQ[7:0] to train delay
• Mode 2 also requires the DMI pin and provides a means to train Vref as well without exiting the training mode
• In these modes:
• Data sent on CS and CA is captured on one edge of CK
• Sampled values are returned statically on DQ pins
[Figure: SoC–DRAM block diagram, callout 1 on the CS and CA paths]
LPDDR5 Setup for Command Bus Training
• Prior to entering Command Bus Training:
• Program all pertinent settings (latencies, termination, Vref, etc.) for one
inactive FSP
• Set VRCG to enable rapid changes in DRAM Vref level
• Send MRW-1 and MRW-2 “CBT Entry” commands with DQ[7] LOW to enter the training mode
• Setting DQ[7] HIGH and toggling WCK will change the active FSP
• Then change clock frequency and begin training
• To exit the training mode:
• Switch DQ[7] LOW to return to the original “known good” FSP
• With DQ[7] LOW, send MRW-1 and MRW-2 “CBT Exit” commands at low
speed
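The entry and exit steps above are order-sensitive, so it can help to see them as one sequence. The following is a minimal sketch only: `FakeHost` is a hypothetical stand-in that just records each step as text, it does not drive any real controller, and the placement of the frequency change on exit (after DQ[7] goes LOW, before the exit commands) is an assumption inferred from the "at low speed" wording.

```python
# Hypothetical host-side wrapper: it only records the step each call stands for.
class FakeHost:
    def __init__(self):
        self.log = []

    def step(self, description: str) -> None:
        self.log.append(description)


def command_bus_training(host: FakeHost, train) -> None:
    # Setup, performed at the known-good (low) frequency.
    host.step("program inactive FSP: latencies, termination, Vref")
    host.step("set VRCG for rapid Vref changes")
    host.step("drive DQ[7] LOW; send MRW-1 + MRW-2 'CBT Entry'")
    # Switch to the FSP under training.
    host.step("drive DQ[7] HIGH and toggle WCK: new FSP becomes active")
    host.step("change CK to the target frequency")
    train(host)  # CS/CA delay sweeps (and Vref in mode 2) happen here
    # Exit via the original FSP. The position of the frequency change
    # relative to DQ[7] is an assumption of this sketch.
    host.step("drive DQ[7] LOW: return to known-good FSP")
    host.step("change CK back to the low frequency")
    host.step("send MRW-1 + MRW-2 'CBT Exit' with DQ[7] LOW")


host = FakeHost()
command_bus_training(host, lambda h: h.step("sweep CS/CA delay"))
for line in host.log:
    print(line)
```

Passing the sweep in as a callback keeps the entry/exit bracket in one place, so a failed sweep cannot leave the DRAM on the untrained FSP by skipping the exit steps in a real implementation that adds error handling.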
LPDDR5 CBT
Mode 1 Training
• Write MRs to configure one of the unused FSPs
• Enter Mode 1 training and switch to high frequency
• New FSP will become active
• Adjust delays and send commands with CS and CA to train them
• Responses will be provided on DQ[6:0]
• If training Vref, exit mode 1 training, change Vref, and re-enter mode 1 training
Mode 2 Training
• Write MRs to configure one of the unused FSPs
• Enter Mode 2 training and switch to high frequency
• New FSP will become active
• Adjust delays and Vref(CA) and send commands with CS and CA to train them
• Setting DMI[0] LOW allows host to provide Vref(CA) setting on DQ[6:0]
• Responses will be provided on DQ[6:0]
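"Adjust delays ... to train them" usually means sweeping the delay, recording which settings echo the expected pattern on DQ[6:0], and parking the delay in the middle of the passing window. The sketch below shows that generic sweep-and-center idea under stated assumptions: `fake_ca_sample_ok` is a toy stand-in for "send a CA pattern and compare the value returned on DQ[6:0]", and the tap counts are invented for illustration.

```python
def longest_pass_window(results):
    """Return (start, end) indices of the longest run of True, or None if none pass."""
    best = None
    start = None
    for i, ok in enumerate(list(results) + [False]):  # sentinel closes a trailing run
        if ok and start is None:
            start = i
        elif not ok and start is not None:
            if best is None or (i - start) > (best[1] - best[0] + 1):
                best = (start, i - 1)
            start = None
    return best


def train_delay(sample_ok, num_taps):
    """Sweep every delay tap and return the center of the widest passing window."""
    window = longest_pass_window(sample_ok(tap) for tap in range(num_taps))
    if window is None:
        raise RuntimeError("no passing delay setting found")
    return (window[0] + window[1]) // 2


# Toy channel model: the echoed pattern matches only for delay taps 10..21.
def fake_ca_sample_ok(tap):
    return 10 <= tap <= 21


print(train_delay(fake_ca_sample_ok, num_taps=32))  # 15
```

In mode 2 the same sweep could be repeated for each Vref(CA) code without leaving the training mode; in mode 1 each Vref change costs an exit and re-entry, as the slide notes.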
WCK2CK Leveling Training
LPDDR5 Training: Aligning WCK to CK
WCK2CK Leveling
• Write leveling aligns WCK strobes to data for each byte and each rank
• Host must adjust WCK delay to align WCK rising edge to CK
[Figure: SoC–DRAM block diagram, callout 2 on the CK-to-WCK relationship]
WCK2CK Leveling
• Writing MR18 OP[6]=1 puts the LPDDR5 DRAM into write leveling mode
• In this mode, host should toggle WCK for 8 pulses at a time and a response indicating alignment to CK will be provided on DQ
• 0 indicates WCK is earlier than CK
• 1 indicates WCK is later than CK
• SoC should adjust WCK phase delay until alignment is reached
[Figure: SoC–DRAM block diagram, callout 2 on the CK-to-WCK relationship]
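The 0/1 feedback described above lends itself to a simple search: step the WCK delay and stop where the response flips from 0 to 1. The sketch below is a minimal illustration under assumptions: `fake_leveling_bit` is a toy DRAM model, the tap numbers are invented, and a real implementation would typically sample each setting several times because the response can be noisy close to the alignment point.

```python
def level_wck(read_leveling_bit, num_taps):
    """Return the first WCK delay tap at which the DQ feedback flips from 0 to 1.

    read_leveling_bit(tap) stands in for: apply the tap, toggle WCK for
    8 pulses, then sample the leveling response on DQ.
    """
    previous = read_leveling_bit(0)
    for tap in range(1, num_taps):
        current = read_leveling_bit(tap)
        if previous == 0 and current == 1:
            return tap  # the WCK edge has just crossed the CK edge
        previous = current
    raise RuntimeError("no 0 -> 1 transition found in the delay range")


# Toy DRAM model: feedback stays 0 until the delay reaches tap 37.
def fake_leveling_bit(tap, aligned_tap=37):
    return 1 if tap >= aligned_tap else 0


print(level_wck(fake_leveling_bit, num_taps=64))  # 37
```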
WCK2CK Leveling Example
[Figure: WCK2CK leveling example]
Multi-rank Sync-ing
• In multi-rank systems in which performance is a higher priority than power, users may wish to sync WCK to both ranks and keep it always running
• This eats into timing margins, as the leveling requirements may be slightly different at each of the 2 ranks
• Difference may be up to 100 ps, removing up to 50 ps of accuracy from each rank
• Timing budgets for CK-WCK alignment must be carefully managed in this case
• To support this, train as described on the previous slides and average the results
• Multi-rank systems can also be supported by sync’ing to only one rank at a time
• In this case, no averaging should be done; leveling results should be independent for each byte in each rank
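The "100 ps difference, 50 ps lost per rank" figure follows directly from averaging two leveling results. The snippet below works that arithmetic through; the per-rank delay values are made up for illustration, only their 100 ps difference comes from the slide.

```python
def shared_wck_delay(rank_delays_ps):
    """Average the per-rank leveling results for one byte.

    Returns (shared_delay_ps, worst_case_error_ps).
    """
    shared = sum(rank_delays_ps) / len(rank_delays_ps)
    worst_error = max(abs(d - shared) for d in rank_delays_ps)
    return shared, worst_error


# Hypothetical leveling results for one byte: rank 0 and rank 1 differ by 100 ps.
shared, error = shared_wck_delay([420.0, 520.0])
print(shared, error)  # 470.0 50.0
```

When WCK is synced to one rank at a time, the per-rank values (420 ps and 520 ps here) would be used as they are and the 50 ps penalty disappears.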
WCK Duty Cycle Training
LPDDR5 Training
WCK Duty Cycle Training
• WCK duty cycle performance is critical to several aspects of performance:
• RDQS duty cycle
• Odd/even read DQ launch
• Odd/even write DQ capture
• As such, LPDDR5 DRAMs have built-in facilities to support correction of duty cycle:
• Duty Cycle Adjuster (“DCA”) to control duty cycle
• Duty Cycle Monitor (“DCM”) to observe duty cycle
• Ability to reverse the inputs to the duty cycle monitor (to “flip” the monitor) in order to correct for asymmetry in the monitor itself
[Figure: SoC–DRAM block diagram, callout 3 on the WCK path and the DCA block]
Duty Cycle Training
• Issue CAS command with WCK2CK Fast Sync
• Run WCK at full rate
• Set MR26 OP[0]=1 to initiate DCM operation
• Wait tDCMM for measurement, then flip DCM by setting MR26 OP[1]=1
• Wait tDCMM, then set MR26 OP[0]=0 to complete DCM measurement
• Read results for upper and lower bytes from both flip settings from MR26 OP[5:2]
• Adjust WCK duty cycle for both bytes by writing MR30
• Repeat the DCM measurement described above
• After sweeping DCA and identifying optimal setting, program MR30 for mission mode operation
• In 2 rank systems, do this once for each rank
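The slide does not say how to turn the two flip results into an "optimal setting". One plausible approach, sketched below as an assumption rather than the presenters' method, is to find the DCA code at which the DCM result changes for each flip setting and take the midpoint, so that the monitor's own offset cancels. The toy monitor model, the DCA range of -7 to +7, and all numeric values are illustrative assumptions.

```python
def dcm_measure(dca_code, flip, intrinsic_error=3.2, monitor_offset=1.0):
    """Toy DCM: returns 1 when the monitor sees 'high phase longer than low'.

    Units are arbitrary DCA steps. With flip=1 the monitor inputs are swapped,
    so the duty error changes sign while the monitor's own offset does not.
    """
    duty_error = intrinsic_error + dca_code      # each DCA step shifts duty by 1
    seen = (-duty_error if flip else duty_error) + monitor_offset
    return 1 if seen > 0 else 0


def crossover(codes, flip):
    """First code at which the DCM result differs from the result at codes[0]."""
    first = dcm_measure(codes[0], flip)
    for code in codes[1:]:
        if dcm_measure(code, flip) != first:
            return code
    raise RuntimeError("DCM result never changed over the DCA range")


codes = list(range(-7, 8))                        # assumed DCA code range
c0 = crossover(codes, flip=0)                     # biased one way by the offset
c1 = crossover(codes, flip=1)                     # biased the other way
best_dca = round((c0 + c1) / 2)
print(c0, c1, best_dca)  # -4 -2 -3
```

In the toy model the true duty error is cancelled at about -3.2 steps, and the midpoint of the two crossovers lands on -3 even though each individual crossover is skewed by the monitor offset.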
Read Gate Training
LPDDR5 Training
Read Gate
• The SoC PHY requires some mechanism to determine when to observe RDQS and DQ from DRAM – call this a “read gate”
• Train the time from read command launch to response arriving at PHY
[Figure: SoC–DRAM block diagram, callout 4 on the RDQS path; the read gating logic is drawn as a red box on the SoC side]
Read Gate Training
• It is useful to be able to train read gate before training read data or write
data
• LPDDR5 provides 3 useful functions to that end:
• RDQS toggle mode provides a continuous RDQS from LPDDR5 DRAM to host. This
mode is entered by writing MR46 OP[0]=1.
• Enhanced RDQS training mode maintains RDQS_t=0/RDQS_c=1 between read
bursts. This mode is entered by writing MR46 OP[1]=1.
• DQ calibration training patterns. Patterns are programmable via MRWs (to MR31 – MR34) without needing the DQ bus to program them.
• There are many possible approaches to training the read gate, but generally
an ability to sample RDQS within the PHY without using DQ data is useful
• With that, the PHY need only sweep the sampling mechanism timing to determine
RDQS arrival timing and set the read gate delay accordingly
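As an illustration of "sweep the sampling mechanism timing", the sketch below assumes the enhanced RDQS training mode behavior described above (RDQS_t held LOW between bursts), so the first delay at which RDQS_t samples HIGH marks the arrival of the burst. The PHY sampler `fake_sample_rdqs`, the tap numbers, and the guard band are hypothetical; real PHYs differ in how they expose this.

```python
def find_rdqs_arrival(sample_rdqs, num_taps):
    """Return the first sampling delay (in taps) at which RDQS_t is seen HIGH."""
    for tap in range(num_taps):
        if sample_rdqs(tap):
            return tap
    raise RuntimeError("RDQS never observed HIGH in the sweep range")


def read_gate_delay(arrival_tap, guard_taps):
    """Open the gate a little before the first RDQS rising edge."""
    return max(arrival_tap - guard_taps, 0)


# Toy PHY model: RDQS_t idles LOW, then the burst starts at tap 50 and
# toggles with a period of 8 taps (4 HIGH, 4 LOW).
def fake_sample_rdqs(tap, first_edge=50, period=8):
    if tap < first_edge:
        return 0
    return 1 if (tap - first_edge) % period < period // 2 else 0


arrival = find_rdqs_arrival(fake_sample_rdqs, num_taps=128)
print(arrival, read_gate_delay(arrival, guard_taps=2))  # 50 48
```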
Read Gate Training Mode Examples
[Figures: RDQS Toggle Mode (entry example); Enhanced RDQS Training Mode (read during this mode example)]
Write and Read Data Training
LPDDR5 Training
Data Training
• Data training will ensure adequate timing margins for write and read interfaces
• Read data training consists of training host Vref, equalization (if supported) and delay
• DQ and DMI are trained to RDQS
• DQ calibration patterns allow reads to be trained before writes
• Write data training consists of DRAM DFE and Vref per byte and host bit delay
• DQ, DMI, and RDQS_t (linkECC) are trained to WCK
[Figure: SoC–DRAM block diagram, callout 5 on the read DQ path and callout 6 on the write DQ path]
Read Data Training
• Write MR31 – MR34 to set desired DQ training patterns
• Issue Read DQ Calibration (RDC) commands to read the calibration patterns
• Use the returned patterns to train:
• SoC receive delays
• SoC receive Vref
• Other SoC receive characteristics, such as equalization
[Figure: SoC–DRAM block diagram, callout 5 on the read DQ path]
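Training receive delay and receive Vref together amounts to a two-dimensional search of the data eye. The sketch below shows one common strategy, chosen here as an assumption: for each Vref code find the widest passing delay window, keep the Vref with the widest window, and center the delay inside it. `fake_rdc_passes` is a toy model of "issue RDC and compare against the programmed pattern"; real training is normally done per DQ bit.

```python
def widest_window(passes):
    """(start, width) of the longest run of True in a list; width 0 if none."""
    best_start, best_width, run_start = 0, 0, None
    for i, ok in enumerate(passes + [False]):  # sentinel closes a trailing run
        if ok and run_start is None:
            run_start = i
        elif not ok and run_start is not None:
            if i - run_start > best_width:
                best_start, best_width = run_start, i - run_start
            run_start = None
    return best_start, best_width


def train_read(rdc_passes, vref_codes, delay_taps):
    """Pick the Vref with the widest passing delay window, then center the delay."""
    best = None
    for vref in vref_codes:
        start, width = widest_window([rdc_passes(vref, d) for d in delay_taps])
        if best is None or width > best[2]:
            best = (vref, start, width)
    vref, start, width = best
    if width == 0:
        raise RuntimeError("no passing Vref/delay combination")
    return vref, delay_taps[start + (width - 1) // 2]


# Toy eye: a diamond centered at Vref code 20, delay tap 16.
def fake_rdc_passes(vref, delay):
    return abs(vref - 20) + abs(delay - 16) <= 6


print(train_read(fake_rdc_passes, range(10, 31), list(range(32))))  # (20, 16)
```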
Write Data Training
• Train DQ output delays
• Optionally, other SoC output characteristics may also be trained here
• With reads trained previously, writes may be trained
• LPDDR5 includes a FIFO that may be used for training with less protocol overhead than the DRAM array
• No activate, precharge, or refreshes required
• FIFO is 8 x BL16 deep
• Alternately, DRAM memory may also be used instead of the FIFO
• Enables arbitrarily long training patterns for even more stressful training than the FIFO allows
[Figure: SoC–DRAM block diagram, callout 6 on the write DQ path]
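A FIFO-based write training pass can be pictured as: fill the FIFO at a given TX delay, read it back through the already-trained read path, and compare. The sketch below models that loop with a toy DRAM; `FakeDram`, its passing window of taps 12 to 27, and the byte-per-beat data format are assumptions for illustration, and only the 8 x BL16 depth comes from the slide.

```python
import random

FIFO_DEPTH = 8      # bursts, per the slide ("8 x BL16 deep")
BURST_LEN = 16      # beats per burst


class FakeDram:
    """Toy DRAM: the FIFO captures data correctly only for TX delay taps 12..27."""

    def __init__(self):
        self.fifo = []

    def write_fifo(self, burst, tx_delay):
        good = 12 <= tx_delay <= 27
        self.fifo.append(list(burst) if good else [b ^ 0xFF for b in burst])

    def read_fifo(self):
        return self.fifo.pop(0)


def fifo_pass(dram, tx_delay, rng):
    """Fill the FIFO with random bursts, read all of them back, and compare."""
    sent = [[rng.randrange(256) for _ in range(BURST_LEN)] for _ in range(FIFO_DEPTH)]
    for burst in sent:
        dram.write_fifo(burst, tx_delay)
    received = [dram.read_fifo() for _ in sent]   # always drain the whole FIFO
    return received == sent


def train_write_delay(num_taps=40, seed=0):
    rng = random.Random(seed)
    dram = FakeDram()
    passing = [t for t in range(num_taps) if fifo_pass(dram, t, rng)]
    if not passing:
        raise RuntimeError("no passing TX delay")
    return (passing[0] + passing[-1]) // 2   # assumes one contiguous window


print(train_write_delay())  # 19
```

Draining the whole FIFO before comparing matters: stopping at the first mismatch would leave stale bursts queued and corrupt the next delay setting's result.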
Write Data Training – DMI and RDQS_t
• Training DMI pin requires special consideration
• Option 1 : Using LPDDR5’s training FIFO, DMI pin can be trained at the same time as DQ pins
• Write data on DMI is written to FIFO, and these data can be read-out by Read FIFO command
• Option 2 : Using main memory, DMI pin can be trained after DQ
• In this case, failures on DMI sampling with complex patterns may be difficult to discern from failures in other DQ bits
• Training RDQS_t (parity) also requires special considerations
• Option 1 : Using LPDDR5’s training FIFO with WCK-RDQS_t training mode (MR46 OP[2] = 1)
• Write data on RDQS_t is written to FIFO, and these data can be read-out via DMI pin by Read FIFO command
• RDQS_t cannot be trained at the same time as DQ/DMI. If both DMI and RDQS_t are used in a system, 2 iterations are required, once to train with DQ/DMI and another to train with RDQS_t
• Option 2 : Using LPDDR5’s Read/Write-based WCK-RDQS_t training mode (MR26 OP[7] = 1)
• This mode is available only when DRAM supports it (MR26 OP[6] = 1)
• RDQS_t behaves like DMI pin, and DMI input is ignored. DRAM inverts write data on DQ inputs when RDQS_t is sampled High.
DRAM DFE Training
• LPDDR5 includes support for Decision Feedback Equalization (DFE)
• The DFE is 1 tap – equalization is based on the previous bit sent
• The 1 tap has 8 possible settings (3 bits programmability), independently programmable for each rank and byte
• Use of DFE is optional
• Training procedure:
• Set the DFE quantity in MR24
• Perform writes to DRAM and read back
• DRAM memory or training FIFO may be used to do this
• Adjust DFE quantity in MR24 and repeat training patterns
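With only 8 settings per rank and byte, the DFE procedure above can be run as an exhaustive sweep. The sketch below assumes "best" means the setting that yields the largest write margin, which the slide does not state; `set_dfe` and `measure_margin` are hypothetical callbacks standing in for an MRW to MR24 and a write/read-back margin measurement, and the margin numbers are invented.

```python
def train_dfe(set_dfe, measure_margin, settings=range(8)):
    """Try every DFE setting and keep the one with the largest write margin."""
    best_setting, best_margin = None, -1
    for setting in settings:
        set_dfe(setting)                 # stands in for an MRW to MR24
        margin = measure_margin()        # e.g. width of the passing TX delay window
        if margin > best_margin:
            best_setting, best_margin = setting, margin
    set_dfe(best_setting)                # leave the winning setting programmed
    return best_setting, best_margin


# Toy model: the margin (in delay taps) peaks at DFE setting 3.
state = {"dfe": 0}
fake_margins = [10, 13, 15, 16, 14, 11, 8, 5]
best = train_dfe(lambda s: state.update(dfe=s), lambda: fake_margins[state["dfe"]])
print(best, state["dfe"])  # (3, 16) 3
```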
Read Data Refinement (Optional)
• Calibration training patterns restrict the complexity of data patterns that can be used for training
• After write training is completed, additional read training with more complex data patterns is possible
• The LPDDR5 FIFO or the DRAM array may be used for refining read training
[Figure: SoC–DRAM block diagram, callout 5 on the read DQ path]
LPDDR5 training mode summary
• User can select appropriate training mode to optimize performance in LPDDR5 system
Training                     Training mode / command                      MR : mode selection   Support indicator   Note
1. Command bus training      CBT mode 1                                   MR13 OP[6] = 1        Supported           Mode 1 is for DMI-less system
                             CBT mode 2                                   MR13 OP[6] = 0
2. WCK2CK leveling           WCK2CK leveling mode                         MR18 OP[6] = 1        Supported
3. WCK duty cycle training   MRW : DCM start                              MR26 OP[0] = 1        Supported
4. Read gate training        Enhanced RDQS training mode                  MR46 OP[1] = 1        Supported
                             RDQS toggle mode                             MR46 OP[0] = 1
5a. Write data training      Training FIFO for DQ/DMI                     MR46 OP[2] = 0        Supported
                             Training FIFO for RDQS_t                     MR46 OP[2] = 1        Supported
                             Read/Write-based WCK-RDQS_t training mode    MR26 OP[7] = 1        MR26 OP[6]
5b. DRAM DFE training        MRW MR24 (DFE quantity)                      no mode select        MR24 OP[7]
6. Read data training        RDC command                                  no mode select        Supported           MR20, 31-34 define RDC data pattern
                             Training FIFO
Periodic Retraining
• Some LPDDR5 DRAM timing parameters can drift over time with
voltage and temperature
• tWCK2DQO : Read response timing for RDQS + DQ
• tWCK2DQI : Write WCK-to-DQ offset
• Consequently, periodic updates to the following trainings will be
necessary to track temperature and low-frequency voltage changes:
• Write data training to track tWCK2DQI
• Read gate training to track tWCK2DQO
Thank You
• Questions?