GOOD DRAM Interface Tutorial
DRAM Interface
Won-Joo Yun
2011. 11. 18
What is Memory?
1. the mental capacity or faculty of retaining and reviving facts,
events, impressions, etc., or of recalling or recognizing
previous experiences.
Also called computer memory, storage.
a. the capacity of a computer to store information subject to recall.
b. the components of the computer in which such information is stored.
[dictionary.com]
What is DRAM?
Dynamic Random Access Memory
RAM
Unlike magnetic tape or disk, it allows stored data to be accessed in any order (i.e. at random)
"Random" refers to the idea that any piece of data can be returned in constant time, regardless of its physical location and whether it is related to the previous piece of data [wikipedia.com]
Dynamic
vs. static
needs refresh: the charge stored on the input capacitance will leak off over time
[3Tr Cell of 1k DRAM]
What is DRAM?
Sequential access
Random access
Semiconductor memory
RAM
DRAM : 1 Tr. + 1 Cap.
SRAM : 4 Tr. or 6 Tr., Static
FeRAM : 1 Tr. + 1 Cap., Almost Static
ROM
Mask ROM : 1 Tr. (Single Poly), Not Erasable
EPROM : Erasable by UV
EEPROM
FLASH
Volatile : DRAM, SRAM
Non-Volatile : FeRAM, Mask ROM, EPROM, EEPROM, FLASH
Memory comparison
Item | DRAM | SRAM | FLASH | FeRAM | MRAM | PRAM
Mechanism for data storage | charge and discharge of Cap. | switching of cross-coupled inv. | charge and discharge of F.G. | dipole switching of Ferro-Cap | resistivity with magnetic polarization state | resistivity with chalcogenide material phase change
Access Time | < 100ns | < 50ns | < 100ns | < 100ns | < 50ns | < 100ns
Write Time | < 100ns | < 50ns | < 10us | < 100ns | < 50ns | < 500ns
Erase Time | No need | No need | ~ms | No need | No need | No need
# of RD/WR operation | R&W infinite (> 10^15) | R&W infinite (> 10^15) | 10^6 ~ 10^10 | 10^12 ~ 10^16 | R&W infinite (> 10^15) | 10^9 ~ 10^11
Data Retention Time | need refresh | need not refresh | ~ 10 years | ~ 10 years | ~ 10 years | ~ 10 years
Operating Current | ~ 100mA | ~ 100mA | ~ 10mA | ~ 10mA | ~ 10mA | ~ 10mA
Standby Current | ~ 200uA | ~ 10uA | ~ 10uA | ~ 10uA | ~ 10uA | ~ 10uA
[Hynix]
Comparisons
Intel Penryn Dual Core
process : 45nm
die area : 107mm^2
6MB L2 cache
48Mb / 38.5mm^2 = 1.25Mb/mm^2
[Intel, Micron]
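Asynchronous / Synchronous
Density | 1K | 4K | 16K | 64K | 256K | 1M | 4M | 16M | 64M | 256M | 2G
Year | 1971 | 1975 | 1979 | 1982 | 1985 | 1988 | 1991 | 1994 | 1997 | 2000 | 2005
Chip Size (mm^2) | 10 | 13 | 26 | 30 | 35 | 50 | 70 | 110 | 140 | 160 | 200
Cell Size (um^2) | 3000 | 860 | 400 | 180 | 65 | 25 | 10 | 2.5 | 0.72 | 0.26 | 0.05
Rows with merged cells (surviving values in order, not aligned to the columns above):
Design Rule (um) : 10, 0.8, 0.5, 0.30, 0.18, 0.08
Power Supply (V) : 20, 3.3, 2.5, 1.5
Operation Mode : SRAM, Page Mode, EDO, SDR, DDR, DRD, DDR3
Gate Oxide (nm) : 120, 12
Cell Type : 3Tr, 3-D Capacitor, High
Values not attributable to a row : 12, 100, 75, 30, 20, 16
[Hynix]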
DRAM Cell
DRAM unit cell : 1 Cell Transistor + 1 Capacitor
Charge sharing
[Figure: BL voltage during charge sharing for Cell data 1 and Cell data 0]
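Charge sharing
[Figure: cell capacitor Cs connected through the WL-controlled transistor to the BL with capacitance Cb]
Stand by : Q = C*V = Cb*VBLP + Cs*VCELL
After charge sharing : VBL = VCELL = Vout
Vout = (Cb*VBLP + Cs*VCELL) / (Cb + Cs)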
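The charge-sharing relation can be checked numerically. The sketch below is illustrative only: the capacitance and voltage values (Cb = 100 fF, Cs = 25 fF, a 1.2 V cell level, VBLP at half of it) are assumed numbers that do not come from the slides, and `bitline_voltage` is a hypothetical helper name.

```python
def bitline_voltage(cb, cs, vblp, vcell):
    """Bit-line voltage after charge sharing: (Cb*VBLP + Cs*VCELL) / (Cb + Cs)."""
    return (cb * vblp + cs * vcell) / (cb + cs)


# Assumed example values (not from the slides).
CB = 100e-15      # bit-line capacitance Cb, 100 fF
CS = 25e-15       # cell capacitance Cs, 25 fF
VCORE = 1.2       # cell voltage when storing data 1
VBLP = VCORE / 2  # bit-line precharge voltage

v1 = bitline_voltage(CB, CS, VBLP, VCORE)  # cell stores 1
v0 = bitline_voltage(CB, CS, VBLP, 0.0)    # cell stores 0

print(f"data 1: Vout = {v1:.3f} V, swing = {(v1 - VBLP) * 1e3:+.0f} mV")
print(f"data 0: Vout = {v0:.3f} V, swing = {(v0 - VBLP) * 1e3:+.0f} mV")
```

With these assumed numbers the bit line moves only about 120 mV above or below VBLP, which is why the small difference has to be amplified by the bit-line sense amplifier.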
BL SA operation
Cross-coupled sense amp
[Hynix]
Why Synchronous?
Asynchronous DRAM
Page Mode DRAM
Fast Page Mode DRAM
EDO(Extended Data Out) DRAM
Synchronous DRAM
SDRAM
DDR SDRAM
Rambus DRAM
Cell Array
(Sub) matrix array: cell (1T1C), WL, bit line (folded)
Cap.: data retention, the core part of DRAM technology
SA array: DRAM sensing, refreshes all cells of a page
Data I/O
Control circuits
Read, write, and refresh operations (timing & selection) according to /RAS, /CAS, /WE
SDRAM features +
Pipeline
In previous DRAM, the column address path time determines the data frequency
With the internal path partitioned, data are output every clock cycle after 2 or 3 clocks
Clock input
Up to EDO, operations are directly controlled by the input signals /RAS, /CAS, /WE
Changed to commands referenced to the rising edge of the clock: various operations and a simple spec.
Multiple banks allow independent row access, increasing the effective page size and enabling continuous operation by hiding pre-charge time
Programmable /CAS latency and burst length to suit the system environment (clock frequency)
I/O Power
Dedicated power pins for data (Vccq, Vssq) for stable operation
Pipeline
-. split signal paths that have a long access time into stages, so that commands can be input faster
Multi-bank Architecture
-. A bank is a unit which can be activated independently and has the same data bus width as the external output bus
-. Interleaved bank operation: while one bank is being accessed, another can be activated
[Hynix]
DDR features +
SSTL interface
Input reference voltage
guarantee of dout data window, termination
Differential input
reference by VREF
Differential clock
CLK, /CLK
EMRS control
Dout driver size & DLL
SDR/DDR/DDR2/DDR3 operation
Data Rate
DDR (2b pre-fetch)
DDR2 (4b pre-fetch)
DDR3 (8b pre-fetch)
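The prefetch depths listed above set how the per-pin data rate scales with the internal core frequency. The sketch below is a minimal illustration of that relation; the 200 MHz core frequency is an assumed value picked for the example, the SDR entry (1 bit) is added only for comparison, and the function names are hypothetical.

```python
# Bits fetched from the core per column access, per data pin.
# DDR/DDR2/DDR3 values are from the slide; SDR = 1 is added for comparison.
PREFETCH_BITS = {"SDR": 1, "DDR": 2, "DDR2": 4, "DDR3": 8}


def data_rate_mbps(core_mhz, generation):
    """Per-pin data rate in Mb/s for a given core (column access) frequency."""
    return core_mhz * PREFETCH_BITS[generation]


def io_clock_mhz(core_mhz, generation):
    """External clock in MHz: SDR moves 1 bit per clock, the DDR family 2."""
    bits_per_clock = 1 if generation == "SDR" else 2
    return data_rate_mbps(core_mhz, generation) / bits_per_clock


CORE_MHZ = 200  # assumed core frequency, chosen only for illustration
for gen in PREFETCH_BITS:
    print(f"{gen:5s} {data_rate_mbps(CORE_MHZ, gen):5d} Mb/s/pin, "
          f"I/O clock {io_clock_mhz(CORE_MHZ, gen):.0f} MHz")
```

The point of the sketch is that each generation doubles the data rate at the same core frequency by fetching twice as many bits per access, rather than by making the cell array faster.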
[Hynix]
Source Synchronous
Common (master) clock is not used for data transfer
Devices have an additional strobe pin
Minimizing differences in routed length & layer characteristics between strobe and data signals is required
Source Synchronous
Timing budget
ideal case : tSTB=tDATA
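[Figure: timing diagram of External CLK, Internal CLK (no DLL), Internal CLK (w/ DLL), and DQ (Data 1 to Data 4)]
tD1 : delay from External CLK to Internal CLK
tD2 : delay from Internal CLK to DQ
Internal CLK (no DLL) : DQ lags External CLK by tD1 + tD2
Internal CLK (w/ DLL) : DLL delay = tCK - (tD1 + tD2), so DQ is aligned to External CLK (Desired DQ)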
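The DLL timing relation, tCK - (tD1 + tD2), can be sketched as a small calculation. This is a minimal illustration under stated assumptions: the delay values are made up for the example, `dll_delay_ns` is a hypothetical helper, and the extension to several clock cycles when tD1 + tD2 exceeds one tCK is an assumption that the slide does not show.

```python
import math


def dll_delay_ns(tck, td1, td2):
    """Delay to insert so that td1 + delay + td2 is a whole number of clock cycles.

    For td1 + td2 <= tck this is the slide's tCK - (tD1 + tD2).
    """
    total = td1 + td2
    cycles = max(1, math.ceil(total / tck))  # assumption: lock to the next edge
    return cycles * tck - total


# Assumed example values in ns (not from the slides).
TCK, TD1, TD2 = 2.5, 0.4, 0.9

delay = dll_delay_ns(TCK, TD1, TD2)
print(f"without DLL: DQ lags the external clock by {TD1 + TD2:.2f} ns")
print(f"with DLL:    inserted delay = {delay:.2f} ns, "
      f"total path = {TD1 + delay + TD2:.2f} ns")
```

With the inserted delay, the total path equals one clock period, so the data edge coincides with an external clock edge.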
[Hynix]
[Samsung]
Signal integrity
[Hynix]
GDDR3 applications
Game Consoles
Laptop / Mobile
High-End / D-T
Design trends
Low Power
- Reduce operating current
- Guarantee operations at low voltage
- Data output
Low Cost
- Small area (die cost down)
- Design for testability (test cost down)
High Performance
- Robust DLL
- Low jitter DLL
- Good quality of DCC
- Low SSO noise
Wide data valid window
Input
Clocks
DLL / PLL
Output
Clock control
output enable
Driver
Impedance matching
Multi slew-rate
Data Bus Inversion
Input
Buffer
in mobile : just inverters
in graphics : low current two-stage amps
Clock
DLL
Architecture for low power consumption
Systematically low power operation
Output
Data Bus Inversion DC mode
Power Consumption vs. tCK
[Figure: power consumption (mW, 0 to 100) vs. tCK (ns, 1.0 to 10.0) for "Proposed one" and "Previous one"; annotated values 20mW and 4.2mW, 79% reduction]
[ISSCC 08]
[Samsung]
Output
Driver
Data Bus Inversion AC mode : reduce SSO noise
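The slides name two Data Bus Inversion modes, DC and AC. The sketch below shows one common way each decision is made for a single byte lane; it is an illustration under assumptions, not the scheme of any particular device. Assumed here: 0 is the costly level in DC mode (as with a bus terminated to the high level), the flag is active-high, the flag pin itself is not counted when counting toggles, and all function names are hypothetical.

```python
def popcount8(value):
    """Number of 1 bits in the low 8 bits of value."""
    return bin(value & 0xFF).count("1")


def dbi_dc(byte):
    """DC mode: invert the byte when more than 4 of its 8 bits are 0.

    Returns (byte_on_bus, dbi_flag).
    """
    zeros = 8 - popcount8(byte)
    if zeros > 4:
        return (~byte) & 0xFF, 1
    return byte & 0xFF, 0


def dbi_ac(byte, prev_on_bus):
    """AC mode: invert the byte when more than 4 of 8 lines would toggle
    relative to the byte previously driven on the bus.

    Returns (byte_on_bus, dbi_flag).
    """
    toggles = popcount8(byte ^ prev_on_bus)
    if toggles > 4:
        return (~byte) & 0xFF, 1
    return byte & 0xFF, 0


print(dbi_dc(0x01))        # seven 0 bits, so the byte is inverted
print(dbi_dc(0xF0))        # four 0 bits, so the byte is sent as-is
print(dbi_ac(0xFF, 0x00))  # eight toggles, so the byte is inverted
print(dbi_ac(0x0F, 0x00))  # four toggles, so the byte is sent as-is
```

In AC mode at most four of the eight data lines switch at once, which is how it limits simultaneous switching output (SSO) noise; DC mode instead bounds the number of lines held at the costly level.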
GDDR5
[AMD(ATi)]
[AMD(ATi), Qimonda]
Ref. | Applications | Conf. | Year | Issues
[6] | GDDR3 | ISSCC | 2006 |
[5] | GDDR3 | ASSCC | 2006 |
[2] | GDDR3 | ISSCC | 2008 |
[10] | GDDR3 | ISSCC | 2008 |
[1] | GDDR3 | ISSCC | 2009 |
[9] | Graphics | ISSCC | 2009 |
[7] | GDDR5 | ESSCIRC | 2009 |
[8] | GDDR5 | VLSI | 2009 |
[11] | GDDR5 | ISSCC | 2010 |
[12] | GDDR5 | VLSI | 2010 |
DDR4
[PCwatch]
Summary
DRAM Introduction
DRAM Evolutions
Memory Interface
Interface for graphics memory
GDDR3
Low power, low cost, low jitter / high performance
GDDR5
CDR for read (data training), external VPP, error correction, clamshell mode
DDR4 preview
References
Web sites and published data from Hynix, Samsung, Rambus, Elpida, Micron, AMD(ATi), Intel, nVidia, SONY, Nintendo, Microsoft, PCwatch, JEDEC
DRAM Circuit Design, B. Keeth, R. J. Baker, B. Johnson, F. Lin, IEEE Press
[1] H. W. Lee, et al. A 1.6V 3.3Gb/s GDDR3 DRAM with Dual-Mode Phase- and Delay-Locked Loop Using Power-Noise Management with Unregulated Power Supply in 54nm CMOS, ISSCC 2009
[2] W. J. Yun, et al. A 0.1-to-1.5GHz 4.2mW All-Digital DLL with Dual Duty-Cycle Correction Circuit and Update Gear Circuit for DRAM in 66nm CMOS Technology, ISSCC 2008
[3] S. J. Bae, et al. An 80 nm 4 Gb/s/pin 32 bit 512 Mb GDDR4 Graphics DRAM With Low Power and Low Noise Data Bus Inversion, JSSC 2008
[4] K. H. Kim, et al. An 8 Gb/s/pin 9.6 ns Row-Cycle 288 Mb Deca-Data Rate SDRAM With an I/O Error Detection Scheme, JSSC 2007
[5] W. J. Yun, et al. A Low Power Digital DLL with Wide Locking Range for 3Gbps 512Mb GDDR3 SDRAM, ASSCC 2006
[6] D. U. Lee, et al. A 2.5Gb/s/pin 256Mb GDDR3 SDRAM with Series Pipelined CAS Latency Control and Dual-Loop Digital DLL, ISSCC 2006
[7] K. H. Kim, et al. A 5.2Gb/s GDDR5 SDRAM with CML Clock Distribution Network, ESSCIRC 2009
[8] D. Shin, et al. Wide-Range Fast-Lock Duty-Cycle Corrector with Offset-Tolerant Duty-Cycle Detection Scheme for 54nm 7Gb/s GDDR5 DRAM Interface, VLSI 2009
[9] K. S. Ha, et al. A 6Gb/s/pin Pseudo-Differential Signaling Using Common-Mode Noise Rejection Techniques Without Reference Signal for DRAM Interface, ISSCC 2009
[10] D. U. Lee, et al. Multi-Slew-Rate Output Driver and Optimized Impedance-Calibration Circuit for 66nm 3.0Gb/s/pin DRAM Interface, ISSCC 2008
[11] T. Y. Oh, et al. A 7Gb/s/pin GDDR5 SDRAM with 2.5ns Bank-to-Bank Active Time and No Bank-Group Restriction, ISSCC 2010
[12] S. J. Bae, et al. A 40nm 7Gb/s/pin Single-ended Transceiver with Jitter and ISI Reduction Techniques for High-Speed DRAM Interface, VLSI 2010
Thank you
Appendix
SDRAM categorization
by Speed / Applications
/ DDR1 / DDR2 / DDR3 / DDR4 /
GDDR1 / GDDR2 / GDDR3 / GDDR4 / GDDR5 / GDDR5+ / GDDR6
mDDR / LPDDR2 /
by Density
/ 256Mb / 512Mb / 1Gb / 2Gb / 4 ~ 8Gb /
by Bus-Width
x4 / x8 / x16 / x32 /
PC : x64
Server : x64
Graphics card : x64 / x128 / x256 / x512 /
Game consoles : x32 / x128 /
for laptops
Four x16 4Gb devices can be used (x16 X 4 = 64bit) : 4Gb X 4 = 16Gb = 2GB
for PCs
x4 / x8 / x16 configurations
16GB
[Hynix]
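The module arithmetic above (device width times device count gives the bus width, device density times device count gives the capacity) can be sketched as follows. `module_config` is a hypothetical helper name, and the sketch assumes a single rank in which every device contributes its full width to the bus.

```python
def module_config(device_gbit, device_width, n_devices):
    """Return (bus_width_bits, capacity_gbytes) for a single-rank module."""
    bus_width = device_width * n_devices
    capacity_gbytes = device_gbit * n_devices / 8  # 8 bits per byte
    return bus_width, capacity_gbytes


# Laptop example from the slide: four x16 4Gb devices.
print(module_config(4, 16, 4))   # 64-bit bus, 2 GB

# Other single-rank ways to fill a 64-bit bus with 1Gb devices.
print(module_config(1, 8, 8))    # eight x8 devices
print(module_config(1, 4, 16))   # sixteen x4 devices
```

Narrower devices need more components to fill the same 64-bit bus, which is why the x4 configurations reach higher total capacity per module.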
Component (bit) | Number | Module (byte=x8) | Total dens.
1Gb, 256M x4 | 16ea. | 256M x64 | 2GB
1Gb, 128M x8 | 8ea. | 128M x64 | 1GB
2Gb DDP, 512M x4 | 32ea. | 1024M x64 | 8GB
1Gb, 64M x16 | 4ea. | 64M x64 | 512MB
Applications : Server / PC / Notebook
Graphics memory
Component (bit) | Number | Bus-width | Total dens. | Applications
512Mb, 16M x32 | 8ea. | 128bit (mirror) | 512MB | XBOX 360
512Mb, 16M x32 | 16ea./12ea. | 512bit/384bit | 1GB/768MB | High-End
512Mb, 16M x32 | 4ea. | 128bit | 256MB | PS3
512Mb, 16M x32 | 1ea. | 32bit | 64MB | Nintendo Wii
Data bandwidth
Example: GDDR3 on PS3
700MHz clock, data on both clock edges
1.4Gb/s/data channel(pin)
Each device has 32bit data I/O
1.4Gb/s X 32 = 44.8Gb/s/component
4 components configurations (32bit X 4 = 128bit)
44.8Gb/s/component X 4 = 179.2Gb/s
Data bandwidth is 179.2Gb/s / 8 = 22.4GB/s
[SONY]
Data bandwidth
Increasing clock speed per pin
700MHz -> 1GHz
2.0Gb/s X 32 X 4 / 8 = 32GB/s
ex) High-end graphics cards use 1.3GHz (2.6Gbps) [GDDR3]
ex) 3.6 ~ 4.8Gbps [GDDR5] / up to 7Gbps (@ES)
32bit -> 64bit
1.4Gb/s X 64 X 4 / 8 = 44.8GB/s
32bit is maximum in mass production
x4 x 128 (x512) in TSV
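The bandwidth figures on these two slides all follow one formula: per-pin rate times I/O width times number of components, divided by 8. The sketch below reproduces them; `bandwidth_gbytes` and its parameter names are hypothetical, and the default of two transfers per clock reflects the double-data-rate signaling used in the PS3 example.

```python
def bandwidth_gbytes(clock_mhz, io_bits, n_devices, transfers_per_clock=2):
    """Total data bandwidth in GB/s.

    clock_mhz           -- clock frequency per pin in MHz
    io_bits             -- data I/O width of one component
    n_devices           -- number of components sharing the load
    transfers_per_clock -- 2 for double data rate (assumed default)
    """
    gbps_per_pin = clock_mhz * transfers_per_clock / 1000
    return gbps_per_pin * io_bits * n_devices / 8


ps3 = bandwidth_gbytes(700, 32, 4)           # 700MHz, four x32 components
faster_clock = bandwidth_gbytes(1000, 32, 4)  # 700MHz -> 1GHz
wider_io = bandwidth_gbytes(700, 64, 4)       # 32bit -> 64bit per component

print(f"PS3 GDDR3:      {ps3:.1f} GB/s")
print(f"1GHz clock:     {faster_clock:.1f} GB/s")
print(f"64bit per chip: {wider_io:.1f} GB/s")
```

The two scaling options are independent multipliers, so raising the clock and widening the I/O would compound; the slide notes that 32bit per component is the practical limit in mass production.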
Mirror function
To increase total density without increasing data bus-width
an example of 512Mb GDDR3
[ISSCC 09]
[Hynix]
XBOX 360
8 x 32b
128b
[Microsoft]
Prefetch operation
DDR2/3 Architecture
Simulation schematic
[Hynix]
Internal voltages
ZQ Cal
Design trends