The rapid growth of in-memory compute applications is not surprising given the tremendous performance gains they can offer. Jobs that used to take hours can now take minutes or seconds because they are no longer subject to the rotational and seek latencies of spinning media. While flash memory provides some relief, it is still roughly a hundred times slower than the DRAM that in-memory compute applications use as their primary storage.
One drawback of in-memory compute applications is the high cost of DRAM. Not only are acquisition costs an order of magnitude higher than for flash, DRAM also consumes far more power, and power is a significant issue in data centers as well as a major contributor to operational costs. In addition, a single server has limited DRAM capacity, so larger datasets must either find an alternative solution or cope with the nuisance of sharding. Furthermore, fully populating a server's DRAM capacity requires higher-cost DRAM modules, further escalating the cost of compute.
We discuss a paradigm that allows in-memory computing applications to extend their capacity by utilizing flash memory, often with minimal performance loss. We give examples of applications that have been modified to use the paradigm and show performance comparisons. We also discuss TCO and the relative cost per transaction of the different solutions.
2. Forward-Looking Statements
During our meeting today we will make forward-looking statements. Any statement that refers to expectations, projections or other characterizations of future events or circumstances is a forward-looking statement, including those relating to products and their capabilities, performance and compatibility, cost savings and other benefits to customers. Actual results may differ materially from those expressed in these forward-looking statements due to a number of risks and uncertainties, including the factors detailed under the caption "Risk Factors" and elsewhere in the documents we file from time to time with the SEC, including our annual and quarterly reports. We undertake no obligation to update these forward-looking statements, which speak only as of the date hereof.
3. Overview
• Flash-extending in-memory computing applications
• Using a general-purpose key-value library for flash-extension/flash-optimization
• Examples:
  • Memcached
  • Redis
  • GigaSpaces
  • MongoDB
  • Couchbase
  • Cassandra
• TCO
• Conclusion
4. Flash-Extending In-Memory Compute: Reduce TCO
Servers needed for a 3TB dataset (typical performance results*):

                       HDD      DRAM     SanDisk
  No. of servers       34       6        2
  Power (kW)           12.7     2.8      0.8
  $ per transaction    $8.44    $2.49    $1.02
  Capacity             ✓        ✗        ✓
  Performance          ✗        ✓        ✓

Workload throughput (thousands of transactions per second): HDD ~2.4, DRAM ~80, SanDisk ~50.
* Based on internal SanDisk assumptions of representative performance, not actual performance.
5. Flash-Extending In-Memory Apps
• Flash-extending in-memory applications
  • Exploit flash latency and IOPS; requires extensive parallelism
  • Cache hot data in DRAM
  • Get "good-enough" performance at flash capacity and cost, enabling server consolidation
• Key-value abstraction is a good semantic fit for extending many in-memory apps
  • Many applications manage data internally as objects
  • Need more than basic CRUD functionality: crash-safety, transactions, snapshots, range queries
  • Typical applications: caching, databases, message queues, data grids
• A good key-value storage engine can simplify flash-extension
  • Flash-extending applications: use a key-value library to stage data between DRAM and flash
  • Flash-optimizing applications: replace the application storage engine with a more optimal key-value library
• A good key-value library can dramatically reduce the work required to flash-extend or flash-optimize applications
6. ZetaScale™ Software: Crash-Safe Object Store
Stack: Application → ZetaScale API → ZetaScale Library → Operating System → Device Driver → Flash.
• Flash vendor independent: works with flash from any brand and/or vendor
• Device interface independent: supports any flash device interface, including SAS, SATA, PCIe or NVMe
• Operating systems supported: Linux CentOS 6.5, Linux RHEL 6.5
• Any user application, typically: NoSQL database, in-memory compute application
Features and Options:
• Containers: multiple namespaces offer a file-system-like structure of folders and directories
• Indexes: hash-table and B-tree indexes provide fast point lookups and range queries
• Transactions: guarantee that multiple data objects can be written atomically
• Snapshots: offer an easy method to copy in-memory data to persistent storage
• Caching layer: ensures that frequently used data is readily accessible
• Dynamically loadable, user callable, key/value paradigm, C++ and Java interfaces, compiled into the application
A usage sketch of the key/value API follows.
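To make the key/value paradigm concrete, here is a minimal C sketch of how an application might store and fetch an object through a ZetaScale-style API. Only the idea of containers and a get/put interface comes from the slides; the ZSOpenContainer() call, the return conventions, and the exact signatures of ZSWriteObject()/ZSReadObject() are assumptions for illustration.

```c
/* Hedged sketch: staging an object through an assumed ZetaScale-style
 * key-value API.  Only the function names ZSWriteObject/ZSReadObject
 * appear in this deck; everything else here is a placeholder. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef int zs_status_t;               /* 0 == success (assumed)       */
typedef unsigned long zs_container_t;  /* namespace handle (assumed)   */

zs_status_t ZSOpenContainer(const char *name, zs_container_t *out);
zs_status_t ZSWriteObject(zs_container_t c, const char *key, size_t klen,
                          const char *val, size_t vlen);
zs_status_t ZSReadObject(zs_container_t c, const char *key, size_t klen,
                         char **val, size_t *vlen);

int example(void)
{
    zs_container_t users;
    if (ZSOpenContainer("users", &users) != 0)   /* one namespace per dataset */
        return -1;

    const char *key = "user:42";
    const char *val = "{\"name\":\"alice\"}";
    if (ZSWriteObject(users, key, strlen(key), val, strlen(val)) != 0)
        return -1;                               /* persisted crash-safely    */

    char *out = NULL; size_t outlen = 0;
    if (ZSReadObject(users, key, strlen(key), &out, &outlen) != 0)
        return -1;                               /* served from DRAM cache or flash */
    printf("%.*s\n", (int)outlen, out);
    free(out);                                   /* assumes caller owns buffer */
    return 0;
}
```

The point of the sketch is the division of labor: the application only issues writes and reads against a container, while the library handles DRAM caching, flash layout and crash safety.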
8. memcached
• Memcached is an open source, in-memory distributed key-value cache/store
• CRUD API (create, replace, update, delete)
• ASCII and binary protocols
• High performance
• Written in C; clients available for most popular languages
§ Test hardware:
– Dell R720 server: Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz, 2 physical CPUs with 8 cores/16 threads each (visible as 32 CPUs), 128 GB DRAM, 10G Ethernet
– SSD: 8 x 400G Lightning® SSDs with md RAID 0
– Remote client with 10G Ethernet
§ Test software:
– Memcached v1.4.15, CentOS release 6.5
– ZetaScale™ software flash size: 500G
– Memslap benchmark: 64 threads, 512 concurrency, 250-byte keys, 1024-byte values, set/get = 1:9, 3600-second runs
9. Memcached with ZetaScale Performance
[Chart: memslap throughput (kTPS, 0-450) at DRAM miss rates of 0%, 5%, 42% and 62%, for stock memcached and memcached with ZS; bare metal]
Memcached throughput with the data set in flash is similar to stock memcached throughput with the data set in DRAM.
ZS = ZetaScale software. Source: based on internal testing by SanDisk, Jan 2015.
10. ZetaScale-Memcached Integration
Data path: client set/get → memcached network layer → memcached item manager layer (rewritten) → ZS API → SSD.
§ Replace the memcached get/put routines with calls to ZetaScale get/put (see the sketch below)
§ Use the existing memcached multithreading to get sufficient parallelism to drive flash IOPS
§ ZetaScale automatically caches hot objects in its own DRAM cache, so the stock memcached DRAM cache code is bypassed
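The following is a hedged C sketch of the item-manager rewrite described above: memcached's store and fetch paths call ZetaScale instead of the internal slab/LRU cache. The names ZSWriteObject()/ZSReadObject() appear later in this deck; their signatures, the zs_container_t handle, and the wrapper names are assumptions.

```c
/* Hedged sketch of routing memcached's item store/fetch paths to a
 * ZetaScale-style engine.  Signatures and handles are assumed. */
#include <stddef.h>

typedef unsigned long zs_container_t;
extern int ZSWriteObject(zs_container_t c, const char *key, size_t klen,
                         const char *val, size_t vlen);
extern int ZSReadObject(zs_container_t c, const char *key, size_t klen,
                        char **val, size_t *vlen);

static zs_container_t cache_container;   /* opened once at startup */

/* Called from the "set" command path instead of the internal item store. */
int flash_item_store(const char *key, size_t klen,
                     const char *val, size_t vlen)
{
    /* No slab/LRU bookkeeping here: ZetaScale keeps hot objects in its
     * own DRAM cache, so the stock cache layer is bypassed. */
    return ZSWriteObject(cache_container, key, klen, val, vlen);
}

/* Called from the "get" command path instead of the internal item lookup. */
int flash_item_get(const char *key, size_t klen, char **val, size_t *vlen)
{
    return ZSReadObject(cache_container, key, klen, val, vlen);
}
```

Because each memcached worker thread issues these calls independently, the existing thread pool supplies the parallelism needed to keep the flash device's queues full.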
11. redis
• Redis (REmote DIctionary Server) is an open-source, in-memory key-value store
• Supports more complex data types such as strings, hashes, lists, sets, sorted sets
• Asynchronous replication to one or more slaves
• Snapshot facility using fork() + copy-on-write
• Append-only logging with a configurable fsync() policy
• Pub/sub capability
§ Test hardware:
– HP server: 2 x 6-core 2.90 GHz Intel Westmere; DRAM: 96G; Flash: 8 x 200G Lightning SSDs
– Remote client with 10G network connection
§ Test software:
– Redis 2.7.4
– YCSB: uniform workload with 95% reads and 5% updates
– Strings were 1K bytes; hashes, lists, sets and sorted sets were 10 x 100 bytes
– The dataset was 16 million objects for stock Redis and 64 million objects for Redis with ZetaScale
12. Redis with ZetaScale Performance
[Chart: throughput (kTPS) by data type for stock Redis (in memory) vs. ZS+Redis (from flash, 4x larger dataset); bare metal]

  Data type    Stock Redis (in memory)   ZS+Redis (from flash, 4x larger dataset)
  String       116                       132
  Hash         84                        101
  List         93                        114
  Set          70                        99
  Sorted Set   93                        89

ZetaScale-Redis throughput with the data set in flash is similar to stock Redis throughput with the data set in DRAM.
ZS = ZetaScale software (labeled FDF-Redis in the original chart). Source: based on internal testing by SanDisk, Nov 2013.
13. What Was Required to Exploit Flash?
§ Replace the Redis DRAM-to-storage staging code with calls to FDF (ZetaScale) get/put
§ Convert Redis from single-threaded to multi-threaded to drive flash IOPS (a threading sketch follows)
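To illustrate the second point, here is a hedged C sketch of the threading change: a small pool of worker threads issues key-value reads concurrently so that a flash device with deep queues stays busy. ZSReadObject() is the name used elsewhere in this deck; its signature, the zs_container_t handle, and the key layout are assumptions.

```c
/* Hedged sketch: multiple worker threads issuing concurrent reads to
 * drive flash IOPS.  The ZetaScale call signature is assumed. */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef unsigned long zs_container_t;
extern int ZSReadObject(zs_container_t c, const char *key, size_t klen,
                        char **val, size_t *vlen);

#define WORKERS 16                 /* enough parallelism for flash queues */

static zs_container_t db;          /* opened elsewhere (omitted) */

static void *worker(void *arg)
{
    long id = (long)arg;
    char key[32];
    for (int i = 0; i < 100000; i++) {
        snprintf(key, sizeof key, "key:%ld:%d", id, i);
        char *val = NULL; size_t vlen = 0;
        if (ZSReadObject(db, key, strlen(key), &val, &vlen) == 0)
            free(val);             /* assumes the library allocates the buffer */
    }
    return NULL;
}

int main(void)
{
    pthread_t t[WORKERS];
    for (long i = 0; i < WORKERS; i++)
        pthread_create(&t[i], NULL, worker, (void *)i);
    for (int i = 0; i < WORKERS; i++)
        pthread_join(t[i], NULL);
    return 0;
}
```

A single-threaded event loop can only keep one flash request in flight at a time, which is why the conversion to multiple threads matters more for flash than it does for DRAM.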
14. GigaSpaces
• GigaSpaces XAP is an in-memory compute application platform designed for real-time big data analytics applications
• Leverages distributed real-time computation libraries such as Storm and Apache Samza to process unbounded streams of data
• ZetaScale can manage large amounts of data across a grid of high-capacity servers
• Data can be modeled using Objects/SQL, documents or relational schemas
• Supports a variety of programming interfaces: Java, .Net, C++, Scala
§ Test hardware:
– 2-socket 2.8GHz CPU with 12 cores total, 148G DRAM
– Fusion ioMemory™ ioDrive® Duo PCIe card with md RAID 0
§ Test software:
– Gigaspaces-10.0.0-XAP Premium-m2
– CentOS 5.8
– GS-provided YCSB client
– 1KB object size and uniform distribution
15. GigaSpaces/ZetaScale XAP MemoryXtend
ZetaScale-GigaSpaces provides 2x-3.6x better TPS/$ while reducing servers by 50%.
[Chart: number of servers — stock GigaSpaces (in DRAM): 20; ZetaScale-GigaSpaces (on SSDs): 11]
[Chart: TPS per dollar — no read / 100% write: ZetaScale-GigaSpaces 62 vs. stock GigaSpaces 17; 100% read / no write: ZetaScale-GigaSpaces 121 vs. stock GigaSpaces 56]
Test setup: 1KB object size and uniform distribution; 2-socket 2.8GHz CPU with 24 cores total, CentOS 5.8, Fusion ioMemory ioDrive Duo PCIe card, md RAID 0. YCSB measurements performed by SanDisk; cost calculations by GigaSpaces. Source: based on internal testing by SanDisk.
16. What Was Required to Exploit Flash?
• Stage objects in and out of DRAM using ZetaScale via the existing "Off-Heap" interface (see the sketch below):
  • GS put calls the ZSWriteObject() API
  • GS replace calls the ZSWriteObject() API
  • GS get calls the ZSReadObject() API
  • GS remove calls the ZSDeleteObject() API
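As a language-neutral illustration of the mapping above, here is a hedged C sketch of a thin translation layer from the four off-heap operations to the ZetaScale calls named on this slide. The ZS function names come from the slide; their signatures, the container handle, and the wrapper functions are assumptions (the actual off-heap interface belongs to GigaSpaces and is not shown here).

```c
/* Hedged sketch of the off-heap translation layer described above.
 * Only the ZS function names are from the slide; signatures are assumed. */
#include <stddef.h>

typedef unsigned long zs_container_t;
extern int ZSWriteObject(zs_container_t c, const char *key, size_t klen,
                         const char *val, size_t vlen);
extern int ZSReadObject(zs_container_t c, const char *key, size_t klen,
                        char **val, size_t *vlen);
extern int ZSDeleteObject(zs_container_t c, const char *key, size_t klen);

static zs_container_t space;   /* one container per data grid/space (assumed) */

/* GS put and GS replace both map to a write. */
int offheap_put(const char *k, size_t kl, const char *v, size_t vl)
{
    return ZSWriteObject(space, k, kl, v, vl);
}

/* GS get maps to a read. */
int offheap_get(const char *k, size_t kl, char **v, size_t *vl)
{
    return ZSReadObject(space, k, kl, v, vl);
}

/* GS remove maps to a delete. */
int offheap_remove(const char *k, size_t kl)
{
    return ZSDeleteObject(space, k, kl);
}
```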
17. MongoDB
• MongoDB (from "humongous") is an open source NoSQL document store
• JSON-style documents
• Built-in sharding across multiple nodes
• Automatic resharding when adding or deleting nodes
• Rich, document-based queries
• Supports multiple indices
§ Test hardware:
– 2 x 8-core 2.6 GHz Intel Xeon; 64G DRAM; 8 x 200G Lightning SSDs
– Client co-resident on the server
§ Test software:
– CentOS 6.6; MongoDB 3.0.1
– YCSB: point reads, updates and inserts; 1K objects; 15-minute runs
– For read/update: the 128G dataset contained 128 million 1K objects
18. MongoDB with ZetaScale: Read/Update, 128G dataset
[Chart: transactions per second (0-90,000) vs. read/update mix (100/0 through 0/100) for the MMAPv1, WiredTiger and ZetaScale storage engines]
Source: based on internal testing by SanDisk, Apr/May 2015.
19. MongoDB ZetaScale Integration
The ZetaScale MongoDB shim implements the MongoDB storage engine API on top of ZetaScale, which stores the data on SSDs:
– ZSRecordStore (data record store) → ZS read/write/delete API
– ZSSortedDataInterface (index CRUD) → ZS read/write/delete API
– ZSIterator / ZSCursor (index and range query) → ZS range API
– ZSRecoveryUnit (durability and isolation) → ZS transaction API
A MongoDB collection and its indexes map to one or more ZetaScale B-tree containers. Each record's location is identified by a unique, auto-generated ID; secondary indexes store that record location as their value. A sketch of this mapping follows.
ZS = ZetaScale software.
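Here is a hedged C sketch of the record-store mapping just described: a document is written under an auto-generated record ID, and a secondary-index entry stores that ID as its value so that index scans yield record locations. ZSWriteObject() is named in this deck; the signature, the container handles, the key layout, and insert_document() itself are assumptions.

```c
/* Hedged sketch of the collection/index mapping described above.
 * Signatures, containers and key formats are assumed. */
#include <stdio.h>
#include <stdint.h>

typedef unsigned long zs_container_t;
extern int ZSWriteObject(zs_container_t c, const char *key, size_t klen,
                         const char *val, size_t vlen);

static zs_container_t records;     /* B-tree container for the collection      */
static zs_container_t name_index;  /* B-tree container for a secondary index   */
static uint64_t next_record_id;    /* monotonically increasing record location */

int insert_document(const char *bson, size_t len, const char *name_field)
{
    char rec_key[32], idx_key[128];

    /* 1. Store the document under an auto-generated record ID. */
    uint64_t id = ++next_record_id;
    int n = snprintf(rec_key, sizeof rec_key, "%020llu",
                     (unsigned long long)id);
    if (ZSWriteObject(records, rec_key, (size_t)n, bson, len) != 0)
        return -1;

    /* 2. Secondary index: key = indexed field value, value = record ID,
     *    so a range query over the index returns record locations. */
    int m = snprintf(idx_key, sizeof idx_key, "%s", name_field);
    return ZSWriteObject(name_index, idx_key, (size_t)m, rec_key, (size_t)n);
}
```

In the real shim the two writes would be wrapped in a ZS transaction (via the recovery-unit mapping above) so the record and its index entry land atomically.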
20. Couchbase
• Couchbase Server is an open-source NoSQL distributed database with a flexible data model
• Integrated object caching via memcached
• On-demand elastic scalability
• Supports binary and JSON data types
• Supports indexes on JSON fields
• Inter- and intra-data-center replication
§ Test hardware:
– 2 x 8-core 2.60 GHz Intel Xeon E5-2670; 64G DRAM; 8 x 200G Lightning SSDs
– Remote client: 8-core 2.53 GHz Intel Xeon E5540; 64G DRAM, 10G Ethernet, Oracle Linux 6.3
§ Test software:
– CentOS 6.5; Couchbase 3.03
– Stock Couchbase: 48GB DRAM; threads: 24 frontend (FE), 4 backend read (BR), 4 backend write (BW)
– ZetaScale Couchbase: Couchbase 8GB DRAM, ZetaScale 40GB DRAM; threads: 64 FE, 4 BR, 32 BW
– YCSB; 24M 1K objects for the in-memory test; 128M 1K objects for the flash test; 128 threads
21. Couchbase with ZetaScale
[Chart: transactions per second (0-160,000) vs. read/update mix (100/0 through 0/100) for stock Couchbase and Couchbase with ZetaScale]
Source: based on internal testing by SanDisk, Apr 2015.
22. ZetaScale Couchbase Integration Highlights
• Replace CouchKVStore with the ZetaScale KV storage engine
• Couchbase vBuckets map to ZetaScale containers (see the sketch below)
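The vBucket-to-container mapping can be pictured with a short hedged C sketch: each Couchbase vBucket gets its own ZetaScale container named after the vBucket ID. Only the notion of containers as namespaces comes from this deck; ZSOpenContainer(), its signature, and the naming scheme are assumptions.

```c
/* Hedged sketch of mapping Couchbase vBuckets onto ZetaScale containers.
 * The open call and its signature are assumed. */
#include <stdio.h>

#define NUM_VBUCKETS 1024                 /* Couchbase default vBucket count */

typedef unsigned long zs_container_t;
extern int ZSOpenContainer(const char *name, zs_container_t *out);

static zs_container_t vb_containers[NUM_VBUCKETS];

/* Open (or create) one container per vBucket at storage-engine startup. */
int open_vbucket_containers(void)
{
    char name[32];
    for (int vb = 0; vb < NUM_VBUCKETS; vb++) {
        snprintf(name, sizeof name, "vbucket_%04d", vb);
        if (ZSOpenContainer(name, &vb_containers[vb]) != 0)
            return -1;
    }
    return 0;
}
```

Keeping one container per vBucket preserves Couchbase's rebalancing unit: moving a vBucket between nodes corresponds to moving (or dropping and rebuilding) one container.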
23. Cassandra
§ Cassandra is an open source distributed key-value store
– Large-scale synchronous/asynchronous replication
– Automatic fault-tolerance and scaling
– Tunable consistency
– Efficient support for large rows (1000s of columns)
– CQL (SQL-like) query language
– Supports multiple indices
– Optimized for high-write workloads
§ Test hardware:
– Dell R720: 2 x 8-core Intel 2.60GHz CPUs; DRAM: 128G; Flash: 8 x Lightning SSDs; Controller: LSI 9207 HBA
– Remote client with 2 x 8-core Intel Xeon CPUs, 10G Ethernet
§ Test software:
– Cassandra v2.0.3: stock and with ZetaScale
– Datastax-modified cassandra-stress tool
– 60M rows, 5 columns per row, 100-byte object size; 128 threads
24. Cassandra with ZetaScale
[Chart: transactions per second (0-60,000) vs. read/write mix (32R/1W, 16R/1W, 4R/1W, 1R/1W, 1R/4W, 1R/16W, 1R/32W) for stock Cassandra and Cassandra with ZetaScale]
Source: based on internal testing by SanDisk, Mar 2014.
25. ZetaScale Cassandra Integration Highlights
• Replace the Cassandra LSM-tree (memtable & SSTables) with the ZetaScale storage engine
• Route object get/put calls to ZetaScale get/put
• Disable the stock Cassandra journal: ZetaScale maintains its own journal
• ZetaScale indexing is used for row and column range queries
• ZetaScale transactions are used to enforce atomicity of row updates and secondary index modifications (see the sketch below)
• ZetaScale snapshots are used for full and incremental backups
• Compaction is eliminated!
[Diagram: stock path — Thrift service/client writes to the memtable and commit log in memory and flushes to SSTables in storage, with reads merging SSTables; ZetaScale path — Thrift service/client serializes/deserializes objects directly to/from ZetaScale (memory + storage)]
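The atomicity point above can be sketched in C: the row write and the matching secondary-index write are wrapped in one ZetaScale transaction. The deck only says a "ZS transaction API" exists; ZSTransactionStart()/ZSTransactionCommit()/ZSTransactionRollback() and every signature below are hypothetical placeholders for illustration.

```c
/* Hedged sketch: atomic row + secondary-index update via an assumed
 * ZetaScale transaction API.  All names/signatures are placeholders. */
#include <string.h>

typedef unsigned long zs_container_t;
extern int ZSTransactionStart(void);     /* hypothetical */
extern int ZSTransactionCommit(void);    /* hypothetical */
extern int ZSTransactionRollback(void);  /* hypothetical */
extern int ZSWriteObject(zs_container_t c, const char *key, size_t klen,
                         const char *val, size_t vlen);

static zs_container_t rows, by_email;    /* row store + secondary index */

int update_row(const char *row_key, const char *row_val, const char *email)
{
    if (ZSTransactionStart() != 0)
        return -1;

    /* Both writes become visible together, or not at all. */
    if (ZSWriteObject(rows, row_key, strlen(row_key),
                      row_val, strlen(row_val)) != 0 ||
        ZSWriteObject(by_email, email, strlen(email),
                      row_key, strlen(row_key)) != 0) {
        ZSTransactionRollback();
        return -1;
    }
    return ZSTransactionCommit();
}
```

Because updates go directly into the engine's B-tree containers rather than being appended to SSTables, there is nothing to merge later, which is why compaction disappears.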
26. Flash Extension and TCO
[Chart: 3-year TCO ($ thousands), CapEx + OpEx, stock vs. with ZS — stock: $1,110K CapEx + $578K OpEx; with ZS: $222K CapEx + $276K OpEx]
27. Conclusion
• In-memory compute applications can use flash to extend capacity and still maintain good performance, leading to reduced TCO
• The key-value abstraction is a good semantic fit for extending many in-memory apps
• Sufficient functionality is needed: crash-safety, transactions, snapshots, range queries
• Proof points using the ZetaScale key-value library: Memcached, Redis, GigaSpaces, MongoDB, Couchbase, Cassandra
• The proof points show that although performance drops with flash extension, it remains good
• For capacity-limited applications, flash extension can reduce overall TCO significantly