The rapid growth of in-memory compute applications is not surprising given the tremendous performance gains they can offer. Jobs that used to take hours can now take minutes or seconds because they are no longer subject to the rotational and seek latencies of spinning media. While flash memory provides some relief, it is still roughly a hundred times slower than the DRAM that in-memory compute applications use as their primary storage.
One drawback of in-memory compute applications is the high cost of DRAM. Not only are acquisition costs an order of magnitude higher than for flash, DRAM also consumes far more power, and power is a significant issue in data centers as well as a major contributor to operational costs. In addition, a single server has limited DRAM capacity, so larger datasets must either find an alternative solution or cope with the nuisance of sharding. Furthermore, fully populating a server's DRAM capacity requires higher-cost DRAM modules, further escalating the cost of compute.
We discuss a paradigm that allows in-memory computing applications to extend their capacity by utilizing flash memory, often with minimal performance loss. We give examples of applications that have been modified to use the paradigm and show performance comparisons. We also discuss TCO and the relative cost per transaction of the different solutions.
2. Forward-Looking Statements
During our meeting today we will make forward-looking statements. Any statement that refers to expectations, projections or other characterizations of future events or circumstances is a forward-looking statement, including those relating to products and their capabilities, performance and compatibility, cost savings and other benefits to customers. Actual results may differ materially from those expressed in these forward-looking statements due to a number of risks and uncertainties, including the factors detailed under the caption "Risk Factors" and elsewhere in the documents we file from time to time with the SEC, including our annual and quarterly reports. We undertake no obligation to update these forward-looking statements, which speak only as of the date hereof.
3. Overview
• Flash-extending in-memory computing applications
• Using a general-purpose key-value library for flash-extension/flash-optimization
• Examples:
  • Memcached
  • Redis
  • GigaSpaces
  • MongoDB
  • Couchbase
  • Cassandra
• TCO
• Conclusion
4. Flash-Extending In-Memory Compute: Reduce TCO
Servers needed for a 3TB dataset (typical performance results*):

                       HDD      DRAM     SanDisk
  No. of servers       34       6        2
  Power (kW)           12.7     2.8      0.8
  $ per transaction    $8.44    $2.49    $1.02
  Capacity             ✓        ✗        ✓
  Performance          ✗        ✓        ✓

Workload throughput (thousands of transactions per second): HDD ~2.4, DRAM ~80, SanDisk ~50.
* Based on internal SanDisk assumptions of representative performance, not actual performance.
5. Flash-Extending In-Memory Apps
• Flash-extending in-memory applications
  • Exploit flash latency and IOPS; requires extensive parallelism
  • Cache hot data in DRAM
  • Get "good-enough" performance at flash capacity and cost, enabling server consolidation
• Key-value abstraction is a good semantic fit for extending many in-memory apps
  • Many applications manage data internally as objects
  • Need more than basic CRUD functionality: crash-safety, transactions, snapshots, range queries
  • Typical applications: caching, databases, message queues, data grids
• A good key-value storage engine can simplify flash-extension
  • Flash-extending applications: use a key-value library to stage data between DRAM and flash
  • Flash-optimizing applications: replace the application storage engine with a more optimal key-value library
• A good key-value library can dramatically reduce the work required to flash-extend or flash-optimize applications
6. ZetaScale™ Software: Crash-Safe Object Store
Stack: Application → ZetaScale API → ZetaScale Library → Operating System → Device Driver → Flash.
• Flash vendor independent: works with flash from any brand and/or vendor
• Device interface independent: supports any flash device interface, including SAS, SATA, PCIe or NVMe
• Operating systems supported: Linux CentOS 6.5, Linux RHEL 6.5
• Any user application, typically: NoSQL database, in-memory compute application
Features and Options:
• Containers: multiple namespaces offer a file-system-like structure of folders and directories
• Indexes: hash-table and B-tree indexes provide fast point lookups and range queries
• Transactions: guarantee that multiple data objects can be written atomically
• Snapshots: offer an easy method to copy in-memory data to persistent storage
• Caching layer: ensures that frequently used data is readily accessible
• Dynamically loadable, user callable, key/value paradigm, C++ and Java interfaces, compiled into the application
A usage sketch of the key/value API follows.
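To make the key/value paradigm concrete, here is a minimal C sketch of how an application might store and fetch an object through a ZetaScale-style API. Only the idea of containers and a get/put interface comes from the slides; the ZSOpenContainer() call, the return conventions, and the exact signatures of ZSWriteObject()/ZSReadObject() are assumptions for illustration.

```c
/* Hedged sketch: staging an object through an assumed ZetaScale-style
 * key-value API.  Only the function names ZSWriteObject/ZSReadObject
 * appear in this deck; everything else here is a placeholder. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef int zs_status_t;               /* 0 == success (assumed)       */
typedef unsigned long zs_container_t;  /* namespace handle (assumed)   */

zs_status_t ZSOpenContainer(const char *name, zs_container_t *out);
zs_status_t ZSWriteObject(zs_container_t c, const char *key, size_t klen,
                          const char *val, size_t vlen);
zs_status_t ZSReadObject(zs_container_t c, const char *key, size_t klen,
                         char **val, size_t *vlen);

int example(void)
{
    zs_container_t users;
    if (ZSOpenContainer("users", &users) != 0)   /* one namespace per dataset */
        return -1;

    const char *key = "user:42";
    const char *val = "{\"name\":\"alice\"}";
    if (ZSWriteObject(users, key, strlen(key), val, strlen(val)) != 0)
        return -1;                               /* persisted crash-safely    */

    char *out = NULL; size_t outlen = 0;
    if (ZSReadObject(users, key, strlen(key), &out, &outlen) != 0)
        return -1;                               /* served from DRAM cache or flash */
    printf("%.*s\n", (int)outlen, out);
    free(out);                                   /* assumes caller owns buffer */
    return 0;
}
```

The point of the sketch is the division of labor: the application only issues writes and reads against a container, while the library handles DRAM caching, flash layout and crash safety.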
8. memcached
• Memcached is an open source, in-memory distributed key-value cache/store
• CRUD API (create, replace, update, delete)
• ASCII and binary protocols
• High performance
• Written in C; clients available for most popular languages
§ Test hardware:
– Dell R720 server: Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz, 2 physical CPUs with 8 cores/16 threads each (visible as 32 CPUs), 128 GB DRAM, 10G Ethernet
– SSD: 8 x 400G Lightning® SSDs with md RAID 0
– Remote client with 10G Ethernet
§ Test software:
– Memcached v1.4.15, CentOS release 6.5
– ZetaScale™ software flash size: 500G
– Memslap benchmark: 64 threads, 512 concurrency, 250-byte keys, 1024-byte values, set/get = 1:9, 3600-second runs
9. Memcached with ZetaScale Performance
[Chart: memslap throughput (kTPS, 0-450) at DRAM miss rates of 0%, 5%, 42% and 62%, for stock memcached and memcached with ZS; bare metal]
Memcached throughput with the data set in flash is similar to stock memcached throughput with the data set in DRAM.
ZS = ZetaScale software. Source: based on internal testing by SanDisk, Jan 2015.
10. ZetaScale-Memcached Integration
Data path: client set/get → memcached network layer → memcached item manager layer (rewritten) → ZS API → SSD.
§ Replace the memcached get/put routines with calls to ZetaScale get/put (see the sketch below)
§ Use the existing memcached multithreading to get sufficient parallelism to drive flash IOPS
§ ZetaScale automatically caches hot objects in its own DRAM cache, so the stock memcached DRAM cache code is bypassed
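The following is a hedged C sketch of the item-manager rewrite described above: memcached's store and fetch paths call ZetaScale instead of the internal slab/LRU cache. The names ZSWriteObject()/ZSReadObject() appear later in this deck; their signatures, the zs_container_t handle, and the wrapper names are assumptions.

```c
/* Hedged sketch of routing memcached's item store/fetch paths to a
 * ZetaScale-style engine.  Signatures and handles are assumed. */
#include <stddef.h>

typedef unsigned long zs_container_t;
extern int ZSWriteObject(zs_container_t c, const char *key, size_t klen,
                         const char *val, size_t vlen);
extern int ZSReadObject(zs_container_t c, const char *key, size_t klen,
                        char **val, size_t *vlen);

static zs_container_t cache_container;   /* opened once at startup */

/* Called from the "set" command path instead of the internal item store. */
int flash_item_store(const char *key, size_t klen,
                     const char *val, size_t vlen)
{
    /* No slab/LRU bookkeeping here: ZetaScale keeps hot objects in its
     * own DRAM cache, so the stock cache layer is bypassed. */
    return ZSWriteObject(cache_container, key, klen, val, vlen);
}

/* Called from the "get" command path instead of the internal item lookup. */
int flash_item_get(const char *key, size_t klen, char **val, size_t *vlen)
{
    return ZSReadObject(cache_container, key, klen, val, vlen);
}
```

Because each memcached worker thread issues these calls independently, the existing thread pool supplies the parallelism needed to keep the flash device's queues full.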
11. redis
• Redis (REmote DIctionary Server) is an open-source, in-memory key-value store
• Supports more complex data types such as strings, hashes, lists, sets, sorted sets
• Asynchronous replication to one or more slaves
• Snapshot facility using fork() + copy-on-write
• Append-only logging with a configurable fsync() policy
• Pub/sub capability
§ Test hardware:
– HP server: 2 x 6-core 2.90 GHz Intel Westmere; DRAM: 96G; Flash: 8 x 200G Lightning SSDs
– Remote client with 10G network connection
§ Test software:
– Redis 2.7.4
– YCSB: uniform workload with 95% reads and 5% updates
– Strings were 1K bytes; hashes, lists, sets and sorted sets were 10 x 100 bytes
– The dataset was 16 million objects for stock Redis and 64 million objects for Redis with ZetaScale
12. Redis with ZetaScale Performance
[Chart: throughput (kTPS) by data type for stock Redis (in memory) vs. ZS+Redis (from flash, 4x larger dataset); bare metal]

  Data type    Stock Redis (in memory)   ZS+Redis (from flash, 4x larger dataset)
  String       116                       132
  Hash         84                        101
  List         93                        114
  Set          70                        99
  Sorted Set   93                        89

ZetaScale-Redis throughput with the data set in flash is similar to stock Redis throughput with the data set in DRAM.
ZS = ZetaScale software (labeled FDF-Redis in the original chart). Source: based on internal testing by SanDisk, Nov 2013.
13. What Was Required to Exploit Flash?
§ Replace the Redis DRAM-to-storage staging code with calls to FDF (ZetaScale) get/put
§ Convert Redis from single-threaded to multi-threaded to drive flash IOPS (a threading sketch follows)
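To illustrate the second point, here is a hedged C sketch of the threading change: a small pool of worker threads issues key-value reads concurrently so that a flash device with deep queues stays busy. ZSReadObject() is the name used elsewhere in this deck; its signature, the zs_container_t handle, and the key layout are assumptions.

```c
/* Hedged sketch: multiple worker threads issuing concurrent reads to
 * drive flash IOPS.  The ZetaScale call signature is assumed. */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef unsigned long zs_container_t;
extern int ZSReadObject(zs_container_t c, const char *key, size_t klen,
                        char **val, size_t *vlen);

#define WORKERS 16                 /* enough parallelism for flash queues */

static zs_container_t db;          /* opened elsewhere (omitted) */

static void *worker(void *arg)
{
    long id = (long)arg;
    char key[32];
    for (int i = 0; i < 100000; i++) {
        snprintf(key, sizeof key, "key:%ld:%d", id, i);
        char *val = NULL; size_t vlen = 0;
        if (ZSReadObject(db, key, strlen(key), &val, &vlen) == 0)
            free(val);             /* assumes the library allocates the buffer */
    }
    return NULL;
}

int main(void)
{
    pthread_t t[WORKERS];
    for (long i = 0; i < WORKERS; i++)
        pthread_create(&t[i], NULL, worker, (void *)i);
    for (int i = 0; i < WORKERS; i++)
        pthread_join(t[i], NULL);
    return 0;
}
```

A single-threaded event loop can only keep one flash request in flight at a time, which is why the conversion to multiple threads matters more for flash than it does for DRAM.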
14. GigaSpaces
• GigaSpaces XAP is an in-memory compute application platform designed for real-time big data analytics applications
• Leverages distributed real-time computation libraries such as Storm and Apache Samza to process unbounded streams of data
• ZetaScale can manage large amounts of data across a grid of high-capacity servers
• Data can be modeled using Objects/SQL, documents or relational schemas
• Supports a variety of programming interfaces: Java, .Net, C++, Scala
§ Test hardware:
– 2-socket 2.8GHz CPU with 12 cores total, 148G DRAM
– Fusion ioMemory™ ioDrive® Duo PCIe card with md RAID 0
§ Test software:
– Gigaspaces-10.0.0-XAP Premium-m2
– CentOS 5.8
– GS-provided YCSB client
– 1KB object size and uniform distribution
15. GigaSpaces/ZetaScale XAP MemoryXtend
ZetaScale-GigaSpaces provides 2x-3.6x better TPS/$ while reducing servers by 50%.
[Chart: number of servers — stock GigaSpaces (in DRAM): 20; ZetaScale-GigaSpaces (on SSDs): 11]
[Chart: TPS per dollar — no read / 100% write: ZetaScale-GigaSpaces 62 vs. stock GigaSpaces 17; 100% read / no write: ZetaScale-GigaSpaces 121 vs. stock GigaSpaces 56]
Test setup: 1KB object size and uniform distribution; 2-socket 2.8GHz CPU with 24 cores total, CentOS 5.8, Fusion ioMemory ioDrive Duo PCIe card, md RAID 0. YCSB measurements performed by SanDisk; cost calculations by GigaSpaces. Source: based on internal testing by SanDisk.
16. What Was Required to Exploit Flash?
• Stage objects in and out of DRAM using ZetaScale via the existing "Off-Heap" interface (see the sketch below):
  • GS put calls the ZSWriteObject() API
  • GS replace calls the ZSWriteObject() API
  • GS get calls the ZSReadObject() API
  • GS remove calls the ZSDeleteObject() API
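As a language-neutral illustration of the mapping above, here is a hedged C sketch of a thin translation layer from the four off-heap operations to the ZetaScale calls named on this slide. The ZS function names come from the slide; their signatures, the container handle, and the wrapper functions are assumptions (the actual off-heap interface belongs to GigaSpaces and is not shown here).

```c
/* Hedged sketch of the off-heap translation layer described above.
 * Only the ZS function names are from the slide; signatures are assumed. */
#include <stddef.h>

typedef unsigned long zs_container_t;
extern int ZSWriteObject(zs_container_t c, const char *key, size_t klen,
                         const char *val, size_t vlen);
extern int ZSReadObject(zs_container_t c, const char *key, size_t klen,
                        char **val, size_t *vlen);
extern int ZSDeleteObject(zs_container_t c, const char *key, size_t klen);

static zs_container_t space;   /* one container per data grid/space (assumed) */

/* GS put and GS replace both map to a write. */
int offheap_put(const char *k, size_t kl, const char *v, size_t vl)
{
    return ZSWriteObject(space, k, kl, v, vl);
}

/* GS get maps to a read. */
int offheap_get(const char *k, size_t kl, char **v, size_t *vl)
{
    return ZSReadObject(space, k, kl, v, vl);
}

/* GS remove maps to a delete. */
int offheap_remove(const char *k, size_t kl)
{
    return ZSDeleteObject(space, k, kl);
}
```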
17. MongoDB
• MongoDB (from "humongous") is an open source NoSQL document store
• JSON-style documents
• Built-in sharding across multiple nodes
• Automatic resharding when adding or deleting nodes
• Rich, document-based queries
• Supports multiple indices
§ Test hardware:
– 2 x 8-core 2.6 GHz Intel Xeon; 64G DRAM; 8 x 200G Lightning SSDs
– Client co-resident on the server
§ Test software:
– CentOS 6.6; MongoDB 3.0.1
– YCSB: point reads, updates and inserts; 1K objects; 15-minute runs
– For read/update: the 128G dataset contained 128 million 1K objects
18. MongoDB with ZetaScale: Read/Update, 128G dataset
[Chart: transactions per second (0-90,000) vs. read/update mix (100/0 through 0/100) for the MMAPv1, WiredTiger and ZetaScale storage engines]
Source: based on internal testing by SanDisk, Apr/May 2015.
19. MongoDB ZetaScale Integration
The ZetaScale MongoDB shim implements the MongoDB storage engine API on top of ZetaScale, which stores the data on SSDs:
– ZSRecordStore (data record store) → ZS read/write/delete API
– ZSSortedDataInterface (index CRUD) → ZS read/write/delete API
– ZSIterator / ZSCursor (index and range query) → ZS range API
– ZSRecoveryUnit (durability and isolation) → ZS transaction API
A MongoDB collection and its indexes map to one or more ZetaScale B-tree containers. Each record's location is identified by a unique, auto-generated ID; secondary indexes store that record location as their value. A sketch of this mapping follows.
ZS = ZetaScale software.
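Here is a hedged C sketch of the record-store mapping just described: a document is written under an auto-generated record ID, and a secondary-index entry stores that ID as its value so that index scans yield record locations. ZSWriteObject() is named in this deck; the signature, the container handles, the key layout, and insert_document() itself are assumptions.

```c
/* Hedged sketch of the collection/index mapping described above.
 * Signatures, containers and key formats are assumed. */
#include <stdio.h>
#include <stdint.h>

typedef unsigned long zs_container_t;
extern int ZSWriteObject(zs_container_t c, const char *key, size_t klen,
                         const char *val, size_t vlen);

static zs_container_t records;     /* B-tree container for the collection      */
static zs_container_t name_index;  /* B-tree container for a secondary index   */
static uint64_t next_record_id;    /* monotonically increasing record location */

int insert_document(const char *bson, size_t len, const char *name_field)
{
    char rec_key[32], idx_key[128];

    /* 1. Store the document under an auto-generated record ID. */
    uint64_t id = ++next_record_id;
    int n = snprintf(rec_key, sizeof rec_key, "%020llu",
                     (unsigned long long)id);
    if (ZSWriteObject(records, rec_key, (size_t)n, bson, len) != 0)
        return -1;

    /* 2. Secondary index: key = indexed field value, value = record ID,
     *    so a range query over the index returns record locations. */
    int m = snprintf(idx_key, sizeof idx_key, "%s", name_field);
    return ZSWriteObject(name_index, idx_key, (size_t)m, rec_key, (size_t)n);
}
```

In the real shim the two writes would be wrapped in a ZS transaction (via the recovery-unit mapping above) so the record and its index entry land atomically.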
20. Couchbase
• Couchbase Server is an open-source NoSQL distributed database with a flexible data model
• Integrated object caching via memcached
• On-demand elastic scalability
• Supports binary and JSON data types
• Supports indexes on JSON fields
• Inter- and intra-data-center replication
§ Test hardware:
– 2 x 8-core 2.60 GHz Intel Xeon E5-2670; 64G DRAM; 8 x 200G Lightning SSDs
– Remote client: 8-core 2.53 GHz Intel Xeon E5540; 64G DRAM, 10G Ethernet, Oracle Linux 6.3
§ Test software:
– CentOS 6.5; Couchbase 3.03
– Stock Couchbase: 48GB DRAM; threads: 24 frontend (FE), 4 backend read (BR), 4 backend write (BW)
– ZetaScale Couchbase: Couchbase 8GB DRAM, ZetaScale 40GB DRAM; threads: 64 FE, 4 BR, 32 BW
– YCSB; 24M 1K objects for the in-memory test; 128M 1K objects for the flash test; 128 threads
21. Couchbase with ZetaScale
[Chart: transactions per second (0-160,000) vs. read/update mix (100/0 through 0/100) for stock Couchbase and Couchbase with ZetaScale]
Source: based on internal testing by SanDisk, Apr 2015.
22. ZetaScale Couchbase Integration Highlights
• Replace CouchKVStore with the ZetaScale KV storage engine
• Couchbase vBuckets map to ZetaScale containers (see the sketch below)
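The vBucket-to-container mapping can be pictured with a short hedged C sketch: each Couchbase vBucket gets its own ZetaScale container named after the vBucket ID. Only the notion of containers as namespaces comes from this deck; ZSOpenContainer(), its signature, and the naming scheme are assumptions.

```c
/* Hedged sketch of mapping Couchbase vBuckets onto ZetaScale containers.
 * The open call and its signature are assumed. */
#include <stdio.h>

#define NUM_VBUCKETS 1024                 /* Couchbase default vBucket count */

typedef unsigned long zs_container_t;
extern int ZSOpenContainer(const char *name, zs_container_t *out);

static zs_container_t vb_containers[NUM_VBUCKETS];

/* Open (or create) one container per vBucket at storage-engine startup. */
int open_vbucket_containers(void)
{
    char name[32];
    for (int vb = 0; vb < NUM_VBUCKETS; vb++) {
        snprintf(name, sizeof name, "vbucket_%04d", vb);
        if (ZSOpenContainer(name, &vb_containers[vb]) != 0)
            return -1;
    }
    return 0;
}
```

Keeping one container per vBucket preserves Couchbase's rebalancing unit: moving a vBucket between nodes corresponds to moving (or dropping and rebuilding) one container.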
23. Cassandra
§ Cassandra is an open source distributed key-value store
– Large-scale synchronous/asynchronous replication
– Automatic fault-tolerance and scaling
– Tunable consistency
– Efficient support for large rows (1000s of columns)
– CQL (SQL-like) query language
– Supports multiple indices
– Optimized for high-write workloads
§ Test hardware:
– Dell R720: 2 x 8-core Intel 2.60GHz CPUs; DRAM: 128G; Flash: 8 x Lightning SSDs; Controller: LSI 9207 HBA
– Remote client with 2 x 8-core Intel Xeon CPUs, 10G Ethernet
§ Test software:
– Cassandra v2.0.3: stock and with ZetaScale
– Datastax-modified cassandra-stress tool
– 60M rows, 5 columns per row, 100-byte object size; 128 threads
24. Cassandra with ZetaScale
[Chart: transactions per second (0-60,000) vs. read/write mix (32R/1W, 16R/1W, 4R/1W, 1R/1W, 1R/4W, 1R/16W, 1R/32W) for stock Cassandra and Cassandra with ZetaScale]
Source: based on internal testing by SanDisk, Mar 2014.
25. ZetaScale Cassandra Integration Highlights
• Replace the Cassandra LSM-tree (memtable & SSTables) with the ZetaScale storage engine
• Route object get/put calls to ZetaScale get/put
• Disable the stock Cassandra journal: ZetaScale maintains its own journal
• ZetaScale indexing is used for row and column range queries
• ZetaScale transactions are used to enforce atomicity of row updates and secondary index modifications (see the sketch below)
• ZetaScale snapshots are used for full and incremental backups
• Compaction is eliminated!
[Diagram: stock path — Thrift service/client writes to the memtable and commit log in memory and flushes to SSTables in storage, with reads merging SSTables; ZetaScale path — Thrift service/client serializes/deserializes objects directly to/from ZetaScale (memory + storage)]
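The atomicity point above can be sketched in C: the row write and the matching secondary-index write are wrapped in one ZetaScale transaction. The deck only says a "ZS transaction API" exists; ZSTransactionStart()/ZSTransactionCommit()/ZSTransactionRollback() and every signature below are hypothetical placeholders for illustration.

```c
/* Hedged sketch: atomic row + secondary-index update via an assumed
 * ZetaScale transaction API.  All names/signatures are placeholders. */
#include <string.h>

typedef unsigned long zs_container_t;
extern int ZSTransactionStart(void);     /* hypothetical */
extern int ZSTransactionCommit(void);    /* hypothetical */
extern int ZSTransactionRollback(void);  /* hypothetical */
extern int ZSWriteObject(zs_container_t c, const char *key, size_t klen,
                         const char *val, size_t vlen);

static zs_container_t rows, by_email;    /* row store + secondary index */

int update_row(const char *row_key, const char *row_val, const char *email)
{
    if (ZSTransactionStart() != 0)
        return -1;

    /* Both writes become visible together, or not at all. */
    if (ZSWriteObject(rows, row_key, strlen(row_key),
                      row_val, strlen(row_val)) != 0 ||
        ZSWriteObject(by_email, email, strlen(email),
                      row_key, strlen(row_key)) != 0) {
        ZSTransactionRollback();
        return -1;
    }
    return ZSTransactionCommit();
}
```

Because updates go directly into the engine's B-tree containers rather than being appended to SSTables, there is nothing to merge later, which is why compaction disappears.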
26. Flash Extension and TCO
[Chart: 3-year TCO ($ thousands), CapEx + OpEx, stock vs. with ZS — stock: $1,110K CapEx + $578K OpEx; with ZS: $222K CapEx + $276K OpEx]
27. Conclusion
• In-memory compute applications can use flash to extend capacity and still maintain good performance, leading to reduced TCO
• The key-value abstraction is a good semantic fit for extending many in-memory apps
• Sufficient functionality is needed: crash-safety, transactions, snapshots, range queries
• Proof points using the ZetaScale key-value library: Memcached, Redis, GigaSpaces, MongoDB, Couchbase, Cassandra
• The proof points show that although performance drops with flash extension, it remains good
• For capacity-limited applications, flash extension can reduce overall TCO significantly