Learn how Aerospike's Hybrid Memory Architecture brings transactions and analytics together to power real-time Systems of Engagement (SOEs) for companies across AdTech, financial services, telecommunications, and eCommerce. We take a deep dive into the architecture, including use cases, topology, Smart Clients, XDR, and more. Aerospike delivers predictable performance and high uptime and availability at the lowest total cost of ownership (TCO).
Video of the presentation can be seen here: https://www.youtube.com/watch?v=uxuLRiNoDio
The Data Source API in Spark is a convenient feature that enables developers to write libraries to connect to data stored in various sources with Spark. Equipped with the Data Source API, users can load/save data from/to different data formats and systems with minimal setup and configuration. In this talk, we introduce the Data Source API and the unified load/save functions built on top of it. Then, we show examples to demonstrate how to build a data source library.
Cassandra Performance Tuning Like You've Been Doing It for Ten Years – Jon Haddad
Slides from my performance talk at the 2023 Cassandra summit. Here I share my tools and process for improving Cassandra's performance. We look at the OODA loop, USE method, high level observability tools and system tools such as flame graphs and bcc-tools (ebpf). Using the example of giving more memory to Cassandra, we explore how to leverage async-profiler and bcc-tools to generate cpu flame graphs and histograms of I/O performance. We can see how identifying a performance bottleneck like time spent in decompression can guide us to solving the right problems - in this case resizing compression buffers.
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya... – ScyllaDB
Pavel Emelyanov, Principal Engineer at ScyllaDB
Botond Denes, C++ Developer at ScyllaDB
What performance-minded engineers need to know.
Hear from Pavel Emelyanov and Botond Dénes on the impact of database internals – specifically, what to look for if you need latency and/or throughput improvements.
The document summarizes Apache Phoenix and its past, present, and future as a SQL interface for HBase. It describes Phoenix's architecture and key features like secondary indexes, joins, aggregations, and transactions. Recent releases added functional indexes, the Phoenix Query Server, and initial transaction support. Future plans include improvements to local indexes, integration with Calcite and Hive, and adding JSON and other SQL features. The document aims to provide an overview of Phoenix's capabilities and roadmap for building a full-featured SQL layer over HBase.
This document discusses Apache Arrow, an open source cross-language development platform for in-memory analytics. It provides an overview of Arrow's goals of being cross-language compatible, optimized for modern CPUs, and enabling interoperability between systems. Key components include core C++/Java libraries, integrations with projects like Pandas and Spark, and common message patterns for sharing data. The document also describes how Arrow is implemented in practice in systems like Dremio's Sabot query engine.
A Thorough Comparison of Delta Lake, Iceberg and Hudi – Databricks
Recently, a set of modern table formats such as Delta Lake, Hudi, and Iceberg has sprung up. Along with the Hive Metastore, these table formats try to solve problems that have long stood in traditional data lakes, with declared features like ACID, schema evolution, upsert, time travel, and incremental consumption.
How to build a streaming Lakehouse with Flink, Kafka, and Hudi – Flink Forward
Flink Forward San Francisco 2022.
With a real-time processing engine like Flink and a transactional storage layer like Hudi, it has never been easier to build end-to-end low-latency data platforms connecting sources like Kafka to data lake storage. Come learn how to blend Lakehouse architectural patterns with real-time processing pipelines with Flink and Hudi. We will dive deep on how Flink can leverage the newest features of Hudi like multi-modal indexing that dramatically improves query and write performance, data skipping that reduces the query latency by 10x for large datasets, and many more innovations unique to Flink and Hudi.
by Ethan Guo & Kyle Weller
The document provides an overview of Amazon Aurora, a managed relational database service from AWS. Some key points:
- Aurora is optimized for high performance and availability and is compatible with MySQL and PostgreSQL. It uses a distributed, fault-tolerant storage system and automatically handles administrative tasks.
- Aurora leverages other AWS services like Lambda, S3, IAM and CloudWatch. Its scale-out architecture provides high throughput and its asynchronous replication enables quick failover.
- Performance monitoring tools like Performance Insights help users analyze database load and identify bottlenecks. Recent innovations improve availability further with features like zero downtime patching and database cloning.
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc... – Databricks
Spark SQL is a highly scalable and efficient relational processing engine with easy-to-use APIs and mid-query fault tolerance. It is a core module of Apache Spark. Spark SQL can process, integrate, and analyze data from diverse data sources (e.g., Hive, Cassandra, Kafka, and Oracle) and file formats (e.g., Parquet, ORC, CSV, and JSON). This talk will dive into the technical details of Spark SQL spanning the entire lifecycle of a query execution. The audience will get a deeper understanding of Spark SQL and learn how to tune its performance.
Properly shaping partitions and your jobs to enable powerful optimizations, eliminate skew and maximize cluster utilization. We will explore various Spark Partition shaping methods along with several optimization strategies including join optimizations, aggregate optimizations, salting and multi-dimensional parallelism.
Understanding Memory Management In Spark For Fun And Profit – Spark Summit
1) The document discusses memory management in Spark applications and summarizes different approaches developers have tried to address out-of-memory errors in Spark executors.
2) It analyzes the root causes of memory issues, such as executor overheads and data sizes, and evaluates fixes like increasing memory overhead, reducing cores, and more frequent garbage collection.
3) The document dives into Spark- and JVM-level memory configuration options, such as storage pool sizes, caching formats, and garbage collection settings, to improve the reliability, efficiency, and performance of Spark jobs.
The document discusses Long-Lived Application Process (LLAP), a new capability in Apache Hive that enables long-lived daemon processes to improve query performance. LLAP eliminates Hive query startup costs by keeping query execution engines alive between queries. It allows queries to leverage just-in-time optimization and data caching to enable interactive query performance directly on HDFS data. LLAP utilizes asynchronous I/O, in-memory caching, and a query fragment API to optimize query processing. It integrates with Apache Tez to coordinate query execution across long-lived daemon processes and traditional YARN containers.
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ... – Chester Chen
Building highly efficient data lakes using Apache Hudi (Incubating)
Even with the exponential growth in data volumes, ingesting, storing, and managing big data remains unstandardized and inefficient. Data lakes are a common architectural pattern to organize big data and democratize access across the organization. In this talk, we will discuss different aspects of building honest data lake architectures, pinpointing technical challenges and areas of inefficiency. We will then re-architect the data lake using Apache Hudi (Incubating), which provides streaming primitives right on top of big data. We will show how upserts and incremental change streams provided by Hudi help optimize data ingestion and ETL processing. Further, Apache Hudi manages growth and sizes the files of the resulting data lake using purely open-source file formats, also providing optimized query performance and file system listing. We will also provide hands-on tools and guides for trying this out on your own data lake.
Speaker: Vinoth Chandar (Uber)
Vinoth is Technical Lead at Uber Data Infrastructure Team
Streaming Event Time Partitioning with Apache Flink and Apache Iceberg - Juli... – Flink Forward
Netflix’s playback data records every user interaction with video on the service, from trailers on the home page to full-length movies. This is a critical dataset with high volume that is used broadly across Netflix, powering product experiences, AB test metrics, and offline insights. In processing playback data, we depend heavily on event-time partitioning to handle a long tail of late arriving events. In this talk, I’ll provide an overview of our recent implementation of generic event-time partitioning on high volume streams using Apache Flink and Apache Iceberg (Incubating). Built as configurable Flink components that leverage Iceberg as a new output table format, we are now able to write playback data and other large scale datasets directly from a stream into a table partitioned on event time, replacing the common pattern of relying on a post-processing batch job that “puts the data in the right place”. We’ll talk through what it took to apply this to our playback data in practice, as well as challenges we hit along the way and tradeoffs with a streaming approach to event-time partitioning.
Building large scale transactional data lake using apache hudi – Bill Liu
Data is a critical infrastructure for building machine learning systems. From ensuring accurate ETAs to predicting optimal traffic routes, providing safe, seamless transportation and delivery experiences on the Uber platform requires reliable, performant large-scale data storage and analysis. In 2016, Uber developed Apache Hudi, an incremental processing framework, to power business-critical data pipelines at low latency and high efficiency; it helps distributed organizations build and manage petabyte-scale data lakes.
In this talk, I will describe what Apache Hudi is and its architectural design, then dive deep into improving data operations through features such as data versioning and time travel.
We will also go over how Hudi brings the kappa architecture to big data systems and enables efficient incremental processing for near-real-time use cases.
Speaker: Satish Kotha (Uber)
Apache Hudi committer and Engineer at Uber. Previously, he worked on building real time distributed storage systems like Twitter MetricsDB and BlobStore.
website: https://www.aicamp.ai/event/eventdetails/W2021043010
Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha... – HostedbyConfluent
Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Ethan Guo | Current 2022
Back in 2016, Apache Hudi brought transactions and change capture on top of data lakes: what is today referred to as the Lakehouse architecture. In this session, we first introduce Apache Hudi and the key technology gaps it fills in the modern data architecture. Bridging traditional data lakes and warehouses, Hudi helps realize the Lakehouse vision by bringing transactions and optimized table metadata to data lakes, along with powerful storage layout optimizations, moving them closer to today's cloud warehouses. Viewed through a data engineering lens, Hudi also plays a key unifying role between the batch and stream processing worlds by acting as a columnar, serverless "state store" for batch jobs, ushering in what we call the incremental processing model, where batch jobs can consume new data and update/delete intermediate results in a Hudi table, instead of re-computing/re-writing the entire output like old-school big-batch jobs.
The rest of the talk focuses on a deep dive into some of the time-tested design choices and tradeoffs in Hudi that help power some of the largest transactional data lakes on the planet today. We will start with a tour of the storage format design, including data and metadata layouts and, of course, Hudi's timeline, an event log that is central to implementing ACID transactions and concurrency control. We will delve deeper into practical concurrency control pitfalls in data lakes and show how Hudi's hybrid approach, combining MVCC with optimistic concurrency control, lowers contention and unlocks minute-level near-real-time commits to Hudi tables. We will conclude with code examples that showcase Hudi's rich set of table services, which perform vital table management such as cleaning older file versions, compacting delta logs into base files, dynamic re-clustering for faster query performance, and the more recently introduced indexing service that maintains Hudi's multi-modal indexing capabilities.
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur, ... – Confluent
RocksDB is the default state store for Kafka Streams. In this talk, we will discuss how to improve single node performance of the state store by tuning RocksDB and how to efficiently identify issues in the setup. We start with a short description of the RocksDB architecture. We discuss how Kafka Streams restores the state stores from Kafka by leveraging RocksDB features for bulk loading of data. We give examples of hand-tuning the RocksDB state stores based on Kafka Streams metrics and RocksDB’s metrics. At the end, we dive into a few RocksDB command line utilities that allow you to debug your setup and dump data from a state store. We illustrate the usage of the utilities with a few real-life use cases. The key takeaway from the session is the ability to understand the internal details of the default state store in Kafka Streams so that engineers can fine-tune their performance for different varieties of workloads and operate the state stores in a more robust manner.
The document discusses improving performance in Aerospike systems. It analyzes performance at the client level, network level, and Aerospike node level. Some key factors that can impact performance are CPU usage, number of network connections, bandwidth, transactions per second, and storage I/O. The document provides commands to monitor these factors and suggests potential remedies such as adding nodes, SSDs, faster network equipment, or load balancing.
Tectonic Shift: A New Foundation for Data Driven Business – Aerospike, Inc.
The document discusses how Aerospike provides a high performance NoSQL database that can power real-time applications at scale. It focuses on use cases in industries like retail, financial services, telecom, adtech, and internet that have mission critical applications requiring speed, scale, and affordability. The document highlights how Aerospike delivers dramatic total cost of ownership advantages through 10-100x performance improvements at lower costs per transaction compared to other solutions.
How to Get a Game Changing Performance Advantage with Intel SSDs and Aerospike – Aerospike, Inc.
Frank Ober of Intel’s Solutions Group will review how he achieved 1+ million transactions per second on a single dual-socket Xeon server with SSDs, using Aerospike's open source benchmarking tools. The presentation will include a live demo showing the performance of a sample system. We will cover:
The state of Key-value Stores on modern SSDs.
What choices you make in your selection process of hardware that will most benefit a consistent deployment of Aerospike.
How to run an Aerospike mesh on a single machine.
How replication works in that mesh, and which settings allow for maximum threading and scale.
We will also focus on some key learnings and the Total Cost of Ownership choices that will make your deployment more effective long term.
2017 DB Trends for Powering Real-Time Systems of Engagement – Aerospike, Inc.
Slides from a webinar delivered on 12/14/16 by Aerospike guest speaker, Forrester Principal Analyst Noel Yuhanna, and Aerospike’s CTO and Co-founder, Brian Bulkowski. They cover the challenges companies face in powering real-time digital business applications and Systems of Engagement (SOEs). SOEs need to be fast and consistent, but traditional DB approaches, including RDBMS or 1st generation NoSQL solutions, can be complex, a challenge to maintain, and costly. The trend for 2017 and beyond is to simplify systems and traditional architecture while reducing vendors.
You'll learn about:
* An emerging new architecture for SOEs - specifically, a hybrid memory architecture, which removes the entire traditional caching layer from real-time applications
* How enterprises are embracing this simplified model across financial services, telco, and adtech
* How you can significantly lower total cost of ownership (TCO) and create true competitive advantage as part of your digital transformation
There are 250 Database products, are you running the right one? – Aerospike, Inc.
This webinar discusses choosing the right database for organizations. It will cover industry trends driving data and database evolution, real-world use cases where speed and scale are important, and an architecture overview. Speakers from Forrester and Aerospike will discuss how new applications are challenging traditional databases and how Aerospike's in-memory database provides extremely high performance for large-scale, data-intensive workloads. The agenda includes an industry overview, tips for choosing a database, how data has evolved, examples where low latency is critical, and a question and answer session.
Webinar presentation March 3, 2016.
The CSCC deliverable, Practical Guide to Hybrid Cloud Computing, contains prescriptive guidance for the successful deployment of hybrid cloud computing. The whitepaper outlines the key considerations that customers must take into account as they adopt hybrid cloud computing and covers the strategic and tactical activities for decision makers implementing hybrid cloud solutions as well as technical considerations for deployment.
Download the deliverable: http://www.cloud-council.org/resource-hub
Design and flow simulation of truncated aerospike nozzle – eSAT Journals
Abstract: Aerospike nozzles are being considered in the development of Single Stage to Orbit launch vehicles because of their prominent features and altitude-compensating characteristics. This paper presents the design of aerospike nozzles using the method of characteristics in conjunction with a streamline function, and a performance study through numerical simulation using the commercial Computational Fluid Dynamics (CFD) code ANSYS FLUENT. Nozzles with truncation lengths of 25%, 40%, and 50% are chosen, because of the thermal and structural complications of the ideal aerospike nozzle. Simulation of the flow is carried out at three different altitude conditions representing under-expansion, ideal, and over-expansion of the flow. FLUENT predictions were used to verify the isentropic flow assumption and that the working fluid reached the design exit Mach number. The flow fields obtained through the numerical simulation are analysed to understand the effect of truncation on the performance of the aerospike nozzle. The optimum percentage of truncation is selected by comparing nozzles with different truncation lengths under various altitude parameters. The results show that the flow patterns of the nozzles under the different altitude conditions are almost similar. The 40% truncated nozzle is found to give optimum performance and achieved the desired exit Mach number in all three altitude conditions.
Keywords: Aerospike Nozzle, Single Stage to Orbit (SSTO), Linear Aerospike, Truncation and Rocket Nozzle
Using Databases and Containers From Development to Deployment – Aerospike, Inc.
This document discusses using containers and databases together from development to production. It addresses challenges like data redundancy, dynamic cluster formation and healing when containers start and stop. It proposes that existing architectures are broken and presents Aerospike as a solution, being self-organizing, self-healing and optimized for flash storage. It demonstrates building an app with Python, Aerospike and Docker, deploying to a Swarm cluster, and scaling the database and web tiers through containers.
Hadoop and NoSQL databases have emerged as leading choices by bringing new capabilities to the field of data management and analysis. At the same time, the RDBMS, firmly entrenched in most enterprises, continues to advance in features and varieties to address new challenges.
Join us for a special roundtable webcast on April 7th to learn:
The key differences between Hadoop, NoSQL and RDBMS today
The key use cases
How to choose the best platform for your business needs
When a hybrid approach will best fit your needs
Best practices for managing, securing and integrating data across platforms
This document discusses the limitations of relational databases for modern applications and real-time architectures. It describes how NoSQL databases like Aerospike can provide better performance and scalability. Specific examples are given of how Aerospike has been used to power applications in domains like advertising technology, social media, travel portals, and financial services that require high throughput, low latency access to large datasets.
A modular architecture for hybrid planning with theories cp2014 – Pierre Schaus
This document summarizes Maria Fox's talk on planning with theories. The talk introduces temporal planning, heuristic search techniques like relaxed plan construction, and challenges in hybrid planning involving continuous change. It discusses an application in nuclear waste processing where rods heat over time and interactions are temperature dependent. The talk presents an overall framework called "Planning Modulo Theories" that combines discrete planning with reasoning about continuous processes and constraints. It shows how processes can be modeled in PDDL+ and discusses building linear programs from developing plans to determine action timings.
This document discusses database performance characteristics and benchmarks Aerospike on Google Compute Engine (GCE). It finds that with 50 nodes, Aerospike achieved a median latency of 7ms and 83% of requests under 16ms latency for 1 million writes per second. CPU utilization was only 50-60% due to overhead. Network bottlenecks were identified, and optimizations like DPDK helped achieve 4.2 million reads per second with 90% under 4ms latency. Live migrations can impact highly consistent databases and their applications. Local SSDs provide good performance as an alternative to RAM and were benchmarked positively with Aerospike.
WEBINAR: Architectures for Digital Transformation and Next-Generation Systems... – Aerospike, Inc.
Containers are great ephemeral vessels for your applications. But what about the data that drives your business? It must survive containers coming and going, maintain its availability and reliability, and grow when you need it.
Alvin Richards reviews a number of strategies to deal with persistent containers and discusses where the data can be stored and how to scale the persistent container layer. Alvin includes code samples and interactive demos showing the power of Docker Machine, Engine, Swarm, and Compose, before demonstrating how to combine them with multihost networking to build a reliable, scalable, and production-ready tier for the data needs of your organization.
This document discusses using Docker containers with the Aerospike NoSQL database to simplify deployment from development to production. It provides examples of building a Python/Flask application with Aerospike in Docker for development and deploying it behind a load balancer to a Docker Swarm cluster for production. It also demonstrates scaling the web and Aerospike tiers independently by launching additional Docker containers.
Data & Analytics - Session 2 - Introducing Amazon Redshift – Amazon Web Services
Amazon Redshift is a fast and powerful, fully managed, petabyte-scale data warehouse service in the cloud. This presentation will give an introduction to the service and its pricing before diving into how it delivers fast query performance on data sets ranging from hundreds of gigabytes to a petabyte or more.
Steffen Krause, Technical Evangelist, AWS
Padraic Mulligan, Architect and Lead Developer and Mike McCarthy, CTO, Skillspage
(1) Amazon Redshift is a fully managed data warehousing service in the cloud that makes it simple and cost-effective to analyze large amounts of data across petabytes of structured and semi-structured data. (2) It provides fast query performance by using massively parallel processing and columnar storage techniques. (3) Customers like NTT Docomo, Nasdaq, and Amazon have been able to analyze petabytes of data faster and at a lower cost using Amazon Redshift compared to their previous on-premises solutions.
AWS Webcast - Managing Big Data in the AWS Cloud_20140924 – Amazon Web Services
This presentation deck will cover specific services such as Amazon S3, Kinesis, Redshift, Elastic MapReduce, and DynamoDB, including their features and performance characteristics. It will also cover architectural designs for the optimal use of these services based on dimensions of your data source (structured or unstructured data, volume, item size and transfer rates) and application considerations - for latency, cost and durability. It will also share customer success stories and resources to help you get started.
Rapid Application Design in Financial Services – Aerospike
Applying internet NoSQL design patterns to fraud detection and risk scoring, including when to use SQL and when to use NoSQL. The state of NAND Flash and NVMe is also discussed, as well as storage class memory futures with Intel's 3D Xpoint technology.
This talk was presented in LA at the following meetup:
http://www.meetup.com/scalela/events/233396111/
Best Practices for Migrating Your Data Warehouse to Amazon Redshift – Amazon Web Services
by Darin Briskman, Technical Evangelist, AWS
You can gain substantially more business insights and save costs by migrating your existing data warehouse to Amazon Redshift. This session will cover the key benefits of migrating to Amazon Redshift, migration strategies, and tools and resources that can help you in the process. We’ll learn about AWS Database Migration Service and the AWS Schema Conversion Tool, which were recently enhanced to import data from six common data warehouse platforms. Level: 200
This document provides an overview of Amazon Redshift presented by Pavan Pothukuchi and Chris Liu. The agenda includes an introduction to Redshift, its benefits, use cases, and Coursera's experience using Redshift. Some key benefits highlighted are that Redshift is fast, inexpensive, fully managed, secure, and innovates quickly. Example use cases from NTT Docomo and Nasdaq are discussed. Chris Liu then discusses Coursera's experience moving from no data warehouse to using Redshift over three years, including their current ecosystem involving Redshift, other AWS services, and business intelligence applications. Lessons learned around thinking in Redshift, communicating with users, surprises, and reflections are also shared.
In this presentation, you will get a look under the covers of Amazon Redshift, a fast, fully-managed, petabyte-scale data warehouse service for less than $1,000 per TB per year. Learn how Amazon Redshift uses columnar technology, optimized hardware, and massively parallel processing to deliver fast query performance on data sets ranging in size from hundreds of gigabytes to a petabyte or more. We'll also walk through techniques for optimizing performance and, you’ll hear from a specific customer and their use case to take advantage of fast performance on enormous datasets leveraging economies of scale on the AWS platform.
Highlights of AWS ReInvent 2023 (Announcements and Best Practices) – Emprovise
Highlights of AWS ReInvent 2023 in Las Vegas. Contains new announcements, deep dive into existing services and best practices, recommended design patterns.
C* Summit 2013: Large Scale Data Ingestion, Processing and Analysis: Then, No... – DataStax Academy
This document describes using Cassandra for a high-volume data ingestion and real-time analysis system. It outlines the deficiencies of the previous solution and how Cassandra improves it. The new solution uses Cassandra to capture messages from an e-commerce site at over 5,000 messages per second. It stores the data in Cassandra for real-time queries and analysis without lag, providing a single consolidated view across data centers. This enables low-latency troubleshooting and real-time dashboard updates.
Best Practices for Supercharging Cloud Analytics on Amazon Redshift – SnapLogic
In this webinar, we discuss how the secret sauce of your business analytics strategy remains rooted in your approach, methodologies, and the amount of data incorporated into this critical exercise. We also address best practices to supercharge your cloud analytics initiatives, plus tips and tricks on designing the right information architecture, data models, and other tactical optimizations.
To learn more, visit: http://www.snaplogic.com/redshift-trial
Traditional data warehouses become expensive and slow down as the volume of your data grows. Amazon Redshift is a fast, petabyte-scale data warehouse that makes it easy to analyze all of your data using existing business intelligence tools for 1/10th the traditional cost. This session will provide an introduction to Amazon Redshift and cover the essentials you need to deploy your data warehouse in the cloud so that you can achieve faster analytics and save costs. We’ll also cover the recently announced Redshift Spectrum, which allows you to query unstructured data directly from Amazon S3.
Amazon Redshift is a managed service that gives you a data warehouse ready to use. You worry about loading data and using it; the details of infrastructure, servers, replication, and backup are handled by AWS.
Amazon Redshift is a fast, fully managed data warehousing service that allows customers to analyze petabytes of structured data, at one-tenth the cost of traditional data warehousing solutions. It provides massively parallel processing across multiple nodes, columnar data storage for efficient queries, and automatic backups and recovery. Customers have seen up to 100x performance improvements over legacy systems when using Redshift for applications like log and clickstream analytics, business intelligence reporting, and real-time analytics.
Traditional data warehouses become expensive and slow down as the volume of your data grows. Amazon Redshift is a fast, petabyte-scale data warehouse that makes it easy to analyze all of your data using existing business intelligence tools for 1/10th the traditional cost. This session will provide an introduction to Amazon Redshift and cover the essentials you need to deploy your data warehouse in the cloud so that you can achieve faster analytics and save costs.
Selecting the Right AWS Database Solution - AWS 2017 Online Tech Talks – Amazon Web Services
• Get an overview of managed database services available on AWS
• Learn how to combine them for high-performance cost effective architectures
• Learn how to choose between the AWS database services based on your use case
On AWS you can choose from a variety of managed database services that save effort, save time, and unlock new capabilities and economies. In this session, we make it easy to understand how they differ, what they have in common, and how to choose one or more. We'll explain the fundamentals of Amazon RDS, a managed relational database service in the cloud; Amazon DynamoDB, a fully managed NoSQL database service; Amazon ElastiCache, a fast, in-memory caching service in the cloud; and Amazon Redshift, a fully managed, petabyte-scale data-warehouse solution that can be economical. We will cover how each service might help support your application and how to get started.
Data processing and analysis is where big data is most often consumed, driving business intelligence (BI) use cases that discover and report on meaningful patterns in the data. In this session, we will discuss options for processing, analyzing, and visualizing data. We will also look at partner solutions and BI-enabling services from AWS. Attendees will learn about optimal approaches for stream processing, batch processing, and interactive analytics with AWS services, such as, Amazon Machine Learning, Elastic MapReduce (EMR), and Redshift.
Created by: Jason Morris, Solutions Architect
AWS re:Invent 2016 | DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr... – Amazon Web Services
In this session, you will learn the key differences between a relational database management service (RDBMS) and non-relational (NoSQL) databases like Amazon DynamoDB. You will learn about suitable and unsuitable use cases for NoSQL databases. You'll learn strategies for migrating from an RDBMS to DynamoDB through a 5-phase, iterative approach. See how Sony migrated an on-premises MySQL database to the cloud with Amazon DynamoDB, and see the results of this migration.
This document provides an overview and use cases for Amazon Redshift, a fast, fully managed, petabyte-scale data warehouse service from Amazon Web Services. It summarizes Redshift's features including columnar storage, data compression, and massively parallel query processing. It also provides examples of how Redshift is used by companies to reduce costs, improve query performance, and scale their data warehousing needs. Specific use cases and customers of Redshift are highlighted.
Getting Started with Managed Database Services on AWS - September 2016 Webina... – Amazon Web Services
On AWS you can choose from a variety of managed database services that save effort, save time, and unlock new capabilities and economies. In this session, we make it easy to understand how they differ, what they have in common, and how to choose one or more. We'll explain the fundamentals of Amazon RDS, a managed relational database service in the cloud; Amazon DynamoDB, a fully managed NoSQL database service; Amazon ElastiCache, a fast, in-memory caching service in the cloud; and Amazon Redshift, a fully managed, petabyte-scale data-warehouse solution that can be surprisingly economical. We will cover how each service might help support your application, how much each service costs, and how to get started.
Learning Objectives:
• Overview of managed database services available on AWS
• How to combine them for high-performance cost effective architectures
• Learn how to choose between the AWS database services based on the use case
Who Should Attend:
• IT Managers, DBAs, Enterprise and Solution Architects, DevOps Engineers, and Developers
Similar to Aerospike Hybrid Memory Architecture (20)
In this presentation, Glassbeam Principal Architect Mohammad Guller gives an overview of Spark, and discusses why people are replacing Hadoop MapReduce with Spark for batch and stream processing jobs. He also covers areas where Spark really shines and presents a few real-world Spark scenarios. In addition, he reviews some misconceptions about Spark.
Get Started with Data Science by Analyzing Traffic Data from California Highways – Aerospike, Inc.
This document summarizes an effort to analyze traffic data from California highways to better understand data science techniques. The researchers searched for an open dataset, eventually finding sensor data from California highways. They analyzed the data format and values to understand it. To detect traffic incidents, they framed it as a classification problem and prepared training data by labeling sensor records near incidents as positive examples. They trained classifiers on this data but initial results were poor. After refining the features and balancing the training data, the classifiers showed more promising results.
Running a High Performance NoSQL Database on Amazon EC2 for Just $1.68/Hour – Aerospike, Inc.
Rajkumar Iyer and Sunil Sayyaparaju reveal how their team proved that cost-effective, high performance in the cloud isn’t a myth. They will walk through the 10-step process to efficiently set up high-performance instances on Amazon EC2 with Aerospike.
ACID & CAP: Clearing CAP Confusion and Why C In CAP ≠ C in ACID – Aerospike, Inc.
Aerospike founder & VP of Engineering & Operations Srini Srinivasan, and Engineering Lead Sunil Sayyaparaju, will review the principles of the CAP Theorem and how they apply to the Aerospike database. They will give a brief technical overview of ACID support in Aerospike and describe how Aerospike’s continuous availability and practical approach to avoiding partitions provides the highest levels of consistency in an AP system. They will also show how to optimize Aerospike and describe how this is achieved in numerous real world scenarios.
Flash Economics and Lessons learned from operating low latency platforms at h... – Aerospike, Inc.
The document discusses requirements for internet enterprises, including responding to interactions in real-time, determining user intent based on context, responding immediately using big data, and ensuring systems never go down. It then discusses Aerospike's in-memory database capabilities for handling high transaction volumes with low latency and unlimited scalability. Finally, it outlines lessons learned from operating high performance systems, including keeping architectures simple, automating operations, and separating online and offline workloads.
Presentation from Adtech Hacked
Presentation slides on Aerospike's highly reliable and scalable database, using NoSQL and in-memory technology, given at Stack Exchange on April 10th with NSOne and advertising technology luminaries.
AdTech Gets Hacked in Lower Manhattan
Stack Exchange, 110 William St 28th Floor,
New York, NY 10038
The document discusses different strategies for horizontally scaling databases, including simple sharding, hashed sharding, and master-slave architectures. It describes Aerospike's approach of "smart partitioning", which balances data automatically, hides complexity from clients, and provides redundancy and failover. The key advantages are linear scalability, high availability even during maintenance, and the ability to handle catastrophic failures through multi-datacenter replication that can withstand outages and disasters.
The document provides an overview of Aerospike, a real-time database vendor, from their perspective. It discusses the different types of database workloads, including transactions, analytics, and real-time big data. It outlines the challenges of handling high transaction volumes at low latency while scaling data size. The document then describes Aerospike's in-memory architecture, synchronous replication for consistency, and horizontal and vertical scaling capabilities. Several case studies of companies using Aerospike in production are also mentioned.
3. Database Landscape
• TRANSACTIONS (OLTP) – structured data. Response time: seconds; gigabytes of data; balanced reads/writes.
• ANALYTICS (OLAP) – structured data. Response time: hours to weeks; TB to PB; read intensive.
• BIG DATA ANALYTICS – unstructured data. Response time: seconds; terabytes of data; read intensive.
• REAL-TIME BIG DATA – unstructured data. Real-time transactions; response time: < 5 ms; 1-100 TB; balanced reads/writes; 24x7x365 availability.
4. Next Generation Systems of Engagement – An Emerging Market with Multiple Technologies
Aerospike delivers predictable performance, highest availability, and lowest TCO.
[Chart: Systems of Engagement – TCO. TCO ($) plotted against scale (TB), contrasting the alternative's TCO with Aerospike's TCO.]
[Chart: Systems of Engagement – Many Choices. Speed (TPS) plotted against scale (TB), separating a region of significant functional overlap (the commodity DB problem set) from one of unique functional capabilities and a high-value problem set.]
5. High Performance NoSQL +
■ Unlimited key-value pairs; record size up to 128KB - 1MB.
■ Complex & scalar types: integer, double, string, blob, list, map, geospatial.
■ Distributed queries on secondary indices (exact match, integer range, geospatial queries).
■ User Defined Functions extend the database.
■ Patented Indexed Map-Reduce: distributed queries can be filtered, transformed, aggregated, and reduced.
7. Advertising Technology Stack
[Diagram: millions of consumers and billions of devices hit the app servers, which write real-time context to and read recent content from an in-memory NoSQL profile store, with insights fed back from the data warehouse.]
• PROFILE STORE: cookies, email, device ID, IP address, location, segments, clicks, likes, tweets, search terms...
• REAL-TIME ANALYTICS: best sellers, top scores, trending tweets
• BATCH ANALYTICS: discover patterns, segment data (location patterns, audience affinity)
Currently about 3.0M/sec in North America.
8. AdTech – Targeting, Bidding, Programmatic
Challenge
• Billions of users & cookies across the internet
• Accessible using provisioning applications (self-serve and through support personnel)
• Real-time algorithms used for targeting, offers
Need for extremely high availability, reliability, and low latency
• 10s of TB of data
• 1B ~ 10B objects
• 1M ~ 10M TPS
Selected NoSQL
• Clustered HA system
• Predictable low latency at high throughput
• Highly available and reliable on failure
• Cross data center (XDR) support
[Diagram: searches, visits, time on page, and audience signals flow from the internet through the ad exchange to the bidding application, which draws on historical data and machine-learned behavior models.]
9. Travel Portal
• Airlines forced interstate banking
• Legacy mainframe technology
• Multi-company reservation and pricing
• Requirement: 1M TPS allowing overhead
[Diagram: the travel app polls the rate-limited pricing database for pricing changes and stores the latest price; session management keeps session data and reads prices, with XDR replication between clusters.]
10. Financial Services – Intraday Positions
10M+ user records, primary key access, 1M+ TPS
• Challenge
  – DB2 stores positions for 10 million customers
  – Value-at-risk calculations in minutes, not hours
  – Consistent view of trade state across all applications
  – Must update stock prices, show balances on 300 positions, process 250M transactions, 2M updates/day
  – Cache uneconomical: 150 servers growing to 1,000
• Need to scale reliably
  – 3 → 13 TB
  – 100 → 400 million objects
  – 200K → 1 million TPS
• Selected NoSQL
  – Flash
  – Predictable low latency at high throughput
  – Immediate consistency
  – Cross data center (XDR) support
  – 10-server cluster
[Diagram: IBM DB2 on the mainframe handles start-of-day data loading and end-of-day reconciliation; a real-time data feed reads/writes account positions, which applications query, with XDR replication.]
11. QoS & Real-Time Billing for Telcos
Challenge
• Per-account routing rules in edge systems
• Traffic shaping to implement account policies
• Accessible using provisioning applications (self-serve and through support personnel)
Need for extremely high availability, reliability, and low latency
• TBs of data
• 10-100M objects
• 10-200K TPS
Selected NoSQL
• Clustered system
• Predictable low latency at high throughput
• Highly available and reliable on failure
• Cross data center (XDR) support
[Diagram: a request from a source device/user to a destination passes real-time authorization, QoS, and billing checks as it executes; a config module app updates device and user settings, with a hot standby kept current via XDR.]
12. Traditional SOE Architecture Has Significant Limitations
Challenges:
• Complex
• Maintainability
• Durability
• Consistency
• Scalability
• Cost ($)
• Data lag
[Diagram: real-time consumer-facing, pricing/inventory/billing, and real-time decisioning applications run at fast speed and consumer scale on a caching layer over an operational database (legacy RDBMS or HDFS-based), fed by streaming data; the enterprise environment runs transactional systems on a legacy mainframe database and RDBMS.]
13. Aerospike Hybrid Memory Systems – Enabling a New Class of Real-Time Applications
Aerospike delivers predictable performance, highest availability, and lowest TCO.
Benefits:
• Simplicity
• Maintainability
• Durability
• Consistency
• Scalability
• Cost ($)
• Data lag reduced
[Diagram: the caching layer is replaced by a single hybrid memory database powered by high-performance NoSQL, serving the same real-time consumer-facing, pricing/inventory/billing, and real-time decisioning applications at fast speed and consumer scale, fed by streaming data; XDR connects it to the enterprise environment of transactional systems on a legacy mainframe database, RDBMS, legacy RDBMS, and HDFS-based systems.]
15. Architecture – The Big Picture
1) No hotspots – a distributed hash table simplifies data partitioning
2) Smart Client – 1 hop to data, no load balancers
3) Shared-nothing architecture – every node is identical
4) Smart Cluster, zero touch – auto-failover, rebalancing, rack awareness, rolling upgrades
5) Transactions and long-running tasks prioritized in real time
6) XDR – asynchronous replication across data centers ensures zero downtime
16. How Data is Organized
Aerospike → RDBMS equivalents:
• Namespace → Tablespace or Database
• Set → Table
• Record → Row
• Bin → Column
Bin types: Integer, Double, String, BLOB, List, Map/SortedMap, GeoJSON
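To make the mapping concrete, here is a minimal sketch using the Aerospike Java client (the host, namespace, set, and bin names are illustrative):

```java
import java.util.Arrays;
import java.util.Map;

import com.aerospike.client.AerospikeClient;
import com.aerospike.client.Bin;
import com.aerospike.client.Key;
import com.aerospike.client.Value;

public class DataModelSketch {
    public static void main(String[] args) {
        // Connect via any seed node; "test" is the namespace, "users" the set.
        AerospikeClient client = new AerospikeClient("127.0.0.1", 3000);
        Key key = new Key("test", "users", "user-42");  // namespace, set, user key

        client.put(null, key,
            new Bin("age", 31),                                    // integer bin
            new Bin("score", 98.5),                                // double bin
            new Bin("name", "Ann"),                                // string bin
            new Bin("tags", Value.get(Arrays.asList("a", "b"))),   // list bin
            new Bin("attrs", Value.get(Map.of("city", "SF"))));    // map bin

        client.close();
    }
}
```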
17. Smart Client™
■ The Aerospike client is implemented as a library (JAR or DLL) and consists of 2 parts:
■ Operation APIs – the operations that you can execute on the cluster (CRUD+ etc.)
■ A first-class observer of the cluster – monitoring the state of each node and aware of new nodes or node failures.
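A minimal sketch of those two halves with the Java client (host and key are illustrative); one seed node is enough, because the client discovers the rest of the cluster itself:

```java
import com.aerospike.client.AerospikeClient;
import com.aerospike.client.Key;
import com.aerospike.client.Record;
import com.aerospike.client.cluster.Node;

public class SmartClientSketch {
    public static void main(String[] args) {
        AerospikeClient client = new AerospikeClient("127.0.0.1", 3000);

        // Part 1: operation APIs (CRUD+) executed against the cluster.
        Record rec = client.get(null, new Key("test", "users", "user-42"));
        System.out.println("record: " + rec);

        // Part 2: cluster observer. The client tracks each node's state,
        // noticing new nodes and node failures as they happen.
        for (Node node : client.getNodes()) {
            System.out.println(node.getName() + " active=" + node.isActive());
        }
        client.close();
    }
}
```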
18. Smart Client – Distributed Hash Table
■ Distributed hash table with no hotspots
■ Every key is hashed with RIPEMD-160 into an ultra-efficient 20-byte (fixed-length) string
■ The hash plus additional fixed 64 bytes of data forms the index entry in RAM
■ Some bits from the hash value are used to calculate the partition ID (4096 partitions)
■ The partition ID maps to a node ID in the cluster
■ 1 hop to data – the Smart Client simply calculates the partition ID to determine the node ID
■ No load balancers required
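The digest-to-partition step can be sketched as follows. This assumes the BouncyCastle library for RIPEMD-160, and exactly which digest bits the production client uses is an internal detail, so treat the bit selection as illustrative:

```java
import org.bouncycastle.crypto.digests.RIPEMD160Digest;

public class PartitionSketch {
    static final int PARTITIONS = 4096;  // fixed partition count

    // Hash the record key into the fixed-length 20-byte digest.
    static byte[] digest(byte[] key) {
        RIPEMD160Digest d = new RIPEMD160Digest();
        d.update(key, 0, key.length);
        byte[] out = new byte[20];
        d.doFinal(out, 0);
        return out;
    }

    // Take 12 bits of the digest to select one of the 4096 partitions.
    // The client then maps partition ID -> node ID from its partition table:
    // one hop, no load balancer.
    static int partitionId(byte[] digest) {
        return (((digest[1] & 0xFF) << 8) | (digest[0] & 0xFF)) & (PARTITIONS - 1);
    }

    public static void main(String[] args) {
        System.out.println("partition = " + partitionId(digest("user-42".getBytes())));
    }
}
```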
20. Automatic Rebalancing
When a node is added or removed, the cluster automatically rebalances:
1. The cluster discovers the new node via the gossip protocol
2. A Paxos vote determines the new data organization
3. Partition migrations are scheduled
4. When a partition migration starts, a write journal starts on the destination
5. The partition moves atomically
6. The journal is applied and the source data is deleted
After migration is complete, the cluster is evenly balanced.
22. Even Data Distribution
Data is distributed evenly across nodes in a cluster using the Aerospike Smart Partitions™ algorithm.
■ RIPEMD-160 (no collisions yet found)
■ 4096 data partitions
■ Even distribution of partitions across nodes, records across partitions, and data across flash devices
■ Primary and replica partitions
23. Massively Parallel
Automatic distribution of data
• Even amount of data on all nodes and all drives
• All hardware used equally
• Load on all servers is balanced
• No “hot spots”
• No configuration changes as the workload or use case changes
Smart Clients
• Single “hop” from client to server
• Cluster-spanning operations (scan, query, batch) are sent to all nodes for parallel processing, as in the sketch below.
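As an example of a cluster-spanning operation, a batch read with the Java client fans out to the owning nodes in parallel. A minimal sketch (host and keys illustrative):

```java
import com.aerospike.client.AerospikeClient;
import com.aerospike.client.Key;
import com.aerospike.client.Record;

public class BatchSketch {
    public static void main(String[] args) {
        AerospikeClient client = new AerospikeClient("127.0.0.1", 3000);

        Key[] keys = new Key[100];
        for (int i = 0; i < keys.length; i++) {
            keys[i] = new Key("test", "users", "user-" + i);
        }

        // The client groups keys by owning node and issues the sub-requests
        // concurrently; results come back in the original key order
        // (null entries for records that don't exist).
        Record[] records = client.get(null, keys);
        System.out.println("fetched " + records.length + " results");

        client.close();
    }
}
```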
24. Scale-Up Architecture – Server Internals
[Diagram: requests arrive over TCP/IP sockets, are picked up by service threads, placed on service queues, and executed by transaction threads against flash storage.]
25. Predictable Performance
Reads – single-hop DRAM read: the owning server's primary index entry (digest & tree info, record metadata, storage pointer) points directly at the record in storage.
Writes – single-hop DRAM write: the owning server updates its primary index and a memory buffer that is flushed asynchronously to storage; a synchronous, single-hop replica write updates the replica server's primary index and memory buffer, likewise flushed asynchronously to storage.
26. Predictable Performance
Performance built in
• Written in C with memory-optimized libraries => no garbage collection
• Continual defragmentation of storage => no compactions
• Known master for any piece of data => no quorum reads
• Designed as a distributed database => networking a primary consideration
Storage optimizations
• Writes done to a memory buffer => avoids storage slowdown
• Storage used in “block” mode => no file system overhead
• Reads and writes striped across devices => concurrent use of hardware
Smart Clients
• Single “hop” from client to server
27. Data Consistency
• Written data should be immediately consistent within a cluster without introducing additional latency
• Mixed workloads (true concurrent reads/writes) should not cause issues
• Written data should be asynchronously written to remote clusters
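The first requirement can be expressed per write in the Java client; a minimal sketch, assuming the standard WritePolicy commit levels:

```java
import com.aerospike.client.AerospikeClient;
import com.aerospike.client.Bin;
import com.aerospike.client.Key;
import com.aerospike.client.policy.CommitLevel;
import com.aerospike.client.policy.WritePolicy;

public class ConsistencySketch {
    public static void main(String[] args) {
        AerospikeClient client = new AerospikeClient("127.0.0.1", 3000);

        // COMMIT_ALL: the call returns only after the master and all
        // replicas in the local cluster have committed the write.
        WritePolicy policy = new WritePolicy();
        policy.commitLevel = CommitLevel.COMMIT_ALL;

        client.put(policy, new Key("test", "users", "user-42"),
                   new Bin("balance", 100));
        client.close();
    }
}
```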
28. Data Consistency
[Diagram: a write to the owning server is synchronously replicated to the replica server within the local cluster, and reads are served from the owning server; XDR asynchronously replicates writes to the remote cluster.]
31. Data in RAM
Data in RAM is very fast – at a price
■ Indexes and data both in memory
■ $$$ (great < 100G, cloud)
■ More servers
■ Super fast
■ Optional HDD as a backing store
32. Data on Flash / SSD
■ Record data stored contiguously – 1 read per record
■ Automatic continuous defragmentation
■ Data written in flash-optimal blocks
■ Automatic distribution across drives
■ Writes buffered
[Diagram: the Aerospike Hybrid Memory System™ writes through a block interface to multiple SSDs.]
35. Indexes in DRAM, Data on SSD
• Small amount of DRAM => avoids cost and server sprawl
• No concept of cache misses => predictable, low-latency performance on NVMe/SSD
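A hedged sketch of what this mode looks like as an aerospike.conf namespace stanza; the device path and sizes are illustrative, not recommendations:

```
# Primary index in DRAM, record data on a raw NVMe/SSD device.
namespace test {
    replication-factor 2
    memory-size 8G                # DRAM for the primary index
    storage-engine device {
        device /dev/nvme0n1       # raw device, block mode (no file system)
        write-block-size 128K     # flash-optimal block writes
        data-in-memory false      # data stays on SSD; index stays in DRAM
    }
}
```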
36. Primary Index
■ A DHT of rbTrees (one per partition)
■ Each index entry is 64 bytes: write generation, time to live, last update time, storage address
■ Uses shared memory for fast restart
37. Key Value operations using the Primary Index
■ Put
■ Exists
■ Get
■ CAS
■ Increment (counters)
■ Append/Prepend
■ List Operations
■ SortedMap Operations
■ Touch
■ Delete
■ Batch Read/Exists
■ Scan
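A few of these operations in the Java client, as a minimal sketch (names illustrative). Note that operate() applies its operations atomically within a single record transaction:

```java
import com.aerospike.client.AerospikeClient;
import com.aerospike.client.Bin;
import com.aerospike.client.Key;
import com.aerospike.client.Operation;
import com.aerospike.client.Record;

public class KvOpsSketch {
    public static void main(String[] args) {
        AerospikeClient client = new AerospikeClient("127.0.0.1", 3000);
        Key key = new Key("test", "users", "user-42");

        client.put(null, key, new Bin("visits", 1), new Bin("log", "a"));
        System.out.println("exists = " + client.exists(null, key));

        // Increment, append, and read applied atomically in one call.
        Record rec = client.operate(null, key,
            Operation.add(new Bin("visits", 1)),
            Operation.append(new Bin("log", "b")),
            Operation.get("visits"));
        System.out.println("visits = " + rec.getValue("visits"));

        client.touch(null, key);   // reset the record's time to live
        client.delete(null, key);
        client.close();
    }
}
```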
38. Secondary Indexes
■ Bin (column) indexes
■ Declarative index: string, integer, list, map keys, map values, GeoJSON
■ In RAM – fast
■ Multi-node: co-located with the primary index, referencing local data only
■ Index creation: tools (AQL, ascli) or the client API (developer only)
39. Queries on Secondary Indexes
A query is a value based lookup using a secondary index similar to a SQL
select statement.
The query is sent to all nodes in the cluster in parallel
■ Scatter-gather
■ Multi-threaded
Best for “low selectivity” indices
Good for “high selectivity” indices
Selectivity = Cardinality / Rows*100
SECONDARY INDEX
PRIMARY INDEX
UDF UDF UDF
RECORD RECORDRECORD RECORD
SSD
SSD
DRAM
…
……
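A minimal sketch of such a query with the Java client, assuming an integer secondary index already exists on an "age" bin:

```java
import com.aerospike.client.AerospikeClient;
import com.aerospike.client.query.Filter;
import com.aerospike.client.query.RecordSet;
import com.aerospike.client.query.Statement;

public class QuerySketch {
    public static void main(String[] args) {
        AerospikeClient client = new AerospikeClient("127.0.0.1", 3000);

        Statement stmt = new Statement();
        stmt.setNamespace("test");
        stmt.setSetName("users");
        stmt.setFilter(Filter.range("age", 21, 30));  // range predicate on "age"

        // The client scatters the statement to every node in parallel and
        // gathers the streams into one RecordSet.
        RecordSet rs = client.query(null, stmt);
        try {
            while (rs.next()) {
                System.out.println(rs.getRecord());
            }
        } finally {
            rs.close();
        }
        client.close();
    }
}
```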
43. Failure Handling
Node failure within a cluster: nodes with replica data continue to serve.
Link failure between clusters: XDR keeps track of link failures and the data to be shipped over that link, and recovers when the link comes back up.
44. Aerospike – Enabling Your Digital Transformation
Aerospike – the next-generation operational database, powered by high-performance NoSQL.
TRUE HYBRID MEMORY ARCHITECTURE
• No cache required – simpler architecture! Smaller server footprint
• Patented flash optimization – log-structured file system
• Record-oriented, schema-free NoSQL KV store
PREDICTABLE PERFORMANCE
• True real-time DB engine, multi-threaded, massively parallel
• DRAM or hybrid DRAM/flash for persistence
• Stable, low latency and high throughput under any condition
• Deployable on bare metal, virtualized, containerized, or cloud
DYNAMIC CLUSTER MANAGEMENT
• Highest uptime & availability (5 nines plus), scalable
• Automatic DB cluster formation, self-healing, and dynamic sharding
• Cross Data Center Replication (XDR)
INTELLIGENT CLIENTS
• Machine learning
• Broad language support (C/C++, Java, C#, Python, Go, Node.js, PHP)
• Patented functionality, DB-aware clients, no load balancers required
• Rich APIs – accelerated development
TCO ($)
• Optimized for flash and DRAM
• Demonstrated 10:1 price/performance savings
• Up to 10x reduction in servers deployed
• Huge operational efficiency – “set it and forget it”
45. High Performance NoSQL Database – Powering New Opportunities at Scale
@aerospikedb
NEXT STEPS:
See how much you can save with Aerospike: http://www.aerospike.com/tco-calculator/
Ready to get started? http://www.aerospike.com/quick-start/
If you have any questions or want to further explore whether Aerospike is right for you, contact us: info@aerospike.com