With AWS you can choose the right database technology and software for the job. Given the myriad of choices, from relational databases to non-relational stores, this session provides details and examples of some of the choices available to you. This session also provides details about real-world deployments from customers using Amazon RDS, Amazon ElastiCache, Amazon DynamoDB, and Amazon Redshift.
3. Starting with the Customer
• How many of you use databases on AWS?
• How many of you use Amazon RDS, Amazon DynamoDB, Amazon Redshift, or Amazon ElastiCache?
• How many of you have a well-defined DR strategy for your databases?
• How many of you are building geospatial and context-sensitive applications?
• We suggest that you attend Werner’s keynote!
4. Introducing: Cross Region Support
9 AWS Regions including 25 Availability Zones and growing
• US GovCloud (US ITAR Region, Oregon)
• US West x 2 (N. California and Oregon)
• US East (Northern Virginia)
• LATAM (São Paulo)
• Europe West (Dublin)
• Asia Pacific Region (Singapore)
• Asia Pacific Region (Tokyo)
• Australia Region (Australia)
>10 data centers in US East alone
46 world-wide points of presence
• RDS Snapshot Copy
• All engines
5. Zoopla
“We are very happy with RDS cross region snapshot copy feature as it gives
us the ability to copy our data from one AWS region to another AWS region
with minimal effort.
Prior to this feature, it used to take 3 days and a number of manual steps to
copy our snapshots. Now we have an automated process that helps us to
achieve disaster recovery capabilities in just a few steps.”
Joel Callaway, IT Operations Manager
Zoopla Property Group Ltd, UK
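A cross-region copy like the one described in the quote can be scripted. The sketch below only builds the request parameters; the account ID, Regions, and snapshot names are made-up placeholders, and the final call to boto3's `copy_db_snapshot` appears as a comment because it needs AWS credentials.

```python
def snapshot_arn(region: str, account_id: str, snapshot_id: str) -> str:
    """Build the ARN that identifies an RDS snapshot in another Region."""
    return f"arn:aws:rds:{region}:{account_id}:snapshot:{snapshot_id}"


def cross_region_copy_params(source_region, account_id, snapshot_id, target_id):
    """Return keyword arguments for an RDS CopyDBSnapshot request."""
    return {
        "SourceDBSnapshotIdentifier": snapshot_arn(source_region, account_id, snapshot_id),
        "TargetDBSnapshotIdentifier": target_id,
        "SourceRegion": source_region,
    }


# Hypothetical values for illustration only.
params = cross_region_copy_params(
    "eu-west-1", "123456789012", "nightly-2013-11-14", "dr-nightly-2013-11-14"
)
print(params["SourceDBSnapshotIdentifier"])

# With credentials configured, the copy would be issued from the target Region:
#   import boto3
#   boto3.client("rds", region_name="us-west-2").copy_db_snapshot(**params)
```

Running a job like this on a schedule is one way to turn the manual, multi-day process into an automated one.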
6. Your Mission is Clear
1. Zero to App in ____ Minutes
2. Zero to Millions of users in ____ Days
3. Zero to “Hero” in ____ Months
14. Thinking About the Questions
• Should I use MySQL or PostgreSQL?
• Should I use SQL or NoSQL?
• Should I use MongoDB, Cassandra, or DynamoDB?
• Should I use Redis, Memcached, or ElastiCache?
15. Actually, Thinking About the Right Questions
• What are my transactional and consistency needs?
• What are my scale and latency needs?
• What are my read/write, storage and IOPS needs?
• What are my time to market and server control needs?
16. Factors to Consider
Application
• SQL: App with complex business logic?
• NoSQL: Web app with lots of users?
Transactions
• SQL: Complex txns, joins, updates?
• NoSQL: Simple data model, updates, queries?
Scale
• SQL: Developer managed
• NoSQL: Automatic, on-demand scaling
Performance
• SQL: Developer architected
• NoSQL: Consistent, high performance at scale
Availability
• SQL: Architected for fail-over
• NoSQL: Seamless and transparent
Core Skills
• SQL: SQL + Java/Ruby/Python/PHP
• NoSQL: NoSQL + Java/Ruby/Python/PHP
Best of both worlds: it is possible to use SQL and NoSQL models in one app
17. Factors to Consider
Self-Managed Service
• Full control over the instance, db and OS parameters
• Upgrades, back-ups, fail-over are yours to manage
• All aspects of security are managed by you
• Complex replication topologies and data management
Managed Service
• Off-load the infrastructure and software management
• Automate database life-cycle with APIs
• Focus on database access and app security
• Limited control over replication topologies
18. Pace of Innovation – a Bonus
RDS team launched 23+ features
• SQL Server TDE, Version upgrade
• Oracle TDE, Statspack, Fine grain access, 3TB/30K IOPS
• Cross Region Snapshot Copy, Parallel replica, Chained replica
• Multi-AZ SLA, Log access, VPC groups, …
NoSQL team launched 10+ features
• Redis engine support
• Amazon DynamoDB Fine grain access control
• Amazon DynamoDB local, Geospatial indexing library
• Transaction library, Local secondary index, parallel scan
Redshift team launched 20+ features
• Encryption with HSM support
• Audit logging, SNS notification, snapshot sharing
• COPY from Amazon EMR/HDFS/SSH
• Faster resize, improved concurrency, distributed tables, …
19. Amazon RDS is a managed SQL database service.
Choice of Database engines
Simple to deploy and scale
Reliable and cost effective
Without any operational burden
20. Optimizing for Developer Productivity
Schema design
Migration
Backup and recovery
Patching
Query construction
Configuration
Query optimization
Focus on the “innovation”
Software upgrades
Storage upgrades
Frequent server upgrades
Hardware crash
Off load the “administration”
21. Optimizing for Developer Productivity
Multiple databases per instance
MySQL Manual for Read Replica
Use MySQL tools & drivers
Quickly set up Read Replicas
High availability Multi-AZ option (99.95% SLA)
Ability to promote Read replicas, Rename as Master
Diagnostics
OR Amazon RDS console
Native MySQL replication
SSL for encryption over the wire
Monitor metrics
Shell, super user or direct file system access (Think security!)
22. ElastiCache is a managed caching service.
Easy to set up and operate cache clusters
Supports Memcached and Redis engines
Scale cache clusters with push button ease
Ultra fast response time for read scaling
Without any operational burden
23. ElastiCache is a Performance Booster
[Diagram: Clients → Elastic Load Balancing → EC2 app instances. App reads go to the cache (master and Redis read replica), which serves most read queries at in-memory performance. Updates and read/write queries go to an RDS MySQL DB instance with PIOPS, at SSD performance.]
24. Amazon DynamoDB is a managed NoSQL database service.
Store and retrieve any amount of data
Scale throughput to millions of IO
Single digit millisecond latencies
Without any operational burden
25. Optimizing for Developer Productivity
Manage tables
• CreateTable
• UpdateTable
• DeleteTable
• DescribeTable
• ListTables
“Select”, “insert”, “update” items
• PutItem
• GetItem
• UpdateItem
• DeleteItem
Query specific items OR scan the full table
• Query
• Scan
Bulk select or update (max 1MB)
• BatchGetItem
• BatchWriteItem
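The item operations exchange data in DynamoDB's typed attribute-value JSON format. A minimal sketch of building a PutItem request body follows; the table and attribute names are hypothetical, only strings, numbers and booleans are handled, and nothing is sent to AWS.

```python
def to_attribute_value(value):
    """Convert a Python scalar to DynamoDB's typed attribute-value form."""
    if isinstance(value, bool):          # check bool first: bool is a subclass of int
        return {"BOOL": value}
    if isinstance(value, (int, float)):
        return {"N": str(value)}         # numbers travel as strings on the wire
    if isinstance(value, str):
        return {"S": value}
    raise TypeError(f"unsupported type: {type(value).__name__}")


def put_item_request(table_name, item):
    """Build the request body for a PutItem call (not sent anywhere)."""
    return {
        "TableName": table_name,
        "Item": {name: to_attribute_value(v) for name, v in item.items()},
    }


# Hypothetical table and attributes for illustration.
request = put_item_request("GameState", {"GameId": "g-42", "Turn": 7, "Finished": False})
print(request)
```

In practice an SDK performs this conversion; the sketch only shows what the wire format looks like.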
26. Amazon Redshift is a managed data warehouse service.
Petabyte scale columnar database
Fast response time (~10x faster than typical relational stores)
Under $1,000 per TB per year
Without any operational burden
27. So, what are the tips and techniques for successful deployments?
28. Thousands of Successful Deployments
Two Highlights
• SugarCRM, CRM Software: Zac Sprackett
• Gaming Platform: Mike Thomas
36. Delivering On Time and On Budget
• Amazon lets you easily spin up testing environments
– Testing only works if you make use of it. Don’t make assumptions
– Monitor everything
• Change in cost model can surprise finance
– Planned capital expenditures versus after-the-fact operational expenditures
– Use reserved instances
– Third party tools such as Cloudability can help alert you to issues early
• Manage access keys effectively to control cost
– Learn to love AWS Identity and Access Management (IAM)
37. Things to Watch Out For
• Understand your IO requirements
– Make effective use of each of instance backed, Amazon EBS and Provisioned IOPS file systems
• Use the heck out of read replicas
• Snapshots are incredibly useful
– But not available from a read replica
• Watch out for the SLA
– 99.95% for a region even across two AZs
– This doesn’t include user error
• Cold Standby is not instant on
– Don’t get stuck waiting for deployments in a forced failover scenario
• ElastiCache is not clustered across availability zones
• Don’t use the default parameter group for Amazon RDS
– Unless you really like restarting databases
• You still need DBAs and Ops but they get to do cooler stuff
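To make the 99.95% SLA figure concrete, the arithmetic below converts an availability percentage into a downtime allowance. The 30-day month is an assumption for illustration, not a term quoted from the SLA.

```python
def allowed_downtime_minutes(availability_pct: float, days: int = 30) -> float:
    """Minutes of downtime permitted in `days` at the given availability."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


monthly = allowed_downtime_minutes(99.95)
print(f"{monthly:.1f} minutes per 30-day month")
```

Under that assumption the allowance is roughly 21.6 minutes a month, and downtime caused by user error sits outside it.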
40. About Scopely
Our technical infrastructure allows developers to build games efficiently for both iOS and Android.
• Millions of Users
• Billions of Turns
• All titles have reached the Top 5 in the App Store, and the last three have been #1.
41. Challenges
• Build a single platform to support many different kinds of games: asynchronous turn-based, single player, synchronous, etc.
• Scale up and down as games are tested, launched,
grow, and are retired.
• We are not an infrastructure company – we must
focus on building features that support game
development.
42. Platform Features
• Accounts / authentication
• Gameplay / state persistence
• Chat / messaging
• In game economy
• Facebook integration
• Gifting
• Single Player state tracking
• Promotion / cross-promotion system
• Statistics
• Tournaments
• Achievements
• Email targeting
• Suggested friends
• In game news system
• External partner integration
• Invitation attribution
• Push notifications
• Content management
• Generic storage API
• Application / device configuration
• AB Testing
43. Different Features/Different Requirements
• Dynamic scaling (game launches, promotions, tests)
• High write/read ratio (playing turns)
• Transactional consistency (real money purchases)
• Indexed data (user accounts)
• Complex, real-time data (leaderboards)
44. Operational Data Storage
Scopely Gaming Platform
• ElastiCache: Memcached for performance, scalability, and cost savings.
• S3: Amazon S3 for asset and image storage.
• ElastiCache: Redis for fast, complex caching and message passing.
• DynamoDB: Amazon DynamoDB for unbounded data with heavy write load.
• RDS: MySQL for bounded, transactional, queryable data.
45. Analytics Data Pipeline
Scopely Gaming Platform
SQS: In-Flight Events
Redshift Data Warehouse
EC2: Message Loader
S3: Staged Messages
EMR: Transformer
S3: Processed Data
EC2: Redshift Loader
RDS: Process / Job Tracking
47. Use Case: Leaderboards
Redis
• “What is my rank in today’s tournament?”
• Hard to cache since a single player getting a new high score changes everyone’s rank
• Highly optimized schema required 4 m2.2xlarge RDS nodes
• Latency for “what is my rank” could be above 100ms
• Redis sorted sets provide exactly what we need. Two m2.xlarge instances are more than enough. Rank query is now in single digit milliseconds.
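The rank lookup maps naturally onto a sorted set. The sketch below mimics the idea in plain Python with a sorted list so that it runs without a server; against Redis the equivalent commands would be ZADD to record a score and ZREVRANK to read a rank. The class and player names are invented for illustration, and this is not Scopely's code.

```python
import bisect


class Leaderboard:
    """In-memory stand-in for a sorted set: member -> score, ranked by score."""

    def __init__(self):
        self._scores = {}    # player -> current score
        self._sorted = []    # ascending list of (score, player)

    def add(self, player, score):
        """Insert or update a player's score (like ZADD)."""
        if player in self._scores:
            old = (self._scores[player], player)
            self._sorted.pop(bisect.bisect_left(self._sorted, old))
        self._scores[player] = score
        bisect.insort(self._sorted, (score, player))

    def rank(self, player):
        """0-based rank with the highest score first (like ZREVRANK)."""
        entry = (self._scores[player], player)
        ascending = bisect.bisect_left(self._sorted, entry)
        return len(self._sorted) - 1 - ascending


board = Leaderboard()
board.add("ana", 120)
board.add("bo", 300)
board.add("cy", 210)
board.add("ana", 400)   # a new high score moves everyone else down
print(board.rank("ana"), board.rank("bo"), board.rank("cy"))
```

The list-based version pays a linear cost on every update, whereas a Redis sorted set handles both the update and the rank query in logarithmic time. This is why the structure suits a leaderboard where one new score shifts many ranks.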
48. Use Case: Game/Turn State
DynamoDB
• Extremely high throughput. Extremely large dataset.
• Semi-structured data – each game models “state” differently.
• Always queried by UserID or GameID.
• Maxed out an Amazon RDS instance – instead of spending time sharding / optimizing Amazon RDS, we moved to Amazon DynamoDB.
• Saves operational time and development time by not having to worry about growing games/adding new games/traffic spikes.
49. Use Case: User Accounts
MySQL (RDS)
• Need to maintain uniqueness across multiple columns (email, username, etc.)
• Queryable on multiple facets (email, username, external identifier)
• Entire table needs to be scanned regularly (promotions)
• Bounded data size
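Uniqueness across several columns is something a relational engine enforces declaratively. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL on RDS so that it runs anywhere; the table and column names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE accounts (
        id          INTEGER PRIMARY KEY,
        email       TEXT NOT NULL UNIQUE,
        username    TEXT NOT NULL UNIQUE,
        external_id TEXT UNIQUE
    )
    """
)


def try_insert(email, username, external_id):
    """Return True if the row was stored, False if a unique column collided."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO accounts (email, username, external_id) VALUES (?, ?, ?)",
                (email, username, external_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


first_account = try_insert("ana@example.com", "ana", "fb-1001")
duplicate_email = try_insert("ana@example.com", "ana2", "fb-1002")  # email already taken
fresh_account = try_insert("bo@example.com", "bo", "fb-1003")

# Look an account up by a different facet than the one used at sign-up.
row = conn.execute(
    "SELECT username FROM accounts WHERE external_id = ?", ("fb-1003",)
).fetchone()
print(first_account, duplicate_email, fresh_account, row)
```

Each UNIQUE column gets its own index, which also serves the per-facet lookups.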
50. Use Case: Global Caching
Memcached (ElastiCache)
• Cache everything possible in Memcached, including entities in both Amazon DynamoDB and RDS.
• Single interface providing session caching, memcached caching, and Amazon DynamoDB access encourages consistent use of caching.
51. Use Case: Global Caching
Memcached (ElastiCache)
// Single entry point that layers two caches in front of Amazon DynamoDB.
public class CoherentStorage
{
    public Cache L1Cache { get; set; }        // per-request cache
    public Cache L2Cache { get; set; }        // shared Memcached cache
    public DynamoClient Dynamo { get; set; }  // backing store
    private readonly Games _game;

    public CoherentStorage(Games game)
    {
        _game = game;
        L1Cache = Cache.Request;
        // One Memcached namespace per game, e.g. "<game>GameState".
        L2Cache = Cache.GetMemcached(String.Format("{0}GameState", game));
        Dynamo = DynamoClient.Instance;
    }

    // Method bodies are elided on the slide.
    public void Save(object instance) { }
    public void Delete(object instance) { }
    public T Get<T>(object id, bool skipCache = false, bool consistentRead = true) { }
}
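The method bodies are not shown on the slide. One way such a layered lookup could work is a read-through cache: check the per-request cache, then the shared cache, then the backing store, filling the caches on the way back. The Python sketch below uses plain dictionaries as stand-ins for Memcached and DynamoDB and does not model the `consistentRead` flag. It is an assumption about the pattern, not Scopely's implementation.

```python
class LayeredStorage:
    """Read-through storage: L1 (request-local) -> L2 (shared) -> backing store."""

    def __init__(self, l2_cache, store):
        self.l1 = {}          # per-request cache, cheapest to hit
        self.l2 = l2_cache    # stand-in for Memcached
        self.store = store    # stand-in for DynamoDB
        self.store_reads = 0  # counts how often the backing store is hit

    def get(self, key, skip_cache=False):
        if not skip_cache:
            for cache in (self.l1, self.l2):
                if key in cache:
                    self.l1[key] = cache[key]   # promote to L1
                    return cache[key]
        self.store_reads += 1
        value = self.store.get(key)
        if value is not None:
            self.l1[key] = value
            self.l2[key] = value
        return value

    def save(self, key, value):
        """Write to the store first, then refresh both caches."""
        self.store[key] = value
        self.l1[key] = value
        self.l2[key] = value

    def delete(self, key):
        """Remove the item everywhere so no cache serves a stale copy."""
        for layer in (self.store, self.l1, self.l2):
            layer.pop(key, None)


shared_cache, table = {}, {"game:1": {"turn": 3}}
storage = LayeredStorage(shared_cache, table)
first = storage.get("game:1")    # miss: read from the store, fill caches
second = storage.get("game:1")   # hit: served from L1
print(first, second, storage.store_reads)
```

Routing every read and write through one object like this is what makes caching the default, because callers never touch the store directly.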
52. Tips & Traps
• Know your data – use reasonable heuristics for expected
data growth.
• Each data storage technology introduces some level of
operational and engineering overhead. Choose wisely.
• Get creative with Amazon DynamoDB.
• Prepare for the unexpected with Metadata columns in
MySQL.
53. Please give us your feedback on this
presentation
DAT201