AWS gives designers of enterprise storage systems a completely new set of options. Aimed at enterprise storage specialists and managers of cloud-integration teams, this session gives you the tools and perspective to confidently integrate your storage workloads with AWS. We show working use cases, a thorough TCO model, and detailed customer blueprints. Throughout we analyze how data-tiering options measure up to the design criteria that matter most: performance, efficiency, cost, security, and integration.
6. Storage Services
Scalable and durable
high performance cloud storage
Amazon S3
Redundant, High-Scale Object Store
Amazon Glacier
Deployment & Administration
App Services
Compute
Storage
Low-cost Archive Storage in the Cloud
Amazon Elastic Block Store
Database
Persistent Block Storage for EC2
Networking
AWS Storage Gateway
AWS Global Infrastructure
Corporate File Sharing and Seamless Backup
of Enterprise Data to Amazon S3
7. Amazon S3 Standard Storage Is…
Designed to provide 99.999999999% durability and
99.99% availability of objects over a given year.
If you put 10,000 objects in S3 you can expect to
lose 1 object every 10,000,000 years
10. Next Generation Enterprise Storage
Customer
Data
Center
Block
File
AWS Direct
Connect
AWS
Cloud
Archive
Backup
Internet Web
Services API
HTTP(S)
Storage
Use
Cases
Disaster Recovery
17. Option 1: Software Integration
1. Configure on-premises backup software to use S3
2. Backup and restore directly from software
3. Backup server communicates with cloud (S3) over
Internet links
4. Use software-based encryption, compression,
dedupe, backup management tools
5. Check security / integrity / functionality /
performance / operations / speed
18. Tapeless Cloud Backup
Backup Server
Physical
Virtual
EU West
Region
(Ireland)
S. America
Region
(Sao Paulo)
APAC
Region
(Singapore)
Japan
Region
(Tokyo)
US West
Region (N.
California)
US West
Region
(Oregon)
AWS
GovCloud
Region (US)
US East
Region (N.
Virginia)
19. Option 2: Plain File Transfer
1.
2.
3.
4.
Store target file(s) on a file share.
Configure policies on target Amazon S3 buckets
Encrypt / compress data sets on premises
Transfer files via regular file transfer (Amazon S3,
SFTP, SCP, FTP etc). Or use massively parallel filetransfer options
5. Retrieve encrypted file from Amazon S3 using using the
same options
6. Test integrity / security / operations / performance
7. Add parallelization for performance optimization
20. Plain File Transfer Diagram
1.Store
Backup File
on FileShare
Encrypt Output
File
Create
Output
Encrypted Data Is
Written to on
FileShare
Internet
or Direct
Connection
Transfer / Retrieve
Encrypted File to
Amazon S3 Using
Regular File Transfer
Customer Data Center
AWS Region
21. Option 3 : Use AWS Gateway
• Integrates on-premises IT environments with cloud storage
for remote office backup and DR
• Utilizes a virtual appliance that sits in customer datacenter
• Exposes compatible iSCSI interface on front end
• Provides low-latency on-premises performance
• Asynchronously uploads data to AWS where it is stored in
Amazon S3 as Amazon EBS snapshots
22. Option 3 : Use AWS Gateway
• Support for Windows and RedHat iSCSI initiators
• Point-in-time snapshots accessible locally and from
Amazon EBS
• Encryption via SSL and Amazon S3 server side
encryption
• Snapshot scheduling
• WAN compression
• Supported in all public regions
• Bandwidth throttling
• CACHED VOLUMES / VTL SUPPORT
23. Gateway-Cached Volumes – Key Use
Cases
Cacheable data like departmental file
Corporate File
Sharing
Backup
shares, home directories
Store files in Amazon S3, while
keeping recently accessed data on
premises
iSCSI interface compatible with
traditional backup applications
(Netbackup, Tivoli, Backup Exec, etc.)
Store backups in Amazon S3, keep
recent backups on premises
25. VTL Gateway – Archive Your Data to Glacier
Backup
Software
VTL Gateway
Archive to AWS
Amazon
S3 AWS Cloud
App/DB/SAN/NAS
Corporate Data Center
Amazon
Glacier
versus
Traditional
Approach
SAN
Tier 2
Storage
Corporate Data Center
Disk
Backup
Offsite Tape
Storage
26. Architecture of VTL Gateway
VTL (1500 tapes)
VTS (unlimited
Customer Data Center
tapes)
Tape Ingestion into Glacier
AWS Storage
Gateway VM
iSCSI
Media Changer
SSL
Tape Drive 1
Production
Systems
NetBackup /
CommVault /
[Backup
Software]
AWS Storage Gateway for
Virtual Tape Library
…
Tape Drive 2
AWS Storage
Gateway
Service
Amazon S3
Amazon
Glacier
Tape Drive N
On-premises
Host
Direct Attached or
Storage Area Network Disks
(for internal cache & buffer storage)
AWS Storage Gateway on EC2
for Disaster Recovery or Data Mirroring
EC2
Application
NetBackup /
CommVault /
[Backup Software]
on EC2
AWS Storage
Gateway on
EC2 AMI
27. VTL Gateway Characteristics
•
•
•
•
•
•
Single point virtual appliance for archive use case and for customer in need for a
simple VTL Interface
Each virtual appliance can manage up to 140TB in VTL (Virtual Tape Library) but
unlimited in VTS (Virtual Tape Shelf)
Cost of each appliance could be around $125
Ease of mgmt. when data grows in multi PBs per year
Current ingest rate is about 3-5 TBs per day per gateway (option to use multiple
GW in a cluster environment)
Data passed through VTL gateway is not deduped (ease of restore and reuse) –
suited for long-term archive
Bottom line: Archive, fixed content, entertainment, scientific, social networks, compliance and
unstructured data requirements generate much of today’s tier 3 storage demand and have become the
primary drivers for tape storage demand. With Amazon Glacier and VTL gateway, AWS is very well
poised to help customers leverage the benefits of cloud storage!
29. How Does It Work?
• Enterprise storage gateway presents itself as
–
–
–
–
CIFS/NFS file share
iSCSI endpoint
File archive via file tiering policies from filers
Policy-based routing from FC switch
• Gateway cache data locally, tiers data back to
Amazon S3-based on policies after dedupe,
encryption, compression
• Data is accessible to all other gateways
30. Design Considerations
•
•
•
•
•
•
•
Ingest / restore / access rates
Deduplication / compression rates
Throughput rates
High availability / integrity
Restore in the cloud option
Data transfer costs
Security integration
31. Option 4: Enterprise Storage Gateway
AWS Direct
Connect
Enterprise Storage
Gateway Dedupes,
Compresses and
Encrypts Data and
Then Moves Data
to AWS
Corporate Data Center
AWS Region
33. Block vs. File vs. Object Storage
•
Block storage:
–
–
–
•
File storage:
–
–
–
•
Data organized as an array of unrelated blocks
Host file system places data on disk: Microsoft NTFS or Unix ZFS
Structured data is predicted to grow at 18.7% CAGR until 2018
Unrelated data blocks managed by a file (serving) system
Native file system places data on disk: EMC UxFS or NTAP WAFL
Unstructured data is predicted to grow at 47.3% CAGR until 2018
What is object storage?:
–
–
A new data access, data storage, and data management model
• API access to data vs. traditional block or file system access
• Metadata driven, policy-based, self-managing storage
• No host overhead for storage functions
A system that stores virtual containers that encapsulate the data, data attributes,
metadata, and Object IDs
34. Cloud Storage: SDK or Plug & Play?
Plug & Play: IT can bridge onpremises environments with familiar
storage interfaces and methodologies
via cloud storage gateways
SDK: Application developers can
leverage the Amazon S3 SDK for
custom application integration
Internet Web
Services API
HTTP(S)
S3
S3
AWS Storage Gateway
36. NFS / CIFS Archive
Case Data
Analysis Data
Administrative Data
•
•
•
•
Global Deduplication
Encryption
Multiple Gbps
Global Online Access
Cloud Storage
Offload stale data to low-cost cloud storage
Scale instantly as needed
Integrate seamlessly with standard archiving, tiering solutions
“Cloud drive” just another disk target, accessible anywhere
37. NFS / CIFS Backup
Global Deduplication
Encryption
Multiple Gbps
•
•
•
•
•
•
•
100s TB raw local cache
Eliminate tape from infrastructure
Slash time and manpower for data protection
Global deduplication
Military-grade encryption
Seamless integration with major backup tools
Restore anywhere, virtual or physical
Global Online Access
Cloud Storage
38. Panzura + Amazon S3
•
•
•
•
•
•
•
•
Eleven 9s durability
Four 9s vailability
Highly-secure sites
Unlimited scale
Commodity pricing
Glacier option
Multiple-geos
Largest public cloud
▪
▪
▪
▪
▪
▪
▪
▪
Global file system
Military-grade encryption
Global deduplication
CIFS/NFS
Global file locking
Local caching/pinning
AD integration/ACLs
Snapshots
40. File Archive with S3
Local file system interface to the SOAP / REST
Local file system interface to the SOAP / REST
API used by the Amazon storage cloud platform
API used by the Amazon storage cloud platform
2
Virtual namespace to seamlessly integrate local
Virtual namespace to seamlessly integrate local
and Amazon cloud storage cloud for users
and Amazon cloud storage cloud for users
3
4
CIFS / NFS
1
SOAP / REST
5
Local Storage
Automatically identify inactive and other
Automatically identify inactive and other
appropriate files to store in the Amazon storage
appropriate files to store in the Amazon storage
cloud
cloud
Migrate files to Amazon storage cloud platform
Migrate files to Amazon storage cloud platform
without disrupting user access or causing
without disrupting user access or causing
downtime
downtime
Encrypt every file stored in the Amazon storage
cloud for data security
41. Unified Storage with Amazon S3
Unified storage
• Files, databases, & VMs
• NAS & SAN
42. NextGen Enterprise Storage
Customer Location
AWS Cloud
Mobile
Devices
Servers
CTERA Appliance
Workstations
LOCAL
BACKUP
NAS
SECURE
REPLICATION
REMOTE
MANAGEMENT
CLOUD
STORAGE
• File & Bare Metal
Roaming Laptops
• RAID 0/1/5/6
• AES-256 + SSL
• Administration
• Pay as you go
• Exchange, SQL,
AD Recovery
• NFS, CIFS,
AFP, FTP, rsync
• De-duplicated
• Partner dashboard
• Secure & redundant
• Bandwidth controlled
• Central logging
• Thin snapshots
• Automated
• AD Integration
• Incremental
• Firmware updates
• Incremental
• Thin snapshots
• Flexible backend
options
• Compression
46. NASDAQ OMX and AWS
• History
• Options provided to NASDAQ OMX teams
• Evaluation of architectural options and
NASDAQ OMX technology selection
• Managing operations for cloud backup
• Security
48. Sample FinQloud Workflow – How Does It Work?
Customer data sets are ingested at Nasdaq hosted inbox via
secure file transfer
Nasdaq preprocesses data (e.g., trade data) at Nasdaq
Facilities
Split trade data records into chunks (about 1M records per
chunk)
Each file is encrypted with AES-256 / FIPS complaint system
Custom encryption is applied (e.g. per client/per day, random
initialization)
49. Sample FinQloud Workflow – How Does It Work?
A custom metadata header file is attached to encrypted chunk.
Metadata is signed via SHA-256
File chunks are uploaded to Amazon S3/R3 – each FinQloud
customer gets a new AWS account, a new bucket
WORM or regular Amazon S3 buckets can be utilized
Search and retrieve functionality is performed by Amazon Elastic
MapReduce (AWS-managed Hadoop) for performance
Each customer gets an assigned Amazon Virtual Private Cloud (VPC)
Amazon EMR request key files from Nasdaq hosted host security files
50. Sample FinQloud Workflow – How Does It Work?
For cloud-based processing (e.g. reporting) trade data chunks,
and files are decrypted in the memory with Amazon EMR
Data is never in clear-text in transit or at rest
Once the jobs are completed, data sets are re-encrypted again
and either written to Amazon S3/R3 or shipped back to
Nasdaq
Results data-sets can be decrypted at Nasdaq facilities via
HSM hosted keys or customer can integrate their PGP keys
for asymmetric encryption and download the results
51. Selection of Enterprise Storage Workload
Type for First Cloud Project
• Cloud first strategy
• Tier 3 storage vs backup workloads
• Selection of backup as the first use case
– Greater control of the implementation / outcome
– Less risk as it was backup data vs production data
• Backup technology options
52. Selection of Architecture and Technology
• Why Riverbed Whitewater was chosen
–
–
–
–
–
–
–
Ease of deployment
Strong vendor support model
Good integration and compatibility
On-premises cache repository for backup platform
Inline dedupe, compression
Data encryption at appliance before data transfer
Listed company
54. How NASDAQ OMX Runs Storage
Operations for the Cloud
• No major changes since Netbackup is integrated
with Riverbed Whitewater
• Riverbed appliance looks like a standard disk
pool to the media server
• NBU policies altered to make us of RB disk pool,
which in turn sends data into Amazon S3
55. How Does Security and Isolation Work?
AWS standard security +
– Data is always encrypted at transit and at rest
– Keys are stored at Nasdaq facilities
– Nasdaq InfoSec department performed security
review and provided sign off of security
measures.
56. Integration Points
• All AWS services+
–
–
–
–
–
End-to-end isolation
End-to-end encryption
Separation of duties by hybrid security
Patented worm: R3
Dynamic key management
58. Why Tier-3 Cloud Storage Solution?
• NASDAQ OMX is planning to leverage scale, durability
and the cost advantages of cloud-based storage solutions
• In addition to backup and archive workloads, testing Tier3 storage on the cloud makes sense for NASDAQ OMX
due to the ratio of spend on Tier-3 storage (compared
with backup/archive workloads)
• There is a management initiative to leverage cloud
technologies at NASDAQ OMX
59. Why AWS Storage Gateway?
AWS Storage Gateway is a service that connects an onpremises software appliance with cloud-based storage to
provide seamless and secure integration between on-premises
IT environment and AWS' storage. AWS Storage Gateway is:
– Native AWS offering
– Scalable
– Cost effective
– Controllable from AWS Management Console
– Promising roadmap
60. Objective of Proof-of-Concept
• The objective of this Proof-of-Concept (PoC) is to provide a
high-level analysis and checklist of all elements and attributes
necessary to successfully implement a Tier-3 Cloud Storage
Gateway.
• The PoC is the initial step prior to undertaking a detailed system
design and implementation and is intended to function as a
prototype system. It is meant to demonstrate key technologies,
as well as provide an environment for experimentation and
evaluation. The design and implementation of a POC, while very
detailed and organized, does not serve as a replacement for a
complete system analysis and design.
63. PoC AWS Storage Gateway Requirements
VMware:
•
VMware ESXi Hypervisor (v 4.1 or v 5)
•
4 virtual processors assigned to the VM
•
7.5 GB of RAM assigned to the VM
•
75 GB of disk space for .ova installation and system data
External Connectivity:
•
Ports 80 and 443 are used by the vSphere client to communicate to the ESXi host.
•
Port 80 is used when you activate your gateway from the AWS Storage Gateway console.
•
Port 3260 is the default port that your application server uses to connect to iSCSI targets.
64. PoC On Premises Hardware Requirements
VMware:
• VMware ESXi Hypervisor server x 2 (existing servers can be used)
Ethernet NICs:
• Existing NICs can be leveraged; dual NIC tests are recommended
Ethernet Switch:
• Existing network switches can be leveraged; isolation and
bandwidth allocation recommended
Connectivity:
• Existing AWS connectivity or Internet connections can be leveraged
65. Provision 2 VMware physical servers (hosts)
Download the AWS Storage Gateway software at
http://console.aws.amazon.com/storagegateway
Allocate on-premises storage for active data
Activate gateway and select an AWS region
Create and mount iSCSI volumes
Provision Ethernet Cards and Network Infrastructure
Test Primary Storage Access over iSCSI on new volumes
Configure volume management to copy data sets from
existing volumes to new volumes
66. Sizing On-Prem Storage
Upload Buffer
Estimate the approximate data plan to write on a daily basis.
It is recommended to allocate at least 150 GBs.
67. Sizing On-Prem Storage
Backup use cases: at least the size of upload buffer
File share use cases: 20% of current storage
Cache Storage
68. Additional Configuration Considerations
• Solaris iSCSI initiators
• Cache storage is durable store
• Allocate separate disks for cache storage and
upload buffer
• Quick format vs. full format of drives
• Virus scanning
69. PoC Schedule
• By January 31 this PoC should be completed.
• This POC is not an open-ended project. It is a limited
implementation for a fixed period of time. The project
duration will be a direct result of the project objectives.
• This duration includes project time necessary to plan,
design, and implement the POC system.
• Project management teams should actively work to
control scope.
70. PoC Resources and Responsibilities
Amazon Web Services
•
High-level design
•
AWS Storage Gateway installation and configuration
•
AWS cloud components configuration
•
Assistance in iSCSI and VMware configurations
•
Provide test plans
•
Assistance is test execution
•
Delivery of final analysis
NASDAQ OMX
•
On-premises hardware installation (NICs, hosts)
•
VMware installation and configuration
•
On-premises network configuration (Switches, VLANs etc.)
•
Providing test targets, assistance in test plans
•
Execute tests
•
Assistance in final analysis
71. Limitations of PoC
• Virtualization hardware (VMware physical host)
and network systems are not for production
• Hardware and network limitations that are not
critical for success will not be addressed
• Compromises will be made to accommodate the
smaller scale of the implementation