In this session we review Amazon EFS and how it delivers fully managed, petabyte-scale file storage for Amazon EC2 instances. Large scale and consistent performance make Amazon EFS ideal for web and content serving, enterprise applications, media processing, container storage, and big data analytics use cases. Session attendees will learn how to identify appropriate applications for use with Amazon EFS, understand performance details and security models, and hear how established customers are using it in production. The target audience is file system administrators, application developers, and application owners who operate or build file-based applications that require consistent latencies at cloud scale. Learn More: https://aws.amazon.com/government-education/
2. What to expect from this session
Recognize why and when to use Amazon EFS
Understand key technical/security concepts
Learn how to leverage EFS’s performance
Review EFS’s economics
4. How EFS fits into the AWS storage platform
File: Amazon EFS
Block: Amazon EBS (persistent), Amazon EC2 Instance Store (ephemeral)
Object: Amazon S3, Amazon Glacier
Data Transfer: Direct Connect, Snowball, 3rd Party Connectors, Transfer Acceleration, Storage Gateway, Kinesis Firehose
5. We focused on changing the game
Simple, Elastic, Scalable
Highly durable
Highly available
6. Amazon EFS is Simple
• Fully managed
- No hardware, network, file layer
- Create a scalable file system in seconds!
• Seamless integration with existing tools and apps
- NFS v4.1—widespread, open
- Standard file system access semantics
- Works with standard OS file system APIs
• Simple pricing = simple forecasting
7. Amazon EFS is Elastic
• File systems grow and shrink automatically as you add and remove files
• No need to provision storage capacity or performance
• You pay only for the storage space you use, with no minimum fee
8. Amazon EFS is Scalable
• File systems can grow to petabytes of capacity
• Throughput scales automatically as file systems grow
• Consistent low latencies regardless of file system size
• Support for thousands of concurrent NFS connections
9. Highly Durable and Highly Available (Multi-AZ)
• Every file system object is redundantly stored across multiple Availability Zones in a Region
• Designed to sustain Availability Zone offline conditions
• Superior to traditional NAS availability models
• Appropriate for production/tier 0 applications
10. How to think about EFS relative to EBS
Comparison: Amazon EFS vs. Amazon EBS PIOPS
Performance
• Per-operation latency
- Amazon EFS: Low, consistent
- Amazon EBS PIOPS: Lowest, consistent
• Throughput scale
- Amazon EFS: Multiple GBs per second
- Amazon EBS PIOPS: Single GB per second
Characteristics
• Data availability / durability
- Amazon EFS: Stored redundantly across multiple AZs
- Amazon EBS PIOPS: Stored redundantly in a single AZ
• Access
- Amazon EFS: 1 to 1000s of EC2 instances, from multiple AZs, concurrently
- Amazon EBS PIOPS: Single EC2 instance in a single AZ
• Use cases
- Amazon EFS: Big Data and analytics, media processing workflows, content management, web serving, home directories
- Amazon EBS PIOPS: Boot volumes, transactional and NoSQL databases, data warehousing & ETL
11. Do you need an EFS file system?
If you have an application running on EC2 or use case that
requires a file system…
AND
• Requires multi-attach OR
• GBs/s throughput OR
• Multi-AZ availability/durability OR
• Requires automatic scaling (grow/shrink) of storage
12. Access your EFS file system via AWS Direct Connect
[Diagram: On-premises servers -> Direct Connect -> EFS in your Amazon VPC]
13. Direct Connect support addresses three of the scenarios
Bursting
Migration
Tiering
Backup / DR
14. What customers are using EFS for today
Web serving
Content management
Analytics
Media and Entertainment workflows
Workflow management
Home directories
Container storage
Database backups
15. Where is EFS available today?
• US West (Oregon)
• US East (N. Virginia)
• US East (Ohio)
• EU (Ireland)
• Asia Pacific (Sydney)
More coming soon!
17. EFS’s Design
[Diagram: a Region containing a VPC that spans Availability Zones 1, 2, and 3; EC2 instances across the Availability Zones access one file system]
Data can be accessed from any AZ in the Region while maintaining full consistency
18. What is a file system?
• The primary resource in EFS for storing files and directories
• Regional construct
• Limit of 10 file systems per account per Region (soft limit)
• Default throughput limit of 3 GB/s (soft limit)
• Accessible from EC2
- VPC, EC2-Classic via ClassicLink
• Accessible from on-premises
- AWS Direct Connect
19. What is a mount target?
• To access your file system within a VPC, you create mount targets in the VPC
• A mount target is an NFS endpoint that lives in your VPC
• A mount target has an IP address and a DNS name you use in your mount command
• A mount target is highly available
[Diagram: a Region with a VPC spanning Availability Zones 1, 2, and 3; EC2 instances in the VPC connect through a mount target]
21. Mount an EFS File System
Launch EC2 instance from EC2 Console
Connect to the instance
Make a directory
Mount EFS file system
mount -t nfs4 -o nfsvers=4.1 [file system DNS name]:/ /[user’s target directory]
Query disk usage and the table of mounted file systems
• df; df -hT; df -h -t nfs4; mount -t nfs4
22. Recommended kernel version and NFS mount options
Kernel version
• Use Linux kernel 4.0+ (e.g., Amazon Linux 2016.03.0, Ubuntu 15.10 or 16.04)
Mount options
• Mount via NFSv4.1
• Specify 1MB read/write buffers (“rsize”/“wsize”)
• Ensure operations are asynchronous
Recommend the following mount options:
-o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,async
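The kernel and mount-option recommendations can be combined in a small helper. This is a minimal sketch, not an official tool: the function names `efs_kernel_ok` and `efs_mount_cmd` are hypothetical, and the DNS name and directory in the usage note are placeholders. The mount helper only prints the command so it can be reviewed before being run as root.

```shell
# Recommended NFS mount options from the slide.
EFS_MOUNT_OPTS="nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,async"

# Succeed if a kernel release string (default: output of uname -r)
# has a major version of 4 or higher.
efs_kernel_ok() {
    release="${1:-$(uname -r)}"
    major="${release%%.*}"      # text before the first dot, e.g. "4" from "4.4.0"
    [ "$major" -ge 4 ]
}

# Print (do not run) the mount command for a mount target DNS name
# and a local target directory.
efs_mount_cmd() {
    dns_name="$1"
    mount_dir="$2"
    printf 'mount -t nfs4 -o %s %s:/ %s\n' "$EFS_MOUNT_OPTS" "$dns_name" "$mount_dir"
}
```

Usage, with placeholder values: `efs_kernel_ok && efs_mount_cmd "[file system DNS name]" /mnt/efs` prints the full command when the running kernel meets the recommendation.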
23. Resources for Amazon EFS
Tags
• Typical key-value pair
• Create & associate tag with file system
• Up to 50 tags per file system
24. Resources for Amazon EFS
Mount Targets
• One or more per file system
• Create in a VPC Subnet
• One per Availability Zone
• Must be in the same VPC
25. Resources for Amazon EFS
Security Groups
• Standard VPC Security Group
• Same VPC as subnet
• Up to five per mount target
• Allow inbound TCP port 2049 from NFS clients
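The inbound rule for TCP port 2049 could be added from the AWS CLI roughly as follows. This is a sketch under assumptions: the helper name `efs_allow_nfs` and both security group IDs are hypothetical, and it assumes the NFS clients share one security group that can be named as the traffic source.

```shell
# Set AWS=echo to print the command instead of calling the real CLI.
AWS="${AWS:-aws}"

# Allow inbound NFS (TCP 2049) on the mount target's security group,
# restricted to traffic from the NFS clients' security group.
efs_allow_nfs() {
    mount_target_sg="$1"   # hypothetical ID, e.g. sg-11111111
    client_sg="$2"         # hypothetical ID, e.g. sg-22222222
    $AWS ec2 authorize-security-group-ingress \
        --group-id "$mount_target_sg" \
        --protocol tcp \
        --port 2049 \
        --source-group "$client_sg"
}
```

Naming a source security group instead of a CIDR range keeps the rule valid as client instances are launched and terminated.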
26. Several security mechanisms
• Control network traffic to and from file systems (mount targets) by using VPC security groups and network ACLs
• Control file and directory access by using POSIX permissions
• Control administrative access (API access) to file systems by using AWS Identity and Access Management (IAM)
- EFS supports action-level and resource-level permissions
27. The AWS Management Console, CLI, and SDK each allow you to perform a variety of management tasks
Create a file system
Create and manage mount targets
Tag a file system
Delete a file system
View details on file systems in your AWS account
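As an illustration, the tasks listed above map onto `aws efs` subcommands roughly as follows. This is a sketch, not a complete workflow: the wrapper function names are hypothetical, every ID passed to them is a placeholder, and error handling is omitted.

```shell
# Set AWS=echo to print the commands instead of calling the real CLI.
AWS="${AWS:-aws}"

# Create a file system; the creation token is a caller-chosen idempotency string.
efs_create() {
    $AWS efs create-file-system --creation-token "$1" --performance-mode generalPurpose
}

# Create a mount target in one subnet, protected by one security group.
efs_add_mount_target() {
    $AWS efs create-mount-target --file-system-id "$1" --subnet-id "$2" --security-groups "$3"
}

# Tag a file system with a Name.
efs_tag_name() {
    $AWS efs tag-resource --resource-id "$1" --tags "Key=Name,Value=$2"
}

# View details on the file systems in the account.
efs_list() {
    $AWS efs describe-file-systems
}

# Delete a file system (its mount targets must be deleted first).
efs_delete() {
    $AWS efs delete-file-system --file-system-id "$1"
}
```

Running with `AWS=echo` set gives a dry run that shows each command line without touching the account.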
30. Amazon EFS is designed for a wide spectrum of performance needs
High throughput and parallel I/O
Low latency and serial I/O
Genomics
Big data analytics
Scale-out jobs
Home directories
Content management
Web serving
Metadata-intensive jobs
31. Amazon EFS has a distributed data storage design
[Diagram: several groups of EC2 instances accessing the file system]
• File systems distributed across an unconstrained number of servers
• Avoids bottlenecks/constraints of traditional file servers
• Enables high levels of aggregate IOPS/throughput
• Data also distributed across Availability Zones (durability, availability)
32. Choose the performance mode best suited to your workload
• General purpose (default)
- What’s it for? Latency-sensitive applications and general-purpose workloads
- Advantages: Lowest latencies for file operations
- Tradeoffs: Limit of 7,000 ops/sec
- When to use: Best choice for most workloads
• Max I/O
- What’s it for? Large-scale and data-heavy applications
- Advantages: Virtually unlimited ability to scale out throughput/IOPS
- Tradeoffs: Slightly higher latencies
- When to use: Consider if 10s (or more) instances access your file system concurrently
33. Use the PercentIOLimit CloudWatch metric to determine if you’re constrained by General Purpose mode
34. Burst Model
Based on size of file system
Starts with 2.1 TiB of burst credits
Minimum burst throughput: 100 MiB/s
Baseline throughput: 50 MiB/s per TiB
Burst throughput: 100 MiB/s per TiB
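The burst model figures can be turned into a quick estimate. This is a minimal sketch using only the rates on this slide: the function names are hypothetical, sizes are given in whole GiB, and results are truncated to whole MiB/s by integer arithmetic.

```shell
# Baseline throughput in MiB/s for a file system of the given size in GiB
# (50 MiB/s per TiB, 1 TiB = 1024 GiB).
efs_baseline_mibs() {
    echo $(( $1 * 50 / 1024 ))
}

# Burst throughput in MiB/s for a file system of the given size in GiB
# (100 MiB/s per TiB, never below the 100 MiB/s minimum).
efs_burst_mibs() {
    burst=$(( $1 * 100 / 1024 ))
    if [ "$burst" -lt 100 ]; then
        burst=100
    fi
    echo "$burst"
}
```

For example, a 10 TiB file system (`efs_baseline_mibs 10240` and `efs_burst_mibs 10240`) works out to a 500 MiB/s baseline and a 1000 MiB/s burst rate, while a 100 GiB file system still bursts at the 100 MiB/s minimum.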
39. How to take advantage of EFS’s distributed architecture: Parallelize
Parallelize via multiple threads and/or multiple instances
[Chart: Aggregate IOPS of parallel writes using 10 m4.xlarge instances; x-axis: # of total threads (0 to 160); y-axis: IOPS (0 to 30,000)]
40. Use CloudWatch for a number of views of file system performance
• DataReadIOBytes, DataWriteIOBytes, MetadataIOBytes, TotalIOBytes: Measure throughput (‘Sum’ of bytes divided by seconds in time period) or ops/sec (‘Data Samples’ divided by seconds in time period)
• BurstCreditBalance: Monitor your burst credit usage over time to ensure sufficient throughput capacity
• PermittedThroughput: Compare to actual throughput to determine whether you’re being constrained by the burst model
• ClientConnections: View the number of clients connected to your file system
• PercentIOLimit: Determine whether you’re being constrained by General Purpose mode (PercentIOLimit at or near 100%)
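The throughput and ops/sec calculations described for the I/O byte metrics can be sketched as below. The function names are hypothetical, the file system ID and timestamps are placeholders, and the sketch assumes the console's ‘Data Samples’ view corresponds to the `SampleCount` statistic in the CLI.

```shell
# Set AWS=echo to print the command instead of calling the real CLI.
AWS="${AWS:-aws}"

# Fetch Sum and SampleCount of TotalIOBytes for one file system.
# Arguments: file system ID, start time, end time, period in seconds.
efs_get_io_stats() {
    $AWS cloudwatch get-metric-statistics \
        --namespace AWS/EFS \
        --metric-name TotalIOBytes \
        --dimensions "Name=FileSystemId,Value=$1" \
        --start-time "$2" --end-time "$3" \
        --period "$4" \
        --statistics Sum SampleCount
}

# Average throughput in bytes/s: Sum of bytes divided by seconds in the period.
efs_throughput_bps() {
    echo $(( $1 / $2 ))
}

# Average ops/sec: number of data samples divided by seconds in the period.
efs_ops_per_sec() {
    echo $(( $1 / $2 ))
}
```

For instance, a Sum of 31457280000 bytes over a 300-second period is 104857600 bytes/s (100 MiB/s), and 2100000 samples over the same period is 7000 ops/sec.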
41. Transferring media assets to EFS
• Size ranges from a few GB to 100+ GB per file
• Data sources:
• Amazon S3
• Amazon EBS
42. Transferring many small files to EFS
• Size ranges from 64K to 256K
• Data sources:
• Amazon S3
• Amazon EBS
43. GNU parallel
• Tool for executing jobs in parallel
• Similar to xargs
• Replace loops in shell scripts
• GNU parallel makes sure output from the commands is the same output as you would get if you had run the commands sequentially
https://www.gnu.org/software/parallel/
For people who live life in the parallel lane
44. As with copying from within EC2, using a script based on the GNU parallel tool reduces transfer time
[Chart: Total time to copy 26,200 files vs. number of threads; x-axis: number of threads (0 to 18); y-axis: time (0 to 900)]
45. Use parallel threads – GNU parallel
# Run from the source directory; N_THREADS and DST_DIR must be set beforehand
# Create destination directory tree from source
find . -type d -print0 | parallel -j $N_THREADS -0 "mkdir -p ${DST_DIR}/{}" > /dev/null 2>&1
# Copy files (everything that is not a directory)
find . ! -type d -print0 | parallel -j $N_THREADS -0 "cp -f {} ${DST_DIR}/{}"
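Where GNU parallel is not installed, the same two-step copy can be approximated with `xargs -P`, which both GNU and BSD xargs provide. This is a swapped-in alternative rather than the method shown on the slide, and the function name `parallel_copy` is hypothetical.

```shell
# parallel_copy SRC_DIR DST_DIR [N_THREADS]
# Recreates SRC_DIR's directory tree under DST_DIR, then copies the files
# using up to N_THREADS concurrent cp processes (default 4).
parallel_copy() {
    src_dir="$1"
    n_threads="${3:-4}"
    mkdir -p "$2" || return 1
    # Resolve to an absolute path because the copy runs from inside src_dir.
    dst_dir="$(cd "$2" && pwd)" || return 1
    (
        cd "$src_dir" || exit 1
        # Create the destination directory tree from the source.
        find . -type d -print0 |
            xargs -0 -P "$n_threads" -I{} mkdir -p "$dst_dir/{}" || exit 1
        # Copy everything that is not a directory.
        find . ! -type d -print0 |
            xargs -0 -P "$n_threads" -I{} cp -f {} "$dst_dir/{}"
    )
}
```

Usage: `parallel_copy /data/source /mnt/efs/target 16`, where both paths are placeholders. Unlike GNU parallel, xargs does not group the output of concurrent commands, which matters little here because `cp` and `mkdir` are silent on success.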
47. Summary / tl;dr
• Parallelize everything
• Threads
• Instances
• Test, test, test
• Capture & analyze test data
• Check your burst credit earn/spend rate when testing – ensure sufficient amount of storage
• Less than $5/hr for 300 instances
49. Operating your own multi-attach file storage on the cloud is complex and expensive
Use an NFS server or shared file layer
• Complex to set up and maintain
• Scale challenges
• HA challenges
• Costly (compute + storage)
Replicate EBS volumes (1 per EC2 instance)
• Substantial management overhead (sync data, provision and manage volumes)
• Costly (one volume per instance)
50. Do It Yourself – Cost and Complexity
[Diagram: three separate stacks, each with NFS clients connected to an NFS server backed by two volumes]
51. EFS TCO example
Let’s say you need to store ~500 GB and require high availability and durability
Using a shared file layer on top of EBS, you might provision 600 GB (with ~85% utilization)
and fully replicate the data to a second Availability Zone for availability/durability
Example comparative cost:
Storage (2x 600 GB EBS gp2 volumes): $120 per month
Compute (2x m4.xlarge instances): $350 per month
Inter-AZ data transfer costs (est.): $129 per month
Total $599 per month
EFS cost is (500GB * $0.30/GB-month) = $150 per month, with no additional charges
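The arithmetic behind this comparison can be checked with a few lines of shell. This is a sketch using only the figures quoted in the example: the function names are hypothetical, and the EFS price is passed in cents per GB-month so the calculation stays in integer arithmetic.

```shell
# Monthly cost of the do-it-yourself option in dollars:
# storage + compute + inter-AZ data transfer.
diy_monthly_cost() {
    echo $(( $1 + $2 + $3 ))
}

# Monthly EFS cost in dollars: stored GB times price in cents per GB-month.
efs_monthly_cost() {
    echo $(( $1 * $2 / 100 ))
}

# Difference between the two options, in dollars per month.
monthly_savings() {
    echo $(( $1 - $2 ))
}
```

With the example's numbers, `diy_monthly_cost 120 350 129` gives 599 and `efs_monthly_cost 500 30` gives 150, a difference of 449 dollars per month.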
52. EFS: Simple and Fully Managed
[Diagram: three groups of NFS clients, each connecting through its own mount target to a single namespace]
53. EFS Economics
No minimum commitments or up-front fees
No need to provision storage in advance
No other fees, charges, or billing dimensions
Price: $0.30/GB-Month (US Regions)
$0.33/GB-Month (EU Ireland)
$0.36/GB-Month (AP Sydney)