© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Darryl S. Osborne
AWS - Storage Specialist Solutions Architect
July 27, 2017
Deep Dive on Amazon Elastic File System
SRV401
Bring everyone up to speed
Dive deep
Amazon Elastic File System (EFS)
Provides simple, scalable, highly available and durable file storage in the cloud
Petabyte-scale file system is distributed across an unconstrained number of storage servers in multiple Availability Zones (AZs)
Elastic capacity automatically grows and shrinks as you add and remove files
Amazon Elastic File System (EFS)
Standard file system interface and semantics
Shared storage
Highly available and highly durable
Consistent low latency
Strong read-after-write consistency
Elastic capacity
Fully managed
Do you need an EFS file system?
If you have an application running on EC2 or a use case
that requires a file system…
AND
• Requires multi-attach OR
• GBs/s throughput OR
• Multi-AZ availability/durability OR
• Requires automatic scaling (grow/shrink) of storage
What customers are using EFS for today
Web serving
Content management
Analytics
Media and entertainment workflows
Workflow management
Home directories
Container storage
Database backups
Shared file solutions in the cloud… before EFS
Third-party software
Do it yourself
Third-party hardware in AWS
Direct Connect locations
Do it yourself – NFS architecture
[Diagram: three groups, each with NFS clients, an NFS server, and two volumes]
Do it yourself – NFS architecture
• Launch, patch, monitor, & pay for EC2 instances
• Create, attach, monitor, & pay for provisioned EBS volumes
• Create, maintain, and monitor Auto Scaling group
• Install, patch, monitor, & pay for* file system software
• Configure, maintain, monitor, & pay for file system data intra/inter-Availability Zone replication
  • IOPS for replication are still IOPS
• Configure DNS for client HA access to inter-Availability Zone NFS fleet
Do it yourself
Do it yourself – NFS architecture
[Diagram: three groups, each with NFS clients, an NFS server, and two volumes]
Amazon EFS architecture
[Diagram: three groups of NFS clients, each with a mount target, sharing a single namespace]
Do it yourself – Cost
[Diagram: three groups, each with NFS clients, an NFS server, and two volumes]
Hands-on: Create an EFS file system (Console)
Resources for Amazon EFS
File System
• Mount Targets
• Subnet ID
• Security Groups
• Tags
• Key-value pairs
Resources for Amazon EFS
File system
• Regional construct
• Default throughput limit 3 GB/s (soft)
• Metered size updates approx. every hour
• Two performance modes (General Purpose and Max I/O)
• Accessible from EC2
  • VPC, EC2-Classic via ClassicLink
• Accessible from on premises
  • AWS Direct Connect
Resources for Amazon EFS
File System, cont.
• Scenarios for on premises via Direct Connect
Bursting
Migration
Tiering
Backup / DR
Resources for Amazon EFS
Mount Targets
• One or more per file system
• Create in a VPC subnet
• One per Availability Zone
• Must be in the same VPC
Resources for Amazon EFS
Subnet IDs
• Create mount target in one subnet per Availability Zone
• Mount target gets IP from subnet
• Automatic or static IP addresses
• IP addresses do not change
subnet-8d73b6e7
Resources for Amazon EFS
Security Groups
• Standard VPC Security Group
• Same VPC as subnet
• Up to five per mount target
• Allow inbound TCP port 2049 from NFS clients
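A quick way to confirm that the security group rule above is in effect is to try a TCP connection to the mount target on port 2049 from an NFS client. The following is a minimal sketch using only the Python standard library; the function name and the timeout are illustrative choices, not part of any AWS tooling, and a successful connection only shows the port is reachable, not that a mount will succeed.

```python
import socket

NFS_PORT = 2049  # TCP port the security group must allow inbound from NFS clients


def can_reach_nfs(host, port=NFS_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Call it with the mount target's IP address or DNS name, for example `can_reach_nfs("10.0.1.25")`, where the address is a placeholder for your own mount target.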
Resources for Amazon EFS
Tags
• Typical key-value pair
• Create & associate tag with file system
• Up to 50 tags per file system
Mount EFS
NFSv4.0
NFSv4.1
Linux Kernel 4+
Recommended mount options
-o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,async
Mount using NFSv4.1 (default options)
Specify 1 MB read/write buffers
Hard mount
Timeout of 60 seconds (600 tenths of a second)
2 minor timeouts & retransmissions before major timeout
Ensure operations are asynchronous
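To make the option string concrete, here is a small sketch that assembles a full mount command from the recommended options. The options are copied from the slide; the region, the mount point /mnt/efs, and the helper name are assumptions for illustration, and the DNS name follows the file-system-id.efs.region.amazonaws.com pattern. The sketch only builds and prints the command; it does not run it.

```python
import shlex

# Options as listed on the slide. None marks a flag-style option with no value.
MOUNT_OPTIONS = {
    "nfsvers": "4.1",
    "rsize": "1048576",   # 1 MB read buffer
    "wsize": "1048576",   # 1 MB write buffer
    "hard": None,
    "timeo": "600",       # tenths of a second, so 60 seconds
    "retrans": "2",
    "async": None,
}


def build_mount_command(fs_id, region, mount_point):
    """Return the mount command as an argument list (nothing is executed)."""
    opts = ",".join(
        key if value is None else f"{key}={value}"
        for key, value in MOUNT_OPTIONS.items()
    )
    dns_name = f"{fs_id}.efs.{region}.amazonaws.com"
    return ["sudo", "mount", "-t", "nfs4", "-o", opts, f"{dns_name}:/", mount_point]


if __name__ == "__main__":
    # Region and mount point are hypothetical; fs-123456ab is the example ID from the deck.
    print(shlex.join(build_mount_command("fs-123456ab", "us-west-2", "/mnt/efs")))
```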
Amazon EFS architecture
[Diagram: NFS clients and a mount target in a VPC subnet within an Availability Zone, connected to EFS file system fs-123456ab in Amazon EFS]
Amazon EFS architecture
[Diagram: NFS clients and a mount target in a VPC subnet within an Availability Zone, a second VPC subnet, and EFS file system fs-123456ab in Amazon EFS]
Amazon EFS architecture
[Diagram: NFS clients, two mount targets in two VPC subnets, and EFS file system fs-123456ab in Amazon EFS]
Security
Control network traffic using VPC security groups and network ACLs
Control file and directory access by using POSIX permissions
Control administrative access (API access) to file systems by using AWS Identity and Access Management (IAM) action-level and resource-level permissions
Amazon EFS is designed for a wide spectrum of performance needs
High throughput and parallel I/O
• Genomics
• Big data analytics
• Scale-out jobs
Low latency and serial I/O
• Home directories
• Content management
• Web serving
• Metadata-intensive jobs
Amazon EFS - distributed data storage design
• File systems distributed across unconstrained number of servers
• Avoids bottlenecks/constraints of traditional file servers
• Enables high levels of aggregate IOPS/throughput
• Data also distributed across Availability Zones (durability, availability)
[Diagram: multiple groups of EC2 instances]
How to think about EFS perf relative to EBS
Performance
• Per-operation latency
  Amazon EFS: Low, consistent
  Amazon EBS PIOPS: Lowest, consistent
• Throughput scale
  Amazon EFS: Multiple GBs per second
  Amazon EBS PIOPS: Single GB per second
Characteristics
• Data availability / durability
  Amazon EFS: Stored redundantly across multiple Availability Zones
  Amazon EBS PIOPS: Stored redundantly in a single Availability Zone
• Access
  Amazon EFS: 1 to 1000s of EC2 instances, from multiple Availability Zones, concurrently
  Amazon EBS PIOPS: Single EC2 instance in a single Availability Zone
• Use cases
  Amazon EFS: Big data and analytics, media processing workflows, content management, web serving, home directories
  Amazon EBS PIOPS: Boot volumes, transactional and NoSQL databases, data warehousing and ETL
Performance modes for different workloads
General purpose (default)
• What it is for: Latency-sensitive applications and general-purpose workloads
• Advantages: Lowest latencies for file operations
• Tradeoffs: Limit of 7K ops/sec
• When to use: Best choice for most workloads
Max I/O
• What it is for: Large-scale and data-heavy applications
• Advantages: Virtually unlimited ability to scale out throughput / IOPS
• Tradeoffs: Slightly higher latencies
• When to use: Consider for large scale-out workloads
EFS CloudWatch Metric - PercentIOLimit
Determine whether you’re being constrained by General Purpose
mode (PercentIOLimit at or near 100%)
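As a rough illustration of the check described above, the sketch below flags a file system as constrained when a large share of PercentIOLimit samples sit near 100%. The 95% threshold and the "half of the samples" rule are assumptions chosen for the example, not AWS guidance, and the samples are plain numbers that you would obtain from CloudWatch yourself.

```python
def is_io_constrained(percent_io_limit_samples, threshold=95.0, min_fraction=0.5):
    """Return True if at least min_fraction of samples are at or above threshold.

    threshold and min_fraction are illustrative defaults, not AWS-defined values.
    """
    samples = list(percent_io_limit_samples)
    if not samples:
        return False
    high = sum(1 for sample in samples if sample >= threshold)
    return high / len(samples) >= min_fraction
```

If this returns True for a General Purpose file system over a representative period, the workload may be a candidate for Max I/O mode.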
Burst Credit Model
Based on size of file system
Starts with 2.1 TiB burst credits
Min. burst throughput 100 MiB/s
Baseline throughput 50 MiB/s per TiB
Burst throughput 100 MiB/s per TiB
Burst Credit Model Examples
File System Size (GiB)   Baseline Aggregate Throughput (MiB/s)   Burst Aggregate Throughput (MiB/s)   Maximum Burst Duration (Min/Day)
10                       0.5                                     100                                  7.2
512                      25                                      100                                  360
1024                     50                                      100                                  720
4096                     200                                     400                                  720
16384                    800                                     1600                                 720
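The rows in the table can be reproduced from the three rules on the previous slide: baseline of 50 MiB/s per TiB, burst of 100 MiB/s per TiB, and a minimum burst of 100 MiB/s. The sketch below is a worked version of that arithmetic. The burst-duration formula (minutes per day multiplied by baseline divided by burst) is inferred from the table values rather than stated in the deck, so treat it as an assumption.

```python
BASELINE_MIB_S_PER_TIB = 50.0   # "Baseline throughput 50 MiB/s per TiB"
BURST_MIB_S_PER_TIB = 100.0     # "Burst throughput 100 MiB/s per TiB"
MIN_BURST_MIB_S = 100.0         # "Min. burst throughput 100 MiB/s"
MINUTES_PER_DAY = 24 * 60


def burst_profile(size_gib):
    """Return (baseline MiB/s, burst MiB/s, max burst minutes per day)."""
    size_tib = size_gib / 1024
    baseline = BASELINE_MIB_S_PER_TIB * size_tib
    burst = max(MIN_BURST_MIB_S, BURST_MIB_S_PER_TIB * size_tib)
    # Inferred: credits earned over a full day at the baseline rate,
    # spent at the burst rate.
    max_burst_minutes = MINUTES_PER_DAY * baseline / burst
    return baseline, burst, max_burst_minutes


if __name__ == "__main__":
    for size in (10, 512, 1024, 4096, 16384):
        baseline, burst, minutes = burst_profile(size)
        print(f"{size:>6} GiB  baseline={baseline:g}  burst={burst:g}  minutes/day={minutes:g}")
```

For the 10 GiB row the unrounded result is about 0.49 MiB/s and 7.0 minutes; the slide's 0.5 and 7.2 appear to come from rounding the baseline first.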
Burst Credit Model
Current throughput is above baseline… decreasing BurstCreditBalance
[Chart: throughput (MiB/s) over time, showing baseline and current throughput]
Burst Credit Model
Current throughput is below baseline… increasing BurstCreditBalance
[Chart: throughput (MiB/s) over time, showing baseline and current throughput]
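The two cases above can be turned into a rough estimate of how long a file system can sustain a given throughput. The sketch assumes a simplified model in which credits accrue at the baseline rate and are spent at the actual throughput, so the balance changes by (baseline minus current) each second; it ignores any cap on the balance. The starting balance is the 2.1 TiB figure from the deck, and the function name is illustrative.

```python
MIB_PER_TIB = 1024 * 1024
INITIAL_CREDITS_MIB = 2.1 * MIB_PER_TIB  # "Starts with 2.1 TiB burst credits"


def seconds_until_credits_exhausted(baseline_mib_s, current_mib_s,
                                    balance_mib=INITIAL_CREDITS_MIB):
    """Return seconds until the balance hits zero, or None if it never drains.

    Simplified model: balance changes by (baseline - current) MiB each second.
    """
    net_drain = current_mib_s - baseline_mib_s
    if net_drain <= 0:
        return None  # at or below baseline, the balance holds steady or grows
    return balance_mib / net_drain


if __name__ == "__main__":
    # A 1 TiB file system (50 MiB/s baseline) driven at its 100 MiB/s burst rate.
    seconds = seconds_until_credits_exhausted(50.0, 100.0)
    print(f"{seconds / 3600:.1f} hours of bursting from a full initial balance")
```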
I/O Size Implication
I/O size impacts throughput of serialized operations
[Chart: throughput versus I/O size (4 KB, 32 KB, 256 KB, 2 MB, 16 MB)]
Parallelize
Take advantage of the EFS distributed architecture
Parallelize via multiple threads and/or multiple instances
[Chart: Aggregate IOPS of parallel writes using 10 m4.xlarge instances; x-axis: # of total threads (0 to 160), y-axis: IOPS (0 to 30,000)]
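A small experiment along these lines can be sketched with a thread pool that writes many small files and reports writes per second as the thread count grows. This is an illustrative stand-in, not the benchmark behind the chart: the 4 KB payload, file counts, and thread counts are arbitrary, and a temporary local directory is used where you would point at an EFS mount point.

```python
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor


def write_small_file(directory, index, payload=b"x" * 4096):
    """Write one small file and return the number of bytes written."""
    path = os.path.join(directory, f"file-{index:06d}.dat")
    with open(path, "wb") as handle:
        handle.write(payload)
    return len(payload)


def parallel_write(directory, file_count, threads):
    """Write file_count files using a thread pool; return (bytes, writes per second)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        total_bytes = sum(pool.map(lambda i: write_small_file(directory, i),
                                   range(file_count)))
    elapsed = time.perf_counter() - start
    rate = file_count / elapsed if elapsed > 0 else float("inf")
    return total_bytes, rate


if __name__ == "__main__":
    # The temporary directory stands in for an EFS mount point (hypothetical).
    with tempfile.TemporaryDirectory() as target:
        for threads in (1, 4, 16):
            _, rate = parallel_write(target, 200, threads)
            print(f"{threads:>2} threads: {rate:,.0f} writes/s")
```

On local disk the numbers say little; the point of the sketch is the shape of the test, which you would repeat against the mounted file system and across several instances.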
Previous scalability test
Small files – 300 instances
Large files – 50 instances
Maximize EFS throughput
Not all EC2 instance types are created equal
• Select the appropriate EC2 instance type for the job
• Look at vCPU, memory, network performance, EBS-optimized, etc.
• Sample of EFS throughput of m4 instance family
• Max. throughput per EC2 instance 250 MB/s
Instance type   Network performance   EFS throughput
m4.large        moderate              ~60 MB/s
m4.xlarge       high                  ~95 MB/s
m4.2xlarge      high                  ~125 MB/s
m4.4xlarge      high                  ~210 MB/s
m4.10xlarge     10 gigabit            ~250 MB/s
m4.16xlarge     20 gigabit            ~250 MB/s
Maximize EFS throughput
Not all EBS volume types are created equal
• Select the appropriate EBS volume type for the job
• Sample of max throughputs of EBS volume types
• Max. throughput per EC2 instance 250 MB/s
Volume type   Max throughput
gp2           160 MiB/s
io1           320 MiB/s
st1           500 MiB/s
sc1           250 MiB/s
Maximize EFS throughput
Not all file transfer utilities are created equal
• Select the appropriate utility for the job
Maximize EFS throughput
Select the appropriate EC2 instance / EBS volume type
demo
Select the best transfer tool for the job
demo
Use multiple threads
demo
Use multiple instances – parallelize – scale-out
demo
Maximize EFS throughput
Not all file transfer utilities are created equal
• Select the appropriate utility for the job
Utility                       Threading         Rating
rsync                         single-threaded   Poor (very chatty)
cp                            single-threaded   Good
mcp                           multi-threaded    Better
fpsync                        multi-threaded    Better
cp + GNU parallel             multi-threaded    Better
fpart + cpio + GNU parallel   multi-threaded    Best
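To show what "multi-threaded" means for a copy job, here is a minimal sketch that walks a source tree and copies files concurrently with a thread pool. It is not one of the tools in the table and makes no claim to match their performance; the function names and the default of 16 threads are illustrative assumptions.

```python
import os
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor


def iter_files(src_root):
    """Yield the path of every regular file under src_root."""
    for dirpath, _dirnames, filenames in os.walk(src_root):
        for name in filenames:
            yield os.path.join(dirpath, name)


def copy_one(src_path, src_root, dst_root):
    """Copy one file, recreating its relative directory under dst_root."""
    rel = os.path.relpath(src_path, src_root)
    dst_path = os.path.join(dst_root, rel)
    os.makedirs(os.path.dirname(dst_path), exist_ok=True)
    shutil.copy2(src_path, dst_path)  # copies data and metadata
    return dst_path


def parallel_copy_tree(src_root, dst_root, threads=16):
    """Copy all files under src_root to dst_root using a thread pool."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(lambda p: copy_one(p, src_root, dst_root),
                             iter_files(src_root)))


if __name__ == "__main__":
    # Temporary directories stand in for a source volume and an EFS mount point.
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
        os.makedirs(os.path.join(src, "a", "b"))
        with open(os.path.join(src, "a", "b", "hello.txt"), "w") as handle:
            handle.write("hello")
        print(parallel_copy_tree(src, dst, threads=4))
```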
Tools & Citations
• GNU Parallel - The Command-Line Power Tool
  http://www.gnu.org/s/parallel
  Author: Ole Tange
• fpart – sort file trees & pack them into partitions
• fpsync – wraps fpart & rsync
  https://github.com/martymac/fpart
  Author: Ganaël Laplanche
• mutil | mcp – multi-threaded drop-in replacement of cp
  https://github.com/pkolano/mutil
  Author: Paul Kolano (NASA)
EFS CloudWatch Metrics
• DataReadIOBytes
• DataWriteIOBytes
• MetaDataIOBytes
• TotalIOBytes
• BurstCreditBalance
• PermittedThroughput
• ClientConnections
• PercentIOLimit
EFS economics
No minimum commitments or up-front fees
No need to provision storage in advance
No other fees, charges, or billing dimensions
Price: $0.30/GB-Month (US Regions)
$0.33/GB-Month (EU Ireland)
$0.36/GB-Month (EU Frankfurt)
$0.36/GB-Month (AP Sydney)
EFS TCO example
Let’s say you need to store ~500 GB and require high availability and durability
Using a shared file layer on top of EBS, you might provision 600 GB (with ~85% utilization)
and fully replicate the data to a second Availability Zone for availability/durability
Example comparative cost:
Storage (2x 600 GB EBS gp2 volumes): $120 per month
Compute (2x m4.xlarge instances): $350 per month
Inter-AZ data transfer costs (est.): $129 per month
Total $599 per month
EFS cost is (500GB * $0.30/GB-month) = $150 per month, with no additional charges
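The comparison above is simple arithmetic, restated here as a short worked sketch. All dollar figures are the example values from the slide; the variable and function names are illustrative.

```python
# Monthly do-it-yourself costs from the example (USD).
DIY_COSTS = {
    "storage (2x 600 GB EBS gp2 volumes)": 120,
    "compute (2x m4.xlarge instances)": 350,
    "inter-AZ data transfer (est.)": 129,
}

EFS_PRICE_PER_GB_MONTH = 0.30  # US Regions price quoted in the deck
STORED_GB = 500


def diy_monthly_cost():
    """Sum the do-it-yourself line items."""
    return sum(DIY_COSTS.values())


def efs_monthly_cost(stored_gb=STORED_GB, price=EFS_PRICE_PER_GB_MONTH):
    """EFS bills only for the data actually stored."""
    return stored_gb * price


if __name__ == "__main__":
    diy = diy_monthly_cost()
    efs = efs_monthly_cost()
    print(f"Do it yourself: ${diy} per month")
    print(f"Amazon EFS:     ${efs:.0f} per month")
    print(f"Ratio:          {diy / efs:.1f}x")
```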
Where is EFS available today?
• US West (Oregon)
• US East (N. Virginia)
• US East (Ohio)
• EU (Ireland)
• Asia Pacific (Sydney)
• EU (Frankfurt)
More coming soon!
Key recommendations
• Test your application!
• Use General Purpose mode for lowest latency, Max I/O mode for scale-out
• Use Linux kernel version 4.0 or newer, mount via NFSv4.1
• To optimize, look for opportunities to:
• Aggregate I/O
• Perform async operations
• Parallelize
• Cache
• Don’t forget to check your burst credit earn/spend rate when testing – ensure sufficient amount of storage
Key recommendations
• When accessing EFS, know the perf characteristics:
• Source
• Network
• Destination
• Not all EC2 instance types are created equal
• Not all EBS volume types are created equal
• Not all file transfer utilities are created equal
• Test, test, test !!!
Reference
AWS Loft EFS Hands-on Walk-through - https://bit.ly/awsloft2017
AWS 10-minute Tutorials - https://aws.amazon.com/getting-started/tutorials/
Amazon EFS Web page - https://aws.amazon.com/efs/
YouTube AWS Channel - https://www.youtube.com/user/AmazonWebServices
Reference Architecture - https://aws.amazon.com/architecture/
QuickStarts - https://aws.amazon.com/architecture/
qwikLABS - https://aws.qwiklabs.com/
Thank you!
Darryl Osborne
darrylo@amazon.com
