This document provides an overview and summary of Amazon Web Services (AWS) file storage options, including Amazon Elastic File System (Amazon EFS) and Amazon FSx for Windows File Server. It discusses the key features and use cases of each service. It also provides guidance on choosing the right file storage solution based on an application's needs and examples of how customers can use AWS file storage for specific workloads.
2. Agenda
Introduction to AWS file storage
What are Amazon Elastic File System (Amazon EFS) and Amazon FSx for Windows File Server?
Key features of Amazon EFS and Amazon FSx
Deep dive on Amazon EFS and Amazon FSx
3. Your digital transformation is a journey
AWS meets you where you are today—and tomorrow
Security
Cost optimization
Performance
Availability
Modernization
Edge
Data services
Data lake
Analytics
Vertical solutions
AI/ML
Real time
Business: Maximize your business results
Architecture: Increase agility and ability to innovate
Infrastructure: Improve fundamentals: security, availability, performance, and cost
4. Fully managed cloud file systems
AWS provides file system options that help you easily address the diverse needs of your file-based applications and workloads
File systems for business workloads
Amazon EFS (Linux-based workloads): Fully managed cloud-native file system for Linux-based applications
Amazon FSx for Windows File Server (Windows-based workloads): Fully managed file storage for Windows
File system for compute-intensive workloads
Amazon FSx for Lustre (compute-intensive workloads): Fully managed Lustre file system for compute-intensive workloads
5. What “fully managed” means
What you no longer need to do
Manage hardware
Procure and purchase hardware
Set up storage servers and volumes
Detect and address hardware failures
Invest capital expenditure (capex)
Manage software
6. Amazon EFS: Network file system (NFS) evolved
Amazon EFS is a fully managed file system that is…
Cloud native
Highly reliable
Cost optimized
10. Use cases for Amazon EFS
Scale-out jobs: High-throughput and parallel I/O
Metadata-intensive jobs: Low-latency and serial I/O
Analytics
Media workflows
Lift-and-shift enterprise applications
Web serving
Content management
Database backups
Home directories
Container storage
Application test and development
11. What is Amazon FSx for Windows File Server?
Broadly accessible
Fully managed
Windows file storage
12. Amazon FSx for Windows File Server use cases
Home directories
Line-of-business applications
Web serving and content management
Software development environments
NEW!
High availability SQL Server databases
Backup and disaster recovery
NAS lift-and-shift
NEW!
14. New: Amazon ECS and AWS Fargate support for Amazon EFS
Simple: All Amazon EFS configuration is inside the Amazon ECS task definition, and connectivity is handled behind the scenes
Serverless: AWS Fargate tasks can now leverage shared persistent storage
Secure: Access to file systems can be authorized by IAM, and access to data can be controlled by Amazon EFS access points
16. Amazon EFS IA: Storage class for infrequently accessed files for an effective price as low as $0.08/GB per month*
Amazon EFS Infrequent Access (IA)
Automated lifecycle management
Cost savings up to 92%
No changes to existing applications using Amazon EFS
*Pricing in the US East (N. Virginia) Region
17. Enabling Amazon EFS lifecycle management
Enable lifecycle management, choose lifecycle policy
Infrequently accessed files are automatically moved to Amazon EFS IA
All Amazon EFS features are supported with Amazon EFS IA
Lifecycle policies can be configured to 7, 14, 30, 60, or 90 days since last access
18. Restricting EFS access using an IAM resource policy
NFS client
Stunnel process
Kernel NFS client
EFS mount helper (orchestrates)
Amazon EFS (file system)
IAM
IAM credentials
{
  "Version": "2012-10-17",
  "Statement": {
    "Effect": "Allow",
    "Action": "elasticfilesystem:Client*",
    "Principal": { "AWS": "myrole" }
  }
}
19. FSx for Windows File Server deployment options
Single-AZ
Replicates data within an Availability Zone
Continually monitors and addresses hardware failures
Multi-AZ
Replicates data across Availability Zones
Automatically fails over across Availability Zones
Replicates data within an Availability Zone
Continually monitors and addresses hardware failures
20. FSx for Windows File Server storage options
SSD: Highest performance
HDD: Lowest cost
Flexibility to choose throughput independent of file system size
21. Effective storage cost with data deduplication
[Chart: effective storage cost per GB-month for HDD-based and SSD-based storage, Single-AZ and Multi-AZ]
Typical savings from deduplication for general file shares is 50%–60%
22. Example TCO
Storage requirements
• 10 TB of storage
• With deduplication, 50% of storage needed
• Deployment type: Multi-AZ
• Storage type: HDD
Throughput requirements
• 16 MB/s sustained, 100 MB/s burst
Backup requirements
• Expected backup storage usage: 1x storage capacity
File system component: Total cost
Storage (Multi-AZ, HDD, 5 TB at $0.025/GB-mo): $128
Throughput capacity (16 MB/s at $4.50/MBps-mo): $72
Total cost (excluding backups): $200/month (or $0.02/GB-mo)
Backups (5 TB at $0.05/GB-mo): $256
Total cost (including backups): $456/month (or $0.04/GB-mo)
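The arithmetic behind this example can be reproduced with a short sketch. The prices are the ones quoted on the slide. The function name and parameters are illustrative only. Two details are inferred from the slide's numbers rather than stated on it: 1 TB is treated as 1,024 GB, and the per-GB figures are spread over the full 10 TB of data before deduplication.

```python
GB_PER_TB = 1024  # inferred: 5 TB at $0.025/GB-mo only gives $128 with 1 TB = 1,024 GB


def fsx_monthly_cost(raw_tb, dedup_savings, storage_price, throughput_mbps,
                     throughput_price, backup_price, backup_ratio=1.0):
    """Return the monthly cost components for the example file system."""
    raw_gb = raw_tb * GB_PER_TB
    stored_gb = raw_gb * (1 - dedup_savings)      # capacity needed after deduplication
    storage = stored_gb * storage_price
    throughput = throughput_mbps * throughput_price
    backups = stored_gb * backup_ratio * backup_price
    excluding = storage + throughput
    including = excluding + backups
    return {
        "storage": storage,
        "throughput": throughput,
        "backups": backups,
        "total_excluding_backups": excluding,
        "total_including_backups": including,
        # Per-GB figures are relative to the data before deduplication.
        "per_gb_excluding_backups": excluding / raw_gb,
        "per_gb_including_backups": including / raw_gb,
    }


costs = fsx_monthly_cost(raw_tb=10, dedup_savings=0.5, storage_price=0.025,
                         throughput_mbps=16, throughput_price=4.50,
                         backup_price=0.05)
for name, value in costs.items():
    print(f"{name}: ${value:.2f}")
```

Changing `dedup_savings` to the 30%–50% or 70%–80% ranges quoted for other share types shows how sensitive the total is to the deduplication ratio.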
23. Data deduplication
Large Windows-based datasets often contain significant duplication,
which increases storage costs
User shares (home directories)
Multiple users have many copies or versions of a file
Software development shares
Most portions of binaries remain unchanged from build to build
Use data deduplication to reduce costs associated with duplicated data
Scenario | Content | Typical space savings
User documents | Office documents, photos, music, and videos | 30%–50%
Software development shares | Software binaries, build files, and program symbols | 70%–80%
General file shares | Mix of the above | 50%–60%
25. Amazon EFS: High availability for containers
Examples: Jira, Artifactory, Git, Jupyter, JupyterHub
[Diagram: an Amazon EFS file system with an EFS mount target in each of two Availability Zones, both active]
https://aws.amazon.com/blogs/storage/best-practices-for-using-amazon-efs-for-container-storage/
26. Shared storage for NFS-based scale-out applications
Examples: Machine learning training (MXNet, TensorFlow), analytics, containerized applications
[Diagram: an Amazon EFS file system with an EFS mount target in each of two Availability Zones]
27. Scaling out storage & performance with DFSN for Windows workloads
[Diagram: within a Region and VPC, instances reach the namespace example.com\corp through namespace servers 1 and 2, which sit in separate subnets and Availability Zones. Folders A-F map to fs-0123456789.example.com, G-M to fs-5678901234.example.com, and N-Z to fs-9876543210.example.com, for 3x read/write performance.]
Demo: https://www.youtube.com/watch?v=s482kj_xMeE
28. Amazon FSx: Support for SQL Server HA deployments
• Supports SMB transparent failover (aka continuously available shares)
• Use Amazon FSx to store databases and logs for SQL Server Always On Failover Cluster Instance (FCI) deployments
• No need to deploy, manage, and pay license fees for storage replication software solutions
[Diagram: within a VPC in an AWS Region, SQL Server FCI primary and secondary nodes in Availability Zones 1 and 2 share fs-0123456789.example.com, with automatic failover]
https://docs.aws.amazon.com/fsx/latest/WindowsGuide/sql-server.html
29. AWS DataSync
Easily and efficiently transfer hundreds of terabytes and millions of files
Fast: Parallelized transfer, 10 Gbps per agent, scale-out with multiple agents
Automated: No scripts; does validation, filtering, throttling, scheduling
Flexible: Multiple protocols (NFS, SMB) and destinations (FSx for Windows File Server, Amazon EFS, Amazon S3)
One-time migrations or ongoing transfers
AWS integrated
Secure and compliant
30. Learn storage with AWS Training and Certification
Resources created by the experts at AWS to help you build cloud storage skills
45+ free digital courses cover topics related to cloud storage, including
• Amazon S3
• AWS Storage Gateway
• Amazon S3 Glacier
• Amazon EFS
• Amazon EBS
Classroom offerings, such as Architecting on AWS, feature AWS expert instructors and hands-on activities
Visit the storage learning path at https://aws.training/storage
Hi Everyone, Thank you so much for your time!
If you are interested in evaluating what it means to build a data lake on AWS, or on your way to building your analytics platform on AWS and just keen to explore some possible architectural patterns, and best practices, this session is for you.
My name is Kumar Nachiketa, a storage partner solutions architect at AWS. And today, I am going to discuss and demonstrate ways to build a data lake with Amazon S3 being a core component. I am super excited about sharing this with you.
How can you transform faster with AWS file storage solutions?
We see three common patterns on why customers choose AWS to transform their organization. In summary, customers are looking to transform their IT infrastructure, transform their application architectures, or transform their business results.
EFS:
Easily shared between multiple applications, instances, and on-premises servers simultaneously
Achieve petabyte scale from a distributed design that avoids the constraints imposed by traditional file servers
FSx for Windows:
Built on Windows Server with native support for Windows file system features you use today
SSD storage for high throughput, IOPS, and sub-millisecond latencies
FSx for Lustre:
Built on the highly popular, open source parallel file system Lustre
Process data at hundreds of GB of throughput per second, millions of IOPS, and sub-millisecond latencies. For scratch/short-lived or long-lived/self-healing workloads (recently introduced at re:Invent).
Probably something that almost everyone in the audience can relate to is that managing file servers is a lot of work.
Before the cloud, [Manage hardware]
Well, now we have the cloud. If you manage file servers on AWS, you don’t need to worry about the hardware piece, but you still need to manage the software…
Let’s start by talking about what Amazon Elastic File System (or Amazon EFS) is.
Amazon EFS is Network File System evolved.
It is a fully managed shared file system that is cloud native, highly reliable, and cost optimized.
Cloud native – simple, elastic, scalable, and integrated with cloud services like ECS, EKS, and SageMaker
Highly reliable – highly available and durable, secure, global footprint, 99.9% SLA
Cost optimized – no provisioning, no commitments, and built-in lifecycle management to optimize between an SSD-performance class and an infrequent access class that is 92% lower cost
Cloud Native
Grow & shrink on demand
No need to provision and manage infrastructure & capacity
Pay as you go, pay only for what you use
Shared access from on-premises, inter region, and cloud native applications
Integrated with various AWS computing models
Access concurrently from thousands of Amazon EC2 instances
Attach to containers launched by both Amazon ECS and EKS
Use with Amazon SageMaker notebooks
Highly Reliable/ Scalable
Stores data across three availability zones for high availability and durability
Amazon EFS is available in 19 regions
New regions recently added: Bahrain, Sao Paulo, Stockholm, Hong Kong
Grow up to petabytes
Performance modes for low latencies and maximum I/O
Throughput that scales with storage
Provisioned throughput available
Cost-optimized
And that’s exactly what we did. Amazon EFS Infrequent Access, or EFS IA, is a new storage class that can save customers up to 92% compared to the pre-existing EFS storage class, which we’re now calling EFS Standard.
Customers can use EFS IA to cost-effectively store larger amounts of data in their file systems, and expand their use of EFS to an ever wider set of applications.
Using EFS IA is super easy. It doesn’t require any changes to existing applications, as EFS provides a single file system namespace that transparently serves files from both storage classes – Standard and IA.
To get data into EFS IA, you enable the Lifecycle Management capability for your file system, which automatically moves files into IA. By doing so, you can save up to 92% compared to the EFS Standard storage class.
Note: EFS-IA can be enabled for any existing EFS file system simply by selecting the lifecycle management option in the EFS console
Before we dive deeper into these core tenets, let’s talk about why customers choose Amazon EFS and about its use cases.
We also designed EFS to serve the vast majority of file-based workloads, covering a wide spectrum of performance needs.
This spectrum runs from highly parallelized, scale-out jobs (those workloads that require the highest possible throughput, such as big data applications and media workflows) to single-threaded, latency-sensitive workloads, and everything in between.
So what is it?
FSx for Windows File Server provides fully managed, native Windows file systems…
…with a service that is deeply integrated with AWS
On April 8th we announced native integration between ECS and EFS, including Fargate. Compared to what you could do before, there are three main differences. First, using EFS with ECS is much simpler: you configure EFS inside your ECS task definition, and don’t worry at all about what happens under the hood. Next, for the first time you can use EFS from AWS Fargate, which means you now have persistence for serverless containers. Last, we’ve added new security capabilities, namely the ability to use IAM to authorize access to your file system based on your task role, and control access to data using EFS Access Points.
Now when you look at overall benefits of ECS and EFS working together, in addition to the simplicity, serverless, and security there are additional benefits.
Availability and Durability: Both EFS and ECS are regional services, which means they run across all availability zones in a region. With EFS, data is durable across at least 3 availability zones, and available from all zones. With ECS, clusters span as many AZs as you configure. The end result is your application can be scheduled across multiple AZs and share data as if they’re local to each other.
Elasticity: this should be of no surprise given that Elastic Container Service and Elastic File System have elastic in the name. This fact means that as your application needs to scale-out, this combination of services instantly provides additional compute and storage capacity. This means you always pay for what you use, and you don’t need to forecast usage, overprovision, or get slowed down when you run out of capacity.
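To make the "configure EFS inside your ECS task definition" point concrete, here is a minimal sketch of what such a task definition might look like when built as a Python dictionary. The field names follow the ECS task definition format (`efsVolumeConfiguration`, `authorizationConfig`, `mountPoints`). The file system ID, access point ID, image, family name, and container path are placeholders, and the helper functions are illustrative rather than part of any AWS SDK.

```python
def efs_volume(name, file_system_id, access_point_id=None):
    """Build the 'volumes' entry that points an ECS task at an EFS file system."""
    config = {
        "fileSystemId": file_system_id,
        # Encryption in transit must be enabled when IAM authorization is used.
        "transitEncryption": "ENABLED",
    }
    if access_point_id:
        config["authorizationConfig"] = {
            "accessPointId": access_point_id,
            "iam": "ENABLED",  # authorize using the task role
        }
    return {"name": name, "efsVolumeConfiguration": config}


def fargate_task_definition(family, image, volume, container_path):
    """Build a minimal Fargate task definition that mounts the given volume."""
    return {
        "family": family,
        "requiresCompatibilities": ["FARGATE"],
        "networkMode": "awsvpc",
        "cpu": "256",
        "memory": "512",
        "containerDefinitions": [{
            "name": "app",
            "image": image,
            "essential": True,
            "mountPoints": [{
                "sourceVolume": volume["name"],
                "containerPath": container_path,
                "readOnly": False,
            }],
        }],
        "volumes": [volume],
    }


# Placeholder identifiers; substitute your own resources.
volume = efs_volume("shared-data", "fs-0123456789", "fsap-0123456789abcdef0")
task_def = fargate_task_definition("efs-demo", "nginx:latest", volume, "/mnt/shared")
# With boto3 installed and credentials configured, this could be registered with:
#   boto3.client("ecs").register_task_definition(**task_def)
```

A real task would also need a task role (`taskRoleArn`) for the IAM authorization to have an identity to evaluate; it is left out here to keep the sketch short.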
[Our first file system, launched in 2016, was Amazon Elastic File System (EFS).
It was designed to provide a cloud-scale file system for the vast majority of Linux-based workloads. Today EFS serves hundreds of thousands of customers in all AWS regions.]
We built EFS to be cloud-scale (elastic), simple (set and forget), cost-effective, and performant.
On April 1 we announced that General Purpose mode file systems now support up to 35,000 read operations per second, a 400% increase from the previous limit of 7,000. Maximum write operations are unchanged at 7,000 per second. Link: https://aws.amazon.com/about-aws/whats-new/2020/04/amazon-elastic-file-system-announces-increase-in-read-operations-for-general-purpose-file-systems/
General Purpose mode (GP mode) is the default performance mode for Amazon EFS. It offers the lowest per-operation latency and is the recommended choice for most applications. Amazon EFS also offers the Max I/O performance mode, which can scale to higher levels of aggregate throughput and supports over 500,000 operations per second, with slightly higher metadata latencies than GP mode.
Amazon EFS IA: Storage class for infrequently accessed files for $0.025/GB per month*. Assuming 80% of your files are colder and stored in Amazon EFS IA, and 20% are in EFS Standard, that equates to a blended, or effective, price of $0.08/GB-month.
EFS IA -- 92% savings compared to EFS Standard
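The blended price and the savings figure can be checked with a few lines of arithmetic. The $0.025/GB-month IA price and the 80/20 split come from the notes above. The EFS Standard price of $0.30/GB-month is an assumption not stated in this deck (the US East list price at the time), so treat the result as a sketch.

```python
IA_PRICE = 0.025        # $/GB-month, from the notes above
STANDARD_PRICE = 0.30   # $/GB-month, assumed; not stated in this deck


def blended_price(ia_fraction, ia_price=IA_PRICE, standard_price=STANDARD_PRICE):
    """Effective $/GB-month when ia_fraction of the data sits in EFS IA."""
    return ia_fraction * ia_price + (1 - ia_fraction) * standard_price


def ia_savings(ia_price=IA_PRICE, standard_price=STANDARD_PRICE):
    """Fractional saving of the IA storage price relative to Standard."""
    return 1 - ia_price / standard_price


print(f"Blended price at 80% IA: ${blended_price(0.8):.2f}/GB-month")
print(f"IA savings vs Standard: {ia_savings():.0%}")
```

Under that assumed Standard price, the 80/20 split works out to 0.8 × $0.025 + 0.2 × $0.30 = $0.08, and the IA price is about 92% lower than Standard.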
OK, so let’s talk a little bit about how EFS IA works at a high level.
First, you create a file system. EFS file systems transparently serve data from both storage classes, providing a common file system namespace, so you don’t have to worry about which of your files are actively used and which are infrequently accessed.
Next, you enable EFS Lifecycle Management for your file system. With EFS Lifecycle Management enabled, any files that aren’t accessed for the period set by the lifecycle policy (7, 14, 30, 60, or 90 days) are automatically moved into the Infrequent Access storage class. By combining these two capabilities, the IA storage class and Lifecycle Management, we’re eliminating the need to manually manage your data to optimize for cost.
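Outside the console, the same setting can be applied through the EFS API. The sketch below builds the lifecycle policy payload and, in a separate function, passes it to the boto3 `put_lifecycle_configuration` call. The allowed periods are the ones listed on the lifecycle management slide; the helper names and the file system ID in the usage note are placeholders.

```python
ALLOWED_DAYS = (7, 14, 30, 60, 90)  # periods listed on the lifecycle management slide


def lifecycle_policies(days):
    """Build the LifecyclePolicies payload for a transition to IA after `days`."""
    if days not in ALLOWED_DAYS:
        raise ValueError(f"days must be one of {ALLOWED_DAYS}, got {days}")
    return [{"TransitionToIA": f"AFTER_{days}_DAYS"}]


def enable_lifecycle_management(file_system_id, days):
    """Apply the policy to a file system (needs boto3 and AWS credentials)."""
    import boto3  # imported here so the payload helper works without boto3

    efs = boto3.client("efs")
    return efs.put_lifecycle_configuration(
        FileSystemId=file_system_id,
        LifecyclePolicies=lifecycle_policies(days),
    )


print(lifecycle_policies(30))
```

For example, `enable_lifecycle_management("fs-0123456789", 30)` would ask EFS to move files that have not been accessed for 30 days into the IA storage class.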
Today you can use IAM identity policies to control what EFS administrative APIs a user has access to.
IAM Authorization for NFS Clients will allow customers to manage NFS client access and permissions for an EFS file system using AWS IAM. It adds an additional layer of security on top of EFS’s current network-based access controls and provides often-requested security features such as root squashing, read-only access, and the ability to enforce TLS.
With IAM authorization for NFS clients, we are adding three actions that allow you to control NFS client access to your file systems, adding ClientMount (permission to mount as read-only), ClientWrite (permission to read and write), and ClientRootAccess (permission to read and write as the root user).
For those times you need more granular policies we are adding file system resource policies. These policies make it easy to specify which exact users have read, write, or root access to a particular file system.
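As an illustration of how the three client actions combine in a file system resource policy, the sketch below assembles a policy document in Python. The action names are the ones listed above. The role and file system ARNs are placeholders, the helper is hypothetical, and conditioning the Allow statement on `aws:SecureTransport` is just one way to express a TLS requirement.

```python
import json


def efs_file_system_policy(role_arn, file_system_arn, allow_write=False,
                           allow_root=False, require_tls=True):
    """Build a resource policy granting NFS client access to one IAM role."""
    actions = ["elasticfilesystem:ClientMount"]                # read-only mount
    if allow_write:
        actions.append("elasticfilesystem:ClientWrite")        # read and write
    if allow_root:
        actions.append("elasticfilesystem:ClientRootAccess")   # no root squashing
    statement = {
        "Effect": "Allow",
        "Principal": {"AWS": role_arn},
        "Action": actions,
        "Resource": file_system_arn,
    }
    if require_tls:
        # Only grant access to connections that use encryption in transit.
        statement["Condition"] = {"Bool": {"aws:SecureTransport": "true"}}
    return {"Version": "2012-10-17", "Statement": [statement]}


# Placeholder ARNs for illustration only.
policy = efs_file_system_policy(
    role_arn="arn:aws:iam::111122223333:role/myrole",
    file_system_arn="arn:aws:elasticfilesystem:us-east-1:111122223333:file-system/fs-0123456789",
    allow_write=True,
)
print(json.dumps(policy, indent=2))
```

Leaving `allow_root` off means clients are not granted `ClientRootAccess`, which is how the root squashing behavior mentioned above is obtained. The resulting JSON string could be attached with the boto3 EFS client's `put_file_system_policy` call.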
On the one hand, today’s SSD devices provide more IOPS per dollar, more throughput per gigabyte, and lower latency than today’s HDD devices. On the other hand, continued density improvements in HDD technology drive the cost per gigabyte down, but also reduce the effective throughput per gigabyte.
And with HDD, we offer the lowest-cost Windows file storage in the cloud.
Two weeks ago we also announced data deduplication, which removes portions of files that are redundant across your dataset. Typical savings for general file shares are 50%–60%.
Callout: HDD at $0.025/GB-month for Multi-AZ is the lowest-cost Windows file storage in the cloud (even before data deduplication).
Works at the sub-file level
Uses post-processing optimization to minimize performance impact
Removes duplicated content and compresses common content
Use remote management PowerShell CLI on your file system to…
Enable/disable Data Deduplication
Customize schedule for deduplication jobs
Monitor how much savings you’re achieving with deduplication
Let’s take a deeper look into two key use cases for Amazon EFS, container storage and scale-out workloads, and at some of the latest features.
Stateful containers store data in shared storage
Scale container instances without reconfiguring the file system
No need to monitor or provision storage as your persistent storage grows
Supports rapid failover and event based provisioning
Proximity to other container and dev/ops services
For customers running analytics, EFS can be a great option. EFS is a POSIX-compliant file system, so applications that can access data over NFS can use EFS with no code modification required. For example, if you are running analytics against an NFS file store, whether on premises or in the cloud, your application is seamlessly portable to EFS. Additionally, there are many data sources feeding analytics applications that understand how to write to a file system. This may be lab equipment in the healthcare and life sciences world, or machine data in manufacturing. For these applications, having a common interface where data can be written, transformed if needed, and then analyzed provides ease of use and flexibility. And since EFS provides a scalable, decoupled datastore accessible from thousands of EC2 instances simultaneously, analytics jobs can be processed quickly and efficiently, giving businesses faster time to insights from their data.
FSx for Windows File Server supports file systems of up to 64 TB. However, you can scale out the throughput by using Microsoft's Distributed File System (DFS) Namespaces. You can use DFS Namespaces to group file shares on multiple file systems into one common folder structure (a namespace) that you use to access the entire file dataset. DFS Namespaces can help you to organize and unify access to your file shares across multiple file systems.
Also, DFS Namespaces help to scale file data storage beyond what each file system supports (64 TB) for large file datasets, up to hundreds of petabytes. You can scale performance up to tens of gigabytes per second of throughput, with millions of IOPS, across hundreds of petabytes of Windows-based file data.
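The partitioning idea on the DFSN slide, where folders A-F, G-M, and N-Z each live on a different file system, can be illustrated with a small routing function. In a real deployment DFS Namespaces performs this resolution itself through folder targets, so this is only a model of the mapping; the host names are the example names from the slide.

```python
# Letter ranges and example file system names as shown on the DFSN slide.
SHARDS = [
    ("A", "F", "fs-0123456789.example.com"),
    ("G", "M", "fs-5678901234.example.com"),
    ("N", "Z", "fs-9876543210.example.com"),
]


def target_for(folder_name):
    """Return the file system that holds a top-level folder, by first letter."""
    first = folder_name[:1].upper()
    for low, high, file_system in SHARDS:
        if low <= first <= high:
            return file_system
    raise ValueError(f"no shard covers folder {folder_name!r}")


for folder in ("accounting", "hr", "sales"):
    print(folder, "->", target_for(folder))
```

Because each range is served by its own file system, reads and writes to folders in different ranges do not compete for the same throughput capacity, which is where the slide's 3x read/write figure comes from.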
With Multi-AZ file systems, we now support a type of SMB share called a CA share. With CA shares…
AWS DataSync is a data transfer service that makes it easy for you to automate moving data between on-premises storage and Amazon S3, Amazon EFS, or Amazon FSx for Windows File Server. DataSync automatically handles many of the tasks related to data transfers that can slow down migrations or burden your IT operations, including running your own instances, handling encryption, managing scripts, network optimization, and data integrity validation.
Fast
You can use DataSync to transfer data at speeds up to 10 times faster than open source tools.
Automated
DataSync automates both the management of data transfer processes and the infrastructure required for high-performance, secure data transfer. The service also includes automatic encryption and data validation. All of this minimizes the in-house development and management otherwise needed for fast, reliable, and secure transfers.
One-time migrations or ongoing transfers
The service enables one-time data migrations, recurring data processing workflows, and automated replication for data protection and recovery.
Flexible
DataSync supports flexible configurations to suit your specific needs, including bandwidth throttling, copying source permissions and metadata.
AWS integrated
Getting started with DataSync is easy: Deploy the DataSync agent on-premises, connect it to a file system or storage array, select Amazon EFS, Amazon S3, or Amazon FSx for Windows File Server as your AWS storage, and start moving data.
Secure & compliant
All of your data is encrypted in transit and at rest, and for each transfer, the service performs integrity checks. These checks ensure that the data written to your destination matches the data read from your source, validating consistency. DataSync also makes sure that your data arrives securely, intact, and ready to use.
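To give a feel for what an integrity check between source and destination involves, here is a generic sketch that compares SHA-256 digests of files in two directory trees. It does not represent how DataSync implements its verification; it only illustrates the idea of confirming that what was written matches what was read. The function names are illustrative.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def mismatched_files(source_dir, dest_dir):
    """List relative paths that are missing or differ at the destination."""
    source_dir, dest_dir = Path(source_dir), Path(dest_dir)
    mismatches = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        relative = src.relative_to(source_dir)
        dst = dest_dir / relative
        if not dst.is_file() or file_digest(src) != file_digest(dst):
            mismatches.append(relative.as_posix())
    return mismatches
```

An empty list from `mismatched_files(source, destination)` means every source file has a byte-identical copy at the destination.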