AWS Cloud Resilience
Cloud resilience refers to the ability of an application to resist or recover from disruptions, including those related to infrastructure, dependent services, misconfigurations, transient network issues, and load spikes. Cloud resilience also plays a critical role in an organization’s broader business resilience strategy, including the ability to meet digital sovereignty requirements.
Resilient applications are built for high availability (the percentage of time the application is available for use) and are backed by a disaster recovery or continuity of operations plan.
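To make the availability definition concrete, here is a minimal sketch in Python that converts an availability target into a downtime budget and computes measured availability from observed uptime. The function names and the 365-day default period are illustrative assumptions, not part of any AWS service or SDK.

```python
def allowed_downtime_minutes(availability_percent: float, period_days: float = 365.0) -> float:
    """Return the downtime budget, in minutes, implied by an availability target.

    Both the function name and the default period are hypothetical choices
    made for this illustration.
    """
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability_percent must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    return total_minutes * (1.0 - availability_percent / 100.0)


def measured_availability_percent(uptime_minutes: float, total_minutes: float) -> float:
    """Return availability as the percentage of time the application was usable."""
    if total_minutes <= 0:
        raise ValueError("total_minutes must be positive")
    if not 0.0 <= uptime_minutes <= total_minutes:
        raise ValueError("uptime_minutes must be between 0 and total_minutes")
    return 100.0 * uptime_minutes / total_minutes


if __name__ == "__main__":
    # Print the yearly downtime budget for a few common targets.
    for target in (99.0, 99.9, 99.99):
        budget = allowed_downtime_minutes(target)
        print(f"{target}% availability allows about {budget:.1f} minutes of downtime per year")
```

For example, a 99.9% target over a 365-day year leaves roughly 525.6 minutes (about 8.76 hours) of downtime, which is why each additional "nine" tightens the budget by a factor of ten.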
Millions of customers trust that AWS is the right place to build and run their business and mission-critical applications with high availability.
AWS has made significant investments in building and running the world’s most resilient cloud. We have designed a unique and highly available global infrastructure, built safeguards into our service design and deployment mechanisms, and instilled resilience into our operational culture. AWS also makes it easier for you to build and run resilient applications in the cloud, with a comprehensive set of purpose-built resilience services, solutions, architectural best practices, and guidance.
Benefits
Highest network availability
AWS delivers the highest network availability of any cloud provider and is the only cloud provider to offer three or more Availability Zones (AZs) in all Regions, providing more redundancy and better isolation to contain issues.
Comprehensive resilience services and guidance
AWS makes it easier for customers to design, build, and run highly available applications through its comprehensive portfolio of purpose-built resilience services, integrated resilience features, and expert guidance.
Unparalleled operational expertise
AWS has over 17 years of proven operational expertise and unmatched scale helping millions of customers in regulated and non-regulated industries meet their resilience requirements.
Use Cases
Designing and Building
Leverage the best practices in the Reliability and Operational Excellence Pillars from the AWS Well-Architected Framework to build resilient applications.
Evaluating and Testing
Continuously measure and test your workload performance against your resilience goals with AWS Resilience Hub and AWS Fault Injection Service.
Monitoring and Observability
Use monitoring and observability services such as Amazon CloudWatch to quickly detect, investigate, and remediate issues impacting your applications.
Failover and Failback
Use Amazon Application Recovery Controller, AWS Elastic Disaster Recovery, and AWS Backup to ensure your applications recover quickly.
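How quickly an application recovers is commonly expressed as a recovery time objective (RTO) and a recovery point objective (RPO). The following Python sketch shows one way to score a failover drill against those two targets. It is plain Python rather than an AWS API: the class names, field names, and timestamps are hypothetical and would in practice come from your own drill records.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RecoveryObjectives:
    """Hypothetical container for the targets a workload must meet."""
    rto: timedelta  # maximum tolerable time to restore service
    rpo: timedelta  # maximum tolerable window of lost data


@dataclass
class RecoveryDrill:
    """Hypothetical record of one failover drill or real incident."""
    outage_start: datetime
    service_restored: datetime
    last_recovery_point: datetime  # most recent backup or replica state before the outage


def evaluate_drill(drill: RecoveryDrill, objectives: RecoveryObjectives) -> dict:
    """Compare observed recovery time and data-loss window with the objectives."""
    if drill.service_restored < drill.outage_start:
        raise ValueError("service_restored must not precede outage_start")
    if drill.last_recovery_point > drill.outage_start:
        raise ValueError("last_recovery_point must not follow outage_start")
    recovery_time = drill.service_restored - drill.outage_start
    data_loss_window = drill.outage_start - drill.last_recovery_point
    return {
        "recovery_time": recovery_time,
        "data_loss_window": data_loss_window,
        "meets_rto": recovery_time <= objectives.rto,
        "meets_rpo": data_loss_window <= objectives.rpo,
    }


if __name__ == "__main__":
    # Illustrative values only: a 20-minute recovery with 5 minutes of lost data.
    drill = RecoveryDrill(
        outage_start=datetime(2024, 1, 1, 12, 0),
        service_restored=datetime(2024, 1, 1, 12, 20),
        last_recovery_point=datetime(2024, 1, 1, 11, 55),
    )
    targets = RecoveryObjectives(rto=timedelta(minutes=30), rpo=timedelta(minutes=1))
    print(evaluate_drill(drill, targets))
```

Keeping the two checks separate is deliberate: a drill can restore service within the RTO and still lose more data than the RPO allows, and the two gaps usually call for different fixes.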
Featured Services and Solutions
AWS Resilience Hub
Define, test, and track the resilience of your applications to ensure you are able to meet your recovery objectives.
AWS Fault Injection Service
Improve application performance, observability, and resilience through controlled fault injection experiments.
AWS Elastic Disaster Recovery
Minimize downtime and data loss with fast, reliable recovery of on-premises and cloud-based applications.
AWS Backup
Protect data at scale using this cost-effective, fully managed, policy-based service.
Amazon Application Recovery Controller
Automate management and coordination of recovery for your applications across AWS AZs or Regions.
Amazon CloudWatch
Collect and visualize real-time logs, metrics, and event data in automated dashboards to streamline the maintenance of your infrastructure and applications.
AWS Well-Architected
Build and run resilient applications with architectural and operational best practices and measure improvement over time.
AWS Trusted Advisor
Improve resilience of your AWS resources with automated resilience best practice checks.
AWS Health
Monitor the health of your AWS resources and take the necessary actions.
AWS Solutions
Leverage pre-built AWS Solutions, Partner Solutions, and resilience guidance in the AWS Solutions Library.
Customers
“At Broadridge, we have critical systems that can’t afford to be down. We developed an ‘always on’ program using AWS services to ensure we were having near-zero recovery time objectives and recovery point objectives.”
-Todd Peterson, Vice President of Broadridge
Featured Content
Resilience Lifecycle Framework
A continuous approach to resilience improvement
Improving the resilience posture of an application is not a one-time effort; it is a continuous process that should be incorporated into how you build and operate your applications. This whitepaper shares strategies, services, and mechanisms you can use to drive continuous resilience into your organization.