● RDS (Relational Database Service) is a managed database service for databases that use SQL as a query language.
● It allows you to create databases in the cloud that are managed by AWS:
  ○ Postgres
  ○ MySQL
  ○ MariaDB
  ○ Oracle
  ○ Microsoft SQL Server
  ○ Aurora (AWS proprietary database)

Advantages of using RDS versus deploying a DB on EC2
● RDS is a managed service:
  ○ Automated provisioning, OS patching
  ○ Continuous backups and restore to a specific timestamp (Point in Time Restore)
  ○ Monitoring dashboards
  ○ Read replicas for improved read performance
  ○ Multi-AZ setup for DR (Disaster Recovery)
  ○ Maintenance windows for upgrades
  ○ Scaling capability (vertical and horizontal)
  ○ Storage backed by EBS (gp2 or io1)
● BUT you can’t SSH into your instances

RDS Solution Architecture - EC2

RDS – Storage Auto Scaling
● Helps you increase storage on your RDS DB instance dynamically
● When RDS detects you are running out of free database storage, it scales automatically
● Avoids having to scale your database storage manually
● You have to set a Maximum Storage Threshold (maximum limit for DB storage)
● Storage is modified automatically if:
  ○ Free storage is less than 10% of allocated storage
  ○ The low-storage condition lasts at least 5 minutes
  ○ 6 hours have passed since the last modification
● Useful for applications with unpredictable workloads
● Supports all RDS database engines (MariaDB, MySQL, PostgreSQL, SQL Server, Oracle)
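The three Storage Auto Scaling conditions listed above can be read as a single check. The sketch below is only an illustration of that rule, not AWS's implementation: the `StorageState` class, its field names, and the extra "still below the Maximum Storage Threshold" test are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class StorageState:
    """Hypothetical snapshot of an RDS instance's storage metrics."""
    allocated_gb: float             # storage currently allocated
    free_gb: float                  # free storage remaining
    low_storage_minutes: float      # how long the low-storage condition has lasted
    hours_since_last_change: float  # time since the last storage modification
    max_threshold_gb: float         # Maximum Storage Threshold set by the user


def should_autoscale(s: StorageState) -> bool:
    """True when every condition from the notes holds and there is room to grow."""
    low_on_space = s.free_gb < 0.10 * s.allocated_gb  # free storage under 10%
    sustained = s.low_storage_minutes >= 5            # lasted at least 5 minutes
    cooled_down = s.hours_since_last_change >= 6      # 6 hours since last change
    has_headroom = s.allocated_gb < s.max_threshold_gb  # assumption: cap not reached
    return low_on_space and sustained and cooled_down and has_headroom
```

For example, an instance with 100 GB allocated, 8 GB free for 10 minutes, and no modification in the last 7 hours would qualify, while the same instance modified 2 hours ago would not.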
RDS Read Replicas for read scalability
● Up to 5 Read Replicas
● Within AZ, Cross AZ or Cross Region
● Replication is ASYNC, so reads are eventually consistent
● Replicas can be promoted to their own DB
● Applications must update the connection string to leverage read replicas

RDS Read Replicas – Network Cost
● In AWS there’s a network cost when data goes from one AZ to another
● For RDS Read Replicas within the same region, you don’t pay that fee

Amazon Aurora
● Aurora is a proprietary technology from AWS (not open sourced)
● PostgreSQL and MySQL are both supported as Aurora DB
● Aurora is “AWS cloud optimized” and claims a 5x performance improvement over MySQL on RDS and over 3x the performance of Postgres on RDS
● Aurora storage automatically grows in increments of 10 GB, up to 64 TB
● Aurora costs more than RDS (about 20% more) but is more efficient
NOTE: Not in the free tier
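As a small worked example of the storage growth rule stated above (10 GB increments, up to 64 TB), the sketch below rounds a data size up to the next increment. The function name, the 10 GB minimum, and the choice of 1 TB = 1024 GB are assumptions for illustration; the two limits are taken from these notes as written.

```python
import math

INCREMENT_GB = 10            # growth step stated in the notes
MAX_STORAGE_GB = 64 * 1024   # 64 TB ceiling from the notes; 1 TB = 1024 GB is an assumption


def aurora_allocated_gb(data_gb: float) -> int:
    """Smallest multiple of 10 GB that holds data_gb, capped at the ceiling."""
    if data_gb < 0:
        raise ValueError("data_gb must be non-negative")
    # Assumption: at least one 10 GB increment is always allocated.
    steps = max(1, math.ceil(data_gb / INCREMENT_GB))
    return min(steps * INCREMENT_GB, MAX_STORAGE_GB)


print(aurora_allocated_gb(95))    # 100
print(aurora_allocated_gb(10.5))  # 20
```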
Aurora DB Cluster

Features of Aurora
● Automatic fail-over
● Backup and Recovery
● Isolation and security
● Industry compliance
● Push-button scaling
● Automated Patching with Zero Downtime
● Advanced Monitoring
● Routine Maintenance
● Backtrack: restore data at any point of time without using backups

Aurora Replicas - Auto Scaling

Aurora – Custom Endpoints
● Define a subset of Aurora Instances as a Custom Endpoint
● Example: Run analytical queries on specific replicas
● The Reader Endpoint is generally not used after defining Custom Endpoints

Aurora Serverless
● Automated database instantiation and auto-scaling based on actual usage
● Good for infrequent, intermittent or unpredictable workloads
● No capacity planning needed
● Pay per second, can be more cost-effective

Backups
RDS
● Automated backups:
  ○ Daily full backup of the database (during the backup window)
  ○ Transaction logs are backed up by RDS every 5 minutes
  ○ => ability to restore to any point in time (from oldest backup to 5 minutes ago)
  ○ 1 to 35 days of retention, set 0 to disable automated backups
● Manual DB Snapshots
  ○ Manually triggered by the user
  ○ Retention of backup for as long as you want
● Trick: in a stopped RDS database, you will still pay for storage. If you plan on stopping it for a long time, you should snapshot & restore instead
Aurora
● Automated backups
  ○ 1 to 35 days (cannot be disabled)
  ○ Point-in-time recovery in that timeframe
● Manual DB Snapshots
  ○ Manually triggered by the user
  ○ Retention of backup for as long as you want

Amazon ElastiCache Overview
● In the same way that RDS provides managed relational databases, ElastiCache provides managed Redis or Memcached
● Caches are in-memory databases with high performance, low latency
● Helps reduce the load on databases for read-intensive workloads
● AWS takes care of OS maintenance / patching, optimizations, setup, configuration, monitoring, failure recovery and backups

ElastiCache Solution Architecture - Cache

ElastiCache – Redis vs Memcached

DynamoDB
● Fully managed, highly available with replication across 3 AZ
● NoSQL database, not a relational database
● Scales to massive workloads, distributed “serverless” database
● Millions of requests per second, trillions of rows, 100s of TB of storage
● Fast and consistent in performance
● Single-digit millisecond latency (low latency retrieval)
● Integrated with IAM for security, authorization and administration
● Low cost and auto scaling capabilities
● Standard & Infrequent Access (IA) Table Class

DynamoDB Accelerator - DAX
● Fully managed in-memory cache for DynamoDB
● 10x performance improvement (from single-digit millisecond latency to microsecond latency) when accessing your DynamoDB tables
● Secure, highly scalable & highly available
● Difference with ElastiCache at the CCP level: DAX is only used for and is integrated with DynamoDB, while ElastiCache can be used for other databases

DynamoDB – Global Tables
● Make a DynamoDB table accessible with low latency in multiple regions
● Active-Active replication (read/write to any AWS Region)

Redshift Overview
● Redshift is based on PostgreSQL, but it’s not used for OLTP
● It’s OLAP (online analytical processing): analytics and data warehousing
● Load data once every hour, not every second
● 10x better performance than other data warehouses, scales to PBs of data
● Columnar storage of data (instead of row based)
● Massively Parallel Query Execution (MPP), highly available
● Pay as you go based on the instances provisioned
● Has a SQL interface for performing the queries
● BI tools such as Amazon QuickSight or Tableau integrate with it

Amazon EMR
● EMR stands for “Elastic MapReduce”
● EMR helps create Hadoop clusters (Big Data) to analyze and process vast amounts of data
● The clusters can be made of hundreds of EC2 instances
● Also supports Apache Spark, HBase, Presto, Flink…
● EMR takes care of all the provisioning and configuration
● Auto-scaling and integrated with Spot instances
● Use cases: data processing, machine learning, web indexing, big data…

Amazon Athena
● Serverless query service to analyze data stored in Amazon S3
● Uses standard SQL language to query the files
● Supports CSV, JSON, ORC, Avro, and Parquet (built on Presto)
● Pricing: $5.00 per TB of data scanned
● Use compressed or columnar data for cost savings (less data scanned)
● Use cases: business intelligence, analytics, reporting; analyzing and querying VPC Flow Logs, ELB logs, CloudTrail trails, etc.
● Exam Tip: to analyze data in S3 using serverless SQL, use Athena
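To make the Athena pricing point concrete, here is a rough cost estimate based on the $5.00 per TB scanned figure quoted above. It is only a sketch: the function name, the binary definition of a terabyte, and the sample data sizes are assumptions for illustration, and it ignores any other pricing rules that may apply.

```python
PRICE_PER_TB_USD = 5.00    # price quoted in the notes
BYTES_PER_TB = 1024 ** 4   # assumption: binary terabyte


def athena_scan_cost(bytes_scanned: int, price_per_tb: float = PRICE_PER_TB_USD) -> float:
    """Estimated cost in USD of a query that scans bytes_scanned bytes."""
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must be non-negative")
    return bytes_scanned / BYTES_PER_TB * price_per_tb


# Hypothetical dataset: 2 TB of uncompressed CSV, versus a columnar copy
# where the same query only has to read a tenth of the bytes.
raw_csv = 2 * BYTES_PER_TB
columnar_subset = raw_csv // 10

print(f"CSV scan:      ${athena_scan_cost(raw_csv):.2f}")          # $10.00
print(f"Columnar scan: ${athena_scan_cost(columnar_subset):.2f}")  # $1.00
```

Since the bill follows bytes scanned, anything that lets a query read less data (compression, columnar formats) lowers the cost in proportion.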
Amazon QuickSight
● Serverless machine learning-powered business intelligence service to create interactive dashboards
● Fast, automatically scalable, embeddable, with per-session pricing
● Use cases:
  ○ Business analytics
  ○ Building visualizations
  ○ Performing ad-hoc analysis
  ○ Getting business insights using data
● Integrated with RDS, Aurora, Athena, Redshift, S3…

DocumentDB
● Aurora is an “AWS implementation” of PostgreSQL / MySQL…
● DocumentDB is the same for MongoDB (which is a NoSQL database)
● MongoDB is used to store, query, and index JSON data
● Similar “deployment concepts” as Aurora
● Fully managed, highly available with replication across 3 AZ
● DocumentDB storage automatically grows in increments of 10 GB, up to 64 TB
● Automatically scales to workloads with millions of requests per second

Amazon Neptune
● Fully managed graph database
● A popular graph dataset would be a social network
  ○ Users have friends
  ○ Posts have comments
  ○ Comments have likes from users
  ○ Users share and like posts…
● Highly available across 3 AZ, with up to 15 read replicas
● Build and run applications working with highly connected datasets, optimized for complex and hard queries
● Can store up to billions of relations and query the graph with millisecond latency
● Great for knowledge graphs (Wikipedia), fraud detection, recommendation engines, social networking

Amazon QLDB
● QLDB stands for “Quantum Ledger Database”
● A ledger is a book recording financial transactions
● Fully managed, serverless, highly available, replication across 3 AZ
● Used to review the history of all the changes made to your application data over time
● Immutable system: no entry can be removed or modified, cryptographically verifiable
● 2-3x better performance than common ledger blockchain frameworks
● Manipulate data using SQL
● Difference with Amazon Managed Blockchain: no decentralization component, in accordance with financial regulation rules

Amazon Managed Blockchain
● Blockchain makes it possible to build applications where multiple parties can execute transactions without the need for a trusted, central authority
● Amazon Managed Blockchain is a managed service to:
  ○ Join public blockchain networks
  ○ Or create your own scalable private network
● Compatible with the frameworks Hyperledger Fabric & Ethereum
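The “immutable, cryptographically verifiable” property described for QLDB, and the tamper resistance that blockchains rely on, can be illustrated with a simple hash chain: each entry stores a hash computed from the previous entry's hash plus its own data, so changing any earlier entry invalidates everything after it. This is a generic teaching sketch using Python's standard `hashlib`; the function names and journal layout are made up for the example, and it does not reflect how QLDB, Hyperledger Fabric, or Ethereum are actually implemented.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, data: dict) -> str:
    """SHA-256 over the previous hash plus a canonical JSON form of the data."""
    payload = prev_hash + json.dumps(data, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(journal: list, data: dict) -> None:
    """Add an entry that is chained to the hash of the entry before it."""
    prev = journal[-1]["hash"] if journal else GENESIS_HASH
    journal.append({"data": data, "prev": prev, "hash": entry_hash(prev, data)})


def verify(journal: list) -> bool:
    """Recompute every hash in order; any edited entry breaks the chain."""
    prev = GENESIS_HASH
    for entry in journal:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["data"]):
            return False
        prev = entry["hash"]
    return True


journal = []
append(journal, {"account": "A", "amount": 100})
append(journal, {"account": "B", "amount": -40})
print(verify(journal))               # True

journal[0]["data"]["amount"] = 999   # tamper with an old entry
print(verify(journal))               # False
```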
AWS Glue
● Managed extract, transform, and load (ETL) service
● Useful to prepare and transform data for analytics
● Fully serverless service
● AWS Glue is a serverless data integration service that makes it easier to discover, prepare, move, and integrate data from multiple sources for analytics, machine learning (ML), and application development
● AWS Glue can run your extract, transform, and load (ETL) jobs as new data arrives
● For example, you can configure AWS Glue to start your ETL jobs as soon as new data becomes available in Amazon Simple Storage Service (S3)

DMS – Database Migration Service
● Quickly and securely migrate databases to AWS; resilient and self-healing
● The source database remains available during the migration
● Supports:
  ○ Homogeneous migrations: e.g. Oracle to Oracle
  ○ Heterogeneous migrations: e.g. Microsoft SQL Server to Aurora

Databases & Analytics Summary in AWS
● Relational Databases - OLTP: RDS & Aurora (SQL)
● Differences between Multi-AZ, Read Replicas, Multi-Region
● In-memory Database: ElastiCache
● Key/Value Database: DynamoDB (serverless) & DAX (cache for DynamoDB)
● Warehouse - OLAP: Redshift (SQL)
● Hadoop Cluster: EMR
● Athena: query data on Amazon S3 (serverless & SQL)
● QuickSight: dashboards on your data (serverless)
● DocumentDB: “Aurora for MongoDB” (JSON, NoSQL database)
● Amazon QLDB: Financial Transactions Ledger (immutable journal, cryptographically verifiable)
● Amazon Managed Blockchain: managed Hyperledger Fabric & Ethereum blockchains
● Glue: Managed ETL (Extract Transform Load) and Data Catalog service
● Database Migration: DMS
● Neptune: graph database