Amazon Redshift is a fully managed, petabyte-scale data warehouse service that makes it simple and cost-effective to analyze all your data using standard SQL and existing BI tools. It provides fast query performance by using columnar storage and data compression, automatic parallelization and distribution of data and queries, continuous backups and automated repairs. Customers use Redshift for analytics, business intelligence, and reporting on large datasets for use cases such as reducing costs, improving performance, and making more data available for analysis.
Report
Share
Report
Share
1 of 51
More Related Content
AWS Summit London 2014 | Uses and Best Practices for Amazon Redshift (200)
4. Petabyte scale
Massively parallel
Rela2onal data warehouse
Fully managed; zero admin
Amazon
Redshift
a lot faster
a lot cheaper
a whole lot simpler
5. Common Customer Use Cases
• Reduce costs by
extending DW rather than
adding HW
• Migrate completely from
existing DW systems
• Respond faster to
business
• Improve performance by
an order of magnitude
• Make more data
available for analysis
• Access business data via
standard reporting tools
• Add analytic functionality
to applications
• Scale DW capacity as
demand grows
• Reduce HW & SW costs
by an order of magnitude
Traditional Enterprise DW Companies with Big Data SaaS Companies
11. Amazon Redshift Node Types
• Optimized for I/O intensive workloads
• High disk density
• On demand at $0.85/hour
• As low as $1,000/TB/Year
• Scale from 2TB to 1.6PB
DW1.XL: 16 GB RAM, 2 Cores
3 Spindles, 2 TB compressed storage
DW1.8XL: 128 GB RAM, 16 Cores, 24 Spindles
16 TB compressed, 2 GB/sec scan rate
• High performance at smaller storage size
• High compute and memory density
• On demand at $0.25/hour
• As low as $5,500/TB/Year
• Scale from 160GB to 256TB
DW2.L *New*: 16 GB RAM, 2 Cores,
160 GB compressed SSD storage
DW2.8XL *New*: 256 GB RAM, 32 Cores,
2.56 TB of compressed SSD storage
12. Amazon Redshift dramatically reduces I/O
• Column storage
• Data compression
• Zone maps
• Direct-attached storage • With row storage you do
unnecessary I/O
• To get total amount, you have
to read everything
ID
Age
State
Amount
123
20
CA
500
345
25
WA
250
678
40
FL
125
957
37
WA
375
13. • With column storage, you
only read the data you need
to evaluate the query
Amazon Redshift dramatically reduces I/O
• Column storage
• Data compression
• Zone maps
• Direct-attached storage
ID
Age
State
Amount
123
20
CA
500
345
25
WA
250
678
40
FL
125
957
37
WA
375
18. • Load in parallel from Amazon S3,
Elastic MapReduce, Amazon
DynamoDB or any SSH connection
• Data automatically distributed and
sorted according to DDL
• Scales linearly with number of nodes
Amazon S3 / DynamoDB / EMR / SSH
128GB RAM
16TB disk
16 coresCompute
Node
128GB RAM
16TB disk
16 coresCompute
Node
128GB RAM
16TB disk
16 coresCompute
Node
Amazon Redshift parallelizes and distributes everything
• Query
• Load
•
• Backup/Restore
• Resize
19. • Backups to Amazon S3 are automatic,
continuous and incremental
• Configurable system snapshot retention period.
Take user snapshots on-demand
• Cross region backups for disaster recovery
• Streaming restores enable you to resume
querying faster
Amazon S3
128GB RAM
16TB disk
16 coresCompute
Node
128GB RAM
16TB disk
16 coresCompute
Node
128GB RAM
16TB disk
16 coresCompute
Node
Amazon Redshift parallelizes and distributes everything
• Query
• Load
• Backup/Restore
• Resize
20. • Resize while remaining online
• Provision a new cluster in the
background
• Copy data in parallel from node to node
• Only charged for source cluster
SQL Clients/BI Tools
128GB RAM
48TB disk
16 cores
Comput
e Node
128GB RAM
48TB disk
16 cores
Comput
e Node
128GB RAM
48TB disk
16 cores
Comput
e Node
128GB RAM
48TB disk
16 cores
Leader
Node
128GB RAM
48TB disk
16 cores
Comput
e Node
128GB RAM
48TB disk
16 cores
Comput
e Node
128GB RAM
48TB disk
16 cores
Comput
e Node
128GB RAM
48TB disk
16 cores
Comput
e Node
128GB RAM
48TB disk
16 cores
Leader
Node
Amazon Redshift parallelizes and distributes everything
• Query
• Load
• Backup/Restore
• Resize
21. SQL Clients/BI Tools
128GB RAM
48TB disk
16 cores
Comput
e Node
128GB RAM
48TB disk
16 cores
Comput
e Node
128GB RAM
48TB disk
16 cores
Comput
e Node
128GB RAM
48TB disk
16 cores
Comput
e Node
128GB RAM
48TB disk
16 cores
Leader
Node
• Automatic SQL endpoint
switchover via DNS
• Decommission the source cluster
• Simple operation via Console or
API
Amazon Redshift parallelizes and distributes everything
• Query
• Load
• Backup/Restore
• Resize
22. Amazon Redshift is priced to let you analyze all your data
• Number of nodes x cost
per hour
• No charge for leader
node
• No upfront costs
• Pay as you go
DW1 (HDD)
Price Per Hour for
DW1.XL Single Node
Effective Annual
Price per TB
On-Demand $ 0.850 $ 3,723
1 Year Reservation $ 0.500 $ 2,190
3 Year Reservation $ 0.228 $ 999
DW2 (SSD)
Price Per Hour for
DW2.L Single Node
Effective Annual
Price per TB
On-Demand $ 0.250 $ 13,688
1 Year Reservation $ 0.161 $ 8,794
3 Year Reservation $ 0.100 $ 5,498
23. Amazon Redshift is easy to use
• Provision in minutes
• Powerful Web Console
• Monitor query performance
• Point and click resize
• Built in security
• Automatic backups
24. Amazon Redshift has security built-in
• SSL to secure data in transit
• Encryption to secure data at rest
– AES-256; hardware accelerated
– All blocks on disks and in Amazon S3
encrypted
– HSM Support
• No direct access to compute nodes
• Audit logging & AWS CloudTrail
integration
• Amazon VPC support
10 GigE
(HPC)
Ingestion
Backup
Restore
SQL Clients/BI Tools
128GB RAM
16TB disk
16 cores
128GB RAM
16TB disk
16 cores
128GB RAM
16TB disk
16 cores
128GB RAM
16TB disk
16 cores
Amazon S3 / Amazon DynamoDB
Customer VPC
Internal
VPC
JDBC/ODBC
Leader
Node
Compute
Node
Compute
Node
Compute
Node
25. Amazon Redshift continuously backs up your
data and recovers from failures
• Replication within the cluster and backup to Amazon S3 to maintain multiple copies of
data at all times
• Backups to Amazon S3 are continuous, automatic, and incremental
– Designed for eleven nines of durability
• Continuous monitoring and automated recovery from failures of drives and nodes
• Able to restore snapshots to any Availability Zone within a region
• Easily enable backups to a second region for disaster recovery
26. Amazon Redshift Feature Delivery
Service Launch (2/14)
PDX (4/2)
Temp Credentials (4/11)
Unload Encrypted Files
DUB (4/25)
NRT (6/5)
JDBC Fetch Size (6/27)
Unload logs (7/5)
4 byte UTF-8 (7/18)
Statement Timeout (7/22)
SHA1 Builtin (7/15)
Timezone, Epoch, Autoformat (7/25)
WLM Timeout/Wildcards (8/1)
CRC32 Builtin, CSV, Restore Progress
(8/9)
UTF-8 Substitution (8/29)
JSON, Regex, Cursors (9/10)
Split_part, Audit tables (10/3)
SIN/SYD (10/8)
HSM Support (11/11)
Kinesis EMR/HDFS/SSH copy,
Distributed Tables, Audit Logging/
CloudTrail, Concurrency, Resize Perf.,
Approximate Count Distinct, SNS Alerts
(11/13)
SOC1/2/3 (5/8)
Sharing snapshots (7/18)
Resource Level IAM (8/9)
PCI (8/22)
Distributed Tables, Single Node Cursor
Support, Maximum Connections to 500
(12/13)
EIP Support for VPC Clusters (12/28)
New query monitoring system tables and
diststyle all (1/13)
Redshift on DW2 (SSD) Nodes (1/23)
Compression for COPY from SSH, Fetch
size support for single node clusters, new
system tables with commit stats,
row_number(), strotol() and query
termination (2/13)
Resize progress indicator & Cluster
Version (3/21)
Regex_Substr, COPY from JSON (3/25)
27. 50+ new features since launch
• Regions – N. Virginia, Oregon, Dublin, Tokyo, Singapore, Sydney
• Certifications – PCI, SOC 1/2/3
• Security – Load/unload encrypted files, Resource-level IAM, Temporary credentials, HSM
• Manageability – Snapshot sharing, backup/restore/resize progress indicators, Cross-region
snapshots
• Query – Regex, Cursors, MD5, SHA1, Time Zone, workload queue timeout, HLL
• Ingestion – S3 Manifest, LZOP/LZO, JSON built-ins, UTF-8 4byte, invalid character
substitution, CSV, auto datetime format detection, epoch, Ingest from SSH
47. Learning
• Plan your network in advance
• Different way of thinking
– Distributed data and column stores
– No stored procedures
– Small queries
• Work with AWS
– AWS invested in our success