Learn how Boingo Wireless and online media provider Edmunds gained substantial business insights and saved money and time by migrating to Amazon Redshift. Get an inside look at how they accomplished their migration from on-premises solutions. Learn how they tuned their schema and queries to take full advantage of the columnar MPP architecture in Amazon Redshift, how they leveraged third-party solutions, and how they met their business intelligence needs in record time.
In this presentation, you will get a look under the covers of Amazon Redshift, a fast, fully managed, petabyte-scale data warehouse service for less than $1,000 per TB per year. Learn how Amazon Redshift uses columnar technology, optimized hardware, and massively parallel processing to deliver fast query performance on data sets ranging in size from hundreds of gigabytes to a petabyte or more. We'll also walk through techniques for optimizing performance, and you'll hear from a specific customer about their use case, taking advantage of fast performance on enormous datasets while leveraging economies of scale on the AWS platform.
Getting Started with Amazon Redshift - AWS July 2016 Webinar Series | Amazon Web Services
Traditional data warehouses become expensive and slow down as the volume of your data grows. Amazon Redshift is a fast, petabyte-scale data warehouse that makes it easy to analyze all of your data using existing business intelligence tools for as low as $1000/TB/year. This webinar will provide an introduction to Amazon Redshift and cover the essentials you need to deploy your data warehouse in the cloud so that you can achieve faster analytics and save costs.
Learning Objectives:
• Get an introduction to Amazon Redshift's massively parallel processing, columnar, scale-out architecture
• Learn how to configure your data warehouse cluster, optimize schema, and load data efficiently
• Get an overview of all the latest features including interleaved sorting and user-defined functions
AWS July Webinar Series: Amazon Redshift migration and load data 20150722 | Amazon Web Services
Amazon Redshift is a fast, petabyte-scale data warehouse that makes it easy to analyze your data for a fraction of the cost of traditional data warehouses.
In this webinar, you will learn how to easily migrate your data from other data warehouses into Amazon Redshift, efficiently load your data with Amazon Redshift's massively parallel processing (MPP) capabilities, and automate data loading with AWS Lambda and AWS Data Pipeline. You will also learn about ETL tools from our partners to extract, transform, and prepare data from disparate data sources before loading it into Amazon Redshift.
Learning Objectives:
• Understand common patterns for migrating your data to Amazon Redshift
• See live examples of the COPY command, which fully parallelizes data ingestion
• Learn how to automate the load process using AWS Lambda & AWS Data Pipeline
• Techniques for real-time data loading
• Options for ETL tools from our partners
(BDT314) A Big Data & Analytics App on Amazon EMR & Amazon Redshift | Amazon Web Services
Nasdaq has extended its use of Amazon Redshift to include Amazon EMR and Amazon S3 in order to better manage storage and compute resources separately. Data is ingested into Redshift and then transformed and unloaded to S3. EMR is then used to convert the data to Parquet format and write it to S3 partitioned by date. The data in S3 is accessed using Presto with encryption at rest. Hive is used to manage schemas and partitions across data sources. Tools were developed to help with encryption, schema management, and data migrations between systems while maintaining security and performance.
This document provides an overview and use cases for Amazon Redshift, a fast, fully managed, petabyte-scale data warehouse service from Amazon Web Services. It summarizes Redshift's features including columnar storage, data compression, and massively parallel query processing. It also provides examples of how Redshift is used by companies to reduce costs, improve query performance, and scale their data warehousing needs. Specific use cases and customers of Redshift are highlighted.
Production NoSQL in an Hour: Introduction to Amazon DynamoDB (DAT101) | AWS r... | Amazon Web Services
Amazon DynamoDB is a fully managed NoSQL database service provided by AWS that provides fast and predictable performance with seamless scalability. It offers a flexible data model and reliable access patterns. With DynamoDB, users do not need to provision, operate, or scale their own database clusters and can instead pay only for the storage and throughput capacity they need.
Powering Interactive Data Analysis at Pinterest by Amazon Redshift | Jie Li
In the last six months, we have set up Amazon Redshift to power our interactive data analysis at Pinterest. It has tremendously improved the speed of analyzing our data.
AWS re:Invent 2016: Workshop: Converting Your Oracle or Microsoft SQL Server ... | Amazon Web Services
In this workshop, you migrate a sample sporting event and ticketing database from Oracle or Microsoft SQL Server to Amazon Aurora or PostgreSQL using the AWS Schema Conversion Tool (AWS SCT) and AWS Database Migration Service (AWS DMS). The workshop includes the migration of tables, indexes, procedures, functions, constraints, views, and more. We run SCT on an Amazon EC2 Windows instance--bring a laptop with Remote Desktop (or some other method of connecting to the Windows instance). Ideally, you should be familiar with relational databases, especially Oracle or SQL Server and PostgreSQL or Aurora, to get the most from this session. Additionally, attendees should be familiar with SCT and DMS. Familiarity with SQL Developer and pgAdmin III will be helpful but is not required.
Prerequisites:
- Participants should have an AWS account established and available for use during the workshop.
- Please bring your own laptop.
Redshift is a petabyte-scale data warehouse that is a lot faster, a lot less expensive, and a whole lot simpler to use. How can you get your data into Amazon Redshift? In this webinar, hear from representatives of Attunity (Amazon Redshift Partner) and AWS as they present many of the options available for data integration. Whether your data is in an on-premises platform or a cloud-based database like DynamoDB, we will show you how you can easily load your data into Redshift.
Reasons to attend:
- Learn about best practices to efficiently integrate data into Redshift
- Attend a Q&A session with Redshift experts
Analyzing big data quickly and efficiently requires a data warehouse optimized to handle and scale for large datasets. Amazon Redshift is a fast, petabyte-scale data warehouse that makes it simple and cost-effective to analyze big data for a fraction of the cost of traditional data warehouses. By following a few best practices, you can take advantage of Amazon Redshift's columnar technology and parallel processing capabilities to minimize I/O and deliver high throughput and query performance. This webinar will cover techniques to load data efficiently, design optimal schemas, and use workload management.
Learning Objectives:
• Get an inside look at Amazon Redshift's columnar technology and parallel processing capabilities
• Learn how to migrate from existing data warehouses, optimize schemas, and load data efficiently
• Learn best practices for managing workload, tuning your queries, and using Amazon Redshift's interleaved sorting features
Who Should Attend:
• Data Warehouse Developers, Big Data Architects, BI Managers, and Data Engineers
Near Real-Time Data Analysis With FlyData | FlyData Inc.
This document describes our products. FlyData makes it easy to load data automatically and continuously to Amazon Redshift. You can also refer to our homepage ( http://flydata.com/ ) for more information.
AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent... | Amazon Web Services
Learn about architecture best practices for combining AWS storage and database technologies. We outline AWS storage options (Amazon EBS, Amazon EC2 Instance Storage, Amazon S3 and Amazon Glacier) along with AWS database options including Amazon ElastiCache (in-memory data store), Amazon RDS (SQL database), Amazon DynamoDB (NoSQL database), Amazon CloudSearch (search), Amazon EMR (hadoop) and Amazon Redshift (data warehouse). Then we discuss how to architect your database tier by using the right database and storage technologies to achieve the required functionality, performance, availability, and durability—at the right cost.
Take an in-depth look at data warehousing with Amazon Redshift and get answers to your technical questions. We will cover performance tuning techniques that take advantage of Amazon Redshift's columnar technology and massively parallel processing architecture. We will also discuss best practices for migrating from existing data warehouses, optimizing your schema, loading data efficiently, and using workload management and interleaved sorting.
Building a Modern Data Warehouse: Deep Dive on Amazon Redshift - SRV337 - Chi... | Amazon Web Services
In this chalk talk, we take a deep dive on Amazon Redshift architecture and the latest performance enhancements that give you faster insights into your data. We also cover Amazon Redshift Spectrum, a feature of Amazon Redshift that enables you to analyze data across Amazon Redshift and your Amazon S3 data lake to deliver unique insights not possible by analyzing independent data silos.
Traditional data warehouses become expensive and slow down as the volume of your data grows. Amazon Redshift is a fast, petabyte-scale data warehouse that makes it easy to analyze all of your data using existing business intelligence tools for 1/10th the traditional cost. This session will provide an introduction to Amazon Redshift and cover the essentials you need to deploy your data warehouse in the cloud so that you can achieve faster analytics and save costs. We’ll also cover the recently announced Redshift Spectrum, which allows you to query unstructured data directly from Amazon S3.
Amazon Redshift is a data warehouse service that runs on AWS. It has a leader node that coordinates queries and compute nodes that store and process the data in parallel. The compute nodes can use either HDD storage optimized for large datasets or SSD storage optimized for fast queries. Data is stored in columns and compressed to reduce I/O. Queries are optimized using statistics on the data distribution, sort keys and other metadata. The EXPLAIN command and STL tables provide visibility into query plans and performance.
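To make that last point concrete, here is a minimal sketch (the table and column names are hypothetical) of inspecting a plan with EXPLAIN and then reviewing recent runtimes from the STL_QUERY system table:

-- Show the optimizer's plan for a hypothetical join before running it
EXPLAIN
SELECT c.region, SUM(s.amount)
FROM sales s
JOIN customers c ON s.customer_id = c.customer_id
GROUP BY c.region;

-- Review recent query history and timings
SELECT query, starttime, endtime, SUBSTRING(querytxt, 1, 60) AS query_text
FROM stl_query
ORDER BY starttime DESC
LIMIT 10;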
Learn tuning best practices for taking advantage of Amazon Redshift's columnar technology and parallel processing capabilities to improve your delivery of queries and improve overall database performance. This session explains how to migrate from existing data warehouses, create an optimized schema, efficiently load data, use workload management, tune your queries, and use Amazon Redshift's interleaved sorting features. Finally, learn how to use these best practices to give your entire organization access to analytic insights at scale.
Presented by: Alex Sinner, Solutions Architecture PMO, Amazon Web Services
Customer Guest: Luuk Linssen, Product Manager, Bannerconnect
Amazon QuickSight is a fast, cloud-powered business intelligence (BI) service that makes it easy to build visualizations, perform ad-hoc analysis, and quickly get business insights from your data. In this session, we demonstrate how you can point Amazon QuickSight to AWS data stores, flat files, or other third-party data sources and begin visualizing your data in minutes. We also introduce you to SPICE - a Super-fast, Parallel, In-memory, Calculation Engine in Amazon QuickSight, which performs advanced calculations and renders visualizations rapidly without requiring any additional infrastructure, SQL programming, or dimensional modeling, so you can seamlessly scale to hundreds of thousands of users and petabytes of data. Lastly, you will see how Amazon QuickSight provides you with smart visualizations and graphs that are optimized for your different data types, to ensure the most suitable and appropriate visualization for conducting your analysis, and how to share these visualization stories using the built-in collaboration tools.
This document summarizes a presentation on Amazon Redshift. Redshift is a fully managed data warehouse service that makes it easy to analyze large amounts of data for less than $1,000 per terabyte per year. The presentation covers how to get started with Redshift, best practices for table design and data loading, using Redshift for analytics, and upgrading and scaling a Redshift data warehouse over time.
AWS re:Invent 2016: Migrating Your Data Warehouse to Amazon Redshift (DAT202) | Amazon Web Services
Amazon Redshift is a fast, simple, cost-effective data warehousing solution, and in this session, we look at the tools and techniques you can use to migrate your existing data warehouse to Amazon Redshift. We will then present a case study on Scholastic’s migration to Amazon Redshift. Scholastic, a large 100-year-old publishing company, was running their business with older, on-premise, data warehousing and analytics solutions, which could not keep up with business needs and were expensive. Scholastic also needed to include new capabilities like streaming data and real time analytics. Scholastic migrated to Amazon Redshift, and achieved agility and faster time to insight while dramatically reducing costs. In this session, Scholastic will discuss how they achieved this, including options considered, technical architecture implemented, results, and lessons learned.
The document provides an introduction to Amazon DynamoDB, a fully managed NoSQL database service. It discusses how DynamoDB provides fast and consistent performance at scale without the need to provision or manage infrastructure. It also demonstrates how to build a serverless web application using DynamoDB along with AWS Lambda and API Gateway.
AWS Data Transfer Services - AWS Gateway, AWS Snowball, AWS Snowball Edge, an... | Amazon Web Services
by Everett Dolgner, Business Development Manager, AWS
AWS offers a suite of tools to help you surmount limitations associated with data migration from on-premises to the cloud. Attend this session to learn about moving data by using networks, roads, and AWS technology partners. We will also discuss how to move data into and out of the cloud in batches, increments, and streams.
Migrate your Data Warehouse to Amazon Redshift - September Webinar Series | Amazon Web Services
- TrueCar migrated their data warehouse from an on-premises Hadoop cluster to Amazon Redshift. They load clickstream, transactions, inventory, and lead data into Redshift for analytics and reporting.
- They use ETL tools like Talend and Hive to process data and load it into HDFS and S3, then load it into Redshift using a custom utility. The data is organized into schemas separating raw, user, and reporting data.
- Best practices for Redshift include designing tables for compression, sort keys, and distribution, managing cluster size and workloads over time, and vacuuming and analyzing tables regularly. TrueCar's migration to Redshift improved performance and reduced costs.
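To make the table-design bullet concrete, here is a minimal sketch of what such a design might look like (the table and keys are hypothetical, not TrueCar's actual schema):

CREATE TABLE clickstream (
  event_time TIMESTAMP,
  user_id    BIGINT,
  page_url   VARCHAR(2048),
  referrer   VARCHAR(2048)
)
DISTKEY (user_id)     -- co-locate each user's rows on one slice for joins
SORTKEY (event_time); -- time-sorted blocks let range scans skip data

-- Routine maintenance: reclaim space and re-sort, then refresh statistics
VACUUM clickstream;
ANALYZE clickstream;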
This document provides an overview and agenda for a presentation on Amazon DynamoDB. It discusses key concepts like tables, data types, partitioning, indexing and scaling in DynamoDB. It also provides best practices and examples for modeling different data scenarios like event logging, product catalogs, messaging apps and multiplayer games.
Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS r... | Amazon Web Services
Learn how you can use Amazon ElastiCache to easily deploy a Memcached or Redis compatible, in-memory caching system to speed up your application performance. We show you how to use Amazon ElastiCache to improve your application latency and reduce the load on your database servers. We'll also show you how to build a caching layer that is easy to manage and scale as your application grows. During this session, we go over various scenarios and use cases that can benefit by enabling caching, and discuss the features provided by Amazon ElastiCache.
Amazon Redshift is a fully managed data warehouse service that makes it fast, simple and cost effective to analyze data using SQL and existing business intelligence tools. The document provides an overview of Amazon Redshift and its benefits including speed, low cost, security, scalability and ease of use. It also provides examples of how various companies use Redshift for big data analytics including analyzing social media firehoses, mobile usage and real-time IoT streaming data.
AWS Webcast - Archiving in the Cloud - Best Practices for Amazon Glacier | Amazon Web Services
Join our webinar to learn more about how to build a cost-effective archive application using Amazon Glacier, an extremely low-cost, secure, highly durable, and easy-to-use storage service in the AWS cloud.
We will explain how Amazon Glacier works and walk through some best practices to get the most out of the service.
We will also highlight how to choose between Amazon Glacier and Amazon S3’s Glacier storage option.
Learn more: http://aws.amazon.com/glacier/
AWS re:Invent 2016: Deep Dive on Amazon Elastic File System (STG202) | Amazon Web Services
In this session, we fill you in about Amazon EFS, including an overview of this recently introduced service, its use cases, and best practices for working with it.
An overview of the Amazon ElastiCache managed service, with examples of how it can be used to increase performance, lower costs and augment other database services and databases to make things faster, easier and less expensive.
AWS Snowball: Accelerating Large-Scale Data Ingest Into the AWS Cloud | AWS P... | Amazon Web Services
Moving files over long distances, or terabyte and petabyte volumes of data into the cloud can be a sticking point for cloud migrations. Come learn how customers are using the latest Snowball to move large scale data stores to the cloud, while remaining compliant with existing regulations.
Hands-on Labs: Getting Started with AWS - March 2017 AWS Online Tech Talks | Amazon Web Services
The document provides information about a webinar on getting started with AWS, including deploying a static website. It outlines the agenda which includes: watching a 15 minute presentation on AWS; watching a 25 minute demo of deploying a static website; and having 45-60 minutes to complete the demo independently. It then details the various sections of the webinar which cover creating an AWS account, enabling security features, using S3 buckets to host the website, configuring permissions, associating a domain name, and using CloudFront for acceleration.
DynamoDB is a scalable NoSQL database service provided by Amazon that allows developers to purchase throughput rather than storage. It automatically spreads data and traffic across servers and SSDs for predictable performance. While it does not automatically scale, administrators can request more throughput. DynamoDB integrates with other AWS services like EMR for Hadoop and Redshift for data warehousing.
Amazon EC2 Systems Manager for Hybrid Cloud Management at Scale | Amazon Web Services
Amazon EC2 Systems Manager provides capabilities that enable automated configuration and ongoing management of systems at scale across Windows and Linux workloads running in Amazon EC2 or on-premises at no additional charge. It offers components like Run Command, State Manager, Inventory, Maintenance Windows, Patch Manager, Automation, and Parameter Store to remotely manage servers, define consistent configurations, gather inventory, schedule maintenance windows, automate patching, simplify deployments, and securely store parameters. Using these capabilities is expected to reduce the total cost of ownership for hybrid and cloud environments compared to traditional management tools.
AWS re:Invent 2016: ElastiCache Deep Dive: Best Practices and Usage Patterns ... | Amazon Web Services
In this session, we provide a peek behind the scenes to learn about Amazon ElastiCache's design and architecture. See common design patterns with our Redis and Memcached offerings and how customers have used them for in-memory operations to reduce latency and improve application throughput. During this session, we review ElastiCache best practices, design patterns, and anti-patterns.
Best Practices for Managing Security Operations in AWS - March 2017 AWS Onlin... | Amazon Web Services
To help prevent unexpected access to your AWS resources, it is critical to maintain strong identity and access policies. It is equally important to track and alert on changes to your AWS resources. In this tech talk, you will learn how to use AWS Identity and Access Management (IAM) to control access to your AWS resources and integrate your existing authentication system with AWS IAM. We will cover how you can deploy and control your AWS infrastructure using code templates, including change management policies with AWS CloudFormation. In addition, we will explore different options for managing both your AWS access logs and your Amazon Elastic Compute Cloud (EC2) system logs using Amazon CloudWatch Logs. We also will cover how to use these logs to implement an audit and compliance validation process using services such as AWS Config, AWS CloudTrail, and Amazon Inspector.
Learning Objectives:
• Understand the AWS Shared Responsibility Model.
• Understand AWS account and identity management options and configuration.
• Learn the concept of infrastructure as code and change management using AWS CloudFormation.
• Learn how to audit and log your AWS service usage.
• Learn about AWS services to add automatic compliance checks to your AWS infrastructure.
Amazon Elastic Block Store (Amazon EBS) provides persistent block level storage volumes for use with Amazon EC2 instances. In this technical session, we conduct a detailed analysis of the differences among the three types of Amazon EBS block storage: General Purpose (SSD), Provisioned IOPS (SSD), and Magnetic. We discuss how to maximize Amazon EBS performance, with a special eye towards low-latency, high-throughput applications like databases. We discuss Amazon EBS encryption and share best practices for Amazon EBS snapshot management. Throughout, we share tips for success.
AWS re:Invent 2016: Deep Dive on Amazon DynamoDB (DAT304) | Amazon Web Services
Explore Amazon DynamoDB capabilities and benefits in detail and learn how to get the most out of your DynamoDB database. We go over best practices for schema design with DynamoDB across multiple use cases, including gaming, AdTech, IoT, and others. We explore designing efficient indexes, scanning, and querying, and go into detail on a number of recently released features, including JSON document support, DynamoDB Streams, and more. We also provide lessons learned from operating DynamoDB at scale, including provisioning DynamoDB for IoT.
(STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS | Amazon Web Services
This session explores some of the key features of Amazon Glacier, including security, durability, and configuration for storing compliance and regulatory data. It covers best practices for managing your cold data, including ingest, retrieval, and security controls. Other topics include: how to optimize storage, upload, and retrieval costs; how to identify the most applicable workloads; and recommended optimizations based on a few sample use cases from a number of industry verticals.
Data warehousing in the era of Big Data: Deep Dive into Amazon Redshift | Amazon Web Services
Analyzing big data quickly and efficiently requires a data warehouse optimized to handle and scale for large datasets. Amazon Redshift is a fast, petabyte-scale data warehouse that makes it simple and cost-effective to analyze all of your data for a fraction of the cost of traditional data warehouses. In this session, we take an in-depth look at data warehousing with Amazon Redshift for big data analytics. We cover best practices to take advantage of Amazon Redshift's columnar technology and parallel processing capabilities to deliver high throughput and query performance. We also discuss how to design optimal schemas, load data efficiently, and use workload management.
BDA308 Serverless Analytics with Amazon Athena and Amazon QuickSight, featuri... | Amazon Web Services
Amazon QuickSight is a fast, cloud-powered business intelligence (BI) service that makes it easy to build visualizations, perform ad-hoc analysis, and quickly get business insights from your data. In this session, we demonstrate how you can point Amazon QuickSight to AWS data stores, flat files, or other third-party data sources and begin visualizing your data in minutes. We also introduce SPICE - a new Super-fast, Parallel, In-memory, Calculation Engine in Amazon QuickSight, which performs advanced calculations and renders visualizations rapidly without requiring any additional infrastructure, SQL programming, or dimensional modeling, so you can seamlessly scale to hundreds of thousands of users and petabytes of data. Lastly, you will see how Amazon QuickSight provides you with smart visualizations and graphs that are optimized for your different data types, to ensure the most suitable and appropriate visualization for conducting your analysis, and how to share these visualization stories using the built-in collaboration tools.
- Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse service in the cloud. It uses massively parallel processing and columnar storage to enable fast queries on large data sets for a fraction of the cost of traditional data warehousing.
- Some key features include automatic scaling, continuous backups, integrated security and access controls, integration with other AWS services like S3 and DynamoDB, and simple point-and-click management.
- Customers are seeing significant improvements in performance, often 50-100x faster than alternatives like Hive, as well as large cost reductions of up to 80% compared to on-premises data warehousing.
AWS re:Invent 2016: JustGiving: Serverless Data Pipelines, Event-Driven ETL, ... | Amazon Web Services
Organizations need to gain insight and knowledge from a growing number of Internet of Things (IoT), application programming interfaces (API), clickstreams, unstructured and log data sources. However, organizations are also often limited by legacy data warehouses and ETL processes that were designed for transactional data. Building scalable big data pipelines with automated extract-transform-load (ETL) and machine learning processes can address these limitations. JustGiving is the world’s largest social platform for online giving. In this session, we describe how we created several scalable and loosely coupled event-driven ETL and ML pipelines as part of our in-house data science platform called RAVEN. You learn how to leverage AWS Lambda, Amazon S3, Amazon EMR, Amazon Kinesis, and other services to build serverless, event-driven, data and stream processing pipelines in your organization. We review common design patterns, lessons learned, and best practices, with a focus on serverless big data architectures with AWS Lambda.
Amazon Redshift is a fast, fully managed data warehousing service that allows customers to analyze petabytes of structured data, at one-tenth the cost of traditional data warehousing solutions. It provides massively parallel processing across multiple nodes, columnar data storage for efficient queries, and automatic backups and recovery. Customers have seen up to 100x performance improvements over legacy systems when using Redshift for applications like log and clickstream analytics, business intelligence reporting, and real-time analytics.
Traditional data warehouses become expensive and slow down as the volume of your data grows. Amazon Redshift is a fast, petabyte-scale data warehouse that makes it easy to analyze all of your data using existing business intelligence tools for 1/10th the traditional cost. This session will provide an introduction to Amazon Redshift and cover the essentials you need to deploy your data warehouse in the cloud so that you can achieve faster analytics and save costs.
Migrate from Oracle to Aurora PostgreSQL: Best Practices, Design Patterns, & ... | Amazon Web Services
In this session, we show you how to set up the source Oracle database environment, the target PostgreSQL environment, and the parameter group configuration. We also recommend database parameters to disable foreign keys and triggers. Finally, we discuss best practices for using AWS Database Migration Service (AWS DMS) and the AWS Schema Conversion Tool (AWS SCT) and show you how to choose the instance type and configure AWS DMS.
In addition to running databases in Amazon EC2, AWS customers can choose among a variety of managed database services. These services save effort, save time, and unlock new capabilities and economies. In this session, we make it easy to understand how they differ, what they have in common, and how to choose one or more. We explain the fundamentals of Amazon DynamoDB, a fully managed NoSQL database service; Amazon RDS, a relational database service in the cloud; Amazon ElastiCache, a fast, in-memory caching service in the cloud; and Amazon Redshift, a fully managed, petabyte-scale data-warehouse solution that can be surprisingly economical. We’ll cover how each service might help support your application, how much each service costs, and how to get started.
Speaker:
Shaun Pearce, AWS Solutions Architect
With AWS you can choose the right database technology and software for the job. Given the myriad of choices, from relational databases to non-relational stores, this session provides details and examples of some of the choices available to you. This session also provides details about real-world deployments from customers using Amazon RDS, Amazon ElastiCache, Amazon DynamoDB, and Amazon Redshift.
Data & Analytics - Session 2 - Introducing Amazon Redshift | Amazon Web Services
Amazon Redshift is a fast and powerful, fully managed, petabyte-scale data warehouse service in the cloud. This presentation will give an introduction to the service and its pricing before diving into how it delivers fast query performance on data sets ranging from hundreds of gigabytes to a petabyte or more.
Steffen Krause, Technical Evangelist, AWS
Padraic Mulligan, Architect and Lead Developer, and Mike McCarthy, CTO, Skillspage
Getting Started with Managed Database Services on AWS - September 2016 Webina... | Amazon Web Services
On AWS you can choose from a variety of managed database services that save effort, save time, and unlock new capabilities and economies. In this session, we make it easy to understand how they differ, what they have in common, and how to choose one or more. We'll explain the fundamentals of Amazon RDS, a managed relational database service in the cloud; Amazon DynamoDB, a fully managed NoSQL database service; Amazon ElastiCache, a fast, in-memory caching service in the cloud; and Amazon Redshift, a fully managed, petabyte-scale data-warehouse solution that can be surprisingly economical. We will cover how each service might help support your application, how much each service costs, and how to get started.
Learning Objectives:
• Overview of managed database services available on AWS
• How to combine them for high-performance cost effective architectures
• Learn how to choose between the AWS database services based on the use case
Who Should Attend:
• IT Managers, DBAs, Enterprise and Solution Architects, DevOps Engineers, and Developers
Selecting the Right AWS Database Solution - AWS 2017 Online Tech Talks | Amazon Web Services
• Get an overview of managed database services available on AWS
• Learn how to combine them for high-performance cost effective architectures
• Learn how to choose between the AWS database services based on your use case
On AWS you can choose from a variety of managed database services that save effort, save time, and unlock new capabilities and economies. In this session, we make it easy to understand how they differ, what they have in common, and how to choose one or more. We'll explain the fundamentals of Amazon RDS, a managed relational database service in the cloud; Amazon DynamoDB, a fully managed NoSQL database service; Amazon ElastiCache, a fast, in-memory caching service in the cloud; and Amazon Redshift, a fully managed, petabyte-scale data-warehouse solution that can be economical. We will cover how each service might help support your application and how to get started.
In this session, we will introduce Amazon Redshift, a new petabyte-scale data warehouse service. We'll walk through the basics of the Redshift architecture, launch a new cluster, and run SQL queries across a large-scale, public dataset. After demonstrating how easy it is to get started with Redshift, we will show how to visualize and query large-scale datasets, running queries, reports, and analytics against millions of rows of records in just a few seconds.
Datapolis Guest Expert Presentation: Top 15 SharePoint Server Configuration M... | Datapolis
The document provides tips and tricks for configuring and optimizing SharePoint Server 2013. It discusses best practices for Windows Server, SQL Server, and SharePoint Server configuration. It also covers monitoring, patching, and client-side considerations. The key recommendations include using a load balancer for SharePoint farms, tuning Windows Server and SQL Server for performance, separating resource-intensive services like search onto dedicated servers, automating administrative tasks with PowerShell, and applying patches carefully with testing.
AWS offers a wide variety of database services that adapt to your application's requirements. The database services are fully managed and can be deployed in minutes with just a few clicks. AWS services include Amazon Relational Database Service (Amazon RDS), compatible with 6 common database engines; Amazon Aurora, a MySQL-compatible relational database with up to 5x better performance; Amazon DynamoDB, a fast and flexible NoSQL database service; Amazon Redshift, a petabyte-scale data warehouse; and Amazon ElastiCache, an in-memory caching service compatible with Memcached and Redis. AWS also provides AWS Database Migration Service, a service that lets you migrate databases to the AWS cloud simply and cost-effectively.
Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013 | Amazon Web Services
Amazon Redshift is a fast, fully-managed, petabyte-scale data warehouse service that costs less than $1,000 per terabyte per year—less than a tenth the price of most traditional data warehousing solutions. In this session, you get an overview of Amazon Redshift, including how Amazon Redshift uses columnar technology, optimized hardware, and massively parallel processing to deliver fast query performance on data sets ranging in size from hundreds of gigabytes to a petabyte or more. Finally, we announce new features that we've been working on over the past few months.
ENT305 Migrating Your Databases to AWS: Deep Dive on Amazon Relational Databa... | Amazon Web Services
Amazon RDS allows you to launch an optimally configured, secure and highly available database with just a few clicks. It provides cost-efficient and resizable capacity, automates time-consuming database administration tasks, and provides you with six familiar database engines to choose from: Amazon Aurora, Oracle, Microsoft SQL Server, PostgreSQL, MySQL and MariaDB. In this session, we will take a close look at the capabilities of Amazon RDS and explain how it works. We’ll also discuss the AWS Database Migration Service and AWS Schema Conversion Tool, which help you migrate databases and data warehouses with minimal downtime from on-premises and cloud environments to Amazon RDS and other Amazon services. Gain your freedom from expensive, proprietary databases while providing your applications with the fast performance, scalability, high availability, and compatibility they need.
A quick overview of Redshift and common use cases, followed by tools and links for performance tuning, and a look at how Redshift fits into the AWS data services. Also a list of key new features since the last meetup in September 2016, including Redshift Spectrum, which lets you run SQL directly on your data sitting in Amazon S3, and the Redshift ecosystem of data integration, BI, consultancy, and data modelling partners.
AWS June Webinar Series - Getting Started: Amazon Redshift | Amazon Web Services
Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse service, for less than $1,000 per TB per year. In this presentation, you'll get an overview of Amazon Redshift, including how it uses columnar technology, optimized hardware, and massively parallel processing to deliver fast query performance on data sets ranging in size from hundreds of gigabytes to a petabyte or more. Learn how, with just a few clicks in the AWS Management Console, you can set up a fully functional data warehouse, ready to accept data, without learning any new languages and while easily plugging in the existing business intelligence tools and applications you use today. This webinar is ideal for anyone looking to gain deeper insight into their data without the usual challenges of time, cost, and effort.
In this webinar, you will:
• Understand what Amazon Redshift is and how it works
• Create a data warehouse interactively through the AWS Management Console
• Load data into your new Amazon Redshift data warehouse from S3
Who Should Attend:
• IT professionals, developers, line-of-business managers
Similar to (ISM303) Migrating Your Enterprise Data Warehouse To Amazon Redshift
How to build forecasting services using ML and deep learning algorithms | Amazon Web Services
Forecasting is an important process for a great many companies and is used in various contexts to try to accurately predict the growth and distribution of a product, the use of resources needed on production lines, financial projections, and much more. Amazon uses advanced forecasting techniques, and some of these services have been made available to all AWS customers.
In this session, we will show how to pre-process data that contains a temporal component and then use an algorithm that, based on the type of data analyzed, produces an accurate forecast.
Big Data for Startups: how to create serverless Big Data applications | Amazon Web Services
The variety and volume of data created every day is accelerating ever faster and represents a unique opportunity to innovate and to create new startups.
However, managing large amounts of data can appear complex: building large-scale Big Data clusters seems like an investment accessible only to established companies. But the elasticity of the cloud and, in particular, serverless services allow us to break through these limits.
Let's see, then, how you can develop Big Data applications quickly, without worrying about infrastructure, dedicating all your resources to developing your ideas and creating innovative products.
You can now use Amazon Elastic Kubernetes Service (EKS) to run Kubernetes pods on AWS Fargate, the serverless compute engine built for containers on AWS. This makes it easier than ever to build and run your Kubernetes applications in the AWS cloud. In this session, we will present the main features of the service and how to deploy your application in a few steps.
Twenty years ago, Amazon went through a radical transformation aimed at increasing the pace of innovation. Over this period, we learned how changing our approach to application development allowed us to greatly increase agility and release velocity and, ultimately, enabled us to build more reliable and scalable applications. In this session, we will explain how we define modern applications and how building modern apps affects not only application architecture but also organizational structure, development release pipelines, and even the operating model. We will also describe common approaches to modernization, including the approach used by Amazon.com itself.
How to spend up to 90% less with containers and Spot Instances | Amazon Web Services
The use of containers keeps growing.
When properly designed, container-based applications are very often stateless and flexible.
AWS ECS, EKS, and Kubernetes on EC2 can take advantage of Spot Instances, leading to average savings of 70% compared to On-Demand instances. In this session, we will explore the characteristics of Spot Instances and how they can easily be used on AWS. We will also learn how Spreaker uses Spot Instances to run applications of different kinds, in production, at a fraction of the on-demand cost!
In recent months, many customers have been asking us: how do you monetise Open APIs, simplify Fintech integrations, and accelerate adoption of the various Open Banking business models? Therefore, AWS and FinConecta would like to invite you to the Open Finance marketplace presentation on October 20th.
Event Agenda:
Open banking so far (short recap)
• PSD2, OB UK, OB Australia, OB LATAM, OB Israel
Intro to Open Finance marketplace
• Scope
• Features
• Tech overview and Demo
The role of the Cloud
The Future of APIs
• Complying with regulation
• Monetizing data / APIs
• Business models
• Time to market
One platform for all: a Strategic approach
Q&A
Make your startup's offering unique in the market with Machine Learning services | Amazon Web Services
To create value and build a differentiated, recognizable offering, successful startups know how to combine established technologies with innovative, purpose-built components.
AWS provides ready-to-use services and, at the same time, lets you customize and create the differentiating elements of your own offering.
Focusing on Machine Learning technologies, we will see how to select the artificial intelligence services offered by AWS and, also through a demo, how to build custom Machine Learning models using SageMaker Studio.
OpsWorks Configuration Management: automate the management and deployment of... | Amazon Web Services
With the traditional approach to IT, implementing DevOps techniques was difficult for many years; until now, they have often involved manual activities, occasionally leading to application downtime and interrupting user operations. With the advent of the cloud, DevOps techniques are now within everyone's reach at low cost for any kind of workload, guaranteeing greater system reliability and resulting in significant improvements to business continuity.
AWS provides AWS OpsWorks as a Configuration Management tool that aims to automate and simplify the management and deployment of EC2 instances by means of Chef and Puppet workloads.
Learn how to leverage AWS OpsWorks to guarantee the reliability of your application running on EC2 instances.
Microsoft Active Directory on AWS to support your Windows Workloads | Amazon Web Services
Want to know your options for running Microsoft Active Directory on AWS? When moving Microsoft workloads to AWS, it is important to consider how to deploy Microsoft Active Directory to support group policy management, authentication, and authorization. In this session, we discuss the options for deploying Microsoft Active Directory on AWS, including AWS Directory Service for Microsoft Active Directory and deploying Active Directory on Windows on Amazon Elastic Compute Cloud (Amazon EC2). We cover topics such as integrating your on-premises Microsoft Active Directory environment into the cloud and using SaaS applications, such as Office 365, with AWS Single Sign-On.
From facial recognition to detecting fraud or manufacturing defects, image and video analysis leveraging artificial intelligence techniques is evolving and being refined at a rapid pace. In this webinar, we will explore the possibilities offered by AWS services for applying state-of-the-art computer vision techniques to real-world scenarios.
Amazon Web Services and VMware are hosting a free virtual event on Wednesday, October 14th from 12:00 to 13:00 dedicated to VMware Cloud™ on AWS, the on-demand service that lets you run applications in cloud environments based on VMware vSphere® and access a wide range of AWS services, taking full advantage of the AWS cloud while protecting existing VMware investments.
Many organizations reap the benefits of the cloud by migrating their Oracle workloads, securing considerable gains in agility and cost efficiency.
Migrating these workloads can, however, create complexity during application modernization and refactoring, and performance risks can be introduced when moving applications out of on-premises data centers.
Build your first serverless ledger-based app with QLDB and NodeJS | Amazon Web Services
Many companies today build applications with ledger-style functionality, for example to verify the history of credits and debits in banking transactions or to track the supply chain flow of their products.
At the heart of these solutions are ledger databases, which provide a transparent, immutable, and cryptographically verifiable transaction log, but they are complex and costly tools to manage.
Amazon QLDB eliminates the need to build custom, complex systems by providing a fully managed serverless ledger database.
In this session, we will see how to build a complete serverless application that uses QLDB's capabilities.
With the rise of microservice architectures and rich mobile and web applications, APIs are more important than ever for delivering a great end-user experience. In this session, we will learn how to address modern API design challenges with GraphQL, an open-source API query language used by Facebook, Amazon, and others, and how to use AWS AppSync, a managed serverless GraphQL service on AWS. We will dig into several scenarios, understanding how AppSync can help solve these use cases by building modern APIs with real-time and offline data update capabilities.
We will also learn how Sky Italia uses AWS AppSync to deliver real-time sports updates to users of its web portal.
Oracle databases and VMware Cloud™ on AWS: debunking the myths | Amazon Web Services
Many organizations reap the benefits of the cloud by migrating their Oracle workloads, securing considerable gains in agility and cost efficiency.
Migrating these workloads can, however, create complexity during application modernization and refactoring, and performance risks can be introduced when moving applications out of on-premises data centers.
In these slides, AWS and VMware experts present simple, practical tips to ease and simplify the migration of Oracle workloads while accelerating the transformation to the cloud; they take a deeper look at the architecture and demonstrate how to take full advantage of VMware Cloud™ on AWS.
1) The document discusses building a minimum viable product (MVP) using Amazon Web Services (AWS).
2) It provides an example of an MVP for an omni-channel messenger platform that was built from 2017 to connect ecommerce stores to customers via web chat, Facebook Messenger, WhatsApp, and other channels.
3) The founder discusses how they started with an MVP in 2017 with 200 ecommerce stores in Hong Kong and Taiwan, and have since expanded to over 5000 clients across Southeast Asia using AWS for scaling.
This document discusses pitch decks and fundraising materials. It explains that venture capitalists will typically spend only 3 minutes and 44 seconds reviewing a pitch deck. Therefore, the deck needs to tell a compelling story to grab their attention. It also provides tips on tailoring different types of decks for different purposes, such as creating a concise 1-2 page teaser, a presentation deck for pitching in-person, and a more detailed read-only or fundraising deck. The document stresses the importance of including key information like the problem, solution, product, traction, market size, plans, team, and ask.
This document discusses building serverless web applications using AWS services like API Gateway, Lambda, DynamoDB, S3 and Amplify. It provides an overview of each service and how they can work together to create a scalable, secure and cost-effective serverless application stack without having to manage servers or infrastructure. Key services covered include API Gateway for hosting APIs, Lambda for backend logic, DynamoDB for database needs, S3 for static content, and Amplify for frontend hosting and continuous deployment.
This document provides tips for fundraising from startup founders Roland Yau and Sze Lok Chan. It discusses generating competition to create urgency for investors, fundraising in parallel rather than sequentially, having a clear fundraising narrative focused on what you do and why it's compelling, and prioritizing relationships with people over firms. It also notes how the pandemic has changed fundraising, with examples of deals done virtually during this time. The tips emphasize being fully prepared before fundraising and cultivating connections with investors in advance.
AWS_HK_StartupDay_Building Interactive websites while automating for efficien... | Amazon Web Services
This document discusses Amazon's machine learning services for building conversational interfaces and extracting insights from unstructured text and audio. It describes Amazon Lex for creating chatbots, Amazon Comprehend for natural language processing tasks like entity extraction and sentiment analysis, and how they can be used together for applications like intelligent call centers and content analysis. Pre-trained APIs simplify adding machine learning to apps without requiring ML expertise.
Amazon Elastic Container Service (Amazon ECS) is a highly scalable container management service that simplifies the management of Docker containers through an orchestration layer controlling deployment and lifecycle. In this session, we will present the service's main features, reference architectures for different workloads, and the simple steps needed to quickly migrate one or more of your containers.
AC Atlassian Coimbatore Session Slides (22/06/2024) | apoorva2579
These are the combined session slides from the ACE Atlassian Coimbatore event held on 22nd June 2024.
The session order is as follows:
1. AI and the future of the help desk by Rajesh Shanmugam
2. Harnessing the power of GenAI for your business by Siddharth
3. Fallacies of GenAI by Raju Kandaswamy
How to Avoid Learning the Linux-Kernel Memory Model | ScyllaDB
The Linux-kernel memory model (LKMM) is a powerful tool for developing highly concurrent Linux-kernel code, but it also has a steep learning curve. Wouldn't it be great to get most of LKMM's benefits without the learning curve?
This talk will describe how to do exactly that by using the standard Linux-kernel APIs (locking, reference counting, RCU) along with simple rules of thumb, thus gaining most of LKMM's power with less learning. And the full LKMM is always there when you need it!
Quantum Communications Q&A with Gemini LLM. These are based on Shannon's noisy channel theorem and explore how the classical theory applies to the quantum world.
An invited talk given by Mark Billinghurst on Research Directions for Cross Reality Interfaces. This was given on July 2nd 2024 as part of the 2024 Summer School on Cross Reality in Hagenberg, Austria (July 1st - 7th)
Are you interested in learning about creating an attractive website? Here it is! Take part in the challenge that will broaden your knowledge about creating cool websites! Don't miss this opportunity, only in "Redesign Challenge"!
Blockchain technology is transforming industries and reshaping the way we conduct business, manage data, and secure transactions. Whether you're new to blockchain or looking to deepen your knowledge, our guidebook, "Blockchain for Dummies", is your ultimate resource.
The DealBook is our annual overview of the Ukrainian tech investment industry. This edition comprehensively covers the full year 2023 and the first deals of 2024.
Details of description part II: Describing images in practice - Tech Forum 2024 | BookNet Canada
This presentation explores the practical application of image description techniques. Familiar guidelines will be demonstrated in practice, and descriptions will be developed “live”! If you have learned a lot about the theory of image description techniques but want to feel more confident putting them into practice, this is the presentation for you. There will be useful, actionable information for everyone, whether you are working with authors, colleagues, alone, or leveraging AI as a collaborator.
Link to presentation recording and transcript: https://bnctechforum.ca/sessions/details-of-description-part-ii-describing-images-in-practice/
Presented by BookNet Canada on June 25, 2024, with support from the Department of Canadian Heritage.
How Social Media Hackers Help You to See Your Wife's Message.pdf | HackersList
In the modern digital era, social media platforms have become integral to our daily lives. These platforms, including Facebook, Instagram, WhatsApp, and Snapchat, offer countless ways to connect, share, and communicate.
In this follow-up session on knowledge and prompt engineering, we will explore structured prompting, chain of thought prompting, iterative prompting, prompt optimization, emotional language prompts, and the inclusion of user signals and industry-specific data to enhance LLM performance.
Join EIS Founder & CEO Seth Earley and special guest Nick Usborne, Copywriter, Trainer, and Speaker, as they delve into these methodologies to improve AI-driven knowledge processes for employees and customers alike.
2. Relational data warehouse
Massively parallel; Petabyte scale
Fully managed
HDD and SSD Platforms
$1,000/TB/Year; starts at $0.25/hour
Amazon Redshift: a lot faster, a lot simpler, a lot cheaper
4. Data loading options
• Parallel upload to Amazon S3
• AWS Direct Connect
• AWS Import/Export
• Amazon Kinesis
• Systems integrators
(Partner logo categories shown: Data Integration, Systems Integrators)
5. Amazon Redshift architecture
Leader Node
• Simple SQL endpoint (JDBC/ODBC)
• Stores metadata
• Optimizes the query plan
• Coordinates query execution
Compute Nodes
• Local columnar storage
• Parallel/distributed execution of all queries, loads, backups, restores, and resizes
• Start at $0.25/hour, grow to 2 PB (compressed)
• DC1: SSD; scale from 160 GB to 326 TB
• DS2: HDD; scale from 2 TB to 2 PB
(Architecture diagram labels: JDBC/ODBC client connection to the leader node; 10 GigE (HPC) interconnect between compute nodes; ingestion, backup, and restore paths)
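A quick way to see this parallel layout on a running cluster is to query the STV_SLICES system table, which maps slices to compute nodes; a minimal example:

SELECT node, slice
FROM stv_slices
ORDER BY node, slice;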
6. Amazon Redshift is priced to analyze all your data

DS2 (HDD), DW1.XL single node
                       Price per hour   Effective annual price per TB (compressed)
  On-Demand            $0.850           $3,725
  1-Year Reservation   $0.500           $2,190
  3-Year Reservation   $0.228           $999

DC1 (SSD), DW2.L single node
                       Price per hour   Effective annual price per TB (compressed)
  On-Demand            $0.250           $13,690
  1-Year Reservation   $0.161           $8,795
  3-Year Reservation   $0.100           $5,500
Pricing is simple
Number of nodes x price/hour
No charge for leader node
No upfront costs
Pay as you go
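As a rough check of the 3-year DS2 figure above (assuming 8,760 hours per year and 2 TB of compressed storage per DW1.XL/DS2.XL node):

  $0.228/hour x 8,760 hours/year ≈ $1,997 per node per year
  $1,997 / 2 TB ≈ $999 per TB per year (compressed)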
7. Common migration patterns
• Data from relational online transaction processing (OLTP) systems: the structure lends itself to SQL schemas
• Data from logs, devices, sensors, and similar sources: the data is less structured
8. Structured data loading
• Data is often loaded into a new warehouse from an existing ETL process
• The temptation is to “lift and shift” the workload
• Resist that temptation; instead consider:
• What do I really want to do?
• What do I need?
9. Ingesting less-structured data
• Some data does not lend itself to a relational schema
• Common pattern is to use Amazon EMR to:
• Impose structure
• Import into Amazon Redshift
• Other solutions are often home-grown scripting
applications
10. Loading data
• Load to an empty Amazon Redshift database
• Load changes captured in the source system to Amazon
Redshift
11. Truncate and load
This is by far the easiest option:
• Move the data to Amazon S3
• Multi-part upload
• Import/export service
• AWS Direct Connect
• COPY the data into Amazon Redshift, a table at a time
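A minimal sketch of the truncate-and-load pattern in Redshift SQL (the sales table, S3 prefix, and IAM role are hypothetical):

  -- Empty the table, then reload the full snapshot in one parallel COPY
  TRUNCATE TABLE sales;
  COPY sales
  FROM 's3://my-bucket/sales/part-'    -- prefix matches every part file
  IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
  DELIMITER '|'
  COMPUPDATE ON;                       -- analyze compression while the table is empty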
12. Load changes
• Identify changes in source systems
• Move data to Amazon S3
• Load changes:
• ‘Upsert process’
• Partner ETL tools
13. Partner ETL
• Amazon Redshift is supported by a variety of ETL
vendors
• Many simplify the process of data loading
• A variety of vendors offer a free trial of their products,
allowing you to evaluate and choose the one that suits
your needs
• Visit http://aws.amazon.com/redshift/partners
14. Upsert
• The goal is to insert new rows into and update changed
rows in Amazon Redshift
• Load data into a temporary staging table
• Join the staging table with production and delete the
common rows
• Copy the new data into the production table
• See Updating and Inserting New Data in the Amazon
Redshift Database Developer Guide
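A sketch of that merge pattern, following the approach in the Developer Guide (table and key names are hypothetical):

  BEGIN;

  -- Stage the incremental load
  CREATE TEMP TABLE staging (LIKE events);
  COPY staging
  FROM 's3://my-bucket/events/incremental/'
  IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
  DELIMITER '|';

  -- Delete production rows that the staging data replaces
  DELETE FROM events
  USING staging
  WHERE events.event_id = staging.event_id;

  -- Insert both the new and the changed rows
  INSERT INTO events SELECT * FROM staging;

  DROP TABLE staging;
  END;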
15. COPY command
• Set COMPUPDATE to ON when running on an empty
table
• Use the COPY command
• Each slice can load one file at a time
• Partition input files so all slices can load in parallel
• Use a manifest file
16. Use multiple input files to maximize throughput
• Use the COPY command
• Each slice can load one file at
a time
• A single input file means only
one slice is ingesting data
• Instead of 100 MB/s, you’re
getting only 6.25 MB/s
17. Use multiple input files to maximize throughput
• Use the COPY command
• You need at least as many
input files as you have slices
• With 16 input files, all slices
are working so you maximize
throughput
• Get 100 MB/s per node; scale
linearly as you add nodes
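To size the number of input files, count the slices in your cluster; STV_SLICES is a standard Redshift system table:

  -- Aim for a file count that is a multiple of this number
  SELECT COUNT(*) AS slice_count FROM stv_slices;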
18. Primary keys and manifest files
• Amazon Redshift doesn’t enforce primary key
constraints:
• If you load data multiple times, Amazon Redshift won’t complain
• If you declare primary keys in your data manipulation language
(DML), the optimizer expects the data to be unique
• Use manifest files to control exactly what is loaded and
how to respond if input files are missing:
• Define a JSON manifest on Amazon S3
• Ensures that the cluster loads exactly what you want
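A manifest is a small JSON file on S3; a sketch with hypothetical paths:

  {
    "entries": [
      {"url": "s3://my-bucket/events/part-0000.gz", "mandatory": true},
      {"url": "s3://my-bucket/events/part-0001.gz", "mandatory": true}
    ]
  }

Referencing it from COPY makes the load fail fast if a mandatory file is missing:

  COPY events
  FROM 's3://my-bucket/events/manifest.json'
  IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
  GZIP
  MANIFEST;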
20. Agenda
- Data Architecture
- Success Criteria
- Solutions Evaluated
- Additional Benefits
- Big Data Agility
- Summary
21. Boingo: Reaching 1 Billion Consumers Annually
• Media: largest ad network engaging mobile audiences via Wi-Fi; 90+ M ad engagements/year
• Wi-Fi: largest operator of airport wireless networks in the world; 1 million+ hotspots; 100 operator partners; 100+ countries across 6 continents
• DAS: largest operator of independent indoor cellular networks in the U.S.; 19 DAS locations; nearly 2,000 commercial locations
• Broadband: largest provider of wireless high-speed Internet & TV for the military
22. Boingo on AWS
• Compute and networking: Amazon EC2, AMIs, Elastic IPs, VPC (VPN connections, gateways), Route 53, route tables, ELB, Auto Scaling, ENIs, Lambda
• Storage and content delivery: S3, EBS, Glacier, CloudFront
• Database: RDS (Oracle 11g R2), MySQL DB, ElastiCache
• Data warehouse: Redshift
• Admin and security: CloudWatch, Trusted Advisor, IAM, CloudTrail, MFA tokens
• Deployment: Elastic Beanstalk, CloudFormation, OpsWorks
• App services: SQS
23. Data Architecture
1. ETL: SAP Data Services (engineering data, S3, flat files)
2. Data storage: Oracle RDS 11g (R2) database
3. Reporting: front-end visualization (Business Objects)
24. Issues
• Growing data is making OLAP slow
• Inefficient (mostly) row-based storage
• Standard Oracle compression
• Mediocre IOPS
• Single DB server (no concurrency)
• Not enough memory (64 GB)
• Administration:
– Partitioning
– DB patches and updates, OS patches and updates
– Maintenance (backups, snapshots, replication)
– Recovery failures, etc.
• Expensive (license, hardware, support, etc.)
25. Success Criteria
What do we need?
• Memory (at least 256 GB)
• Parallel processing
• Plenty of IOPS
• Less administration
• Low TCO
Growth rate:
• Currently at 15 TB
• 2-3 TB average growth per year
Nice to have:
• Ingest any data type/store
• Real-time streaming analysis
• Massively parallel processing
• Scale (up or down)
• Integrate any (and every) database
• Multiple levels of security
• Smart alerts and monitoring
• Cost effective
• Lower (or zero) CAPEX
• Keep up with the industry
Security/compliance:
• Automated audit reporting
27. AWS Data Solutions
• RDS: Oracle, SQL Server, PostgreSQL, MySQL, Aurora (MySQL compatible)
• NoSQL: small and large scale; non-RDS; schemaless
• In memory: open source memcached/Redis; works with any database
• Data warehouse: Redshift; petabyte scale; massively parallel processing
All fully managed, no CAPEX, highly secure, scalable
• DAT202: Understanding Database Options on AWS (Wednesday, Oct 7, 11:00 AM - 12:00 PM, San Polo 3501B)
• DAT302 - Relational Database Management Systems in the Cloud: Deploying SQL Server on AWS (Thursday, Oct 8, 5:30 PM - 6:30 PM, San Polo 3501B)
• DAT303: Oracle on AWS and Amazon RDS: Secure, Fast and Scalable (Friday, Oct 9 9:00-10AM, Delfino 4102)
28. Redshift TCO
1. ETL: EaaS, eng. data, S3, flat files (annual cost: ~$6,500)
2. Data storage: Redshift data warehouse (annual cost: $48,500)
- Equivalent to a cluster of 50 DB servers: 100 CPU cores, 8 TB SSD storage, 750 GB memory
- Self-organizing cluster(s); grows in 160 GB increments
3. BI reports: front-end visualization (Business Objects)
Total annual cost: ~$55,000
Managed service covers database installs and patches, OS installs and patches, backup, replication, server maintenance, scaling, security, etc.
30. Performance Results (latency in seconds)
• Query performance (1 year of data): existing system 7,200 vs. Redshift 15
• Data load performance (1 million records): existing system 2,700 vs. Redshift 15
• ETL annual cost: existing system $55,000 vs. Redshift $6,500
31. Migration and Ease of Use
• Administration and support are handled by the managed service: database installs and patches, OS installs and patches, backup, replication, server maintenance, scaling, security, etc.
• Migration time (in months): Redshift, 2; other systems, 4
32. TCO
Estimated cluster (sized as a cluster of 50 DB servers: 100 CPU cores, 8 TB SSD storage, 750 GB memory; self-organizing cluster(s); 160 GB increments): $48,500
Actual cluster: $12,000
Elasticity and Reserved Instance savings:
• 40% for up to a 1-year term; 60% for up to a 3-year term
• Options: no upfront, 20%*; partial upfront, 41%-73%; all upfront, 42%-76%
• Cancellation: full refund within 7 days*; prorated refund within 30 days*; prorated refund within 90 days
ETL: Talend ($6,500); Python scripts ($0)
* For a 1-year term RI
- ISM208 - The Science of Saving with AWS Reserved Instances (Wednesday, Oct 7, 1:30 PM - 2:30 PM, Delfino 4105)
33. Additional Benefits
1. Access control
• “Deny all” DB cluster
• Firewall rules
• IAM management
2. VPC
• BYOIP
• Ingress access
• Extend to the corporate data center
3. Subnets
• Further isolation inside the VPC
• IAM management
Cloud
• MFA
• Encryption: in transit, SSL with TLS v1.2; at rest, storage encryption
• SEC302 - IAM Best Practices to Live By (Wednesday, Oct 7, 1:30 PM - 2:30 PM, Palazzo K)
• NET201 - Creating Your Virtual Data Center: VPC Fundamentals and Connectivity Options (Wednesday, Oct 7, 1:30 PM - 2:30 PM, Titian 2201B)
• ARC403 - From One to Many: Evolving VPC Design (Wednesday, Oct 7, 2:45 PM - 3:45 PM, Palazzo N)
35. Monitoring and Alerts
Intrusion detection service:
• DDoS
• MITM
• IP spoofing
• Packet sniffing
• Port monitoring
• DVO303 - Scaling Infrastructure Operations with AWS Service Catalog, AWS Config, and AWS CloudTrail (Friday, Oct 9, 9:00 AM - 10:00 AM, Lido 3001B)
• ARC302 - Running Lean Architectures: How to Optimize for Cost Efficiency (Friday, Oct 9, 9:00 AM - 10:00 AM, Palazzo K)
36. Big Data Agility
Production data warehouse (equivalent to a cluster of 50 DB servers: 100 CPU cores, 8 TB SSD storage, 750 GB memory; self-organizing cluster(s); 160 GB increments)
Cloned from backup in under 30 minutes, each running at under $5/hour:
• QA cluster
• Predictive analysis/ad hoc cluster
• Performance cluster
DAT311 - Large-Scale Genomic Analysis with Amazon Redshift (Wednesday, Oct 7, 1:30 PM - 2:30 PM, Lando 4306)
DAT308 - How Yahoo! Analyzes Billions of Events a Day on Amazon Redshift (Thursday, Oct 8, 4:15 PM - 5:15 PM, Palazzo C)
BDT401 - Amazon Redshift Deep Dive: Tuning and Best Practices (Thursday, Oct 8, 2:45 PM - 3:45 PM, Marcello 4506)
37. Summary
• (Very) cost efficient
• (Highly) secure (enterprise-grade encryption)
• Managed service (administration)
• Quick(er) migration time
• 167+ security and compliance features
• Proven to work (NASDAQ, NASA, Financial Times, Pinterest, etc.)
• Faster, with better performance
• Future proof (ecosystem, security, new services, etc.)
• 2+ years on AWS
• Ease of use
ROI
38. Related Sessions
• DAT311 - Large-Scale Genomic Analysis with Amazon Redshift (Wednesday, Oct 7, 1:30 PM - 2:30 PM, Lando 4306)
• DAT308 - How Yahoo! Analyzes Billions of Events a Day on Amazon Redshift (Thursday, Oct 8, 4:15 PM - 5:15 PM,
Palazzo C)
• BDT401 - Amazon Redshift Deep Dive: Tuning and Best Practices (Thursday, Oct 8, 2:45 PM - 3:45 PM, Marcello 4506)
• DAT202: Understanding Database Options on AWS (Wednesday, Oct 7, 11:00 AM - 12:00 PM, San Polo 3501B)
• DAT302 - Relational Database Management Systems in the Cloud: Deploying SQL Server on AWS (Thursday, Oct 8, 5:30
PM - 6:30 PM, San Polo 3501B)
• DAT303: Oracle on AWS and Amazon RDS: Secure, Fast and Scalable (Friday, Oct 9 9:00-10AM, Delfino 4102)
• SEC302 - IAM Best Practices to Live By (Wednesday, Oct 7, 1:30 PM - 2:30 PM, Palazzo K)
• NET201 - Creating Your Virtual Data Center: VPC Fundamentals and Connectivity Options (Wednesday, Oct 7, 1:30 PM -
2:30 PM, Titian 2201B)
• ARC403 - From One to Many: Evolving VPC Design (Wednesday, Oct 7, 2:45 PM - 3:45 PM, Palazzo N)
• DVO303 - Scaling Infrastructure Operations with AWS Service Catalog, AWS Config, and AWS CloudTrail (Friday, Oct 9,
9:00 AM - 10:00 AM, Lido 3001B)
• ISM208 - The Science of Saving with AWS Reserved Instances (Wednesday, Oct 7, 1:30 PM - 2:30 PM, Delfino 4105)
• ARC302 - Running Lean Architectures: How to Optimize for Cost Efficiency (Friday, Oct 9, 9:00 AM - 10:00 AM, Palazzo K)
Tracks: Redshift, Databases, Infrastructure, Cost
41. 59% of car buyers influenced by Edmunds.com
*Source: R. L. Polk & Co.
43. Edmunds.com
• 18M unique visitors a month
• 200M+ page views a month
• Over 10K dealer partners
• 14K+ API users
• Over 6M automotive inventory listings
• Over 1M content pages
• Lots and lots of data, continuously growing
• 24x7 real-time BI
• DWH in Amazon Redshift: a 32-node cluster
44. Improvement
From unsustainable, painful operations to:
• An efficient, cost-effective cluster
• Squeak-free operations
• Happy customers
• Cost reduction (the new system costs 1/5 of the old one)
45. Challenges
• Painfully slow queries
• High system resource utilization
• Slow data loading
• Timeouts!
• …all in all, we were running into HUGE PROBLEMS
46. Lessons learned
• Know the system, the strengths, and the limitations
• Understand the end-to-end usage scenario
• Design the processes following Best Practices
• Invest in real-time monitoring
• Lift and shift may not be the best choice
• Let Enterprise Support and TAMs be your partners
• Monitor, monitor, and trend
47. The System, the infrastructure
• Syntactical differences (e.g., PostgreSQL 7 vs. PostgreSQL 8)
• Architectural choices (e.g., columnar database)
• Transaction processing vs. historical data analysis and business intelligence
• Node type, cluster size
• Shared infrastructure vs. dedicated throughput
• The larger the cluster, the bigger the resizing effort
48. Make the up-front investment: Design
• Select the right sort key (see the sketch below)
• Timestamps, columns used in range filters, join columns
• Compound sort key or interleaved sort key
• Measure query performance, system load, and vacuum time
• Simply ensuring tables have a sort key gained us 20% performance
• Over 50% of our tables did not have a sort key
• Assigning the right sort key is the path to winning
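A minimal sketch (the table and column names are hypothetical):

  -- Compound sort key leading with the timestamp most queries range-filter on
  CREATE TABLE page_views (
    view_ts  TIMESTAMP NOT NULL,
    site_id  INTEGER   NOT NULL,
    url      VARCHAR(2048),
    user_id  BIGINT
  )
  COMPOUND SORTKEY (view_ts, site_id);

Range filters on view_ts can then skip most disk blocks.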
49. Make the up-front investment: Use cases
• Select the right distribution style (a sketch follows this list)
• Locate data faster
• Uniform load
• Less data movement
• A good distribution style ensures a healthy system
• Many of our tables did not have the right distribution style
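A sketch of KEY and ALL distribution, assuming a hypothetical orders/customers join:

  -- Co-locate joining rows on the same slice
  CREATE TABLE orders (
    order_id    BIGINT,
    customer_id BIGINT,
    order_ts    TIMESTAMP
  )
  DISTSTYLE KEY DISTKEY (customer_id)
  SORTKEY (order_ts);

  -- Small, frequently joined dimension: replicate to every node
  CREATE TABLE customers (
    customer_id BIGINT,
    name        VARCHAR(256)
  )
  DISTSTYLE ALL;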
50. Queries
• SELECT * is the #1 performance killer
• Use a WHERE clause on the primary sort column (see the example below)
• Watch out for queries that create “temporary tables”
• Long-running queries might impact downstream services
• Define constraints
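Continuing the hypothetical page_views table from slide 48:

  -- Name only the columns you need; filter on the leading sort column
  SELECT site_id, COUNT(*) AS views
  FROM page_views
  WHERE view_ts >= '2015-09-01' AND view_ts < '2015-10-01'
  GROUP BY site_id;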
51. VACUUM
• Run VACUUM frequently
• Run right after loading data
• Monitor vacuum time
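A sketch (page_views is hypothetical; SVV_VACUUM_PROGRESS is a standard Redshift system view):

  -- Re-sort rows and reclaim deleted space after a load
  VACUUM FULL page_views;
  ANALYZE page_views;   -- refresh planner statistics afterwards

  -- Monitor a running vacuum
  SELECT * FROM svv_vacuum_progress;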
52. Data loading
• Load data in sort key order
• Load using multiple files (1 MB to 1 GB)
• #files: Multiples of slices in cluster
• Use compression
• Use a single COPY command per table (sketch below)
• S3 is your best friend
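Pulling these points together in one sketch (paths are hypothetical; assumes a 16-slice cluster, so 16 gzipped parts of roughly equal size, written in sort key order):

  -- One COPY per table; the prefix matches all 16 gzipped parts
  COPY page_views
  FROM 's3://my-bucket/page_views/2015-10-01/part-'
  IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
  GZIP
  DELIMITER '|';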
53. A closer look
• Each node is split into slices
• One slice per core
• Each slice is allocated
memory, CPU, and disk
space
• Each slice processes a
piece of the workload in
parallel
56. Monitoring
• Console/Amazon CloudWatch monitoring
• CPU, memory, processes
• Data distribution across slices
• Space used per table
• WLM query count, queue wait time, execution time
• Commit stats, top time-consuming queries
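Beyond CloudWatch, the system views expose much of this; for example, SVV_TABLE_INFO (a standard Redshift view) reports size, skew, and unsorted data per table:

  -- Ten largest tables with their distribution skew and unsorted fraction
  SELECT "table", size AS size_mb, skew_rows, unsorted
  FROM svv_table_info
  ORDER BY size DESC
  LIMIT 10;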
57. In closing
• Amazon Redshift is a great data warehousing platform
• Parting advice: invest in best practices
• Check out Redshift Utils (https://github.com/awslabs/amazon-redshift-utils)