BDT101 Big Data with Amazon Elastic MapReduce - AWS re: Invent 2012
BIG-DATA
  When your data sets become
 so large that you have to start
innovating how to collect, store,
      analyze and share it
3Vs: Volume, Velocity, Variety
BIG-DATA
   The collection and
analysis of large amounts
     of data creates
 competitive advantage
BIGGER IS BETTER
Online Population
    Mobile Phone
    Machine Data
1 Trillion Objects!
COLLECT | STORE | ANALYZE | SHARE
•   Stream data to Amazon using Apache Flume
    • Amazon S3
    • Amazon Elastic MapReduce
COLLECT | STORE | ANALYZE | SHARE
[Chart: storage options plotted by data size (small to large) and structure (high to low): RDS, DynamoDB, HBase, EMR HDFS, S3, and logs on app servers]
ANALYZE
ORGANIZE | CLEAN | ENRICH | CONDENSE
DynamoDB Table: Daily-Orders (NoSQL Table)
On-Premise DB Table: Customer-Demographics (SQL Table)
RDS Table: Targeting-Information
DynamoDB Table: Daily-Orders (NoSQL Table)
On-Premise DB Table: Customer-Demographics (SQL Table)
S3://clickstream-data/ (Apache Logs)
3rd Party Data: Social Networking Information (accessed via web API)
RDS Table: Targeting-Information
S3 file:
s3://weekly-trend-data/
CSV Report


S3 file:
s3://monthly-trend-data/
CSV Report
AMAZON ELASTIC MAPREDUCE
Reduces complexity/cost of Hadoop Management
Integrates seamlessly with AWS Services
Leverages unmatched operational experience
Hadoop on Elastic MapReduce
lowers the cost of developing and
  operating a distributed system.
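
For readers new to the programming model that Hadoop implements, the following is a minimal, single-process sketch of a map, shuffle, and reduce pass that counts requests per URL path. It is not taken from the talk and does not use Hadoop or Amazon EMR APIs; the sample log lines and the function names (map_line, shuffle, reduce_counts, run_job) are assumptions made purely for illustration.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical sample lines in Apache common log format (not from the talk).
LOG_LINES = [
    '10.0.0.1 - - [27/Nov/2012:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 512',
    '10.0.0.2 - - [27/Nov/2012:10:00:01 +0000] "GET /cart HTTP/1.1" 200 128',
    '10.0.0.1 - - [27/Nov/2012:10:00:02 +0000] "GET /index.html HTTP/1.1" 304 0',
]


def map_line(line):
    """Map step: emit (path, 1) for each well-formed request line."""
    parts = line.split('"')
    if len(parts) < 3:
        return []  # skip lines without a quoted request section
    request = parts[1].split()
    if len(request) < 2:
        return []  # skip requests without a path
    return [(request[1], 1)]


def shuffle(pairs):
    """Group values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce step: sum the counts collected for one key."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle, and reduce in a single process."""
    mapped = chain.from_iterable(map_line(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


hits = run_job(LOG_LINES)
print(hits)
```

On a real cluster the map and reduce steps would run in parallel across many machines; the sketch only shows how the work is divided.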
Amazon EMR and Amazon S3
[Diagram labels: S3, Prod Cluster (EMR), EMR, Recommendation Engine, Ad-hoc Analysis, Personalization]
Data consumed in multiple ways
[Diagram labels: S3, Prod Cluster (EMR), Query Cluster (EMR), and additional EMR clusters]
[Diagram labels: EMR, DynamoDB, S3]
ANALYZE SHARE
VISUALIZE | EXPLORE | DECIDE
Big Data Use Cases
• Digital Advertising
• Web Analytics
• Log Processing
• Data Warehousing
Social Media/Advertising: Targeted Advertising; Image and Video Processing
Oil & Gas: Seismic Analysis
Retail: Recommendations; Transactions Analysis
Life Sciences: Genome Analysis
Financial Services: Monte Carlo Simulations; Risk Analysis
Security: Anti-virus; Fraud Detection; Image Recognition
Network/Gaming: User Demographics; Usage Analysis; In-game Metrics
Who is VivaKi?




           ©2011. All rights reserved. VivaKi. Proprietary and Confidential.
Big Data Challenge for VivaKi




Enablement | Activation | Attribution
The Product Solution – Fluent from Razorfish
A digital marketing technology platform that provides marketers and agencies with a single,
integrated software application to target, distribute, and manage multi-channel digital campaigns and
experiences.




Platform layers, top to bottom:
• Marketing Central (Marketing Planning and Management, Team Collaboration and Workflow)
• Experience Publishing (CMS / DMS, Multi-Channel and Multi-Device Distribution, Social Monitoring)
• Targeting (Multi-Channel Aware Segmentation and Targeting)
• Insights (Analytics and Reporting, including Attribution)
• Data Warehouse (Data Sources - 1st and 3rd Party, Data Normalization + Transformation, Data Management)
• Amazon Cloud Infrastructure
VivaKi Technology Solution




Example: Atlas Cookie Level Data

[Diagram labels: User Browsing Session, Click Stream, Ad Server Logs, Feed, Historical Click Stream Data, Data Mining, Segmentation & Categorization Algorithm, Apply Customization, Customer Loyalty Data, Ad Serving System, Cross Selling System]
Example: Atlas Cookie Level Data

Operational Specifics
                   Traditional Data Center Solution              Amazon Cloud Solution
  Configuration    30 Processing Servers (HP ProLiant DL-360)    EMR Cluster of up to 1000 EC2 Instances
                   3 SQL Servers (HP ProLiant DL-580)            200GB additional S3 storage per month
                   10TB SAN Storage
  Processing       2 to 30 hours                                 reliably 9 hours
  Data Retention   90 days                                       18 months
  System Cost      $5000/month                                   $10000/month
  Personnel Cost   $15000/month                                  $5500/month

Business Impact
• No upfront investment in hardware
• No hardware procurement delay
• No additional operations staff was hired
• We completed development and testing of our first client project in six weeks. Our process is completely automated.
• Our first client campaign experienced a 500% increase in their return on ad spend
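
The slide lists system and personnel costs separately. A short worked calculation, using only the monthly figures shown above, makes the combined comparison explicit. The variable and function names are illustrative, and the sketch assumes these two line items are the whole monthly cost, which the slide does not state.

```python
# Monthly figures as listed on the slide, in USD.
traditional = {"system": 5000, "personnel": 15000}
cloud = {"system": 10000, "personnel": 5500}


def monthly_total(costs):
    """Sum the listed monthly line items for one solution."""
    return sum(costs.values())


traditional_total = monthly_total(traditional)  # 5000 + 15000
cloud_total = monthly_total(cloud)              # 10000 + 5500
savings = traditional_total - cloud_total
savings_pct = 100 * savings / traditional_total

print(f"Traditional: ${traditional_total}/month")
print(f"Cloud:       ${cloud_total}/month")
print(f"Difference:  ${savings}/month ({savings_pct:.1f}% lower)")
```

On these figures the system cost doubles, but the drop in personnel cost more than offsets it: $20000/month versus $15500/month, a difference of $4500/month (22.5%).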
Better?
Search Ads Restyled
[Chart annotations: Etsy on Oprah, Search Ads Restyled, Hurricane Strikes, Justin Bieber Sneezes, New Cat Meme]
5%

95%
Thank you!


aws.amazon.com/big-data
We are sincerely eager to
hear your FEEDBACK on this
presentation and on re:Invent.

 Please fill out an evaluation
   form when you have a
            chance.