© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
David Stein, Business Development EBS
November 30, 2016
Case Study: How Zendesk and
Videology Modernized Their Big
Data Platforms on Amazon EBS
STG311
What to Expect from the Session
• How to architect big data processing platforms to scale to meet
growing demand while improving performance, availability, and cost
with Amazon EBS
• Learn about the new st1 (Throughput Optimized HDD) and sc1 (Cold HDD)
EBS volumes designed for big data workloads
• Overview of how Zendesk runs a large ELK (Elasticsearch,
Logstash, Kibana) stack on Amazon EC2 and EBS for their cloud-based
customer support platform
• Overview of how Videology runs a Hadoop architecture on EC2 and
EBS to ingest, process, and analyze logs for their converged
advertising solution
AWS storage is a platform
• File: Amazon EFS
• Block: Amazon EBS, Amazon EC2 Instance Store
• Object: Amazon S3, Amazon Glacier
• Data Transfer: AWS Direct Connect, ISV Connectors, Amazon Kinesis Firehose, AWS Storage Gateway, Amazon S3 Transfer Acceleration, AWS Snowball, Amazon CloudFront, Internet/VPN
EBS volume types
Solid state drive (SSD)
• General Purpose SSD (gp2)
• Provisioned IOPS SSD (io1)
Hard disk drive (HDD)
• Throughput Optimized HDD (st1)
• Cold HDD (sc1)
EBS volume types: throughput
Throughput Optimized HDD (st1)
• Baseline: 40 MB/s per TB up to 500 MB/s
• Burst: 250 MB/s per TB up to 500 MB/s
• Capacity: 500 GB to 16 TB
• Ideal for large-block, high-throughput sequential workloads
Cold HDD (sc1)
• Baseline: 12 MB/s per TB up to 192 MB/s
• Burst: 80 MB/s per TB up to 250 MB/s
• Capacity: 500 GB to 16 TB
• Ideal for sequential throughput workloads such as logging and backup
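The per-TB rates and caps on the two throughput slides can be turned into a small calculator. The following is a minimal sketch that assumes throughput scales linearly with volume size until it reaches the stated cap; the class and function names are illustrative and not part of any AWS SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HddVolumeType:
    """Throughput figures as listed on the slides (hypothetical helper type)."""
    name: str
    baseline_per_tb: float  # MB/s per TB
    baseline_cap: float     # MB/s
    burst_per_tb: float     # MB/s per TB
    burst_cap: float        # MB/s


ST1 = HddVolumeType("st1", 40, 500, 250, 500)
SC1 = HddVolumeType("sc1", 12, 192, 80, 250)


def throughput_mbps(volume_type, size_tb):
    """Return (baseline, burst) throughput in MB/s for a volume of size_tb."""
    # The slides list capacity as 500 GB to 16 TB for both volume types.
    if not 0.5 <= size_tb <= 16:
        raise ValueError("size must be between 0.5 TB and 16 TB")
    # Assumption: linear scaling with size, clipped at the stated cap.
    baseline = min(volume_type.baseline_per_tb * size_tb, volume_type.baseline_cap)
    burst = min(volume_type.burst_per_tb * size_tb, volume_type.burst_cap)
    return baseline, burst


for vt in (ST1, SC1):
    for size in (0.5, 2, 16):
        base, burst = throughput_mbps(vt, size)
        print(f"{vt.name} {size} TB: baseline {base} MB/s, burst {burst} MB/s")
```

Under these assumptions, an st1 volume reaches its 500 MB/s burst cap at 2 TB, while its baseline only reaches the cap at 12.5 TB, which is why volume size matters for sustained sequential workloads.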
Kyle House, David Bernstein, Zendesk
November 30, 2016
Case Study: How Zendesk Modernized
Their Big Data Platforms on Amazon EBS
Inside Our New ELK Deployment
Zendesk builds software for better customer relationships. It empowers
organizations to improve customer engagement and better understand
their customers. More than 87,000 paid customer accounts in over 150
countries and territories use Zendesk products. Based in San Francisco.
What to Expect from the Session
• Discuss storage redesign, utilizing new Amazon EBS volumes
• Talk through design choices
• Explain benefits of new storage model
• Cost benefits of “rightsizing” storage
ELK at Zendesk
• Elasticsearch: distributed database
• Logstash: log ingestion/parsing
• Kibana: beautiful visualizations
AWS re:Invent 2016: Case Study: How Videology and Zendesk Modernized Their Big Data Platforms on Amazon EBS (STG311)
The Problem
- Operational headaches
- Encryption
- Data retention
- Cost too high
The Investigation
- User access patterns
- Performance requirements
- New EBS volume types
The Proposal
- Full usage of EBS with new volume types
- Create a tiered storage model
- Optimize instance types; decouple instances from
storage
Tiered storage
• Hot (0-7 days): General Purpose SSD (gp2)
• Warm (8-30 days): Throughput Optimized HDD (st1)
• Cold (31-60 days): Cold HDD (sc1)
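The age boundaries of the tiered model can be expressed as a simple lookup. This is a sketch only: it assumes log data is organized by day (for example, daily indices), and the function names are hypothetical. The slides do not describe how Zendesk actually moves data between tiers.

```python
from datetime import date

# (maximum age in days, tier name, EBS volume type), per the tiered storage slide.
TIERS = [
    (7, "hot", "gp2"),
    (30, "warm", "st1"),
    (60, "cold", "sc1"),
]


def tier_for_age(age_days):
    """Return (tier, volume_type) for data of the given age, or None if expired."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    for max_age, tier, volume_type in TIERS:
        if age_days <= max_age:
            return tier, volume_type
    return None  # older than the 60-day retention window


def tier_for_day(data_day, today):
    """Convenience wrapper: derive the age from two calendar dates."""
    return tier_for_age((today - data_day).days)


print(tier_for_day(date(2016, 11, 28), date(2016, 11, 30)))
print(tier_for_day(date(2016, 11, 1), date(2016, 11, 30)))
print(tier_for_day(date(2016, 10, 15), date(2016, 11, 30)))
```

Keeping the boundaries in one table makes it easy to adjust the retention windows without touching the lookup logic.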
Topology
VPN gateway, Proxy, Bastion
Two Availability Zones, each with:
- 3 x m4.large esclient/esmaster: gp2 roots
- 8 x c4.large logindexers: gp2 roots
- 10 x r3.2xlarge esdata: gp2 roots + 11G (hot); st1 35G (warm); sc1 80G (cold)
Sparkleformation
The Result
- Reduced operating costs by 50%
- Increased data retention 3x
- Predictable scaling model
• Storage allocation detached from instance count
- Increased data transport reliability
- Reduced operational overhead
- Increased cluster stability
49% Reduction 79% Reduction
Recommendations
- Identify data usage model before you build
- Find places where performance matters, and where
cost can be optimized
- Reduce over-provisioned storage/IOPS
- Utilize AWS managed services whenever possible
Thank you!
Up next in this session:
Videology
Videology
Paul Frederiksen – Principal DevOps Engineer
David Ortiz – Senior Software Engineer
Videology Big Data Team
November 30, 2016
On the Rocky Road to EBS
Videology’s Journey to EBS-backed Big Data
What to Expect from the Session
• Intro to Videology
• Challenges
• Road to EBS-backed cluster
• Happy engineers
Videology overview
Videology provides a converged advertising solution that is screen-agnostic, ensuring unduplicated reach with the right frequency cadence to achieve guaranteed results.
• Founded: 2007 by Scott Ferber, co-founder of Advertising.com, which sold to AOL Time Warner in 2004 for $497 million
• Corporate headquarters: New York, NY
• Operations: operating in 28 global markets; key offices in New York, Baltimore, Toronto, London, Singapore, and Sydney
• Employees: approximately 380
• Investors: NEA, Comcast Ventures, Harbourvest, Catalyst Investors, Pinnacle Ventures, Valhalla Venture
• Customers: 4,500 active users including brand marketers, agencies, trading desks, media companies, MVPDs
• Ecosystem integrations: open platform with 2200+ ecosystem integrations, including 1000+ media companies, 40 data providers, all major 3rd party verification providers, and dozens of technology partners across the media ecosystem
Industry accolades…
“Videology was named Best Digital Video Ad Platform by Cynopsis Media at their 2015 Model D Awards.”
“Videology was able to show that their platform drove brand lift that was on average 6X higher than Nielsen's norms.”
“Videology has the most sophisticated media optimizer to analyze the right allocation of TV and online video to optimize reach and campaign cost.”
Hadoop overview
NameNode
ResourceManager
Gateway
DataNode
NodeManager
Where does big data processing fit in?
Original production
Instance Type   Qty   Role               vCPU   RAM (GB)   Storage (GB)
m3.xlarge       1     Jumpbox            4      15         80
m3.xlarge       1     Cloudera Manager   4      15         80
m3.2xlarge      2     NN/RM              8      30         160
cc2.8xlarge     1     Service Master     32     60         3,200
cc2.8xlarge     30    Worker             32     60         3,200
I’ve got 99 problems and Hadoop is a few of them
Reliability
Scalability
Distcp
CPU to Memory Ratio
• 2015: Engaged Cloudera for EBS support
• Q2 2016: Gave up on EBS and tested D2s
• Q3 2016: New EBS to the rescue!
• Q4 and beyond: Take advantage of new hardware
CC2.8XL, M4.10XL, D2.8XL
Old
Not enough disk
Expensive
Nirvana
Lots of disk!
Not enough memory
Expensive
D2.8xl prototype
Instance Type   Qty   Role               vCPU   RAM (GB)   Storage (GB)
r3.large        1     Jumpbox            2      15.25      32
r3.large        1     Cloudera Manager   2      15.25      32
r3.xlarge       2     NN/RM              8      30         160
r3.2xlarge      2     Service Master     8      61         160
d2.8xl          10    Worker             36     244        48,000
M4.10xlarge w/ sc1 prototype
Instance Type   Qty   Role               vCPU   RAM (GB)   Storage (GB)
r3.large        1     Jumpbox            2      15.25      32
r3.large        1     Cloudera Manager   2      15.25      32
r3.xlarge       2     NN/RM              8      30         160
r3.2xlarge      2     Service Master     8      61         160
m4.10xlarge     18    Worker             40     160        4,000
M4.10xlarge w/ st1 prototype
Instance Type   Qty   Role               vCPU   RAM (GB)   Storage (GB)
r3.large        1     Jumpbox            2      15.25      32
r3.large        1     Cloudera Manager   2      15.25      32
r3.xlarge       2     NN/RM              8      30         160
r3.2xlarge      2     Service Master     8      61         160
m4.10xlarge     18    Worker             40     160        8,000
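To compare the worker fleets across the original cluster and the three prototypes, the figures from the tables can be aggregated. This is a rough sketch using only the Qty, vCPU, RAM, and Storage values shown on the slides; the class name and labels are illustrative, and the totals ignore HDFS replication and any non-worker nodes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerFleet:
    """Worker row from one of the cluster tables (hypothetical helper type)."""
    label: str
    qty: int
    vcpu: int
    ram_gb: float
    storage_gb: int

    @property
    def ram_per_vcpu(self):
        return self.ram_gb / self.vcpu

    @property
    def total_ram_gb(self):
        return self.qty * self.ram_gb

    @property
    def total_storage_gb(self):
        return self.qty * self.storage_gb


FLEETS = [
    WorkerFleet("cc2.8xlarge (original)", 30, 32, 60, 3_200),
    WorkerFleet("d2.8xl prototype", 10, 36, 244, 48_000),
    WorkerFleet("m4.10xlarge w/ sc1", 18, 40, 160, 4_000),
    WorkerFleet("m4.10xlarge w/ st1", 18, 40, 160, 8_000),
]

for fleet in FLEETS:
    print(
        f"{fleet.label}: {fleet.ram_per_vcpu:.2f} GB RAM per vCPU, "
        f"{fleet.total_ram_gb:,.0f} GB RAM total, "
        f"{fleet.total_storage_gb:,} GB storage total"
    )
```

On these numbers the m4.10xlarge workers offer roughly twice the memory per vCPU of the original cc2.8xlarge workers, which speaks to the "CPU to Memory Ratio" problem listed earlier, and EBS lets storage per worker vary (4,000 GB vs. 8,000 GB) without changing the instance type.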
Problems no more!
• No more rebuilding Nodes
• 1 critical incident since switch vs. 5 in the year prior to release
• Get to play with kids instead of babysitting cluster
Engineering benefits - capacity
No longer restricted by memory, we now have resources to pursue other tools to improve our reliability and speed:
• Spark
• HBase
• Flafka
• Offloading processing from Amazon Redshift to CDH
More resilient to log volume increases
Can expand storage as requirements change
Financial benefits
[Chart: Total Cost and Cost by Utilization, Cc2 vs. M4; axis from $0.00 to $30,000.00]
[Chart: Cost to Process 1000 Requests, Cc2 vs. M4; axis from $0.00 to $0.10]
Thank you!
Questions?
Remember to complete
your evaluations!
