Common Sense Performance Indicators

Nick Gerner
June 24, 2010
Goals
 Common Sense in the Cloud
     same as outside the cloud


1. Tune performance
2. Investigate issues
3. Visualize architecture
Nick Gerner
              www.nickgerner.com
                  @gerner

•   Formerly senior engineer at SEOmoz
•   Linkscape: index of the web for SEO
•   Lead data services
•   Developer
•   Back-end ops guy
SEOmoz
• Seattle-based Startup (~7 engineers)
• SEO Blog and Community
• Toolset and Platform
    OpenSiteExplorer.org
• 300TB/month processing pipeline
• 5 mil req/day API hits
SEOmoz Engineering
• 50 < nodes < 500
• AWS based since 2008
  – EC2 – linux root access to bare VM
  – S3 – networked disk
  – EBS – local disk I/O
  – ELB – load balancing as a service
SEOmoz Architecture: Processing
[Diagram, left to right: The Web, Crawlers, Raw Storage, Process, Prepare. Caption: Data Pipeline]
SEOmoz Architecture: API
[Diagram: S3; three rows of Memcache, App, Lighttpd; ELB; clients: Partners, SEOmoz Apps]
End-to-End Performance Indicators
• Latency
• Conversion Rate
• DNS
• Time to On-load
• Web Object Count
Great
...but not the focus of this talk

Performance Indicators
• System Characteristics: CPU, Mem, Disk, Net
• App Stack: Front-End, Middleware, Caching, Back-end, Database, WS-API
[Diagram labels between the two groups: "Drives", "Competes For"]

                             http://www.flickr.com/photos/dnisbet/3118888630/
/proc
• System stats
• Per-process stats
• It all comes from here
    ...but use tools to see it
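
As a rough illustration of the per-process stats that live under /proc, the sketch below parses one line in the `/proc/[pid]/stat` format and pulls out the user and system CPU tick counters. The field positions follow the proc(5) man page; the function name `parse_pid_stat` and the sample line are hypothetical and not from the talk.

```python
def parse_pid_stat(text):
    """Parse one line in /proc/[pid]/stat format into a small dict."""
    # The command name sits in parentheses and may itself contain spaces
    # or parentheses, so split on the LAST ')' rather than on whitespace.
    head, _, tail = text.rpartition(")")
    pid_str, _, comm = head.partition("(")
    fields = tail.split()
    return {
        "pid": int(pid_str),
        "comm": comm,
        "state": fields[0],              # field 3 in proc(5)
        "utime_ticks": int(fields[11]),  # field 14: time in user mode
        "stime_ticks": int(fields[12]),  # field 15: time in kernel mode
    }


# Hypothetical sample line, truncated after field 15 for brevity.
SAMPLE = "4242 (my app) S 1 4242 4242 0 -1 4194304 100 0 0 0 250 75"

if __name__ == "__main__":
    print(parse_pid_stat(SAMPLE))
```

On a Linux box the same function could be fed `open("/proc/self/stat").read()`. In day-to-day work the tools listed later in the deck do this parsing for you.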
System Characteristics

      Load Average
          CPU
        Memory
          Disk
        Network
Load Average
• Combines a few things
• Good place to start
• Explains nothing


                http://www.flickr.com/photos/maple03/4176389418/
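
A minimal sketch of using load average as a starting point: read the three averages from `/proc/loadavg` text and normalize by CPU count. On Linux the figure mixes runnable tasks with tasks in uninterruptible sleep (typically waiting on disk), which is why it tells you something is busy without saying what. The helper names and the sample string are assumptions for illustration.

```python
def parse_loadavg(text):
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg text."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)


def load_per_cpu(load, cpu_count):
    """Normalize a load average by the number of CPUs (at least 1)."""
    return load / max(cpu_count, 1)


# Hypothetical sample in /proc/loadavg format.
SAMPLE = "3.20 1.10 0.70 2/345 6789"

if __name__ == "__main__":
    one, five, fifteen = parse_loadavg(SAMPLE)
    # Assumes a 4-CPU node; os.cpu_count() would give the real number.
    print("1-minute load per CPU:", load_per_cpu(one, 4))
```

A per-CPU value well above 1 suggests work is queueing, but the CPU and disk breakdowns are still needed to say why.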
CPU
• Break out by process
• Break out user vs system
• User, System, I/O wait, Idle


                     http://www.flickr.com/photos/pacdog/213442876/
Why watch it?
•   Who's doing work
•   Is CPU maxed?
•   Blocked on I/O?
•   Compare to Load Average
                    http://www.flickr.com/photos/pacdog/213442876/
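
The user / system / I/O wait / idle split can be derived from two samples of the aggregate `cpu` line in `/proc/stat`. The sketch below shows the arithmetic on made-up counter values; the function names are illustrative, and tools such as dstat or collectd report the same breakdown without any code.

```python
# Counter order on the aggregate "cpu" line, per proc(5).
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Parse the aggregate 'cpu ...' line of /proc/stat into tick counters."""
    parts = line.split()
    if parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    return dict(zip(FIELDS, values))


def cpu_percentages(before, after):
    """Percentage of elapsed ticks spent in each state between two samples."""
    deltas = {name: after[name] - before[name] for name in before}
    total = sum(deltas.values())
    if total == 0:
        return {name: 0.0 for name in deltas}
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


# Hypothetical samples taken a few seconds apart.
BEFORE = "cpu 1000 0 500 8000 500 0 0 0"
AFTER = "cpu 1600 0 700 8100 600 0 0 0"

if __name__ == "__main__":
    pct = cpu_percentages(parse_cpu_line(BEFORE), parse_cpu_line(AFTER))
    for name in ("user", "system", "iowait", "idle"):
        print(name, pct[name])
```

High `iowait` alongside a high load average points at "blocked on I/O" rather than "CPU maxed".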
Memory
• Break out by Process
• Free, cached, used



                 http://www.flickr.com/photos/williamhook/3118248600/
Why watch it?
• Cached + Free = Available
• Do you have spare memory?
  – App uses
  – Memcache
  – DB cache

               http://www.flickr.com/photos/williamhook/3118248600/
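
The "Cached + Free = Available" rule of thumb can be sketched against `/proc/meminfo`-style text as below. The sample values are made up, and the function names are illustrative. Newer Linux kernels also report a `MemAvailable` line, which is the kernel's own estimate of the same idea.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {name: value in kB}."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep or not rest.strip():
            continue
        info[key.strip()] = int(rest.split()[0])
    return info


def available_kb(info):
    """The slide's rule of thumb: cached memory can be reclaimed, so count it."""
    return info["MemFree"] + info["Cached"]


# Hypothetical sample in /proc/meminfo format.
SAMPLE = "\n".join([
    "MemTotal:        8000000 kB",
    "MemFree:         1000000 kB",
    "Cached:          3000000 kB",
])

if __name__ == "__main__":
    info = parse_meminfo(SAMPLE)
    spare = available_kb(info)
    print("available kB:", spare)
    print("in use kB:", info["MemTotal"] - spare)
```

If the spare figure stays large, that memory is a candidate for the app, Memcache, or the DB cache.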
Disk
• Read bytes/sec
• Write bytes/sec
• Disk utilization


                     http://www.flickr.com/photos/robfon/2174992215/
Why watch it?

• Is disk busy?
• When?
• Who's using it?


                    http://www.flickr.com/photos/robfon/2174992215/
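
Read bytes/sec, write bytes/sec and utilization can all be computed from two samples of a device line in `/proc/diskstats`. This is a sketch of that arithmetic with made-up counters; field positions follow the kernel's iostats documentation, where sector counts are always in 512-byte units. `iostat -x` reports the same figures directly.

```python
SECTOR_BYTES = 512  # /proc/diskstats always counts 512-byte sectors


def parse_diskstats_line(line):
    """Pick the counters we need out of one /proc/diskstats device line."""
    f = line.split()
    return {
        "name": f[2],
        "sectors_read": int(f[5]),
        "sectors_written": int(f[9]),
        "io_ms": int(f[12]),  # total milliseconds the device spent doing I/O
    }


def disk_rates(before, after, interval_s):
    """Throughput and utilization between two samples interval_s apart."""
    read = (after["sectors_read"] - before["sectors_read"]) * SECTOR_BYTES
    written = (after["sectors_written"] - before["sectors_written"]) * SECTOR_BYTES
    busy_ms = after["io_ms"] - before["io_ms"]
    return {
        "read_bytes_per_s": read / interval_s,
        "write_bytes_per_s": written / interval_s,
        "util_pct": 100.0 * busy_ms / (interval_s * 1000.0),
    }


# Hypothetical samples for one device, taken 3 seconds apart.
BEFORE = "8 0 sda 100 0 2000 50 200 0 4000 80 0 1000 130"
AFTER = "8 0 sda 160 0 8000 90 260 0 10000 140 0 2500 230"

if __name__ == "__main__":
    rates = disk_rates(parse_diskstats_line(BEFORE), parse_diskstats_line(AFTER), 3)
    print(rates)
```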
Network
• Read bytes/sec
• Write bytes/sec
• Established connections


                     http://www.flickr.com/photos/ahkitj/20853609/
Why watch it?
• Max connections
      (~1024 is magic)
• Bandwidth is $$$
• When are you busy?
• SOA considerations
                     http://www.flickr.com/photos/ahkitj/20853609/
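
Established connections can be counted from `/proc/net/tcp`, where the `st` column holds the socket state in hex and `01` means ESTABLISHED. The sketch below uses truncated, made-up rows. The slide does not say what makes ~1024 "magic"; the default threshold here assumes it refers to the common default per-process open-file limit, and that reading is an assumption.

```python
TCP_ESTABLISHED = "01"  # state code for ESTABLISHED in /proc/net/tcp


def count_established(text):
    """Count ESTABLISHED sockets in /proc/net/tcp-style text."""
    count = 0
    for line in text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) > 3 and fields[3] == TCP_ESTABLISHED:
            count += 1
    return count


def near_limit(count, limit=1024, headroom=0.8):
    """True once the count reaches a fraction of the limit.

    limit=1024 and headroom=0.8 are illustrative defaults, not from the talk.
    """
    return count >= limit * headroom


# Hypothetical, truncated rows: one LISTEN (0A) and two ESTABLISHED (01).
SAMPLE = "\n".join([
    "  sl  local_address rem_address   st tx_queue rx_queue",
    "   0: 0100007F:1F90 00000000:0000 0A 00000000:00000000",
    "   1: 0100007F:1F90 0100007F:C350 01 00000000:00000000",
    "   2: 0100007F:1F90 0100007F:C351 01 00000000:00000000",
])

if __name__ == "__main__":
    n = count_established(SAMPLE)
    print("established:", n, "near limit:", near_limit(n))
```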
Perf Monitoring Solution
FREE, in Apt

  1. data collection (collectd)
  2. data storage (rrdtool)
  3. dashboard management (drraw)
Perf Monitoring Architecture
• Multiple clusters
• Multiple applications
• Nodes come up and go down
[Diagram: two clusters]
Perf Monitoring Architecture
collectd agents
• New nodes get generic config
• Node names follow convention according to role
[Diagram: two clusters]
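
One way a role-based naming convention pays off is that dashboards can group nodes without per-node configuration. The sketch below assumes a hypothetical `<role>-<number>` hostname pattern such as `api-01`; the talk does not state the actual convention, so both the pattern and the function name are illustrative.

```python
import re
from collections import defaultdict

# Hypothetical convention: lowercase role, a dash, then a number.
NAME_RE = re.compile(r"^(?P<role>[a-z]+)-(?P<index>\d+)$")


def group_by_role(hostnames):
    """Group hostnames by role; names that do not match go under 'unknown'."""
    groups = defaultdict(list)
    for name in hostnames:
        match = NAME_RE.match(name)
        role = match.group("role") if match else "unknown"
        groups[role].append(name)
    return dict(groups)


if __name__ == "__main__":
    nodes = ["api-01", "crawler-07", "api-02", "oddball"]
    for role, members in sorted(group_by_role(nodes).items()):
        print(role, members)
```

A new node that follows the convention lands in the right group as soon as it starts reporting.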
Perf Monitoring Architecture
Perf Monitoring Server, on its own server:
• collectd server
• Web server
• drraw.cgi
• Allows connections from new nodes
• Perf data backed up daily
[Diagram: two clusters reporting to the Perf Monitoring Server]
Perf Monitoring Architecture
Happy Sysadmin
• Visibility into system
• History of performance
[Diagram: two clusters, the Perf Monitoring Server, and the sysadmin]
Perf Dashboard Features

1. Summarize nodes/systems
2. Visualize data over time
3. Stack measurements
   – Per-process
   – Per-node
4. Handle new nodes
Batch Mode Dashboard
[Screenshot: CPU, Memory, Disk, Network graphs]
Web Server Dashboard
[Screenshot: Web Requests, mod_status graphs]
System-Wide Dashboard
[Screenshot: Per-request graphs]
•   cpu, mem, disk, net
•   over time
•   per node
•   per process
•   Throw in relevant app measures
      e.g. per request stats:
       • req/sec
       • median latency/req
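
The two per-request measures named above are cheap to compute from whatever request log the app already keeps. This is a minimal sketch that assumes each record is a `(timestamp_s, latency_ms)` pair collected over a fixed window; the record shape and function name are assumptions, not the talk's implementation.

```python
from statistics import median


def summarize(requests, window_s):
    """Return req/sec and median latency for requests seen in one window.

    requests is a list of (timestamp_s, latency_ms) pairs.
    """
    latencies = [latency for _, latency in requests]
    return {
        "req_per_sec": len(requests) / window_s,
        "median_latency_ms": median(latencies) if latencies else None,
    }


# Hypothetical records: six requests observed over a 3-second window.
SAMPLE = [(0.1, 12), (0.6, 15), (1.2, 11), (1.9, 40), (2.3, 13), (2.8, 14)]

if __name__ == "__main__":
    print(summarize(SAMPLE, 3))
```

The median is used rather than the mean so that one slow request (the 40 ms one here) does not dominate the graph.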
Ad-hoc Tools
• $ dstat -cdnml
    system characteristics
• $ iotop
    per-process disk I/O
• $ iostat -x 3
    detailed disk stats
• $ netstat -tnp
    fast, per-process TCP connection stats
Resources
• Perf Testing: What, How, Why
      http://www.nickgerner.com/2010/02/performance-testing-what-andhow-why/

• Perf Testing Case Study: OSE
      http://www.nickgerner.com/2010/01/performance-testing-case-study-ose/

• S3 Benchmarks
      http://twopieceset.blogspot.com/2009/06/s3-performance-benchmarks.html

• Perf Measurement
      http://twopieceset.blogspot.com/2009/03/performance-measurement-for-small-and.html
More Resources
•   http://www.collectd.org
•   http://oss.oetiker.ch/rrdtool/
•   http://web.taranis.org/drraw/
•   http://dag.wieers.com/home-made/dstat/

• $ man proc
Q: Why? A: Perf Tuning
[Cycle diagram: Test, Measure, Interpret, Improve, Validate, then back to Test]
Q: Why? A: System Arch
• Better Devs/Ops
• Identify Bottlenecks
• Scaling
  Considerations
Q: Why? A: Issue Investigation
•   Machine Specific?
•   System Wide?
•   Which Component?
•   Timeline?
•   Cascading Failures?

Common Sense Performance Indicators in the Cloud

  • 1. Common Sense Performance Indicators Nick Gerner June 24, 2010
  • 2. Goals Common Sense in the Cloud same as outside the cloud 1. Tune performance 2. Investigate issues 3. Visualize architecture
  • 3. Nick Gerner www.nickgerner.com @gerner • Formerly senior engineer at SEOmoz • Linkscape: index of the web for SEO • Lead data services • Developer • Back-end ops guy
  • 4. SEOmoz • Seattle-based Startup (~7 engineers) • SEO Blog and Community • Toolset and Platform OpenSiteExplorer.org • 300TB/month processing pipeline • 5 mil req/day API hits
  • 5. SEOmoz Engineering • 50 < nodes < 500 • AWS based since 2008 – EC2 – linux root access to bare VM – S3 – networked disk – EBS – local disk I/O – ELB – load balancing as a service
  • 6. SEOmoz Architecture: Processing. Data Pipeline: The Web, Crawlers, Raw Storage, Process, Prepare
  • 7. SEOmoz Architecture: API. S3; three Memcache / App / Lighttpd stacks; ELB; clients: Partners, SEOmoz Apps
  • 8. End-to-End Performance Indicators: Latency, Conversion Rate, DNS, Time to On-load, Web Object Count
  • 9. Great ...but not the focus of this talk: Latency, Conversion Rate, DNS, Time to On-load, Web Object Count
  • 10. Performance Indicators: System Characteristics (CPU, Mem, Net, Disk) and App Stack (Front-End, Middleware, Caching, Back-end, Database, WS-API), linked by "Drives" and "Competes For" http://www.flickr.com/photos/dnisbet/3118888630/
  • 11. Performance Indicators: System Characteristics (CPU, Mem, Net, Disk) and App Stack (Front-End, Middleware, Caching, Back-end, Database, WS-API), linked by "Drives" and "Competes For" http://www.flickr.com/photos/dnisbet/3118888630/
  • 12. /proc • System stats • Per-process stats • It all comes from here ...but use tools to see it
  • 13. System Characteristics: Load Average, CPU, Memory, Disk, Network
  • 14. Load Average • Combines a few things • Good place to start • Explains nothing http://www.flickr.com/photos/maple03/4176389418/
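The load-average slide can be illustrated with a small sketch that parses the text of `/proc/loadavg` and normalizes it by core count. The function names and the sample string are assumptions for illustration, not part of the talk. A high per-core figure only says something is queued, not whether CPU or I/O is the cause.

```python
import os


def parse_loadavg(text):
    """Parse the contents of /proc/loadavg into a dict."""
    fields = text.split()
    # Fourth field looks like "2/310": runnable entities / total entities.
    runnable, total = fields[3].split("/")
    return {
        "load1": float(fields[0]),
        "load5": float(fields[1]),
        "load15": float(fields[2]),
        "runnable": int(runnable),
        "total_tasks": int(total),
    }


def load_per_core(load, cores=None):
    """Normalize a load figure by core count (defaults to this machine's)."""
    cores = cores or os.cpu_count() or 1
    return load / cores


if __name__ == "__main__":
    # Made-up sample; on Linux, pass open("/proc/loadavg").read() instead.
    sample = "3.20 1.10 0.45 2/310 11206"
    info = parse_loadavg(sample)
    print(info, load_per_core(info["load1"], cores=4))
```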
  • 15. CPU • Break out by process • Break out user vs system • User, System, I/O wait, Idle http://www.flickr.com/photos/pacdog/213442876/
  • 16. Why watch it? • Who's doing work • Is CPU maxed? • Blocked on I/O? • Compare to Load Average http://www.flickr.com/photos/pacdog/213442876/
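As a minimal sketch of the user / system / I/O wait / idle breakdown, the code below takes two samples of the aggregate `cpu` line from `/proc/stat` and turns the counter deltas into percentages. The helper names and the sample lines are hypothetical. A real collector such as collectd does this continuously and also per process.

```python
# Counter order on the aggregate "cpu" line of /proc/stat.
# Trailing guest fields are ignored; guest time is already counted in user.
CPU_FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Turn the aggregate 'cpu ...' line into a dict of tick counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return dict(zip(CPU_FIELDS, (int(v) for v in parts[1:])))


def cpu_percentages(before, after):
    """Percentage of elapsed CPU time spent in each state between two samples."""
    deltas = {k: after[k] - before[k] for k in after if k in before}
    total = sum(deltas.values())
    if total <= 0:
        raise ValueError("no CPU time elapsed between samples")
    return {k: 100.0 * d / total for k, d in deltas.items()}


if __name__ == "__main__":
    # Made-up samples taken a few seconds apart.
    first = parse_cpu_line("cpu  100 0 50 800 50 0 0 0 0 0")
    second = parse_cpu_line("cpu  200 0 100 1500 200 0 0 0 0 0")
    for state, pct in cpu_percentages(first, second).items():
        print(f"{state:8s} {pct:5.1f}%")
```

A large `iowait` share with low `user` and `system` suggests the box is blocked on I/O rather than short on CPU, which is the comparison against load average that the slide points to.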
  • 17. Memory • Break out by Process • Free, cached, used http://www.flickr.com/photos/williamhook/3118248600/
  • 18. Why watch it? • Cached + Free = Available • Do you have spare memory? – App uses – Memcache – DB cache http://www.flickr.com/photos/williamhook/3118248600/
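The "Cached + Free = Available" rule of thumb can be sketched against the text of `/proc/meminfo`. The function names and sample values below are illustrative assumptions. Newer Linux kernels also report a `MemAvailable` line with the kernel's own estimate; this sketch follows the slide's simpler rule.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo text into {name: value}; most values are in kB."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            info[key.strip()] = int(parts[0])
    return info


def spare_memory_kb(info):
    """Rule of thumb from the slide: page cache can be reclaimed, so count it."""
    return info["MemFree"] + info["Cached"]


if __name__ == "__main__":
    # Made-up sample; on Linux, pass open("/proc/meminfo").read() instead.
    sample = (
        "MemTotal:        8000000 kB\n"
        "MemFree:         1000000 kB\n"
        "Buffers:          200000 kB\n"
        "Cached:          3000000 kB\n"
        "HugePages_Total:       0\n"
    )
    info = parse_meminfo(sample)
    spare = spare_memory_kb(info)
    print(f"spare: {spare} kB ({100.0 * spare / info['MemTotal']:.0f}% of total)")
```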
  • 19. Disk • Read bytes/sec • Write bytes/sec • Disk utilization http://www.flickr.com/photos/robfon/2174992215/
  • 20. Why watch it? • Is disk busy? • When? • Who's using it? http://www.flickr.com/photos/robfon/2174992215/
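Read bytes/sec, write bytes/sec, and utilization can all be derived from two samples of `/proc/diskstats`, roughly as in the sketch below. Names and sample lines are hypothetical. It assumes the usual layout, where sector counts are in 512-byte units and one column counts milliseconds spent doing I/O.

```python
SECTOR_BYTES = 512  # /proc/diskstats reports sectors in 512-byte units


def parse_diskstats(text):
    """Map device name -> the counters needed for throughput and utilization."""
    stats = {}
    for line in text.splitlines():
        f = line.split()
        if len(f) < 14:  # skip blank or short lines
            continue
        stats[f[2]] = {
            "sectors_read": int(f[5]),
            "sectors_written": int(f[9]),
            "ms_doing_io": int(f[12]),
        }
    return stats


def disk_rates(before, after, interval_s):
    """Per-device rates between two samples taken interval_s seconds apart."""
    rates = {}
    for dev, a in after.items():
        b = before.get(dev)
        if b is None:
            continue
        rates[dev] = {
            "read_bytes_per_s": (a["sectors_read"] - b["sectors_read"]) * SECTOR_BYTES / interval_s,
            "write_bytes_per_s": (a["sectors_written"] - b["sectors_written"]) * SECTOR_BYTES / interval_s,
            "utilization_pct": 100.0 * (a["ms_doing_io"] - b["ms_doing_io"]) / (interval_s * 1000.0),
        }
    return rates


if __name__ == "__main__":
    # Made-up samples for one device, two seconds apart.
    t0 = parse_diskstats("   8       0 sda 1000 0 20000 500 2000 0 40000 900 0 1200 1400")
    t1 = parse_diskstats("   8       0 sda 1100 0 22000 550 2300 0 46000 1000 0 2700 1700")
    print(disk_rates(t0, t1, interval_s=2.0))
```

This answers "is disk busy?" and "when?"; the "who's using it?" question needs per-process data, which is what `iotop` shows.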
  • 21. Network • Read bytes/sec • Write bytes/sec • Established connections http://www.flickr.com/photos/ahkitj/20853609/
  • 22. Why watch it? • Max connections (~1024 is magic) • Bandwidth is $$$ • When are you busy? • SOA considerations http://www.flickr.com/photos/ahkitj/20853609/
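Established connections can be counted from `/proc/net/tcp`, where the `st` column holds the socket state in hex and `01` means ESTABLISHED. The sketch below is illustrative only. The function names, the sample rows, and the 80% warning threshold are assumptions, and the default limit of 1024 is simply the figure quoted on the slide. IPv6 sockets live in `/proc/net/tcp6` and would need the same treatment.

```python
TCP_ESTABLISHED = "01"  # hex state code used in /proc/net/tcp


def count_established(proc_net_tcp_text):
    """Count rows whose state column is ESTABLISHED, skipping the header."""
    count = 0
    for line in proc_net_tcp_text.splitlines()[1:]:
        fields = line.split()
        if len(fields) > 3 and fields[3] == TCP_ESTABLISHED:
            count += 1
    return count


def near_limit(count, limit=1024, fraction=0.8):
    """True when the count exceeds an (assumed) fraction of the limit."""
    return count >= limit * fraction


if __name__ == "__main__":
    # Made-up sample: one listening socket and two established connections.
    sample = (
        "  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode\n"
        "   0: 0100007F:0CEA 00000000:0000 0A 00000000:00000000 00:00000000 00000000   110        0 14277\n"
        "   1: 0A00020F:0016 0A000202:D2F0 01 00000000:00000000 02:000A1B2C 00000000     0        0 20412\n"
        "   2: 0A00020F:0050 0A000203:C001 01 00000000:00000000 00:00000000 00000000    33        0 20519\n"
    )
    n = count_established(sample)
    print(n, near_limit(n))
```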
  • 23. Perf Monitoring Solution: FREE, in Apt 1. data collection (collectd) 2. data storage (rrdtool) 3. dashboard management (drraw)
  • 24. Perf Monitoring Architecture: Multiple Clusters • Multiple Applications • Nodes come up and go down
  • 25. Perf Monitoring Architecture: collectd agents • new nodes get generic config • node names follow convention according to role
  • 26. Perf Monitoring Architecture: Perf Monitoring Server, on its own server: collectd server, Web server, drraw.cgi • allows connections from new nodes • perf data backed up daily
  • 27. Perf Monitoring Architecture: Happy Sysadmin • Visibility into system history of performance
  • 28. Perf Dashboard Features 1. Summarize nodes/systems 2. Visualize data over time 3. Stack measurements – Per-process – Per-node 4. Handle new nodes
  • 30. CPU
  • 32. Disk
  • 39. Graph Summary • cpu, mem, disk, net • over time • per node • per process • Throw in relevant app measures, e.g. per-request stats: • req/sec • median latency/req
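The two per-request measures named above, req/sec and median latency, can be computed from a window of request latencies roughly as follows. This is a sketch under assumptions: the function name, the input shape (a list of latencies in milliseconds plus the window length), and the sample numbers are all hypothetical.

```python
from statistics import median


def summarize_requests(latencies_ms, window_s):
    """Summarize one time window of requests.

    latencies_ms: latency of each request completed in the window.
    window_s: length of the window in seconds.
    """
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    latencies_ms = list(latencies_ms)
    return {
        "req_per_sec": len(latencies_ms) / window_s,
        "median_latency_ms": median(latencies_ms) if latencies_ms else None,
    }


if __name__ == "__main__":
    # Made-up latencies: one slow outlier barely moves the median.
    print(summarize_requests([12, 15, 11, 250, 14], window_s=10))
```

The median is a reasonable default here because a single slow request shifts it far less than it would shift the mean.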
  • 40. Ad-hoc Tools • $ dstat -cdnml (system characteristics) • $ iotop (per-process disk I/O) • $ iostat -x 3 (detailed disk stats) • $ netstat -tnp (fast, per-process TCP connection stats)
  • 41. Resources • Perf Testing: What, How, Why http://www.nickgerner.com/2010/02/performance-testing-what-andhow-why/ • Perf Testing Case Study: OSE http://www.nickgerner.com/2010/01/performance-testing-case-study-ose/ • S3 Benchmarks http://twopieceset.blogspot.com/2009/06/s3-performance-benchmarks.html • Perf Measurement http://twopieceset.blogspot.com/2009/03/performance-measurement-for-small-and.html
  • 42. More Resources • http://www.collectd.org • http://oss.oetiker.ch/rrdtool/ • http://web.taranis.org/drraw/ • http://dag.wieers.com/home-made/dstat/ • $ man proc
  • 43. Q: Why? A: Perf Tuning: Test, Validate, Measure, Improve, Interpret
  • 44. Q: Why? A: System Arch • Better Devs/Ops • Identify Bottlenecks • Scaling Considerations
  • 45. Q: Why? A: Issue Investigation • Machine Specific? • System Wide? • Which Component? • Timeline? • Cascading Failures?