Capacity Management: For Web Operations
???
security incidents
real capacity problems*
* (should be the last thing you need to worry about)
Capacity != Performance
a lot of great deployment and management tricks come from them, adopted by web ops
Metrics
System Statistics
Application Level
Metrics
(photos processed per minute)
(apache requests)
Metrics
App-level meets system-level
2400
photos per minute being uploaded right NOW (Tuesday
the maximum amount of work your resources will allow before degradation or failure
Ceilings
Forget Benchmarking
The End
Ceilings
waiting on disk
sustained disk I/O wait above 40% creates too much slave lag*
*for us, YMMV
35,000
Safety Factors
webserver!
Safety Factors
what you have left
Safety Factors
Yahoo Front Page link to Chinese NewYear Photos
(8% spike)
(photo requests/second)
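The relationship between a ceiling, a safety factor, and "what you have left" can be sketched in a few lines. This is only an illustration: the slides do not give a formula, so treating the safety factor as the fraction of the ceiling you allow yourself to use is an assumption, and the ceiling, factor, and peak values below are hypothetical. Only the 8% spike comes from the slide.

```python
def usable_capacity(ceiling, safety_factor):
    """Capacity you plan against: the ceiling scaled down by the safety factor.

    Assumption: safety_factor is a fraction between 0 and 1.
    """
    return ceiling * safety_factor


def headroom(ceiling, safety_factor, current_peak):
    """What you have left between the current peak and the usable capacity."""
    return usable_capacity(ceiling, safety_factor) - current_peak


def survives_spike(ceiling, current_peak, spike_fraction):
    """True if the peak plus a sudden spike still stays at or under the ceiling."""
    return current_peak * (1.0 + spike_fraction) <= ceiling


if __name__ == "__main__":
    # Hypothetical numbers, not taken from the slides.
    ceiling = 1000.0        # e.g. requests/second one tier can sustain
    safety_factor = 0.85    # plan to use at most 85% of the ceiling
    current_peak = 800.0

    print("headroom:", headroom(ceiling, safety_factor, current_peak))
    # The 8% spike is the figure quoted on the slide.
    print("survives 8% spike:", survives_spike(ceiling, current_peak, 0.08))
```

The point of the margin is that a spike like the front-page link should land in the space between usable capacity and the ceiling, not beyond the ceiling.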
Forecasting
peak of the week
Forecasting
not too shabby
now
Forecasting
ceiling this will tell you when it is
Forecasting
Forecasting Automation
Writing excel macros is boring
All we want is days remaining, so all we need is the curve-fit
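As a rough sketch of "days remaining from a curve-fit", the following fits a quadratic to usage samples and solves for the point where the fitted curve reaches the ceiling. It uses NumPy rather than the tool shown in the slides, and the function name, sample data, and ceiling are hypothetical.

```python
import numpy as np


def days_until_ceiling(days, usage, ceiling):
    """Fit usage = a*x^2 + b*x + c, then return how many days after the
    last sample the fitted curve reaches `ceiling` (None if it never does)."""
    a, b, c = np.polyfit(days, usage, 2)      # coefficients, highest power first
    roots = np.roots([a, b, c - ceiling])     # solve a*x^2 + b*x + (c - ceiling) = 0
    today = days[-1]
    # Keep only real roots that lie in the future.
    future = [r.real for r in roots if abs(r.imag) < 1e-9 and r.real > today]
    if not future:
        return None
    return min(future) - today


if __name__ == "__main__":
    # Hypothetical daily samples that grow slightly faster than linearly.
    days = np.arange(15)
    usage = 100.0 + 5.0 * days + 0.2 * days**2
    print(days_until_ceiling(days, usage, ceiling=500.0))
```

A quadratic is used here only because it matches the "guess Quadratic" fit later in the deck; any curve that fits the data well could be substituted.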
Forecasting
Forecasting Automation
[jallspaw:~]$ cfityk ./fit-storage.fit
1> # Fityk script. Fityk version: 0.8.2
2> @0 < '/home/jallspaw/storage-consumption.xy'
15 points. No explicit std. dev. Set as sqrt(y)
3> guess Quadratic
New function %_1 was created.
4> fit
Initial values: lambda=0.001 WSSR=464.564
#1: WSSR=0.90162 lambda=0.0001 d(WSSR)=-463.663 (99.8059%)
#2: WSSR=0.736787 lambda=1e-05 d(WSSR)=-0.164833 (18.2818%)
#3: WSSR=0.736763 lambda=1e-06 d(WSSR)=-2.45151e-05 (0.00332729%)
#4: WSSR=0.736763 lambda=1e-07 d(WSSR)=-3.84524e-11 (5.21909e-09%)
Fit converged.
Better fit found (WSSR = 0.736763, was 464.564, -99.8414%).
5> info formula in @0
# storage-consumption
14147.4+146.657*x+0.786854*x^2
6> quit
bye...
Forecasting Automation
fityk gave: y = 0.786854x^2 + 146.657x + 14147.4 (R^2 = 99.84)
Excel gave: y = 0.7675x^2 + 146.96x + 14147.3 (R^2 = 99.84)
(SAME)
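One way to check the "(SAME)" claim is to evaluate both fitted curves and compare them, inside the 15-point data range and some way beyond it. The coefficients below are copied from the slide. The sample x values are arbitrary, and the slide does not state the units of x and y.

```python
def fityk_fit(x):
    """Quadratic reported by fityk on the slide."""
    return 0.786854 * x**2 + 146.657 * x + 14147.4


def excel_fit(x):
    """Quadratic reported by Excel on the slide."""
    return 0.7675 * x**2 + 146.96 * x + 14147.3


def relative_gap(x):
    """Absolute difference between the two fits, relative to the fityk value."""
    f = fityk_fit(x)
    return abs(f - excel_fit(x)) / f


if __name__ == "__main__":
    # Arbitrary sample points: inside the data range and extrapolated past it.
    for x in (0, 15, 30, 60, 100):
        print(f"x={x:>3}  fityk={fityk_fit(x):10.1f}  "
              f"excel={excel_fit(x):10.1f}  gap={relative_gap(x):.4%}")
```

The gap widens the further you extrapolate, because the two quadratic coefficients differ slightly, but it stays below half a percent out to x = 100.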
Capacity Health
12,629 nagios checks
1314 hosts
6 datacenters
4 photo farms
farm = 2 DCs (east/west)
alert if lower
type
busy procs    20    80     1600     1000    62.50 %
I/O wait      20    40      800      220    27.50 %
req/sec       18   950   17,100   11,400    66.67 %
36
120
48
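The figures on this slide are consistent with a simple calculation: node count times per-node ceiling gives the total ceiling, and current load divided by that total gives the percentage (20 x 80 = 1600, 1000 / 1600 = 62.50%). The sketch below reproduces that arithmetic. The column meanings are inferred from the numbers, not stated in the slides, and the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CapacityCheck:
    metric: str
    nodes: int               # assumed: number of servers in the pool
    per_node_ceiling: float  # assumed: ceiling for a single server
    current: float           # current load summed across the pool

    @property
    def total_ceiling(self):
        return self.nodes * self.per_node_ceiling

    @property
    def percent_used(self):
        return 100.0 * self.current / self.total_ceiling


# Values taken from the slide's table.
checks = [
    CapacityCheck("busy procs", 20, 80, 1000),
    CapacityCheck("I/O wait", 20, 40, 220),
    CapacityCheck("req/sec", 18, 950, 11400),
]

if __name__ == "__main__":
    for c in checks:
        print(f"{c.metric:<12}{c.total_ceiling:>8.0f}{c.current:>8.0f}"
              f"{c.percent_used:>8.2f} %")
```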
Diagonal Scaling
vertically scaling your already horizontal nodes
Diagonal Scaling
example: image processing
4 cores
8 cores
Diagonal Scaling
example: image processing throughput
Diagonal Scaling
example: image processing
went from:
23
1035 photos/min
23U rack
to:
HP DL140 G3s
1120 photos/min
8U rack
1036.8 Watts
!!! (75% faster, even)
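The before and after figures can be turned into a density comparison with a little arithmetic. The sketch uses only numbers from the slide (1035 photos/min in 23U, then 1120 photos/min in 8U drawing 1036.8 Watts). Pairing the wattage with the newer hardware is a reading of the garbled slide text, and no power figure for the older setup survives, so no before/after power comparison is attempted.

```python
def photos_per_rack_unit(photos_per_min, rack_units):
    """Throughput density: photos processed per minute per rack unit."""
    return photos_per_min / rack_units


before = photos_per_rack_unit(1035, 23)   # older setup
after = photos_per_rack_unit(1120, 8)     # HP DL140 G3 setup

# Power cost of the newer setup only (assumes 1036.8 W belongs to it).
watts_per_photo_per_min = 1036.8 / 1120

if __name__ == "__main__":
    print(f"before: {before:.1f} photos/min per U")
    print(f"after:  {after:.1f} photos/min per U")
    print(f"density improvement: {after / before:.2f}x")
    print(f"after:  {watts_per_photo_per_min:.3f} W per photo/min")
```

Total throughput rises only modestly (1035 to 1120 photos/min). The gain from diagonal scaling shows up in how much less rack space is needed to deliver it.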
3.52
www118
dbcontacts3 admin1 admin2
etc., etc.
Bake the dynamic into static
Some Y! properties have a big red button to instantly bake (and unbake) at will
thanks
http://flickr.com/photos/bondidwhat/402089763/
http://flickr.com/photos/74876632@N00/2394833962/
http://flickr.com/photos/42311564@N00/220394633/
http://flickr.com/photos/unloveable/2422483859/
http://flickr.com/photos/absolutwade/149702085/
http://flickr.com/photos/krawiec/521836276/
http://flickr.com/photos/eschipul/1560875648/
http://flickr.com/photos/library_of_congress/2179060841/
http://flickr.com/photos/jekkyl/511187885/
http://flickr.com/photos/ab8wn/368021672/
http://flickr.com/photos/jaxxon/165559708/
http://flickr.com/photos/sparktography/75499095/
questions?