Analytic Modeling of Load Balancing Policies for Tasks with Heavy-tailed Distributions

Alma Riska, Evgenia Smirni, Gianfranco Ciardo
Department of Computer Science
College of William and Mary
Williamsburg, VA 23187-8795, USA
e-mail: {riska,esmirni,ciardo}@cs.wm.edu

This work has been supported by the National Science Foundation under grants EIA-9974992 and EIA-9977030, and by the National Aeronautics and Space Administration under NASA Grant NAG-1-2168.

ABSTRACT
We present an analytic technique for modeling load balancing policies on a cluster of servers, conditioned on the fact that the service times of arriving tasks are drawn from heavy-tailed distributions. We propose a new modeling methodology for the exact solution of an $M/H_k/1$ server and illustrate its use for modeling two distinct load balancing policies in a distributed multi-server system. Our analytic results provide exact information about the distribution of task sizes that compose the waiting queue on each server, and suggest an easy and inexpensive way to provide load balancing based on the sizes of the incoming tasks.

1. INTRODUCTION
We consider the resource allocation problem in a distributed multi-server system. We assume that tasks arrive at a front-end system, which is responsible for dispatching them to the back-end nodes. This happens according to a task scheduling policy that aims to route each request to the "best" back-end server, since a task can potentially be served by any server. Such a system can be considered an abstraction of a distributed web server [7, 10, 19]. Balancing the load across the back-end servers is critical for performance [7]. In the past two decades, there has been a significant research effort in task scheduling and load balancing (see [11] and references therein). The basic assumption in much of this work is that the service demands of the various tasks are governed by an exponential distribution.
In contrast to this assumption, there is very strong evidence that the sizes of web documents, and accordingly their service demands, are governed by heavy-tailed distributions [3, 4, 1, 2]. As a consequence, load balancing in distributed servers must be re-examined. A problem that plagues the analysis of load balancing policies via simulation is precisely the heavy-tailed distribution of the task service times: simulation becomes very time-consuming, or at times even intractable. Consequently, analytic models become an attractive alternative.

In this paper, we illustrate the use of analytic models for the analysis of load balancing policies in distributed servers whose abstraction is illustrated in Figure 1. We consider two scheduling policies that do not use information about the load of each individual server in their scheduling decisions and are consequently easy to implement. We compare a random policy (i.e., a policy where the dispatcher allocates each new task to a randomly selected server) and a size-based policy (i.e., a policy where the dispatcher allocates each new task to a server based solely on the task's size). Size-based policies have been shown to outperform even dynamic scheduling policies, where incoming tasks are dispatched to the least loaded host, if the workload is governed by a bounded Pareto distribution [10]. Our modeling results further reinforce the belief that the heavier the tail of the distribution, the better the size-based policy performs.

[Figure 1: Model of a distributed server. Arriving tasks reach a front-end dispatcher, which routes them to the back-end nodes.]

The analysis of our model uses an extended version of the ETAQA approach that was recently developed for the study of quasi-birth-death (QBD) processes and of the more complex case of M/G/1-type processes [5, 6]. ETAQA provides a solution that is very efficient both computation-wise and storage-wise in comparison to the classic solutions for QBD and M/G/1-type processes, but its major drawback is that it applies to a very restricted family of processes, for which "returns" from a higher level of states to the immediately lower level are always directed toward a single state only. Effectively, this means that ETAQA can apply only when the matrix $B$ (i.e., the matrix that represents the returns from a higher level of states to the immediately lower level) has a single nonzero column. Here, we extend ETAQA by allowing $B$ to be a full matrix whose nonzero columns are multiples of one (any) column. This allows us to solve $M/H_k/1$-type queues (i.e., queues with Markovian arrivals and $k$-stage hyper-exponential servers) exactly using ETAQA. This result is of great importance for modeling queues where the customer demands are drawn from heavy-tailed distributions, since such distributions can be closely approximated by hyper-exponential distributions [9].

Using real trace data that show the file distribution of the web server for all requests for the 1998 World Soccer Cup, we apply fitting techniques [9] to extract a hyper-exponential distribution that best approximates the actual data distribution. Then, using ETAQA, we study the behavior of the random and the size-based policy. ETAQA allows for the individual calculation of the contribution to the queue length (or to any of its higher moments) due to each individual "phase" of the server. Thus, we can perform a detailed study that quantifies the effect of the task sizes on queue build-up for the various servers, which allows us to argue about the effectiveness of a size-based policy.

This paper is organized as follows. In Section 2 we present the workload and the steps we follow to obtain a hyper-exponential distribution that closely approximates the workload behavior. In Section 3 we extend ETAQA to obtain the exact solution of $M/H_k/1$-type queues. Section 4 presents the performance comparisons of the two load balancing policies.
Section 5 concludes the paper and outlines future work.

2. THE WORKLOAD
The workload used in our analysis is obtained from web server traces. Since we evaluate different load balancing policies using analytic techniques, we characterize the data with a hyper-exponential distribution. In this section we describe the workload used and the fitting steps that allow us to derive a hyper-exponential distribution that closely matches the web server trace data.

We obtained workload traces of the 1998 World Soccer Cup Web site¹. The World Cup site server was composed of 30 low-latency platforms distributed across four physical locations. Client requests were dispatched to a location via a Cisco Distributed Director, and each location was responsible for load balancing incoming requests among its available servers. The traces provide information about each request received by each server. For each request, the following information is recorded: the IP address of the client issuing the request, the date and time of the request, the URL requested, the HTTP response status code, and the content length (in bytes) of the transferred document. Trace data were collected for each day during the total period of time that the web server was operational. Since the focus of this work is on load balancing, irrespective of possible caching policies at the server, we only extracted the content length of the transferred document from each trace record, assuming that the service time of each request is a function of the size of the requested document. For a detailed analysis of the World Cup workload see [2].

2.1 Fitting a day's data into a distribution
It is a well-known fact that the sizes of web server requests are highly variable and are best described by heavy-tailed distributions. To check for the heavy-tail property, we used Boston University's aest tool, which verifies and estimates the heavy-tail portion of a distribution [8]².
Using the scaling estimator methodology, the tool helps identify the portion of the data set that exhibits power-law behavior by graphically showing the tail of the distribution where the heavy-tailed behavior is present. The selection of the point where the power-law behavior starts is significant because it affects the computation of the parameters of the distribution. Figure 2 shows the results of the scaling analysis for a representative day of the dataset, day 80. Considering the tail portion of the plots, for requests larger than 1 MByte, we see that the curves are close to linear, suggesting that the heavy-tailed portion of the dataset begins at around 1 MByte. Based on this observation, we conclude that the original distribution is best approximated by a hybrid model that combines a lognormal distribution for the body of the data and a power-law distribution for its tail [2]. After identifying the two portions of the workload, we need to compute the parameters of each of them.

¹ http://researchsmp2.cc.vt.edu/cgi-bin/reposit/search.pl?details=YES&detailsoffset=135
² http://www.cs.bu.edu/faculty/crovella/aest.html

[Figure 2: Tail characterization for day 80 (log10(ccdf) versus log10(request size); curves for the raw data and for its 2-, 4-, 8-, ..., 512-aggregated versions). The various curves show the complementary cumulative distribution function (ccdf) of the dataset on a log-log scale for successive aggregations of the dataset by factors of two. The figure illustrates that the shape of the tail (i.e., for size > 10^6) is close to linear and suggests the parameter α of its power-law distribution. The '+' signs on the plot indicate the points used to compute α. The elimination of points for each successive aggregation indicates the presence of a heavy tail.]
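A power law $P[X > x] \sim x^{-\alpha}$ appears as a straight line of slope $-\alpha$ on the log-log ccdf plot of Figure 2. The sketch below illustrates only that idea with a plain least-squares fit of $\log_{10}$ ccdf against $\log_{10}$ size above a chosen threshold. It is not the aest scaling estimator [8]; the function name and the threshold argument are hypothetical, and the sample is built from exact Pareto quantiles so that the expected slope is known.

```python
import math


def loglog_tail_slope(sizes, threshold):
    """Least-squares slope of log10(ccdf) vs log10(size) for sizes above threshold.

    For a power-law tail P[X > x] ~ x**(-alpha) the slope is close to -alpha.
    Simple illustration only; this is not the aest scaling estimator.
    """
    xs = sorted(sizes, reverse=True)
    n = len(xs)
    points = []
    for rank, x in enumerate(xs, start=1):
        if x <= threshold:
            break  # sizes are in decreasing order: everything left is below the threshold
        # Empirical P[X >= x] for the rank-th largest sample (ties are not merged).
        points.append((math.log10(x), math.log10(rank / n)))
    if len(points) < 2:
        raise ValueError("need at least two points above the threshold")
    mean_u = sum(u for u, _ in points) / len(points)
    mean_v = sum(v for _, v in points) / len(points)
    sxx = sum((u - mean_u) ** 2 for u, _ in points)
    sxy = sum((u - mean_u) * (v - mean_v) for u, v in points)
    return sxy / sxx


# Exact quantiles of a Pareto distribution with alpha = 1.2 and minimum size 1.
alpha, n = 1.2, 1000
sample = [(n / i) ** (1 / alpha) for i in range(1, n + 1)]
print(loglog_tail_slope(sample, threshold=2.0))  # close to -1.2
```

Choosing the threshold plays the same role as choosing the 1 MByte point above: moving it changes which points enter the fit and therefore the estimate of $\alpha$.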
The body of the distribution is considered lognormal, with probability density function (p.d.f.)

$$ f(x) = \frac{1}{b x \sqrt{2\pi}} \exp\left( -\frac{(\ln x - a)^2}{2b^2} \right), \qquad x > 0. $$

We compute $b > 0$ (i.e., the shape parameter) and $a \in (-\infty, \infty)$ (i.e., the scale parameter) using the maximum likelihood estimators [14]:

$$ \hat{a} = \frac{\sum_{i=1}^{n} \ln X_i}{n}, \qquad \hat{b} = \left( \frac{\sum_{i=1}^{n} (\ln X_i - \hat{a})^2}{n-1} \right)^{1/2}, $$

where $X_i$, for $1 \le i \le n$, are the sample data. The workload is heavy-tailed with tail index $\alpha$ if its complementary distribution function is

$$ P[X > x] \sim x^{-\alpha}, \qquad x \to \infty, \quad 0 < \alpha < 2, $$

where $X$ is the random variable describing the request size. In our study, we compute $\alpha$ via the aest tool.

2.2 Fitting the distribution into a mix of exponentials
Our second step is to apply Feldmann and Whitt's algorithm [9] for approximating a heavy-tailed distribution with a hyper-exponential distribution. Since both the body and the tail of our distributions have a heavy-tail component, we apply the algorithm to each component separately and finally combine both distributions into a hyper-exponential one using a weighted sum. The weights for combining the two hyper-exponential distributions corresponding to the lognormal and the power-law portions of the original data are given by the probability that a request is for a file of size less than or equal to, or greater than, 1 MByte, respectively, as computed from the empirical data.

The Feldmann-Whitt algorithm attempts to fit various regions of the distribution with exponential components in a recursive manner. At each step, the fitted exponential component is subtracted from the distribution, so that each component focuses on a specific portion of the random variable values, increasingly closer to 0. If there are enough exponential components, the algorithm manages to closely approximate a heavy-tailed distribution in the area of primary interest. The output of the algorithm is then a hyper-exponential distribution $H_k$ with p.d.f.

$$ h(x) = \sum_{i=1}^{k} p_i \mu_i e^{-\mu_i x}, \qquad x \ge 0. $$

We point the interested reader to [9] for a detailed description of the algorithm and its fitting accuracy.

2.3 Fitting examples
We use the Feldmann-Whitt algorithm to fit the data from two representative days, day 57 and day 80, into hyper-exponential distributions. Figures 3 and 4 illustrate the cumulative distribution of the actual data, their fitting into a hybrid distribution (lognormal for the body and power-law for the tail), and the fitting of the hybrid distribution into a hyper-exponential one. We observe that the resulting hyper-exponential distribution closely matches the behavior of the original data.

[Figure 3: Fitted data of day 57 (cumulative distribution function versus request size; curves: data, hybrid distribution, hyper-exponential distribution).]

[Figure 4: Fitted data of day 80 (cumulative distribution function versus request size; curves: hybrid distribution, fitted data, hyper-exponential distribution).]

Tables 1 and 2 illustrate the parameters of the lognormal and the power-law portions of the distribution for each day. For both days, we see that the bulk of the data lies in the lognormal portion of the workload, while only a very small part (albeit with very large file sizes) lies in the power-law part. Tables 1 and 2 also show the parameters that the Feldmann-Whitt algorithm suggests for fitting the above distributions into hyper-exponentials. In both cases, a total of seven exponential phases, four for the lognormal portion and three for the power-law portion, are sufficient to achieve an excellent approximation of the original data.

Table 1: Workload parameters for day 57.
  Lognormal(a, b):       a = 7.033358, b = 1.509296
  Power(α):              α = 0.82
  Weight for lognormal:  0.99935
  Parameters of the H_7 fitting (rate μ_i, probability p_i, 1 ≤ i ≤ 7):
    i   μ_i               p_i
    1   0.0000000084695   0.0000000004383
    2   0.0000001060318   0.0000000023312
    3   0.0000113172652   0.0006499972304
    4   0.0000015100478   0.0000147002669
    5   0.0000189146863   0.0144507953832
    6   0.0001900075398   0.4437013666789
    7   0.0012356156678   0.5411831376710

Table 2: Workload parameters for day 80.
  Lognormal(a, b):       a = 7.43343, b = 1.42824
  Power(α):              α = 0.89
  Weight for lognormal:  0.999977
  Parameters of the H_7 fitting (rate μ_i, probability p_i, 1 ≤ i ≤ 7):
    i   μ_i               p_i
    1   0.0000000137089   0.0000000001905
    2   0.0000002156677   0.0000000015697
    3   0.0000432933235   0.0005699982398
    4   0.0000053128803   0.0007440620149
    5   0.0000296466848   0.0393217234927
    6   0.0001509998638   0.3679160877122
    7   0.0008702852306   0.5914481267802

As our purpose is to analyze load balancing policies according to workload variability, we also considered a synthetic workload that exhibits different variability characteristics with respect to one of the selected days, namely day 57. Since the power-tail portion of each of the selected days is very small, we turn our attention to the lognormal portion, which also exhibits heavy-tail behavior. By changing both the $a$ and $b$ parameters of the lognormal distribution simultaneously, we change both the scale and the shape of the distribution so as to vary its variance while keeping its mean constant (and equal to the mean of day 57, i.e., 3629 bytes). Since the mean of a lognormal distribution is $e^{a + b^2/2}$, this amounts to keeping $a + b^2/2$ fixed. This way, we can examine the sensitivity of our load balancing policies to the workload variability.

[Figure 5: Shape of the cdf when changing the variability of the lognormal portion of the workload (cumulative distribution function versus log2 of request size in bytes). Highest variability: b = 2.2, a = 5.752; fitted data of day 57: b = 1.509, a = 7.033; smallest variability: b = 0.8, a = 7.852.]
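The quantities used in this section can be checked with a few lines of code. The sketch below (function names are hypothetical) computes the estimators $\hat{a}$ and $\hat{b}$ as written above, the mean $\sum_i p_i/\mu_i$ of an $H_k$ fit, and the value of $a$ that keeps the lognormal mean $e^{a+b^2/2}$ unchanged when $b$ changes, which is how the three curves of Figure 5 are related. It assumes, as in Table 1, that the first column holds the rates $\mu_i$ and the second the probabilities $p_i$ (the second column sums to one).

```python
import math

# H_7 fit for day 57 from Table 1: rates mu_i (1/bytes) and probabilities p_i.
MU_57 = [0.0000000084695, 0.0000001060318, 0.0000113172652, 0.0000015100478,
         0.0000189146863, 0.0001900075398, 0.0012356156678]
P_57 = [0.0000000004383, 0.0000000023312, 0.0006499972304, 0.0000147002669,
        0.0144507953832, 0.4437013666789, 0.5411831376710]


def lognormal_estimates(samples):
    """a_hat = mean of ln X_i; b_hat = std. deviation of ln X_i (n - 1 denominator)."""
    logs = [math.log(x) for x in samples]
    n = len(logs)
    a_hat = sum(logs) / n
    b_hat = math.sqrt(sum((v - a_hat) ** 2 for v in logs) / (n - 1))
    return a_hat, b_hat


def hyperexp_mean(p, mu):
    """Mean of an H_k distribution: sum of p_i / mu_i."""
    return sum(pi / mi for pi, mi in zip(p, mu))


def a_for_same_mean(a0, b0, b):
    """Scale a keeping exp(a + b**2 / 2), the lognormal mean, equal to that of (a0, b0)."""
    return a0 + (b0 ** 2 - b ** 2) / 2


print(hyperexp_mean(P_57, MU_57))                          # mean of the H_7 fit, in bytes
print(round(a_for_same_mean(7.033358, 1.509296, 2.2), 3))  # 5.752
print(round(a_for_same_mean(7.033358, 1.509296, 0.8), 3))  # 7.852
```

The two rounded values reproduce the $a$ parameters listed for the highest and smallest variability curves of Figure 5.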
Figure 5 illustrates how the cdf of the distribution changes when we change the workload variability (the fitting technique we use resulted in three, four, or five stages for the lognormal fitting; thus, a total of six, seven, or eight stages were used to fit the mixture of the lognormal and power-law distributions). We will return to the issue of policy sensitivity as a function of workload variability in Section 4.

3. ANALYSIS OF $M/H_k/1$ QUEUES
After performing the fitting steps outlined in the previous section, we obtain a hyper-exponential distribution that closely describes the workload. Therefore, we can model each server as an $M/H_k/1$ queue. In this section, we describe a new methodology for the exact solution of such queues.

The continuous-time Markov chain (CTMC) modeling the behavior of an $M/H_k/1$ queue [12, page 143] has a matrix-geometric form [15, 16]. This implies that its state space can be partitioned into the "boundary" set of states $S^{(0)} = \{s^{(0)}_1, \ldots, s^{(0)}_m\}$, of size $m$, and the "repetitive" sets of states $S^{(j)} = \{s^{(j)}_1, \ldots, s^{(j)}_n\}$, for $j \ge 1$, each of size $n$. The infinitesimal generator can accordingly be block-partitioned as:

$$ Q = \begin{bmatrix}
\hat{L} & \hat{F} & 0 & 0 & 0 & \cdots \\
\hat{B} & L & F & 0 & 0 & \cdots \\
0 & B & L & F & 0 & \cdots \\
0 & 0 & B & L & F & \cdots \\
\vdots & \vdots & \vdots & \vdots & \vdots & \ddots
\end{bmatrix} \tag{1} $$

(we use the letters "L", "F", and "B" according to whether the matrices describe "local", "forward", and "backward" transition rates, respectively, and we use a "^" for matrices related to $S^{(0)}$). The repetitive structure of the infinitesimal generator of quasi-birth-death (QBD) processes, i.e., processes with an infinitesimal generator of the form (1), allows for a recursive formulation of the stationary probabilities of the CTMC states that facilitates their computation. A significant body of research concentrates on the solution of QBD processes with matrix-geometric form, most prominently the works of Neuts [16] and Latouche and Ramaswami [13].
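For an $M/H_k/1$ server the blocks of (1) can be written down directly. The sketch below assumes one common state ordering (our assumption, not spelled out above): the boundary set $S^{(0)}$ is the single empty state, so $m = 1$, and level $j \ge 1$ holds $j$ tasks, indexed by the phase of the task in service, so $n = k$. The function name and the example rates are hypothetical. It builds the blocks with NumPy and checks that each block row of (1) sums to zero, as it must for an infinitesimal generator.

```python
import numpy as np


def mhk1_blocks(lam, p, mu):
    """Blocks of generator (1) for an M/H_k/1 queue (empty state = boundary level)."""
    p = np.asarray(p, dtype=float).reshape(1, -1)    # row vector of phase probabilities
    mu = np.asarray(mu, dtype=float).reshape(-1, 1)  # column vector of phase rates
    k = p.shape[1]
    L_hat = np.array([[-lam]])        # empty system: only an arrival can occur
    F_hat = lam * p                   # an arrival starts service in phase i w.p. p_i
    B_hat = mu                        # the last task completes: back to the empty state
    L = -np.diag(lam + mu.ravel())    # total outflow rate of each state in a level
    F = lam * np.eye(k)               # an arrival leaves the phase in service unchanged
    B = mu @ p                        # completion; the next task starts in phase l w.p. p_l
    return L_hat, F_hat, B_hat, L, F, B


L_hat, F_hat, B_hat, L, F, B = mhk1_blocks(0.5, [0.4, 0.6], [0.5, 2.0])
# Each block row of (1) must sum to zero.
print(np.allclose(L_hat.sum(1) + F_hat.sum(1), 0))         # True
print(np.allclose(B_hat.sum(1) + L.sum(1) + F.sum(1), 0))  # True
print(np.allclose(B.sum(1) + L.sum(1) + F.sum(1), 0))      # True
```

Note that `B = mu @ p` is the product of a column and a row vector, which is exactly the structure exploited below.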
Neuts [16] proposed an algorithm that takes advantage of the repetitive structure of a QBD process. If $\pi^{(j)}$ is the stationary probability vector for the states in $S^{(j)}$, the vectors $\pi^{(j)}$, $j \ge 2$, have a geometric form:

$$ \pi^{(j)} = \pi^{(1)} \cdot R^{j-1}, \qquad j \ge 2, $$

where $R$ is the rate matrix, the solution of the quadratic equation

$$ F + R \cdot L + R^2 \cdot B = 0, $$

which can be obtained numerically using an iterative procedure. Another matrix that can greatly facilitate the computation of the probability distribution of all states in QBDs is $G$ [13], the solution of

$$ B + L \cdot G + F \cdot G^2 = 0, $$

which implies $G = -(L + F \cdot G)^{-1} \cdot B$. Both $R$ and $G$ have important probabilistic interpretations: $G$ expresses the probability of first entering $S^{(j-1)}$ (through each of its states), starting from each state in $S^{(j)}$, while $R$ records the expected number of visits to each state in $S^{(j)}$, starting from each state in $S^{(j-1)}$, before re-entering $S^{(j-1)}$. The following relation between the matrices $R$ and $G$ holds [13, pages 137-8]:

$$ R = -F \cdot (L + F \cdot G)^{-1}, $$

from which we derive the fundamental relation

$$ R \cdot B = F \cdot G. \tag{2} $$

Obtaining $R$ is usually computationally intensive (see [13] for an overview of alternative methods to compute $R$), but knowing $G$ can greatly facilitate the computation of $R$ [18].

In [5], ETAQA, an alternative methodology for the solution of QBD processes that avoids using the matrices $R$ and $G$, was introduced. ETAQA does not operate on the steady-state distribution of all states in the chain, not even in implicit recursive form, but rather on the exact aggregate probabilities of classes of states defined by partitioning the state space. Even though it only computes aggregate probabilities, ETAQA allows us to easily obtain measures of interest such as the queue length or any of its higher moments. The major drawback of ETAQA as presented in [5] is that it applies to a restricted family of QBDs, for which transitions from states in $S^{(j)}$ to the immediately preceding set $S^{(j-1)}$ can only be directed toward a single special "return" state in $S^{(j-1)}$.
This effectively imposes a special structure on the matrix $B$: all of its columns must be zero except for the one corresponding to the special return state.

Given ETAQA's original restriction, it is immediately obvious that it cannot be used as-is for the solution of $M/H_k/1$ queues since, in this case, the matrix $B$ is full, albeit with a special structure: its columns are multiples of each other. This implies that $B = \mu \cdot p$, where $\mu$ is a column vector and $p$ is a row vector, of $k$ elements each. The elements of $\mu$ and $p$ are the parameters of the hyper-exponential service time distribution $H_k$, i.e., the rates of the exponential stages and their respective probabilities. This implies that $p \cdot 1^T = 1$ and, coupled with the fact that $B$ can be expressed as the product of two vectors, also implies that $G$ can be obtained explicitly and is equal to [18]:

$$ G = 1^T \cdot p. \tag{3} $$

Using (3) we can extend ETAQA to the case of $M/H_k/1$ queues, as shown in the next section.

3.1 ETAQA for $M/H_k/1$ queues
Partitioning the stationary probability vector according to the sets of states $S^{(0)}$ and $S^{(i)}$, for $i \ge 1$, defined in Section 3, we get

$$ \pi = [\pi^{(0)}, \pi^{(1)}, \pi^{(2)}, \ldots], $$

with $\pi^{(0)} \in \mathbb{R}^m$ and $\pi^{(1)}, \pi^{(2)}, \ldots \in \mathbb{R}^n$. Furthermore, we rewrite the infinitesimal generator as $Q'$, such that $Q = Q'$, where $Q'$ is defined as:

$$ Q' = \begin{bmatrix}
\hat{L} & \hat{F} & 0 & 0 & 0 & \cdots \\
\hat{B} & L & F & 0 & 0 & \cdots \\
0 & B + S & L & F + S & 0 & \cdots \\
0 & S & B & L & F + S & \cdots \\
\vdots & \vdots & \vdots & \vdots & \vdots & \ddots
\end{bmatrix}, \tag{4} $$

where $S = F \cdot G - F \cdot G = 0$. Writing this zero block explicitly does not change $Q$, but it allows the columns of $Q'$ to be regrouped in the derivation that follows. We can write $\pi \cdot Q' = 0$ as:

$$ \begin{cases}
\pi^{(0)} \cdot \hat{L} + \pi^{(1)} \cdot \hat{B} = 0 \\
\sum_{i=2}^{\infty} \pi^{(i)} \cdot S + \pi^{(0)} \cdot \hat{F} + \pi^{(1)} \cdot L + \pi^{(2)} \cdot B = 0 \\
\pi^{(1)} \cdot F + \pi^{(2)} \cdot L + \pi^{(3)} \cdot B = 0 \\
\pi^{(2)} \cdot S + \pi^{(2)} \cdot F + \pi^{(3)} \cdot L + \pi^{(4)} \cdot B = 0 \\
\pi^{(3)} \cdot S + \pi^{(3)} \cdot F + \pi^{(4)} \cdot L + \pi^{(5)} \cdot B = 0 \\
\quad \vdots
\end{cases} \tag{5} $$

We are going to show how to derive $m + 2n$ equations in $\pi^{(0)}$, $\pi^{(1)}$, and a new vector of $n$ unknowns, $\pi^{(*)} = \sum_{i=2}^{\infty} \pi^{(i)}$, representing the stationary probability of being in the macro-state $\{s^{(i)}_j : i \ge 2\}$, for $1 \le j \le n$. First, however, observe that, since the matrix-geometric property holds, $\pi^{(j+1)} = \pi^{(j)} \cdot R$ for $j \ge 1$.
Using (2), we obtain

$$ \pi^{(j+1)} \cdot B = \pi^{(j)} \cdot R \cdot B = \pi^{(j)} \cdot F \cdot G, \qquad j \ge 1. \tag{6} $$

Summing (6) over all $j \ge 2$, we obtain

$$ \sum_{j=3}^{\infty} \pi^{(j)} \cdot B = \sum_{j=2}^{\infty} \pi^{(j)} \cdot F \cdot G = \pi^{(*)} \cdot F \cdot G, $$

which implies

$$ \pi^{(*)} \cdot F \cdot G + \pi^{(2)} \cdot B = \pi^{(*)} \cdot B. $$

- The first row in (5) provides $m$ equations:
$$ \pi^{(0)} \cdot \hat{L} + \pi^{(1)} \cdot \hat{B} = 0. \tag{7} $$
- The second row in (5) provides $n$ equations. Since $S = F \cdot G - F \cdot G$, the relation above gives:
$$ \pi^{(0)} \cdot \hat{F} + \pi^{(1)} \cdot L - \pi^{(*)} \cdot F \cdot G + \pi^{(*)} \cdot F \cdot G + \pi^{(2)} \cdot B = \pi^{(0)} \cdot \hat{F} + \pi^{(1)} \cdot L + \pi^{(*)} \cdot (B - F \cdot G) = 0. \tag{8} $$
- If we sum all the remaining equations in (5), we obtain $n$ more equations:
$$ \pi^{(1)} \cdot F + \pi^{(*)} \cdot (L + F + S) + \sum_{j=3}^{\infty} \pi^{(j)} \cdot B = \pi^{(1)} \cdot F + \pi^{(*)} \cdot (L + F + F \cdot G) + \sum_{j=3}^{\infty} \pi^{(j)} \cdot B - \pi^{(*)} \cdot F \cdot G = \pi^{(1)} \cdot F + \pi^{(*)} \cdot (L + F + F \cdot G) = 0. \tag{9} $$

Equations (7), (8), and (9) can be collectively written in matrix form as:

$$ [\pi^{(0)}, \pi^{(1)}, \pi^{(*)}] \cdot \tilde{Q} = [0], \tag{10} $$

where

$$ \tilde{Q} = \begin{bmatrix}
\hat{L} & \hat{F} & 0 \\
\hat{B} & L & F \\
0 & B - F \cdot G & L + F + F \cdot G
\end{bmatrix}. \tag{11} $$

The following theorem states that the rank of $\tilde{Q}$ is $m + 2n - 1$, implying that (10) is sufficient to compute $\pi^{(0)}$, $\pi^{(1)}$, and $\pi^{(*)}$, if we also take into account the normalization constraint

$$ \pi^{(0)} \cdot 1^T + \pi^{(1)} \cdot 1^T + \pi^{(*)} \cdot 1^T = 1. $$

Theorem 3.1. Given an ergodic CTMC with infinitesimal generator $Q$ having the structure shown in (1), the rank of the matrix $\tilde{Q}$ defined in (11) is $m + 2n - 1$.

Proof. The proof is based on the fact that $Q'$ is an infinitesimal generator, and its columns are linearly independent, except one of them. We use all the columns of $Q'$ in our proof, and obtain in this way $\tilde{Q}$. The first $m + n$ columns of $\tilde{Q}$ are the first $m + n$ columns of $Q'$. The last $n$ columns of $\tilde{Q}$ are linear combinations of the rest of the columns of $Q'$. So, the rank of $\tilde{Q}$ is $m + 2n - 1$.

We need to drop any of its columns and substitute it with a column of 1s in order to obtain a unique solution for the stationary probability vector $[\pi^{(0)}, \pi^{(1)}, \pi^{(*)}]$.

We stress that, in the special case of $M/H_k/1$ queues, we can explicitly obtain $G$ using (3). Indeed, we only need to store the vector $p$ to fully capture $G$.

3.2 Measures of interest
Knowing the aggregated steady-state probability vectors allows us to compute a rich collection of measures of interest. A detailed description of how to derive them can be found in [5]; here, we only summarize these results. By writing the measure of interest $r$ as

$$ r = \pi^{(0)} \cdot \rho^{(0)T} + \pi^{(1)} \cdot \rho^{(1)T} + \sum_{j=2}^{\infty} \pi^{(j)} \cdot \rho^{(j)T}, $$

where $\rho = [\rho^{(0)}, \rho^{(1)}, \ldots]$ are the reward rates for the states in $S^{(0)}, S^{(1)}, \ldots$, the definition of $\rho$ is only restricted by our need to compute the above summation. Assuming that the reward rate of state $s^{(j)}_i$, for $j \ge 2$ and $i = 1, \ldots, n$, is a polynomial of degree $k$ in $j$ with arbitrary coefficients $a^{[0]}_i, a^{[1]}_i, \ldots, a^{[k]}_i$:

$$ \forall j \ge 2, \ \forall i \in \{1, 2, \ldots, n\}: \qquad \rho^{(j)}_i = a^{[0]}_i + a^{[1]}_i j + \cdots + a^{[k]}_i j^k, $$

we obtain

$$ \sum_{j=2}^{\infty} \pi^{(j)} \cdot \rho^{(j)T} = \sum_{j=2}^{\infty} \pi^{(j)} \cdot \left( a^{[0]} + a^{[1]} j + \cdots + a^{[k]} j^k \right)^T = \sum_{j=2}^{\infty} \pi^{(j)} \cdot a^{[0]T} + \cdots + \sum_{j=2}^{\infty} j^k \pi^{(j)} \cdot a^{[k]T} = r^{[0]} \cdot a^{[0]T} + \cdots + r^{[k]} \cdot a^{[k]T}, $$

where $r^{[l]} = \sum_{j=2}^{\infty} j^l \pi^{(j)}$ for $l = 0, \ldots, k$, and its computation can be illustrated by strong induction. For the base case, $r^{[0]}$ is simply $\pi^{(*)}$. For $k > 0$, $r^{[k]}$ can be computed by solving the system of $n$ linear equations:

$$ \begin{cases}
r^{[k]} \cdot (L + F + B)_{1:n,\,1:n-1} = b_{1:n-1} \\
r^{[k]} \cdot (F - B) \cdot 1^T = c
\end{cases} \tag{12} $$

where

$$ b = -\left( 2^k \pi^{(0)} \cdot \hat{F} + 2^k \pi^{(1)} \cdot L + 3^k \pi^{(1)} \cdot F + \sum_{l=1}^{k} \binom{k}{l} \left( 2^l r^{[k-l]} \cdot F + r^{[k-l]} \cdot L \right) \right), $$

$$ c = -\left( 2^k \pi^{(1)} \cdot F \cdot 1^T + \sum_{l=1}^{k} \binom{k}{l} r^{[k-l]} \cdot F \cdot 1^T \right). $$

In practice, to obtain the system queue length we need to solve a linear system in $n$ unknowns, and to compute its $k$-th moment we must solve $k$ linear systems in $n$ unknowns each.

4. ANALYSIS OF LOAD BALANCING POLICIES
We consider the following model of a distributed server environment. We assume a fixed number of hosts with the same processing power, each serving tasks in first-come-first-serve order. We further assume that each host has an unbounded queue. Tasks arrive at the dispatcher from the outside world according to a Poisson process. The dispatcher is responsible for distributing the jobs among the various back-end servers according to a scheduling policy. We also assume that the dispatcher can derive the request duration (the size of the file) from the name of the file requested.
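Under these assumptions each host is an $M/H_k/1$ queue, so the aggregate solution of Section 3 applies directly. The NumPy sketch below is our own illustration, not the authors' implementation: function and variable names are hypothetical, and the $M/H_k/1$ blocks assume an empty boundary state followed by levels indexed by the phase in service. It builds $\tilde{Q}$ from (11) with $G = 1^T \cdot p$, replaces its first column with ones to impose normalization, computes $r^{[1]}$ from (12) with $k = 1$, and returns the mean number of tasks at the host together with its per-phase contributions $\sum_{j \ge 1} j\,\pi^{(j)}_i$, presumably the kind of per-phase quantity behind Figure 8. As a check, the mean should agree with the Pollaczek-Khinchine formula for the M/G/1 queue.

```python
import numpy as np


def etaqa_mhk1(lam, p, mu):
    """Aggregate (ETAQA-style) solution of an M/H_k/1 queue via (10)-(12).

    Returns pi0 (probability of the empty state), pi1, pi_star (aggregate of all
    levels j >= 2), the mean number of tasks, and the per-phase contributions
    sum_{j>=1} j * pi^(j)_i to that mean.
    """
    p_row = np.asarray(p, dtype=float).reshape(1, -1)    # phase probabilities p
    mu_col = np.asarray(mu, dtype=float).reshape(-1, 1)  # phase rates mu
    k = p_row.shape[1]
    ones = np.ones((k, 1))
    # Blocks of (1): boundary S^(0) = empty state (m = 1); each level has n = k states.
    L_hat, F_hat, B_hat = np.array([[-lam]]), lam * p_row, mu_col
    L = -np.diag(lam + mu_col.ravel())
    F = lam * np.eye(k)
    B = mu_col @ p_row
    FG = F @ (ones @ p_row)  # F * G, with G = 1^T p from (3)
    # The matrix Q-tilde of (11).
    Qt = np.block([
        [L_hat, F_hat, np.zeros((1, k))],
        [B_hat, L, F],
        [np.zeros((k, 1)), B - FG, L + F + FG],
    ])
    # Replace the first column with ones to add the normalization constraint.
    A = Qt.copy()
    A[:, 0] = 1.0
    rhs = np.zeros(1 + 2 * k)
    rhs[0] = 1.0
    x = np.linalg.solve(A.T, rhs)  # solves x @ A = rhs
    pi0, pi1, pi_star = x[0], x[1:1 + k], x[1 + k:]
    # r^[1] from (12) with k = 1, using r^[0] = pi_star.
    b = -(2 * pi0 * F_hat.ravel() + 2 * pi1 @ L + 3 * pi1 @ F
          + 2 * pi_star @ F + pi_star @ L)
    c = -(2 * pi1 @ F @ ones + pi_star @ F @ ones)
    M = np.hstack([(L + F + B)[:, :k - 1], (F - B) @ ones])
    r1 = np.linalg.solve(M.T, np.concatenate([b[:k - 1], c]))
    per_phase = pi1 + r1
    return pi0, pi1, pi_star, per_phase.sum(), per_phase


# Small hypothetical H_2 example, checked against the Pollaczek-Khinchine formula.
lam, p, mu = 0.5, [0.4, 0.6], [0.5, 2.0]
pi0, pi1, pi_star, mean_n, per_phase = etaqa_mhk1(lam, p, mu)
es = sum(q / m for q, m in zip(p, mu))            # E[S]
es2 = sum(2 * q / m ** 2 for q, m in zip(p, mu))  # E[S^2]
rho = lam * es
pk = rho + lam ** 2 * es2 / (2 * (1 - rho))
print(pi0, mean_n, pk)  # pi0 = 1 - rho = 0.45; mean_n matches pk (about 1.522)
```

Only one $(1 + 2k)$-dimensional and one $k$-dimensional linear system are solved, independently of how many levels the queue reaches, which is the storage and computation advantage claimed for ETAQA.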
We consider the following two load balancing policies (in neither case does the dispatcher use feedback from the individual hosts to better balance the load among them):

Random: The dispatcher assigns the incoming job to a randomly selected host, each chosen with probability $1/C$, where $C$ is the number of hosts. The performance of this policy has been shown to be very similar to that of round-robin, i.e., a policy that assigns jobs to hosts in cyclical fashion [10].

Size-based: The dispatcher assigns tasks to hosts according to the tasks' size. This policy is motivated by the desire to separate large from small tasks, to avoid the significant slowdowns that small tasks would experience when queued behind large tasks. A policy based on the same principle has been examined in [10] and compared very favorably to a dynamic policy where the dispatcher assigns tasks to hosts according to the hosts' load at the time of task arrival.

We first consider the performance of the random policy under the synthetic workload described in Subsection 2.3, recalling that one of the curves in Figure 5 corresponds to the actual fitted data from day 57 of the World Cup '98 trace data. In all cases, we assume that the overall arrival rate to the dispatcher is $\lambda$ and that there are eight hosts. Thus, the arrival rate to each individual host under the random policy is $\lambda/8$. The average task slowdown for the random policy is illustrated in Figure 6(a). Although the system saturates at the same value of $\lambda$ regardless of the workload choice (recall that all workloads have the same mean task size), the average task slowdown differs dramatically from workload to workload, especially in the range of medium-to-high system utilization. Figure 6(b) illustrates the average queue length at each host as a function of the workload variability for various arrival rates³. The figure further confirms that the higher the workload variability, the more dramatic the average queue build-up.
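A back-of-the-envelope check shows why all workloads saturate at the same arrival rate. Assuming (our assumption) that service time is measured in the same units as file size, i.e., one byte per time unit, a host under the random policy receives rate $\lambda/C$ and has utilization $\rho = \lambda E[S]/C$, so the system saturates at $\lambda = C/E[S]$ whatever the variability; with $C = 8$ and $E[S] = 3629$ bytes this is about 0.0022, consistent with the largest arrival rate shown in Figure 6. The hypothetical helpers below also evaluate the Pollaczek-Khinchine mean number of tasks per host, which is exact for an $M/H_k/1$ host but, unlike the solution of Section 3, gives only the total and not the contribution of each phase.

```python
def random_policy_load(lam, mean_size, hosts=8):
    """Per-host arrival rate and utilization when each task picks a host uniformly."""
    lam_host = lam / hosts
    return lam_host, lam_host * mean_size


def saturation_rate(mean_size, hosts=8):
    """Overall arrival rate at which every host reaches utilization 1."""
    return hosts / mean_size


def pk_mean_tasks(lam_host, es, es2):
    """Pollaczek-Khinchine mean number of tasks at an M/G/1 host."""
    rho = lam_host * es
    if rho >= 1:
        raise ValueError("host is saturated")
    return rho + lam_host ** 2 * es2 / (2 * (1 - rho))


MEAN_SIZE = 3629  # bytes: common mean task size of all workloads (Section 2.3)
print(saturation_rate(MEAN_SIZE))  # about 0.0022
for lam in (0.0012, 0.0016):
    print(lam, random_policy_load(lam, MEAN_SIZE)[1])  # per-host utilization
```

The two arrival rates used later in Figure 10 correspond, under this assumption, to per-host utilizations of roughly 0.54 and 0.73.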
To further understand the behavior of the system, we look closer at the range of task sizes that contribute to the queue build-up and, consequently, to the performance degradation. Our analytic model allows us to explore the system behavior further by analyzing how the queue length builds up. Figure 7 depicts the CTMC that models a host, an $M/H_7/1$ server (for presentation clarity, not all arcs are shown, but the reader can visualize the shape of the Markov chain and, most importantly, identify the parts of the CTMC that correspond to the power-law portion and to the lognormal portion of the workload). As described in Section 3, our analytic model allows the exact computation of the system queue length that corresponds to the different portions of the distribution. Figure 8 illustrates the contribution to the overall queue length from the power-law portion of the workload and from all tasks (files) greater than 100 KBytes (i.e., the phase of the lognormal distribution with large request sizes and all phases of the hyper-exponential corresponding to the power-law part of the distribution). Since the Feldmann-Whitt algorithm "splits" the workload (each portion corresponding to one phase of the hyper-exponential distribution), it is possible to calculate approximately the contribution of specific task sizes to the queue length. Figure 8(a) illustrates that, at medium-to-high load, the queue occupancy due to tasks with power-law distribution is about 20% of the overall queue length (even though the frequency of these tasks is almost negligible). This percentage is much larger at smaller arrival rates. We also note that if the lognormal portion has a small variability, the power-tail queue dominates the queue build-up. This appears at first counterintuitive, but it can be explained by examining the contribution of the tail of the lognormal distribution to the queue build-up. Figure 8(b) shows that the tail of the lognormal distribution is very important for performance (requests for files larger than 100 KBytes dominate the queue across the whole range of arrival rates) and illustrates that, for higher workload variabilities, the system queue length due to large yet rare tasks is significant.

³ For presentation clarity, the x-axis of Figure 6(b) shows only the value of the b parameter of the lognormal distribution for the workload, but we remind the reader that a different value of b also implies a different value of a, to keep the same mean task size across all workloads.

[Figure 6: Average task slowdown as a function of the overall task arrival rate λ for workloads with high-to-low variance in their task service time, from the lowest variability (b = 1.3) to the highest variability (b = 2.2) (a), and average queue length as a function of the workload variability b for various arrival rates of the workload, up to the highest arrival rate λ = 0.0021 (b).]

[Figure 7: The CTMC that models one host, with the states corresponding to the lognormal portion, to the stage of the lognormal with large request sizes, and to the power-tail portion marked.]

These last observations suggest that it may be appropriate to assign tasks to specific hosts according to their sizes. We conjecture that by reserving hosts for scheduling tasks of similar sizes, we ensure that no severe imbalances in the utilization of the hosts occur. The workload fitting provided by the Feldmann-Whitt algorithm yields a hyper-exponential distribution with a special property: each exponential phase corresponds to a certain range of task (file) sizes.
Thus, we use the hyper-exponential distribution to make an educated guess about how to distribute the workload across the back-end servers, ensuring that the variance of the service time distribution of the tasks served by each server is kept as low as possible. Figure 9 illustrates this size-based policy. By applying the size-based policy to our workload, we notice that a single server suffices to serve the power-law and the tail of the lognormal portions of the requests. The body of the lognormal portion must instead be served by the remaining seven servers.

Figure 10 illustrates the average queue length of the hosts using either the size-based or the random policy for two fixed arrival rates, $\lambda = 0.0012$ and $\lambda = 0.0016$, representing modest and high load, respectively. In contrast to the random policy, the average queue length with the size-based policy does not increase as a function of the workload variability. This indicates that the size-based policy achieves a good utilization of all back-end hosts. Figure 10 shows similar behavior with respect to the expected task slowdown. We conclude this section by stressing that, while the size-based balancing algorithm does not provide an optimal solution to the problem (hence the "bumps" in the size-based graphs in Figure 10), it offers a simple and inexpensive solution that is significantly better than the random policy.

[Figure 8: Contribution to the overall queue from the power-law portion of the workload (a), and from all files greater than 100 KBytes (b). Both panels plot a queue length ratio (power-tail queue / overall queue, and queue of jobs > 100K / overall queue) against the arrival rate λ, for workloads ranging from the smallest to the highest variability.]
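The steps of the size-based policy in Figure 9 translate almost line by line into code. The sketch below is our own reading of those steps: the function name is hypothetical, rule 4a is applied to every phase with $c_i < 1$, and raising an error when the pooled share reaches 1.5 or when more than $C$ servers would be assigned is an assumption, since the figure does not say what to do in those cases. The example uses made-up $H_4$ parameters, not the fitted values of Tables 1 and 2, and numbers phases from 0.

```python
import math


def size_based_assignment(p, mu, n_servers):
    """Group H_k phases onto servers following the steps of Figure 9 (sketch).

    Returns a list of (phases, number_of_servers) pairs; jobs of a group are
    spread randomly over that group's servers.
    """
    k = len(p)
    # Step 1: expected service time contributed by each phase, S_i = p_i / mu_i.
    s = [p[i] / mu[i] for i in range(k)]
    total = sum(s)
    # Steps 2-3: normalized share S^_i, times the number of servers C, gives c_i.
    c = [n_servers * si / total for si in s]
    groups = []
    # Step 4b: body phases (c_i >= 1) get floor(c_i + 0.5) servers each.
    for i in range(k):
        if c[i] >= 1:
            groups.append(([i], math.floor(c[i] + 0.5)))
    # Step 4a: phases with c_i < 1 share one server if their total share is < 1.5.
    small = [i for i in range(k) if c[i] < 1]
    if small:
        if sum(c[i] for i in small) >= 1.5:
            raise ValueError("rule 4a does not apply: pooled share >= 1.5")
        groups.append((small, 1))
    if sum(count for _, count in groups) > n_servers:
        raise ValueError("assignment exceeds the available servers")
    return groups


groups = size_based_assignment(
    p=[0.7, 0.29, 0.0099, 0.0001],
    mu=[1 / 1000, 1 / 3000, 1 / 20000, 1 / 500000],  # hypothetical rates (1/bytes)
    n_servers=8,
)
print(groups)  # [([0], 3), ([1], 4), ([2, 3], 1)]
```

In this toy case the two rare, large-file phases share a single server while the body phases receive the other seven, mirroring the split reported above for the real workload.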
Compute the expe ted servi e time Si for ea h phase of the hyper-exponential distribution, weighted by its probability i : Si = ii , 1  i  k 2. Normalize Si to ompute ea h stage's ontribution to the overall expe ted mean servi e time of the distribution: S^i = PkSi S , 1  i  k i=1 i 3. If servers are available, then phase i should be served by servers i = S^i  (the spe i server for the task is hosen randomly among the i servers) 4. Treat heavy-tail di erently from the body of the distribution: a. 8 i < 1; 1  i  k, (i.e., for stages orresponding to the heavy tail) su h that i < 1:5, are to be served the same single server. b. 8 i  1; 1  i  k, (i.e., for stages orresponding to the body), assign b i + 0:5 servers and s hedule jobs within these servers using the random poli y. Attention should be paid so as to ensure that the total server assignment a ross all stages of the hyper-exponential does not ex eed . P Figure 9: Our size-based s heduling poli y. 5. CONCLUSIONS In this paper we presented an analyti methodology for the exa t analysis of load balan ing poli ies in distributed multi-server system onditioned on the fa t that the duration of arriving tasks is best des ribed by a heavy tail distribution. The ontributions are two-fold:   the development of a new analyti methodology for the exa t solution of M/Hk /1 queues, and the appli ation of the analyti methodology for the analysis of the performan e bene ts of a sizebased s heduling poli y. We emphasize that our methodology allows for the exa t quanti ation of the e e t of the task sizes on queue build-up for the various servers, and onsequently allows for the dete tion of the ause of load imbalan es. Our analysis indi ates that a size-based poli y that reserves spe i servers for spe i ranges of in oming task sizes greatly outperforms a poli y that simply assigns an inoming task to a randomly sele ted host. 
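To make the allocation steps of Figure 9 concrete, the following is a minimal sketch, not the authors' implementation. It writes p_i and μ_i for the probability and rate of phase i and N for the number of servers (our notation), and it assumes the dispatcher already knows the phase of each task from its size range. Two choices are our own assumptions, since the figure leaves them open: tail phases whose shares sum to 1.5 or more are rejected, and when rounding over-commits the servers, one server at a time is taken from the body phase whose allocation was rounded up the most. The names `size_based_allocation` and `dispatch` are hypothetical.

```python
"""Sketch of the size-based allocation in Figure 9.

Assumptions (not from the paper): function names, rejecting tail shares
that sum to >= 1.5, and the trimming rule used when rounding over-commits.
"""
import math
import random
from typing import List


def size_based_allocation(probs: List[float], rates: List[float],
                          n_servers: int) -> List[List[int]]:
    """Return, for each hyper-exponential phase, the server ids that serve it."""
    k = len(probs)
    # Step 1: expected service time of each phase, weighted by its probability.
    s = [p / mu for p, mu in zip(probs, rates)]
    # Step 2: each phase's share of the overall mean service time.
    total_s = sum(s)
    s_hat = [x / total_s for x in s]
    # Step 3: fractional number of servers each phase would need.
    n = [x * n_servers for x in s_hat]
    tail = [i for i in range(k) if n[i] < 1.0]
    body = [i for i in range(k) if n[i] >= 1.0]
    # Step 4a: tail phases share one server (only the case Figure 9 covers).
    if tail and sum(n[i] for i in tail) >= 1.5:
        raise ValueError("tail shares sum to >= 1.5; case not covered")
    # Step 4b: each body phase gets floor(n_i + 0.5) servers.
    alloc = {i: math.floor(n[i] + 0.5) for i in body}
    budget = n_servers - (1 if tail else 0)
    # Assumed rule: trim from the phase rounded up the most until we fit.
    while sum(alloc.values()) > budget:
        candidates = [i for i in body if alloc[i] > 1]
        if not candidates:
            raise ValueError("too few servers for the body phases")
        worst = max(candidates, key=lambda i: alloc[i] - n[i])
        alloc[worst] -= 1
    # Hand out consecutive server ids: body phases first, then the tail server.
    servers: List[List[int]] = [[] for _ in range(k)]
    next_id = 0
    for i in body:
        servers[i] = list(range(next_id, next_id + alloc[i]))
        next_id += alloc[i]
    for i in tail:
        servers[i] = [next_id]  # one server shared by every tail phase
    return servers


def dispatch(phase: int, servers: List[List[int]], rng: random.Random) -> int:
    """Random policy among the servers reserved for the task's phase."""
    return rng.choice(servers[phase])


if __name__ == "__main__":
    # Hypothetical 4-phase fit (not the paper's parameters), N = 8 servers.
    probs = [0.7, 0.29, 0.0099, 0.0001]
    rates = [1.0, 0.25, 0.02, 0.0005]
    servers = size_based_allocation(probs, rates, 8)
    print(servers)  # [[0, 1], [2, 3, 4, 5], [6], [7]]
    print(dispatch(1, servers, random.Random(42)))
```

In this hypothetical example rounding first asks for 8 body servers while one server is reserved for the tail, so the trimming rule removes one server from the third phase, whose share (about 1.55) was rounded up the most.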
In the future, we intend to explore the effects of different inter-arrival distributions on the performance of the two policies analyzed in this paper and to consider the effects of caching at each back-end server. Further, we will explore the applicability of our model to the performance analysis of more complex scheduling policies, in particular policies that provide feedback to the dispatcher. To this end, we intend to combine simulation with analysis in the form of hybrid modeling.

Figure 10: Policy comparisons as a function of the workload variability. Panels (a) and (b) show the queue length, and panels (c) and (d) the mean slowdown, of the random and size-based policies for λ = 0.0012 and λ = 0.0016, plotted against the variability of the body of the distribution (b).

Acknowledgments

We would like to thank Stephen K. Park for providing us with insightful suggestions on fitting real data to distributions.

6. REFERENCES

[1] M. Arlitt and C.L. Williamson. Web Server Workload Characterization: The Search for Invariants. In Proceedings of ACM SIGMETRICS Conference, pp. 126-138, Philadelphia, PA, May 1996.
[2] M. Arlitt and T. Jin. Workload Characterization of the 1998 World Cup Web Site. Hewlett-Packard Laboratories Technical Report, September 1999.
[3] P. Barford and M.E. Crovella. Generating Representative Web Workloads for Network and Server Performance Evaluation. In Proceedings of Performance '98/ACM SIGMETRICS '98, pp. 151-160, Madison.
[4] P. Barford, A. Bestavros, A. Bradley and M.E. Crovella. Changes in Web Client Access Patterns: Characteristics and Caching Implications. In World Wide Web, Special Issue on Characterization and Performance Evaluation, Vol. 2, pp. 15-28, 1999.
[5] G. Ciardo and E. Smirni. ETAQA: An Efficient Technique for the Analysis of QBD-processes by Aggregation. Performance Evaluation 36-37, pp. 71-93, 1999.
[6] G. Ciardo, A. Riska and E. Smirni. An Aggregation-based Solution Method for M/G/1-type Processes. In Proceedings of Numerical Solution of Markov Chains '99, pp. 21-40, Zaragoza, Spain, September 1999.
[7] M. Colajanni, P.S. Yu and D.M. Dias. Analysis of Task Assignment Policies in Scalable Distributed Web-Server Systems. IEEE Transactions on Parallel and Distributed Systems, No. 6, June 1998.
[8] M.E. Crovella and M.S. Taqqu. Estimating the Heavy Tail Index from Scaling Properties. Methodology and Computing in Applied Probability, Vol. 1, No. 1, pp. 55-79, 1999.
[9] A. Feldmann and W. Whitt. Fitting Mixtures of Exponentials to Long-Tail Distributions to Analyze Network Performance Models. Performance Evaluation, 31(8), pp. 963-976, Aug. 1998.
[10] M. Harchol-Balter, M.E. Crovella and C.D. Murta. On Choosing a Task Assignment Policy for a Distributed Server System. In Proceedings of Performance Tools '98, Lecture Notes in Computer Science, Vol. 1469, pp. 231-242, 1998.
[11] M. Harchol-Balter and A. Downey. Exploiting Process Lifetime Distributions for Dynamic Load Balancing. ACM Transactions on Computer Systems, 15(3), pp. 253-285, Aug. 1997.
[12] L. Kleinrock. Queueing Systems Volume I: Theory. Wiley, 1975.
[13] G. Latouche and V. Ramaswami. Introduction to Matrix Analytic Methods in Stochastic Modeling. ASA-SIAM, 1999.
[14] A.M. Law and W.D. Kelton. Simulation Modeling and Analysis. McGraw-Hill Inc., 1982.
[15] R. Nelson. Matrix geometric solutions in Markov models: a mathematical tutorial. Research Report RC 16777 (#742931), IBM T.J. Watson Res. Center, Yorktown Heights, NY, Apr. 1991.
[16] M.F. Neuts. Matrix-geometric Solutions in Stochastic Models. Johns Hopkins University Press, Baltimore, MD, 1981.
[17] M.F. Neuts. Structured Stochastic Matrices of M/G/1 Type and Their Applications. Marcel Dekker, New York, NY, 1989.
[18] V. Ramaswami and G. Latouche. A general class of Markov processes with explicit matrix-geometric solutions. Operation Research Spectrum 8, pp. 209-218, Aug. 1986.
[19] V. Pai, M. Aron, G. Banga, M. Svendsen, P. Druschel, W. Zwaenepoel and E. Nahum. Locality-aware Request Distribution in Cluster-based Network Servers. In Proceedings of the Eighth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VIII), San Jose, California, October 1998.