
Focused Layered Performance Modelling by Aggregation

Published: 26 November 2022

Abstract

Performance models of server systems, based on layered queues, may be very complex. This is particularly true for cloud-based systems based on microservices, which may have hundreds of distinct components, and for models derived by automated data analysis. Often only a few of these many components determine the system performance, and a smaller simplified model is all that is needed. To assist an analyst, this work describes a focused model that includes the important components (the focus) and aggregates the rest in groups, called dependency groups. The method Focus-based Simplification with Preservation of Tasks described here fills an important gap in a previous method by the same authors. The use of focused models for sensitivity predictions is evaluated empirically in the article on a large set of randomly generated models. It is found that the accuracy depends on a “saturation ratio” (SR) between the highest utilization value in the model and the highest value of a component excluded from the focus; evidence suggests that SR must be at least 2 and must be larger to evaluate larger model changes. This dependency was captured in an “Accurate Sensitivity Hypothesis” based on SR, which can be used to indicate trustable sensitivity results.

1 Introduction

Layered queueing network (LQN) models are well matched to analyzing the performance of large distributed server systems such as web services systems based on microservices in the cloud. Models of hundreds of services can be solved in seconds or minutes. However, automated model construction provides models with excessive detail, and automated analysis techniques must explore many (thousands) of variations on a system. A simplified model, focused on the important system resources, may be essential.
Expert modellers decide what to include in a performance model and what to ignore or approximate. However, automated construction from execution data (e.g. [1, 2, 18, 41]) or design data (e.g. [1, 28, 39]) includes every component. Very large models are not useful for automated analysis such as optimization of the deployment [5, 24] or of the design (e.g., Reference [28], which required over 1,000 model evaluations). Since one of the goals of the automated techniques is to support analysts with less expertise, focused modeling is an enabling technology for automation.
Performance models are always simplifications of reality, to some degree, and many techniques have been devised to further simplify a given model, depending on its form. For queueing models, the Flow-Equivalent Service Centre (FESC) [23] is used to represent a subsystem by a single server. However, simultaneous resource possession in LQNs [19] limits its usefulness for layered queueing, so this work looks elsewhere for a simplification approach. Other formalisms such as Petri Nets and Markov Chains are not considered here, because they do not scale up well enough for modeling large server systems. However, they do have a large literature on simplification, including state lumping and aggregation [3, 26, 32], sometimes guided by symmetries in the corresponding structural model (as in well-formed nets [14]).
A motivating example is shown in Figure 1(a): an original model (OM) with 86 components, created automatically from a system design in the Palladio software tool [28]. The analysis reported in Reference [28] required over 1,300 model evaluations and took many hours. The focused model (FM) in Figure 1(b) has only eight components and gives almost identical predictions for performance under varying loads. To compare run times, 1,000 evaluations of these two models for varying parameters on a desktop computer required 6,311 seconds (1 hour, 45 minutes, and 11 seconds) for the OM and 6.42 seconds for the FM, a speedup factor of nearly 1,000.
Fig. 1. Complex LQN generated by the Palladio Component Model tool [28] and a focused equivalent.
The goal of this work is to analyze the sensitivity of LQN models to changes in design, configuration, and deployment using aggregated models. Aggregation focusses on the key model elements, while preserving enough of the model to give sufficient accuracy for a desired range of changes to the base model. Earlier incomplete work by the same authors [15, 16] introduced special dependency groups for aggregation. This work completes that approach and determines how to ensure the accuracy of the sensitivities. It describes an algorithm FSPT in which
Layered queueing models are simplified to support automated sensitivity analysis.
The analyst can choose which components to preserve in the model focus and a range of system throughputs (system scales) for which sensitivities will be computed. Preservation of components provides traceability of their parameter impacts.
Additional components are preserved to cover the desired range of system scales.
The non-preserved components are aggregated in groups determined by system dependencies.
The contributions of this article are, first, to fill a gap in the earlier method, including an updated evaluation of accuracy. Second, an effectiveness index is defined to characterize models that resist simplification. Third, an “Accurate Sensitivity Hypothesis” (ASH) is defined that guides the correct use of simplified models. Fourth, the nature of sensitivity results is explored on randomly generated models. For simplifications satisfying the ASH, over 90% of many thousands of sensitivity calculations were found to be accurate within 20%. In Section 4, it is found empirically that the accuracy of a predicted change in performance depends on the resource saturation of the aggregated and non-aggregated components, and on the scale-up (the change in the system throughput). The saturation \(S_i\) for each component and a saturation ratio (SR) for the focused model are given by the following:
\begin{align} S_i &= \text{saturation of component } i\nonumber \\ &= \text{the fraction of the resource capability of component } i \text{ that is being used}\nonumber \\ &= (\text{its utilization})/(\text{its resource multiplicity}),\ \text{for resource pools with multiple units} \end{align}
(1)
\begin{equation} SR = (\text{largest } S_i \text{ for any component } i)/(\text{largest } S_i \text{ for any component } i \text{ outside the focus}). \end{equation}
(2)
Resource multiplicity is defined in Section 2.1. A larger SR gives a broader focus (by preserving more components) and a more accurate approximation. If SR is not large enough, then errors related to queueing delays at out-of-focus servers may become significant when a model change increases the load. Changes that decrease load tend to have decreased errors, which is intuitively reasonable.
The following empirical and heuristic condition for accurate sensitivity predictions is introduced in this article, as described in Section 4.7.
Accurate Sensitivity Hypothesis
Performance predictions by a FM with Saturation Ratio SR will “almost always” have “acceptable accuracy” if the throughput for the prediction does not increase or decrease by a multiplier larger than SR – 2. From the results analyzed in Section 4.7, “acceptable accuracy” was taken to be errors of less than 20%, for which “almost always” means “in more than 90% of cases.”
The ASH defines a “trusted range” for results, in which the throughput multiplier is in the range \((1/ (SR-2),\,SR-2)\) and implies that SR > 2 to reliably give acceptably accurate sensitivities.
Section 4 describes a wide-ranging evaluation that supports this hypothesis and shows that it is conservative in the sense that many predictions that do not satisfy it, still have less than 20% error.

2 Layered Queueing Network Models and SPT

2.1 LQN Concepts and Notation

LQNs are extended queueing models for contention effects in layered systems, in which a software server depends not just on its host processor but on other software servers as well. A summary of LQN concepts is found in Reference [9], based on earlier work in References [10, 31, 37]. A tutorial guide is in Reference [40]. LQN concepts are illustrated by the example in Figure 2. Software components called tasks run on hardware components called hosts. Task \(T\) has \(m_T\) servers (representing concurrent threads), host \(H\) has \(m_H\) servers (representing multiple CPUs), and we assume that the queue disciplines are FIFO at tasks and processor sharing at hosts (representing time-slicing schedulers). The operations offered by a task are called entries. Entry \(E\) requires \(d_E/s_H\) seconds of service by its host (where \(d_E\) is the mean CPU demand in suitable units and \(s_H\) is the host service rate or speed factor) and may make \(y_{E,E2}\) mean calls to another entry \(E2\). Details of an entry may be defined by a precedence graph containing activities. In practice, demands are measured on some “reference” host type, and the speed factor of a given host is its execution speed relative to the reference type. The capacity of a host \(H\) is defined as \(c_H = m_H s_H\) and is the equivalent number of reference-type CPUs.
Fig. 2. LQN model of a three-tier application.
The system's users are represented by a special task User, for which \(m_{User}\) gives the number of users. It has one entry whose execution represents a user response, including a think time \(Z_{User}\) (1,000 ms in Figure 2) and calls to system services. The throughput of the User task is the system throughput.
In this work, the model is assumed to have a single set of Users, which may, however, combine different behaviors. The tasks are assumed to be heterogeneous; replicated servers and symmetrically replicated subsystems can be simplified efficiently by a special technique summarized in Reference [9].
FSPT and SPT use the following parameters and measures derived from the LQN model:
\(N_{User}\,= \,\) number of users
\({\lambda }_{User}\) and \(R_{User}\,=\) user throughput and response time, with:
\(\circ\) \({\lambda}_{OM,User},\ {R}_{OM,User} =\) values for the original model OM
\(\circ\) \({\lambda }_{FM,User},{R}_{FM,User} =\) values for the focused model FM
\({\bf E}(T)\) and \({\bf T}(H)\,=\) the set of entries (operations) of task T, and the set of tasks on host H.
\(Y_{E}\) and \({Y}_{E1,E2} =\) the mean calls to entry E, and the mean calls from entry E1 to E2 per User response;
\(D_{E}\,=\) the total CPU demand of entry E per User response;
\(S_{T}\) and \(S_{H}\,=\) the mean saturation of the servers at task T and host H as defined by Equation (1). The saturation of a resource is determined by the OM solution and depends on delays at the lower-layer resources.
\({S}_{H,T} = {\lambda}_{User}\sum_{E \in {\bf E}(T)} {D}_E/{m}_H =\) the saturation of host \(H\) by task \(T\) deployed on it, (3)
\({S}_H = \sum_{T \in {\bf T}(H)} {S}_{H,T} =\) the total saturation of host \(H\), (4)
\(S_{max} =\) the maximum saturation of any host or task, in the OM solution,
\(SR_T = S_T/S_{max} =\) the saturation ratio of task \(T\) (and \(SR_H = S_H/S_{max}\) for host \(H\)).
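To make these measures concrete, the following is a minimal Python sketch of Equations (3) and (4). The dictionary layout (entries grouped by task, tasks grouped by host) is an assumption for illustration, not part of any LQN tool.

```python
def host_saturations(lam_user, D, entries_of, tasks_on, m):
    """Host saturations from Equations (3) and (4).
    lam_user:   system throughput (User responses per second)
    D:          dict entry -> total CPU demand per User response (D_E)
    entries_of: dict task -> list of its entries, E(T)
    tasks_on:   dict host -> list of tasks deployed on it, T(H)
    m:          dict host -> multiplicity m_H"""
    S_HT = {(h, t): lam_user * sum(D[e] for e in entries_of[t]) / m[h]
            for h, ts in tasks_on.items() for t in ts}          # Eq. (3)
    S_H = {h: sum(S_HT[h, t] for t in ts)
           for h, ts in tasks_on.items()}                       # Eq. (4)
    return S_HT, S_H

# Hypothetical two-host example:
_, S_H = host_saturations(
    lam_user=10.0, D={"e1": 0.03, "e2": 0.05},
    entries_of={"t1": ["e1"], "t2": ["e2"]},
    tasks_on={"h1": ["t1"], "h2": ["t2"]}, m={"h1": 1, "h2": 2})
print(S_H)  # {'h1': 0.3, 'h2': 0.25}
```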
The approximation errors of the FM are given by
\(Err_\lambda = ({\lambda}_{FM,User} - {\lambda}_{OM,User})/{\lambda}_{OM,User} =\) the relative throughput error of FM, (5)
\(Err_R = ({R}_{FM,User} - {R}_{OM,User})/{R}_{OM,User} =\) the relative response time error of FM.
Given a user think time \(Z\), the throughput and response time are related by Little's result as \({R}_{User} = {N}_{User}/{\lambda}_{User} - Z\). Because the evaluations were done with zero user think time (\(Z = 0\)), the relative errors of the throughput and response time are the same, so \(Err_R = Err_\lambda\). This value is denoted by \(Err\) in the rest of the article. The degree of aggregation is measured by
\(F_T =\) fraction of tasks in OM that are preserved in FM,
\(F =\) model size fraction = (tasks + hosts in FM)/(tasks + hosts in OM).

2.2 Elements of SPT from Reference [16]

SPT applies to models with one load-generating User task, which may represent a combination of response types. The model is restricted to having synchronous calls with no cycles in the graph of the calls between tasks.
Construct Dependency Groups. The following notation is used for the dependency groups:
\({\bf PT}=\) the set of preserved tasks, including at least the User task, the most heavily saturated resource, and the hosts of preserved tasks.
\({\bf PH}\,=\) the set of preserved hosts.
\({\bf PR}\,=\,{\bf PT}\cup{\bf PH}\,=\) the focus set of resources (tasks and hosts)
\({\textbf {G}}\) and \({\textbf {g}}\,=\) sets of resources (tasks and hosts) to be aggregated
\({{\boldsymbol {\Pi} }}\) = the set of subsets of \({\bf PT}\) (its powerset minus the empty set and \({\bf PT}\) itself), with elements \({{{\boldsymbol \Pi }}}_{{i}}\) .
\({{{\boldsymbol \Pi }}}_{{i}}\) = a dependency subset of PT, one of the elements of \({{\boldsymbol \Pi }}\) .
SPT aggregates the tasks outside PT in dependency groups. Task \(T_{1}\) is said to depend on the operations of another task \(T_{2}\) if it calls \(T_{2}\) directly or indirectly via another task. For each dependency subset \({{{\boldsymbol \Pi}}}_{{i}} \in {{\boldsymbol \Pi }},\) there is a distinct set of non-preserved tasks (possibly empty) that it depends on, denoted as its dependency group Gi. In SPT, processors cannot be shared between groups, and the hosts of any group G are aggregated to a single host.
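This grouping can be computed directly from the call graph. The following is a minimal Python sketch (the call graph and task names are hypothetical): each non-preserved task is keyed by the dependency subset \({\boldsymbol \Pi}_i\) of preserved tasks that reach it, and the tasks with the same key form one group \(G_i\).

```python
from collections import defaultdict

def reachable(calls, src):
    """All tasks reachable from src in the (acyclic) call graph."""
    seen, stack = set(), [src]
    while stack:
        for callee in calls.get(stack.pop(), ()):
            if callee not in seen:
                seen.add(callee)
                stack.append(callee)
    return seen

def dependency_groups(tasks, calls, preserved):
    """Group each non-preserved task by the set of preserved tasks
    that depend on it, directly or indirectly."""
    depends_on = {p: reachable(calls, p) for p in preserved}
    groups = defaultdict(set)
    for t in tasks:
        if t not in preserved:
            signature = frozenset(p for p in preserved if t in depends_on[p])
            groups[signature].add(t)  # signature plays the role of Pi_i
    return groups

# Hypothetical call graph: User calls PT1 and PT2; PT1 calls t1 and t2; PT2 calls t2.
calls = {"User": ["PT1", "PT2"], "PT1": ["t1", "t2"], "PT2": ["t2"]}
groups = dependency_groups(["User", "PT1", "PT2", "t1", "t2"], calls,
                           {"User", "PT1", "PT2"})
# t1 is grouped under {User, PT1}; t2 under {User, PT1, PT2}.
```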
Dependency groups are illustrated in Figure 3. The tasks in \({\bf PT} = \{User, PT_1, PT_2\}\) have bold borders. There are four subsets \({\boldsymbol \Pi}_i\) and four dependency groups indicated by shadings. Each group gives one aggregate task with the same shading and one host, as shown in Figure 3(b).
Fig. 3. Illustration of dependency groups for an example (from Reference [16]).
FSPT removes the restrictions that a processor cannot be shared between groups, as described in Section 3.
Aggregation Operations. Suppose a group \({\bf g}\) of tasks with the set \({\bf e(g)}\) of entries and the set \({\bf h(g)}\) of hosts is aggregated into a task \(A\) with a single entry \(E(A)\). Its mean calls per user response and its demand \(D_{E(A)}\) are sums over the entries of the tasks in \({\bf g}\),
\begin{equation} Y_{E(A)} = \sum_{E \in {\bf e}({\bf g})} Y_E, \end{equation}
(6)
\begin{equation} D_{E(A)} = \sum_{E \in {\bf e}({\bf g})} D_E. \end{equation}
(7)
The resource multiplicities are summed,
\begin{equation} m_A = \sum_{t \in {\bf g}} m_t, \qquad m_{H(A)} = \sum_{h \in {\bf h}({\bf g})} m_h. \end{equation}
(8)
The speed factor for \(H(A)\) is computed to give the same total host capacity,
\begin{equation} s_{H(A)} = \left[\sum_{h \in {\bf h}({\bf g})} s_h m_h\right]\!\Big/m_{H(A)}. \end{equation}
(9)
Interactions of tasks in \({\bf g}\) with any entry outside \({\bf g}\) are aggregated to one interaction as follows:
\begin{equation} Y_{E(A),E1} = \left(\sum_{E \in {\bf e}({\bf g})} Y_{E,E1}\right)\!\Big/Y_{E(A)}, \end{equation}
(10)
\begin{equation} Y_{E2,E(A)} = \sum_{E \in {\bf e}({\bf g})} Y_{E2,E}. \end{equation}
(11)
A more detailed description of these operations is found in Reference [17].
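As a compact illustration, here is a Python sketch of Equations (6) through (9), under the assumption that each entry, task, and host is a small dict; the field names are hypothetical, not part of the method's definition.

```python
def aggregate_group(entries, tasks, hosts):
    """Aggregate one dependency group into a task A with a single
    entry E(A) on one host H(A), following Equations (6)-(9).
    entries: dicts with 'Y' (calls per user response) and 'D' (demand);
    tasks:   dicts with multiplicity 'm';
    hosts:   dicts with multiplicity 'm' and speed factor 's'."""
    Y_A = sum(e["Y"] for e in entries)                 # Eq. (6)
    D_A = sum(e["D"] for e in entries)                 # Eq. (7)
    m_A = sum(t["m"] for t in tasks)                   # Eq. (8): threads
    m_H = sum(h["m"] for h in hosts)                   # Eq. (8): CPUs
    s_H = sum(h["s"] * h["m"] for h in hosts) / m_H    # Eq. (9): same capacity
    return {"Y": Y_A, "D": D_A, "m_task": m_A, "m_host": m_H, "s_host": s_H}
```

Equation (9) is the capacity-preserving choice: the aggregate host has \(m_{H(A)} s_{H(A)} = \sum_h m_h s_h\), which is aggregation property (2) below.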
Aggregation Properties. SPT has the following desirable properties:
(1) By Equation (7), the total CPU demand of an aggregate task per user response equals the total demand of the group of tasks aggregated into it.
(2) By Equation (9), the capacity \(m_H s_H\) of an aggregate host equals the sum of the capacities of the hosts that are aggregated.
(3) By Equation (8), the concurrency available to execute each operation is at least as great in the aggregated model as in the original.
(4) The saturation point is preserved.
(5) The dependencies of a preserved task are preserved, noting that some tasks it depends on may be aggregated.
(6) If the OM call graph is acyclic (which is required), then the FM call graph is also acyclic.
Strategies to Determine the Focus. The focus set PR may be selected by the analyst, but an automated method is also useful and can be combined with analyst choices. Two strategies for determining what to keep in PR automatically were used, both based on the resource saturation S (a sketch of the second strategy follows this list):
(1) The Accuracy strategy ACCx, with target accuracy x%, used a sequence of sets PRi containing the User task and the i additional tasks or hosts with the largest values of S, plus the tasks with hosts in PRi. The sequence ended when \(Err < x\%\). This strategy was used in References [16] and [17].
(2) The Saturation strategy SATr, with a saturation ratio of at least r, uses a set PR containing the User task, the hosts and tasks with \(S > S_{max}/r\), and the tasks with hosts in PR. This strategy is newly introduced in this article to support accurate sensitivity analysis.
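A minimal sketch of the SATr selection in Python, assuming the OM solution supplies a saturation for every task and host (all names are illustrative):

```python
def sat_focus(saturation, host_of, r, user="User"):
    """SATr: preserve the User task, every task or host with
    S > S_max / r, and every task whose host is preserved.
    saturation: dict component -> saturation S, from the OM solution;
    host_of:    dict task -> its host."""
    s_max = max(saturation.values())
    pr = {c for c, s in saturation.items() if s > s_max / r}
    pr.add(user)
    pr |= {t for t, h in host_of.items() if h in pr}  # tasks on preserved hosts
    return pr

# With r = 3.3, every component whose saturation exceeds S_max/3.3 is kept,
# so the resulting FM has a saturation ratio SR of at least 3.3.
```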
The focus must also include LQN components that involve priority queueing, execution defined by activities and second phases, asynchronous and forwarding messages, and symmetric subsystem replication, since FSPT does not aggregate components with these properties (see Reference [9] for details).

3 The FSPT Extension to SPT and an Extended Evaluation

SPT was extended to remove the limitation on hosts shared between dependency groups and re-evaluated with both random and industrial case studies. A new measure \(Eff(M)\) was defined for the effectiveness of FSPT, and examples of models giving low effectiveness are presented.

3.1 Disentangling Tasks

Two non-focus tasks in different dependency groups that share a host are “entangled” by their common host dependence and cannot be aggregated. SPT in Reference [16] cannot deal with entanglement. Entanglement is illustrated in Figure 4(a).
Fig. 4. Introduction of shadow hosts when a host is shared between groups.
Tasks are disentangled by giving them separate shadow hosts of reduced capacity, similarly to the shadow servers defined for classes with priorities in References [23, 34]. In those works, each priority class is provided with its own shadow server to simplify the delay calculation.
The transformation is constructed to preserve the response time to a request for the host processing provided to each task. Consider a single entangled task \(T_{i}\) on host \(H\) , and a shadow host \(H_{i}\) constructed for it with speed factor \({s}_i\) (to be determined). \(H\) is a queueing server, and the task threads are its customers, with a class for each task. \(T_{i}\) provides class \(i,\) with service time \({x}_i\) and response time \({R}_i\) . Using the analytic queueing results [20] for a processor-sharing host,
\begin{equation} {R}_i = {x}_i/\left({1 - {S}_H} \right)\!. \end{equation}
(12)
On \(H\), \(T_i\) has a partial saturation \(S_i\) (the fraction of the capacity of \(H\) that is used by \(T_i\)), equal to its server utilization divided by \(m_H\), and \(S_H = \sum_j S_j\). When it is moved to \(H_i\), its service time is scaled by the speed ratio to \(x'_i\), and it has the same arrival rate, so \(x'_i\) and its saturation \(S'_i\) are as follows:
\begin{equation} x{'}_i = {x}_i({s}_H/{s}_i),\qquad S{'}_i = {S}_i({s}_H/{s}_i). \end{equation}
(13)
Given that \({H}_i\) has just the one class provided by \(T_{i}\) , its response time \(R_i^{\prime}\) is given by
\begin{equation*} R_i^{\prime} = {x'}_i/(1 - S{'}_i). \end{equation*}
Setting \({R}_i = R_i^{\prime}\) and re-arranging gives the shadow host speed factor and saturation as
\begin{equation} {s}_i = {s}_H\left(1 - \sum\limits_{j \ne i} {{S}_j}\right),\quad S{'}_i = 1 - \sum \limits_{j \ne i} {S}_j \end{equation}
(14)
By Equation (14), the ith shadow host capacity is the part of the original capacity not used by the other tasks. Although it is derived differently, it is the same as a shadow server created for priorities in References [23, 34].
The classes represent the tasks executing on the host. If multiple tasks have a common shadow host as in Figure 4(b), then they form a subset of classes and the same construction is applied, with Si replaced by the sum of saturations over the subset. Formally, the tasks on host \(H\) are divided among a set of shadow hosts, with one shadow host \(h(H,\,{\textbf g})\) for each group of tasks \({\textbf g}\) that has members on \(H\) . Call those members \({\textbf{g}}'(H,{{\textbf g}})\) , and the set of their entries, \({\bf E}(H,\,{\textbf g})\) . One aggregated task \(A(H,\,{\textbf g})\) represents the tasks in \({\textbf{g}}'(H,{{\textbf g}})\) , with one entry \(e(H,\,{\textbf g})\) . The OM saturation of host H due to task T is \(S_{H,T}\) , so using Equation (14) the speed factor of the shadow host \(h(H,\,{\textbf g})\) is
\begin{equation} {s}_{h(H,{\textbf g})} = {s}_H\left(1 - \sum \limits_{T \in {{\textbf T}}(H)\backslash {\rm{\textbf g}}} {S}_{H,T}\right) \end{equation}
(15)
where \({\bf T}(H)\) is the set of tasks with host \(H\) . The demand \(D_{e(H,g)}\) of the aggregated entry \(e(H,{\textbf g})\) is the sum of the demands of entries in \({\bf E}(H,{\textbf g})\) weighted by their speed factors (from Equation (13)),
\begin{equation} {D}_{e(H,{\textbf {g}})} = {\sum \limits_{E \in{\textbf E}(H,{\textbf g})}}D_E({s}_H/s_{h(H,{\textbf g})}) \end{equation}
(16)
The calls to and from \(A(H,\,{\textbf g})\) are found as in Equations (10) and (11), with the subset \(\rm{{\textbf {g}}{'}}(H,{{\textbf g}})\) in place of g.
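The disentangling computation itself is small. The following is a Python sketch of Equations (15) and (16), with the partial saturations \(S_{H,T}\) taken from the OM solution (argument names are illustrative):

```python
def shadow_speed(s_H, S_HT, group):
    """Equation (15): the speed factor of shadow host h(H, g) is the
    capacity of H left over by the tasks outside group g.
    S_HT: dict task -> its partial saturation S_{H,T} of host H."""
    return s_H * (1.0 - sum(s for t, s in S_HT.items() if t not in group))

def shadow_entry_demand(s_H, s_shadow, demands):
    """Equation (16): demands of the group's entries on H, rescaled by
    the speed ratio s_H / s_shadow and summed into one entry."""
    return sum(D * (s_H / s_shadow) for D in demands)

# Hypothetical host (speed factor 2.0) shared by two groups:
s = shadow_speed(2.0, {"t1": 0.3, "t2": 0.2}, group={"t1"})
print(s)                                   # 2.0 * (1 - 0.2) = 1.6
print(shadow_entry_demand(2.0, s, [0.4]))  # 0.4 * (2.0 / 1.6) = 0.5
```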
Although the shadow server approximation is well known, it is based on open workloads, while LQN models have closed workloads. Therefore, its accuracy for disentanglement was extensively tested for closed workloads before using it. The details are in Reference [17]; in summary, the calculated response times were first compared to simulation results for two classes, each with a closed workload of 100 customers. The queueing was simulated with one and three servers (\(m = 1\) and \(m = 3\)) for 1,500 cases with random parameters. The response time errors for \(m = 1\) were all less than 1%, and for \(m = 3\), less than 0.3%. Second, the errors due to disentangling an LQN model were found for 25 cases that occurred in simplifying 10 randomly generated models. Thirteen instances had 0–2% error, and the rest were all under 8%. This accuracy was judged less than ideal but adequate.

3.2 Evaluation of FSPT with the Accuracy Strategy

This section describes the base-case approximation errors resulting from the completed FSPT method with the ACCx strategy. The experiments reported here and in Reference [17] differ from the published evaluation in Reference [16] in that they include cases with entanglement. The ACCx strategy adds preserved tasks one at a time until \(Err < x\%\), as illustrated in Figure 5 for an example of original size (tasks + hosts) of 31 and a 2% error target.
Fig. 5. Trajectory of accuracy and size reduction as preserved tasks are added.
As more tasks are preserved, the accuracy improves but the size increases. For a given purpose, the error target x can be chosen by the analyst. The nature of the accuracy/reduction tradeoff was investigated by finding the throughput error and the degree of model size reduction obtained for 150 randomly generated models at two levels of traffic intensity. In addition,
FSPT was applied to a very large model, to demonstrate scalability.
It was applied to case studies based on industrial systems, to show typical results.

3.3 Accuracy of FSPT to Simplify Randomly Generated Models

To evaluate the accuracy of FSPT, 150 models were generated with random structure and parameters by using the utility program lqngen [12]. The models had three to five layers and a wide variety of calling patterns to represent calls between web services. Five model sizes with 11, 16, 21, 26, and 31 tasks (including the User task) were chosen, and 30 models were generated for each size. Another 150 models were created by doubling the number of users in each model. The ACC2 strategy was used, which increased the number of preserved tasks until \(Err < 2\%\). Figure 6 displays the tradeoff between accuracy and simplification by a scatter plot of Err against the fraction \(F_T\) of preserved tasks, for all the FMs created in the sequence of reductions. The accuracy values have great variability. Three properties stand out in Figure 6. On the left, reductions with \(F_T < 10\%\) give a wide range of errors. For \(F_T\) between 10% and 50%, the error is reliably less than 18%, and for \(F_T > 50\%\), all errors were less than 10%.
Fig. 6. Error vs. fraction FT of preserved tasks for 300 cases.
Simplification also tended to be more successful for larger models, which is exactly where it can be most useful. The average \(F_T\) for 10% accuracy was about 42% for models with 11 tasks and increased to about 62% for models with 31 tasks. For 2% accuracy, however, all sizes of models had an average \(F_T\) of about 40%. More details are given in Reference [17].
Figure 7 examines the tradeoff between accuracy and focused model size by a scatter plot of Err against the fraction F of size (tasks plus hosts) for all 695 FMs created in the sequence of reductions. The accuracy values have great variability for small F, but FMs with F > 60% all have \(Err < 20\%\). Only 49 FMs of all sizes showed Err > 20%, and only 114 showed Err > 10%. We can conclude that a reduction to 60% in size almost always gives a maximum error of 20%, and that the great majority of all reductions down to 20% in size have an error of less than 10%. Since these are empirical results, however, any given model may have larger errors than this. For sensitivity analysis, further conditions must be considered, as shown in Section 4.
Fig. 7. Error vs. model size fraction F for 300 focused models.

3.4 Complexity and Scalability of FSPT

For the time complexity of FSPT, if there are N model elements (tasks plus processors), and if several properties are bounded by constants (entries, calls, and activities per task, and the number of dependency groups divided by N), then for a given set of dependency groups the aggregation operations are of order \(N^2\). If there are K preserved tasks, then obtaining the dependency groups by brute force is of order \(2^K\) (to obtain the powerset of PT). However, most of these subsets have no dependency group; by working backwards from the non-preserved tasks (non-PTs), the non-empty groups can be constructed by operations of low polynomial order. The steps are as follows: construct a matrix of dependencies of PTs on non-PTs \((K^2)\); for each non-PT, collect the PTs that depend on it \((K^2)\); and identify identical collections as the dependency sets \((K^3)\).
FSPT also requires an LQN solution as a starting point. If LQNS is used, then, according to Reference [10], it has a complexity of order \(N^2\) per iteration, and experience with the LQNS solver suggests that the number of iterations rises only a little, if at all, with N.
Overall, if K is of the same order as N (as is assumed above), then the time complexity of FSPT is of order \(N^3\), dominated by the construction of the dependency groups.
To demonstrate the application of SPT to larger models, a case with 50 tasks and 50 hosts was simplified. The process took 30 seconds, and (as described in Reference [17]) the aggregated model achieved 4.85% error with nine tasks and nine hosts, a highly successful result.

3.5 Case Studies

Six case studies of industrial software systems are described in Reference [17], of which three are summarized here.
(a) A Business Reporting System. The OM in Figure 1(a) was generated automatically from a design specified in the Palladio tool [28], based on an industrial system, with a deployment (a host for each task) that is artificial. The FM in Figure 1(b) preserves just two of the 43 tasks and aggregates everything else. Its throughput and response time predictions across varying loads up to saturation are all within 1%. This case shows that, when there is a well-defined bottleneck, a small FM can give good accuracy.
(b) A Class 5 Telephony Switch. The OM of a voice telephony switch [11], shown in Figure 8(a), has 22 tasks and 19 hosts. The focused model TS-FM1, with just three preserved tasks, gave 5.2% error. Preserving more tasks only slowly reduced the error, as described in Reference [17].
Fig. 8. Models of the telephone switch (without the hosts).
The focused model TS-FM2 was created with a stated saturation ratio SR = 3.33 to satisfy the Accurate Sensitivity Hypothesis and gives a “trusted range” from 0.75 times to 1.33 times the base-case throughput. Sensitivity calculations for model TS-FM2 are described in Section 4.2.
(c) An E-Commerce System. A model of a generic E-Commerce system from Reference [24] with eight tasks and six hosts was reduced to three tasks and three hosts with less than 1% error [17].

3.6 An Effectiveness Index for FSPT

To visualize the performance of FSPT, the error Err (as a fraction) is plotted against the FM size fraction F for a sequence of focused models that preserve different numbers of elements, as in Figure 5. Small error for a small size fraction indicates effective simplification. Figure 9 shows a typical case, and the best and worst cases, from the 150 random models. A point \((f, e)\) is called achievable if a focused model exists with both \(Err < e\) and \(F < f\). The fraction of the space in \((f, e)\) that is achievable is defined as the effectiveness index \(Eff(\text{OM})\) for simplification of model OM:
\begin{align*} Eff(\text{OM}) =\ & \text{fraction of possible values of } (F, Err) \text{ that can be achieved by some FM} \\ =\ & \text{fraction of the space } (f, e), \text{ with } 0 < f < 1 \text{ and } 0 < e < 1, \text{ in which some focused} \\ & \text{model FM can be created with both } Err < e \text{ and } F < f. \end{align*}
Fig. 9. Effectiveness Index Eff(M) for simplification of model M.
This fraction equals the shaded area in each of Figure 9(a), (b), and (c) below.
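Given the \((F, Err)\) points of the candidate FMs, \(Eff\) is straightforward to compute: the achievable region is a union of rectangles \((F, 1] \times (Err, 1]\). The following Python sketch integrates it as a staircase over e; this is one direct way to realize the definition, not necessarily the implementation used for the reported results.

```python
def effectiveness(points):
    """Eff(OM): the fraction of the unit square of (f, e) values for
    which some FM achieves both F < f and Err < e.
    points: list of (F, Err) pairs, one per candidate focused model."""
    pts = sorted(points, key=lambda p: p[1])         # ascending Err
    area, f_min, e_prev = 0.0, 1.0, None
    for F, err in pts:
        if e_prev is not None:
            area += (1.0 - f_min) * (err - e_prev)   # strip between Err levels
        f_min = min(f_min, F)                        # best size fraction so far
        e_prev = err
    if e_prev is not None:
        area += (1.0 - f_min) * (1.0 - e_prev)       # final strip up to e = 1
    return area

# Two hypothetical FMs, given as (size fraction F, error Err):
print(effectiveness([(0.7, 0.3), (0.9, 0.05)]))  # 0.235
```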
Figure 10 shows the models for case-92 and case-11, with some saturation values labeled to explain the effectiveness values. Both models are deeply layered and have requests that cross the layers. However, case-92 has 100% saturation at t0, which depends on just one other service at t10. Other tasks have saturations below 0.4, which cannot increase because of the bottleneck. As long as tasks t0 and t10 are preserved, aggregation of the rest of the model has only a small effect on performance. Case-11, by contrast, has several medium-to-high-utilization tasks along the calling path t1-t2-t4-t7, with task saturation values 0.92, 0.79, 0.24, and 0.32 (under different loads these values will vary, but they retain their relative magnitude). Aggregation of multiple tasks with moderate loads introduces a substantial (spurious) increase in throughput, due to summing their concurrency levels (a well-known effect of aggregating separate concurrent servers into one multiserver).
Fig. 10. The structure of the best and worst models for reduction effectiveness.
A scatter plot of the effectiveness index against the original model size is shown in Figure 11 for the 150 random models. There is a general trend to greater effectiveness for larger models, but also some outliers that are quite resistant to reduction. Case-11 above lies at the left-hand edge, and case-92 at the right-hand edge, of the range of sizes shown, so model size is a partial explanation of the resistance of case-11. The industrial case study models give similar results. The telephone system model has an excellent effectiveness index at about 0.9. However, a Linux file system model from Reference [10], shown in Figure 12, is quite resistant. It gives two FMs, with \((f, e) = (8/11, 0.588)\) and \((9/11, 0.0433)\), giving Eff = 0.179. As with case-11, it has a calling path with several nearly saturated tasks, as shown in Figure 12(a).
Fig. 11. The reduction effectiveness measure Eff for models of different sizes.
Fig. 12. A Linux file system model.

4 Sensitivity Analysis Using A Focused Model

Models are used to predict the impact of changes through sensitivity analysis. Suppose the set of parameters to be changed is collected in a parameter vector a, and the performance measure of interest is the throughput λ(a). We define the following:
\(\bar{a} =\) base value of parameter vector a,
\({\lambda }_{OM}{\rm{(}}a{\rm{)\, and}}\,{\lambda }_{FM}{\rm{(}}a{\rm{)}} =\) the throughput of models OM and FM for parameter values given by a,
\(M(\bar{a},a)\, = {\lambda }_{OM}{\rm{(}}a{\rm{)}}/{\lambda }_{OM}{(\bar{a})} =\) the ratio of throughputs in OM due to a change in a,
\(\Delta{\lambda}_{OM}(\bar{a},a) = {\lambda}_{OM}(a) - {\lambda}_{OM}(\bar{a}) =\) the change in throughput in OM due to a change in a (with \(\Delta{\lambda}_{FM}(\bar{a},a)\) defined similarly for FM),
\(Er{r}_{Sens}(\bar{a},a) = [\Delta {\lambda }_{FM}(\bar{a},a)-\Delta {\lambda }_{OM}(\bar{a},a)]/{\lambda}_{OM}(\bar{a}) =\) the relative error of the change in throughput estimated by FM.
The relative error ErrSens \((\bar{a},a)\) will be examined as a function of the parameter change \((a - \bar{a})\) . The target value for “adequate” accuracy of ErrSens was arbitrarily chosen at 20%. Based on the ASH stated in Section 1 and discussed below, the “trusted range” of results for a FM with saturation ratio SR satisfies
\(1/(SR - 2) < M(\bar{a},a)\, < SR - 2\) (for the “trusted range”)
This implies that SR must be more than 2 to provide trusted sensitivity results, and trust depends on the throughput change.
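Checking the ASH condition for a given FM and predicted change is then a one-line test; a minimal Python sketch:

```python
def in_trusted_range(sr, m):
    """ASH trusted range: a throughput multiple m is trusted for an FM
    with saturation ratio sr if 1/(sr - 2) < m < sr - 2."""
    return sr > 2 and 1.0 / (sr - 2.0) < m < sr - 2.0

# TS-FM2 from Section 4.2 has SR = 3.33, giving the range (0.75, 1.33):
print(in_trusted_range(3.33, 1.2))   # True: within the trusted range
print(in_trusted_range(3.33, 1.5))   # False: prediction should not be trusted
```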

4.1 Sensitivity Calculation: An Example

A first example considers the OM and FM shown in Figure 13, and some parameters of the preserved resources \({\bf PR} = \{Users, t01, t07, p00\}\).
Fig. 13. LQN model for illustrating sensitivity analysis.
Figure 14 shows the throughput changes \(\Delta \lambda_{OM}\) and \(\Delta \lambda_{FM}\) when three different parameters are varied one at a time: the demand of t01 in part (a), the demand of t07 in part (b), and the multiplicity of p00 in part (c). The throughputs for OM and FM are almost identical in all three cases. These results are encouraging, but not all models give such good results.
Fig. 14. Sensitivity results using the original and focused models shown in Figure 13.

4.2 Case Study: Sensitivity of the Telephone Switch Model

The OM of a Class 5 telephone switch was described in Section 3.5 and Figure 8 above. An FM with saturation ratio SR = 3.33 was created, shown as TS-FM2 in Figure 8(c), with six tasks and five hosts. The trusted range of throughput multiples is the interval (0.75, 1.33).
A set of 1,000 sensitivity cases was calculated in which all the demand parameters of every preserved task were changed by an independent random factor, uniformly distributed over (0.5, 1.5), giving a range of ±50% from the base-case values. Figure 15 shows histograms of the errors for the cases with results within the trusted range for 300 users. For 300 users, the system is not saturated, so the throughput changes are relatively small and the errors are all less than 2.5%. For 400 users, loads are heavier, some cases have throughput changes outside the trusted range, sensitivities are larger, and 117 sensitivities have errors above 20%.
Fig. 15. Model TS-FM2: Relative error Errsens for 1,000 cases with random changes to all execution-demands.
Fig. 16. Model TS-FM2: Scatter plot of Errsens for 1,000 execution-demand changes.
Greater insight into the importance of the trusted range is given by the scatter plot in Figure 16 of \(Err_{sens}(\bar{a}, a)\) against the throughput multiple \(M(\bar{a}, a)\). For 400 users, many throughput changes fall outside the trusted range, and the relevance of the Accurate Sensitivity Hypothesis is displayed in the larger errors for many of these points. Within the trusted range, 93.4% of the results have “adequate” accuracy \((Err_{sens} < 0.2)\); outside it, only about half do.

4.3 Large-scale Sensitivity Experiments

To move away from single cases and seek insight into the general validity of the sensitivity analysis, the set of 150 OMs used in Section 3.3 for assessing the base-case FMs were re-analyzed for sensitivity. Three sets of FMs were created using three different strategies:
The Accuracy strategy ACC2, which was also used in Section 3. For each OM the size of PT was increased one task at a time until the FM gave Err < 2%. The saturation ratios SR are variable, ranging from 1 to 518.
The Saturation strategy SAT3.3 ( \(SR\,= \,3.3\) , giving a moderately broad focus).
The Saturation strategy SAT10 ( \(SR\,= \,10\) , giving a very broad focus).

4.4 Sensitivity to CPU Demand Parameters

As in Section 4.2, the demand parameters of all the preserved tasks were multiplied by a random factor uniformly distributed over (0.5, 1.5). Nine random FMs were created for each OM, giving a total of 1,350 cases for each strategy. Figure 17 shows the error histogram for each strategy; there are fewer than 1,350 points because of a few non-convergent model solutions.
Fig. 17. Relative Sensitivity errors for the three strategies (note that the bin label gives the upper edge of the bin).
The ACC2 strategy gave the worst results (Figure 17(a)), but still over 1,300 of the 1,350 points have errors of less than 20%. The large errors were in cases with low saturation ratios SR, in the range (1.07, 1.36), combined with high-impact parameter changes. For SAT3.3 and SAT10, all the results have errors of less than 20%. The advantage of using the SAT strategies for sensitivity analysis seems clear.
Figure 18 shows scatter plots against the throughput multiple M. For the ACC2 FMs, it is clear that the largest errors are associated with M > 1. Since SR is different for every FM, a trusted range cannot be mapped. For the SAT cases, the trusted range in M can be plotted: it is (0.75, 1.33) in part (b) and (0.125, 8) in part (c). SAT10 is only slightly better than SAT3.3, indicating that the extra-large focus was not necessary for these parameter changes.
Fig. 18. Sensitivity errors for 150 randomly generated OMs, each with nine random sets of demand parameter changes (±50%), giving 1,350 cases for each strategy.

4.5 Sensitivity to System Scaling Changes

System scaling increases the capacity of resources to handle larger loads on the system. These experiments applied a common scale factor “scale” (ranging from 2 to 10) to the capacity of every preserved resource (tasks and hosts) and to the number of users, and evaluated the throughput. Successful scaling should increase the throughput by the same factor. Figure 19(a) shows the throughputs against the scale factor, and in Figure 19(b) the points along the line show Errsens for the successive scale factors (scale = 1, 2, \(\ldots\)) against M, the OM throughput multiple. The FM throughput increases more than the OM throughput, because a resource that saturates in OM has been aggregated with more lightly used resources, which share the load. This model provided limited scaling, but in some cases both model throughputs scaled all the way up to 10 times.
Fig. 19. The scaled throughputs and errors for one example of a scaled model.

4.6 Large-scale Experiments on Resource Scaling

As described above, these experiments scaled the resources of the 150 random models by scale factors from 2 to 10. Because the changes in throughput are large, the normalization of the error was changed. For large deviations it is better to normalize to \(\Delta\lambda_{OM}\) instead of \(\lambda_{OM}\), giving the measure \(Err_{largeSens}\):
\begin{equation*} Err_{largeSens}(\bar a, a) = [\Delta\lambda_{FM}(\bar a, a) - \Delta\lambda_{OM}(\bar a, a)]/\Delta\lambda_{OM}(\bar a, a). \end{equation*}
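Both normalizations are computed from the same four throughputs; a small sketch (function and argument names are illustrative):

```python
def sensitivity_error(lam_om_base, lam_om, lam_fm_base, lam_fm, large=False):
    """Relative error of the predicted change in throughput, normalized
    to the base OM throughput (Err_sens) or, for large changes, to the
    OM change itself (Err_largeSens)."""
    d_om = lam_om - lam_om_base   # change predicted by the original model
    d_fm = lam_fm - lam_fm_base   # change predicted by the focused model
    return (d_fm - d_om) / (d_om if large else lam_om_base)
```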
Using this normalization of the error, Figure 20 shows histograms of ErrlargeSens that are similar to the histograms for the demand sensitivities. Only the SAT10 cases are mostly accurate. The ACC2 and SAT3.3 cases have poor accuracy in many cases, because the saturation ratios of the FMs are not large enough for aggressive scaling. This issue was investigated more deeply for the ACC2 cases, which cover a wide range of saturation ratios; the attempt to identify the inaccurate results for the analyst led to the Accurate Sensitivity Hypothesis.
Fig. 20. Relative errors in predicted throughput under resource scaling from 1× to 10×.

4.7 The ASH (Accurate Sensitivity Hypothesis) and the Trusted Range of Scale-Up

Deeper analysis of the ACC2 case errors in Figure 20 led to the Accurate Sensitivity Hypothesis stated in Section 1. Many of the errors were associated with small values of the saturation ratio SR, which in turn indicates that some resources are close to saturation in OM but are not preserved. As the throughput increases (and M becomes large) these resources become saturated in OM but not FM, and errors become substantial. An example called “case-30” is shown in Figure 19 above.
To examine this relationship, the value of SR was found for each FM produced by ACC2. Figure 21 plots M against SR separately for the “accurate” and “inaccurate” results and shows that inaccurate results emerge when M is larger than a value that is roughly a constant plus SR. The sloping boundary of the grey area was chosen by eye so that it contains almost all the inaccurate results; it represents M = SR − 2. The points inside the grey area violate the ASH condition; the others satisfy it and define a “trusted range” of M for an FM with a given SR. There are many false negatives, showing that the hypothesis is quite conservative. There are also a few false positives, showing that the trusted range does not provide a guarantee, but they are very few.
Fig. 21. Throughput multipliers and saturation ratios for “Accurate” and “Inaccurate” results.

4.8 Accuracies for Cases Satisfying the ASH

We can revisit the cases described so far to see how successful the ASH is in predicting accurate results. Table 1 summarizes the number of results that satisfy the ASH and the number of these with less than 20% error using the \(Err_{sens}\) measure for the demand studies and the \(Err_{largeSens}\) measure for the scaling studies.
Table 1. Effectiveness of the ASH in Identifying Accurate Sensitivity Results

| Sensitivity Study | Users | Strategy for FMs | Results that satisfy the ASH | “Accurate” results with <20% relative error |
|---|---|---|---|---|
| Demand parameters of the telephone switch, Section 4.2 | 300 | SAT3.3 | 1,000/1,000 | 1,000/1,000 |
| | 400 | SAT3.3 | 837/1,000 | 777/837 (92.5%) |
| Demand parameters, Section 4.4 | | ACC2 | 560/1,350 | 560/560 (100%) |
| | | SAT3.3 | 1,335/1,335 | 1,335/1,335 (100%) |
| | | SAT10 | 1,336/1,336 | 1,336/1,336 (100%) |
| Scaling resource multiplicities up to ×10, Section 4.6 | | ACC2 | 203/1,350 | 189/203 (93%) |
| | | SAT3.3 | 0/1,350 | 0 |
| | | SAT10 | 1,242/1,350 | 1,215/1,242 (97.8%) |
For the demand sensitivity results, the trusted range was found to be reliable, but for the scaling results there are some “false positives,” results that test as accurate but are not: about 2% for the SAT10 cases, about 7% for the ACC2 cases, and 7.5% for some of the telephone switch results. Most of those errors are just over 20%, and few exceed 30%.

4.9 Sensitivity Analysis on Simplified Models: Summary

The random cases in this section apply many large parameter changes and thus place a heavy stress on the accuracy of the sensitivities. Accurate sensitivities require a model with a large-enough saturation ratio SR, and the Accurate Sensitivity Hypothesis provides insight to the analyst when SR is not large enough. When the ASH condition was met, over 90% of cases had errors of less than 20%. The relative error was normalized to the base value of the measure for small changes (less than 100%) and to the OM change itself for larger changes.

5 Related Work

We can speak of data-driven, subsystem-driven, and problem-driven simplification approaches for queueing models and other performance models.
A data-driven approach controls the complexity of a model fitted to data by using statistical estimates of accuracy (see, e.g., Reference [22]). In References [30, 41], the authors used stepwise fitting and statistical tests to simplify a workload model with hundreds of user classes. In Reference [38], this approach was used to add components to a performance model. Machine learning models are also data driven (e.g., References [21, 36]) and can exploit simplification, as in Reference [8], to avoid over-fitting.
Subsystem-driven simplification replaces selected subsystems by a surrogate delay [19] or a server and often uses the methods of model decomposition as described in Reference [7] and Reference [29]. Surrogate delays have been applied to decomposing timed Petri Net models that are too large to solve into sets of smaller submodels (e.g., References [6, 25]). The special case of symmetrically replicated subsystems leads to symmetries that simplify the solution. In LQNs, there is already a model feature for declaring and solving replicated subsystems (see a summary in Reference [9]).
Queueing model class aggregation has been achieved by standard clustering methods [30].
In queueing networks, a FESC can be used to approximate a subsystem, as described in Reference [23]. A FESC is a server whose rate depends on its occupancy level, and for some models a FESC is an exact replacement [4]. However, FESCs have limited application to layered queueing models because of the simultaneous resource possession intrinsic to layered queueing. When a server inside the chosen subsystem makes a request to a server outside it and waits, there is a service-rate dependency on the state of the entire system rather than just on the subsystem. For this reason, we look beyond FESCs to represent subsystems in the present work.
Most research on decomposition has a different goal from the present article: to create a “divide and conquer” approximate solution technique for intractable models. Examples are Reference [7] for queueing networks, the LQN solver [10], and References [6, 25] for timed Petri Nets. The submodels in a decomposition may combine different modeling paradigms, as in Reference [13], which has a Markov Chain submodel.
These approaches have limitations. Surrogate delays require iteration to solve, which may be lengthy. A FESC requires repeated solutions of the submodel in isolation, for every population that may occur. For non-separable and layered queueing networks this is expensive.
Problem-driven simplification focusses the analysis on a particular problem. An example is bound analysis for asymptotic cases of light and heavy loads. There are well-known asymptotic and balanced-job bounds on response time and throughput for queueing networks [23] and additional bounds for LQNs [27].
In the context of these categories, FSPT is problem driven in that the focus can be tailored to the components of concern, wherever they are.
The simplification approaches described above use either structural aggregation (where a group of model elements corresponding to system structural elements are aggregated into a subsystem) or state aggregation (where a group of system states are aggregated into a meta-state).
Structural aggregation in queueing models has a counterpart in Petri net models, in which elements are “folded” into replacement elements. For instance, Stochastic Petri Nets may be simplified by using structural simplification rules or foldings [33] or well-formed nets [14].
A brief look at the simplification of other kinds of models is useful for perspective. Besides structural models, there are state-based models (e.g., Markov Chains) that represent the system behaviour and are fundamental to many modeling approaches. However, state-based models often suffer from state explosion, motivating research to reduce the state space. State aggregation is a popular model reduction method that reduces the system complexity by mapping its states into a small number of meta-states. For example, state aggregation is used for analyzing properties of exact and ordinary state lumping based on symmetries in the system [3], dynamic load-balancing policies [26], and reliability analysis of hybrid systems [32]. State aggregation is also used in machine learning, where model simplification can be used to solve overfitting issues that are due to making a model more complex than necessary. While state-based models are valuable, they do not scale well enough, even when simplified, to describe the large heterogeneous layered service systems with very large numbers of users that are becoming common. Therefore, state-based models are out of the scope of this work.
Aggregation of an LQN subsystem to a multiserver provides a middle ground between a simple delay, which ignores contention in the subsystem, and an FESC, which represents its effect in detail but is expensive to build. Aggregation of groups of components chosen by the analyst, giving a multiserver for each group, was considered in Reference [15]. A later paper [16] showed how substantial errors can be avoided by a correct grouping of the components to be aggregated. However, this method, called SPT, has a serious limitation in that tasks in different groups deployed on the same processor could not be aggregated at all. The present work completes SPT and evaluates its use.
Overall, we are unaware of any prior work on deriving a simplified layered queuing model directly from a detailed one, apart from our own previous work [15, 16]. In particular, there is a lack of simplification techniques that avoid the scalability problems of calibrating a FESC.

6 Conclusions

Layered performance models can be successfully aggregated by preserving components with resource saturation levels above a threshold, which depends on the analyst's goals. Each model exhibits its own tradeoff between the degree of simplification and the accuracy of the approximate throughput or response time. Success was evaluated by the accuracy of the focused models and their sensitivities over many cases. Aggregation in queueing models is not exact, but the error can be controlled.
Larger models were found to simplify better, on average, than smaller ones, so simplification is most effective exactly where it is needed most. There is greater detail on this point in the thesis [17].
Because aggregation errors are tightly linked to resource saturation levels, real systems with many lightly utilized components will tend to give more effective simplification than the random-system cases reported here, which are generated to have roughly balanced saturations.
The principal use of performance models is to study the effect of changes. For accurate sensitivity calculations the saturation ratio (between the most saturated resource and the most saturated resource that is aggregated) must be at least 2. This is part of an empirical criterion called the ASH, derived in Section 4.7. The aggregated model can also include components of special interest to the analyst; the process is under the analyst's control.
Two kinds of changes were studied in Section 4: changes to resource-demand parameters and to scale-related parameters. Over 90% of sensitivity results that satisfied the ASH had less than 20% error. Focused model sensitivities for demand parameter changes of up to 50% were satisfactory provided the saturation ratio exceeds 3.3. Scale-related changes for scale factors up to 10 required a saturation ratio of 10, which is intuitively reasonable. Section 4 examined the sensitivity only to parameters of preserved components, but a parameter that has been aggregated can also be studied by re-applying the aggregation Equations (3) and (4).
FSPT can contribute to automated model-based performance analysis, since models made automatically are often unnecessarily complex, and advanced analysis techniques may solve the model many times (and thus can benefit from a smaller model). An example is the use of model predictions to continuously optimize the overall deployment of an application, which must complete each optimization cycle in seconds or minutes. Given that layered model solution techniques are approximately quadratic in model size [10], model complexity must be controlled.
FSPT can simplify all layered queueing models except those with calling cycles. It removes a limitation of the previous SPT algorithm, which could not aggregate entangled tasks. Some components, however, are not aggregated: those representing replicated subsystems, those with parallel sub-paths, with “second phases” of service, with asynchronous and forwarding calls, and with priority execution (see Reference [9] for descriptions of these features of LQNs). Such components are preserved in FSPT. Multiple classes of system users can be included by defining alternative choices made by a single pool of users.

References

[1] S. Balsamo, V. de Nitto Personé, and R. Onvural. 2001. Exact analysis of special networks. In Analysis of Queueing Networks with Blocking. International Series in Operations Research & Management Science, Vol. 31. Springer, Boston, MA.
[2] F. Brosig, N. Huber, and S. Kounev. 2011. Automated extraction of architecture-level performance models of distributed component-based systems. In Proceedings of the 26th IEEE/ACM International Conference on Automated Software Engineering (ASE’11). 183–192.
[3] P. Buchholz. 1994. Exact and ordinary lumpability in finite Markov chains. J. Appl. Probab. 31, 1 (1994), 59–75.
[4] K. M. Chandy, U. Herzog, and L. Woo. 1975. Parametric analysis of queuing networks. IBM J. Res. Dev. 19, 1 (1975), 36–42.
[5] R. Chi, Z. Qian, and S. Lu. 2011. A heuristic approach for scalability of multi-tiers web application in clouds. In Proceedings of the 5th International Conference on Innovative Mobile and Internet Services in Ubiquitous Computing. 28–35.
[6] G. Ciardo and K. S. Trivedi. 1993. A decomposition approach for stochastic Petri net models. Perf. Eval. 18, 1 (1993), 97–100.
[7] E. de Souza e Silva and R. R. Muntz. 1987. Approximate solutions for a class of non-product form queueing network models. Perf. Eval. 7, 3 (1987), 221–242.
[8] Y. Duan, T. Ke, and M. Wang. 2019. State aggregation learning from Markov transition data. In Advances in Neural Information Processing Systems 32. 4486–4495.
[9] G. Franks, T. Omari, C. M. Woodside, O. Das, and S. Derisavi. 2009. Enhanced modeling and solution of layered queueing networks. IEEE Trans. Softw. Eng. 35, 2 (2009), 148–161.
[10] G. Franks. 1999. Performance Analysis of Distributed Server Systems. Ph.D. thesis, Carleton University, Systems and Computer Engineering.
[11] G. Franks, D. Petriu, C. M. Woodside, J. Xu, and P. Tregunno. 2006. Layered bottlenecks and their mitigation. In Proceedings of the 3rd International Conference on Quantitative Evaluation of Systems (QEST’06). IEEE, 103–114.
[12] G. Franks. lqngen—Generate Layered Queueing Network Models. Retrieved from www.layeredqueues.org.
[13] P. Heidelberger and K. S. Trivedi. 1983. Analytic queueing models for programs with internal concurrency. IEEE Trans. Comput. C-32, 1 (January 1983), 73–82.
[14] J. M. Ilie, S. Baarir, G. Franceschinis, and M. Beccuti. 2006. Efficient lumpability check in partially symmetric systems. In Proceedings of the 3rd International Conference on the Quantitative Evaluation of Systems (QEST’06). 211–220.
[15] F. Islam, D. Petriu, and M. Woodside. 2015. Simplifying layered queuing network models. In Proceedings of the 12th European Workshop on Performance Engineering (EPEW’15). LNCS, Vol. 9272, 65–79.
[16] F. Islam, D. Petriu, and M. Woodside. 2018. Choice of aggregation groups for layered performance model simplification. In Proceedings of the ACM/SPEC International Conference on Performance Engineering. 241–252.
[17] F. Islam. 2018. Simplifying Layered Queueing Network Models. Ph.D. thesis, Carleton University, Ottawa, Ontario, Canada.
[18] T. Israr, M. Woodside, and G. Franks. 2007. Interaction tree algorithms to extract effective architecture and layered performance models from traces. J. Syst. Softw. 80, 4 (April 2007), 474–492.
[19] P. A. Jacobson and E. D. Lazowska. 1981. The method of surrogate delays: Simultaneous resource possession in analytic models of computer systems. In Proceedings of SIGMETRICS’81.
[20] L. Kleinrock. 1967. Time-shared systems: A theoretical treatment. J. ACM 14, 2 (1967), 242–261.
[21] S. Kundu, R. Rangaswami, A. Gulati, M. Zhao, and K. Dutta. 2012. Modeling virtualized applications using machine learning techniques. In Proceedings of the 8th ACM SIGPLAN/SIGOPS Conference on Virtual Execution Environments. 3–14.
[22] M. Kutner, C. Nachtsheim, J. Neter, and W. Li. 2004. Applied Linear Statistical Models. McGraw-Hill, New York, NY.
[23] E. D. Lazowska, J. Zahorjan, G. S. Graham, and K. C. Sevcik. 1984. Quantitative System Performance: Computer System Analysis Using Queueing Network Models. Prentice Hall, Hoboken, NJ.
[24] J. Z. Li, M. Woodside, J. Chinneck, and M. Litoiu. 2011. CloudOpt: Multi-goal optimization of application deployments across a cloud. In Proceedings of the 7th International Conference on Network and Service Management. IEEE.
[25] Y. Li and C. M. Woodside. 1995. Complete decomposition of stochastic Petri nets representing generalized service networks. IEEE Trans. Comput. 44, 4 (1995), 577–592.
[26] H. C. Lin and C. S. Raghavendra. 1993. A state-aggregation method for analyzing dynamic load-balancing policies. In Proceedings of the 13th International Conference on Distributed Computing Systems. 482–489.
[27] S. Majumdar, C. M. Woodside, J. E. Neilson, and D. C. Petriu. 1991. Performance bounds for concurrent software with rendezvous. Perf. Eval. 13, 4 (1991), 207–236.
[28] A. Martens, H. Koziolek, S. Becker, and R. Reussner. 2010. Automatically improve software architecture models for performance, reliability, and cost using evolutionary algorithms. In Proceedings of the 1st Joint WOSP/SIPEW International Conference on Performance Engineering. ACM, New York, NY, 105–116.
[29] B. Rabta. 2009. A review of decomposition methods for open queueing networks. In Rapid Modelling for Increasing Competitiveness, G. Reiner (Ed.). Springer, London.
[30] J. A. Rolia, D. Krishnamurthy, G. Casale, and S. M. Dawson. 2010. BAP: A benchmark-driven algebraic method for the performance engineering of customized services. In Proceedings of the WOSP/SIPEW International Conference on Performance Engineering (WOSP/SIPEW’10). 3–14.
[31] J. A. Rolia and K. C. Sevcik. 1995. The method of layers. IEEE Trans. Softw. Eng. 21, 8 (1995), 689–700.
[32] R. Schoenig, J. F. Aubry, T. Cambois, and T. Hutinet. 2006. An aggregation method of Markov graphs for the reliability analysis of hybrid systems. Reliabil. Eng. Syst. Safety 91, 2 (February 2006), 137–148.
[33] A. Senderovich, A. Shleyfman, M. Weidlich, A. Gal, and A. Mandelbaum. 2018. To aggregate or to eliminate? Optimal model simplification for improved process performance prediction. Inf. Syst. 78 (2018), 96–111.
[34] K. C. Sevcik. 1977. Priority scheduling disciplines in queueing network models of computer systems. In Information Processing 77. North-Holland, Amsterdam.
[35] V. S. Sharma, P. Jalote, and K. S. Trivedi. 2005. Evaluating performance attributes of layered software architecture. In Component-Based Software Engineering, G. T. Heineman, I. Crnkovic, H. W. Schmidt, J. A. Stafford, C. Szyperski, and K. Wallnau (Eds.). LNCS, Vol. 3489.
[36] C. Witt, M. Bux, W. Gusew, and U. Leser. 2019. Predictive performance modeling for distributed batch processing using black box monitoring and machine learning. Inf. Syst. 82 (2019), 33–52.
[37] C. M. Woodside, J. E. Neilson, D. C. Petriu, and S. Majumdar. 1995. The stochastic rendezvous network model for performance of synchronous client-server-like distributed software. IEEE Trans. Comput. 44, 1 (1995), 20–34.
[38] C. M. Woodside. 2008. The relationship of performance models to data. In Performance Evaluation: Metrics, Models and Benchmarks, S. Kounev, I. Gorton, and K. Sachs (Eds.). Lecture Notes in Computer Science, Vol. 5119. Springer, Berlin, 9–28.
[39] C. M. Woodside, D. C. Petriu, J. Merseguer, D. B. Petriu, and M. Alhaj. 2013. Transformation challenges: From software models to performance models. Softw. Syst. Model. 13, 4 (October 2013), 1529–1552.
[40] C. M. Woodside. 2013. Tutorial Introduction to Layered Modeling of Software Performance, Edition 4.0. RADS Lab, Carleton University, Ottawa, Ontario, Canada.
[41] Q. Zhang, L. Cherkasova, N. Mi, and E. Smirni. 2008. A regression-based analytic model for capacity planning of multi-tier applications. Cluster Comput. 11 (2008), 197–211.
