
Consensus Building when Comparing Software Architectures

2002, Lecture Notes in Computer Science

M. Svahnberg and C. Wohlin, "Consensus Building when Comparing Software Architectures", Proceedings of the 4th International Conference on Product Focused Software Process Improvement (PROFES 2002), pp. 436-452, Springer LNCS Volume 2559/2002, Rovaniemi, Finland, December 2002.

Consensus Building when Comparing Software Architectures

Mikael Svahnberg, Claes Wohlin
Department of Software Engineering and Computer Science, Blekinge Institute of Technology, PO Box 520, S-372 25 Ronneby, SWEDEN
[Mikael.Svahnberg | Claes.Wohlin]@bth.se, http://www.ipd.bth.se/serl

Abstract. When designing a software system it is beneficial to study and use architectural styles from the literature, to ensure certain quality attributes. However, as the interpretation of literature may differ depending on the background and area of expertise of the person reading it, we suggest that structured discussions about different architecture candidates provide valuable insight not only into the architectures themselves, but also into people's opinions of the architectures' benefits and liabilities. In this paper, we propose a method to elicit the views of individuals concerning architecture candidates for a software system and to pinpoint where discussions are needed to reach a consensus view of the architectures.

1 Introduction

When developing software, it is important to have an appropriate architecture for the system, or for the sub-systems comprising the full system. The choice of, or evolution into, an appropriate architecture is not only governed by functional requirements, but to a large extent by quality attributes [2][4][6]. Knowing this, it is still a non-trivial task to discern between architecture candidates. There is usually more than one quality attribute involved in a system, and knowledge of the benefits and drawbacks of different architecture structures with respect to different quality attributes is not yet an exact science. Decisions are often taken on intuition, relying on the experience of senior software developers.

Because of this we believe it is important to be able to compare software architecture structures based on quantified data. Likewise, it is important to be able to compare the strengths and weaknesses of a single software architecture structure based on quantified data. If this is not done, there will always be subjective judgements involved when selecting between architecture structures. What is even more important is that everyone involved in designing a software architecture shares the same view of what benefits and drawbacks different architecture structures have. To this end, it is important that the views of different persons are extracted in a quantified way that enables comparisons between the views, and a synthesis of the different views into a unified consensus view. We propose that the understanding of architectures starts with eliciting the knowledge of individuals, and that structured discussions should be held to reach a further understanding and to learn from others during the process of building consensus around the benefits and liabilities of different architecture candidates.

It should be noted that we use the term "software system" rather loosely in this paper. We use it to mean any software entity, be it an entire product suite, a single product, a subsystem within a product, a software module, or a software component.
1.1 Scope and Goal of Paper

In this paper, we describe a process for capturing knowledge from individuals into a framework that enables analysis of and comparison between software architectures with respect to quality attributes. In the process of synthesizing this framework, the views of the individuals participating in the process are extracted and presented in a way that enables and facilitates discussion where the participants are in disagreement. The purpose of these discussions is to create a joint understanding of the benefits and liabilities of the different software architectures. Throughout this paper we illustrate each step with data and experiences from conducting the step with the participation of colleagues, who also participated in creating the initial data sets in a previously conducted experiment, described in further detail in [17].

We would like to stress that even though we used generic architecture structures and quality attributes in this experiment, anyone using the method in a company would develop architecture candidates and elicit quality requirements for a particular system rather than use generic architectures and quality attributes. The method would thus operate on architectures that can be used in the particular system. Likewise, the quality attributes used would be elicited for the particular system, and would hence be expressed in terms pertinent to the system's problem domain. Moreover, the focus of this paper is mainly on the process described and the discussion of it, rather than on the example, which is mostly included to illustrate the different steps.

The remainder of this paper is organized as follows. In Section 1.2 we present an outline of the process proposed in this paper. In Section 2 we present how to create individual views, discuss these and combine them into a unified framework. In Section 3 we present how to evaluate this unified framework, and in Section 4 we present how to analyse it. Finally, the paper is concluded in Section 5.

1.2 Outline of Process

The process we describe in this paper consists of the following five steps (also illustrated in Figure 1), each of which we go through in further detail in this paper:

1. Create individual frameworks, as outlined in Section 2.1. The individual frameworks consist of two tables per participant, where one table describes the participant's ranking of the support for different quality attributes within each architecture structure, and the other ranks the architecture structures for each quality attribute.
2. Discuss the individual frameworks and decide upon a strategy for combining them into a unified consensus framework, which we go through in Section 2.2.
3. Create the unified framework, as presented in Section 2.3.
4. Evaluate the framework, as presented in Section 3.
5. Analyse the framework. This step is described in Section 4.

As presented by Johansson et al. [9], it is expected that stakeholders have different views of the importance of different quality attributes, and we also expect developers with different backgrounds to have different views of different architecture structures. The purpose of steps 1 and 2 of the process is hence to elicit the views of different stakeholders and use these as a basis for further discussions where the causes for the different views are investigated.
Figure 1. Illustration of Process (Step 1: Create Individual Frameworks; Step 2: Discuss Individual Frameworks; Step 3: Create Unified Framework; Step 4: Evaluate Framework; Step 5: Analyse Framework)

These discussions serve as one important input to the architecture design process and the development process, but it is also important to analyse the framework to understand the benefits of different architecture structures. This is not only useful if the architectures represent initial designs for a system; it is even more important if the purpose of the evaluation is to see whether the current architecture of an evolving system is still the most appropriate alternative given the (potentially also evolved) domain and quality requirements of the software system. By regularly re-evaluating the choice of software architecture, phenomena such as software aging [13] may be, if not stopped, then at least slowed down. This analysis consists of steps 4 and 5, where the framework is first evaluated to measure the amount of uncertainty it contains (step 4) and then each architecture candidate is analysed in further detail and compared to the other architecture candidates (step 5).

2 Creating the Framework

The framework we intend to create consists of two tables, which we refer to as the FQA (Framework for Quality Attributes) and the FAS (Framework for Architecture Structures). These two tables consist of a set of vectors, normalized so that the values in each vector sum up to 1, and they describe the architecture structures with respect to quality attributes in two ways: the FQA describes a ranking of architecture structures with respect to a certain quality attribute, and the FAS describes the ranking of different quality attributes within a particular architecture structure. We create these two tables by first acquiring the individual views of a number of participants, as described in Section 2.1, and then discussing these views in a meeting, as described in Section 2.2. The last step is to combine all individual views into a unified framework, as described in Section 2.3.

However, before the framework can be created, it must be known which quality aspects are relevant to consider, and a set of architecture structures must be developed for the system to design. We do not describe this further, as it is done using traditional requirements engineering (e.g. [5][12]) and architecture design methods (e.g. [4][6][8]). The input for the method is thus highly context dependent, as the architecture structures and the quality attributes are designed and elicited for a particular software system, in a particular domain and for a particular software company with a certain software development culture. The elicited relevant quality aspects and the architecture structure candidates are used as input to the first step of the process, as described below.

2.1 Creation of Individual Frameworks

The first step after the architecture structures are created is to understand how different individuals, potentially with different backgrounds, perceive the strengths and weaknesses of the architecture structures with respect to a set of quality attributes. One can choose whether to elicit the views, in terms of the FAS and FQA, of each individual or whether to obtain a collective framework in a group meeting.
If the participants create the framework collectively, there is a risk that opinions are suppressed, which is why we suggest that it is better if each participant creates an individual framework that becomes the basis for discussions, e.g. as presented in Section 2.2. Each of these individual frameworks should then describe the strengths and weaknesses of the architecture structures in a comparable way, so that differences between participants can be identified and discussed.

However, explaining in comparable terms what the strengths and weaknesses of a particular architecture structure are is not a trivial task. To facilitate this, we propose the use of methods available from the management science literature, for example in Anderson et al. [1]. These methods are often denoted multi-criteria decision processes. One such method is the Analytic Hierarchy Process (AHP for short), which was originally proposed by Saaty [14] (and also described in a later publication [15]). This approach has been applied in software engineering by other researchers addressing, for example, requirements engineering [10] and project estimation [16]. The Analytic Hierarchy Process can be used to prioritize different items or aspects. The result is a priority vector with relative weights on the different items or aspects being prioritized. In order to create the individual frameworks, two AHP questionnaires need to be completed, with questions pertaining to the following two issues:

• A comparison of different quality attributes for each software architecture structure.
• A comparison of different software architecture structures for each software quality attribute.

This results in two tables per participant, related to the FAS and the FQA as described earlier.

Illustration of Creating Individual Frameworks. In a previous study [17], we present an experiment where we conduct the type of AHP ranking mentioned above, i.e. we describe and apply a method for assessing the support different architectural structures have for different quality attributes, and also which architectural structures best fulfil certain quality attributes. The outcome of this experiment is a series of vectors for each participant in the study. In our study, each of the eight participants produced six vectors ranking the architectural structures for each quality attribute and five vectors ranking the quality attribute potential within each architecture structure. These vectors are grouped into two tables per participant, corresponding to the FAS and the FQA as described above. The quality attributes used were those from the ISO 9126 standard [7], namely: Efficiency, Functionality, Usability, Reliability, Maintainability and Portability, and the architecture structures used were a selection from Buschmann et al. [3], namely: Microkernel, Blackboard, Layered, Model-View-Controller and Pipes and Filters. It should, however, be noted that the method is not bound to these particular attributes and structures. Any other set would have worked just as well. The individual frameworks can be studied separately, but the real use comes if they can be combined into a single, comparable view of the architecture structures and quality attributes, e.g. as described in the next sections.
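To make the AHP step concrete, the sketch below shows how a single filled-in questionnaire could be turned into a priority vector and a consistency ratio. It is a minimal Python/numpy sketch, not the instrument used in the study; the pairwise comparison matrix is made up for illustration. Each participant would produce one such vector per quality attribute (a row of their FQA) and one per architecture structure (a column of their FAS).

```python
import numpy as np

# Saaty's random consistency indices for matrix sizes 1..10.
RANDOM_INDEX = [0.0, 0.0, 0.58, 0.90, 1.12, 1.24, 1.32, 1.41, 1.45, 1.49]

def ahp_priority_vector(pairwise):
    """Return the priority vector and consistency ratio (CR) of a reciprocal
    pairwise comparison matrix, using the principal eigenvector method."""
    n = pairwise.shape[0]
    eigenvalues, eigenvectors = np.linalg.eig(pairwise)
    principal = np.argmax(eigenvalues.real)
    weights = np.abs(eigenvectors[:, principal].real)
    weights /= weights.sum()                       # normalize so the vector sums to 1
    consistency_index = (eigenvalues[principal].real - n) / (n - 1)
    return weights, consistency_index / RANDOM_INDEX[n - 1]

# Illustrative (made-up) comparison of the five architecture structures for one
# quality attribute on Saaty's 1-9 scale; answers[i, j] states how much better i is than j.
answers = np.array([
    [1,   3,   5,   1,   1/3],
    [1/3, 1,   3,   1/5, 1/7],
    [1/5, 1/3, 1,   1/5, 1/7],
    [1,   5,   5,   1,   1/3],
    [3,   7,   7,   3,   1  ],
])
vector, cr = ahp_priority_vector(answers)
print(vector)  # one row of this participant's FQA
print(cr)      # a CR above roughly 0.10 signals inconsistent answers
```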
2.2 Discussing the Individual Frameworks

The purpose of discussing and comparing the individual frameworks is to create a further understanding of where the software engineers disagree in their judgements of the architecture candidates, and to elicit the reasons why this disagreement occurs. We expect to find disagreements, as it is rare that all software engineers have the exact same background and experience, and these differences will be manifested in the individual frameworks created in the previous step.

In order to identify the discrepancies that are most relevant to discuss, we propose to use the sum of the squared distances to the mean value, described by the formula $\sum_{i=1}^{N} (x_i - \bar{x})^2$, where $N$ is the number of participants and $x_i$ is participant $i$'s value. This formula is applied over all participants for each data point in the vectors of the FAS and FQA (in effect, for each combination of quality attribute and architecture structure), and hence produces a value for each data point that describes the amount of disagreement between the participants. After this, a suitable threshold value is selected to discern which of the data points are worthy of further examination. Although it is up to the user of our method to set a suitable threshold value, and it depends on the number of discussion points one wishes to identify, we suggest that the threshold value is set to the 75th percentile, thus pinpointing the upper 25% of the data set. However, this also needs to be augmented by visually inspecting graphs of the individual answers and identifying places where there are interesting outliers even though the spread of the answers does not exceed the threshold value. During a meeting, each of the identified data points is discussed, and participants with an opinion differing from the rest get a chance to explain why their values differ.

Embracing Disagreement. That different persons have different backgrounds is not an uncommon situation, neither in academia nor in industry. Thus, any formed group will consist of persons with different backgrounds, which is partly what makes a group successful. As all group members form their interpretations of the situation at hand based on their background, one cannot expect all participants to have the same interpretation. We believe that the key to success is to acknowledge this and to find ways to cope with the differing interpretations. If participants disagree on the meaning of a certain quality attribute, or of a certain architecture structure, this is a disagreement that would manifest itself later during the development process and, in the worst case, be the source of flaws in the delivered product. The major contribution of the meeting presented in this section is that the participants get to present their rationale, and this creates a better joint understanding of how to interpret the quality attributes and architecture structures.

Another goal of the discussions is that the individual vectors should be combined into vectors that everyone can agree upon. There are several ways to combine the vectors, e.g.:

• Use the mean value.
• Remove outliers and use the mean value.
• Use the median value.
• Let the participants, with the gained knowledge from the discussion, re-do the AHP questionnaire for the vector in question and hold a second meeting to discuss the new vector.
• Let the participants jointly complete an AHP questionnaire for the vector in question.

Which method to use can be decided during the meeting for every vector, but we suggest that in most cases using the median value is sufficient.
It is less time-consuming than the other choices, while still giving a more accurate image than just using the mean value. The mean value would be unduly influenced by extreme values, whereas the median value indicates where the bulk of the participants are located without biasing towards outliers.

Illustration of a Consensus Discussion Meeting. With the goal of creating a unified view of the eight different opinions (stemming from the eight participants in the previous study [17]), we conducted a follow-up meeting. During this meeting, the 11 calculated vectors per participant from the AHP study (one vector for each architecture structure and quality attribute used: 5 architecture structures + 6 quality attributes) were presented and then discussed, focusing on a smaller set of data points that were deemed worthy of further examination. These data points include those where there is a large spread among the answers of the participants, and those where the participants' opinions form two, or in some cases three, distinct groups. As a guideline for finding these data points, we used the sum over all participants of the squared distance to the mean value, with a threshold value of 0.10, which roughly corresponds to the 70th percentile. Hence, any data point where the sum over all participants of the squared distance to the mean value was larger than 0.10 was deemed interesting enough to warrant further discussion. Using this simple technique, we identified 20 data points out of 60 (five architecture structure vectors with 6 values each, and six quality attribute vectors with 5 values each, equal 60 data points per participant, and we are looking across all of the participants) that warranted discussion. Of these, 6 data points were only marginally over the threshold value and were not discussed in as great detail. Even though the set threshold value roughly corresponds to the 70th percentile, we thus only held detailed discussions about the data points above the 75th percentile.

In addition to the data points identified by the threshold value described above, we also noticed, while studying graphs of the data sets, that in some cases one or two participants disagreed strongly with the rest of the group. We included these data points as discussion points as well, as it is important that all arguments are heard, and the disagreeing person or persons may have very compelling reasons for disagreeing with the rest of the participants.

As stated, the intention of the discussion meeting is to find out the specific reasons why the participants have differing opinions in the identified data points. In our case, it soon became apparent that all disagreements could be put down to the same factor, namely that the interpretation and application of architecture structures depend on the background of the participants. This led people with backgrounds from different disciplines to interpret the architecture structures differently. As the architecture structures in themselves are rather abstract, many of the participants envisioned a typical system in which the architecture is used, to put a context to the question. These envisioned systems differed depending on the background of the participants.
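As a sketch of how the discussion points could be identified in practice, the following Python/numpy fragment computes, for every data point, the sum over all participants of the squared distance to the mean and flags the points above the chosen percentile. The array layout and the random stand-in data are assumptions for illustration; in a real application the array would hold the participants' actual individual FAS or FQA tables.

```python
import numpy as np

def discussion_points(frameworks, percentile=75.0):
    """frameworks: array of shape (participants, quality attributes, structures)
    holding the individual FAS (or FQA) tables.  Returns the per-data-point
    disagreement and a mask marking the points that exceed the percentile."""
    mean = frameworks.mean(axis=0)
    # Sum over all participants of the squared distance to the mean value.
    disagreement = ((frameworks - mean) ** 2).sum(axis=0)
    return disagreement, disagreement > np.percentile(disagreement, percentile)

# Dummy data: eight participants, 6 quality attributes, 5 architecture structures;
# each column (structure) sums to 1, as in an individual FAS.
rng = np.random.default_rng(0)
individual_fas = rng.dirichlet(np.ones(6), size=(8, 5)).transpose(0, 2, 1)

disagreement, needs_discussion = discussion_points(individual_fas)
print(np.argwhere(needs_discussion))  # (quality attribute, structure) pairs to bring to the meeting
```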
If the framework were created and the discussions held in an industrial case, this would, however, not be an issue, as the context is then given by the software system in focus, and the architecture structures and quality attributes are directly relatable to this system. Moreover, this difference in the systems envisioned is of minor importance to this paper, as the focus is on the presented process for eliciting and analysing people's opinions of different architecture candidates, and on studying the problems surrounding the creation of a consensus view of the strengths and weaknesses of the different alternatives.

2.3 A Unified Framework

After conducting the meeting described above, where the disagreements are discussed, a unified Framework for Architecture Structures (FAS) and a unified Framework for Quality Attributes (FQA) are constructed from the participants' views using the method decided upon to unite the individual views. Most often, the median value will be sufficient, unless arguments are brought forward to use another method (e.g. one of those mentioned in the previous section) for uniting the individual frameworks. When using the median value, these unified frameworks are no longer normalized as the individual tables were, i.e. the columns in the FAS and the rows in the FQA no longer sum up to 1. Because of this, a step is added where the data is re-normalized so that the columns in the FAS and the rows in the FQA sum up to 1.

Illustration of Unified Framework. The FAS and FQA constructed from our study after the consensus discussion meeting are presented in Table 1 and Table 2. The FAS (Table 1) presents the ranking of quality attributes for each architecture structure. This table should be read column-wise. For example, it ranks microkernel as being best at portability (0.309), followed by maintainability (0.183), efficiency (0.161), reliability (0.122), functionality (0.119) and usability (0.106), in that order. Moreover, the figures indicate that, for example, microkernel is almost twice as good at portability as it is at efficiency (the value for portability is 0.309 compared to the value for efficiency, which is 0.161). The FQA (Table 2) presents the ranking of architecture structures for each quality attribute, and should be read row-wise. For example, the FQA ranks pipes and filters as the best choice for efficiency (0.360), followed by microkernel (0.264), blackboard (0.175), model-view-controller (0.113) and layered (0.0868), in that order. As with the FAS, the figures indicate how much better a choice, for example, pipes and filters is compared to the other architecture structures. It is, for example, twice as good a choice for efficiency as blackboard (with a value of 0.360 compared to the 0.175 that blackboard scores).

Table 1. Framework for Architecture Structures (FAS)

                  Microkernel  Blackboard  Layered  Model-View-Controller  Pipes and Filters
Efficiency        0.161        0.145       0.0565   0.0557                 0.218
Functionality     0.119        0.321       0.237    0.115                  0.151
Usability         0.106        0.127       0.255    0.104                  0.0818
Reliability       0.122        0.0732      0.0930   0.105                  0.144
Maintainability   0.183        0.273       0.221    0.300                  0.271
Portability       0.309        0.0597      0.138    0.320                  0.135

Table 2. Framework for Quality Attributes (FQA)

                  Microkernel  Blackboard  Layered  Model-View-Controller  Pipes and Filters
Efficiency        0.264        0.175       0.0868   0.113                  0.360
Functionality     0.205        0.252       0.199    0.206                  0.139
Usability         0.0914       0.113       0.250    0.408                  0.137
Reliability       0.126        0.142       0.318    0.190                  0.224
Maintainability   0.191        0.0921      0.285    0.239                  0.193
Portability       0.112        0.0689      0.426    0.139                  0.255
3 Evaluation of Unified Framework

Previously we discussed the importance of embracing disagreement. This is not only done by venting people's opinions in a meeting, as described earlier. For each value in the FAS and FQA, a value can be added to indicate the amount of disagreement between the participants. Such disagreement indicators can be used to judge the accuracy of decisions or statements based on data from the framework. Disagreement indicators can be constructed in a number of ways, but we suggest using the same measure as earlier, i.e. the squared distance to the mean, and counting the number of participants with a larger value than a certain threshold. As before, the idea is to set the threshold such that it identifies where the participants actually are in disagreement, which means that if the threshold is too high too much disagreement is allowed, and if it is too low there is too little tolerance for variations in the answers. However, it is not feasible to set the threshold value to identify a particular percentile, as we did to identify data points that warrant discussion. Instead, we need a value that correctly depicts the amount of disagreement found, and not a value that identifies a particular group of data points. To this end, we recommend that points where the squared distance to the mean is larger than two standard deviations (of all the squared distances to the mean) are deemed to be in disagreement with the rest of the participants. As before, what value to use as a threshold is up to the user of the method, but we find that two standard deviations give a fair picture of the amount of disagreement.

This measure of disagreement is only one in a series of uncertainty indicators. For every step of the way, we have indicators of uncertainty, and these should be considered so that, if the uncertainty becomes too large, it is possible to backtrack and redo steps to get more certainty in the data sets and hence in the accuracy and usability of the framework. The uncertainty indicators available hitherto are:

1. Individual consistency ratios for each of the produced vectors. If a method such as AHP [14][15] is used, these are obtained as part of the results from the method; otherwise they may need to be calculated separately.
2. Differences between individuals, as discussed in Section 2.2 and using the measure introduced there.
3. Differences between the unified FAS and FQA. In [18] we describe a way to measure and compensate for these differences. Briefly, we compensate for inconsistencies between the FAS and FQA by using one of the frameworks to improve the quality of the other, which is then used in the subsequent steps of the method.

In every step of the way, the goal has been to quantify the knowledge about architecture structures, while still retaining a qualitative rationale. Every step helps in removing ambiguity and increasing the clarity and the understanding of the architecture structures. This will, in a development process, ensure that architecture design decisions can be taken with more certainty. The uncertainty indicators on all levels and during all steps of the creation of the framework can be used to ascertain that the uncertainty, and hence the risk involved, is reasonably low, but also to identify factors upon which people have different opinions and where further discussions are needed to avoid problems later in the development process.
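The sketch below, continuing the earlier numpy fragments, shows one way the unification step of Section 2.3 (median followed by re-normalization) and the disagreement indicator described above (squared distance to the mean, with a threshold of two standard deviations of all squared distances) could be computed. The array layout is the same assumed shape (participants × quality attributes × structures) as before; it is an illustration, not the tooling used in the study.

```python
import numpy as np

def unify(frameworks, normalize_axis):
    """Combine the individual frameworks with the median and re-normalize so
    that FAS columns (normalize_axis=0) or FQA rows (normalize_axis=1) sum to 1."""
    unified = np.median(frameworks, axis=0)
    return unified / unified.sum(axis=normalize_axis, keepdims=True)

def disagreement_counts(frameworks):
    """For every data point, count the participants whose squared distance to
    the mean exceeds two standard deviations of all the squared distances."""
    squared_distances = (frameworks - frameworks.mean(axis=0)) ** 2
    threshold = 2 * squared_distances.std()
    return (squared_distances > threshold).sum(axis=0)

# individual_fas as in the previous sketch: (8 participants, 6 attributes, 5 structures).
rng = np.random.default_rng(0)
individual_fas = rng.dirichlet(np.ones(6), size=(8, 5)).transpose(0, 2, 1)

unified_fas = unify(individual_fas, normalize_axis=0)  # columns sum to 1 again
counts = disagreement_counts(individual_fas)           # counterpart of the counts in Tables 3 and 4
print(unified_fas.round(3))
print(counts, counts.sum())                            # total number of disagreement places
```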
Illustration of Disagreement Measure. In our example, the standard deviation of all squared distances to the mean value is 0.0166, and hence the threshold value is set to the double, i.e. 0.0332. By checking against a plot of all the participants' answers, we are able to determine that this threshold value gives a fair representation of where participants disagree with the majority. Counting the occurrences where the participants in the study diverge more than this threshold value, we find that there are 43 places where participants disagree, out of the total 480 data points (6 vectors with 5 values plus 5 vectors with 6 values, all of this times 8 participants). The persons in disagreement are distributed over the different vectors as shown in Table 3 and Table 4. In these tables, we see for example in Table 3 that for Microkernel one person had a different opinion to that of the majority regarding its efficiency value, one person regarding its usability value, and as many as four persons disagreed on Microkernel's abilities regarding portability. Studying a graph with all the participants' individual frameworks, it becomes clear that the participants form two distinct groups with respect to this issue (although the points identified by the disagreement measure come from both of these two groups). Moreover, we see that in general the architecture structures Microkernel and Blackboard contribute more than half of the disagreement issues, which indicates that for these two architecture structures much discussion is needed in order to fully understand them, and the consequences of using them in a software system.

In Table 3 and Table 4, we see that there are a total of eight places where two or more participants disagree with the majority. While these places certainly need further discussions to elicit the reasons for the disagreement, we can also conclude that the unified framework seems to be constructed by persons who are mostly in agreement, and the framework can thus be used with reasonable accuracy.

Table 3. Disagreement in FAS (number of participants disagreeing with the majority, per quality attribute and architecture structure)

Table 4. Disagreement in FQA (number of participants disagreeing with the majority, per quality attribute and architecture structure)

4 Analysis of Framework

In this section, we describe the logical next step after the framework is evaluated for consistency, which is to analyse the framework internally, i.e. to discern how the different architecture structures relate to each other and how each of the architecture structures supports different quality attributes. This is important in order to really understand the relations between the architecture structures and the quality attributes. Moreover, analysing the framework instead of simply using it creates a learning effect: by understanding the qualities of one software architecture, it may be easier to understand the qualities of the next. The next time software architectures are designed, and when the software architectures evolve, the designers will thus have an increased understanding from the start of the strengths and weaknesses of different design alternatives.
Furthermore, if the purpose of creating the framework is to re-evaluate the architecture of an existing software product, it becomes even more vital to analyse the architecture alternatives in the created framework to understand for which quality attributes there is an improvement potential in the current software architecture of the system.

The analysis is based on the two tables, i.e. the FAS and the FQA. We have attempted several ways to integrate these two tables into a single table, but have come to the conclusion that it is better to keep the two tables separate. There are two dimensions to the analysis: (a) a comparison between different architecture structures, for which the FQA is used, and (b) a comparison of the software qualities within a particular architecture structure, for which the FAS is used. As before, the FQA is read row-wise, and the FAS is read column-wise. Rather than studying the mere numbers, which can be difficult to compare, we suggest that the data is plotted into graphs. Examples of such graphs, based on the FAS in Table 1 and the FQA in Table 2, can be found in Figure 2 and in Figure 3. The graph in Figure 2 is read column-wise, i.e. one column in the FAS corresponds to one column, or rather a vertical line, of dots. Figure 3 corresponds to the FQA, which is read row-wise, but as there is no meaningful interpretation of comparing the quality attributes with each other in this table, we choose to plot this table so that its graph is also read column-wise.

Figure 2. Plotting of FAS

Figure 3. Plotting of FQA

4.1 Analysis of Architecture Structures (FAS)

This part of the analysis is concerned with understanding the qualities of each architecture structure. To this end, we use the FAS, and a plot of it, as in the example in Figure 2. When studying the FAS, it becomes apparent that an architecture structure can be of two different kinds, depending on how the quality attributes are spread. If the architecture ranks a few quality attributes highly compared to the others, this implies that the architecture is specialized for these purposes. On the other hand, if all of the quality attributes are ranked closely together, the architecture is generalized, and can better suit any mixture of desired quality attributes. Another thing to look for is whether some quality attributes are always, or in most cases, ranked high or low. For example, if a quality attribute is ranked high for all architecture structures, there is no further need to study this attribute, as one can be certain to fulfil it in any case. If it is ranked low, this indicates that new architecture structures may be needed that focus on this quality attribute. The purpose of this analysis is to understand the strengths and weaknesses of each architecture structure, which may help the next time an architecture is to be designed. As stated, it may also help in pointing out the need for new architecture structures that either favour a particular quality attribute or are equally good at a number of quality attributes.
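As a hedged illustration of the kind of plot shown in Figure 2, the fragment below draws the unified FAS from Table 1 as one vertical line of dots per architecture structure, using matplotlib; the variable names and plot styling are our own choices, not part of the original study.

```python
import numpy as np
import matplotlib.pyplot as plt

attributes = ["Efficiency", "Functionality", "Usability",
              "Reliability", "Maintainability", "Portability"]
structures = ["Microkernel", "Blackboard", "Layered",
              "Model-View-Controller", "Pipes and Filters"]

# Unified FAS from Table 1: rows are quality attributes, columns are structures.
fas = np.array([
    [0.161, 0.145,  0.0565, 0.0557, 0.218],
    [0.119, 0.321,  0.237,  0.115,  0.151],
    [0.106, 0.127,  0.255,  0.104,  0.0818],
    [0.122, 0.0732, 0.0930, 0.105,  0.144],
    [0.183, 0.273,  0.221,  0.300,  0.271],
    [0.309, 0.0597, 0.138,  0.320,  0.135],
])

# One vertical line of dots per architecture structure (the FAS is read column-wise).
for row, attribute in enumerate(attributes):
    plt.scatter(range(len(structures)), fas[row], label=attribute)
plt.xticks(range(len(structures)), structures, rotation=20)
plt.ylabel("Relative weight")
plt.legend()
plt.show()
```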
Illustration. In Table 1 (and in Figure 2) all of the architecture structures, except possibly the pipes and filters structure, are specialized for certain purposes. For example, microkernel is specialized for portability. Portability scores a value of 0.309 compared to the second largest, which is maintainability with 0.183. The relative distance between portability and maintainability is thus 0.126, whereas the relative distance between maintainability and the quality attribute with the lowest score is 0.0770, which is considerably smaller than between portability and maintainability. Another example of a specialized architecture structure is model-view-controller, which is specialized for portability and maintainability, with values of 0.320 and 0.300, respectively. The distance to the third best quality attribute is more than 0.185, as compared to the distance between the third best and the last quality attribute, which is 0.0593.

Another thing that the FAS from our experiment seems to indicate is that there are some quality attributes that no architecture structure is really good at. For example, all architecture structures except pipes and filters have a fairly low value on efficiency. This implies that there is room for specialized architecture structures with emphasis on this quality attribute. Similar situations can be found with usability, where only the layered architecture structure ranks it highly, and with reliability, which no architecture structure in our study seems to focus on, or be particularly capable of. Since no architecture structure in our study focuses on reliability, one can ask whether reliability is at all a quality attribute that affects the software architecture, or whether it is an attribute that mainly manifests itself in the software development process. This, in turn, may depend on the interpretation of reliability. Maintainability, on the other hand, seems to be a quality attribute that most, if not all, architecture structures focus on. This manifests itself in the relatively high values for maintainability compared to the other quality attributes for all architecture structures.

4.2 Analysis of Ranking per Quality Attribute (FQA)

The other part of the analysis is to compare architecture structures with each other. To this end we use the FQA, and a plot of it, as exemplified in Figure 3. Patterns one can discern here are whether a particular architecture structure always gets better values than another, which would mean that this architecture structure is always preferable to the other. One can also see if there are certain quality attributes that favour the selection of a particular architecture structure. This may, in time, create an increased understanding of what traits of a software architecture benefit a particular quality attribute.

Illustration. In the FQA in Table 2 (and the corresponding plot in Figure 3), the layered architecture is considerably better at most quality attributes than the other architecture structures, except for efficiency and functionality (and usability, where model-view-controller is far better than the rest of the architectures). Likewise, we see that blackboard is in general a bad choice, except when functionality is a desired quality.
If a situation like this occurs when comparing architecture structure candidates for a system, it would indicate that the evaluation can be aborted and the high-ranking architecture (in our example the layered architecture structure) chosen directly, unless efficiency is a desired quality.

5 Conclusions

In this paper we present a way to build consensus around the benefits and liabilities of software architecture structures through the process of creating a unified, quantified and comparable view of architecture structures with respect to quality attributes. Instead of just reading about the architecture structures in a book, we believe that the process of expressing one's own knowledge and experience of different architecture structures in a structured way creates a further understanding of how the architecture structures will work in the specific situation. When this is compared to other people's opinions as well, a learning effect is created, and differences of opinion can be identified and discussed to form a consensus before the development process continues. Without this consensus, it is our belief that the differences of opinion will appear later during development, taking the form of inconsistencies in the developed system and an increased development time.

Unlike related literature (e.g. [3][4]), which only presents benefits and liabilities by means of logical reasoning, a framework created using the process in this paper provides relative measures of the level of support for different quality attributes, thus enabling measurement of the importance or severity of different traits. Furthermore, we also provide a way to compare these traits over different architecture structures. This complements the work of e.g. Buschmann et al. [3] and Bosch [4] by providing an opportunity to quantitatively analyse architecture structures, and thus create further insight into how these architecture structures work. Moreover, the framework can be constructed for any set of architecture structures and quality attributes, which means that companies can perform their own analysis for architecture structures and quality attributes related to their business and the domain of their systems, and are hence not bound to the selection of architecture structures, and the descriptions of these, found in mainstream literature.

The focus of the paper is on the different steps that assist in the process of building consensus among the participants while ensuring that all, or at least many, relevant aspects are covered before more time and effort is spent on further developing the software system at hand. We illustrate these steps by reporting from a case study conducted according to the steps in this paper. Hence, we would again like to stress that the framework we present in the illustrations is only one example of using the proposed process. The created framework and the steps to discuss, evaluate and analyse it can be used on any set of architecture structures and quality attributes. Which sets to use is primarily determined by the context and the domain in which the process is applied. Moreover, we would also like to stress that although the framework is constructed by capturing the perception professionals have of architecture structures and quality attributes, it is our belief that this perception also reflects actual qualities of the architecture structures themselves.
Nevertheless, the framework is only indirectly based on the actual qualities of the architecture structures. The process for consensus building in this paper has the following benefits:

• It can be used to create a better understanding of different architecture structures.
• It can be used to kindle a dialogue between software developers to iron out and understand discrepancies in interpretations of architecture structures.
• It can be used to identify the need for architecture structures specialized for certain quality attributes.
• It can be used as a sub-step in methods for comparing different architecture structures when selecting which architecture to use in a system to design, as in [18].
• It can be used to evaluate architectures against a "baseline" of common architecture structures.
• It can be used as a learning tool to allow software developers to share their experiences with each other in a structured way.
• It can be used to confirm or refute "myths" regarding software architectures. For example, if all architecture structures rank performance and maintainability highly, this indicates that it is at least possible to create architecture structures where these are not in conflict, thus refuting the myth that this, in general, is not possible.

To summarize, the contribution of this paper is that we present a process for creating a data set around which discussions can be held to find out if and why there are disagreements in a group of software developers. Such a discussion is, we believe, vital to create a joint understanding of the architecture candidates for a system to design. Our advice is to acknowledge that there will be disagreement, and to use this disagreement to power discussions that create a better understanding of the architecture structures and quality attributes involved.

References

[1] D.R. Anderson, D.J. Sweeney, T.A. Williams, "An Introduction to Management Science: Quantitative Approaches to Decision Making", South Western College Publishing, Cincinnati Ohio, 2000.
[2] L. Bass, P. Clements, R. Kazman, "Software Architecture in Practice", Addison-Wesley Publishing Co., Reading MA, 1998.
[3] F. Buschmann, C. Jäkel, R. Meunier, H. Rohnert, M. Stahl, "Pattern-Oriented Software Architecture - A System of Patterns", John Wiley & Sons, Chichester UK, 1996.
[4] J. Bosch, "Design & Use of Software Architectures - Adopting and Evolving a Product Line Approach", Addison-Wesley, Harlow UK, 2000.
[5] L. Chung, B.A. Nixon, E. Yu, J. Mylopoulos, "Non-Functional Requirements in Software Engineering", Kluwer Academic Publishers, Dordrecht, the Netherlands, 2000.
[6] C. Hofmeister, R. Nord, D. Soni, "Applied Software Architecture", Addison-Wesley, Reading MA, 2000.
[7] "Software Qualities", ISO/IEC FDIS 9126-1:2000(E).
[8] I. Jacobson, G. Booch, J. Rumbaugh, "The Unified Software Development Process", Addison-Wesley, Reading MA, 1999.
[9] E. Johansson, M. Höst, A. Wesslén, L. Bratthall, "The Importance of Quality Requirements in Software Platform Development - A Survey", in Proceedings of HICSS-34, Maui Hawaii, January 2001.
[10] J. Karlsson and K. Ryan, "A Cost-Value Approach for Prioritizing Requirements", in IEEE Software 14(5):67-74, 1997.
[11] J. Karlsson, C. Wohlin and B. Regnell, "An Evaluation of Methods for Prioritizing Software Requirements", in Information and Software Technology 39(14-15):938-947, 1998.
[12] G. Kotonya, I. Sommerville, "Requirements Engineering", John Wiley & Sons, Chichester UK, 1998.
[13] D.L. Parnas, "Software Aging", in Proceedings of the 16th International Conference on Software Engineering, IEEE Computer Society Press, Los Alamitos CA, pp. 279-287, 1994.
[14] T. Saaty, "The Analytic Hierarchy Process", McGraw-Hill, 1980.
[15] T.L. Saaty, L.G. Vargas, "Models, Methods, Concepts & Applications of the Analytic Hierarchy Process", Kluwer Academic Publishers, Dordrecht, the Netherlands, 2001.
[16] M. Shepperd, S. Barker, M. Aylett, "The Analytic Hierarchy Process and almost Dataless Prediction", in Project Control for Software Quality - Proceedings of ESCOM-SCOPE 99, R.J. Kusters, A. Cowderoy, F.J. Heemstra, E.P.W.M. van Weenendaal (eds.), Shaker Publishing BV, Maastricht, the Netherlands, 1999.
[17] M. Svahnberg, C. Wohlin, "An Investigation of a Method for Evaluating Software Architectures with Respect to Quality Attributes", Submitted, 2002.
[18] M. Svahnberg, C. Wohlin, L. Lundberg, M. Mattsson, "A Method for Understanding Quality Attributes in Software Architecture Structures", in Proceedings of the 14th International Conference on Software Engineering and Knowledge Engineering (SEKE 2002), ACM Press, New York NY, pp. 819-826.