The Journal of Systems and Software 142 (2018) 92–114
Contents lists available at ScienceDirect. Journal homepage: www.elsevier.com/locate/jss

Early evaluation of technical debt impact on maintainability

José M. Conejero a,*, Roberto Rodríguez-Echeverría a, Juan Hernández a, Pedro J. Clemente a, Carmen Ortiz-Caraballo b, Elena Jurado a, Fernando Sánchez-Figueroa a

a Quercus Software Engineering Group, University of Extremadura, Avda. de la Universidad, s/n, 10071, Spain
b Escola d'Enginyeria d'Igualada, Universitat Politècnica de Catalunya, Av. Pla de la Massa, n° 8, 08700 Igualada, Spain

Article history: Received 15 March 2017; Revised 24 March 2018; Accepted 18 April 2018; Available online 21 April 2018.

Keywords: Technical Debt indicator; Requirements; Modularity anomalies; Maintainability; Empirical evaluation

Abstract

It is widely claimed that Technical Debt is related to quality problems, often produced by poor processes, lack of verification or basic incompetence. Several techniques have been proposed to detect Technical Debt in source code, such as the identification of modularity violations, code smells or grime buildups. These approaches have been used to empirically demonstrate the relation between Technical Debt indicators and quality harms. However, these works mainly focus on the programming level, when the system has already been implemented. There may also be sources of Technical Debt in non-code artifacts, e.g. requirements, and their identification may provide important information to move refactoring efforts to earlier stages and reduce future Technical Debt interest. This paper presents an empirical study to evaluate whether modularity anomalies at the requirements level are directly related to maintainability attributes, affecting system quality and thus increasing the system's interest.
The study relies on a framework that allows the identification of modularity anomalies and their quantification by means of modularity metrics. Maintainability metrics are also used to assess dynamic maintainability properties. The results obtained by both sets of metrics are pairwise compared to check whether the more modularity anomalies the system presents, the less stable and more difficult to maintain it is. © 2018 Elsevier Inc. All rights reserved.

* Corresponding author. E-mail addresses: chemacm@unex.es (J.M. Conejero), rre@unex.es (R. Rodríguez-Echeverría), juanher@unex.es (J. Hernández), pjclemente@unex.es (P.J. Clemente), carmen.ortiz@eei.upc.edu (C. Ortiz-Caraballo), elenajur@unex.es (E. Jurado), fernando@unex.es (F. Sánchez-Figueroa).
https://doi.org/10.1016/j.jss.2018.04.035 0164-1212/© 2018 Elsevier Inc. All rights reserved.

1. Introduction

Since Technical Debt was first introduced in Cunningham (1992), many approaches have emerged to identify (Vetro' et al., 2010; Wong et al., 2011; Schumacher et al., 2010), estimate (Chin et al., 2010; Curtis et al., 2012a; Letouzey and Ilkiewicz, 2012; Marinescu, 2012) or, in general, deal with Technical Debt through different techniques (Ramasubbu and Kemerer, 2014). As the authors state in Kruchten et al. (2012), "most authors agree that the major cause of Technical Debt is schedule pressure, e.g. ignoring refactorings to reduce time to market" (Abad and Ruhe, 2015). However, as they also claim, Technical Debt is also related to quality problems, often produced by carelessness, lack of education, poor processes, lack of verification or even basic incompetence. These origins of Technical Debt are called unintentional debt (Brown et al., 2010), and examples of the quality problems occasioned by Technical Debt are poor reusability and low understandability (Griffith et al., 2014), error-proneness and a higher number of defects
(Zazworka et al., 2014), a negative impact on robustness, performance, security and transferability (Curtis et al., 2012a, 2012b) or, especially, on maintainability issues such as stability (Zazworka et al., 2014). A study conducted by Chen and Huang (2009) highlights that stability is one of the top 10 higher-severity software development problem factors affecting software maintainability. Moreover, maintainability currently drains 60–90% of the total cost of software development (Chen and Huang, 2009; Erlikh, 2000; Hung, 2007). To address these issues, several techniques have been proposed in the literature to detect Technical Debt in source code, such as the identification of modularity violations (Wong et al., 2011), code smells (Schumacher et al., 2010; Marinescu, 2004), grime buildups (Gueheneuc and Albin-Amiot, 2001; Izurieta and Bieman, 2007) or the identification of violations of good programming practices by means of Automatic Static Analysis (ASA) approaches (Vetro' et al., 2010). Indeed, the combination of these four techniques has been empirically evaluated in Zazworka et al. (2014) to test which practices perform better under different conditions and how they could complement each other to estimate Technical Debt interest (quality harms). Technical Debt interest may be defined as the payment, in the form of extra time, effort and cost, needed to address future changes in a project (Abad and Ruhe, 2015). Similarly, in Ramasubbu and Kemerer (2014), Griffith et al. (2014), Curtis et al. (2012b), and Zazworka et al. (2011), the authors conducted studies where they empirically evaluated the relation between different Technical Debt indicators and software quality characteristics in order to test whether the former are really related to the latter.
What all these works have in common is that they focus on the programming level, when the system has already been implemented (if not completely, at least partially). However, as claimed in Li et al. (2014), Technical Debt can span all the phases of the software lifecycle, and there may also be sources of Technical Debt in non-code artifacts (Brown et al., 2010), e.g. requirements documents. Therefore, identifying it at early stages of development may provide developers with important information to apply refactoring approaches (e.g. based on aspect-oriented techniques; Moreira et al., 2013; Jacobson and Ng, 2004; Jacobson, 2003), thus improving modularity at the source code level as well and reducing Technical Debt at later development stages (or, at least, reducing the future global interest). The reality is that requirements always change and Technical Debt is inevitable (Allman, 2012); the issue is therefore not eliminating debt, but rather reducing it or even moving its identification to earlier stages. Indeed, this is all the more important if we consider that those who incurred the debt may not be the same as those who will have to repay it later (Brondum and Zhu, 2012). Nevertheless, to the best of our knowledge, little effort has been dedicated to studying the implications of Technical Debt at earlier stages of development. There are some works that have dealt with the definition of Technical Debt at the requirements level (Abad and Ruhe, 2015; Ernst, 2012) or its relation with architectural dependencies (Li et al., 2014; Brondum and Zhu, 2012). Indeed, these types of debt have been described in the mapping study introduced in Alves et al. (2016) as Requirements and Architecture Debts. However, the empirical evaluation of the quality problems produced by Technical Debt at early stages has been neglected in the literature so far.
Based on this observation, we have formulated the main question that we try to answer in this work: is there a relationship between Technical Debt indicators at the requirements level and software quality? Concretely, we focus on modularity violations (a well-known Technical Debt indicator; Wong et al., 2011; Alves et al., 2016) and software stability (a quality attribute related to maintainability; International Organization of Standardization, 2014). Thus, our main question is reformulated as follows: is there a relationship between modularity anomalies at the requirements level and system stability? The existence of this relationship would provide empirical evidence of the harmful relationship between Technical Debt and software quality at early stages of development. To answer this question, this paper presents an empirical study where we evaluate whether modularity anomalies at the requirements level occasioned by crosscutting concerns (Baniassad et al., 2006) are directly related to instability of the system, which would increase its interest. The empirical study is supported by the application of a conceptual framework defined in previous work (Conejero, 2010). The framework allows the identification of modularity violations based on scattering, tangling and crosscutting at any abstraction level, and concretely at the requirements level. Moreover, based on this conceptual framework, a set of software metrics was defined to quantify the degree of crosscutting that a system may present. In this work, these metrics are validated by comparing them with similar metrics introduced by other authors, whilst their utility is illustrated by comparing them with a set of metrics that measure stability.
All the metrics are applied to measure both modularity and stability properties in three different software product lines (with several releases each), and the measurements obtained are pairwise compared to test whether those metrics are correlated and to find an answer to our main question. The rest of the paper is organized as follows. Section 2 briefly introduces the conceptual framework that supports the study by providing a method to identify crosscutting properties at the requirements level. Section 3 presents the settings for our empirical study by introducing the hypothesis established, the measures used and the systems considered. Section 4 shows the results obtained and discusses their interpretation according to our main hypothesis. Section 5 presents an evaluation of the metrics in order to select the most representative ones for future studies. Section 6 presents the threats to validity of this study. Finally, Section 7 discusses related work and Section 8 concludes the paper.

2. Background

A concern is an interest which pertains to the system's development, its operation or any other matters that are critical or otherwise important to one or more stakeholders (van den Berg et al., 2005). The term concern is closely related to the term feature (used in the Software Product Line context) in the sense of being a prominent or distinctive user-visible aspect, quality, or characteristic of a software system or systems (Kang et al., 1990). Software modularity is mainly determined by the principle of separation of concerns (Dijkstra, 1976), the design principle that proposes the proper encapsulation of a system's concerns into separate entities. One of the main advantages of separation of concerns is the significant reduction of dependencies between these features or concerns. However, concern independence is not always fully achieved, and modularity anomalies usually arise, occasioned by the well-known concern properties of scattering, tangling and crosscutting.
Crosscutting (usually described in terms of scattering and tangling) denotes the situation where a concern cannot be completely encapsulated into a single software component but is spread over several artifacts and mixed with other concerns due to poor support for its modularization (van den Berg et al., 2005). In order to detect these modularity anomalies, crosscutting identification approaches come into play. The next section introduces our previous work, where a conceptual framework for identifying and characterizing crosscutting properties was proposed. This framework is independent of any particular software development stage. Therefore, it may be applied at stages previous to implementation, e.g. at the requirements stage.

2.1. A conceptual framework for analysing modularity anomalies

In Conejero (2010) a conceptual framework was presented where formal definitions of concern properties, such as scattering, tangling, and crosscutting, were provided. This framework is based on the study of the trace dependencies that exist between two different domains. These domains, generically called Source and Target, could be, for example, concerns and requirements descriptions, respectively, or features and use cases in a different situation. We use the term Crosscutting Pattern (Fig. 1) to denote the situation where Source and Target are related to each other by means of trace dependencies. From a mathematical point of view, the Crosscutting Pattern indicates that the Source and Target domains are related to each other by a mapping. This mapping is the trace relationship that exists between the Source and Target domains, and it can be formalized as follows. According to Fig. 1, there exists a multivalued function f′ from the Source to the Target domain such that if f′(s) = t, then there exists a trace relation between s ∈ Source and t ∈ Target. Analogously, we can define another multivalued function g′ from Target to Source
Fig. 1. The crosscutting pattern. (figure)

Table 1. Crosscutting product matrix for the dependency matrix in Fig. 3.

       s[1]  s[2]  s[3]  s[4]  s[5]
s[1]     2     1     0     0     1
s[2]     1     3     0     1     1
s[3]     1     1     0     0     0
s[4]     0     1     0     1     0
s[5]     1     1     0     0     2

Table 2. Crosscutting matrix for the dependency matrix in Fig. 3.

       s[1]  s[2]  s[3]  s[4]  s[5]
s[1]     0     1     0     0     1
s[2]     1     0     0     1     1
s[3]     1     1     0     0     0
s[4]     0     1     0     0     0
s[5]     1     1     0     0     0

Fig. 2. Relationships among source and target elements. (figure)

that can be considered as a special inverse of f′. If f′ is not a surjection, we consider that Target is the range of f′. Obviously, f′ and g′ can be represented as single-valued functions considering that the codomains are the sets of non-empty subsets of Target and Source, respectively. Let f: Source → P(Target) and g: Target → P(Source) be two functions defined by:

∀ s ∈ Source, f(s) = {t ∈ Target : f′(s) = t}
∀ t ∈ Target, g(t) = {s ∈ Source : g′(t) = s}

The concepts of scattering, tangling and crosscutting are defined as specific cases of these functions.

Definition 1 (Scattering). We say that an element s ∈ Source is scattered if card(f(s)) > 1, where card(f(s)) refers to the cardinality of f(s). In other words, scattering occurs when, in a mapping between source and target, a source element is related to multiple target elements. Note that the cardinality of f(s) refers to the number of elements of the Target set that are related by f to the Source element s.

Definition 2 (Tangling). We say that an element t ∈ Target is tangled if card(g(t)) > 1. Hence, tangling occurs when, in a mapping between source and target, a target element is related to multiple source elements.

There is a specific combination of scattering and tangling which we call crosscutting.

Definition 3 (Crosscutting). Let s1, s2 ∈ Source, s1 ≠ s2. We say that s1 crosscuts s2 if card(f(s1)) > 1 and ∃ t ∈ f(s1): s2 ∈ g(t).
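As a minimal illustration (a sketch, not the authors' tooling), the three definitions can be expressed over a hypothetical trace mapping; the element names mirror the small example discussed around Fig. 2:

```python
# Sketch of Definitions 1-3 over a hypothetical trace mapping f.
# The mapping below (s1..s3, t1..t4) is illustrative only.

def invert(f):
    """Build g: Target -> set(Source), the special inverse of f'."""
    g = {}
    for s, targets in f.items():
        for t in targets:
            g.setdefault(t, set()).add(s)
    return g

def is_scattered(f, s):
    """Definition 1: card(f(s)) > 1."""
    return len(f[s]) > 1

def is_tangled(g, t):
    """Definition 2: card(g(t)) > 1."""
    return len(g[t]) > 1

def crosscuts(f, s1, s2):
    """Definition 3 (equivalently, the Lemma below):
    s1 crosscuts s2 iff s1 != s2, card(f(s1)) > 1 and f(s1) ∩ f(s2) ≠ ∅."""
    return s1 != s2 and len(f[s1]) > 1 and bool(f[s1] & f[s2])

# Hypothetical mapping: s1 is addressed by t1, t3, t4; s2 by t2; s3 by t3.
f = {"s1": {"t1", "t3", "t4"}, "s2": {"t2"}, "s3": {"t3"}}
g = invert(f)

assert is_scattered(f, "s1")         # s1 spreads over three target elements
assert is_tangled(g, "t3")           # t3 mixes s1 and s3
assert crosscuts(f, "s1", "s3")      # scattered s1 meets s3 in t3
assert not crosscuts(f, "s3", "s1")  # asymmetric: s3 is not scattered
```

Note that crosscutting is asymmetric: a non-scattered concern cannot crosscut anything, even if it shares a target element with a scattered one.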
In other words, crosscutting occurs when, in a mapping between source and target, a source element is scattered over target elements and, in at least one of these target elements, source elements are tangled. According to the previous definitions, the following result is a direct consequence.

Lemma. Let s1, s2 ∈ Source, s1 ≠ s2. Then s1 crosscuts s2 iff card(f(s1)) > 1 and f(s1) ∩ f(s2) ≠ ∅.

For the sake of clarity, Fig. 2 shows an example based on a graph representation of the mappings among three different Source elements and four Target ones. As we can see in this figure, there would be a crosscutting situation in element t3, since s1 would be scattered over three different target elements (t1, t3 and t4) and t3 would be tangled based on the mapping with s1 and s3.

2.2. Identification of modularity anomalies

Based on the crosscutting pattern previously described, a special kind of traceability matrix, called the dependency matrix, was defined to represent the function f. An example of a dependency matrix with five source and six target elements is shown in Fig. 3. Source elements are arranged in the rows, and target elements in the columns. A '1' in a cell denotes that the target element of the corresponding column contributes to or addresses the source element of the corresponding row (in the dependency matrix of Fig. 3, t[1] and t[4] contribute to the functionality of s[1]). Two different matrices, called the scattering matrix and the tangling matrix, are derived from the dependency matrix (shown in Fig. 3). These matrices show the scattered and tangled elements in a system, respectively:

• In the scattering matrix, a row contains the dependency relations from source to target elements only if the source element in that row is scattered (mapped onto multiple target elements); otherwise, the row contains just zero values (no scattering).
This last situation is highlighted with circles in Fig. 3.

• In the tangling matrix, a row contains the dependency relations from target to source elements only if the target element in that row is tangled (mapped onto multiple source elements); otherwise, the row is filled with zero values (no tangling). This last situation is also highlighted with circles in Fig. 3.

The crosscutting product matrix is obtained through the multiplication of the scattering and tangling matrices. The crosscutting product matrix shows the quantity of crosscutting relations, and it is an intermediate step to derive the final crosscutting matrix. Tables 1 and 2 show, respectively, the crosscutting product and crosscutting matrices derived from the example shown in Fig. 3. In the crosscutting matrix, each cell denotes the occurrence of crosscutting; it abstracts from the quantity of crosscutting. A crosscutting matrix ccm can be derived from a crosscutting product matrix ccpm using a simple conversion: ccm[i][k] = 1 if (ccpm[i][k] > 0) ∧ (i ≠ k), and 0 otherwise. More details about this conceptual framework and the matrix operations can be found in van den Berg et al. (2005).

Fig. 3. Process for generating scattering and tangling matrices. (figure)

Fig. 4. XML schema to validate the concerns file. (figure)

2.3. Building the dependency matrix

Our conceptual framework was also extended in order to be automatically applied to software requirements. In particular, syntactic and dependency-based analyses are used to automatically obtain the mappings between source and target elements. In other words, this extension allows automating the construction of dependency matrices, which represent the starting point for identifying crosscutting concerns. The process to build the dependency matrix is divided into two steps: firstly, the requirements documentation of the system is analyzed in order to identify concerns, i.e.
source elements; secondly, use case diagrams are analyzed to elicit target elements. To perform the first step, concerns are categorized as functional and non-functional. The identification of non-functional concerns is supported by the use of a non-functional concerns catalogue. Once both functional and non-functional concerns are elicited, they are represented in an XML file according to the XML schema shown in Fig. 4. The second step, the identification of target elements, is based on the use of use case diagrams. Concretely, every XMI file representing a use case diagram is analyzed to identify the system's use cases. Then, based on the concerns file built in the first step and the XMI file representing the use cases, these two files are automatically queried (using XQuery, 2018) to identify syntactic dependencies between source and target elements. These dependencies are based on partial or full coincidences in their names. Moreover, in order to detect indirect dependencies between concerns and use cases, the <<include>> dependencies of the use case diagrams are also automatically analyzed by processing the XMI file (see Fig. 5). Based on the indirect dependencies obtained, the original dependency matrix is completed with new dependencies and an extended dependency matrix is generated. The reader may find further details of these analyses in Conejero (2010) and Conejero et al. (2009).

3. Experimental design

As discussed in previous sections, the presence of crosscutting in a software system negatively affects its modularity, and it is one of the most significant indicators of Technical Debt (Wong et al., 2011; Alves et al., 2016). However, modularity anomalies in a system may impact its quality in different ways, since other quality attributes could also be affected, and thus they may increase interest in different ways.
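Before detailing the experimental design, the matrix pipeline of Section 2.2 can be made concrete with a short sketch. The 3 × 4 dependency matrix below is hypothetical (it matches the small example of Fig. 2, not the matrix of Fig. 3):

```python
# Sketch (assumed, not the authors' implementation) of the pipeline of
# Section 2.2: dependency matrix -> scattering and tangling matrices ->
# crosscutting product matrix -> crosscutting matrix.

def scattering_matrix(dm):
    # Keep a row only if its source element is scattered (row sum > 1).
    return [row[:] if sum(row) > 1 else [0] * len(row) for row in dm]

def tangling_matrix(dm):
    # Rows of the tangling matrix are target elements; keep a row only if
    # the target element is tangled (its column sum in dm > 1).
    cols = list(zip(*dm))
    return [list(c) if sum(c) > 1 else [0] * len(c) for c in cols]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def crosscutting_matrix(ccpm):
    # ccm[i][k] = 1 if ccpm[i][k] > 0 and i != k, else 0.
    return [[1 if v > 0 and i != k else 0 for k, v in enumerate(row)]
            for i, row in enumerate(ccpm)]

# Hypothetical dependency matrix: 3 concerns (rows) x 4 use cases (columns).
dm = [[1, 0, 1, 1],   # s1 -> t1, t3, t4 (scattered)
      [0, 1, 0, 0],   # s2 -> t2
      [0, 0, 1, 0]]   # s3 -> t3 (t3 is tangled: it mixes s1 and s3)

ccpm = matmul(scattering_matrix(dm), tangling_matrix(dm))
ccm = crosscutting_matrix(ccpm)
# Only s1 crosscuts s3, via the tangled use case t3:
assert ccm == [[0, 0, 1], [0, 0, 0], [0, 0, 0]]
```

The diagonal of ccpm also yields the crosscut points per concern (here, ccpm[0][0] = 1: s1 crosscuts at exactly one use case, t3).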
This work focuses on empirically evaluating whether modularity anomalies at the requirements level are directly related to software maintainability (in terms of stability), jeopardizing system quality. Note that maintainability is one of the main characteristics contributing to Technical Debt interest. Therefore, if modularity violations are harmful to maintainability, they contribute to increasing that interest. Moreover, the identification of Technical Debt at the requirements level may allow the application of refactoring techniques from the very beginning of development, thus reducing the interest generated. In other words, ignoring refactorings because of a lack of awareness of better modularization techniques may increase the interest of software systems (Abad and Ruhe, 2015). In this context, the main research question (MRQ) that we try to answer is:

• MRQ: Is Technical Debt based on modularity anomalies at the requirements level harmful to software stability?

To answer this question, the following main hypothesis will be tested:

Fig. 5. Building the dependency matrix. (figure)

• Hypothesis: The more Technical Debt related to modularity anomalies at the requirements level a system has, the less stable it is.

This hypothesis is refined according to the modularity attributes that we consider in this work as follows:

• Refined hypothesis: The higher the degree of scattering, tangling or crosscutting at the requirements level a system has, the less stable the system is.

Note that stability (in conjunction with changeability) is one of the sub-characteristics of the maintainability characteristic defined in the product quality model of ISO 25010 (International Organization of Standardization, 2014). Therefore, Technical Debt may directly affect the maintainability of a system.
The evaluation of our hypothesis requires the definition of appropriate measures to assess the attributes that are relevant to it. In that sense, two different sets of software metrics at the requirements level are used: modularity metrics and stability metrics. The former measure concern properties related to scattering, tangling and crosscutting (structural and static properties of the software requirements). The latter are based on observing the evolution of a product line in terms of the changes in its different releases and, thus, are quantified/observed after a change has been completed. In other words, these metrics reflect a dynamic, behavioral property of the software's evolution. Finally, once the hypothesis of our empirical analysis has been defined, the scenarios used to measure the properties must be established. As claimed in Briand et al. (2018), in order to validate software measurement assumptions experimentally, one can adopt two main strategies: small-scale controlled experiments or real-scale industrial case studies. In our case, we adopted the latter strategy and used three different real systems to validate the analysis. In the following, the modularity and maintainability measures are introduced first, and then the systems used as our case studies are presented.

3.1. Modularity measures

First, we define the set of metrics used to assess the modularity properties of interest, namely scattering, tangling and crosscutting. To quantify these attributes, the framework summarized in Section 2 is used. Note that this framework provides a formal characterization of these attributes, so it also enables the definition of metrics to measure them. In that sense, we have used our own set of modularity metrics (introduced in Conejero, 2010).
Moreover, with the purpose of validating the results obtained by our metrics, modularity metrics previously defined in the literature by other authors (Ducasse et al., 2006; Eaddy et al., 2008; Sant'Anna et al., 2007; Sant'Anna et al., 2003; Figueiredo et al., 2008) have also been adapted to be applied at the requirements level, so that we can avoid the potential bias introduced by using only our own metrics. First, our set of metrics is shown in Table 3. The information given for each metric is: (i) its name, (ii) a brief description, (iii) its relation with the conceptual framework and traceability matrices (introduced in Section 2), and (iv) the formula used to compute it.

Table 3. Metrics defined based on our conceptual framework.

• NScattering(sk): number of target elements addressing the source element sk. Computed as the sum of the cells in row k of the dependency matrix: NScattering(sk) = Σ_{j=1..|T|} dm[k][j].
• Degree of Scattering(sk): normalization of NScattering(sk) between 0 and 1, obtained by dividing it by the number of target elements |T|: Degree of Scattering(sk) = NScattering(sk)/|T| if NScattering(sk) > 1, and 0 if NScattering(sk) = 1.
• Crosscutpoints(sk): number of target elements where the source element sk crosscuts other source elements. It corresponds to the diagonal cell of row k in the crosscutting product matrix: Crosscutpoints(sk) = ccpm[k][k].
• NCrosscut(sk): number of source elements crosscut by the source element sk. Computed as the sum of the cells in row k of the crosscutting matrix: NCrosscut(sk) = Σ_{i=1..|S|} ccm[k][i].
• Degree of Crosscutting(sk): sum of Crosscutpoints(sk) and NCrosscut(sk), normalized between 0 and 1 by dividing it by the sum of the number of source elements |S| and target elements |T|: Degree of Crosscutting(sk) = (Crosscutpoints(sk) + NCrosscut(sk)) / (|S| + |T|).
• Concern Degree of Tangling(sk): sum of the Degree of Tangling of each use case that addresses the concern sk: Concern Degree of Tangling(sk) = Σ_{ti ∈ f(sk)} Degree of Tangling(ti).

Second, Table 4 summarizes the set of modularity metrics previously defined in the literature that are also used in this study.

Table 4. Modularity metrics defined by other authors.

• Sant'Anna et al., Concern Diffusion over Components (CDC). Original definition: counts the number of components (target elements) addressing a concern (source element). Adapted metric: Concern Diffusion over Use Cases (CDUC), which counts the number of use cases (target elements) addressing a concern (source element).
• Ducasse et al., Spread. Original definition: counts the number of modules (classes or components) related to a particular concern. Adapted metric: counts the number of use cases related to a particular concern.
• Eaddy et al., Degree of Scattering (DOS). Original definition: the variance of the Concentration of a concern over all program elements with respect to the worst case. Adapted metric: the variance of the Concentration of a concern over all use cases with respect to the worst case.

In Sant'Anna et al. (2007) and Sant'Anna et al. (2003), Sant'Anna et al. introduced a set of concern-oriented metrics to assess modularity in terms of fundamental attributes of software such as separation of concerns, coupling, cohesion or size. We have used the metric Concern Diffusion over Components (CDC), which measures the number of components whose main purpose is to contribute to the implementation of a concern. In Ducasse et al. (2006), Ducasse et al. introduced a technique to visualize software partitions in the form of colored rectangles and squares. This technique, called Distribution Map, allows partitions representing all the software artifacts to be graphically represented. Based on distribution maps, the authors introduced the measure Spread, which counts the number of modules (classes or components) related to a particular concern. Similarly, Eaddy et al. introduced the Degree of Scattering (DOS) metric in Eaddy et al.
(2008), where the authors presented an empirical analysis showing the correlation between scattering and the number of faults in software systems. DOS provides information about how a concern's code is distributed over software artifacts. Note that although both sets of metrics (ours and the other authors') may be applied to different software artifacts (at different abstraction levels), they have been instantiated here to be applied at the requirements level. For instance, given that the other authors' metrics were focused on the design or programming level, we adapted them to the requirements level just by considering different software artifacts (the original authors performed the same kind of adaptation to apply their metrics at a different abstraction level, e.g. at the architectural level; Sant'Anna et al., 2007). In one case, the metric was even renamed: the Concern Diffusion over Use Cases (CDUC) metric measures use cases instead of components or classes. Likewise, all the metrics that were defined in terms of components or classes were adapted to count use cases (as may be observed in the last column of Table 4). Finally, in order to define and use these metrics, we have considered concerns (or features1) and use cases as the source and target domains, respectively. As claimed in Jacobson (2003), use cases have been universally adopted for requirements specification, and in this work we assume that a system's features are defined at a higher abstraction level (e.g. feature diagrams for domain modeling) than use cases, as has been widely claimed in the literature (Eriksson et al., 2005; Griss et al., 1998). This is why they are considered as source and target domains, respectively.
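To illustrate the metrics of Table 3 (and the adapted CDUC of Table 4), the sketch below computes them for a hypothetical system with |S| = 3 concerns and |T| = 4 use cases; the dependency matrix and the derived crosscutting matrices are illustrative only:

```python
# Sketch of the Table 3 metrics for a hypothetical 3-concern, 4-use-case
# system. ccpm and ccm are derived from dm as in Section 2.2
# (scattering matrix x tangling matrix, then thresholding).

dm = [[1, 0, 1, 1],
      [0, 1, 0, 0],
      [0, 0, 1, 0]]
ccpm = [[1, 0, 1], [0, 0, 0], [0, 0, 0]]
ccm  = [[0, 0, 1], [0, 0, 0], [0, 0, 0]]

S, T = len(dm), len(dm[0])

def nscattering(k):               # sum of row k of the dependency matrix
    return sum(dm[k])

def degree_of_scattering(k):      # normalized to [0, 1]; 0 if not scattered
    n = nscattering(k)
    return n / T if n > 1 else 0.0

def crosscutpoints(k):            # diagonal cell of row k in ccpm
    return ccpm[k][k]

def ncrosscut(k):                 # sum of row k of the crosscutting matrix
    return sum(ccm[k])

def degree_of_crosscutting(k):    # normalized by |S| + |T|
    return (crosscutpoints(k) + ncrosscut(k)) / (S + T)

def cduc(k):                      # adapted CDC: use cases addressing concern k
    return sum(1 for v in dm[k] if v > 0)

assert nscattering(0) == 3 and cduc(0) == 3
assert degree_of_scattering(0) == 0.75       # 3 of 4 use cases
assert degree_of_crosscutting(0) == 2 / 7    # (1 + 1) / (3 + 4)
assert degree_of_scattering(1) == 0.0        # s2 is not scattered
```

At the requirements level, NScattering and CDUC coincide by construction when the dependency matrix is binary; they diverge at levels where a concern may map to an artifact more than once.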
Since all the metrics presented in this section are based on the relations between the source and target domains represented by the crosscutting pattern, they were all calculated based on the mappings existing between these two domains.

1 Note that in this study the term feature is used as a synonym of concern.

3.2. Maintainability measures

In order to measure the maintainability of software systems, this section presents some measures for assessing stability, a particular maintainability attribute. Stability is defined as the capability of a software product to avoid unexpected ripple effects when modifications are performed (ISO/IEC, 2001). Stability is highly related to change management: the less stable a system is, the more complicated its change management becomes (Klass van den, 2006). Unstable models gradually lead to the degeneration of the design's maintainability and its quality in general (Klass van den, 2006). To measure stability, we observed different releases of the same system and computed the number of use cases that were changed in each release. A modification in a use case may be mainly due to: either (i) the concerns it addresses have evolved, or (ii) it has been affected by the addition, modification or removal of a concern in the system. Based on these changes, the use cases with a number of modifications higher than a threshold value are marked as unstable, whilst the use cases with a number of changes lower than this threshold are considered stable. The threshold value depends on the particular system or case study where the metric is applied. For the three case studies used in this paper, the values are shown in the Appendices (one for each case study). Then, we use as stability metrics the numbers of stable and unstable use cases that implement each concern as a way to measure the stability of these concerns.
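This threshold-based classification can be sketched as follows; the change counts, the threshold, and the concern-to-use-case mapping are all hypothetical:

```python
# Sketch of the stability metrics: use cases whose change count across
# releases exceeds a system-specific threshold are marked unstable, and
# the metrics count (un)stable use cases per concern. All data below is
# hypothetical (real thresholds are given in the Appendices).

changes = {"t1": 4, "t2": 0, "t3": 3, "t4": 1}  # modifications per use case
THRESHOLD = 2                                    # system-specific value
f = {"s1": {"t1", "t3", "t4"}, "s2": {"t2"}, "s3": {"t3"}}

unstable = {t for t, n in changes.items() if n > THRESHOLD}

def instability(s):   # unstable use cases that address concern s
    return len(f[s] & unstable)

def stability(s):     # stable use cases that address concern s
    return len(f[s] - unstable)

assert unstable == {"t1", "t3"}
assert instability("s1") == 2 and stability("s1") == 1  # t1, t3 vs. t4
assert instability("s2") == 0 and stability("s2") == 1
```

Note that a crosscutting concern such as s1 accumulates instability from every unstable use case it touches, which is exactly the effect the main hypothesis predicts.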
Instability(s_k) = # unstable use cases that address s_k        (7)

Stability(s_k) = # stable use cases that address s_k        (8)

3.3. Motivating cases

This section presents the three different systems that were used to perform our study. All these systems were implemented as Software Product Lines with different releases. The reason for choosing these applications for our analyses is threefold. First, as they are software product lines, maintainability is of utmost importance; instabilities and changes negatively affect not only the software product line architecture but also all the instantiated products. Second, the software architectures and the requirements had all-encompassing documentation; e.g., the description of all the use cases was made available, as well as a complete specification of all component interfaces. Third, the architectural components were independently defined and provided by experienced developers.

3.3.1. MobileMedia

MobileMedia is a product line system built to allow the user of a mobile device to perform different operations, such as visualizing photos, playing music or videos and sending photos via SMS. It has around 3 KLOC. MobileMedia encompasses eight designed and implemented subsequent releases (from 0 to 7) that support the analysis of different maintainability facets, such as stability. For instance, release 0 implements the original system with just the functionality of viewing photos and organizing them into albums (see Figueiredo et al., 2008 for more detail). Its scenarios cover heterogeneous concerns ranging from mandatory to optional and alternative features, as well as non-functional concerns. The different releases, together with the changes encompassed in each release, are shown in Appendix A. Note that the purpose of these changes is to exercise the implementation of the feature boundaries and, thus, assess the stability of the product line requirements.
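The stability classification described in Section 3.2 and Eqs. (7)–(8) can be sketched as follows; the change counts, concern mapping and threshold below are hypothetical illustrations, since the real values are system-specific (see the Appendices):

```python
# Hedged sketch of the stability measures of Eqs. (7) and (8): use cases
# changed more often than a system-specific threshold are marked unstable.
# change_counts, addressed_by and THRESHOLD are hypothetical examples.
change_counts = {"ViewPhoto": 5, "AddPhoto": 1, "PlayMusic": 0}
addressed_by = {"Persistence": {"ViewPhoto", "AddPhoto"}}
THRESHOLD = 2  # depends on the particular system or case study

def instability(concern, addressed_by, change_counts, threshold):
    """Eq. (7): number of unstable use cases that address the concern."""
    return sum(1 for uc in addressed_by[concern]
               if change_counts[uc] > threshold)

def stability(concern, addressed_by, change_counts, threshold):
    """Eq. (8): number of stable use cases that address the concern."""
    return sum(1 for uc in addressed_by[concern]
               if change_counts[uc] <= threshold)

print(instability("Persistence", addressed_by, change_counts, THRESHOLD))  # 1
print(stability("Persistence", addressed_by, change_counts, THRESHOLD))    # 1
```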
Note that some non-functional concerns (NFC) are also explicitly considered as concerns of the system (e.g., Persistence and Error Handling). All the concerns involved in the system are also presented in Appendix A.

3.3.2. HealthWatcher

The second system used in our analysis is called HealthWatcher. HealthWatcher is a typical Web-based program family that allows a citizen to register complaints regarding health issues (Greenwood et al., 2007). The system has around 4 KLOC and has been developed as a product line in different releases. The first HealthWatcher release of the Java implementation was deployed in March 2001. Since then, a number of incremental and perfective changes have been addressed in subsequent HealthWatcher releases. These releases allow us to observe typical types of changes in such an application domain. In particular, for the purpose of our analysis, we have considered the requirements of five different releases of the product line. All these releases and the different concerns involved in each release are also shown in Appendix B. Note, again, that we have used the same concerns used in previous analyses at later stages, e.g. at the architectural level (Greenwood et al., 2007).

3.3.3. SmartHome

The last product line analyzed in our study was developed by industry partners of the AMPLE European project. Concretely, this product line is taken from the domain of the Building Technologies (BT) division at Siemens (Elsner et al., 2008) and allows simulating the control of different devices of a smart home, including windows, heating, air conditioning, blinds, alarms, doors, and so on. We selected this system because a wide range of the system artifacts were publicly available (Elsner et al., 2008), e.g. system descriptions, feature models, and architecture design.
Moreover, the system corresponds to a different domain and is considerably bigger than MobileMedia and HealthWatcher, thereby allowing us to evaluate the generality and scalability of concern-driven analyses. The SmartHome system has around 17 KLOC (Elsner et al., 2008). The feature model of the product line has been built by using the SPLOT tool and is stored and publicly available at its repository.2 From the large number of possible products that could be generated from this feature model, we selected three releases (product instantiations), detailed in Appendix C. We used an additive strategy to select the three releases, so that the first release contains a set of core features and the other ones just add features to the former. This strategy allows us to analyze the stability of the product line when accommodating changes.

4. Results and discussion

This section presents the process followed to test the main hypothesis established in Section 3. This process is driven by the evaluation of the metrics introduced in the previous section and the analysis of their correlations. In other words, modularity metrics are empirically compared with stability ones to test their correlations. The process comprises the following four steps (see Fig. 6): (1) the measurements for all the metrics in the different releases of each case study (described in Section 3.3) are calculated; (2) the averages of these measurements (over all the releases of each system) are calculated; (3) the measurements are pairwise correlated in order to calculate Pearson's correlation coefficient; (4) the results are analyzed in depth to check which software characteristics are empirically related (correlated), and the main conclusions of these correlations are extracted. For the sake of brevity, the results obtained for the metrics in the three case studies are shown in Appendices D to F.

2 Software Product Line Online Tool: http://www.splot-research.org/.
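Step (3) of the process relies on Pearson's correlation coefficient, which for two equally sized series of measurements can be computed as in the following sketch (the per-concern averages used here are hypothetical, not values from the case studies):

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson's correlation coefficient between two measurement series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-concern averages: a modularity metric vs. Instability.
scattering_avg = [1.0, 2.0, 4.0, 5.0]
instability_avg = [0.0, 1.0, 3.0, 5.0]
print(round(pearson(scattering_avg, instability_avg), 2))  # 0.99
```

A coefficient close to 1 indicates that concerns scoring high on the modularity metric also tend to score high on Instability.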
Fig. 6. Process followed for the first step of the study.

Table 5. Pearson's coefficients for the correlations between modularity metrics and Instability.

Modularity metric               MobileMedia   HealthWatcher   SmartHome
Nscattering                     0.71          0.78            0.81
Degree of Scattering            0.76          0.82            0.82
Crosscutpoints                  0.74          0.82            0.81
NCrosscut                       0.74          0.95            0.91
Degree of Crosscutting          0.86          0.92            0.87
Concern Degree of Tangling      0.77          0.90            0.83
CDUC                            0.71          0.78            0.81
Spread                          0.71          0.78            0.81
Degree of Scattering (Eaddy)    0.84          0.97            0.88

Appendix D shows the metrics averages for all the releases of the MobileMedia system, Appendix E shows the HealthWatcher results, and Appendix F does the same for the SmartHome system. Once the measurements were obtained, the correlation for each pair of measures was calculated through Pearson's correlation coefficient. However, since we are mainly interested in testing our main hypothesis, in this section we focus only on the correlations between modularity and Instability metrics (summarized in Table 5). The correlations with the Stability metric are not presented since, obviously, their coefficients are symmetric to the Instability ones. However, all the measurements used to obtain the coefficients, and the scatter plots that represent these correlations, are presented in Appendices D–F. Note also that, although not described in this section, the analysis of correlations among other metrics is also interesting. For instance, these correlations are used in Section 5 to empirically validate our modularity metrics and reduce their dimensionality by means of a Principal Component Analysis (PCA). To confirm our main hypothesis we observed that the scattering, tangling and crosscutting metrics present strong correlations (high coefficient values) with Instability in the three product lines.
In other words, features with the highest degrees of scattering, tangling and crosscutting are implemented by more unstable use cases (those frequently changing throughout the different releases). Firstly, focusing on the MobileMedia system, Table 5 shows how the correlations range from 0.71 to 0.78, with Degree of Crosscutting and Eaddy's Degree of Scattering being the metrics that present the highest correlations with Instability, with coefficient values of 0.86 and 0.84, respectively. Secondly, the results obtained for the HealthWatcher system are consistent with those obtained for MobileMedia. Indeed, we also observed that the correlations obtained for this system were, in general, stronger than those obtained for the previous system. In this case, the metrics with the highest correlations were Eaddy's Degree of Scattering and NCrosscut, with coefficient values of 0.97 and 0.95, respectively. Finally, the results obtained for the SmartHome system, the biggest one, confirm the results obtained for the previous systems. However, in this case, we observed a decrease in the coefficient values for the correlations of some of the metrics (e.g. NCrosscut, Degree of Crosscutting or Eaddy's Degree of Scattering) with respect to those obtained for the HealthWatcher system. The metrics with the highest values, in this case, were again NCrosscut and Eaddy's Degree of Scattering, with coefficient values of 0.91 and 0.88, respectively. Based on the results obtained, we evaluated our main hypothesis:

• Hypothesis: The higher the degree of scattering, tangling or crosscutting a system has at requirements level, the less stable the system is.

We concluded that all these data provide evidence of a relationship between these measures. In other words, we could say that the higher the Degree of Crosscutting a feature has, the more unstable the use cases implementing that feature are. This indicates that modularity anomalies due to crosscutting may be harmful to system stability.
Moreover, since the opposite (negative) values were obtained for the correlations with the Stability metric (shown in Appendix D), these data also confirm the hypothesis, showing that stable use cases in all the systems address well-encapsulated features rather than crosscutting ones. Therefore, the improvement of modularity, e.g. by means of aspect-oriented refactoring techniques, may provide important benefits in the future in terms of stability, thus easing system maintainability and reducing its future Technical Debt interest. Examples of these aspect-oriented refactoring approaches are Moreira et al. (2006) and Alférez et al. (2008), where the authors used Use Cases Pattern Specification and Activity Diagrams composition to improve modularity at requirements level. Based on this approach, a new relationship is added to the use case diagram notation that allows encapsulating the features with a higher degree of crosscutting. Then, the behaviour of these use cases is defined by means of activity diagrams that are later implemented in isolated entities by using aspect-oriented programming approaches (the application of this approach was also illustrated in Conejero, 2010).

5. Metrics evaluation

As mentioned in previous sections, the results obtained in the analysis presented may also be useful for further studies. Hence, with the aim of validating our metrics, in this section we pairwise compare the correlations obtained for all the modularity
Each cell shows Pearson’s coefficient for the correlation between the metric on that row and the one in the corresponding column Nscattering Nscattering Degree of Scattering Crosscutpoints NCrosscut Degree of Crosscutting Concern Degree of Tangling CDUC Spread Degree of Scattering (Eaddy) 1.00 0.99 0.99 0.56 0.94 0.91 1.00 1.00 0.72 Degree of Scattering 1.00 0.99 0.55 0.95 0.96 0.99 0.99 0.75 Crosscutpoints NCrosscut 1.00 0.62 0.96 0.91 1.00 1.00 0.77 1.00 0.77 0.45 0.56 0.56 0.94 Crosscutpoints NCrosscut 1.00 0.90 0.96 0.96 1.00 1.00 0.83 1.00 0.98 0.97 0.87 0.87 0.98 Crosscutpoints NCrosscut 1.00 0.94 0.99 0.94 0.98 0.98 0.98 1.00 0.98 0.92 0.93 0.93 0.98 Degree of Crosscutting Concern Degree of Tangling 1.00 0.91 0.94 0.94 0.91 1.00 0.91 0.91 0.70 Degree of Crosscutting Concern Degree of Tangling 1.00 0.99 0.95 0.95 0.94 1.00 0.95 0.95 0.93 Degree of Crosscutting Concern Degree of Tangling 1.00 0.95 0.98 0.98 1.00 1.00 0.95 0.95 0.95 CDUC Spread 1.00 1.00 0.72 1.00 0.72 CDUC Spread 1.00 1.00 0.80 1.00 0.80 CDUC Spread 1.00 1.00 0.98 1.00 0.98 Degree of scattering (Eaddy) 1.00 Table 7 Correlations matrix for modularity metrics in HealthWatcher. Nscattering Nscattering Degree of Scattering Crosscutpoints NCrosscut Degree of Crosscutting Concern Degree of Tangling CDUC Spread Degree of Scattering (Eaddy) 1.00 1.00 1.00 0.87 0.95 0.95 1.00 1.00 0.80 Degree of Scattering 1.00 1.00 0.90 0.96 0.97 1.00 1.00 0.83 Degree of Scattering (Eaddy) 1.00 Table 8 Correlations matrix for modularity metrics in SmartHome. Nscattering Nscattering Degree of Scattering Crosscutpoints NCrosscut Degree of Crosscutting Concern Degree of Tangling CDUC Spread Degree of Scattering (Eaddy) 1.00 1.00 0.98 0.93 0.98 0.95 1.00 1.00 0.98 Degree of Scattering 1.00 0.98 0.94 0.98 0.95 1.00 1.00 0.98 metrics to check whether the results obtained by our metrics (Table 3) are consistent with those obtained for the metrics previously introduced by other authors (Table 4). 
Moreover, based on the correlation coefficients obtained for the modularity metrics, a Principal Component Analysis (PCA) is performed to select a representative subset of the modularity metrics considered in this study. As a result, we may discard some of the modularity metrics in future studies and calculate just a subset of them.

5.1. Modularity metrics comparison

Tables 6–8 present the correlation coefficients among the modularity metrics for the three product lines used in our study. Based on the observation of the correlation matrices for the three systems, we noticed that the coefficients obtained were, in general, close to 1, indicating a high correlation among the metrics (the p-values (Sokal and Rohlf, 1994) for the correlations in each system are also shown in Appendices D to F). Furthermore, we also observed some interesting results. For instance, in MobileMedia (Table 6) we observed that the correlations among some metrics were not as high as in the other two systems. Concretely, the coefficients for the correlations between NCrosscut and the rest of the metrics are, in general, lower than the rest of the correlations. The only exception is the correlation with Eaddy's Degree of Scattering, where the value obtained is 0.95. Based on these data, we identified that the metrics were grouped according to their pairwise correlations into two main groups: on the one hand, a group composed of the NCrosscut and Eaddy's Degree of Scattering metrics and, on the other hand, the group composed of the remaining metrics (Nscattering, Degree of Scattering, Crosscutpoints, Degree of Crosscutting, Concern Degree of Tangling, CDUC and Spread). Based on the observation of the results obtained for the other two systems (HealthWatcher and SmartHome), both bigger than the MobileMedia product line, we identified that all the coefficients were higher than 0.8 in both systems.
We may then conclude that there exists a strong relationship among all the metrics, indicating that our metrics are consistent with those previously introduced in the literature.

5.2. Principal component analysis

The idea of principal component analysis is to find linear combinations of correlated variables that describe most of the variation in the dataset with a small number of new uncorrelated variables (Abdi and Williams, 2010). The PCA transforms the data to a new coordinate system, where the greatest variance by any projection of the data lies along the first coordinate (the first principal component), the second greatest variance along the second coordinate, and so on. There can be as many principal components as variables, but typically only the first two or three are needed to explain most of the total variation. Principal components PC_x (x ∈ N, x ≤ Nt) are a linear combination of the original variables:

PC_x = Σ_{i=1}^{Nt} a(i)_x · X_i

where −1 ≤ a(i)_x ≤ 1 are the coefficients of the linear transformation, X_i are the original variables and Nt is the number of original variables. In our study we have considered 9 different metrics (variables), according to the modularity metrics used in the study. Therefore, the PCA will result in 9 principal components. In the following, these variables are numbered to simplify variable identification within the figures:

X1. NScattering
X2. Degree of Scattering
X3. Crosscutpoints
X4. NCrosscut
X5. Degree of Crosscutting
X6. Concern Degree of Tangling
X7. CDUC
X8. Spread
X9. Degree of Scattering (Eaddy)

Fig. 7. Representation of the 9 principal components in the MobileMedia, HealthWatcher and SmartHome systems.

In order to apply the PCA process to our three systems, we used as input our correlation matrices (Tables 6–8), which meet the conditions to be used in the process (being symmetric correlation or covariance matrices).
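A PCA over a symmetric correlation matrix reduces to its eigen-decomposition; the sketch below (with a hypothetical 3×3 matrix standing in for Tables 6–8, not real study data) shows how the variance explained by each component is obtained:

```python
import numpy as np

# Hedged sketch of the PCA step: principal components come from the
# eigen-decomposition of the (symmetric) correlation matrix. The 3x3
# matrix R below is a hypothetical stand-in for Tables 6-8.
R = np.array([[1.00, 0.95, 0.60],
              [0.95, 1.00, 0.55],
              [0.60, 0.55, 1.00]])

eigvals, eigvecs = np.linalg.eigh(R)   # eigh returns ascending order
order = np.argsort(eigvals)[::-1]      # reorder to descending variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
explained = eigvals / eigvals.sum()    # fraction of variance per PC
print(explained)  # the first two PCs capture most of the variance
```

The eigenvector columns play the role of the a(i)_x coefficients, and the normalized eigenvalues give the "importance" of each component reported in Fig. 7.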
Based on the 9 principal components obtained by the PCA process, Fig. 7 shows the importance of the first two. Concretely, it shows that these two components explain around 98.13% of the variance in the data set for the MobileMedia system, around 99.71% for the HealthWatcher system and 98.72% for the SmartHome system. In other words, based on these results we may reduce the 9 variables to 2 principal components (PC1 and PC2) without substantially compromising the existing variance. Note that, as aforementioned, the first principal component (PC1) captures the maximum variance in the data set, whilst the second one (PC2) captures the remaining variance and is uncorrelated with PC1. In other words, PC1 and PC2 are orthogonal. Once we identified the principal components, we analyzed the degree of contribution of every variable (metric) to each principal component, so that we can select those with higher contributions as candidates to be used in further studies. These contributions are also obtained by applying the PCA process and they are presented in Figs. 8–13 for the three systems.

Fig. 8. Contributions of variables to PC1 in the MobileMedia system.

Table 9. Groups of metrics according to their contribution to each PC.

System          PC1                           PC2
MobileMedia     X5, X3, X2, X7, X8, X1        X4, X9
HealthWatcher   X5, X6, X2, X3, X7, X8, X1    X9, X4
SmartHome       X2, X9, X5, X1, X7, X8, X3    X4

The red dashed lines in the graphs of Figs. 8–13 indicate the expected average contribution of the variables to the principal components. If the contribution of the variables were uniform, the expected value would be 1/num_of_variables (in this case 11.11%). Taking this value into account, for a given component, a variable with a contribution larger than this cutoff (11.11%) may be considered as providing an important contribution to the component definition (Abdi and Williams, 2010; Kassambara, 2018).
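The contribution cutoff just described can be reproduced from the component loadings: since each eigenvector has unit norm, the percentage contribution of variable i to a component is its squared loading. The loading vector below is hypothetical (with 4 variables, so the cutoff is 25% rather than the 11.11% used in the study):

```python
from math import sqrt

# Hedged sketch of the contribution analysis (hypothetical loadings).
v = [0.70, 0.50, 0.30, 0.40]
norm = sqrt(sum(x * x for x in v))
v = [x / norm for x in v]              # eigenvectors have unit norm
contrib = [100 * x * x for x in v]     # % contribution of each variable
cutoff = 100 / len(v)                  # uniform-contribution cutoff (25%)
important = [c > cutoff for c in contrib]
print(important)  # [True, True, False, False]
```

Variables flagged as important for a component are the candidates listed per system in Table 9.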
In this case, based on the values obtained, the metrics that provide an important contribution to PC1 and PC2 in each system are those presented in Table 9.

Fig. 9. Contributions of variables to PC2 in the MobileMedia system.
Fig. 10. Contributions of variables to PC1 in the HealthWatcher system.
Fig. 11. Contributions of variables to PC2 in the HealthWatcher system.
Fig. 12. Contributions of variables to PC1 in the SmartHome system.
Fig. 13. Contributions of variables to PC2 in the SmartHome system.

We may consider the existence of two groups of well-correlated variables in every system (see Table 9). Based on these two groups, our next step is to decide which variables can be selected as representatives of these two groups and used in future analyses. To make this decision, we observed the contribution of every variable to each PC. Based on these contributions, X5 (Degree of Crosscutting) and X4 (NCrosscut) have been selected as the most representative metrics for PC1 and PC2, respectively. There are several reasons for this choice:

• X5 is the metric with the highest contribution to PC1 in two of the three systems. Moreover, although in the SmartHome system there are two metrics with a slightly higher contribution to PC1 than that provided by X5, X5 is a normalization of other modularity metrics (including one of those with a higher contribution), so that it is more representative of crosscutting properties. Finally, the correlation between X5 and the rest of the metrics of the same group is really high in the three systems. Therefore, we can assume that PC1 ≈ n·X5, where n is the number of metrics in that group.

• X4 has the highest contribution to PC2 in two of the three systems and it is the only metric that contributes to PC2 in all three systems. Moreover, the correlation between X4 and X9 in
MobileMedia and HealthWatcher systems is close to one, so we can assume that PC2 ≈ n·X4, where n is the number of metrics that contribute to PC2 in those systems.

So, based on these results, we can conclude that X5 (Degree of Crosscutting) and X4 (NCrosscut) may be used in further studies as a representation of the rest of the metrics, in order to reduce dimensionality and redundant data.

6. Threats to validity

In this section, we elaborate on several factors that may jeopardize the validity of our results. In order to present these threats, we follow the well-known categorization introduced by Wohlin et al. (2000), where threats are classified into four different validity categories: construct, internal, external and conclusion.

6.1. Construct validity

Construct validity focuses on the relationship between the theory behind an experiment and the observations. In that sense, the selection of concerns and use cases as the elements of the source and target domains, respectively, may be considered a potential threat to the construct validity of the study. There are usually alternative decompositions both in source and target, and alternative mappings between source and target. These alternative decompositions may impact quality attributes such as adaptability, reusability and maintainability in different ways. However, whatever the decomposition is, in order to detect the cases where modularity violations are present, we need to apply the Crosscutting Pattern. Obviously, the whole empirical process presented here could be applied to these alternative decompositions, and we could select the one with the best results for the desired quality attributes. In this work, we selected just one possible decomposition, considering that it was supervised by experienced developers.

6.2.
Internal validity

Internal validity refers to the relationship between the treatment of an experiment and the outcomes obtained; in other words, whether we are sure that the treatment we used in an experiment is really related to the actual outcome we observed. In this case, another potential threat to the validity of the study may be the creation of the dependency matrix of the Crosscutting Pattern, which provides the starting point for applying the theory behind the process and calculating the measurements of the study. This matrix is filled with the mapping dependencies existing between source and target. As explained in Section 2.3, the Crosscutting Pattern was extended with syntactical and dependency-based analyses to automatically obtain the mappings between source and target elements. This extension was based on the development of an Eclipse plugin that has been used in our study to collect all the mappings. Of course, the main goal of this tool is just to assist the developer in the selection of these mappings, and the results obtained may be corrected by the developer based on her own experience. In our study, the matrices were supervised by experienced developers. The reader may find a deeper description of this extension in Conejero (2010).

6.3. External validity

External validity refers to the possibility of generalizing the results outside the scope of the study. In that sense, as explained in Section 3, the case studies used for our experiment were all developed by external teams (professional programmers), so that the applicability of the study outside the academic context could be ensured. Nevertheless, we are aware that bigger and more complex systems, e.g. open source ones, would help to better demonstrate the applicability of the approach in different contexts. However, most of these systems usually lack comprehensive documentation, so that neither requirements documents nor use cases are frequently available. These projects usually rely on an agile software development methodology, where the requirements are defined by numbered or itemized textual lists. The main reason is that contributors are usually volunteers who basically spend their time developing short functionalities and bug fixes.
As an illustrative example, the requirements defined in three open source projects have been analyzed: the Linux Kernel,3 LibreOffice4 and GIMP.5 The development of the Linux Kernel project is mainly driven by a bug tracker (Bugzilla) used by the developers to decide the next functionalities to be incorporated into the system. This repository could be used to follow the system's evolution. However, the system lacks high-abstraction software artifacts (e.g. requirements or design artifacts), so that developers just rely on the list of bugs to be fixed and the source code. Nevertheless, there are some works that have used the Linux Kernel Configuration language (LKC)6 for managing and analysing the Linux kernel as a software product line (Lotufo et al., 2010; Sincero et al., 2007; Sincero and Schröder-Preikschat, 2008; Passos and Czarnecki, 2014). In other words, they use this language to build a feature model, so that kernel characteristics like the processor architecture are defined as features of the system. However, this model just gives a different representation of the list of functionalities to be built, and the authors did not provide artifacts at an abstraction level higher than code. The development of LibreOffice is chaired by an Engineering Steering Committee (ESC)7 composed of a set of individuals with skills in different areas of software development, such as coding, user experience, QA, release engineering, packaging and more. They decide on technical issues about LibreOffice. However, in order to plan the functionalities, they use a shared document where they include the next items to be developed. Similarly, GIMP maintains the project roadmap in a wiki.8 The wiki includes the main functional requirements to be developed, defined just as text, so that they have a large granularity. So, in both projects there is a lack of software artifacts other than source code, which hinders the applicability of our approach to them.

6.4.
Conclusion validity

This validity is concerned with the relationship between the treatment and the outcome. Obviously, from a statistical point of view, we may not assure that the results obtained in the experiment can be generalized to any kind of system. As aforementioned, three product lines belonging to different domains have been used in this study. Moreover, they range from 3 KLOC to 17 KLOC, so that different system sizes were considered. However, obviously, in order to generalize the results obtained in the study, other case studies should be considered where different characteristics could also be tested, e.g. different requirements notations and elicitation; systems where early aspect-oriented techniques have been used to modularize crosscutting concerns from the beginning of the development; or systems not implemented as product lines.

3 The Linux Kernel Archive, https://www.kernel.org/.
4 LibreOffice, https://es.libreoffice.org/.
5 GIMP, http://www.gimp.org.es/.
6 https://www.kernel.org/doc/Documentation/kbuild/kconfig-language.txt
7 https://wiki.documentfoundation.org/Development/ESC
8 http://wiki.gimp.org/index.php/Roadmap

7. Related work

This section has been organized according to the different kinds of works included. Firstly, we mention some works that focus on the identification of Technical Debt. Secondly, some works that empirically demonstrate the relationship between Technical Debt and software quality are presented. Finally, some works that deal with Technical Debt in early stages of development (requirements or architecture) are discussed.

7.1.
Identification of Technical Debt

Different approaches to identify Technical Debt may be found in the literature, whose characteristics have been collected and compared in mapping studies such as the one presented in Li et al. (2015). Examples of these approaches are the identification of code smells (Schumacher et al., 2010; Marinescu, 2004), the analysis of design pattern anomalies or grime buildup (Gueheneuc and Albin-Amiot, 2001; Izurieta and Bieman, 2007), the study of violations of good programming practices by using ASA (Vetro' et al., 2010) or the identification of modularity violations (Wong et al., 2011). Indeed, all these approaches have been summarized in the mapping study presented in Alves et al. (2016), where the authors analyzed, among others, Technical Debt types and their main indicators (as usually considered by the different approaches). The approach presented in this paper could be classified into the modularity violations category, which was identified as one of the most recurrent indicators in Alves et al. (2016). As an example of this category, in Wong et al. (2011) the authors presented a study where they analyzed modularity violations in 15 releases of the Hadoop Common system and 10 releases of Eclipse JDT. The authors stated that two supposedly independent modules should not change together because of modification requests. They called these situations modularity violations. They also categorized violations into four different types according to symptoms of design problems: cyclic dependencies, code clones, poor inheritance and unnamed coupling. As we state in this work, the authors also claimed that making developers aware of violations as soon as possible may help to avoid accumulating modularity decay. However, all these works are focused on source-code artifacts and, therefore, the identification of Technical Debt indicators is relegated to late development stages. Following a different approach, in Fontana et al.
(2016) the authors used a set of existing tools, which provide general quality indexes, to analyze whether these indexes could also be related to Technical Debt. In particular, those tools measure the following software attributes: structural flaws in production code (used by CAST9); design flaws, including code and architectural smells (inFusion,10 Sonargraph or Structure101 11); violations of programming best practices (Sonargraph); or coding constraints (SonarQube12). In this work, the authors emphasize the need for dealing with architectural issues related to Technical Debt (supporting our claim about anticipating identification to earlier development stages); however, they do not mention requirements artifacts.

7.2. Empirical studies about Technical Debt

In Zazworka et al. (2014) the authors presented an empirical analysis where they evaluated the aforementioned four different Technical Debt identification approaches with the aim of studying whether they could complement each other in terms of their relationships with several Technical Debt interests (quality characteristics). To this purpose, the authors defined a set of 25 Technical Debt indicators including modularity violations. They extracted interesting conclusions, such as the lack of relationship between some indicators and interests (meaning that some indicators may not be harmful to software interest) or the strong relationship identified between modularity violations and change-proneness (similar to the relationship identified in this work between modularity properties and stability). In Griffith et al. (2014) the authors also conducted a study where they analyzed the relation between three different Technical Debt estimation approaches and an external quality model.

9 http://www.castsoftware.com/
10 https://www.intooitus.com/, its evolution at http://www.aireviewer.com.
11 http://structure101.com/products/
12 http://docs.sonarqube.org/display/SONARQUBE52/Technical+Debt
The study was driven by applying the three estimation approaches to ten open-source Java projects, and the quality model included the following characteristics: reusability, flexibility, understandability, functionality, extendibility and effectiveness. The results obtained were compared with the quality model by calculating the correlations and linear regressions among the measures. In Curtis et al. (2012b) the authors performed a study where they estimated the cost of Technical Debt in software systems by automatically studying the source code of 745 applications from 160 different companies. Based on the results obtained, the authors concluded that 30% of the Technical Debt interest measured was related to the cost of changeability. A similar study was presented in Curtis et al. (2012a), where the authors empirically evaluated the relation between different Technical Debt indicators and software quality characteristics. However, again, all these studies are based on the utilization of Technical Debt estimation approaches and indicators focused on code artifacts at the programming level. In this work we performed a similar study but moved the analysis to earlier stages of development so that Technical Debt estimation may be anticipated in the development life cycle.

7.3. Dealing with Technical Debt at early stages

Although it has been clearly identified that Technical Debt is also related to non-code software artifacts (Brown et al., 2010), there are just a few works dealing with the management of Technical Debt at early stages of development. In Ernst (2012) Ernst introduced a definition of Technical Debt at requirements level as the distance between the optimal solution to a requirements problem and the actual solution. He also introduced a tool to decide what the optimal solution to a requirements problem is. Then, when the requirements problem changes, a new optimal solution may be selected by means of the tool in order to minimize that distance.
Unlike that work, ours focuses on applying at requirements level some of the techniques used at programming level to identify Technical Debt. Our main goal is to provide developers with information that they may use to anticipate refactoring decisions and thus reduce Technical Debt at later stages. Moreover, the work in Ernst (2012) did not show the relation between Technical Debt and system quality. In Li et al. (2014) the authors focus on the study of Technical Debt at the architectural level. They evaluate the relationship between modularity metrics and a Technical Debt indicator called ANMCC (Average Number of Modified Components per Commit) and correlate these measurements, concluding that modularity metrics may substitute ANMCC in order to measure Technical Debt. Similarly, in Mo et al. (2015) and Fontana et al. (2017) the authors also presented tools to identify architectural problems that usually incur high maintenance costs or quality problems. Furthermore, they also concluded that architectural problems are an early source of quality problems that could be avoided by using refactoring techniques. However, unlike the work presented here, in all these works the authors apply their approach to systems' source code, so that decisions may not be taken in early stages of development. In Brondum and Zhu (2012) the authors presented a modelling approach to visualize complex dependencies at the architectural level in order to extend current approaches. They argue that their dependency model supports the strategic use of Technical Debt, the use of more accurate estimation models and the identification of different sources of debt. However, again, the authors neither deal with the relationship between Technical Debt and quality attributes nor provide a way to identify Technical Debt indicators at early stages of development.

8. Conclusions

This paper has presented an empirical study where we analyzed the relation between Technical Debt caused by modularity anomalies and other software quality properties, namely maintainability attributes. The study has provided evidence of correlations between concern properties, such as scattering, tangling and crosscutting, and stability, one of the most important maintainability characteristics. Based on these correlations we extracted important conclusions that allow us to empirically show that modularity anomalies are harmful to system stability. Concretely, the higher the degree of scattering, tangling and crosscutting a system presents, the less stable it is. In other words, we observed that a particular type of Technical Debt, caused by modularity decisions taken at requirements level, is highly related to maintainability problems at this level. The empirical study has been supported by a conceptual framework that allows the identification of modularity anomalies at any abstraction level or development stage. In this study, it has been instantiated to focus on early development stages (requirements level) so that the identification of Technical Debt may be conducted from the very beginning of the software development life cycle. The modularity metrics defined by this framework have been used during the empirical study. These metrics were also empirically validated by comparing them with similar metrics introduced by other authors. This comparison allowed us not only to validate our metrics but also to identify their dependencies and determine which metrics are equivalent, i.e. measure similar properties. Finally, the identification of Technical Debt at requirements level provides important information that may be used to avoid this problem at later development stages.
In this sense, aspect-oriented refactoring solutions that apply advanced separation of concerns techniques at the requirements stage may reduce modularity problems at the programming level, with the consequent savings in both time and money. Fewer modularity problems imply less Technical Debt for systems in development, in terms of a significant reduction of their future interest. As further work, we plan to extend our study by following two different lines: (i) conducting the study with other requirements notations (e.g. goal-oriented ones), which could even allow us to gain insight into how the selected requirements notation affects Technical Debt identification at early stages of development; and (ii) considering other quality attributes, such as software understandability or reusability, to check whether modularity anomalies may also influence them and thus derive, perhaps, new conclusions.

Acknowledgments

The authors gratefully acknowledge the support of the TIN2015-69957-R (MINECO/FEDER, UE) project and of the Consejería de Economía e Infraestructuras/Junta de Extremadura (Spain) and European Regional Development Fund (ERDF) GR15098 and IB16055 projects. This work was also partially supported by the 4IE project (0045-4IE-4-P) funded by the Interreg V-A España-Portugal (POCTEP) 2014-2020 program. We would like to thank A. Garcia and E. Figueiredo for allowing us to use the MobileMedia case study and for their comments and support on this work.

Appendix A. MobileMedia releases

Table 10 summarizes the changes made in each MobileMedia release. The scenarios cover heterogeneous concerns ranging from mandatory to optional and alternative features, as well as non-functional concerns. Table 10 also presents which types of change each release encompassed. The purpose of these changes is to exercise the implementation of the feature boundaries and, thus, assess the stability of the product line requirements.
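The stability assessment described above can be sketched in a few lines. The snippet below is an illustrative sketch, not the authors' tooling: the function name and the sample change counts are hypothetical, and the threshold of two changes is the one used for MobileMedia in this appendix.

```python
# Illustrative sketch: flag use cases as unstable when their accumulated
# number of changes across releases reaches the instability threshold.
# Function name and sample data are hypothetical.

def unstable_use_cases(changes_per_use_case, threshold=2):
    """Return, sorted, the use cases whose total number of changes
    across releases is equal to or higher than the threshold."""
    return sorted(
        uc for uc, n_changes in changes_per_use_case.items()
        if n_changes >= threshold
    )

# Hypothetical change counts accumulated over releases r0..r7:
counts = {"ViewPhoto": 3, "AddAlbum": 1, "SortPhotos": 2, "SendSMS": 0}
print(unstable_use_cases(counts))  # → ['SortPhotos', 'ViewPhoto']
```

With more releases a higher threshold is used (2 for MobileMedia and HealthWatcher, 1 for SmartHome), since a longer history naturally accumulates more changes per use case.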
Note that some non-functional concerns (NFC) are also explicitly considered as concerns of the system (e.g., Persistence and Error Handling). Table 11 shows the concerns used in the analysis and the releases that include them. The threshold value for considering use cases unstable in this case study was set to 2, so that any use case with two or more changes was considered unstable. This number was selected based on the number of releases (a high number of releases implies, in general, more changes).

Table 10. Different releases of MobileMedia.

Release | Description | Type of changes
r0 | MobileMedia basic functionality | None
r1 | Exception handling included | Inclusion of non-functional requirement
r2 | New features added to count the number of times a photo has been viewed, to sort photos by highest viewing frequency, and to edit the photo's label | Inclusion of optional and mandatory features
r3 | New feature added to allow users to specify and view their favorite photos | Inclusion of an optional feature
r4 | New feature added to allow users to keep multiple copies of photos in different albums | Inclusion of an optional feature
r5 | New feature added to send and receive photos via SMS | Inclusion of an optional feature
r6 | New feature added to store, play, and organize music. The management of photos (e.g. create, delete and labeling) was turned into an alternative feature. All alternative features (e.g. sorting, favorites, and copy) were also provided for music | Changing one mandatory feature into two alternatives
r7 | New feature added to manage videos | Inclusion of an alternative feature

Table 11. MobileMedia concerns and the releases where they are included.

Feature | Releases
Album | r0–r7
Photo | r0–r7
Label | r0–r7
Persistence | r0–r7
Error handling | r1–r7
Sorting | r2–r7
Favourites | r3–r7
Copy | r4–r7
SMS | r5–r7
Music | r6, r7
Media | r6, r7
Video | r7
Capture | r7

Appendix B. HealthWatcher releases

For the purpose of our analysis we have considered the requirements of five different releases of the product line, summarized in Table 12. As an example, release 0 contains the core system, whilst release 1 represents the core system plus the functionality of sorting complaints by most popular or most frequent. The different features of the system and the releases where they were included are described in Table 13. As in the previous case study, the threshold value for considering use cases unstable was set to 2, based on the number of releases.

Table 12. Different releases of HealthWatcher.

Release | Description | Type of changes
r0 | HealthWatcher core | None
r1 | Feature added to count the number of times a complaint has been viewed and to sort complaints by frequency | Inclusion of optional feature
r2 | Allow citizens to geolocalize the origin of complaints when they create them | Inclusion of optional feature
r3 | Allow citizens to log in by using a digital signature | Inclusion of optional feature
r4 | Allow citizens to store and manage their complaints | Inclusion of mandatory feature

Table 13. HealthWatcher features and the releases where they are included.

Feature | Releases
QueryInformation | r0–r4
RegisterComplaint | r0–r4
RegisterTables | r0–r4
UpdateComplaint | r0–r4
RegisterNewEmployee | r0–r4
UpdateEmployee | r0–r4
UpdateHealthUnit | r0–r4
ChangeLoggedEmployee | r0–r4
ResponseTime | r0–r4
Encryption | r0–r4
Compatibility | r0–r4
Access-Control | r0–r4
Usability | r0–r4
Availability | r0–r4
Standards | r0–r4
Hardware and Software | r0–r4
Distribution | r0–r4
UserInterface | r0–r4
OperationalEnvironments | r0–r4
Persistence | r0–r4
Concurrency | r0–r4
Performance | r0–r4
ErrorHandling | r0–r4
ViewComplaints | r0–r4
PopularComplaints | r1–r4
Geolocalization | r2–r4
DigitalSignature | r3, r4
ClientComplaints | r4

Appendix C. SmartHome releases

The feature model of the product line has been built by using the SPLOT tool (Software Product Line Online Tools: http://www.splot-research.org/) and is stored and publicly available at its repository. This feature model would allow the generation of around 382.205 K different products. From this huge number of possible products we selected three releases (product instantiations), detailed in Table 14. We have used an additive strategy to select the three releases, so that the first release contains a set of core features and the other ones just add features to the former. This strategy allows us to analyze the stability of the product line in accomplishing changes. Table 15 details the concerns (features) and the releases in which they are involved.

Table 14. Different releases of SmartHome.

Release | Description | Type of changes
r0 | SmartHome core (Heating Management, Windows Management, Lights Management, Presence Simulator, Fire Control, Authentication, User Notifications) + Door Lock + Security | None
r1 | r0 + Blinds Management + Gas Detection + Water Detection + Air Conditioning Control | Inclusion of optional features
r2 | r1 + Audio Management + Dimming Lights + Phone Call Notifications + Intrusion Detection + CardReader as Authentication method | Inclusion of optional features

Table 15. SmartHome concerns and the releases where they are included.

Concern | Releases
Temperature Control | r0–r2
Windows Management | r0–r2
Lights Management | r0–r2
Presence Simulation | r0–r2
Fire Control | r0–r2
Door Lock | r0–r2
Authentication | r0–r2
Security | r0–r2
User Notifications | r0–r2
Access to Physical KNX Devices | r0–r2
Blinds Management | r1, r2
Floods Detection | r1, r2
Gas Detection | r1, r2
Air Conditioning | r1, r2
Audio Management | r2
Intrusion Detection | r2
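The combinatorial number of products quoted for the feature model can be sketched with a simplified count: each optional feature doubles the number of configurations and each alternative (xor) group multiplies it by the number of choices. This is an illustrative upper bound under hypothetical inputs; cross-tree constraints, which the actual SPLOT model contains, prune this bound, so the example below does not reproduce the ~382 K figure.

```python
# Simplified sketch of product counting in a feature model: optional
# features contribute a factor of 2 each, alternative (xor) groups
# contribute their group size, mandatory features contribute 1.
# Cross-tree constraints are deliberately ignored (upper bound only).
from math import prod

def product_count(n_optional, alternative_group_sizes):
    """Upper bound on the number of distinct products."""
    return (2 ** n_optional) * prod(alternative_group_sizes)

# Hypothetical model: 3 optional features and two xor groups of 2 and 3.
print(product_count(3, [2, 3]))  # 2^3 * 2 * 3 = 48 possible products
```

Counting the valid products of a real model with constraints requires a SAT- or BDD-based analysis, which is what tools such as SPLOT perform.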
The threshold value for considering use cases unstable in this case study was set to 1, so that any use case with one change or more was considered unstable. In this case, the value was lower than in the previous case studies since the number of releases was smaller (just three).

Appendix D. MobileMedia measurements and correlations

This appendix shows the measurements for the MobileMedia system (see Table 16).

D.1. Correlations with stability measures

Once the selected modularity metrics and the maintainability ones are pairwise compared for the MobileMedia system, the correlations among all these metrics, together with the scatter plots that represent them, may be observed in Fig. 14. The scatter plots are shown using the same order for the measures as in the previous table. Table 17 shows the p-values for the correlation coefficients shown in Fig. 14. As may be observed, the correlations are statistically significant for the groups of metrics that were described in Section 5.1.

Table 16. Average of metrics for all the releases (NScat = NScattering, DoS = Degree of Scattering, CP = Crosscutpoints, NC = NCrosscut, DoC = Degree of Crosscutting, CDT = Concern Degree of Tangling, CDUC = Concern Diffusion over Use Cases, DoS-E = Degree of Scattering (Eaddy), Inst = Instability).

Feature | NScat | DoS | CP | NC | DoC | CDT | CDUC | Spread | DoS-E | Inst
Album | 3.63 | 0.26 | 3.63 | 5.25 | 0.39 | 2.08 | 3.63 | 3.63 | 0.77 | 1
Photo | 4.13 | 0.3 | 3.88 | 4.13 | 0.38 | 2.29 | 4.13 | 4.13 | 0.62 | 2
Label | 5.38 | 0.34 | 5.38 | 6 | 0.46 | 2.41 | 5.38 | 5.38 | 0.82 | 1
Persistence | 12.8 | 0.85 | 12.4 | 6.38 | 0.77 | 4.93 | 12.8 | 12.8 | 0.98 | 2
Error Handling | 15.9 | 0.98 | 15.9 | 7 | 0.89 | 4.97 | 15.9 | 15.9 | 0.99 | 2
Sorting | 4.33 | 0.25 | 4.33 | 7.33 | 0.43 | 1.44 | 4.33 | 4.33 | 0.78 | 2
Favourites | 3 | 0.17 | 3 | 6 | 0.32 | 0.73 | 3 | 3 | 0.66 | 1
Copy | 2.5 | 0.13 | 2.5 | 6.25 | 0.29 | 0.55 | 2.5 | 2.5 | 0.61 | 1
SMS | 3.67 | 0.18 | 3.67 | 6.67 | 0.32 | 0.54 | 3.67 | 3.67 | 0.76 | 1
Music | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0
Media | 6.5 | 0.31 | 6.5 | 8.5 | 0.44 | 0.58 | 6.5 | 6.5 | 0.85 | 1
Video | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0
Capture | 1 | 0 | 0 | 0 | 0 | 0.02 | 1 | 1 | 0 | 0

Fig. 14.
Scatter plots for the correlations in MobileMedia.

Table 17. p-values for the correlations in the MobileMedia system. Each cell shows the p-value for the correlation between the metric on its row and the metric in the corresponding column (abbreviations as in Table 16).

 | NScat | DoS | CP | NC | DoC | CDT | CDUC | Spread | DoS-E
DoS | <0.00001 | | | | | | | |
CP | <0.00001 | <0.00001 | | | | | | |
NC | <0.00001 | <0.00001 | <0.00001 | | | | | |
DoC | 0.046 | 0.051 | 0.023 | <0.00001 | | | | |
CDT | <0.00001 | <0.00001 | <0.00001 | 0.002 | <0.00001 | | | |
CDUC | 0.00001 | <0.00001 | 0.00001 | 0.12 | 0.00001 | <0.00001 | | |
Spread | <0.00001 | <0.00001 | <0.00001 | 0.046 | <0.00001 | 0.00001 | <0.00001 | |
DoS-E | <0.00001 | <0.00001 | <0.00001 | 0.046 | <0.00001 | 0.00001 | <0.00001 | <0.00001 |
Instability | 0.0055 | 0.0031 | 0.002 | <0.00001 | 0.00001 | 0.0077 | 0.0055 | 0.0055 | <0.00001

Appendix E. HealthWatcher measurements and correlations

Table 18 shows the results obtained for the measurements in the HealthWatcher system.

E.1. Correlations with stability measures

The correlations among all these metrics, together with the scatter plots that represent them, may be observed in Fig. 15. The scatter plots are shown using the same order for the measures as in the previous table. Table 19 shows the p-values for the correlation coefficients shown in Fig. 15.

Table 18. Modularity and maintainability measurements for the HealthWatcher system.
Metric abbreviations as in Table 16.

Feature | NScat | DoS | CP | NC | DoC | CDT | CDUC | Spread | DoS-E | Inst
QueryInformation | 1 | 0 | 0 | 0 | 0 | 0.52 | 1 | 1 | 0 | 0
RegisterComplaint | 3 | 0.16 | 3 | 12.40 | 0.37 | 1.75 | 3 | 3 | 0.71 | 3
RegisterTables | 1 | 0 | 0 | 0 | 0 | 0.22 | 1 | 1 | 0 | 0
UpdateComplaint | 1 | 0 | 0 | 0 | 0 | 0.22 | 1 | 1 | 0 | 0
RegisterNewEmployee | 1 | 0 | 0 | 0 | 0 | 0.22 | 1 | 1 | 0 | 0
UpdateEmployee | 1 | 0 | 0 | 0 | 0 | 0.22 | 1 | 1 | 0 | 0
UpdateHealthUnit | 1 | 0 | 0 | 0 | 0 | 0.22 | 1 | 1 | 0 | 0
ChangeLoggedEmployee | 1 | 0 | 0 | 0 | 0 | 0.17 | 1 | 1 | 0 | 0
ResponseTime | 5 | 0.27 | 5 | 14 | 0.46 | 2.65 | 5 | 5 | 0.85 | 4
Encryption | 5 | 0.27 | 5 | 14 | 0.46 | 2.65 | 5 | 5 | 0.85 | 4
Compatibility | 4 | 0.22 | 4 | 13.40 | 0.42 | 2.27 | 4 | 4 | 0.80 | 3
Access-Control | 12.80 | 0.70 | 12.80 | 21 | 0.81 | 4.25 | 12.80 | 12.80 | 0.98 | 5
Usability | 5 | 0.27 | 5 | 14 | 0.46 | 2.65 | 5 | 5 | 0.85 | 4
Availability | 8 | 0.44 | 8 | 14 | 0.53 | 3.19 | 8 | 8 | 0.93 | 4
UserInterface | 11.80 | 0.64 | 11.80 | 20.40 | 0.78 | 3.91 | 11.80 | 11.80 | 0.97 | 4
OperationalEnvironments | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
Persistence | 16.20 | 0.87 | 16.20 | 19 | 0.85 | 4.72 | 16.20 | 16.20 | 0.99 | 4
Concurrency | 8 | 0.44 | 8 | 14 | 0.53 | 3.19 | 8 | 8 | 0.93 | 4
Performance | 4 | 0.22 | 4 | 13.40 | 0.42 | 2.27 | 4 | 4 | 0.80 | 3
ErrorHandling | 17.80 | 0.96 | 17.80 | 21 | 0.93 | 5.10 | 17.80 | 17.80 | 1 | 5
ViewComplaints | 1 | 0 | 0 | 0 | 0 | 0.23 | 1 | 1 | 0 | 1
PopularComplaints | 4.80 | 0.25 | 4.80 | 11 | 0.37 | 1.84 | 4.80 | 4.80 | 0.71 | 4
Geolocalization | 3.60 | 0.18 | 3.60 | 8.40 | 0.27 | 1.38 | 3.60 | 3.60 | 0.53 | 4
DigitalSignature | 0.80 | 0.04 | 0.80 | 3.40 | 0.09 | 0.19 | 0.80 | 0.80 | 0.21 | 1
ClientComplaints | 1 | 0.05 | 0.80 | 1.80 | 0.06 | 0.16 | 1 | 1 | 0.18 | 1

Table 19. p-values for the correlations in the HealthWatcher system.
Each cell shows the p-value for the correlation between the metric on its row and the metric in the corresponding column. For HealthWatcher, every pairwise correlation among NScattering, Degree of Scattering, Crosscutpoints, NCrosscut, Degree of Crosscutting, Concern Degree of Tangling, CDUC, Spread, Degree of Scattering (Eaddy) and Instability has a p-value below 0.00001.

Fig. 15. Scatter plots for the correlations in HealthWatcher.

Appendix F. SmartHome measurements and correlations

The results of the measurements obtained for the SmartHome system may be observed in Table 20.

F.1. Correlations with stability measures

The correlations among all these metrics, together with the scatter plots that represent them, may be observed in Fig. 16. The scatter plots are shown using the same order for the measures as in the previous table. Table 21 shows the p-values for the correlation coefficients shown in Fig. 16.

Table 20. Modularity and maintainability measurements for the SmartHome system.
Metric abbreviations as in Table 16.

Feature | NScat | DoS | CP | NC | DoC | CDT | CDUC | Spread | DoS-E | Inst
Temperature Control | 2 | 0.08 | 2 | 1 | 0.09 | 0.13 | 2 | 2 | 0.08 | 0
Windows Management | 1 | 0 | 0 | 0 | 0 | 0.07 | 1 | 1 | 0 | 0
Lights Management | 3 | 0.12 | 3 | 3.67 | 0.20 | 0.27 | 3 | 3 | 0.18 | 3
Presence Simulation | 2 | 0.08 | 2 | 3.67 | 0.17 | 0.20 | 2 | 2 | 0.15 | 1
Fire Control | 2 | 0.08 | 2 | 2 | 0.12 | 0.20 | 2 | 2 | 0.11 | 0
Door Lock | 2 | 0.08 | 1 | 1 | 0.06 | 0.07 | 2 | 2 | 0.05 | 0
Authentication | 6 | 0.25 | 4 | 5.67 | 0.28 | 0.33 | 6 | 6 | 0.26 | 3
Security | 2 | 0.08 | 1 | 1 | 0.06 | 0.07 | 2 | 2 | 0.05 | 0
User Notifications | 5.67 | 0.22 | 3.67 | 3.67 | 0.21 | 0.33 | 5.67 | 5.67 | 0.19 | 1
Access to Physical KNX Devices | 16 | 0.63 | 14 | 10.33 | 0.69 | 0.80 | 16 | 16 | 0.62 | 4
Blinds Management | 2 | 0.07 | 2 | 3.33 | 0.14 | 0 | 2 | 2 | 0.12 | 1
Flood Alarm | 1.33 | 0.05 | 1.33 | 1.33 | 0.07 | 0 | 1.33 | 1.33 | 0.06 | 0
Gas Alarm | 1.33 | 0.05 | 1.33 | 1.33 | 0.07 | 0.13 | 1.33 | 1.33 | 0.06 | 0
Air Conditioning Management | 1.33 | 0.05 | 1.33 | 1.33 | 0.07 | 0 | 1.33 | 1.33 | 0.06 | 0
Intrusion Detection | 0.67 | 0.02 | 0.67 | 0.67 | 0.04 | 0 | 0.67 | 0.67 | 0.03 | 0
Audio Control | 0.67 | 0.02 | 0.67 | 0.33 | 0.03 | 0 | 0.67 | 0.67 | 0.02 | 0

Fig. 16. Scatter plots for the correlations in SmartHome.

Table 21. p-values for the correlations in the SmartHome system.
Each cell shows the p-value for the correlation between the metric on its row and the metric in the corresponding column. As in HealthWatcher, every pairwise correlation among the modularity metrics and Instability in SmartHome has a p-value below 0.00001.

References

Abad, Z.S.H., Ruhe, G., 2015. Using real options to manage technical debt in requirements engineering. In: Proceedings of the Twenty-third IEEE International Requirements Engineering Conference. Ottawa, Canada, pp. 230–235.
Abdi, H., Williams, L.J., 2010. Principal component analysis. Wiley Interdiscip. Rev. Comput. Stat. 2 (4), 433–459.
Alférez, M., et al., 2008. A model-driven approach for software product lines requirements engineering. In: Proceedings of the SEKE. Knowledge Systems Institute Graduate School, pp. 779–784.
Allman, E., 2012. Managing technical debt: shortcuts that save money and time today can cost you down the road. ACM Queue 10 (3).
Alves, N.S.R., Mendes, T.S., De Mendonça, M.G., Spinola, R.O., Shull, F., Seaman, C., 2016. Identification and management of technical debt: a systematic mapping study. Inf. Softw. Technol. 70, 100–121.
Baniassad, E., Clements, P.C., Araujo, J., Moreira, A., Rashid, A., Tekinerdogan, B., 2006. Discovering early aspects. IEEE Softw. 23 (1), 61–70.
Briand, L., Morasca, S., Basili, V.R. Defining and Validating High-Level Design Metrics. University of Maryland at College Park.
Brondum, J., Zhu, L., 2012. Visualising architectural dependencies. In: Proceedings of the Third International Workshop on Managing Technical Debt.
MTD, Piscataway, USA, pp. 7–14. Brown, N., et al., 2010. Managing technical debt in software-reliant systems. In: Proceedings of the FSE/SDP Workshop on Future of Software Engineering Research – FoSER’10. Santa Fe, USA, pp. 47–52. Chen, J.-C., Huang, S.-J., 2009. An empirical analysis of the impact of software development problem factors on software maintainability. J. Syst. Softw. 82 (6), 981–992. Chin, S., Huddleston, E., Bodwell, W., Gat, I., 2010. The economics of technical debt. Cut. IT J. 82 (10). Conejero, J.M., 2010. The Crosscutting Pattern: A Conceptual Framework for the Analysis of Modularity Across Software Development Phases. Universidad de Extremadura. Conejero, J.M., Hernández, J., Jurado, E., Clemente, P.J., Rodríguez, R., 2009. Early analysis of modularity in software product lines. In: Proceedings of the Twenty-first Inernational Conference on Software Engineering and Knowledge Engineering (SEKE). Boston, USA, pp. 721–736. Cunningham, W., 1992. The WyCash portfolio management system. In: Proceedings of the Object-Oriented Programming Systems, Languages and Applications Conference (OOPSLA), vol. 4. Vancouver, Canada, pp. 29–30. Curtis, B., Sappidi, J., Szynkarski, A., 2012a. Estimating the principal of an application’s technical debt. IEEE Softw. 29 (6), 34–42. Curtis, B., Sappidi, J., Szynkarski, A., 2012b. Estimating the size, cost, and types of technical debt. In: Proceedings of the Third International Workshop on Managing Technical Debt. Piscataway, NJ, USA, pp. 49–53. Dijkstra, E.W., 1976. A Discipline of Programming. Prentice Hall. Ducasse, S., Girba, T., Kuhn, A., 2006. Distribution map. In: Proceedings of the Twenty-second IEEE International Conference on Software Maintenance. Philadelphia, USA, pp. 203–212. Eaddy, M., et al., 2008. Do crosscutting concerns cause defects? IEEE Trans. Softw. Eng. 34 (4), 497–515. Elsner, C., Fiege, L., Groher, I., Jäger, M., Schwanninger, C., Völter, M., 2008. Ample project. 
Deliverable D5.3 - implementation of first case study: smart home.
Eriksson, M., Börstler, J., Borg, K., 2005. The PLUSS approach - domain modeling with features, use cases and use case realizations. In: Proceedings of the Ninth International Conference on Software Product Lines, pp. 33–44.
Erlikh, L., 2000. Leveraging legacy system dollars for e-business. IEEE IT Prof. 2 (3), 17–23.
Ernst, N.A., 2012. On the role of requirements in understanding and managing technical debt. In: Proceedings of the Third International Workshop on Managing Technical Debt (MTD). Piscataway, USA, pp. 61–64.
Figueiredo, E., et al., 2008. Evolving software product lines with aspects. In: Proceedings of the Thirtieth International Conference on Software Engineering (ICSE). Leipzig, Germany, pp. 261–270.
Fontana, F.A., Roveda, R., Zanoni, M., 2016. Technical debt indexes provided by tools: a preliminary discussion. In: Proceedings of the 2016 IEEE Eighth International Workshop on Managing Technical Debt (MTD), pp. 28–31.
Fontana, F.A., Pigazzini, I., Roveda, R., Tamburri, D., Zanoni, M., Nitto, E.D., 2017. Arcan: a tool for architectural smells detection. In: Proceedings of the 2017 IEEE International Conference on Software Architecture Workshops (ICSAW), pp. 282–285.
Greenwood, P., et al., 2007. On the impact of aspectual decompositions on design stability: an empirical study. In: Proceedings of the Twenty-first European Conference on Object-Oriented Programming. Berlin, Germany, pp. 176–200.
Griffith, I., Reimanis, D., Izurieta, C., Codabux, Z., Deo, A., Williams, B., 2014. The correspondence between software quality models and technical debt estimation approaches.
In: Proceedings of the Sixth International Workshop on Managing Technical Debt. Victoria, Canada, pp. 19–26.
Griss, M.L., Favaro, J., D'Alessandro, M., 1998. Integrating feature modeling with the RSEB. In: Proceedings of the Fifth International Conference on Software Reuse (Cat. No.98TB100203), pp. 76–85.
Gueheneuc, Y.-G., Albin-Amiot, H., 2001. Using design patterns and constraints to automate the detection and correction of inter-class design defects. In: Proceedings of the Thirty-ninth International Conference and Exhibition on Technology of Object-Oriented Languages and Systems (TOOLS). Washington, USA, pp. 296–305.
Hung, V. Software maintenance [Online]. Available: [Accessed: 26-Feb-2016].
International Organization of Standardization, 2014. Systems and software engineering - Systems and software Quality Requirements and Evaluation (SQuaRE) - Guide to SQuaRE [Online]. Available: https://www.iso.org/standard/64764.html. [Accessed: 31-Jan-2018].
ISO/IEC, 2001. Software engineering - product quality - Part 1: quality model, ISO/IEC 9126-1.
Izurieta, C., Bieman, J.M., 2007. How software designs decay: a pilot study of pattern evolution. In: Proceedings of the First International Symposium on Empirical Software Engineering and Measurement (ESEM), pp. 449–451.
Jacobson, I., 2003. Use cases and aspects - working seamlessly together. J. Object Technol. 2 (4), 7–28.
Jacobson, I., Ng, P.-W., 2004. Aspect-Oriented Software Development with Use Cases. Addison-Wesley Professional.
Kang, K., Cohen, S., Hess, J., Novak, W., Spencer, A., 1990. Feature Oriented Domain Analysis (FODA) Feasibility Study. Carnegie Mellon University Technical Report, CMU/SEI-90-TR-21.
Kassambara, A. Principal Component Methods in R: Practical Guide, first ed. STHDA.
van den Berg, K., 2006. Change impact analysis of crosscutting in software architectural design. In: Proceedings of the Workshop on Architecture-Centric Evolution at Twentieth ECOOP. Nantes, France.
Kruchten, P., Nord, R.L., Ozkaya, I., 2012. Technical debt: from metaphor to theory and practice. IEEE Softw. 29 (6), 18–21.
Letouzey, J.-L., Ilkiewicz, M., 2012. Managing technical debt with the SQALE method. IEEE Softw. 29 (6), 44–51.
Li, Z., Avgeriou, P., Liang, P., 2015. A systematic mapping study on technical debt and its management. J. Syst. Softw. 101, 193–220.
Li, Z., Liang, P., Avgeriou, P., Guelfi, N., Ampatzoglou, A., 2014. An empirical investigation of modularity metrics for indicating architectural technical debt. In: Proceedings of the Tenth International ACM Sigsoft Conference on Quality of Software Architectures. New York, NY, USA, pp. 119–128.
Lotufo, R., She, S., Berger, T., Czarnecki, K., Wasowski, A., 2010. Evolution of the Linux Kernel Variability Model. Springer-Verlag.
Marinescu, R., 2004. Detection strategies: metrics-based rules for detecting design flaws. In: Proceedings of the Twentieth IEEE International Conference on Software Maintenance. Chicago, USA, pp. 350–359.
Marinescu, R., 2012. Assessing technical debt by identifying design flaws in software systems. IBM J. Res. Dev. 56 (5), 9:1–9:13.
Mo, R., Cai, Y., Kazman, R., Xiao, L., 2015. Hotspot patterns: the formal definition and automatic detection of architecture smells. In: Proceedings of the Twelfth Working IEEE/IFIP Conference on Software Architecture, pp. 51–60.
Moreira, A., Araújo, J., Whittle, J., 2006. Modeling Volatile Concerns as Aspects. Springer, Berlin, Heidelberg, pp. 544–558.
Moreira, A., Chitchyan, R., Araújo, J., Rashid, A., 2013. Aspect-Oriented Requirements Engineering. Springer.
Passos, L., Czarnecki, K., 2014. A dataset of feature additions and feature removals from the Linux kernel. In: Proceedings of the Eleventh Working Conference on Mining Software Repositories - MSR 2014. New York, USA, pp. 376–379.
Ramasubbu, N., Kemerer, C.F., 2014.
Managing technical debt in enterprise software packages. IEEE Trans. Softw. Eng. 40 (8), 758–772.
Sant'Anna, C., Figueiredo, E., Garcia, A., Lucena, C.J.P., 2007. On the modularity of software architectures: a concern-driven measurement framework. In: Proceedings of the First European Conference on Software Architecture (ECSA). Madrid, Spain, pp. 207–224.
Sant'Anna, C., Figueiredo, E., Garcia, A., Lucena, C., 2007. On the modularity assessment of software architectures: do my architectural concerns count? In: Proceedings of the First Workshop on Aspects in Architectural Description, held at the Sixth International Conference on Aspect-Oriented Software Development. Vancouver, Canada.
Sant'Anna, C., Garcia, A., Chavez, C., Lucena, C., von Staa, A.V., 2003. On the reuse and maintenance of aspect-oriented software: an assessment framework. In: Proceedings of the Seventeenth Brazilian Symposium on Software Engineering. Manaus, Brazil, pp. 19–34.
Schumacher, J., Zazworka, N., Shull, F., Seaman, C., Shaw, M., 2010. Building empirical support for automated code smell detection. In: Proceedings of the ACM-IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM). New York, USA, p. 1.
Sincero, J., Schröder-Preikschat, W., 2008. The Linux kernel configurator as a feature modeling tool. In: Proceedings of the Software Product Line Conference, pp. 257–260.
Sincero, J., Schirmeier, H., Schröder-Preikschat, W., Spinczyk, O., 2007. Is the Linux kernel a software product line? In: Proceedings of the International Workshop on Open Source Software and Product Lines, p. 30.
Sokal, R.R., Rohlf, F.J., 1994. Biometry: Principles and Practice of Statistics in Biological Research, third ed. W. H. Freeman.
van den Berg, K., Conejero, J.M., Chitchyan, R., 2005. AOSD Ontology 1.0 - Public Ontology of Aspect-Orientation. AOSD-Europe.
Vetro', A., Torchiano, M., Morisio, M., 2010. Assessing the precision of FindBugs by mining Java projects developed at a university.
In: Proceedings of the Seventh IEEE Working Conference on Mining Software Repositories (MSR). Cape Town, South Africa, pp. 110–113.
Wohlin, C., Runeson, P., Höst, M., Ohlsson, M.C., Regnell, B., Wesslén, A., 2000. Experimentation in Software Engineering: An Introduction. Kluwer Academic Publishers, Norwell, MA, USA.
Wong, S., Cai, Y., Kim, M., Dalton, M., 2011. Detecting software modularity violations. In: Proceedings of the Thirty-third International Conference on Software Engineering (ICSE). Honolulu, USA, pp. 411–420.
XQuery 1.0: An XML Query Language. W3C Recommendation. [Online]. Available: http://www.w3.org/TR/xquery/.
Zazworka, N., Seaman, C., Shull, F., 2011. Prioritizing design debt investment opportunities. In: Proceedings of the Second Workshop on Managing Technical Debt (MTD). Honolulu, USA, pp. 39–42.
Zazworka, N., et al., 2014. Comparing four approaches for technical debt identification. Softw. Qual. Control 22 (3).

José María Conejero received his Ph.D. in Computer Science from the Universidad de Extremadura in 2010. He is an Assistant Professor at the Universidad de Extremadura. He has authored more than 20 papers in journals and conference proceedings and has served as a program committee member for several journals and conferences. His research areas include Aspect-Oriented Software Development, Requirements Engineering, Model-Driven Development, and Ambient Intelligence.

Roberto Rodríguez-Echeverría received his Ph.D. in Computer Science from the Universidad de Extremadura in 2014. He is an Assistant Professor at the Universidad de Extremadura. He has authored more than 20 papers in journals and conference proceedings and has served as a program committee member for several journals and conferences. His research areas include Web Engineering, Model-Driven Software Engineering, Software Modernization, and End-User Development.
Juan Hernández received his B.Sc. in Mathematics from the University of Extremadura and his Ph.D. in Computer Science from the Technical University of Madrid. He is a Full Professor in the Quercus Software Engineering Group at the University of Extremadura (Spain). His research interests include service-oriented computing, ambient intelligence, aspect orientation, and model-driven development. He is involved in several research projects on these subjects as principal investigator and senior researcher. He has published the results of his research in more than 100 papers in international journals, conference proceedings, and book chapters. He has participated in many workshops and conferences as a speaker and program committee member. He is currently a member of the Spanish steering committee on Software Engineering and of the IEEE, and has organized several workshops and international conferences.

Pedro J. Clemente is an Associate Professor in the Computer Science Department at the University of Extremadura (Spain). He received his B.Sc. in Computer Science from the University of Extremadura in 1998 and his Ph.D. in Computer Science in 2007. He has published numerous peer-reviewed papers in international journals, workshops, and conferences. His research interests include component-based software development, aspect orientation, service-oriented architectures, business process modeling, and model-driven development. He is involved in several research projects and has participated in many workshops and conferences as a speaker and program committee member.

Carmen Ortiz-Caraballo holds a Ph.D. in Mathematics from the University of Seville (2011). She is an Assistant Professor of Mathematics at the Escola d’Enginyeria d’Igualada of the Universitat Politècnica de Catalunya (Spain). She has published several peer-reviewed papers on harmonic analysis in international journals, workshops, and conferences. She is involved in different research projects.
Her research interests include harmonic analysis and applied mathematics, on which she is currently collaborating with different research groups.

Elena Jurado received her B.Sc. in Mathematics and her Ph.D. in Computer Science from the University of Extremadura (Spain) in 1985 and 2003, respectively. She has been a professor at the University of Extremadura since 1985 and is currently an Associate Professor. Her research interests include ambient intelligence, multidimensional indexing, and content-based information retrieval. She has published the results of her research in more than 20 papers in international journals and conference proceedings.

Fernando Sánchez-Figueroa received his Ph.D. from the University of Extremadura. He is currently a Professor in the Department of Computer Science, University of Extremadura, Spain. He belongs to the Quercus Software Engineering Group, and his research focuses on Web Engineering, Big Data Visualization, and Model-Driven Engineering. He is also the CEO of Homeria Open Solutions, a spin-off of the Universidad de Extremadura.