The Journal of Systems and Software 142 (2018) 92–114
Contents lists available at ScienceDirect
The Journal of Systems and Software
journal homepage: www.elsevier.com/locate/jss
Early evaluation of technical debt impact on maintainability
José M. Conejero a,∗, Roberto Rodríguez-Echeverría a, Juan Hernández a, Pedro J. Clemente a,
Carmen Ortiz-Caraballo b, Elena Jurado a, Fernando Sánchez-Figueroa a
a Quercus Software Engineering Group, University of Extremadura, Avda. de la Universidad, s/n, 10071, Spain
b Escola d'Enginyeria d'Igualada, Universitat Politècnica de Catalunya, Av. Pla de la Massa, n° 8, 08700 Igualada, Spain
ARTICLE INFO
Article history:
Received 15 March 2017
Revised 24 March 2018
Accepted 18 April 2018
Available online 21 April 2018
Keywords:
Technical Debt indicator
Requirements
Modularity anomalies
Maintainability
Empirical evaluation
ABSTRACT
It is widely claimed that Technical Debt is related to quality problems, often produced by poor processes, lack of verification or basic incompetence. Several techniques have been proposed to detect Technical Debt in source code, such as the identification of modularity violations, code smells or grime buildups. These approaches have been used to empirically demonstrate the relation between Technical Debt indicators and quality harms. However, these works are mainly focused on the programming level, when the system has already been implemented. There may also be sources of Technical Debt in non-code artifacts, e.g. requirements, and their identification may provide important information to move refactoring efforts to earlier stages and reduce future Technical Debt interest. This paper presents an empirical study to evaluate whether modularity anomalies at the requirements level are directly related to maintainability attributes, affecting systems' quality and thus increasing the system's interest. The study relies on a framework that allows the identification of modularity anomalies and their quantification by using modularity metrics. Maintainability metrics are also used to assess dynamic maintainability properties. The results obtained by both sets of metrics are pairwise compared to check whether the more modularity anomalies the system presents, the less stable and more difficult to maintain it is.
© 2018 Elsevier Inc. All rights reserved.
1. Introduction
Since Technical Debt was first introduced in Cunningham
(1992), many approaches have emerged to identify (Vetro’ et al.,
2010; Wong et al., 2011; Schumacher et al., 2010), estimate (Chin
et al., 2010; Curtis et al., 2012a; Letouzey and Ilkiewicz, 2012;
Marinescu, 2012) or, in general, deal with Technical Debt by different techniques (Ramasubbu and Kemerer, 2014). As the authors
state in Kruchten et al. (2012), "most authors agree that the major cause of Technical Debt is schedule pressure, e.g. ignoring refactorings to reduce time to market" (Abad and Ruhe, 2015). However, as they also claim, Technical Debt is also related to quality problems, often produced by carelessness, lack of education, poor processes, lack of verification or, even, basic incompetence. These origins of Technical Debt are called unintentional debt (Brown et al., 2010), and examples of the quality problems occasioned by Technical Debt are bad reusability and low understandability (Griffith et al., 2014), error-proneness and a higher number of defects
∗ Corresponding author.
E-mail addresses: chemacm@unex.es (J.M. Conejero), rre@unex.es (R. Rodríguez-Echeverría), juanher@unex.es (J. Hernández), pjclemente@unex.es (P.J. Clemente),
carmen.ortiz@eei.upc.edu (C. Ortiz-Caraballo), elenajur@unex.es (E. Jurado),
fernando@unex.es (F. Sánchez-Figueroa).
https://doi.org/10.1016/j.jss.2018.04.035
0164-1212/© 2018 Elsevier Inc. All rights reserved.
(Zazworka et al., 2014), negative impact on robustness, performance, security and transferability (Curtis et al., 2012a, 2012b) or,
especially, on maintainability issues like stability (Zazworka et al.,
2014). A study conducted by Chen and Huang (2009) highlights
that stability is one of the top 10 highest-severity software development problem factors affecting software maintainability. Moreover, maintainability is currently draining 60–90% of the total cost of software development (Chen and Huang, 2009; Erlikh, 2000; Hung, 2007).
To solve these issues, several techniques have been proposed
in the literature to detect Technical Debt in source code, such as
the identification of modularity violations (Wong et al., 2011), code
smells (Schumacher et al., 2010; Marinescu, 2004), grime buildups
(Gueheneuc and Albin-Amiot, 2001; Izurieta and Bieman, 2007)
or the identification of violations of good programmer practices
by using Automatic Static Analysis (ASA) approaches (Vetro’ et al.,
2010). Indeed, the combination of these four different techniques
has been empirically evaluated in Zazworka et al. (2014) to test
which practices perform better under different conditions and how
they could complement each other to estimate Technical Debt interests (quality harms). Technical debt interest may be defined
as the payment in the form of extra time, effort, and cost to address future changes in a project (Abad and Ruhe, 2015). Similarly, in Ramasubbu and Kemerer (2014), Griffith et al. (2014),
Curtis et al. (2012b), and Zazworka et al. (2011), the authors
conducted studies where they empirically evaluated the relation between different Technical Debt indicators and software quality characteristics in order to test whether the former are really related to the latter.
What all these works have in common is that they are focused on the programming level, when the system has already
been implemented (if not completely, at least, partially). However,
as claimed in Li et al. (2014), Technical Debt can span all the phases
of the software lifecycle and there may also be sources of Technical
Debt in non-code artifacts (Brown et al., 2010), e.g. requirements
documents. Therefore, its identification at early stages of development may provide developers with important information to apply
refactoring approaches (e.g. based on aspect-oriented techniques; Moreira et al., 2013; Jacobson and Ng, 2004; Jacobson, 2003), thus improving modularity also at the source code level and therefore reducing Technical Debt at later development stages (or, at least, reducing the future global interest). The reality is that requirements always change and Technical Debt is inevitable (Allman, 2012); however, the issue is not eliminating debt, but rather reducing it or
even moving its identification to previous stages. Indeed, this is
more important if we consider that those who incurred the debt may not be the same as those who will have to repay it later (Brondum and Zhu, 2012).
Nevertheless, to the best of our knowledge, little effort has
been dedicated to study the implications of Technical Debt at earlier stages of development. There are some works that have dealt
with the definition of Technical Debt at the requirements level
(Abad and Ruhe, 2015; Ernst, 2012) or its relation with architectural dependencies (Li et al., 2014; Brondum and Zhu, 2012). Indeed, these types of debt have been described in the mapping study introduced in Alves et al. (2016) as Requirements and Architecture
Debts. However, the empirical evaluation of the quality problems
produced by Technical Debt at early stages has been neglected in
the literature so far. Based on this assumption, we have formulated the main question that we try to answer in this work: is
there a relationship between Technical Debt indicators at the requirements level and software quality? Concretely, we focus on modularity violations (a well-known Technical Debt indicator; Wong et al., 2011; Alves et al., 2016) and software stability (a quality attribute related to maintainability; International Organization of Standardization, 2014). Thus, our main question is reformulated as follows:
is there a relationship between modularity anomalies at the requirements level and system stability? The existence of this relationship
would provide empirical evidence of the harmful relationship between Technical Debt and software quality at early stages of development.
To answer this question, this paper presents an empirical study where we evaluate whether modularity anomalies at the requirements level occasioned by crosscutting concerns (Baniassad et al., 2006) are directly related to instability of the system, which would increase its interest. The empirical
study is supported by the application of a conceptual framework
defined in previous work (Conejero, 2010). The framework allows
the identification of modularity violations based on scattering, tangling and crosscutting at any abstraction level but concretely at the
requirements level. Moreover, based on this conceptual framework
a set of software metrics were defined to quantify the Degree of
Crosscutting properties that a system may have. In this work, these
metrics are validated by comparing them with similar metrics introduced by other authors, whilst their utility is illustrated by comparing them with a set of metrics that measure stability. All the
metrics are applied to measure both modularity and stability properties in three different software product lines (with different releases) and the measurements obtained are pairwise compared to
test whether those metrics are correlated and to find an answer
for our main question.
The rest of the paper is organized as follows. Section 2 briefly
introduces the conceptual framework that supports the study by
providing a method to identify crosscutting properties at requirements level. Section 3 presents the settings for our empirical study
by introducing the hypothesis established, the measures used and
the systems considered. Section 4 shows the results obtained and
it discusses their interpretation according to our main hypothesis.
Section 5 presents an evaluation of the metrics in order to select
the most representative for future studies. Section 6 presents the
threats to validity for this study. Finally, Section 7 discusses the
related work and Section 8 concludes the paper.
2. Background
A concern is an interest, which pertains to the system’s development, its operation or any other matters that are critical or
otherwise important to one or more stakeholders (van den Berg
et al., 2005). The term concern is closely related to the term feature (used in the Software Product Line context) in the sense of being a prominent or distinctive user-visible aspect, quality, or characteristic of a software system or systems (Kang et al., 1990). Software modularity is mainly determined by the concept of Separation of concerns (Dijkstra, 1976), the design principle that proposes the proper encapsulation of systems’ concerns into separate
entities. One of the main advantages of separation of concerns is
the significant reduction of dependencies between these features
or concerns. However, concern independence is not always fully
achieved and modularity anomalies arise usually occasioned by the
well-known concern properties of scattering, tangling and crosscutting. Crosscutting (usually described in terms of scattering and
tangling) denotes the situation where a concern may not be completely encapsulated into a single software component but spread
over several artifacts and mixed with other concerns due to a poor
support for its modularization (van den Berg et al., 2005).
In order to detect these modularity anomalies, crosscutting
identification approaches come into play. The next section introduces our previous work, where a conceptual framework for identifying and characterizing crosscutting properties was proposed. This framework was independent of any particular software development stage. Therefore, it may be applied at stages prior to implementation, e.g. at the requirements stage.
2.1. A conceptual framework for analysing modularity anomalies
In Conejero (2010) a conceptual framework was presented
where formal definitions of concern properties, such as scattering,
tangling, and crosscutting, were provided. This framework is based
on the study of trace dependencies that exist between two different domains. These domains, which are generically called Source
and Target, could be, for example, concerns and requirements descriptions, respectively, or features and use cases in a different situation. We use the term Crosscutting Pattern (Fig. 1) to denote
the situation where Source and Target are related to each other
by means of trace dependencies.
From a mathematical point of view, the Crosscutting Pattern
indicates that the Source and Target domains are related to each
other by a mapping. This mapping is the trace relationship that
exists between the Source and Target domains, and it can be formalized as follows:
According to Fig. 1, there exists a multivalued function f' from the Source to the Target domain such that if f'(s) = t, then there exists a trace relation between s ∈ Source and t ∈ Target. Analogously, we can define another multivalued function g' from Target to Source
Fig. 1. The crosscutting pattern.

that can be considered as a special inverse of f'. If f' is not a surjection, we consider that Target is the range of f'. Obviously, f' and g' can be represented as single-valued functions considering that the codomains are the sets of non-empty subsets of Target and Source, respectively.

Let f: Source → P(Target) and g: Target → P(Source) be two functions defined by:

∀ s ∈ Source, f(s) = {t ∈ Target: f'(s) = t}
∀ t ∈ Target, g(t) = {s ∈ Source: g'(t) = s}

The concepts of scattering, tangling and crosscutting are defined as specific cases of these functions.

Definition 1. [Scattering] We say that an element s ∈ Source is scattered if card(f(s)) > 1, where card(f(s)) refers to the cardinality of f(s). In other words, scattering occurs when, in a mapping between source and target, a source element is related to multiple target elements. Note that the cardinality of f(s) is the number of elements of the Target set that are related by f to the Source element s.

Definition 2. [Tangling] We say that an element t ∈ Target is tangled if card(g(t)) > 1. Hence, tangling occurs when, in a mapping between source and target, a target element is related to multiple source elements.

There is a specific combination of scattering and tangling which we call crosscutting.

Definition 3. [Crosscutting] Let s1, s2 ∈ Source, s1 ≠ s2; we say that s1 crosscuts s2 if card(f(s1)) > 1 and ∃ t ∈ f(s1): s2 ∈ g(t). In other words, crosscutting occurs when, in a mapping between source and target, a source element is scattered over target elements and, in at least one of these target elements, source elements are tangled.

According to the previous definitions, the following result is a direct consequence.

Lemma. Let s1, s2 ∈ Source, s1 ≠ s2; then s1 crosscuts s2 iff card(f(s1)) > 1 and f(s1) ∩ f(s2) ≠ ∅.

Fig. 2. Relationships among source and target elements.

For the sake of clarity, Fig. 2 shows an example based on a graph representation of the mappings among three different Source elements and four Target ones. As can be seen in this figure, there is a crosscutting situation in element t3, since s1 is scattered over three different target elements (t1, t3 and t4) and t3 is tangled based on the mapping with s1 and s3.

Table 1
Crosscutting product matrix for dependency matrix in Fig. 3 (rows and columns: source elements).

        s[1]  s[2]  s[3]  s[4]  s[5]
s[1]     2     1     0     0     1
s[2]     1     3     0     1     1
s[3]     1     1     0     0     0
s[4]     0     1     0     1     0
s[5]     1     1     0     0     2

Table 2
Crosscutting matrix for dependency matrix in Fig. 3 (rows and columns: source elements).

        s[1]  s[2]  s[3]  s[4]  s[5]
s[1]     0     1     0     0     1
s[2]     1     0     0     1     1
s[3]     1     1     0     0     0
s[4]     0     1     0     0     0
s[5]     1     1     0     0     0
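These definitions can be expressed directly in code. The following Python fragment is an illustrative sketch, not part of the paper's tooling: the mapping f reconstructs the example of Fig. 2 (the text only states the traces for s1 and s3, so the single trace assumed for s2 is hypothetical).

```python
# Sketch: scattering, tangling and crosscutting over a trace mapping
# f: Source -> P(Target). Mapping reconstructed from Fig. 2; the trace
# for s2 is an assumption.
f = {
    "s1": {"t1", "t3", "t4"},
    "s2": {"t2"},
    "s3": {"t3"},
}

def g(f):
    """Derive g: Target -> P(Source), the 'special inverse' of f."""
    inv = {}
    for s, targets in f.items():
        for t in targets:
            inv.setdefault(t, set()).add(s)
    return inv

def is_scattered(s, f):
    return len(f[s]) > 1            # Definition 1: card(f(s)) > 1

def is_tangled(t, g_map):
    return len(g_map[t]) > 1        # Definition 2: card(g(t)) > 1

def crosscuts(s1, s2, f):
    # Lemma: s1 crosscuts s2 iff card(f(s1)) > 1 and f(s1) and f(s2)
    # share at least one target element.
    return s1 != s2 and len(f[s1]) > 1 and bool(f[s1] & f[s2])

g_map = g(f)
print(is_scattered("s1", f))        # True: s1 maps to t1, t3, t4
print(is_tangled("t3", g_map))      # True: t3 addressed by s1 and s3
print(crosscuts("s1", "s3", f))     # True: the situation shown in Fig. 2
print(crosscuts("s3", "s1", f))     # False: crosscutting is asymmetric
```

Note that the last two calls show the asymmetry of the relation: s3 does not crosscut s1 because s3 is not scattered.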
2.2. Identification of modularity anomalies
Based on the crosscutting pattern previously described, a special kind of traceability matrix was defined (called dependency
matrix) to represent the function f. An example of dependency matrix with five source and six target elements is shown in Fig. 3.
In the rows, we have source elements, and in the columns, target elements are arranged. A '1' in a cell denotes that the target element of the corresponding column contributes to or addresses the source element of the corresponding row (in the dependency matrix of Fig. 3, t[1] and t[4] contribute to the functionality of s[1]).
Two different matrices called scattering matrix and tangling
matrix are derived from the dependency matrix (shown in Fig. 3).
These matrices show the scattered and tangled elements in a system, respectively:
• In the scattering matrix, a row contains dependency relations from source to target elements only if the source element in that row is scattered (mapped onto multiple target elements); otherwise, the row contains just zero values (no scattering). This last situation has been highlighted with circles in Fig. 3.
• In the tangling matrix, a row contains dependency relations from target to source elements only if the target element in that row is tangled (mapped onto multiple source elements); otherwise, the row is filled with zero values (no tangling). This last situation has also been highlighted with circles in Fig. 3.
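The derivation of the two matrices can be sketched as follows. The dependency matrix below is a small hypothetical 3×4 example (the matrix in Fig. 3 has five source and six target elements and is not reproduced here).

```python
# Sketch: deriving the scattering and tangling matrices from a
# hypothetical dependency matrix (rows = source, cols = target).
dm = [
    [1, 0, 1, 1],   # s1 addressed by t1, t3, t4 -> scattered
    [0, 1, 0, 0],   # s2 addressed by t2 only    -> not scattered
    [0, 0, 1, 0],   # s3 addressed by t3 only    -> not scattered
]

def scattering_matrix(dm):
    # Keep a row only if its source element maps to multiple targets;
    # otherwise fill the row with zeros (no scattering).
    return [row[:] if sum(row) > 1 else [0] * len(row) for row in dm]

def tangling_matrix(dm):
    # Transpose dm so rows are target elements, then keep a row only
    # if its target element is mapped from multiple sources.
    cols = [list(col) for col in zip(*dm)]
    return [row[:] if sum(row) > 1 else [0] * len(row) for row in cols]

sm = scattering_matrix(dm)
tm = tangling_matrix(dm)
print(sm)   # [[1, 0, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0]]
print(tm)   # [[0, 0, 0], [0, 0, 0], [1, 0, 1], [0, 0, 0]]
```

Only s1's row survives in the scattering matrix, and only t3's row survives in the tangling matrix, mirroring the zero-filled rows highlighted in Fig. 3.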
The crosscutting product matrix is obtained through the multiplication of scattering and tangling matrices. The crosscutting
product matrix shows the quantity of crosscutting relations and
it is an intermediary step to derive the final crosscutting matrix.
Tables 1 and 2 show, respectively, the crosscutting product and
crosscutting matrices derived from the example shown in Fig. 3. In
the crosscutting matrix, each cell denotes the occurrence of crosscutting; it abstracts from the quantity of crosscutting. A crosscutting matrix ccm can be derived from a crosscutting product matrix ccpm using a simple conversion: ccm[i][k] = if (ccpm[i][k] > 0) ∧ (i ≠ k) then 1 else 0. More details about this conceptual framework and
matrix operations can be found in van den Berg et al. (2005).
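The multiplication and conversion steps can be sketched as follows, continuing with a hypothetical 3-source, 4-target example rather than the 5×6 matrix of Fig. 3.

```python
# Sketch: crosscutting product matrix (ccpm) and crosscutting matrix
# (ccm) from hypothetical scattering/tangling matrices.
sm = [[1, 0, 1, 1],          # scattering matrix: only s1 is scattered
      [0, 0, 0, 0],
      [0, 0, 0, 0]]
tm = [[0, 0, 0],             # tangling matrix: only t3 is tangled
      [0, 0, 0],
      [1, 0, 1],
      [0, 0, 0]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

ccpm = matmul(sm, tm)        # quantity of crosscutting relations
# ccm[i][k] = 1 if ccpm[i][k] > 0 and i != k, else 0
ccm = [[1 if ccpm[i][k] > 0 and i != k else 0
        for k in range(len(ccpm[0]))] for i in range(len(ccpm))]

print(ccpm)   # [[1, 0, 1], [0, 0, 0], [0, 0, 0]]
print(ccm)    # [[0, 0, 1], [0, 0, 0], [0, 0, 0]]: s1 crosscuts s3
```

The i ≠ k condition discards the diagonal, which counts a concern's crosscut points rather than a pairwise crosscutting occurrence.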
Fig. 3. Process for generating scattering and tangling matrices.
Fig. 4. XML-schema to validate the concerns file.
2.3. Building the dependency matrix
Our conceptual framework was also extended in order to be automatically applied to software requirements. In particular, syntactic and dependency-based analyses were used to automatically obtain the mappings between source and target elements. In other words, this extension allows automating the construction of dependency matrices, which represent the starting point to identify crosscutting concerns. The process to build the dependency matrix is divided into two different steps: firstly, the requirements documentation of the system is analyzed in order to identify concerns, i.e. source elements; secondly, use case diagrams are analyzed to elicit target elements.
To perform the first step, concerns are categorized as functional and non-functional ones. The identification of non-functional concerns is supported by the utilization of a non-functional concerns catalogue. Once concerns, both functional and non-functional, are elicited, they are represented in an XML file according to the XML Schema represented in Fig. 4.
The second step, identification of target elements, is based on
the utilization of use case diagrams. Concretely, every XMI file
representing a use case diagram is analyzed to identify system’s
use cases. Then, based on the concerns file built in the first step
and the XMI file representing the use cases, these two files are
automatically queried (by using XQuery, 2018) to identify syntactic dependencies between source and target elements. These dependencies are based on partial or full coincidences in their names. Moreover, in order to detect indirect dependencies between concerns and use cases, the <<include>> dependencies of use case
diagrams are also automatically analyzed by processing the XMI
file (see Fig. 5). Based on the indirect dependencies obtained, the
original dependency matrix is completed with new dependencies
and an extended dependency matrix is generated. The reader may
obtain further details of these analyses in Conejero (2010) and
Conejero et al. (2009).
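The name-matching step can be illustrated with a small sketch. The actual tool queries the concerns XML file and the XMI use case model with XQuery; the Python fragment below only mimics the matching logic on invented concern and use case names, and it omits the <<include>> analysis that produces the extended dependency matrix.

```python
# Sketch: building a dependency matrix by syntactic name matching.
# Concern and use case names below are invented for illustration.
concerns = ["Security", "Photo Management", "Persistence"]
use_cases = ["View Photo", "Add Photo to Album", "Store Photo Data",
             "Login"]

def matches(concern, use_case):
    # Partial or full coincidence between names, compared token by token.
    c_tokens = {w.lower() for w in concern.split()}
    u_tokens = {w.lower() for w in use_case.split()}
    return bool(c_tokens & u_tokens)

# Dependency matrix: rows = concerns (source), cols = use cases (target).
dm = [[1 if matches(c, u) else 0 for u in use_cases] for c in concerns]
print(dm)   # [[0, 0, 0, 0], [1, 1, 1, 0], [0, 0, 0, 0]]
```

Purely syntactic matching misses concerns such as Security and Persistence here; in the paper this is mitigated by the dependency-based analysis of <<include>> relations, which adds indirect dependencies to the matrix.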
3. Experimental design
As discussed in previous sections, the presence of crosscutting in a software system negatively affects its modularity and is one of the most significant indicators of Technical Debt (Wong et al., 2011; Alves et al., 2016). However, modularity anomalies in a system may impact its quality in different ways, since several quality attributes could be affected, and interest may consequently increase in different ways. This work focuses on empirically evaluating whether modularity anomalies at the requirements level are
directly related to software maintainability (in terms of stability),
jeopardizing the system quality. Note that maintainability is one
of the main characteristics contributing to Technical Debt interest.
Therefore, if modularity violations are harmful to maintainability,
they will contribute to increase that interest. Moreover, the identification of Technical Debt at the requirements level may allow
the application of refactoring techniques from the very beginning
of the development reducing, thus, the interest generated. In other
words, ignoring refactorings because of a lack of awareness of better modularization techniques may increase the interest of the software systems (Abad and Ruhe, 2015).
In this context, the main research question (MRQ) that we try
to answer is:
• MRQ: Is Technical Debt based on modularity anomalies at the requirements level harmful to software stability?
To answer this question, the next main hypothesis will be
tested:
Fig. 5. Building the dependency matrix.
• Hypothesis: The more Technical Debt related to modularity anomalies at the requirements level, the less stable a system is.
This hypothesis is refined according to the modularity attributes that we consider in this work as follows:
• Hypothesis: The higher the degree of scattering, tangling or crosscutting at the requirements level a system has, the less stable the system is.
Note that stability (in conjunction with changeability) is one of
the sub-characteristics of the maintainability characteristic defined
in the product quality model of ISO-25010 (International Organization of Standardization, 2014). Therefore, Technical Debt may directly affect the maintainability of a system.
The evaluation of our hypothesis requires the definition of appropriate measures to assess the attributes that are relevant to
such hypothesis. In that sense, two different sets of software metrics at requirements level are used: modularity metrics and stability ones. The former (modularity) measure concern properties related to scattering, tangling and crosscutting (structural and static
properties of the software requirements). The latter (stability) is
based on observing the evolution of a product line in terms of
changes in the different releases and, thus, these metrics are quantified/observed after a change has been completed. In other words,
these metrics reflect a dynamic behavioral property of the software’s evolution.
Finally, once the hypothesis of our empirical analysis has been defined, the scenarios used to measure the properties must be established. As claimed in Briand et al. (2018), in order to validate software measurement assumptions experimentally, one can adopt two main strategies: small-scale controlled experiments or real-scale industrial case studies. In our case, we adopted the latter strategy and three different real systems were used to validate the analysis.
In the following, first, the modularity and maintainability measures are introduced, and then the systems used as our case studies are presented.
3.1. Modularity measures
Firstly, we define the set of metrics that are used to assess
modularity properties: namely scattering, tangling and crosscutting. To quantify these attributes, the framework summarized in
Section 2 is used. Note that this framework provides a formal characterization of these attributes so that it also enables the definition
of metrics to measure them. In that sense, we have used our own
set of modularity metrics (introduced in Conejero, 2010). Moreover,
with the purpose of validating the results obtained by our metrics,
other authors' modularity metrics previously defined in the literature (Ducasse et al., 2006; Eaddy et al., 2008; Sant'Anna et al., 2007; Sant'Anna et al., 2003; Figueiredo et al., 2008) have also been adapted to be applied at the requirements level so that we can avoid potential bias introduced by using just our metrics.
Firstly, our set of metrics may be observed in Table 3. The information shown in the table for each metric is: (i) its name, (ii)
Table 3
Metrics defined based on our conceptual framework.

• Nscattering(sk): number of target elements addressing source element sk. Relation with crosscutting pattern matrices: addition of the values of the cells in row k of the dependency matrix (dm). Calculation:
  Nscattering(sk) = Σ_{j=1..|T|} dm[k][j]

• Degree of Scattering(sk): normalization of Nscattering(sk) between 0 and 1 (by dividing it by the number of target elements, |T|). Calculation:
  Degree of Scattering(sk) = Nscattering(sk) / |T| if Nscattering(sk) > 1; 0 if Nscattering(sk) = 1

• Crosscutpoints(sk): number of target elements where the source element sk crosscuts other source elements. Relation: diagonal cell of row k in the crosscutting product matrix (ccpm). Calculation:
  Crosscutpoints(sk) = ccpm[k][k]

• Ncrosscut(sk): number of source elements crosscut by the source element sk. Relation: addition of the values of the cells in row k of the crosscutting matrix (ccm). Calculation:
  Ncrosscut(sk) = Σ_{i=1..|S|} ccm[k][i]

• Degree of Crosscutting(sk): addition of Crosscutpoints(sk) and Ncrosscut(sk), normalized between 0 and 1 (by dividing it by the sum of the number of source elements, |S|, and target elements, |T|). Calculation:
  Degree of Crosscutting(sk) = (Crosscutpoints(sk) + Ncrosscut(sk)) / (|S| + |T|)

• Concern Degree of Tangling(sk): addition of the Degree of Tangling metric of each use case that addresses the concern sk. Calculation:
  Concern Degree of Tangling(sk) = Σ_{i=1..|T|} Degree of Tangling(ti), for ti ∈ f(sk)
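The metrics of Table 3 can be sketched end to end as follows. The dependency matrix is a small invented 3×4 example, and the derived matrices follow the construction of Section 2.2.

```python
# Sketch: computing the Table 3 metrics over a hypothetical 3-source,
# 4-target dependency matrix (the paper's case studies are larger).
dm = [[1, 0, 1, 1],
      [0, 1, 1, 0],
      [0, 0, 1, 0]]
S, T = len(dm), len(dm[0])

# Derived matrices (Section 2.2): scattering, tangling, ccpm, ccm.
sm = [r[:] if sum(r) > 1 else [0] * T for r in dm]
tm = [list(c) if sum(c) > 1 else [0] * S for c in zip(*dm)]
ccpm = [[sum(sm[i][k] * tm[k][j] for k in range(T)) for j in range(S)]
        for i in range(S)]
ccm = [[int(ccpm[i][j] > 0 and i != j) for j in range(S)]
       for i in range(S)]

def nscattering(k):             # targets addressing s_k
    return sum(dm[k])

def degree_of_scattering(k):    # normalized; 0 when not scattered
    n = nscattering(k)
    return n / T if n > 1 else 0

def crosscutpoints(k):          # diagonal cell of the ccpm
    return ccpm[k][k]

def ncrosscut(k):               # source elements crosscut by s_k
    return sum(ccm[k])

def degree_of_crosscutting(k):
    return (crosscutpoints(k) + ncrosscut(k)) / (S + T)

print(nscattering(0), degree_of_scattering(0))   # 3 0.75
print(crosscutpoints(0), ncrosscut(0))           # 1 2
```

Concern Degree of Tangling is omitted here because it also needs the per-use-case Degree of Tangling values, which this sketch does not define.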
Table 4
Modularity metrics defined by other authors.

• Sant'Anna et al. — Concern Diffusion over Components (CDC). Original definition: it counts the number of components (target elements) addressing a concern (source element). Adapted metric: Concern Diffusion over Use Cases (CDUC). Definition at requirements level: it counts the number of use cases (target elements) addressing a concern (source element).

• Ducasse et al. — Spread. Original definition: it counts the number of modules (classes or components) related to a particular concern. Definition at requirements level: it counts the number of use cases related to a particular concern.

• Eaddy et al. — Degree of Scattering (DOS). Original definition: it is defined as the variance of the Concentration of a concern over all program elements with respect to the worst case. Definition at requirements level: it is defined as the variance of the Concentration of a concern over all use cases with respect to the worst case.

a brief description, (iii) its relation with the conceptual framework and traceability matrices (introduced in Section 2), and (iv) the formula used to compute it.

Secondly, Table 4 summarizes the set of modularity metrics previously defined in the literature that are also used in this study. In Sant'Anna et al. (2007) and Sant'Anna et al. (2003), Sant'Anna et al. introduced a set of concern-oriented metrics to assess modularity in terms of fundamental attributes of software such as separation of concerns, coupling, cohesion or size. We have used the metric Concern Diffusion over Components (CDC), which measures the number of components whose main purpose is to contribute to the implementation of a concern. In Ducasse et al. (2006), Ducasse et al. introduced a technique to visualize software partitions in the form of colored rectangles and squares. This technique was called Distribution Map and it allows partitions that represent all the software artifacts to be graphically represented. Based on distribution maps, the authors introduced the measure Spread, which counts the number of modules (classes or components) related to a particular concern. Similarly, Eaddy et al. introduced the Degree of Scattering (DOS) metric in Eaddy et al. (2008), where the authors presented an empirical analysis showing the correlation between scattering and the number of faults in software systems. DOS provides information about how a concern's code is distributed over software artifacts.

Note that although both sets of metrics (ours and other authors') may be applied to different software artifacts (at different abstraction levels), they have been instantiated here to be applied at the requirements level. For instance, given that other authors' metrics were focused on the design or programming level, we adapted
them to the requirements level just by considering different software artifacts (the authors did the same action to apply the metrics at a different abstraction level, e.g. at the architectural level
Sant'Anna et al., 2007). In one case, a new name for the metric has even been provided, namely the Concern Diffusion over Use Cases (CDUC) metric, which measures use cases instead of components or classes. Likewise, all the metrics that were defined in terms of components or classes were adapted to count use cases (as may be observed in the last column of Table 4).
Finally, in order to define and use those metrics, we have considered concerns (or features1 ) and use cases as the source and
target domains, respectively. As it is claimed in Jacobson (2003),
use cases have been universally adopted for requirements specification
and in this work we assume that system’s features are defined at
a higher abstraction level (e.g. feature diagrams for domain modeling) than use cases, as it has been widely claimed in the literature
(Eriksson et al., 2005; Griss et al., 1998). This is why they are considered as source and target domains, respectively.
Since all the metrics presented in this section are based on the relations between the source and target domains represented by the crosscutting pattern, they were all calculated based on the mappings existing between these two domains.
¹ Note that in this study the term feature is used as a synonym of concern.
3.2. Maintainability measures
In order to measure maintainability of software systems, this
section presents some measures for assessing stability, a particular maintainability attribute. Stability is defined as the capability of software products to avoid unexpected ripple effects when
modifications are performed (ISO/IEC, 2001). Stability is highly related to change management so that the less stable a system is,
the more complicated the change management becomes (Klass van
den, 2006). Unstable models gradually lead to the degeneration
of the design maintainability and its quality in general (Klass van
den, 2006).
To measure stability in software systems, we observed different releases of the same system and computed the number of use
cases that were changed in each release. A modification in a use case may be mainly due to either (i) the concerns that it addresses have evolved, or (ii) it has been affected by the addition, modification or removal of a concern in the system. Based on these
changes, the use cases with a number of modifications higher than
a threshold value are marked as unstable, whilst the use cases with
a number of changes lower than this threshold value are considered stable. This threshold value depends on the particular system
or case study where the metric is being applied. For the three case
studies used in this paper, the values are shown in the Appendices
(one for each case study).
Then, we use as stability metrics the numbers of stable and unstable use cases that implement each concern as a way to measure
the stability of these concerns.
Instability(sk) = # unstable use cases that address sk    (7)

Stability(sk) = # stable use cases that address sk    (8)
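As a minimal sketch, Eqs. (7) and (8) can be computed as follows; the change counts, the concern-to-use-case mapping and the threshold below are hypothetical values invented only for illustration (the real thresholds are system-specific, as noted above):

```python
# Hypothetical change log: per use case, the number of releases in which it changed.
changes_per_usecase = {
    "ViewPhoto": 5, "AddAlbum": 1, "SendSMS": 4, "PlayMusic": 0,
}
# Hypothetical mapping from concerns to the use cases that address them.
usecases_by_concern = {
    "Persistence": ["ViewPhoto", "AddAlbum", "SendSMS"],
    "ErrorHandling": ["ViewPhoto", "PlayMusic"],
}
THRESHOLD = 2  # system-specific threshold (see the Appendices)

def instability(concern):
    """Eq. (7): number of unstable use cases addressing the concern."""
    return sum(1 for uc in usecases_by_concern[concern]
               if changes_per_usecase[uc] > THRESHOLD)

def stability(concern):
    """Eq. (8): number of stable use cases addressing the concern.

    The boundary case (changes equal to the threshold) is counted as
    stable here; the paper leaves this case open.
    """
    return sum(1 for uc in usecases_by_concern[concern]
               if changes_per_usecase[uc] <= THRESHOLD)

# With the data above, Persistence is addressed by two unstable use cases
# (ViewPhoto, SendSMS) and one stable one (AddAlbum).
```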
3.3. Motivating cases
This section presents three different systems that were used to perform our study. All these systems were implemented as Software Product Lines with different releases. The reason for choosing these applications for our analyses is threefold. First, as they are software product lines, maintainability is of utmost importance; instabilities and changes negatively affect not only the software product line architecture but also all the instantiated products. Second, the software architectures and the requirements had all-encompassing documentation; e.g., the description of all the use cases was made available, as well as a complete specification of all component interfaces. Third, the architectural components were independently defined and provided by experienced developers.
3.3.1. MobileMedia
MobileMedia is a product line system built to allow the user of
a mobile device to perform different operations, such as visualizing
photos, playing music or videos and sending photos via SMS. It has
around 3 KLOC.
MobileMedia encompasses eight subsequent releases (from 0 to 7), designed and implemented to support the analysis of different maintainability facets, such as stability. For instance, release 0 implements the original system with just the functionality of viewing photos and organizing them into albums (see Figueiredo et al., 2008 for more detail). Its scenarios cover heterogeneous concerns, ranging from mandatory to optional and alternative features, as well as non-functional concerns. The different releases, together with the changes encompassed in each release, are shown in Appendix A. Note that the purpose of these changes is to exercise the implementation of the feature boundaries and, thus, assess the stability of the product line requirements. Note also that some non-functional concerns (NFC) are explicitly considered as concerns of the system (e.g., Persistence and Error Handling). All the concerns involved in the system are also presented in Appendix A.
3.3.2. HealthWatcher
The second system used in our analysis is called HealthWatcher, a typical Web-based program family that allows a citizen to register complaints regarding health issues (Greenwood et al., 2007). The system has around 4 KLOC and has been developed as a product line in different releases.
The first HealthWatcher release of the Java implementation was deployed in March 2001. Since then, a number of incremental and perfective changes have been addressed in subsequent HealthWatcher releases. These releases allow us to observe typical types of changes in such an application domain. In particular, for the purpose of our analysis, we have considered the requirements of five different releases of the product line. All these releases and the different concerns involved in each release are shown in Appendix B. Note, again, that we have used the same concerns employed in previous analyses at later development stages, e.g. at the architectural level (Greenwood et al., 2007).
3.3.3. SmartHome
The last product line analyzed in our study was developed by
industry partners of the AMPLE European project. Concretely, this
product line is taken from the domain of the Building Technologies
(BT) division at Siemens (Elsner et al., 2008) and allows simulating
the control of different devices of a smart home, including windows, heating, air conditioning, blinds, alarms, doors, and so on.
We selected this system because a wide range of the system artifacts were publicly available (Elsner et al., 2008), e.g. system descriptions, feature models, and architecture design. Moreover, the
system corresponds to a different domain and it is considerably
bigger than MobileMedia and HealthWatcher, thereby allowing us
to evaluate the generality and scalability of concern-driven analyses. The SmartHome system has around 17 KLOC (Elsner et al.,
2008). The feature model of the product line has been built by
using the SPLOT tool and it is stored and publicly available at its
repository.2 From the large number of possible products that could be generated from this feature model, we selected three releases (product instantiations), detailed in Appendix C. We have used an additive strategy to select the three releases, so that the first release contains a set of core features and the other ones just add features to the former. This strategy allows us to analyze the stability of the product line when accommodating changes.
4. Results and discussion
This section presents the process followed to test the main hypothesis established in Section 3. This process is driven by the evaluation of the metrics introduced in the previous section and the analysis of their correlations. In other words, modularity metrics are empirically compared with stability ones to test their correlations. The process comprises the following four steps (see Fig. 6): (1) the measurements for all the metrics in the different releases of each case study (described in Section 3.3) are calculated; (2) the averages of these measurements (over all the releases of each system) are calculated; (3) the measurements are pairwise correlated in order to calculate Pearson's correlation coefficient; (4) the results are analyzed in depth to check which software characteristics are empirically related (correlated), and the main conclusions of these correlations are extracted.
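Step (3) of the process above can be sketched as follows; the two paired vectors are hypothetical per-concern averages, not values taken from the study:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson's correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical averages for four concerns: a modularity metric (e.g. Degree
# of Crosscutting) paired with the Instability measure of the same concerns.
degree_of_crosscutting = [0.10, 0.35, 0.55, 0.80]
instability_values = [1, 2, 4, 5]
r = pearson(degree_of_crosscutting, instability_values)  # strong positive correlation
```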
For the sake of brevity, the results obtained for the metrics in
the three case studies have been shown in the different appendices
2 Software Product Line Online Tool: http://www.splot-research.org/.
Fig. 6. Process followed for first step of the study.
Table 5
Pearson's coefficients for the correlations between the modularity metrics and the Instability metric.

Modularity metric               MobileMedia   HealthWatcher   SmartHome
Nscattering                     0.71          0.78            0.81
Degree of Scattering            0.76          0.82            0.82
Crosscutpoints                  0.74          0.82            0.81
NCrosscut                       0.74          0.95            0.91
Degree of Crosscutting          0.86          0.92            0.87
Concern Degree of Tangling      0.77          0.90            0.83
CDUC                            0.71          0.78            0.81
Spread                          0.71          0.78            0.81
Degree of Scattering (Eaddy)    0.84          0.97            0.88
from D to F. Appendix D shows metric averages for all the releases of the MobileMedia system, Appendix E shows the HealthWatcher results, and Appendix F does the same for the SmartHome system.
Once the measurements were obtained, correlations among them were calculated for each pair of measures through Pearson's correlation coefficient. However, since we are mainly interested in testing our main hypothesis, in this section we just focus on the correlations between modularity and instability metrics (summarized in Table 5). The correlations with the Stability metric are not presented since, obviously, their coefficients are symmetric to the Instability ones. However, all the measurements used to obtain the coefficients, and the scatter plots that represent these correlations, are presented in Appendices D–F. Note also that, although not described in this section, the analysis of correlations among other metrics is also interesting. For instance, these correlations are used in Section 5 to empirically validate our modularity metrics and reduce their dimensionality by using a Principal Component Analysis (PCA).
To confirm our main hypothesis, we observed that scattering, tangling and crosscutting metrics present strong correlations (with high coefficient values) with Instability in the three product lines. In other words, the features with the highest degrees of scattering, tangling and crosscutting are implemented by more unstable use cases (those frequently changing throughout the different releases).
Firstly, focusing on the MobileMedia system, Table 5 shows that most correlations range from 0.71 to 0.78, and that Degree of Crosscutting and Eaddy's Degree of Scattering are the metrics presenting the highest correlations with Instability, with coefficient values of 0.86 and 0.84, respectively. Secondly, the results obtained for the HealthWatcher system are consistent with those obtained for MobileMedia. Moreover, we observed that the correlations obtained for this system were, in general, stronger than those obtained for the previous system. In this case, the metrics with the highest correlations were Eaddy's Degree of Scattering and NCrosscut, with coefficient values of 0.97 and 0.95, respectively. Finally, the results obtained for the SmartHome system, the biggest one, confirm the results obtained for the previous systems. However, in this case, we observed a decrease in the coefficient values for the correlations of some of the metrics (e.g. NCrosscut, Degree of Crosscutting or Eaddy's Degree of Scattering) with respect to those obtained for the HealthWatcher system. The metrics with the highest values, in this case, were again NCrosscut and Eaddy's Degree of Scattering, with coefficient values of 0.91 and 0.88, respectively.
Based on the results obtained, we evaluated our main hypothesis:
• Hypothesis: the higher the degree of scattering, tangling or crosscutting a system has at requirements level, the less stable the system is.
We concluded that all these data provide evidence of a relationship between these measures. In other words, we could say that the higher the Degree of Crosscutting a feature has, the more unstable the use cases implementing that feature are. This indicates that modularity anomalies due to crosscutting may be harmful to the stability of systems. Moreover, since opposite (negative) values were obtained for the correlations with the Stability metric (shown in Appendix D), these data also confirm the hypothesis, showing that stable use cases in all the systems address well-encapsulated features rather than crosscutting ones. Therefore, the improvement of modularity, e.g. by means of aspect-oriented refactoring techniques, may provide important benefits in the future in terms of stability, thus easing the system maintainability and reducing its future Technical Debt interest. Examples of these aspect-oriented refactoring approaches are Moreira et al. (2006) and Alférez et al. (2008), where the authors used Use Cases Pattern Specification and Activity Diagrams composition to improve modularity at requirements level. Based on this approach, a new relationship is added to the use case diagram notation that allows encapsulating the features with a higher degree of crosscutting. Then, the behaviour of these use cases is defined by means of activity diagrams that are later on implemented in isolated entities by using aspect-oriented programming approaches (the application of this approach was also illustrated in Conejero, 2010).
5. Metrics evaluation
As mentioned in previous sections, the results obtained in the analysis presented may also be useful for further studies. Hence, with the aim of validating our metrics, in this section we pairwise compare the correlations obtained for all the modularity
Table 6
Correlations matrix for modularity metrics in MobileMedia. Each cell shows Pearson's coefficient for the correlation between the metric on its row and the metric on its column (Nsca = Nscattering, DoS = Degree of Scattering, CP = Crosscutpoints, NC = NCrosscut, DoC = Degree of Crosscutting, CDT = Concern Degree of Tangling, Eaddy = Degree of Scattering (Eaddy)).

                                Nsca   DoS    CP     NC     DoC    CDT    CDUC   Spread Eaddy
Nscattering                     1.00
Degree of Scattering            0.99   1.00
Crosscutpoints                  0.99   0.99   1.00
NCrosscut                       0.56   0.55   0.62   1.00
Degree of Crosscutting          0.94   0.95   0.96   0.77   1.00
Concern Degree of Tangling      0.91   0.96   0.91   0.45   0.91   1.00
CDUC                            1.00   0.99   1.00   0.56   0.94   0.91   1.00
Spread                          1.00   0.99   1.00   0.56   0.94   0.91   1.00   1.00
Degree of Scattering (Eaddy)    0.72   0.75   0.77   0.94   0.91   0.70   0.72   0.72   1.00

Table 7
Correlations matrix for modularity metrics in HealthWatcher.

                                Nsca   DoS    CP     NC     DoC    CDT    CDUC   Spread Eaddy
Nscattering                     1.00
Degree of Scattering            1.00   1.00
Crosscutpoints                  1.00   1.00   1.00
NCrosscut                       0.87   0.90   0.90   1.00
Degree of Crosscutting          0.95   0.96   0.96   0.98   1.00
Concern Degree of Tangling      0.95   0.97   0.96   0.97   0.99   1.00
CDUC                            1.00   1.00   1.00   0.87   0.95   0.95   1.00
Spread                          1.00   1.00   1.00   0.87   0.95   0.95   1.00   1.00
Degree of Scattering (Eaddy)    0.80   0.83   0.83   0.98   0.94   0.93   0.80   0.80   1.00

Table 8
Correlations matrix for modularity metrics in SmartHome.

                                Nsca   DoS    CP     NC     DoC    CDT    CDUC   Spread Eaddy
Nscattering                     1.00
Degree of Scattering            1.00   1.00
Crosscutpoints                  0.98   0.98   1.00
NCrosscut                       0.93   0.94   0.94   1.00
Degree of Crosscutting          0.98   0.98   0.99   0.98   1.00
Concern Degree of Tangling      0.95   0.95   0.94   0.92   0.95   1.00
CDUC                            1.00   1.00   0.98   0.93   0.98   0.95   1.00
Spread                          1.00   1.00   0.98   0.93   0.98   0.95   1.00   1.00
Degree of Scattering (Eaddy)    0.98   0.98   0.98   0.98   1.00   0.95   0.98   0.98   1.00
metrics to check whether the results obtained by our metrics
(Table 3) are consistent with those obtained for the metrics previously introduced by other authors (Table 4). Moreover, based on
the correlation coefficients obtained for modularity metrics, a Principal Component Analysis (PCA) is performed to select a representative subset of the modularity metrics considered in this study. As
a result, we may discard some of the modularity metrics in future
studies and calculate just a subset of them.
5.1. Modularity metrics comparison
Tables 6–8 present the correlation coefficients among the modularity metrics for the three product lines used in our study.
Based on the observation of the correlation matrices for the three systems, we noticed that the coefficients obtained were, in general, close to 1, indicating a high correlation among the metrics (the p-values (Sokal and Rohlf, 1994) for the correlations in each system are also shown in Appendices D–F). Furthermore, we also observed some interesting results. For instance, in MobileMedia (Table 6) we observed that the correlations among some metrics were not as high as in the other two systems. Concretely, the coefficients for the correlations between NCrosscut and the rest of the metrics are, in general, lower than the rest of the correlations. The only exception is the correlation with Eaddy's Degree of Scattering, where the value obtained is 0.95. Based on these data,
we identified that the metrics were grouped, according to their pairwise correlations, into two main groups: on the one hand, a group composed of the NCrosscut and Eaddy's Degree of Scattering metrics; on the other hand, the group composed of the remaining metrics (Nscattering, Degree of Scattering, Crosscutpoints, Degree of Crosscutting, Concern Degree of Tangling, CDUC and Spread).
Based on the observation of the results obtained for the other two systems (HealthWatcher and SmartHome), both bigger than the MobileMedia product line, we identified that all the coefficients were higher than 0.8 in both systems. We may then conclude that there exists a strong relationship among all the metrics, indicating that our metrics are consistent with those previously introduced in the literature.
5.2. Principal component analysis
The idea of principal component analysis is to find linear combinations of correlated variables that describe most of the variation in the dataset with a small number of new uncorrelated variables (Abdi and Williams, 2010). PCA transforms the data to a new coordinate system, where the greatest variance by any projection of the data lies along the first coordinate (the first principal component), the second greatest variance along the second coordinate, and so on. There can be as many principal components as variables, but typically only the first two or three are needed to explain most of the total variation.
Principal components PCx (x ∈ N, x ≤ Nt) are linear combinations of the original variables:

PCx = Σ_{i=1}^{Nt} a(i)x · Xi

where −1 ≤ a(i)x ≤ 1 are the coefficients of the linear transformation, Xi are the original variables and Nt is the number of original variables.
In our study we have considered 9 different metrics (variables), according to the modularity metrics used in the study. Therefore, the PCA will result in 9 principal components. These variables are numbered as follows to simplify their identification within the figures:

X1. NScattering
X2. Degree of Scattering
X3. Crosscutpoints
X4. NCrosscut
X5. Degree of Crosscutting
X6. Concern Degree of Tangling
X7. CDUC
X8. Spread
X9. Degree of Scattering (Eaddy)
In order to apply the PCA process to our three systems, we used as input our correlation matrices (Tables 6–8), which meet the conditions to be used in the process (being symmetric correlation or covariance matrices). Based on the 9 principal components obtained by the PCA process, Fig. 7 shows the importance of the first two. Concretely, it shows that these two components explain around 98.13% of the variance in the MobileMedia data set, around 99.71% in the HealthWatcher data set, and 98.72% in the SmartHome data set. In other words, based on these results we may reduce the 9 variables to 2 principal components (PC1 and PC2) without losing much of the existing variance. Note that, as aforementioned, the first principal component (PC1) captures the maximum variance in the data set, whilst the second one (PC2) captures the remaining variance and is uncorrelated with PC1. In other words, PC1 and PC2 are orthogonal.
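As an illustrative sketch (not the exact computation performed in the study, which used all 9 metrics), the first principal component of a small, hypothetical correlation matrix can be extracted by power iteration; for a correlation matrix, a component's explained-variance share is its eigenvalue divided by the trace, i.e. the number of variables:

```python
# Hypothetical 3-variable correlation matrix: the first two variables are
# strongly correlated, the third only moderately so.
corr = [
    [1.00, 0.95, 0.40],
    [0.95, 1.00, 0.45],
    [0.40, 0.45, 1.00],
]

def power_iteration(m, steps=200):
    """Dominant eigenvalue and unit eigenvector of a symmetric matrix."""
    n = len(m)
    v = [1.0] * n
    for _ in range(steps):
        w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Rayleigh quotient of the converged vector gives the eigenvalue.
    lam = sum(v[i] * sum(m[i][j] * v[j] for j in range(n)) for i in range(n))
    return lam, v

lam, pc1 = power_iteration(corr)
explained = lam / len(corr)  # share of total variance captured by PC1
```

With this matrix, PC1 alone captures roughly three quarters of the total variance, mirroring (on a smaller scale) how the first components dominate in the three case studies.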
Once we had identified the principal components, we analyzed the degree of contribution of every variable (metric) to each principal component, so that we can select those with higher contributions as candidates to be used in further studies. These contribu-
Fig. 8. Contributions of variables to PC1 in MobileMedia System.
Table 9
Groups of metrics according to the contribution to each PC.

System          PC1                             PC2
MobileMedia     X5, X3, X2, X7, X8, X1          X4, X9
HealthWatcher   X5, X6, X2, X3, X7, X8, X1      X9, X4
SmartHome       X2, X9, X5, X1, X7, X8, X3      X4
tions are also obtained by applying the PCA process and they are
presented in Figs. 8–13 for the three systems.
The red dashed lines in the graphs of Figs. 8–13 indicate the expected average contribution of the variables to the principal components. If the contribution of the variables were uniform, the expected value would be 1/num_of_variables (in this case, 11.11%). Taking this value into account, for a given component, a variable with a contribution larger than this cutoff (11.11%) may be considered as providing an important contribution to the component definition (Abdi and Williams, 2010; Kassambara, 2018). In this case, based on the values obtained, the metrics that provide an important contribution to PC1 and PC2 in each system are those presented in Table 9.
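The cutoff rule above can be sketched as follows; the loadings are hypothetical values (not the study's actual PCA output), and each variable's contribution is computed as its squared loading over the sum of squared loadings:

```python
# Hypothetical loadings of the 9 variables on one principal component.
loadings = [0.40, 0.38, 0.36, 0.10, 0.42, 0.33, 0.40, 0.40, 0.15]

total = sum(a * a for a in loadings)
# Percentage contribution of each variable to the component.
contrib = [100 * a * a / total for a in loadings]
cutoff = 100 / len(loadings)  # uniform-contribution cutoff: 11.11% for 9 variables

# Variables whose contribution exceeds the cutoff are considered important.
important = [f"X{i + 1}" for i, c in enumerate(contrib) if c > cutoff]
```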
Fig. 12. Contributions of variables to PC1 in SmartHome System.
Fig. 9. Contributions of variables to PC2 in MobileMedia System.
Fig. 13. Contributions of variables to PC2 in SmartHome System.
Fig. 10. Contributions of variables to PC1 in HealthWatcher System.
We may consider the existence of two groups of well-correlated variables in every system (see Table 9). Based on these two groups, our next step is to decide which variables can be selected as the representatives of these groups and be used in future analyses. To make this decision, we observed the contribution of every variable to each PC. Based on these contributions, X5 (Degree of Crosscutting) and X4 (NCrosscut) were selected as the most representative metrics for PC1 and PC2, respectively. There are several reasons for this choice:
Fig. 11. Contributions of variables to PC2 in HealthWatcher System.

• X5 is the metric with the highest contribution to PC1 in two of the three systems. Moreover, although in the SmartHome system there are two metrics with a slightly higher contribution to PC1 than that provided by X5, X5 is a normalization of other modularity metrics (including one of those with a higher contribution), so that it is more representative of crosscutting properties. Finally, the correlation between X5 and the rest of the metrics of the same group is really high in the three systems. Therefore, we can assume that PC1 ≈ n·X5, where n is the number of metrics in that group.
• X4 has the highest contribution to PC2 in two of the three systems, and it is the only metric that contributes to PC2 in all three systems. Moreover, the correlation between X4 and X9 in the MobileMedia and HealthWatcher systems is close to one, so we can assume that PC2 ≈ n·X4, where n is the number of metrics that contribute to PC2 in those systems.
So, based on these results, we can conclude that X5 (Degree of Crosscutting) and X4 (NCrosscut) may be used in further studies as representatives of the rest of the metrics, in order to reduce dimensionality and redundant data.
6. Threats to validity
In this section, we elaborate on several factors that may jeopardize the validity of our results. In order to present these threats, we follow the well-known categorization introduced by Wohlin et al. (2000), where threats are classified into four different validity categories: construct, internal, external and conclusion.
6.1. Construct validity
Construct validity focuses on the relationship between the theory behind an experiment and the observations. In that sense, the selection of concerns and use cases as the elements of the source and target domains, respectively, may be considered a potential threat to the construct validity of the study. There are usually alternative decompositions both in source and target, and alternative mappings between source and target. These alternative decompositions may impact quality attributes, such as adaptability, reusability and maintainability, in different ways. However, whatever the decomposition is, in order to detect the cases where modularity violations are present, we need to apply the Crosscutting Pattern. Obviously, the whole empirical process presented here could be applied to these alternative decompositions, and we could select the one with the best results for the desired quality attributes. In this work, we selected just one possible decomposition, considering that it was supervised by experienced developers.
6.2. Internal validity
Internal validity refers to the relationship between the treatment for an experiment and the outcomes obtained; in other words, whether we are sure that the treatment used in an experiment is really related to the actual outcome observed. In this case, another potential threat to the validity of the study may be the creation of the dependency matrix of the Crosscutting Pattern, which provides the starting point for applying the theory behind the process and calculating the measurements of the study. This matrix is filled with the mapping dependencies existing between source and target. As explained in Section 2.3, the Crosscutting Pattern was extended with syntactical and dependency-based analyses to automatically obtain the mappings between source and target elements. This extension was based on the development of an Eclipse plugin that has been used in our study to collect all the mappings. Of course, the main goal of this tool is just to assist the developer in the selection of these mappings, and the results obtained may be corrected by the developer based on her own experience. In our study, the matrices were supervised by experienced developers. The reader may find a deeper description of this extension in Conejero (2010).
6.3. External validity

External validity refers to the possibility of generalizing the results outside the scope of the study. In that sense, as explained in Section 3, the case studies used for our experiment were all developed by external teams (professional programmers), so that the applicability of the study outside the academic context could be ensured. Nevertheless, we are aware that bigger and more complex
ensured. Nevertheless, we are aware that bigger and more complex
systems, e.g. open source ones, would help to better demonstrate
the applicability of the approach in different contexts. However,
most of these systems usually lack a comprehensive documentation, so that neither requirement documents nor use cases are frequently available. These projects usually rely on an agile software
development methodology, where the requirements are defined by
numerical or item textual lists. The main reason is that contributors are usually volunteers and they basically spend their time
in developing short functionalities and bug fixes. As an illustrative
example, the requirements defined in three open sources projects
have been analyzed: Linux Kernel,3 LibreOffice4 and Gimp.5
The development of the Linux Kernel project is mainly driven by a bug tracker (Bugzilla) used by the developers to decide the next functionalities to be incorporated into the system. This repository could be used to follow the system's evolution. However, the system lacks high-abstraction software artifacts (e.g. requirements or design artifacts), so that developers just rely on the list of bugs to be fixed and the source code. Nevertheless, there are some works that have used the Linux Kernel Configuration language (LKC)6 for managing and analysing the Linux kernel as a software product line (Lotufo et al., 2010; Sincero et al., 2007; Sincero and Schröder-Preikschat, 2008; Passos and Czarnecki, 2014). In other words, they use this language to build a feature model, so that kernel characteristics, like the processor architecture, are defined as features of the system. However, this model just provides a different representation of the list of functionalities to be built; the authors did not provide artifacts at an abstraction level higher than code.
The development of LibreOffice is chaired by an Engineering Steering Committee (ESC)7 composed of a set of individuals with skills in different areas of software development, like coding, user experience, QA, release engineering, packaging and more. They decide on technical issues about LibreOffice. However, in order to set the functionalities, they use a shared document where they include the next items to be developed. Similarly, GIMP maintains the project roadmap in a wiki.8 The wiki includes the main functional requirements to be developed, defined just as text, so that they have a large granularity. So, in both projects there is a lack of software artifacts other than source code, which hinders the applicability of our approach to them.
6.4. Conclusion validity
Conclusion validity is concerned with the ability to draw correct conclusions about the relationship between the treatment and the outcome. Obviously, from a statistical point of view, we cannot assure that the results obtained in the experiment may be generalized to any kind of system. As aforementioned, three product lines belonging to different domains have been used in this study. Moreover, they range from 3 KLOC to 17 KLOC, so that different system sizes were considered. However, obviously, in order to generalize the results obtained in the study, other case studies should be considered where different characteristics could also be tested, e.g. different requirement notations and elicitation; systems where early aspect-oriented techniques have been used to modularize crosscutting concerns from the beginning of the development; or systems not implemented as product lines.
3 The Linux Kernel Archive, https://www.kernel.org/.
4 LibreOffice, https://es.libreoffice.org/.
5 GIMP, http://www.gimp.org.es/.
6 https://www.kernel.org/doc/Documentation/kbuild/kconfig-language.txt
7 https://wiki.documentfoundation.org/Development/ESC
8 http://wiki.gimp.org/index.php/Roadmap
7. Related work
This section is organized according to the different kinds of works included. Firstly, we mention some works that focus on the identification of Technical Debt. Secondly, some works that empirically demonstrate the relationship between Technical Debt and software quality are presented. Finally, some works that deal with Technical Debt in early stages of development (requirements or architecture) are discussed.
7.1. Identification of Technical Debt
Different approaches to identify Technical Debt may be found in the literature, whose characteristics have been collected and compared in mapping studies such as the one presented in Li et al. (2015). Examples of these approaches are the identification of code smells (Schumacher et al., 2010; Marinescu, 2004), the analysis of design pattern anomalies or grime buildup (Gueheneuc and Albin-Amiot, 2001; Izurieta and Bieman, 2007), the study of violations of good programmer practices by using ASA (Vetro' et al., 2010), or the identification of modularity violations (Wong et al., 2011). Indeed, all these approaches have been summarized in the mapping study presented in Alves et al. (2016), where the authors analyzed, among others, Technical Debt types and their main indicators (those usually considered by the different approaches). The approach presented in this paper could be classified into the modularity violations category, which was identified as one of the most recurrent indicators in Alves et al. (2016). As an example of this category, in Wong et al. (2011) the authors presented a study where they analyzed modularity violations in 15 releases of the Hadoop Common system and 10 releases of Eclipse JDT. The authors stated that two supposedly independent modules should not change together because of modification requests; they called these situations modularity violations. They also categorized violations into four different types according to symptoms of design problems: cyclic dependencies, code clones, poor inheritance and unnamed coupling. As we state in this work, the authors also claimed that making developers aware of violations as soon as possible may help to avoid accumulating modularity decay. However, all these works are focused on source-code artifacts and, therefore, the identification of Technical Debt indicators is relegated to late development stages.
By following a different approach, in Fontana et al. (2016) the authors used a set of existing tools, which provide general quality indexes, to analyze whether these indexes could also be related to Technical Debt. In particular, those tools measure the following software attributes: structural flaws in production code (used by CAST9); design flaws, including code and architectural smells (inFusion,10 Sonargraph or Structure101(11)); violations of programming best practices (Sonargraph); or coding constraints (SonarQube12). In this work, the authors emphasize the need for dealing with architectural issues related to Technical Debt (supporting our claim about anticipating identification to previous development stages); however, they do not mention requirements artifacts.
7.2. Empirical studies about Technical Debt
In Zazworka et al. (2014) authors presented an empirical analysis where they evaluated the aforementioned four different Technical Debt identification approaches with the aim of studying
whether they could complement each other in terms of their relationships with several Technical Debt interests (quality characteristics).

9 http://www.castsoftware.com/
10 https://www.intooitus.com/, its evolution at http://www.aireviewer.com.
11 http://structure101.com/products/
12 http://docs.sonarqube.org/display/SONARQUBE52/Technical+Debt

To this purpose, the authors defined a set of 25 Technical Debt indicators, including modularity violations. They extracted interesting
conclusions, such as the lack of relationship between some indicators and interests (meaning that some indicators may not be harmful in terms of software interest), and the strong relationship identified between modularity violations and change-proneness (similar to the relationship identified in this work between modularity properties and stability). In Griffith et al. (2014) the authors also conducted a study
where they analyzed the relation between three different Technical Debt estimation approaches and an external quality model. The
study was driven by applying the three estimation approaches to
ten Java open source projects and the quality model included the
following characteristics: reusability, flexibility, understandability,
functionality, extendibility and effectiveness. The results obtained
were compared with the quality model by calculating the correlations and linear regressions among the measures. In Curtis et al.
(2012b) the authors performed a study in which they estimated the cost
of Technical Debt in software systems by automatically studying
the source code of 745 applications from 160 different companies.
Based on the results obtained, the authors concluded that 30% of the Technical Debt interest measured was related to the cost of changeability. A similar study was presented in Curtis et al. (2012a), where the authors empirically evaluated the relation between different Technical Debt indicators and software quality characteristics. However, again, all these studies rely on Technical Debt estimation approaches and indicators focused on code artifacts at the programming level. In this work we performed a similar study but moved the process to earlier stages of development, so that Technical Debt estimation may be anticipated in the development life cycle.
7.3. Dealing with Technical Debt at early stages
Although it has been clearly identified that Technical Debt is
also related to non-code software artifacts (Brown et al., 2010),
there are just a few works dealing with the management of Technical Debt at early stages of development. Ernst (2012)
introduced a definition of Technical Debt at requirements level as
the distance between the optimal solution to a requirements problem
and the actual solution. He also introduced a tool to decide what
the optimal solution to a requirements problem is. Then, when the
requirements problem changes, a new optimal solution may be selected by means of the tool in order to minimize that distance.
Unlike that work, ours focuses on applying at requirements level some of the techniques used at programming level to identify Technical Debt. Our main goal is to provide developers with information that they may use to anticipate refactoring decisions and, thus, to reduce Technical Debt at later stages. Moreover, the work
in Ernst (2012) did not show the relation between Technical Debt
and system quality. Li et al. (2014) focus on the study of Technical Debt at the architectural level. They evaluate the relationship between modularity metrics and a Technical Debt indicator called ANMCC (Average Number of Modified Components per Commit), correlate these measurements, and conclude that modularity metrics may substitute ANMCC as a means to measure Technical Debt.
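ANMCC itself is straightforward to derive from version-control history. A minimal sketch, assuming each commit is represented as the set of architectural components it modifies (the commit data is invented for illustration):

```python
# Hypothetical commit history: each commit is the set of components
# (e.g. packages or architectural modules) it modified.
commits = [
    {"ui", "storage"},
    {"storage"},
    {"ui", "storage", "network"},
    {"network"},
    {"ui", "storage"},
]

# ANMCC: Average Number of Modified Components per Commit.
# Higher values indicate that changes tend to ripple across
# components, i.e. weaker modularity.
anmcc = sum(len(components) for components in commits) / len(commits)
print(f"ANMCC = {anmcc:.2f}")  # (2 + 1 + 3 + 1 + 2) / 5 = 1.80
```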
Similarly, Mo et al. (2015) and Fontana et al. (2017) also presented tools to identify architectural problems that usually incur high maintenance costs or quality problems. Furthermore, they concluded that architectural problems are an early source of quality problems that could be avoided by using refactoring techniques. However, unlike the work presented here, all these works apply their approaches to systems' source code, so that decisions cannot be taken in early stages of development. Brondum and Zhu (2012) presented a modelling
approach to visualize complex dependencies at the architectural
level in order to extend current approaches. They argue that their
dependency model supports the strategic use of Technical Debt,
the use of more accurate estimation models and the identification of different sources of debt. However, again, the authors neither
deal with the relationship between Technical Debt and quality attributes nor provide a way to identify Technical Debt indicators at
early stages of development.
8. Conclusions
This paper has presented an empirical study in which we analyzed the relation between Technical Debt caused by modularity anomalies and other software quality properties, namely maintainability attributes. The study has provided evidence of correlations between concern properties, such as scattering, tangling and crosscutting, and stability, one of the most important maintainability characteristics. Based on these correlations, we extracted important conclusions that allow us to empirically show that modularity anomalies are harmful to system stability. Concretely, the higher the degree of scattering, tangling and crosscutting a system has, the less stable the system is. In other words, we observed that a particular type of Technical Debt, caused by modularity decisions taken at requirements level, is highly related to maintainability problems at this level.
The empirical study has been supported by a conceptual framework that allows the identification of modularity anomalies at any abstraction level or development stage. In this study, the framework has been instantiated to focus on early development stages (the requirements level), so that the identification of Technical Debt may be conducted from the very beginning of the software development life cycle. The modularity metrics defined by this framework have been used throughout the empirical study. These metrics were also empirically validated by comparing them with similar metrics introduced by other authors. This comparison allowed us not only to validate our metrics but also to identify their dependencies and determine which metrics are equivalent, i.e., which measure similar properties.
Finally, the identification of Technical Debt at requirements level provides important information that may be used to avoid this problem at later development stages. In this sense, aspect-oriented refactoring solutions that apply advanced separation of concerns techniques at the requirements stage may reduce modularity problems at the programming level, with the consequent savings in both time and money. Fewer modularity problems imply less Technical Debt in systems under development, in terms of a significant reduction of their future interest.
As further work, we plan to extend our study along two different lines: (i) conducting the study with other requirements notations (e.g. goal-oriented ones), which could also allow us to gain insight into how the selected requirements notation affects Technical Debt identification at early stages of development; and (ii) considering other quality attributes, such as software understandability or reusability, to check whether modularity anomalies may also influence them and, perhaps, derive new conclusions.
Acknowledgments
The authors gratefully acknowledge the support of the TIN2015-69957-R project (MINECO/FEDER, UE), and of the GR15098 and IB16055 projects funded by the Consejería de Economía e Infraestructuras/Junta de Extremadura (Spain) and the European Regional Development Fund (ERDF). This work was also partially supported by the 4IE project (0045-4IE-4-P) funded by the Interreg V-A España-Portugal (POCTEP) 2014-2020 program. We would like to thank A. Garcia and E. Figueiredo for allowing us to use the MobileMedia case study and for their comments and support on this work.
Appendix A. MobileMedia releases
Table 10 summarizes the changes made in each MobileMedia
release. The scenarios cover heterogeneous concerns ranging from
mandatory to optional and alternative features, as well as nonfunctional concerns. Table 10 also presents which types of change
each release encompassed. The purpose of these changes is to exercise the implementation of the feature boundaries and, thus, assess
the stability of the product line requirements. Note that some nonfunctional concerns (NFC) are also explicitly considered as concerns of the system (e.g., Persistence and Error Handling).
Table 11 shows the concerns used in the analysis and the releases that include these concerns.
The threshold value for considering use cases unstable in this case study was set to 2, so that any use case with a number of changes equal to or higher than 2 was considered unstable. This number was selected based on the number of releases (a higher number of releases implies, in general, more changes).
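The classification rule described above is a simple threshold test; a minimal sketch (the use case names and change counts are hypothetical, not the actual MobileMedia data):

```python
# Hypothetical number of changes suffered by each use case across
# the MobileMedia releases (names and counts are illustrative).
changes_per_use_case = {
    "View Photo": 0,
    "Add Photo": 1,
    "Delete Photo": 2,
    "Manage Album": 4,
}

THRESHOLD = 2  # threshold used for MobileMedia in this study

# A use case is unstable when its number of changes reaches the threshold.
unstable = sorted(
    uc for uc, changes in changes_per_use_case.items() if changes >= THRESHOLD
)
print(unstable)  # ['Delete Photo', 'Manage Album']
```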
Table 10
Different releases of MobileMedia.

Release | Description | Type of changes
r0 | MobileMedia basic functionality | None
r1 | Exception handling included | Inclusion of a non-functional requirement
r2 | New feature added to count the number of times a photo has been viewed and to sort photos by highest viewing frequency. New feature added to edit the photo's label | Inclusion of optional and mandatory features
r3 | New feature added to allow users to specify and view their favorite photos | Inclusion of an optional feature
r4 | New feature added to allow users to keep multiple copies of photos in different albums | Inclusion of an optional feature
r5 | New feature added to send and receive photos via SMS | Inclusion of an optional feature
r6 | New feature added to store, play, and organize music. The management of photos (e.g. create, delete and labeling) was turned into an alternative feature. All alternative features (e.g. sorting, favorites, and copy) were also provided for music | Changing one mandatory feature into two alternatives
r7 | New feature added to manage videos | Inclusion of an alternative feature
Table 11
MobileMedia concerns and the releases where they are included.

Feature | Releases
Album | r0–r7
Photo | r0–r7
Label | r0–r7
Persistence | r0–r7
Error handling | r1–r7
Sorting | r2–r7
Favourites | r3–r7
Copy | r4–r7
SMS | r5–r7
Music | r6, r7
Media | r6, r7
Video | r7
Capture | r7

Table 14
Different releases of SmartHome.

Release | Description | Type of changes
r0 | SmartHome core (Heating Management, Windows Management, Lights Management, Presence Simulator, Fire Control, Authentication, User Notifications) + Door Lock + Security | None
r1 | r0 + Blinds Management + Gas Detection + Water Detection + Air Conditioning Control | Inclusion of optional features
r2 | r1 + Audio Management + Dimming Lights + Phone Call Notifications + Intrusion Detection + CardReader as authentication method | Inclusion of optional features
Appendix B. HealthWatcher releases
For the purpose of our analysis we have considered the requirements of five different releases of the product line. These releases are summarized in Table 12. As an example, release 0 contains the core system, whilst release 1 represents the core system plus the functionality of sorting complaints by popularity or frequency.
The different features of the system and the releases where
they were included are described in Table 13.
As in the previous case study, the threshold value for considering use cases unstable was set to 2. This number was selected based on the number of releases.
Table 12
Different releases of HealthWatcher.

Release | Description | Type of changes
r0 | HealthWatcher core | None
r1 | Feature added to count the number of times a complaint has been viewed and to sort complaints by frequency | Inclusion of an optional feature
r2 | Allow citizens to geolocalize the origin of complaints when they create them | Inclusion of an optional feature
r3 | Allow citizens to log in by using a digital signature | Inclusion of an optional feature
r4 | Allow citizens to store and manage their complaints | Inclusion of a mandatory feature
Table 13
HealthWatcher features and the releases where they are included.

Feature | Releases
QueryInformation | r0–r4
RegisterComplaint | r0–r4
RegisterTables | r0–r4
UpdateComplaint | r0–r4
RegisterNewEmployee | r0–r4
UpdateEmployee | r0–r4
UpdateHealthUnit | r0–r4
ChangeLoggedEmployee | r0–r4
ResponseTime | r0–r4
Encryption | r0–r4
Compatibility | r0–r4
Access-Control | r0–r4
Usability | r0–r4
Availability | r0–r4
Standards | r0–r4
Hardware and Software | r0–r4
Distribution | r0–r4
UserInterface | r0–r4
OperationalEnvironments | r0–r4
Persistence | r0–r4
Concurrency | r0–r4
Performance | r0–r4
ErrorHandling | r0–r4
ViewComplaints | r0–r4
PopularComplaints | r1–r4
Geolocalization | r2–r4
DigitalSignature | r3, r4
ClientComplaints | r4
Appendix C. SmartHome releases
The feature model of the product line has been built by using the SPLOT tool (Software Product Line Online Tool: http://www.splot-research.org/) and it is stored and publicly available in its repository. This feature model would allow the generation of around
Table 15
SmartHome concerns and the releases where they are included.

Feature | Releases
Temperature Control | r0–r2
Windows Management | r0–r2
Lights Management | r0–r2
Presence Simulation | r0–r2
Fire Control | r0–r2
Door Lock | r0–r2
Authentication | r0–r2
Security | r0–r2
User Notifications | r0–r2
Access to Physical KNX Devices | r0–r2
Blinds Management | r1, r2
Floods Detection | r1, r2
Gas Detection | r1, r2
Air Conditioning | r1, r2
Audio Management | r2
Intrusion Detection | r2
382.205 K different products. From this huge number of possible products we selected three releases (product instantiations), detailed in Table 14. We used an additive strategy to select the three releases, so that the first release contains a set of core features and the other two just add features to it. This strategy allows us to analyze the stability of the product line when accommodating changes.
Table 15 details the concerns (features) and the releases in
which they are involved.
The threshold value for considering use cases unstable in this case study was set to 1, so that any use case with one change or more was considered unstable. In this case, the threshold was lower than in the previous case studies since the number of releases was smaller (just three).
Appendix D. MobileMedia measurements and correlations
This appendix shows the measurements for the MobileMedia
system (see Table 16).
D.1. Correlations with stability measures
Once the selected modularity and maintainability metrics are compared pairwise for the MobileMedia system, the correlations among all these metrics, together with the scatter plots that represent them, may be observed in Fig. 14. The scatter plots follow the same order of measures as the previous table.

Table 17 shows the p-values for the correlation coefficients shown in Fig. 14. As may be observed, the correlations are statistically significant for the groups of metrics described in Section 5.1.
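Correlation coefficients and p-values like those in Table 17 can be obtained pairwise; a minimal sketch using scipy.stats (the metric vectors are illustrative, not the actual MobileMedia measurements):

```python
from itertools import combinations
from scipy import stats

# Illustrative per-concern values for three of the metrics
# (invented for the example, not the actual MobileMedia data).
metrics = {
    "Nscattering": [3.6, 4.1, 5.4, 12.8, 15.9, 4.3, 3.0, 2.5],
    "Degree of Scattering": [0.26, 0.30, 0.34, 0.85, 0.98, 0.25, 0.17, 0.13],
    "Instability": [1, 2, 1, 2, 2, 2, 1, 1],
}

# Spearman rank correlation and p-value for every pair of metrics,
# analogous to one cell of the p-value table.
results = {}
for a, b in combinations(metrics, 2):
    rho, p = stats.spearmanr(metrics[a], metrics[b])
    results[(a, b)] = (rho, p)
    print(f"{a} vs {b}: rho = {rho:.3f}, p = {p:.5f}")
```

A p-value below the chosen significance level (e.g. 0.05) indicates that the observed correlation is unlikely to arise by chance.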
Table 16
Average of metrics for all the releases.
Modularity metrics: Nscattering through Degree of Scattering (Eaddy). Maintainability metric: Instability.

Feature | Nscattering | Degree of Scattering | Crosscutpoints | NCrosscut | Degree of Crosscutting | Concern Degree of Tangling | Concern Diffusion over Use Cases | Spread | Degree of Scattering (Eaddy) | Instability
Album | 3.63 | 0.26 | 3.63 | 5.25 | 0.39 | 2.08 | 3.63 | 3.63 | 0.77 | 1
Photo | 4.13 | 0.30 | 3.88 | 4.13 | 0.38 | 2.29 | 4.13 | 4.13 | 0.62 | 2
Label | 5.38 | 0.34 | 5.38 | 6 | 0.46 | 2.41 | 5.38 | 5.38 | 0.82 | 1
Persistence | 12.8 | 0.85 | 12.4 | 6.38 | 0.77 | 4.93 | 12.8 | 12.8 | 0.98 | 2
Error Handling | 15.9 | 0.98 | 15.9 | 7 | 0.89 | 4.97 | 15.9 | 15.9 | 0.99 | 2
Sorting | 4.33 | 0.25 | 4.33 | 7.33 | 0.43 | 1.44 | 4.33 | 4.33 | 0.78 | 2
Favourites | 3 | 0.17 | 3 | 6 | 0.32 | 0.73 | 3 | 3 | 0.66 | 1
Copy | 2.5 | 0.13 | 2.5 | 6.25 | 0.29 | 0.55 | 2.5 | 2.5 | 0.61 | 1
SMS | 3.67 | 0.18 | 3.67 | 6.67 | 0.32 | 0.54 | 3.67 | 3.67 | 0.76 | 1
Music | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0
Media | 6.5 | 0.31 | 6.5 | 8.5 | 0.44 | 0.58 | 6.5 | 6.5 | 0.85 | 1
Video | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0
Capture | 1 | 0 | 0 | 0 | 0 | 0.02 | 1 | 1 | 0 | 0
Fig. 14. Scatter plots for the correlations in MobileMedia.
Table 17
p-values for the correlations in the MobileMedia system. Each cell shows the p-value for the correlation between the metric on its row and the metric in its column (lower triangle).

Metric | Nscattering | Degree of Scattering | Crosscutpoints | NCrosscut | Degree of Crosscutting | Concern Degree of Tangling | CDUC | Spread | Degree of Scattering (Eaddy)
Degree of Scattering | <0.00001
Crosscutpoints | <0.00001 | <0.00001
NCrosscut | <0.00001 | <0.00001 | <0.00001
Degree of Crosscutting | 0.046 | 0.051 | 0.023 | <0.00001
Concern Degree of Tangling | <0.00001 | <0.00001 | <0.00001 | 0.002 | <0.00001
CDUC | 0.00001 | <0.00001 | 0.00001 | 0.12 | 0.00001 | <0.00001
Spread | <0.00001 | <0.00001 | <0.00001 | 0.046 | <0.00001 | 0.00001 | <0.00001
Degree of Scattering (Eaddy) | <0.00001 | <0.00001 | <0.00001 | 0.046 | <0.00001 | 0.00001 | <0.00001 | <0.00001
Instability | 0.0055 | 0.0031 | 0.002 | <0.00001 | 0.00001 | 0.0077 | 0.0055 | 0.0055 | <0.00001

Appendix E. HealthWatcher measurements and correlations

Table 18 shows the results obtained for the measurements in the HealthWatcher system.

E.1. Correlations with stability measures

The correlations among all these metrics, together with the scatter plots that represent them, may be observed in Fig. 15. The scatter plots follow the same order of measures as the previous table.

Table 19 shows the p-values for the correlation coefficients shown in Fig. 15.
Table 18
Modularity and maintainability measurements for the HealthWatcher system.
Modularity metrics: Nscattering through Degree of Scattering (Eaddy). Maintainability metric: Instability.

Feature | Nscattering | Degree of Scattering | Crosscutpoints | NCrosscut | Degree of Crosscutting | Concern Degree of Tangling | Concern Diffusion over Use Cases | Spread | Degree of Scattering (Eaddy) | Instability
QueryInformation | 1 | 0 | 0 | 0 | 0 | 0.52 | 1 | 1 | 0 | 0
RegisterComplaint | 3 | 0.16 | 3 | 12.40 | 0.37 | 1.75 | 3 | 3 | 0.71 | 3
RegisterTables | 1 | 0 | 0 | 0 | 0 | 0.22 | 1 | 1 | 0 | 0
UpdateComplaint | 1 | 0 | 0 | 0 | 0 | 0.22 | 1 | 1 | 0 | 0
RegisterNewEmployee | 1 | 0 | 0 | 0 | 0 | 0.22 | 1 | 1 | 0 | 0
UpdateEmployee | 1 | 0 | 0 | 0 | 0 | 0.22 | 1 | 1 | 0 | 0
UpdateHealthUnit | 1 | 0 | 0 | 0 | 0 | 0.22 | 1 | 1 | 0 | 0
ChangeLoggedEmployee | 1 | 0 | 0 | 0 | 0 | 0.17 | 1 | 1 | 0 | 0
ResponseTime | 5 | 0.27 | 5 | 14 | 0.46 | 2.65 | 5 | 5 | 0.85 | 4
Encryption | 5 | 0.27 | 5 | 14 | 0.46 | 2.65 | 5 | 5 | 0.85 | 4
Compatibility | 4 | 0.22 | 4 | 13.40 | 0.42 | 2.27 | 4 | 4 | 0.80 | 3
Access-Control | 12.80 | 0.70 | 12.80 | 21 | 0.81 | 4.25 | 12.80 | 12.80 | 0.98 | 5
Usability | 5 | 0.27 | 5 | 14 | 0.46 | 2.65 | 5 | 5 | 0.85 | 4
Availability | 8 | 0.44 | 8 | 14 | 0.53 | 3.19 | 8 | 8 | 0.93 | 4
UserInterface | 11.80 | 0.64 | 11.80 | 20.40 | 0.78 | 3.91 | 11.80 | 11.80 | 0.97 | 4
OperationalEnvironments | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
Persistence | 16.20 | 0.87 | 16.20 | 19 | 0.85 | 4.72 | 16.20 | 16.20 | 0.99 | 4
Concurrency | 8 | 0.44 | 8 | 14 | 0.53 | 3.19 | 8 | 8 | 0.93 | 4
Performance | 4 | 0.22 | 4 | 13.40 | 0.42 | 2.27 | 4 | 4 | 0.80 | 3
ErrorHandling | 17.80 | 0.96 | 17.80 | 21 | 0.93 | 5.10 | 17.80 | 17.80 | 1 | 5
ViewComplaints | 1 | 0 | 0 | 0 | 0 | 0.23 | 1 | 1 | 0 | 1
PopularComplaints | 4.80 | 0.25 | 4.80 | 11 | 0.37 | 1.84 | 4.80 | 4.80 | 0.71 | 4
Geolocalization | 3.60 | 0.18 | 3.60 | 8.40 | 0.27 | 1.38 | 3.60 | 3.60 | 0.53 | 4
DigitalSignature | 0.80 | 0.04 | 0.80 | 3.40 | 0.09 | 0.19 | 0.80 | 0.80 | 0.21 | 1
ClientComplaints | 1 | 0.05 | 0.80 | 1.80 | 0.06 | 0.16 | 1 | 1 | 0.18 | 1
Table 19
p-values for the correlations in the HealthWatcher system. Each cell shows the p-value for the correlation between the metric on its row and the metric in its column. For HealthWatcher, every pairwise correlation among Nscattering, Degree of Scattering, Crosscutpoints, NCrosscut, Degree of Crosscutting, Concern Degree of Tangling, CDUC, Spread, Degree of Scattering (Eaddy) and Instability has a p-value below 0.00001.
Fig. 15. Scatter plots for the correlations in HealthWatcher.
Appendix F. SmartHome measurements and correlations

The results of the measurements obtained for the SmartHome system may be observed in Table 20.

F.1. Correlations with stability measures

The correlations among all these metrics, together with the scatter plots that represent them, may be observed in Fig. 16. The scatter plots follow the same order of measures as the previous table.

Table 21 shows the p-values for the correlation coefficients shown in Fig. 16.
Table 20
Modularity and maintainability measurements for the SmartHome system.
Modularity metrics: Nscattering through Degree of Scattering (Eaddy). Maintainability metric: Instability.

Feature | Nscattering | Degree of Scattering | Crosscutpoints | NCrosscut | Degree of Crosscutting | Concern Degree of Tangling | Concern Diffusion over Use Cases | Spread | Degree of Scattering (Eaddy) | Instability
Temperature Control | 2 | 0.08 | 2 | 1 | 0.09 | 0.13 | 2 | 2 | 0.08 | 0
Windows Management | 1 | 0 | 0 | 0 | 0 | 0.07 | 1 | 1 | 0 | 0
Lights Management | 3 | 0.12 | 3 | 3.67 | 0.20 | 0.27 | 3 | 3 | 0.18 | 3
Presence Simulation | 2 | 0.08 | 2 | 3.67 | 0.17 | 0.20 | 2 | 2 | 0.15 | 1
Fire Control | 2 | 0.08 | 2 | 2 | 0.12 | 0.20 | 2 | 2 | 0.11 | 0
Door Lock | 2 | 0.08 | 1 | 1 | 0.06 | 0.07 | 2 | 2 | 0.05 | 0
Authentication | 6 | 0.25 | 4 | 5.67 | 0.28 | 0.33 | 6 | 6 | 0.26 | 3
Security | 2 | 0.08 | 1 | 1 | 0.06 | 0.07 | 2 | 2 | 0.05 | 0
User Notifications | 5.67 | 0.22 | 3.67 | 3.67 | 0.21 | 0.33 | 5.67 | 5.67 | 0.19 | 1
Access to Physical KNX Devices | 16 | 0.63 | 14 | 10.33 | 0.69 | 0.80 | 16 | 16 | 0.62 | 4
Blinds Management | 2 | 0.07 | 2 | 3.33 | 0.14 | 0 | 2 | 2 | 0.12 | 1
Flood Alarm | 1.33 | 0.05 | 1.33 | 1.33 | 0.07 | 0 | 1.33 | 1.33 | 0.06 | 0
Gas Alarm | 1.33 | 0.05 | 1.33 | 1.33 | 0.07 | 0.13 | 1.33 | 1.33 | 0.06 | 0
Air Conditioning Management | 1.33 | 0.05 | 1.33 | 1.33 | 0.07 | 0 | 1.33 | 1.33 | 0.06 | 0
Intrusion Detection | 0.67 | 0.02 | 0.67 | 0.67 | 0.04 | 0 | 0.67 | 0.67 | 0.03 | 0
Audio Control | 0.67 | 0.02 | 0.67 | 0.33 | 0.03 | 0 | 0.67 | 0.67 | 0.02 | 0

Fig. 16. Scatter plots for the correlations in SmartHome.
Table 21
p-values for the correlations in the SmartHome system. Each cell shows the p-value for the correlation between the metric on its row and the metric in its column. For SmartHome, every pairwise correlation among Nscattering, Degree of Scattering, Crosscutpoints, NCrosscut, Degree of Crosscutting, Concern Degree of Tangling, CDUC, Spread, Degree of Scattering (Eaddy) and Instability has a p-value below 0.00001.
References
Abad, Z.S.H., Ruhe, G., 2015. Using real options to manage technical debt in requirements engineering. In: Proceedings of the Twenty-third IEEE International Requirements Engineering Conference. Ottawa, Canada, pp. 230–235.
Abdi, H., Williams, L.J., 2010. Principal component analysis. Wiley Interdiscip. Rev.
Comput. Stat. 2 (4), 433–459.
Alférez, M., et al., 2008. A model-driven approach for software product lines requirements engineering. In: Proceedings of the SEKE. Knowledge Systems Institute Graduate School, pp. 779–784.
Allman, E., 2012. Managing Technical Debt: Shortcuts that save money and time
today can cost you down the road. ACM Queue 10 (3).
Alves, N.S.R., Mendes, T.S., De Mendonça, M.G., Spinola, R.O., Shull, F., Seaman, C.,
2016. Identification and management of technical debt: a systematic mapping
study. Inf. Softw. Technol. 70, 100–121.
Baniassad, E., Clements, P.C., Araujo, J., Moreira, A., Rashid, A., Tekinerdogan, B.,
2006. Discovering early aspects. IEEE Softw. 23 (1), 61–70.
Briand, L., Morasca, S., Basili, V.R. Defining and Validating High-Level Design Metrics, University of Maryland at College Park.
Brondum, J., Zhu, L., 2012. Visualising architectural dependencies. In: Proceedings
of the Third International Workshop on Managing Technical Debt. MTD, Piscataway, USA, pp. 7–14.
Brown, N., et al., 2010. Managing technical debt in software-reliant systems. In: Proceedings of the FSE/SDP Workshop on Future of Software Engineering Research
– FoSER’10. Santa Fe, USA, pp. 47–52.
Chen, J.-C., Huang, S.-J., 2009. An empirical analysis of the impact of software development problem factors on software maintainability. J. Syst. Softw. 82 (6),
981–992.
Chin, S., Huddleston, E., Bodwell, W., Gat, I., 2010. The economics of technical debt.
Cut. IT J. 82 (10).
Conejero, J.M., 2010. The Crosscutting Pattern: A Conceptual Framework for the
Analysis of Modularity Across Software Development Phases. Universidad de Extremadura.
Conejero, J.M., Hernández, J., Jurado, E., Clemente, P.J., Rodríguez, R., 2009. Early
analysis of modularity in software product lines. In: Proceedings of the Twenty-first Inernational Conference on Software Engineering and Knowledge Engineering (SEKE). Boston, USA, pp. 721–736.
Cunningham, W., 1992. The WyCash portfolio management system. In: Proceedings
of the Object-Oriented Programming Systems, Languages and Applications Conference (OOPSLA), vol. 4. Vancouver, Canada, pp. 29–30.
Curtis, B., Sappidi, J., Szynkarski, A., 2012a. Estimating the principal of an application’s technical debt. IEEE Softw. 29 (6), 34–42.
Curtis, B., Sappidi, J., Szynkarski, A., 2012b. Estimating the size, cost, and types of
technical debt. In: Proceedings of the Third International Workshop on Managing Technical Debt. Piscataway, NJ, USA, pp. 49–53.
Dijkstra, E.W., 1976. A Discipline of Programming. Prentice Hall.
Ducasse, S., Girba, T., Kuhn, A., 2006. Distribution map. In: Proceedings of the Twenty-second IEEE International Conference on Software Maintenance. Philadelphia,
USA, pp. 203–212.
Eaddy, M., et al., 2008. Do crosscutting concerns cause defects? IEEE Trans. Softw.
Eng. 34 (4), 497–515.
Elsner, C., Fiege, L., Groher, I., Jäger, M., Schwanninger, C., Völter, M., 2008. Ample
project. Deliverable d5.3 - implementation of first case study: smart home.
Eriksson, M., Börstler, J., Borg, K., 2005. The PLUSS approach – domain modeling with features, use cases and use case realizations. In: Proceedings
of the Ninth International Conference on Software Product Lines, pp. 33–
44.
Erlikh, L., 2000. Leveraging legacy system dollars for e-business. IEEE IT Prof. 2 (3),
17–23.
Ernst, N.A., 2012. On the role of requirements in understanding and managing technical debt. In: Proceedings of the Third International Workshop on Managing
Technical Debt (MTD). Piscataway, USA, pp. 61–64.
Figueiredo, E., et al., 2008. Evolving software product lines with aspects. In: Proceedings of the Thirtieth International Conference on Software Engineering
(ICSE). Leipzig, Germany, pp. 261–270.
Fontana, F.A., Roveda, R., Zanoni, M., 2016. Technical debt indexes provided by tools:
a preliminary discussion. In: Proceedings of the 2016 IEEE Eighth International
Workshop on Managing Technical Debt (MTD), pp. 28–31.
Fontana, F.A., Pigazzini, I., Roveda, R., Tamburri, D., Zanoni, M., Nitto, E.D., 2017.
Arcan: a tool for architectural smells detection. In: Proceedings of the 2017
IEEE International Conference on Software Architecture Workshops (ICSAW),
pp. 282–285.
Greenwood, P., et al., 2007. On the impact of aspectual decompositions on design
stability: an empirical study. In: Proceedings of the Twenty-first European Conference on Object-Oriented Programming. Berlin, Germany, pp. 176–200.
Griffith, I., Reimanis, D., Izurieta, C., Codabux, Z., Deo, A., Williams, B., 2014. The
correspondence between software quality models and technical debt estimation
approaches. In: Proceedings of the Sixth International Workshop on Managing
Technical Debt. Victoria, Canada, pp. 19–26.
Griss, M.L., Favaro, J., D’Alessandro, M., 1998. Integrating feature modeling with the
RSEB. In: Proceedings of the Fifth International Conference on Software Reuse
(Cat. No.98TB100203), pp. 76–85.
Gueheneuc, Y.-G., Albin-Amiot, H., 2001. Using design patterns and constraints to
automate the detection and correction of inter-class design defects. In: Proceedings of the Thirty-ninth International Conference and Exhibition on Technology of Object-Oriented Languages and Systems. TOOLS. Washington, USA,
pp. 296–305.
Hung, V. Software maintenance [Online]. Available: [Accessed: 26-Feb-2016].
International Organization of Standardization. 2014. Systems and software engineering – Systems and software Quality Requirements and Evaluation (SQuaRE) –
Guide to SQuaRE [Online]. Available: https://www.iso.org/standard/64764.html.
[Accessed: 31-Jan-2018].
ISO/IEC, 2001. Software engineering – product quality – Part 1: quality model,
ISO/IEC 9126-1.
Izurieta, C., Bieman, J.M., 2007. How software designs decay: a pilot study of pattern
evolution. In: Proceedings of the First International Symposium on Empirical
Software Engineering and Measurement (ESEM), pp. 449–451.
Jacobson, I., 2003. Use cases and aspects – working seamlessly together. J. Object
Technol. 2 (4), 7–28.
Jacobson, I., Ng, P.-W., 2004. Aspect-Oriented Software Development with Use Cases.
Addison-Wesley Professional.
Kang, K., Cohen, S., Hess, J., Novak, W., Spencer, A., 1990. Feature Oriented Domain
Analysis (FODA). Feasibility Study., Carnegie Mellon University Technical Report,
CMU/SEI-90-TR-21.
Kassambara, Alboukadel, Principal Component Methods in R: Practical Guide, first
ed. STHDA.
van den Berg, K., 2006. Change impact analysis of crosscutting in software architectural design. In: Proceedings of the Workshop on Architecture-Centric Evolution
at Twentieth ECOOP. Nantes, France.
Kruchten, P., Nord, R.L., Ozkaya, I., 2012. Technical debt: from metaphor to theory
and practice. IEEE Softw. 29 (6), 18–21.
Letouzey, J.-L., Ilkiewicz, M., 2012. Managing technical debt with the SQALE method.
IEEE Softw. 29 (6), 44–51.
Li, Z., Avgeriou, P., Liang, P., 2015. A systematic mapping study on technical debt and
its management. J. Syst. Softw. 101, 193–220.
Li, Z., Liang, P., Avgeriou, P., Guelfi, N., Ampatzoglou, A., 2014. An empirical investigation of modularity metrics for indicating architectural technical debt. In: Proceedings of the Tenth International ACM Sigsoft Conference on Quality of Software Architectures. New York, NY, USA, pp. 119–128.
Lotufo, R., She, S., Berger, T., Czarnecki, K., Wasowski, A., 2010. Evolution of the Linux
Kernel Variability Model. Springer-Verlag.
Marinescu, R., 2004. Detection strategies: metrics-based rules for detecting design
flaws. In: Proceedings of the Twentieth IEEE International Conference on Software Maintenance. Chicago, USA, pp. 350–359.
Marinescu, R., 2012. Assessing technical debt by identifying design flaws in software
systems. IBM J. Res. Dev. 56 (5) 9:1–9:13.
Mo, R., Cai, Y., Kazman, R., Xiao, L., 2015. Hotspot patterns: the formal definition
and automatic detection of architecture smells. In: Proceedings of the Twelfth
Working IEEE/IFIP Conference on Software Architecture, pp. 51–60.
Moreira, A., Araújo, J., Whittle, J., 2006. Modeling Volatile Concerns as Aspects.
Springer, Berlin, Heidelberg, pp. 544–558.
Moreira, A., Chitchyan, R., Araújo, J., Rashid, A., 2013. Aspect-Oriented Requirements
Engineering. Springer.
Passos, L., Czarnecki, K., 2014. A dataset of feature additions and feature removals
from the Linux kernel. In: Proceedings of the Eleventh Working Conference on
Mining Software Repositories – MSR 2014. New York, USA, pp. 376–379.
Ramasubbu, N., Kemerer, C.F., 2014. Managing technical debt in enterprise software
packages. IEEE Trans. Softw. Eng. 40 (8), 758–772.
Sant’Anna, C., Figueiredo, E., Garcia, A., Lucena, C.J.P., 2007. On the modularity of
software architectures: a concern-driven measurement framework. In: Proceedings of the First European Conference on Software Architecture (ECSA). Madrid,
Spain, pp. 207–224.
Sant’Anna, C., Figueiredo, E., Garcia, A., Lucena, C., 2007. On the modularity assessment of software architectures: do my architectural concerns count? In: Proceedings of the First Workshop on Aspects in Architectural Description to be
held at Sixth International Conference on Aspect-Oriented Software Development. Vancouver, Canada.
Sant’Anna, C., Garcia, A., Chavez, C., Lucena, C., von Staa, A.V., 2003. On the reuse
and maintenance of aspect-oriented software: an assessment framework. In:
Proceedings of the Seventeenth Brazilian Symposium on Software Engineering.
Manaus, Brazil, pp. 19–34.
Schumacher, J., Zazworka, N., Shull, F., Seaman, C., Shaw, M., 2010. Building empirical support for automated code smell detection. In: Proceedings of the
ACM-IEEE International Symposium on Empirical Software Engineering and
Measurement (ESEM). New York, USA, p. 1.
Sincero, J., Schröder-Preikschat, W., 2008. The Linux kernel configurator as a feature modeling tool. In: Proceedings of the Software Product Line Conference,
pp. 257–260.
Sincero, J., Schirmeier, H., Schröder-Preikschat, W., Spinczyk, O., 2007. Is the Linux
kernel a software product line? In: Proceedings of the International Workshop
on Open Source Software and Product Lines, p. 30.
Sokal, R.R., Rohlf, F.J., 1994. Biometry: Principles and Practice of Statistics in Biological Research, third ed. W. H. Freeman.
van den Berg, K., Conejero, J.M., Chitchyan, R., 2005. AOSD Ontology 1.0 – Public
Ontology of Aspect-Orientation. AOSD-Europe.
Vetro’, A., Torchiano, M., Morisio, M., 2010. Assessing the precision of FindBugs by
mining Java projects developed at a university. In: Proceedings of the Seventh
IEEE Working Conference on Mining Software Repositories (MSR). Cape Town,
South Africa, pp. 110–113.
Wohlin, C., Runeson, P., Höst, M., Ohlsson, M.C., Regnell, B., Wesslén, A., 2000. Experimentation in Software Engineering: An Introduction. Kluwer Academic Publishers, Norwell, MA, USA.
Wong, S., Cai, Y., Kim, M., Dalton, M., 2011. Detecting software modularity violations. In: Proceedings of the Thirty-third International Conference on Software
Engineering (ICSE). Honolulu, USA, pp. 411–420.
XQuery 1.0: An XML Query Language. W3C Recommendation. [Online]. Available:
http://www.w3.org/TR/xquery/.
Zazworka, N., Seaman, C., Shull, F., 2011. Prioritizing design debt investment opportunities. In: Proceedings of the Second Working on Managing Technical Debt
(MTD). Honolulu, USA, pp. 39–42.
Zazworka, N., et al., 2014. Comparing four approaches for technical debt identification. Softw. Qual. Control 22 (3).
José María Conejero received his Ph.D. in Computer Science from Universidad de Extremadura in 2010. He is an Assistant Professor at Universidad
de Extremadura. He has authored more than 20 papers in journals and conference proceedings and has served as a program committee member for several journals and conferences. His research areas include Aspect-Oriented Software Development, Requirements
Engineering, Model-Driven Development, and Ambient Intelligence.
Roberto Rodríguez-Echeverría received his Ph.D. in Computer Science from Universidad de Extremadura in 2014. He is an Assistant Professor at
Universidad de Extremadura. He has authored more than 20 papers in journals and conference proceedings and has served as a program committee
member for several journals and conferences. His research areas include Web Engineering, Model-Driven Software Engineering,
Software Modernization, and End-User Development.
Juan Hernández received the B.Sc. in Mathematics from the University of Extremadura and the Ph.D. degree in Computer Science from the Technical University of Madrid. He is a Full Professor in the Quercus Software Engineering Group of the University of Extremadura (Spain). His research
interests include service-oriented computing, ambient intelligence, aspect orientation, and model-driven development. He is involved in several research projects on these subjects as principal investigator and senior researcher. He has published the results of his research in more than 100 papers
in international journals, conference proceedings, and book chapters, and has participated in many workshops and conferences as a speaker and program committee member. He is currently a member of the Spanish steering committee on Software Engineering and of the IEEE, and has organized several
workshops and international conferences.
Pedro J. Clemente is an Associate Professor in the Computer Science Department at the University of Extremadura (Spain). He received his B.Sc.
in Computer Science from the University of Extremadura in 1998 and a Ph.D. in Computer Science in 2007. He has published numerous peer-reviewed papers in international journals, workshops, and conferences. His research interests include component-based software development,
aspect orientation, service-oriented architectures, business process modeling, and model-driven development. He is involved in several research
projects and has participated in many workshops and conferences as a speaker and program committee member.
Carmen Ortiz-Caraballo holds a Ph.D. in Mathematics from the University of Seville (2011). She is an Assistant Professor of Mathematics at the
Escola d'Enginyeria d'Igualada of the Universitat Politècnica de Catalunya (Spain). She has published several peer-reviewed papers on harmonic
analysis in international journals, workshops, and conferences, and is involved in different research projects. Her research interests include harmonic
analysis and applied mathematics, areas in which she currently collaborates with different research groups.
Elena Jurado received the B.Sc. in Mathematics and the Ph.D. degree in Computer Science from the University of Extremadura (Spain) in 1985 and
2003, respectively. She has taught at the University of Extremadura since 1985 and is currently an Associate Professor. Her research
interests include ambient intelligence, multidimensional indexing, and content-based information retrieval. She has published the results of her
research in more than 20 papers in international journals and conference proceedings.
Fernando Sánchez-Figueroa received the Ph.D. degree from the University of Extremadura. He is currently a Professor in the Department of Computer
Science, University of Extremadura, Spain. He belongs to the Quercus Software Engineering Group, with his research focused on Web Engineering, Big Data Visualization, and Model-Driven Engineering. He is also the CEO of Homeria Open Solutions, a spin-off of the Universidad de
Extremadura.