This study applied the multi-objective choice function-great deluge hyper-heuristic (HHMO_CF_GDA) approach proposed by Maashi et al. [5]. HHMO_CF_GDA controls and combines three multi-objective evolutionary algorithms: the multi-objective genetic algorithm (MOGA) [22,23], the non-dominated sorting genetic algorithm (NSGAII) [24,25], and the strength Pareto evolutionary algorithm (SPEA2) [26], which act as low-level heuristics in the hyper-heuristic framework [27]. As a selection method, we utilized a choice function as a high-level strategy that adaptively ranks the performance of the three low-level heuristics and chooses which one to call at each decision point. Based on four performance metrics, namely algorithm effort (AE) [10], ratio of non-dominated individuals (RNI) [10], uniform distribution (UD) of a non-dominated population [24], and size of space covered (SSC; also known as S-metric or hypervolume) [28], we used an online learning mechanism to provide knowledge of the problem domain to the choice mechanism [5]. The great deluge algorithm (GDA) acts as an acceptance criterion that accepts or rejects the candidate solution at each iteration.
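The GDA acceptance step can be sketched as follows. In the multi-objective setting of [5] the water level is defined on a quality measure such as SSC; this sketch uses a generic scalar quality, and the names `level` and `rain` are illustrative, not taken from the original method description.

```python
def gda_accept(candidate_quality, level, rain):
    """Great deluge acceptance sketch (maximization): accept a
    candidate whose quality meets the current water level, then
    raise the level so the criterion tightens over time."""
    accepted = candidate_quality >= level
    new_level = level + rain  # linear level increase per call
    return accepted, new_level

# A candidate of quality 0.7 is accepted against a level of 0.5,
# and the level rises for the next iteration.
accepted, level = gda_accept(0.7, 0.5, 0.01)
```
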
3.1. Problem Description and Formulation
The purpose of software modularization is to form a number of subsystems by clustering the elements of the software system. The quality of a module clustering outcome can be evaluated in terms of cohesion and coupling. Cohesion measures how strongly the modules within a cluster are interrelated, whereas coupling is an inter-cluster measure of the edge dependencies between two given clusters. A good design keeps coupling between the different subsystems as low as possible and cohesion within each subsystem as high as possible [29]. Our goal is a high-quality partition with both low coupling and high cohesion, under the assumption that well-designed systems are formed by cohesive sets of modules that are loosely related to each other.
The multi-objective software module clustering problem can be described as grouping a specific set of modules into clusters according to given criteria [30]. The relationships among modules, which take the form of dependencies, drive the clustering of software modules. The idea is to minimize coupling between clusters and to maximize cohesion within each cluster. Clustering partitions the collection of all modules in the system; the collection of modules in each partition is a cluster. Finding the most suitable clustering for a given collection of modules is an NP-hard problem, making it well suited to search-based software engineering techniques [30]. The clustering problem is defined as follows:
“The set of n objects X = {x1, x2, …, xn} is to be grouped. Each object xi is to be clustered into non-overlapping groups O = {O1, O2, …, Ok}, where Oj is a cluster j, or a set of objects, and k is the number of clusters; O1 ∪ O2 ∪ … ∪ Ok = X, Oi ≠ ∅, and Oi ∩ Oj = ∅ for i ≠ j.”
Clustering provides insight into the properties of the groups rather than of the individuals within them. Recently, clustering methods have been used to support comprehension of software. The software module clustering problem can be established as a search problem with two components: the first is a representation of the problem suitable for exploration, and the second is a cost function, or fitness function, to estimate the quality of solutions [30].
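The partition conditions in the definition above (non-empty, pairwise-disjoint clusters whose union is the object set) can be checked directly; the function and variable names below are illustrative.

```python
def is_valid_clustering(objects, clusters):
    """Check the clustering definition: every cluster is non-empty,
    clusters are pairwise disjoint, and their union equals X."""
    seen = set()
    for c in clusters:
        if not c or seen & c:  # empty cluster, or overlap with a previous one
            return False
        seen |= c
    return seen == set(objects)

modules = ["m1", "m2", "m3", "m4"]
assert is_valid_clustering(modules, [{"m1", "m2"}, {"m3", "m4"}])
assert not is_valid_clustering(modules, [{"m1"}, {"m1", "m2"}])  # overlap
```
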
The software module optimization problem is to compute cohesion and coupling values and seek maximum cohesion and minimum coupling. We define the cohesion (intra-connectivity) Ai of cluster i with Ni components and μi intra-edge dependencies [31] as follows:

Ai = μi/Ni²
Coupling between the i-th cluster and the j-th cluster after the software modules are clustered is expressed by Ei,j [29]:

Ei,j = εi,j/(2NiNj) for i ≠ j, and Ei,j = 0 for i = j,

where εi,j is the number of inter-edge dependencies between clusters i and j.
We represent a system’s modularization quality (MQ) as a function that shows the trade-off between interconnectivity and intra-connectivity [31]. Given a module dependency graph partitioned into k clusters, where Ai is the intra-connectivity of the i-th cluster and Ei,j is the interconnectivity between the i-th and j-th clusters, we represent MQ as follows:

MQ = (1/k) Σi Ai − (1/(k(k − 1)/2)) Σi<j Ei,j for k > 1, and MQ = A1 for k = 1.
MQ establishes a trade-off between interconnectivity and intra-connectivity that rewards the creation of highly cohesive clusters and penalizes excessive inter-edge dependencies. This trade-off is achieved by subtracting the average interconnectivity from the average intra-connectivity. MQ is bounded between −1 (no cohesion inside the clusters) and 1 (no coupling among the clusters) [31]; see Table 1 for an explanation of the notation.
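The three quantities can be computed together from a module dependency graph. The sketch below assumes the classic definitions of intra-connectivity, inter-connectivity, and MQ from Mancoridis et al. [31]; the graph encoding (a directed edge list plus a module-to-cluster map) and all names are illustrative.

```python
def mq(edges, cluster_of, k):
    """Modularization quality sketch: Ai = mu_i/Ni^2,
    Eij = eps_ij/(2*Ni*Nj), MQ = avg(Ai) - avg(Eij)."""
    n = [0] * k                       # Ni: components per cluster
    for c in cluster_of.values():
        n[c] += 1
    mu = [0] * k                      # mu_i: intra-edges per cluster
    eps = {}                          # eps[(i, j)]: inter-edges, i < j
    for u, v in edges:
        i, j = cluster_of[u], cluster_of[v]
        if i == j:
            mu[i] += 1
        else:
            key = (min(i, j), max(i, j))
            eps[key] = eps.get(key, 0) + 1
    a = [mu[i] / n[i] ** 2 if n[i] else 0.0 for i in range(k)]  # Ai
    if k == 1:
        return a[0]
    e = [eps.get((i, j), 0) / (2 * n[i] * n[j])                 # Eij
         for i in range(k) for j in range(i + 1, k)]
    return sum(a) / k - sum(e) / (k * (k - 1) / 2)

# Two cohesive clusters with no inter-edges: MQ = (0.5 + 0.25)/2 = 0.375
edges = [("m1", "m2"), ("m2", "m1"), ("m3", "m4")]
clusters = {"m1": 0, "m2": 0, "m3": 1, "m4": 1}
```
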
In this study, we applied HHMO_CF_GDA [5] to solve our multi-objective module clustering problem. The HHMO_CF_GDA framework proposed by Maashi et al. [5,11,32] is shown in Figure 2. The high-level strategy can be a learning mechanism or a meta-heuristic [33]. Its task is to guide the search effectively and adapt based on the success or failure of the low-level heuristic components during the search process, so that the method can be reused to solve several problems [34]. Low-level heuristics are the problem domain-specific elements of the hyper-heuristic framework; therefore, they can access any related information, such as candidate solutions. Consequently, the high-level strategy does not change, while the low-level heuristics and the evaluation function need to be modified when tackling a new problem [33]. In the HHMO_CF_GDA framework, there is a clear separation between the high-level hyper-heuristic method and the low-level heuristic components. The idea of the domain barrier is to provide a higher level of abstraction for hyper-heuristics and to raise their generality, allowing application to a new problem without changing the framework. The domain barrier passes only problem-independent information, such as fitness, cost, and penalty values [35]; it does not allow any problem-specific information to reach the high-level strategy during the search process. The framework is designed in a modular manner, making it highly flexible and its components easily replaceable and reusable. The multi-objective choice function-great deluge hyper-heuristic (HHMO_CF_GDA) controls and combines the strengths of three multi-objective evolutionary algorithms (NSGAII, SPEA2, and MOGA), which are used as low-level heuristics. As a selection method, the choice function, used as a high-level strategy, adaptively ranks the performance of the three low-level heuristics and chooses which one to call at each decision point. The great deluge algorithm (GDA) is employed as a move acceptance criterion based on four performance metrics: AE [10], RNI [10], SSC [28], and UD [24]. An online learning mechanism is used to obtain knowledge of the problem domain for the selection mechanism [5].
We propose a module clustering optimization framework as shown in Figure 1. HHMO_CF_GDA was used to solve the software module clustering problem in order to find optimal or near-optimal solutions that meet the three objectives (low coupling, high cohesion, and high modularization quality). As shown in Figure 1, the module clustering optimization framework consists of two phases. Phase 1 involves building the matrix. In this phase, the software module matrix is constructed based on the total number of clusters, the number of components in each cluster, and the number of edges between them. The output of this step is an initial solution with an appropriate representation based on the total numbers of components, clusters, and edges. The output of phase 1 acts as input to the optimization process in phase 2, in which we apply the HHMO_CF_GDA optimizer to solve the multi-objective module clustering optimization problem. The resulting optimized solution is rendered as a diagram to improve understandability for the decision maker.
The pseudocode of the multi-objective module clustering optimization framework based on HHMO_CF_GDA is shown in Algorithm 1. The SMC framework consists of two phases. The first phase aims to build the input matrix using software information from the decision maker, including the input values, the total number of clusters, the total number of components in each cluster, and the total number of edges. Then, the number of edges between each pair of nodes is counted until the total number of nodes is reached. After determining each cluster with its edges and components, we calculate the cohesion, coupling, and initial MQ values. These values are then fed to the optimization phase and act as inputs to the HHMO_CF_GDA method. In the second phase, HHMO_CF_GDA is run to solve the SMC problem and obtain the optimized solution.
Algorithm 1 Module Cluster Optimization
Input: total number of clusters, total number of components in each cluster, and total number of edges
Result: optimized objective values
1. Phase 1: Build matrix
1.1 Sort each cluster with its edges and components
1.2 Calculate the objectives’ values and send them as initial values
2. Phase 2: Optimization process
2.1 Apply the optimization process (HHMO_CF_GDA)
2.2 Obtain the optimized objectives
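Phase 1 can be sketched as building the module adjacency matrix that seeds the optimizer; the function name and the edge-list encoding below are illustrative, not part of the original framework description.

```python
def build_matrix(num_modules, edges):
    """Phase 1 sketch: build the module dependency (adjacency)
    matrix from a directed edge list; entry m[u][v] = 1 records
    a dependency of module u on module v."""
    m = [[0] * num_modules for _ in range(num_modules)]
    for u, v in edges:
        m[u][v] = 1
    return m

# Three modules with dependencies 0 -> 1 and 1 -> 2
adj = build_matrix(3, [(0, 1), (1, 2)])
```
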
HHMO_CF_GDA acts as the optimizer in phase 2. The pseudocode of HHMO_CF_GDA [5] is reprinted in Algorithm 2. HHMO_CF_GDA combines three multi-objective meta-heuristics as low-level heuristics for solving a multi-objective optimization problem, which in our case is the module clustering optimization problem. HHMO_CF_GDA performs a fixed number of iterations. Initially, all low-level heuristics are run for a fixed number of function evaluations with the same population size and number of generations. Then, all low-level heuristics are ranked with respect to the performance metrics (AE, RNI, SSC, and UD). The low-level heuristic with the best performance is selected to execute in the next iteration. In each iteration, one low-level heuristic is run and then the ranking of all low-level heuristics is updated. This process is repeated until the stopping criteria are met. The choice function provides a balance between intensification and diversification: it addresses the trade-off between exploring undiscovered areas of the search space and exploiting the past performance of each heuristic. The heuristic with the best performance is chosen more frequently to exploit the search area, which boosts the intensification element. The time element in the choice function boosts diversification: a low-level heuristic that has not been executed for a long period of time is recalled to explore unvisited areas of the search space. See [5] for more details on how HHMO_CF_GDA works.
Algorithm 2 HHMO_CF_GDA
Input: HHMO_CF_GDA(H, F), where H = set of low-level heuristics and F = set of performance metrics
Result: optimized result
1. Initialization (take initial objective values)
1.1 Run all members of H
1.2 For each member of H, get the values of all members of F
1.3 Rank all members of H based on the ranking scheme
1.4 Compute the choice function value for each member of H
1.5 Select the member of H with the largest choice function value as the initial heuristic
2. Repeat
2.1 Execute the selected member of H
2.2 Get the values of all members of F for the selected member of H
2.3 Update the rank of all members of H based on the ranking scheme
2.4 Update the choice function values for all members of H
2.5 Select the member of H with the largest choice function value for the next iteration
3. Until the stopping condition is met
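The selection loop of Algorithm 2 can be sketched as follows. This is a simplified scalar stand-in for the actual choice function of [5] (which ranks heuristics over four metrics): a performance term rewards the recently best heuristic (intensification), while an elapsed-time term eventually recalls neglected heuristics (diversification). The names `score`, `alpha`, and the linear weighting are illustrative assumptions.

```python
def choice_hyper_heuristic(heuristics, score, iterations, alpha=0.5):
    """Sketch of the high-level selection loop: each iteration,
    pick the heuristic maximizing performance + alpha * time
    since its last call, execute it, and update its record.
    score(h) runs heuristic h once and returns a scalar quality."""
    perf = {h: score(h) for h in heuristics}  # step 1: run all members
    last = {h: 0 for h in heuristics}         # iteration of last call
    trace = []                                # order of selections
    for t in range(1, iterations + 1):
        # simplified choice function: intensification + diversification
        cf = {h: perf[h] + alpha * (t - last[h]) for h in heuristics}
        best = max(cf, key=cf.get)            # select ...
        perf[best] = score(best)              # ... and execute
        last[best] = t
        trace.append(best)
    return trace

# "A" dominates early, but the time term eventually recalls "B".
trace = choice_hyper_heuristic(["A", "B"], lambda h: {"A": 1.0, "B": 0.1}[h], 3)
```

With the dummy scores above, the strong heuristic is exploited first and the weak one is still revisited once its idle time outweighs the performance gap, which is exactly the intensification/diversification balance described in the text.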