4.1. CLP Data Preparation and Feature Selection
As part of the data preparation process, the CLP value was normalized using Equation (2). After normalization, the CLP value lies between 0 and 1, and a greater CLP value indicates better labor productivity for the project. After imputing missing values and removing factors with zero standard deviation, the number of factors was reduced to 108. Outlier elimination removed 7 data points from the CLP data set, leaving 85. Therefore, the CLP data set after the preparation process had 85 data points, 108 CLP factors, and a CLP value.
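The two mechanical steps of this preparation can be sketched as follows. Equation (2) is not reproduced in this section, so the sketch assumes ordinary min-max scaling, which is consistent with the normalized values lying between 0 and 1. The factor names and numbers are hypothetical toy data, and the function names are illustrative only.

```python
from statistics import pstdev

def min_max_normalize(values):
    """Scale a list of numbers linearly onto [0, 1] (assumed form of Equation (2))."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot normalize a constant column")
    return [(v - lo) / (hi - lo) for v in values]

def drop_constant_factors(table):
    """Keep only the factors (columns) whose standard deviation is non-zero."""
    return {name: col for name, col in table.items() if pstdev(col) > 0}

# Hypothetical toy data: three factors observed on four data points.
factors = {
    "crew_size": [4.0, 6.0, 5.0, 8.0],
    "shift_length": [8.0, 8.0, 8.0, 8.0],    # zero standard deviation
    "temperature": [12.0, 20.0, 16.0, 28.0],
}

kept = drop_constant_factors(factors)
normalized = {name: min_max_normalize(col) for name, col in kept.items()}
```

Imputation and outlier removal are left out because the section does not state which methods were used for them.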
Next, the number of features was reduced by the proposed HFS method. For this study, a threshold of 0.25 was defined for ReliefF. All features with weights greater than or equal to 0.25 were passed as essential features to the next HFS stage. From the 108 factors in the final CLP data set, ReliefF selected 43 as essential features. In the next stage of HFS, which integrates SVM and GA as a wrapper method, the GA parameter settings were a population size of 50, a maximum of 60 iterations, a crossover rate of 0.83, and a mutation rate of 0.2. The SVM penalty factor C was 10, the kernel type was RBF, and the kernel cache was 200. These parameters were obtained by trial and error and are the optimum values for this case. The termination criteria were a maximum of 60 generations or no improvement in performance over 5 generations. The proposed wrapper method was developed with these parameters, and it selected 14 of the 43 factors identified by ReliefF. This set of 14 factors was the one for which the wrapper model achieved its lowest RMSE.
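The filter-then-wrapper flow and the two termination criteria can be illustrated with a minimal sketch. It makes several assumptions: the `fitness` callable stands in for the RMSE of an SVM trained on the masked features (no SVM is trained here), selection is by binary tournament, crossover is single-point, and the mutation rate is read as the probability of flipping one bit per child. None of these operator choices is stated in the text.

```python
import random

def select_by_threshold(weights, threshold=0.25):
    """Filter stage: keep features whose ReliefF weight is >= threshold."""
    return [name for name, w in weights.items() if w >= threshold]

def ga_wrapper(n_features, fitness, pop_size=50, max_gen=60,
               crossover_rate=0.83, mutation_rate=0.2, patience=5, seed=0):
    """Search for the 0/1 feature mask with the lowest fitness value.

    Stops after max_gen generations, or earlier when the best score has
    not improved for `patience` consecutive generations.
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]
    best = min(pop, key=fitness)
    best_score, stale = fitness(best), 0

    def tournament():
        # Binary tournament: the better of two random individuals wins.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) <= fitness(b) else b

    for _ in range(max_gen):
        children = []
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            if rng.random() < crossover_rate:       # single-point crossover
                cut = rng.randrange(1, n_features)
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            if rng.random() < mutation_rate:        # flip one random bit
                i = rng.randrange(n_features)
                child[i] = 1 - child[i]
            children.append(child)
        pop = children
        gen_best = min(pop, key=fitness)
        if fitness(gen_best) < best_score:
            best, best_score, stale = gen_best, fitness(gen_best), 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best[:], best_score

# Hypothetical stand-in for the wrapper's RMSE: the number of
# disagreements with an arbitrary "useful" mask.
useful = [1, 0, 1, 1, 0, 0, 1, 0]

def toy_fitness(mask):
    return sum(m != u for m, u in zip(mask, useful))

mask, score = ga_wrapper(len(useful), toy_fitness)
```

In the study's setting, `n_features` would be the 43 factors kept by ReliefF and `fitness` would train and evaluate the SVM on the masked factors.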
Table 1 presents the selected CLP factors resulting from HFS. As shown in Table 1, the first 11 factors are all from the activity level, and the next three factors belong to the project level, which shows the significant impact of activity-level factors on predicting CLP. Among the selected factors, “Level of interruption and disruption”, “Complexity of task”, “Working condition (dust and fumes)”, “Location of work scope (elevation)” and “Congestion of work area” negatively influence CLP. In other words, after normalization, values of these factors close to 0 result in greater CLP than values close to 1. The other selected factors influence CLP positively: values close to 1 result in greater CLP.
4.2. CLP Modeling Comparison and Results
To develop the predictive CLP model, four different AI models were developed using the selected factors from HFS as input variables and CLP as the output. The accuracy of the four models was measured by comparing their predictions to the actual field data and calculating two commonly used error measures, mean absolute error (MAE) and RMSE, which are shown in Equations (12) and (13) and are calculated from the actual and predicted CLP values for each instance i, where m is the number of instances. For this purpose, the data were divided into training and testing data sets, with 70% of the data used for training and 30% for testing.
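The two error measures and the 70/30 split can be written out as a short sketch. MAE and RMSE follow their standard definitions. How the data points were assigned to the two sets is not stated, so the random shuffle, the seed, and the truncation used to size the training set are assumptions.

```python
import math
import random

def mae(actual, predicted):
    """Mean absolute error over m instances."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error over m instances."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))

def train_test_split(n_points, train_fraction=0.7, seed=0):
    """Return shuffled index lists for the training and testing sets."""
    indices = list(range(n_points))
    random.Random(seed).shuffle(indices)
    n_train = int(n_points * train_fraction)   # truncation is an assumption
    return indices[:n_train], indices[n_train:]

train_idx, test_idx = train_test_split(85)
```

With 85 data points, this particular sizing rule yields 59 training and 26 testing points; a different rounding rule would shift one point between the sets.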
To develop the ANN model, a multilayer feedforward back-propagation network with two hidden layers of sizes 5 and 6 was built using the MATLAB NN Toolbox. The learning rate was set to 0.33, and 200 training cycles were performed. The ANN model resulted in an RMSE of 0.164 and MAE of 0.130 for the training data set, and an RMSE of 0.165 and MAE of 0.135 for the testing data set.
The ANFIS model was generated using the ANFIS function of the MATLAB Fuzzy Logic Toolbox. The basic learning rules for optimizing membership functions in ANFIS are either hybrid learning or back-propagation gradient descent. Hybrid learning combines the gradient descent and least squares methods, and it overcomes the major limitation of the back-propagation method, which is that the learning process can get trapped in local minima. Therefore, this study used the hybrid learning method. The training data set was grouped using subtractive clustering with an influence range of 0.4, squash factor of 1.15, and accept and reject ratios set at 0.5 and 1.15, respectively. The selected CLP factors were used as input variables and CLP as the output of ANFIS. The ANFIS model resulted in an RMSE of 0.042 and MAE of 0.034 for the training data set, and an RMSE of 0.176 and MAE of 0.138 for the testing data set.
The ANFIS-GA model, developed using MATLAB, uses GA to optimize the ANFIS parameters, and it showed better performance than ANFIS alone. In this study, the values of 0.2, 0.83, and 60 were assigned to the mutation rate, crossover percentage, and maximum iteration of GA, respectively. These parameters were obtained by trial and error and are the optimum values for this case. Different population sizes were tested, and based on the results shown in Table 2, the ANFIS-GA model with a population size of 25 had the best testing performance, with an MAE of 0.096 for the training data set and an MAE of 0.129 for the testing data set. Therefore, a population size of 25 was used in this study.
The RF model was developed in Python and required three parameters, namely the minimum number of terminal nodes for each tree, the number of trees, and the number of randomly selected variables to grow the trees [62]. In this study, these three parameters were set to 5, 145, and 6, respectively. The results of the RF prediction model are listed in Table 3 along with results of the ANN, ANFIS, and ANFIS-GA models for comparison.
The results presented in Table 3 indicate that the RF model had the highest accuracy among the four predictive models, with an RMSE of 0.137 and MAE of 0.112 in the testing data set. Ranked by testing RMSE, the second most accurate algorithm was the ANN model, with an RMSE of 0.165 and MAE of 0.135. The third most accurate algorithm was the combination of ANFIS and GA, with an RMSE of 0.172 and MAE of 0.129 in the testing data set. Finally, the ANFIS model was the least accurate, with a testing RMSE of 0.176 and MAE of 0.138.
According to the RMSE value of 0.137 for the RF testing data set, CLP predicted by RF was closer to the actual CLP values than CLP predicted by the other three models. In other words, RF was found to be better than ANN, ANFIS, and ANFIS-GA in mapping the relationship between the selected CLP factors and CLP. Moreover, the closeness of the RMSE values for the training and testing data sets indicates that ANN and RF were more stable than ANFIS and ANFIS-GA. Therefore, the RF model was selected to predict CLP in the optimization process for this study. Comparing the results of this study with past studies indicates that the RF predictive model has better performance. For example, Gerami Seresht et al. [39] obtained an RMSE value of 0.22 for their proposed CLP predictive model, while in this study, using the same data set, the RMSE value of the RF model was 0.137. Therefore, the proposed CLP predictive model achieved better accuracy in CLP prediction than that of Gerami Seresht et al. [39].
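The ranking and stability arguments can be checked directly from the figures quoted in this section. The sketch below uses only those reported values. Training RMSE values are quoted here for ANN and ANFIS only, so the train-test gap is computed for those two models; the variable names are illustrative.

```python
# Testing-set errors as reported in this section.
testing_errors = {
    "RF": {"rmse": 0.137, "mae": 0.112},
    "ANN": {"rmse": 0.165, "mae": 0.135},
    "ANFIS-GA": {"rmse": 0.172, "mae": 0.129},
    "ANFIS": {"rmse": 0.176, "mae": 0.138},
}

# Training-set RMSE values quoted in the text (ANN and ANFIS only).
training_rmse = {"ANN": 0.164, "ANFIS": 0.042}

# Rank the models from most to least accurate under each measure.
by_rmse = sorted(testing_errors, key=lambda m: testing_errors[m]["rmse"])
by_mae = sorted(testing_errors, key=lambda m: testing_errors[m]["mae"])

# Gap between testing and training RMSE: a small gap suggests stability.
rmse_gap = {
    model: round(testing_errors[model]["rmse"] - train, 3)
    for model, train in training_rmse.items()
}
```

RF is first under both measures, while ANN and ANFIS-GA swap places when MAE is used instead of RMSE. The ANFIS gap of 0.134 against 0.001 for ANN reflects the much lower training error of ANFIS.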
4.3. CLP Optimization Results
Next, the integration of RF and PSO was developed to achieve the optimum value of the selected factors and maximum CLP value, according to the objective function in Equation (11). For this case study, the average value of each factor and of CLP after normalization are shown in Table 4, and the average CLP value for the data set is 0.259.
To illustrate the CLP improvement trend, a sensitivity analysis was carried out to show how different values of the input parameters (namely, the weight of Goal 1 and the targeted CLP) influence the output variables (the predicted CLP and Z). Table 5 shows the results of the sensitivity analysis, giving the values of Z and predicted CLP as outputs for different values of the weight of Goal 1 and the targeted CLP as inputs of the RF-PSO model. The weight of Goal 1 was varied between 0.27 and 1; a weight of 1 is the largest possible value and indicates that Goal 2 has no impact on the model. The targeted CLP was in the range of 0.45 to 1, where 1 is the largest possible value for CLP resulting from the normalization process.
Figure 3 is based on the results in Table 5 and shows the predicted CLP for different values of the weight of Goal 1 and the targeted CLP. For a specific targeted CLP, increasing the weight of Goal 1 increases the predicted CLP, which shows the model's sensitivity to this weight, that is, to the relative importance of Goal 1. For targeted CLP values up to 0.6, the changes in the predicted CLP are much smaller once the weight of Goal 1 is greater than 0.4. So, it can be concluded that when the targeted CLP is less than or equal to 0.6, the most appropriate weight of Goal 1 is less than or equal to 0.4. This means that the minimum deviation of the predicted values of the CLP factors from their average values in the dataset, which is Goal 2 in Equation (10), has more weight than the minimum deviation of the predicted CLP from the targeted CLP, which is Goal 1 in Equation (9).
For selecting the most appropriate weight and targeted CLP, a company’s preference is important. Most companies prefer minimum deviation from “average value of factors”, which is feasible to reach, helps them decrease the number of corrective measures that are required, and thus reduces the cost of implementing corrective measures. Based on this, Goal 2 needs to have more weight compared to Goal 1, which leads to selecting a value of less than or equal to 0.5 as a weight of Goal 1.
For this case study, a targeted CLP of 0.75 and a Goal 1 weight of 0.27 were selected. Equation (14) gives the objective function of the HFS-RF-PSO algorithm for the selected factors. In the presented algorithm, the settings were number of particles = 50, maximum number of iterations = 30, and maximum velocity = 2, and the two learning factors were both set to 2.05. The initial values of the parameters were established on the basis of the relevant literature [66]. A large number of trials were then performed to obtain the optimum values for this case.
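A minimal PSO sketch with these settings is given below. It rests on several assumptions that go beyond the text. Equations (9) to (11) and (14) are not reproduced in this section, so the objective is written as a weighted sum of squared deviations, with Goal 2 weighted by one minus the Goal 1 weight (inferred from the remark that a Goal 1 weight of 1 removes Goal 2). A Clerc-Kennedy constriction coefficient of 0.7298 is assumed because it is commonly paired with learning factors of 2.05. `predict_clp` is a hypothetical stand-in for the trained RF model, and the factor averages are invented toy values.

```python
import random

def make_objective(predict_clp, averages, clp_target=0.75, w_goal1=0.27):
    """Build Z(x): assumed weighted sum of the two goal deviations."""
    def z(x):
        goal1 = (clp_target - predict_clp(x)) ** 2                   # CLP vs target
        goal2 = sum((xi - ai) ** 2 for xi, ai in zip(x, averages))   # factors vs averages
        return w_goal1 * goal1 + (1.0 - w_goal1) * goal2
    return z

def pso_minimize(objective, dim, n_particles=50, max_iter=30, v_max=2.0,
                 c1=2.05, c2=2.05, chi=0.7298, seed=0):
    """Minimize `objective` over the unit hypercube [0, 1]^dim."""
    rng = random.Random(seed)
    pos = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(max_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v = chi * (vel[i][d]
                           + c1 * r1 * (pbest[i][d] - pos[i][d])
                           + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-v_max, min(v_max, v))     # velocity clamp
                # Normalized factors stay within [0, 1].
                pos[i][d] = max(0.0, min(1.0, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# Hypothetical stand-in for the RF predictor and toy factor averages.
def predict_clp(x):
    return sum(x) / len(x)

averages = [0.3, 0.5, 0.7]
objective = make_objective(predict_clp, averages)
best_x, best_z = pso_minimize(objective, dim=len(averages))
```

In the study's setting, the search space would have 14 dimensions (one per selected factor) and `predict_clp` would call the trained RF model.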
Based on the selected inputs, the RF-PSO model yielded a minimum value of 0.057 for Z, the objective function in Equation (14), and a maximum predicted CLP of 0.522. The optimum value of each factor is shown in Table 6.
For each factor, the RF-PSO model yields the optimum value (the predicted value of the CLP factor), from which the deviation of the optimum value from the factor's average value is computed. This deviation was obtained using Equation (15), that is, as the optimum value minus the average value.
As shown in Table 6, the optimum values of the factors “Ground condition,” “Crisis management,” and “Risk monitoring and control” have the smallest deviations from the average values of the selected factors in the dataset, at 0.004, −0.005, and 0.007, respectively. Therefore, these factors do not need major changes to achieve the optimum CLP value, which is 0.522. It is notable in Table 6 that the optimum values of “Level of interruption and disruption,” “Working condition (dust and fumes),” and “Fairness in performance review of crew by foreman” have the largest deviations from the average values of the factors, at −0.119, −0.110, and 0.114, respectively. In other words, “Level of interruption and disruption” needs to be reduced to 0.043, “Working condition (dust and fumes)” needs to be reduced to 0.108, and “Fairness in performance review of crew by foreman” needs to increase to 0.808 in order to obtain the optimum CLP value. Improving factors with high deviation helps companies reach the optimum predicted CLP. To improve factors that have a high deviation from their average value, a number of improvement strategies and corrective measures can be implemented. For example, to reduce dust and fumes in the working area, preventative maintenance of the air-conditioning system can be conducted.
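The prioritization described above can be sketched by sorting factors on the absolute size of their deviation. The sketch uses only the six deviations and three optimum values quoted in this paragraph. The implied averages are backed out as optimum minus deviation from these rounded figures, so they are approximations rather than the values of Table 4.

```python
# Deviations (optimum value minus average value) quoted in the text.
deviations = {
    "Ground condition": 0.004,
    "Crisis management": -0.005,
    "Risk monitoring and control": 0.007,
    "Level of interruption and disruption": -0.119,
    "Working condition (dust and fumes)": -0.110,
    "Fairness in performance review of crew by foreman": 0.114,
}

# Optimum values quoted in the text for the three largest deviations.
optimum = {
    "Level of interruption and disruption": 0.043,
    "Working condition (dust and fumes)": 0.108,
    "Fairness in performance review of crew by foreman": 0.808,
}

# Factors needing the largest change come first.
priority = sorted(deviations, key=lambda f: abs(deviations[f]), reverse=True)

# Average implied by the quoted figures: average = optimum - deviation.
implied_average = {f: round(optimum[f] - deviations[f], 3) for f in optimum}
```

Sorting on the absolute deviation treats required increases and required reductions alike, which matches the way the text groups the three factors with the largest deviations.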
The proposed HFS-RF-PSO model has the potential to benefit construction companies in achieving their preferred labor productivity by applying the minimum changes to factors influencing CLP. Another capability of the proposed model is that companies can define their own targeted value for each factor influencing CLP instead of using the average value of the factors. The model then returns the predicted CLP and predicted factor values that have the minimum deviation from the targeted values for CLP and for each factor. This novel approach can help companies identify the factors that need the most change to achieve their targeted CLP and, consequently, prioritize the management practices that focus on the factors with the greatest deviation from the average value in the HFS-RF-PSO model.
The proposed HFS-RF-PSO model presented in this paper has a few limitations that need to be addressed in future research. First, the hybrid model was developed using field data collected for concrete placing activities. In order to develop a generic model of CLP for different types of labor-dependent activities, new field data need to be collected. Second, although the PSO algorithm is computationally efficient compared to other optimization techniques and is robust with respect to control parameters, it can fall into a local optimum in high-dimensional space. In future research, an adaptive PSO algorithm can be developed and added to the hybrid model to improve the diversity of the algorithm and avoid falling into a local optimum.