The results are split into two subsections. The first subsection ascertains the performance of environmental state prediction across a variety of machine and deep learning algorithms, whereas the second determines the feasibility of utilising the Genetic Algorithm and Gradient Descent RL approaches for optimal configuration suggestion. The datasets are further split into three smaller subsets: training, testing and validation, comprising , and of the dataset size, respectively. The training subset is used to learn the model weights, the testing subset serves as unseen data instances to determine the accuracy of each model, and the validation subset is used to fine-tune the machine learning hyperparameters for a more accurate model.
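The exact split proportions are not restated here; the sketch below, assuming illustrative 70/20/10 proportions and scikit-learn's train_test_split, shows how such a three-way partition can be produced.

```python
# Minimal sketch of a three-way split (illustrative 70/20/10 proportions,
# not necessarily the exact ratios used in this work).
from sklearn.model_selection import train_test_split

def split_dataset(X, y, train=0.7, test=0.2, val=0.1, seed=42):
    """Partition the dataset into training, testing and validation subsets."""
    X_train, X_rest, y_train, y_rest = train_test_split(
        X, y, train_size=train, random_state=seed)
    # Split the remainder between the testing and validation subsets.
    test_share = test / (test + val)
    X_test, X_val, y_test, y_val = train_test_split(
        X_rest, y_rest, train_size=test_share, random_state=seed)
    return (X_train, y_train), (X_test, y_test), (X_val, y_val)
```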
5.1. Prediction of Environmental States
The prediction of intermediary environmental states has a direct impact on the utility of the RL algorithm, as it enables the RL algorithm to learn from different intermediary states through simulation.
The accuracies of the machine learning models used to predict the single and multiple performance metrics are presented in Table 4 with respect to the testing data subsets. The majority of the algorithms performed exceptionally well, obtaining accuracies of approximately 99% or higher, with the exception of Gradient Boosting and Linear Regression. This shows that the dataset can be predicted with a variety of models without the fear of overfitting. Although decision trees performed exceptionally well on the single-metric prediction, the ensemble of decision trees curated by Gradient Boosting under-performed. However, in the multi-metric prediction, the Gradient Boosting approach managed to obtain a relatively high accuracy, in line with the other approaches. It is worth noting that the Perceptron obtained a very high accuracy in all the single-metric predictions, which hints at the ability of a multi-layer perceptron, or a deeper model such as a DNN, to accurately predict the combination of metrics; this motivated the use of a DNN in this work.
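The comparison follows the usual fit-and-score pattern on the held-out testing subset; a minimal sketch is given below, where the estimator choices and hyperparameters are illustrative rather than the exact configurations used for Table 4.

```python
# Sketch of benchmarking several predictors on the testing subset
# (estimator choices are illustrative; the reported models may differ).
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import Perceptron
from sklearn.metrics import accuracy_score

def benchmark(models, X_train, y_train, X_test, y_test):
    """Fit each model on the training subset and report its testing accuracy."""
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return scores

models = {
    "decision_tree": DecisionTreeClassifier(),
    "gradient_boosting": GradientBoostingClassifier(),
    "perceptron": Perceptron(),
}
```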
The single performance metric prediction can be employed to create an ensemble method for combined metric prediction, which has been shown to provide a high accuracy, as seen from the Gradient Boosting approach. However, it may struggle to generalise as different UE devices and BSs are introduced into the state space. This could be especially troubling with forced configurations set by the BS.
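One hypothetical way to realise such an ensemble is to wrap one fitted predictor per performance metric and concatenate their outputs into a joint prediction; the class and metric names below are illustrative.

```python
# Hypothetical sketch: combine independent single-metric predictors into a
# joint (multi-metric) prediction by concatenating their outputs.
import numpy as np

class PerMetricEnsemble:
    """Wraps one fitted predictor per performance metric (e.g. delay,
    energy consumption, throughput) and predicts them jointly."""

    def __init__(self, per_metric_models):
        self.per_metric_models = per_metric_models  # dict: metric name -> fitted model

    def predict(self, X):
        # Stack each metric's prediction column-wise into one state vector.
        columns = [m.predict(X) for m in self.per_metric_models.values()]
        return np.column_stack(columns)
```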
Taking the multi-metric prediction further, a DNN is adopted; the accuracy of the DNN plotted against the iteration number for both data subsets is shown in Figure 3. The DNN demonstrates superior accuracy on both data subsets, surpassing the accuracies of the ML multi performance metric prediction algorithms, attaining an accuracy of 99.8% for , and 95% for , on the testing dataset. The DNN surpassed the highest ML approach by 3.9%, while the difference is starker for the , reaching 12%. Since the DNN exceeds the accuracy of all the models recorded previously for the prediction of multi performance metrics, it is adopted to facilitate learning for the RL model.
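A state predictor of this kind can be expressed as a small feed-forward network; the sketch below is a minimal PyTorch version in which the layer widths and depth are assumptions, not the architecture reported here.

```python
# Minimal PyTorch sketch of a DNN environmental-state predictor
# (layer sizes are illustrative, not the reported architecture).
import torch
import torch.nn as nn

class StatePredictor(nn.Module):
    def __init__(self, n_features, n_metrics, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_metrics),
        )

    def forward(self, x):
        # Predict the combined performance metrics for a given configuration/state.
        return self.net(x)
```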
5.2. Optimal Configuration Suggestion
The comparison of Gradient Descent and the Genetic Algorithm for configuration suggestion is explored through the SmoothL1Loss, the reward each model achieves for the performance metrics, the percentage of successful configurations, and the performance improvement of the suggested configuration over the default configuration. This gives a holistic evaluation of the configuration suggestion approach that focuses not only on the final model but also on its training performance.
The final RL model is also evaluated on the testing data for each data subset to indicate its performance; in the case of the Genetic Algorithm, the best individual is adopted as the model to be tested. The reward allocated to the model indicates the quality of the suggested configuration based on Equation (3), whereby a larger reward is proportional to the performance improvement on the UE.
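Equation (3) is not reproduced here; the following sketch only illustrates the stated property that the reward grows with the improvement of the suggested configuration over the default one, under an assumed normalised form that may differ from the actual equation.

```python
# Illustrative reward shape only: larger reward for larger improvement of the
# suggested configuration over the default one (Equation (3) itself may differ).
def reward(default_metric, suggested_metric, higher_is_better=True):
    denom = abs(default_metric) or 1.0      # guard against a zero baseline
    if higher_is_better:
        improvement = (suggested_metric - default_metric) / denom
    else:                                   # e.g. delay or energy, where lower is better
        improvement = (default_metric - suggested_metric) / denom
    return max(0.0, min(1.0, improvement))  # clamp so rewards are comparable across metrics
```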
Table 5 depicts the SmoothL1Loss obtained for each model, with the iteration number of its convergence for each data subset. It is immediately notable that the  converged at a much lower number of iterations than its counterpart, which can be attributed to the inequality of the subset sizes. Nevertheless, the convergence of the models is consistent, reaching approximately the same optimum for all the performance metrics.
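The SmoothL1Loss referred to here is the standard PyTorch criterion; a minimal sketch of how it drives one gradient-descent update is shown below, where the placeholder network, optimiser and learning rate are assumptions.

```python
# Sketch of a SmoothL1Loss-driven gradient-descent update
# (placeholder network; optimiser and learning rate are illustrative).
import torch
import torch.nn as nn

model = nn.Linear(8, 3)                 # placeholder; the actual RL model differs
criterion = nn.SmoothL1Loss()           # Huber-like: quadratic near zero, linear in the tails
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

def training_step(states, targets):
    """One gradient-descent update driven by the SmoothL1Loss."""
    optimizer.zero_grad()
    loss = criterion(model(states), targets)
    loss.backward()
    optimizer.step()
    return loss.item()
```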
The loss attained by the RL Genetic Algorithm against the number of generations is presented in Table 6 for both data subsets, covering all the metrics. An immediate inspection of the loss shows that the Genetic Algorithm obtained a much higher loss than its Gradient Descent counterpart, by a factor of . The number of generations the Genetic Algorithm needed to converge is somewhat misleading, as each generation contained a population of agents placed in the simulated environment to learn, which roughly equates to  instances in the Gradient Descent approach. The best loss had a minimum of , which is given by the  delay, though by a negligible margin. It can be seen that the  had a more consistent loss at convergence for all the metrics compared to , which fluctuates around the converged loss; this hints at the stability of values in  compared to , which may make the  a more attractive option for deploying the RL model.
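The generation-versus-iteration comparison follows from the structure of a population-based search: each generation evaluates every individual in the simulated environment. A simplified sketch is given below, where the individuals are flat weight vectors for the policy network and the population size, selection and mutation scheme are all assumptions.

```python
# Simplified Genetic Algorithm sketch: each individual is a flat weight vector,
# and one generation evaluates the whole population in the simulated environment,
# roughly equating to pop_size gradient-descent instances per generation.
# Population size, mutation rate and selection scheme are illustrative.
import numpy as np

def evolve(fitness_fn, n_weights, pop_size=50, generations=100,
           elite=5, mutation_std=0.02, seed=0):
    rng = np.random.default_rng(seed)
    population = rng.normal(0.0, 0.1, size=(pop_size, n_weights))
    for _ in range(generations):
        fitness = np.array([fitness_fn(ind) for ind in population])
        order = np.argsort(fitness)[::-1]          # best individuals first
        parents = population[order[:elite]]
        # Offspring: crossover of two random parents plus Gaussian mutation.
        children = []
        for _ in range(pop_size - elite):
            a, b = parents[rng.integers(elite, size=2)]
            mask = rng.random(n_weights) < 0.5
            child = np.where(mask, a, b) + rng.normal(0, mutation_std, n_weights)
            children.append(child)
        population = np.vstack([parents, np.array(children)])
    best = population[np.argmax([fitness_fn(ind) for ind in population])]
    return best
```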
Figure 4 and Figure 5 show the Empirical Cumulative Distribution Function (ECDF) of the rewards obtained by the most optimal model with respect to the performance metrics for both data subsets. The delay metric reward resembles a step function, reaching a reward of 1 about  of the time, as observed in Figure 4b, suggesting that the model has found the optimal set of actions to undertake. Contrarily, the energy consumption metric displays a continuous-like function: the model performed well, obtaining a reward greater than  of the time, and there is a sharp incline in the rewards obtained between the range of  to , accounting for  of the rewards obtained. As for the multi performance metrics, the ECDF indicates that  of the rewards stem from the range  to , while the other half of the rewards obtained a reward greater than . Figure 4a unsurprisingly shows the same reward structure, which is attributed to the metric representation in both subsets.
However, it is clear that the  performed better, as shown by the larger concentration of rewards located at the higher end of the reward spectrum. That being said, the throughput metric showed a reward of 0, suggesting that the model assigned every action incorrectly, even though it was evident that the model converged with approximately the same loss as the other metrics in the same subset. The  models outperformed the  models in every performance metric, consolidating the importance of the dataset size for the effectiveness and generalisation ability of the Gradient Descent models. It is intriguing to note that the rewards obtained by the RL Genetic Algorithm (Figure 5) are almost identical to the rewards obtained by the RL Gradient Descent method in Figure 4. In this regard, the RL Genetic Algorithm resulted in a more optimal model, mimicking the rewards achieved by the Gradient Descent approach while achieving a better performance for the throughput metric, as seen in Figure 5a, where roughly 70% of the configuration suggestions resulted in the highest reward.
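For reference, the ECDF curves in Figures 4 and 5 follow the standard construction of sorting the per-suggestion rewards and plotting the cumulative fraction; the sketch below uses placeholder reward values purely for illustration.

```python
# Sketch of the empirical CDF of per-suggestion rewards, as plotted in
# Figures 4 and 5 (reward values below are placeholders).
import numpy as np
import matplotlib.pyplot as plt

def plot_ecdf(rewards, label):
    x = np.sort(np.asarray(rewards))
    y = np.arange(1, len(x) + 1) / len(x)     # fraction of rewards <= x
    plt.step(x, y, where="post", label=label)

plot_ecdf(np.random.default_rng(0).uniform(0, 1, 500), "example rewards")
plt.xlabel("Reward")
plt.ylabel("ECDF")
plt.legend()
plt.show()
```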
Finally, the accuracy of the RL models, together with the performance improvement percentage, is shown in Table 7 with respect to the testing data subset. The performance difference between Gradient Descent and the Genetic Algorithm is almost negligible concerning , although the accuracy of successful suggestions leans towards the Genetic Algorithm approach by a minuscule factor of 0.2% if the sum of the average performance improvement is taken as the deciding factor. That being said, the actual metric improvement is dominated by Gradient Descent for both data subsets, by an average of 3.6%. On the other hand, the percentage of successful suggestions was superior for the Genetic Algorithm, with a difference of 10%. What is intriguing in this set of results is the comparable performance of the two approaches despite the adoption of a simpler DNN architecture for the Genetic Algorithm. This can pave the way for the creation of RL architectures with lower computational complexity without sacrificing model accuracy.
Therefore, gathering the results from the optimal configuration suggestion, with a special focus on , since the models performed better on that data subset, the Genetic Algorithm approach outperformed the Gradient Descent approach. This is a combination of the increased convergence speed, the performance improvement of the metrics compared to the default configuration, and the number of successful suggestions. Although the Gradient Descent approach obtained better rewards during the training phase, the Genetic Algorithm approach obtained consistent performance in terms of successful suggestions, coupled with the percentage metric improvement between the default configuration and the configuration it suggested.