6.1. Experimental Settings
We conducted a comparative study on mobile traffic prediction across various models using the Telecom Italia dataset for Milan City. The models, including ARIMA, ST-ResNet, T-DenseNet, and CNN-RNN [13,19,20], encompass deep learning and traditional forecasting techniques.
For this study, we reformatted the INT, Milano Today, Social Pulse, and periodic data into four-dimensional arrays as described in Section 5. The input configuration spans a 15-by-15 area of the Milan Grid (225 grids), with each grid carrying six temporal data points per hour. The output array was defined to predict the network traffic of a central grid area within the original input grid, emphasizing city-center traffic forecasting. To reflect broader city-wide traffic, we then expanded the prediction scope to approximately a 27 × 27 grid area, covering the entire city center of Milan.
Figure 7a displays the network traffic for this expanded area, and our objective was to predict the network traffic across it. Different prediction ranges were selected, as shown in Figure 7b, with the blue and yellow boxes indicating the chosen ranges. Feature extraction was first performed within the blue box area, followed by prediction within that range, before proceeding to the yellow box area. This iterative process, repeated 81 times, covered the entire city center, and the results were averaged to form the final prediction.
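To illustrate such a window-by-window procedure, the sketch below accumulates per-window predictions over the 27 × 27 city-center area and averages them into one map. The 15 × 15 input window and 27 × 27 extent follow the description above, while the 3 × 3 output patch, the edge padding, and the `model.predict` call are illustrative assumptions rather than our exact implementation.

```python
import numpy as np

CITY, WIN, OUT = 27, 15, 3      # city extent and input window (from the text); OUT is an assumption
PAD = (WIN - OUT) // 2          # context padding so every output patch has a full input window

def predict_city(model, city_tensor):
    """Tile the 27 x 27 city-centre area with small output patches, feed each patch's
    surrounding WIN x WIN context window to the model, and average the predictions.
    city_tensor: array of shape (T, 27, 27) holding the temporal stack of traffic grids."""
    padded = np.pad(city_tensor, ((0, 0), (PAD, PAD), (PAD, PAD)), mode="edge")
    pred = np.zeros((CITY, CITY))
    count = np.zeros((CITY, CITY))
    for r in range(0, CITY, OUT):              # 9 rows of patches
        for c in range(0, CITY, OUT):          # 9 columns -> 81 windows in total
            window = padded[:, r:r + WIN, c:c + WIN]
            patch = model.predict(window)      # hypothetical model call returning an (OUT, OUT) patch
            pred[r:r + OUT, c:c + OUT] += patch
            count[r:r + OUT, c:c + OUT] += 1
    return pred / count                        # aggregated average prediction for the whole area
```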
We optimized the performance of our deep learning model by adjusting several parameters. The settings used are as follows:
CNN-RNN: The overall structure of the CNN-RNN model remains similar to that of our previous work [13].
Table 2 summarizes the parameters optimized.
Training data: 90% of the data from all but the last two weeks, with the remaining 10% used for validation.
Testing data: The second-to-last week.
Table 1 shows the full names and abbreviations used for the different data types.
To evaluate the performance of our model, we employed two commonly used metrics: the mean absolute percentage error (MAPE) and the root mean square error (RMSE). These metrics allow us to assess the accuracy of our predictions. Additionally, we calculated the mean accuracy (MA), which provides a more intuitive measure of the agreement between the real and predicted values. The MAPE and RMSE are defined as follows:

$$\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right| \times 100\%, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2},$$

where $y_i$ is the observed traffic, $\hat{y}_i$ is the predicted traffic, and $n$ is the number of samples. The MA is calculated as:

$$\mathrm{MA} = \left(1 - \frac{1}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right|\right) \times 100\%.$$
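For reference, the sketch below shows one straightforward way to compute these three metrics with NumPy, assuming strictly positive ground-truth values and MA taken as the complement of MAPE as defined above; it is not tied to any particular deep learning framework.

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error (in %); assumes y_true > 0."""
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

def rmse(y_true, y_pred):
    """Root mean square error."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def mean_accuracy(y_true, y_pred):
    """Mean accuracy (in %), the intuitive complement of MAPE."""
    return 100.0 - mape(y_true, y_pred)

# Example usage with dummy values
y_true = np.array([120.0, 95.0, 130.0])
y_pred = np.array([110.0, 100.0, 128.0])
print(mape(y_true, y_pred), rmse(y_true, y_pred), mean_accuracy(y_true, y_pred))
```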
These metrics provide insights into the accuracy and performance of our deep learning model in predicting mobile traffic. To address the challenges of mobile internet traffic forecasting, we compare our proposed models against a diverse set of baselines, including statistical models, feature-focused deep learning methods, and recent transformer-based architectures. This multifaceted comparison illuminates the strengths and potential shortcomings of our model and guides future refinements in mobile internet traffic forecasting.
6.2. Benchmarking with TE-CNN-RNN
Our experiment used advanced hardware and software for reliable performance benchmarks. We used an Intel® Core™ i7-14700 CPU with 28 threads, an NVIDIA GeForce GTX 1080 Ti graphics card, and 31.1 GiB of RAM. The software environment was Ubuntu 20.04.6 LTS with a 64-bit OS type. We also tailored the loss function selection for each model, with the TE-CNN-RNN using the mean absolute error (MAE) and Huber loss according to the prediction context. This approach aimed to enhance the model’s predictive accuracy across various scenarios.
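To make this context-dependent loss selection concrete, the snippet below is a minimal PyTorch sketch that switches between MAE and Huber loss; the mapping from prediction context to loss and the Huber delta are illustrative assumptions, not the exact rule used for TE-CNN-RNN.

```python
import torch
import torch.nn as nn

def make_loss(context: str, delta: float = 1.0) -> nn.Module:
    """Return MAE for one prediction context and Huber loss otherwise.
    The context-to-loss mapping and delta value are illustrative assumptions."""
    if context == "mae":
        return nn.L1Loss()                      # mean absolute error
    return nn.HuberLoss(delta=delta)            # quadratic near zero, linear in the tails

# Example: evaluate both losses on dummy predictions
pred = torch.tensor([1.2, 0.8, 3.5])
target = torch.tensor([1.0, 1.0, 2.0])
for ctx in ("mae", "huber"):
    print(ctx, make_loss(ctx)(pred, target).item())
```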
Table 2 summarizes the parameters optimized for each model, including those specific to our TE-CNN-RNN architecture. For TE-CNN-RNN, TSMixer, and SAMformer, the following parameters were identified through Optuna.
Table 2. Optimization parameters for CNN-RNN (INT), TSMixer, SAMformer, and TE-CNN-RNN models.
| Parameter | CNN-RNN (INT) | TSMixer | SAMformer | TE-CNN-RNN |
|---|---|---|---|---|
| Learning Rate | 0.00002 | 0.009 | 0.009 | 0.009 |
| Batch Size | 96 | 58 | 58 | 58 |
| Optimizer | SGD | Adam | Adam | Adagrad |
| Clipnorm | - | 9.98 | 9.98 | 1.2 |
| L2 norm | 0.001 | - | - | 0.0002 |
| Embedding nodes | 100 | - | - | - |

Exclusive parameters for TE-CNN-RNN:

| Parameter | Value | Parameter | Value | Parameter | Value |
|---|---|---|---|---|---|
| Num Heads | 3 | Coefficient | 30 | CNN Filter | 153 |
| Fusion FFN | 246 | Kernel Size | 5 | Time Steps | 5 |
| D Model | 90 | Dropout Rate | 0.23 | - | - |
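For readers unfamiliar with this workflow, the sketch below shows how a search space covering the kinds of parameters in Table 2 (learning rate, batch size, optimizer, clipnorm, dropout rate) could be expressed with Optuna. The ranges are illustrative rather than our exact search spaces, and `train_and_validate` is a hypothetical helper that trains a model with the sampled parameters and returns its validation RMSE.

```python
import optuna

def objective(trial: optuna.Trial) -> float:
    params = {
        "learning_rate": trial.suggest_float("learning_rate", 1e-5, 1e-2, log=True),
        "batch_size": trial.suggest_int("batch_size", 32, 128),
        "optimizer": trial.suggest_categorical("optimizer", ["sgd", "adam", "adagrad"]),
        "clipnorm": trial.suggest_float("clipnorm", 0.5, 10.0),
        "dropout_rate": trial.suggest_float("dropout_rate", 0.0, 0.5),
    }
    # Hypothetical helper: builds and trains the model, then returns validation RMSE.
    return train_and_validate(params)

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=100)
print(study.best_params)
```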
Table 3 presents a comprehensive performance comparison of our CNN-RNN (INT) baseline, TSMixer, SAMformer, and the proposed TE-CNN-RNN model, focusing on their ability to forecast network traffic. The TE-CNN-RNN model is superior across all tasks (Max, Avg, Min), consistently achieving the lowest RMSE scores and highest MA percentages. This validates its accuracy and robustness for time-series analysis.
TSMixer and SAMformer represent advanced time-series forecasting methods. TSMixer excels in blending operations across time and feature dimensions, while SAMformer enhances efficiency through sharpness-aware optimization and channel-wise attention. However, our TE-CNN-RNN’s advantage likely stems from its comprehensive approach. The integration of CNNs, transformer blocks, and GRUs, combined with our fusion mechanism, enables the model to effectively capture complex spatial–temporal patterns and long-range dependencies inherent in network traffic data.
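To make this combination more concrete, the following is a minimal, illustrative PyTorch sketch of a CNN → transformer → GRU pipeline with a simple concatenation-based fusion head. The layer sizes loosely echo Table 2 (three attention heads, a model dimension of 90, a fusion FFN of 246), but this module is not the TE-CNN-RNN implementation itself, and the flattened 3 × 3 output patch is an assumption.

```python
import torch
import torch.nn as nn

class CnnTransformerGru(nn.Module):
    """Illustrative spatio-temporal model: per-step CNN features -> transformer
    encoder over time -> GRU -> fusion feed-forward head. Not the exact TE-CNN-RNN."""
    def __init__(self, in_ch=1, d_model=90, n_heads=3, ffn_dim=246, out_cells=9):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(in_ch, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),                 # (B*T, 32, 1, 1)
        )
        self.proj = nn.Linear(32, d_model)
        enc_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, dim_feedforward=ffn_dim, batch_first=True)
        self.transformer = nn.TransformerEncoder(enc_layer, num_layers=1)
        self.gru = nn.GRU(d_model, d_model, batch_first=True)
        self.head = nn.Sequential(nn.Linear(2 * d_model, ffn_dim), nn.ReLU(),
                                  nn.Linear(ffn_dim, out_cells))

    def forward(self, x):                            # x: (B, T, C, H, W)
        b, t, c, h, w = x.shape
        feats = self.cnn(x.reshape(b * t, c, h, w)).flatten(1)   # (B*T, 32)
        seq = self.proj(feats).reshape(b, t, -1)                 # (B, T, d_model)
        trans_out = self.transformer(seq)                        # (B, T, d_model)
        _, gru_h = self.gru(trans_out)                           # (1, B, d_model)
        fused = torch.cat([trans_out[:, -1], gru_h[0]], dim=-1)  # simple fusion of both views
        return self.head(fused)                                  # (B, out_cells)

# Example: 5 time steps of 15 x 15 traffic grids -> flattened 3 x 3 output patch
model = CnnTransformerGru()
y = model(torch.randn(2, 5, 1, 15, 15))
print(y.shape)  # torch.Size([2, 9])
```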
Table 4 complements the findings from the figures, showing that the TE-CNN-RNN achieved the highest MA and the lowest RMSE consistently across all tasks, further confirming its superior performance. The confidence intervals included in the table reflect the robustness and reliability of these results over five runs with different random seeds.
An analysis of variance (ANOVA) was utilized to assess the prediction accuracy among various time groups and model types, yielding F-statistics of 33.713 (p < 0.001) for time groups and 51.838 (p < 0.001) for models. These results indicate a highly significant discrepancy in performance. Notably, the TE-CNN-RNN model emerged as the most accurate, consistently outperforming TSMixer and SAMformer over multiple time intervals and across differing levels of internet traffic demand.
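The two F-statistics correspond to an analysis with time group and model type as factors. The sketch below shows how such a two-factor ANOVA could be run with statsmodels on a hypothetical long-format table of per-run accuracies (a `results.csv` with columns `accuracy`, `time_group`, and `model`); it does not reproduce our exact test configuration.

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Hypothetical long-format results: one row per (run, time group, model) accuracy value.
df = pd.read_csv("results.csv")   # columns: accuracy, time_group, model

fit = ols("accuracy ~ C(time_group) + C(model)", data=df).fit()
table = sm.stats.anova_lm(fit, typ=2)     # F-statistic and p-value per factor
print(table)
```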
The statistical tests support the robustness of the TE-CNN-RNN, demonstrating that its predictive reliability is not merely a consequence of sample variation. This analysis aligns with our goal of creating a forecasting model adept at navigating the complexities of real-world internet traffic, characterized by dynamic temporal shifts and diverse volume changes.
Furthermore, the enduring superiority of the TE-CNN-RNN across a range of prediction tasks affirms its utility and sets a new standard in network traffic forecasting, highlighting the efficacy of the carefully engineered architecture that leverages data fusion. These results also motivate the development of our MSCR model, designed to harness the potential of multi-faceted data sources.
6.3. Cross-Source Performance
This section presents the experimental results obtained from the various input-data fusion models compared in this study. We evaluated downtown Milan, an area of approximately 750 grids, as illustrated in Figure 1c. The evaluation uses CDR and other relevant datasets to measure network traffic activity.
Table 5 provides a summarized overview of the performance of the various data input fusion models, considering their unique characteristics. This table illustrates the MA and RMSE values across the 750 grids. The mean accuracy metric represents the average accuracy of the model predictions, while the RMSE quantifies the overall discrepancy between predicted values and ground truth.
As seen in Table 5, the performance of the INT data in isolation mirrors the results obtained in our previous paper for the Maximum (Max), Average (Avg), and Minimum (Min) tasks [13]. This implies that our architecture effectively predicts network traffic even in suburban locales characterized by sparse traffic, as it sustains high accuracy under such circumstances.
Assessing the data fusion models, we find that integrating Social Pulse and Milano Today data with INT improves performance compared to employing INT alone. This indicates that these supplementary data sources benefit the network traffic prediction task. Furthermore, comparing the prediction methodologies, the MA for the average task is consistently higher than for the minimum and maximum tasks, suggesting that our model excels at average predictions and is less sensitive to extreme values.
In summary, the accuracy attained by incorporating pertinent data sources surpasses the accuracy of the single INT forecasts by approximately 2% to 3%. This underscores the effectiveness of introducing additional data sources to enhance the accuracy of network traffic prediction.
Our approach exploits the temporal characteristics of the INT data by processing them simultaneously along two distinct dimensions: the preceding day and the preceding week. We have observed that deep learning models are particularly adept at identifying patterns when both of these temporal views are considered. By taking daily and weekly data into account simultaneously, we can make better predictions than by relying on the daily view alone.
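As a minimal illustration, assuming the 10-minute sampling implied by six data points per hour (144 intervals per day, 1008 per week), the sketch below pairs a previous-day window with a previous-week window for each target time step; the window length of six steps is an arbitrary placeholder, not our actual input length.

```python
import numpy as np

STEPS_PER_DAY = 144            # 6 samples per hour x 24 hours
STEPS_PER_WEEK = 7 * STEPS_PER_DAY
WINDOW = 6                     # illustrative number of lagged steps per branch

def daily_weekly_windows(series, t):
    """Return (previous-day window, previous-week window) ending one day/week before index t."""
    day = series[t - STEPS_PER_DAY - WINDOW + 1 : t - STEPS_PER_DAY + 1]
    week = series[t - STEPS_PER_WEEK - WINDOW + 1 : t - STEPS_PER_WEEK + 1]
    return day, week

# Example on a dummy series long enough to contain a full week of history
series = np.arange(2 * STEPS_PER_WEEK, dtype=float)
day_x, week_x = daily_weekly_windows(series, t=STEPS_PER_WEEK + 10)
print(day_x.shape, week_x.shape)   # (6,) (6,)
```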
Table 6 illustrates the prediction performance obtained by integrating daily and weekly temporal characteristics. The table shows that combining data from these two time scales leads to better predictions. By considering the characteristics of both temporal dimensions together, we observe an improvement in prediction accuracy of 3% to 6% compared to using the single INT data alone.
This insight underscores the importance of acknowledging short-term (daily) and long-term (weekly) patterns in network traffic prediction. By simultaneously incorporating daily and weekly temporal characteristics, our model leads to significantly more accurate predictions and a holistic understanding of temporal dynamics.
We have incorporated weather data into the preceding analysis to measure the influence of weather conditions on data fusion and to investigate their role in enhancing accuracy. According to the findings outlined in Table 7, relying solely on INT, or on a combination of INT with Social Pulse or Milano Today, does not yield the most favorable predictions. However, an improvement in accuracy of 0.5% to 1% is noticeable when periodic data are combined with weather conditions, as demonstrated by comparison with Table 6. This suggests that incorporating weather data can be a significant factor in enhancing prediction accuracy.
In Table 8, we compare the performance of five distinct scenarios: INT alone, INT paired with news, INT integrated with periodic data, INT combined with news and periodic data, and a fusion of all available data. The table makes it evident that including data linked to INT, such as news and periodic data, boosts prediction accuracy. Furthermore, the results suggest that periodic data are more beneficial than Milano Today, Social Pulse, and weather data, as periodic patterns are naturally embedded within the INT data. It is important to highlight that integrating more relevant data does not trigger interference among the different sources; applying deep learning to identify correlations between these sources enhances the model's overall performance.
Table 8 also compares the impact of different fusion strategies, namely early and late fusion. Early fusion preprocesses the data sources into a common grid format and merges same-grid data before feeding them into the CNN + LSTM architecture, aiming to capture the data's temporal and spatial characteristics; weather data were not included in this approach. The results show that early fusion, which integrates the data without first extracting features, reduces prediction accuracy even when the data sources are related. This is likely because the architecture cannot capture all of the data's attributes when they are combined without suitable feature extraction. The late fusion approach, in which the neural network performs feature extraction and identifies correlations within the data, improves prediction accuracy and outperforms early fusion. This highlights the importance of neural-network-based feature extraction in enhancing fusion results, as it provides a deeper understanding of the data's characteristics and correlations.
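The distinction can be summarized in code: early fusion stacks the raw grids as extra channels of a single input tensor before one shared extractor, whereas late fusion runs each source through its own feature extractor and merges the learned features afterwards. The PyTorch sketch below is an illustrative comparison with placeholder layer sizes, not our exact architecture.

```python
import torch
import torch.nn as nn

class EarlyFusionNet(nn.Module):
    """Concatenate raw same-grid sources channel-wise, then apply one shared extractor."""
    def __init__(self, total_channels):
        super().__init__()
        self.extractor = nn.Sequential(
            nn.Conv2d(total_channels, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())

    def forward(self, sources):
        return self.extractor(torch.cat(sources, dim=1))      # fuse before feature extraction

class LateFusionNet(nn.Module):
    """One extractor per source; merge the learned features afterwards."""
    def __init__(self, channels_per_source):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(nn.Conv2d(c, 16, 3, padding=1), nn.ReLU(),
                          nn.AdaptiveAvgPool2d(1), nn.Flatten())
            for c in channels_per_source)

    def forward(self, sources):
        feats = [branch(s) for branch, s in zip(self.branches, sources)]
        return torch.cat(feats, dim=1)                        # fuse after feature extraction

# Example: INT traffic plus one auxiliary source on a 15 x 15 grid
int_grid, aux_grid = torch.randn(2, 1, 15, 15), torch.randn(2, 1, 15, 15)
print(EarlyFusionNet(2)([int_grid, aux_grid]).shape)          # torch.Size([2, 32])
print(LateFusionNet([1, 1])([int_grid, aux_grid]).shape)      # torch.Size([2, 32])
```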
6.4. Overall Performance
In this subsection, we discuss the overall performance of the MSCR model in managing and analyzing diverse data sources. The MSCR model is designed to handle multi-source data fusion effectively.
Table 9 summarizes various model performances—including ST-ResNet, T-DenseNet, ARIMA, and our MSCR framework—in predicting network traffic. The table depicts the MA and RMSE across the 750 grids.
ST-ResNet leverages a deep residual network (ResNet) to forecast crowd traffic. It integrates periodic data and weather information to fuse the raw data. However, its reliance solely on a CNN, without including LSTM, might limit its effectiveness in capturing temporal features. Also, it does not consider other data sources like Milano Today. Despite these constraints, it achieves a superior accuracy of 73.6% in the minimum task, marking a 1% increase compared to previous work.
T-DenseNet employs a dense convolutional network (DenseNet) architecture to forecast call data. This model interconnects each neural network layer, enabling efficient feature extraction and reducing the parameter count. However, T-DenseNet does not employ LSTM, thus limiting its ability to capture time-related characteristics fully. Despite this, it achieves reasonable accuracy levels, improving the prediction accuracy of the predecessor model by approximately 5% across the task maximum, task average, and task minimum.
ARIMA, a conventional time-series model, achieves approximately 76.5% accuracy in the task average. However, it falls short in capturing uncertainties, such as the maximum and minimum task values, where it achieves only 64% and 68% accuracy, respectively. This illustrates the limitations of traditional time-series models in accurately predicting network traffic.
Conversely, our MSCR framework integrates LSTM to capture time dependencies effectively. By merging various data sources and utilizing LSTM, our framework secures higher accuracy rates across the task maximum, task average, and task minimum compared to ST-ResNet and T-DenseNet. Furthermore, our model surpasses the prior work by approximately 7% in prediction accuracy. In the task maximum, task average, and task minimum, our model attains accuracy levels exceeding 80%.
Our MSCR framework, incorporating LSTM and diverse data sources, outperforms other models in predicting network traffic. Moreover, to advance the field of intelligent information systems, we have developed two distinct models: the transformer and the CNN-RNN (fusion). Both models represent our innovative approach to data integration and prediction tasks. Below, we present a comparative analysis of their performance. The transformer model is designed for high precision in scenarios with well-defined patterns, while the all-fusion model is tailored to deliver consistent accuracy across diverse and complex datasets.
Figure 8 offers a detailed visual representation of the forecasting accuracy for each method over the expansive 27 × 27 grid area, displaying the accuracy distribution for the minimum, average, and maximum tasks.
In the context of the task minimum, both CNN-RNN and CNN-RNN (fusion) outshine other methods with accuracy levels ranging from 70% to 90%. This performance could be due to the inclusion of LSTM, which effectively captures temporal features. Notably, around 550 of the total 750 grids attain a prediction accuracy exceeding 70%.
Regarding the task average, the top three prediction methods, CNN-RNN, CNN-RNN (fusion), and T-DenseNet, achieve accuracies in the 70% to 90% range. Interestingly, ARIMA also fares well within this range owing to its lower variance on the task average.
For the task maximum, which is particularly relevant for traffic offloading applications, CNN-RNN (fusion), T-DenseNet, and CNN-RNN demonstrate superior performance, with accuracies ranging between 70% and 90%. Nonetheless, it is worth highlighting that T-DenseNet places more grids within the 80% to 90% and 60% to 70% accuracy ranges.
Our results demonstrate that both models significantly outperform existing methods, thus providing practical tools for optimizing B5G network management. This study seeks to answer how these advanced models can predict and manage network traffic more effectively in B5G networks. Also, the original CNN-RNN model displays promising results when addressing the network traffic problem across different tasks. Moreover, integrating diverse data types enhances the prediction accuracy across various grids.
As illustrated in Figure 8, the accuracy rate falls mainly within the 60% to 80% range. To delve deeper into the accuracy distribution, we employed the cumulative distribution function (CDF) depicted in Figure 9.
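A curve such as those in Figure 9 can be produced from per-grid accuracies with a few lines of NumPy and Matplotlib, as sketched below for a hypothetical array of 750 per-grid accuracy values (one curve would be drawn per method).

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical per-grid accuracies for one method over the 750 evaluated grids.
acc = np.random.uniform(50, 95, size=750)

x = np.sort(acc)                               # accuracy values in ascending order
cdf = np.arange(1, len(x) + 1) / len(x)        # empirical cumulative proportion

plt.plot(x, cdf, label="CNN-RNN (fusion)")     # one curve per method, as in Figure 9
plt.xlabel("Prediction accuracy (%)")
plt.ylabel("Cumulative fraction of grids")
plt.legend()
plt.show()
```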
Regarding the maximum task, our approach (CNN-RNN (fusion)) achieves an accuracy of 81.7%, as noted in Table 9, marginally higher than the T-DenseNet model. However, Figure 9a reveals that the T-DenseNet model's cumulative curve surpasses 70% accuracy in a larger number of grids, implying that T-DenseNet has more grids with either high or low accuracies. In contrast, our method demonstrates a more consistent performance, yielding an average accuracy within the 60% to 80% range.
Our method outperforms the other techniques for the average and minimum tasks, as reflected in Table 9. The CDF chart shows a more substantial accumulation of grids with over 80% accuracy; most of our method's grids achieve an accuracy above 70%, and the average accuracy significantly surpasses that of the other methods.
In conclusion, our method exhibits a higher average accuracy for each grid, especially in the average and minimum tasks. The CDF analysis shows that our method accumulates more grids with an accuracy above 80%.
As evident in Figure 9, both the ARIMA and T-DenseNet methods tend to accumulate more grids with high and low accuracy, signaling a broad distribution of prediction performance. To investigate the predicted values on each grid further, we introduce color maps that emphasize the accuracy levels for the different tasks, including task max, task avg, and task min.
In Figure 10, we observe that the T-DenseNet and ARIMA methods exhibit lower predicted values in the low-flow regions of the 27 × 27 grid, while the central region with higher traffic flow displays higher predicted values than our method. This pattern implies that these methods may capture high-traffic areas more accurately but fail to predict low-flow regions precisely. In contrast, our method predicts values close to the average across the entire grid, resulting in a predominantly reddish color map. This signifies that our method achieves an accuracy of over 70% in a significant portion of the grids, delivering a more balanced and, on the whole, superior prediction accuracy.
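A color map of this kind can be rendered by reshaping the per-grid accuracies into the 27 × 27 layout and plotting them as an image, as in the sketch below with dummy data; the colormap choice (warmer colors for higher accuracy) is purely illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical per-grid accuracies arranged on the 27 x 27 city-centre layout.
acc_map = np.random.uniform(50, 95, size=(27, 27))

plt.imshow(acc_map, cmap="coolwarm", vmin=50, vmax=100)  # warmer (red) cells = higher accuracy
plt.colorbar(label="Prediction accuracy (%)")
plt.title("Task average: per-grid accuracy")
plt.show()
```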
In conclusion, the color maps provide visual insights into the predicted values, underscoring the superior performance of our method in securing a higher accuracy across the entire grid, particularly in the task average.