Machine Learning for Wireless Network Throughput Prediction
ScholarWorks @ UTRGV
School of Mathematical and Statistical Sciences Faculty Publications and Presentations
College of Sciences
2023
Research Article
DOI: https://doi.org/10.21203/rs.3.rs-3267046/v1
License: This work is licensed under a Creative Commons Attribution 4.0 International License.
Gustavo A Fernandez
The University of Texas Rio Grande Valley.
Abstract
This paper analyzes a dataset containing radio frequency (RF) measurements and Key
Performance Indicators (KPIs) captured at 1876.6 MHz with a bandwidth of 10 MHz from an
operational 4G LTE network in Nigeria. The dataset includes metrics such as RSRP
(Reference Signal Received Power), which measures the power level of reference signals; RSRQ
(Reference Signal Received Quality), an indicator of signal quality that provides insight
into the number of users sharing the same resources; RSSI (Received Signal Strength
Indicator), which gauges the total received power in a bandwidth; SINR (Signal to Interference
plus Noise Ratio), a measure of signal quality considering both interference and noise; and
other KPIs, all derived from three evolved Node B base stations (eNodeBs). After data
cleaning, a subset of measurements from one serving eNodeB, spanning a 20-minute
duration, was selected for deeper analysis. PDCP DL Throughput is a key KPI for
evaluating network quality and resource allocation strategies.
Leveraging the high granularity of the data, the primary aim was to predict this throughput.
For this purpose, I compared the predictive capabilities of two machine learning
models: Linear Regression and Random Forest. Mean Absolute Error (MAE) and
Root Mean Squared Error (RMSE) were used to evaluate the models, as together they
indicate both the typical and the large-error behavior of each model. The comparative analysis
highlighted the superior performance of the Random Forest model in predicting PDCP DL
Throughput. The insights derived from this research can potentially guide network engineers and
data scientists in optimizing network performance and improving the user experience.
Furthermore, as the telecommunication industry advances towards the integration of 5G
and beyond, the methodologies explored in this paper may help address the
increasingly complex challenges of future wireless networks.
1 Introduction
In today’s digital age, telecommunications stands as a cornerstone of global connectivity.
As the world becomes increasingly interconnected, cellular network operators grapple with
the relentless challenge of accommodating escalating user demands. The explosion in media
consumption, especially with the introduction of bandwidth-intensive applications, real-time
media streaming on social platforms, and the rapidly evolving realm of connected and
autonomous vehicles, has placed unprecedented pressure on network resources. To address
these challenges, operators are in a continuous quest for cutting-edge solutions. One of the
primary objectives is to refine resource allocation and load balancing mechanisms, ensuring
that networks can handle the ever-growing data traffic without compromising on performance.
The anticipatory approach to resource allocation and network management is a groundbreak-
ing paradigm that offers a potential solution to these challenges. At the heart of this approach
lies the ability to predict network connectivity fluctuations before they occur. By proactively
identifying potential changes in connectivity, operators can take preemptive actions, ensuring
that the user’s Quality of Service (QoS) remains consistent and reliable. A prime example
of this forward-thinking strategy is the concept of pre-buffering video content. By allocat-
ing additional resources in anticipation of a potential drop in future throughput values for a
user, operators can guarantee uninterrupted streaming experiences. The need for such proac-
tive measures stems from the paramount importance of delivering consistent and high-quality
network connectivity. The proliferation of bandwidth-demanding applications and the expo-
nential growth in media publishing and streaming on social platforms highlight the critical
need for innovative network management strategies.
Among the extensive research dedicated to enhancing network QoS, several studies have made
significant contributions. For instance, Yue et al. [1] carried out a comprehensive
correlation analysis of the relationships between Radio Signals
(RSs) and throughput across various scenarios, from stationary settings to dynamic highway
driving conditions. Their findings emphasized the potential of the Random Forest machine
learning model in predicting network performance based on metrics like RSRP, RSRQ, and
CQI. In a similar vein, Raca et al. [2] studied the prediction of future throughput
windows, evaluating the predictive performance of diverse machine learning models, from
Random Forest and Support Vector Machine (SVM) to Neural Networks (NN). Furthermore, a
study by Al-Thaedan et al. [3] emphasized the significance of machine learning models in predicting
downlink throughput on 4G LTE networks. Their research provides insights into
the practical applications of these models in real-world network scenarios.
Building on the seminal work of these researchers and addressing the requirements of
contemporary telecommunication networks, this paper presents a detailed analysis of a 4G LTE
network dataset. Concentrating on key metrics such as RSRP and RSRQ, the research aims
to employ machine learning techniques, notably Linear Regression and Random Forest, to
predict PDCP DL Throughput, a crucial metric in assessing network quality. Recent studies,
including those by Minovski et al. [4] and Zhohov et al. [5], have further emphasized
the importance and potential of machine learning in throughput prediction, underscoring the
relevance and timeliness of the present research.
While many studies have ventured into network throughput prediction, the distinctiveness
of this research manifests in several pivotal areas:
Real-World Data:
Grounded in data sourced from an operational 4G LTE network in Nigeria, this research offers
a pragmatic vantage point often eclipsed in predominantly theoretical pursuits.
Focused Predictors:
Singular emphasis is placed on RSRP (Reference Signal Received Power) and RSRQ (Refer-
ence Signal Received Quality) as the chief predictors, enabling a meticulous probe into these
pivotal metrics.
2.2 Data Preprocessing
Wireless network datasets are inherently intricate, owing to their exposure to various fluc-
tuating environmental and technical variables. Recognizing this complexity, rigorous data
preprocessing was essential to ensure the robustness and reliability of the subsequent analysis.
To achieve a consistent dataset free from site-specific anomalies, focus was narrowed to
data sourced from a single site. This approach aimed to eliminate discrepancies or
inconsistencies that might emerge from variations across different sites. To assess the
data's completeness, missingness heatmaps were used to visualize missing values across
features. The missing values were determined to be missing at random, and rows with missing
values in critical columns were removed, yielding a cleaner dataset.
Temporal features proved important to the analysis. The Date Time
column was converted into a datetime data type, laying the foundation for time series analysis.
The data was then grouped by this temporal feature, and specific aggregations were applied
to other columns to capture the mean within each time group. To further optimize the dataset,
rows with specific abnormal values in the Serving EARFCN column were removed. A lag
feature was introduced based on the PDCP Throughput DL column to add depth to the
analysis. This temporal aspect provides a time-shifted perspective, invaluable for forecasting
and understanding patterns.
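
The preprocessing steps described above (dropping rows with missing values, removing abnormal Serving EARFCN rows, grouping by timestamp with a mean aggregation, and adding a lag feature) can be sketched as follows. This is a minimal illustration using only the Python standard library, not the code used in the study: the sample rows, the timestamp format, the EARFCN value treated as valid, and the choice of a one-step lag are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical raw rows: (Date Time, Serving EARFCN, PDCP Throughput DL).
# All values, and the timestamp format, are invented for illustration.
RAW_ROWS = [
    ("2020-01-01 10:00:00", 1000, 4000.0),
    ("2020-01-01 10:00:00", 1000, 5000.0),
    ("2020-01-01 10:00:01", 1000, None),    # missing throughput: dropped
    ("2020-01-01 10:00:01", 1000, 6000.0),
    ("2020-01-01 10:00:02", 9999, 100.0),   # assumed abnormal EARFCN: dropped
    ("2020-01-01 10:00:02", 1000, 3000.0),
]
VALID_EARFCN = {1000}  # assumption: the set of channel numbers considered normal


def preprocess(rows, valid_earfcn):
    """Return (timestamp, mean throughput, lag-1 throughput) tuples in time order."""
    groups = defaultdict(list)
    for raw_ts, earfcn, throughput in rows:
        if throughput is None or earfcn not in valid_earfcn:
            continue  # drop incomplete or abnormal rows
        timestamp = datetime.strptime(raw_ts, "%Y-%m-%d %H:%M:%S")
        groups[timestamp].append(throughput)

    # Aggregate each time group to its mean, ordered chronologically.
    series = [(ts, mean(values)) for ts, values in sorted(groups.items())]

    # Lag feature: throughput of the previous time step.
    # The first time step has no predecessor, so it is skipped.
    return [
        (ts, value, series[i - 1][1])
        for i, (ts, value) in enumerate(series)
        if i > 0
    ]


processed = preprocess(RAW_ROWS, VALID_EARFCN)
for ts, value, lag in processed:
    print(ts.isoformat(), value, lag)
```

Each output row pairs the mean throughput of a time step with the mean throughput of the preceding step, which is the form a model needs in order to use past throughput as a predictor.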
as:
\[
\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|
\]
where $y_i$ is the actual value, $\hat{y}_i$ is the predicted value, and $n$ is the number of observations.
Root Mean Squared Error (RMSE): Delving deeper into error magnitudes, the RMSE cap-
tures the square root of the mean of squared deviations between predictions and actual
observations. Its formula is:
\[
\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}
\]
R-squared: Primarily associated with linear regression, the $R^2$ value elucidates the proportion
of variance in the dependent variable that the independent variables in the model account for.
It is computed as:
\[
R^2 = 1 - \frac{SS_{\mathrm{res}}}{SS_{\mathrm{tot}}}
\]
where $SS_{\mathrm{res}}$ is the sum of squares of the residuals and $SS_{\mathrm{tot}}$ is the total sum of squares.
By harnessing these evaluation techniques, the aim was to measure the prediction precision
of the models for PDCP DL throughput and to furnish insights that can illuminate pathways
for subsequent research endeavors in this arena.
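
The three metrics defined above translate directly into code. The following sketch implements them with the Python standard library; the function names and the four-point sample are hypothetical and chosen only so the results are easy to verify by hand.

```python
import math


def mae(y_true, y_pred):
    """Mean Absolute Error: average of |y_i - y_hat_i|."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Squared Error: square root of the mean squared deviation."""
    mse = sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


# Invented sample: the errors are 1, 0, -1, -2.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 8.0, 11.0]

print("MAE :", mae(actual, predicted))
print("RMSE:", rmse(actual, predicted))
print("R^2 :", r_squared(actual, predicted))
```

For this sample the absolute errors sum to 4, giving an MAE of 1.0; the squared errors sum to 6, giving an RMSE of the square root of 1.5; and with a total sum of squares of 20, $R^2$ is 1 - 6/20 = 0.7. Because RMSE squares the errors before averaging, the single error of 2 pulls it above the MAE.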
3 Results
3.1 Descriptive Analysis
An examination of the PDCP DL Throughput data over the specified 20-minute interval
revealed its inherently dynamic nature. While no discernible pattern was immediately evident,
the data vividly portrayed wireless networks’ ever-fluctuating and volatile nature. Every pass-
ing second exhibited throughput alterations, underlining the network environment’s non-static
and rapidly evolving characteristics. This continuous oscillation in throughput underscores the
challenges and intricacies of predicting such a metric, given its susceptibility to a multitude
of factors that can change from moment to moment. A visual representation of this dynamic
throughput over the interval can be seen in Figure 1.
Fig. 1 Dynamic PDCP DL Throughput over a 20-minute interval.
       Serving RSRP  Serving RSRQ  Serving RSSI  PCC SINR  PHY Throughput DL  PDCP Throughput DL
count        1504.0        1504.0        1504.0    1504.0             1504.0              1504.0
mean         -85.27         -9.00        -62.02      9.08            7186.58             6026.89
std            7.88          1.10          7.52      6.65            5201.80             4696.75
min          -99.91        -14.25        -76.56     -7.62              128.0                 0.0
25%          -91.28         -9.62        -67.51      3.79            3461.82             2676.51
50%          -86.75         -8.83        -63.95      7.95            5326.36             4454.01
75%          -79.72         -8.30        -56.82      13.7            9596.29             8297.02
max          -59.05          -3.7        -35.92     26.08           28890.94            28040.11
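
A summary with the same rows as the statistics above (count, mean, std, min, quartiles, max) can be computed for any single column as in the sketch below. It is an illustration only: the `describe` helper and the five sample values are hypothetical, and the use of the sample standard deviation and of linearly interpolated quartiles is an assumption about how such a summary is typically produced, not a statement about the tool used in the study.

```python
import statistics


def describe(values):
    """Summary statistics for one numeric column."""
    # method="inclusive" interpolates linearly between the observed data points.
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample standard deviation (n - 1)
        "min": min(values),
        "25%": q1,
        "50%": q2,
        "75%": q3,
        "max": max(values),
    }


# Invented throughput values, for illustration only.
sample = [0.0, 2000.0, 4000.0, 6000.0, 8000.0]
summary = describe(sample)
for name, value in summary.items():
    print(f"{name:>5}  {value}")
```

Comparing the mean with the median (the 50% row) is a quick way to spot skew: in the statistics above, the mean PDCP Throughput DL (6026.89) lies well above the median (4454.01), which points to a right-skewed distribution with occasional high-throughput bursts.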
Table 2 Performance metrics for model evaluation.
Fig. 2 Comparison of predicted values against actual values for the Linear Regression and Random Forest models.
4 Discussion
Predicting PDCP DL Throughput in wireless networks is an intricate endeavor, laden with
both challenges and avenues for the application of advanced predictive modeling. This study
delved deep into these intricacies, utilizing both Linear Regression—enhanced with a temporal
feature—and the Random Forest model to shed light on throughput predictability.
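
To make the idea of a linear regression with a temporal feature concrete, the sketch below fits an ordinary least squares line that predicts the throughput at time t from the throughput at time t-1. It is a simplified stand-in under stated assumptions, not the model from the study: it uses a single lag-1 predictor, omits the RSRP and RSRQ predictors and the Random Forest model, and runs on an invented ten-point series with an assumed 80/20 chronological split.

```python
def fit_line(x, y):
    """Ordinary least squares for y ~ slope * x + intercept (one predictor)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Invented throughput series, one value per time step.
throughput = [4200.0, 4600.0, 5100.0, 4900.0, 5300.0,
              6100.0, 5800.0, 6400.0, 6000.0, 6600.0]

lagged = throughput[:-1]  # feature: throughput at t-1
target = throughput[1:]   # target:  throughput at t

# Chronological split (no shuffling), so the model never sees the future.
split = int(len(target) * 0.8)
slope, intercept = fit_line(lagged[:split], target[:split])

predictions = [slope * x + intercept for x in lagged[split:]]
for observed, predicted in zip(target[split:], predictions):
    print(f"observed={observed:.1f}  predicted={predicted:.1f}")
```

Keeping the split chronological matters for time series: a random split would let the model train on observations that come after the ones it is tested on, which tends to overstate accuracy.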
The Mean Squared Error (MSE) is a standard measure in model evaluation, and both models
performed well on it. Yet, the Random Forest
model slightly edged out its counterpart, registering an MSE of 3,016,817.89 against Linear
Regression’s 3,202,810.38. This edge can be attributed to the ensemble nature of Random
Forest, adept at discerning non-linearities and subtle data patterns.
The $R^2$ score, delineating the explanatory power of the models regarding the variations in
PDCP DL Throughput, painted a congruent picture. Both models posted impressive $R^2$ scores
exceeding 0.8. The Random Forest model, however, with an $R^2$ of 0.8321, slightly surpassed
the 0.8218 score of the Linear Regression model.
Further insights were gleaned from the Mean Absolute Error (MAE) and Root Mean
Squared Error (RMSE) metrics. The close MAEs of 1,188.59 for Linear Regression and
1,100.69 for Random Forest, coupled with respective RMSEs of 1,789.64 and 1,736.90,
reiterate the neck-and-neck performance of the two models. Yet, the slight superiority of the
Random Forest model remained consistent across all metrics.
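
The reported figures can be cross-checked with a few lines of arithmetic. The sketch below takes the MSE, RMSE, and MAE values quoted in this section, confirms that each RMSE equals the square root of the corresponding MSE up to rounding, and computes how much smaller each Random Forest error is relative to Linear Regression. The dictionary layout and helper name are hypothetical; the numbers are the ones stated above.

```python
import math

# Values as reported in this section.
reported = {
    "Linear Regression": {"MSE": 3_202_810.38, "RMSE": 1_789.64, "MAE": 1_188.59},
    "Random Forest": {"MSE": 3_016_817.89, "RMSE": 1_736.90, "MAE": 1_100.69},
}

# Consistency check: RMSE should equal sqrt(MSE) up to rounding.
consistent = {
    name: abs(math.sqrt(m["MSE"]) - m["RMSE"]) < 0.01
    for name, m in reported.items()
}


def relative_reduction(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return 100.0 * (baseline - improved) / baseline


lr = reported["Linear Regression"]
rf = reported["Random Forest"]
reductions = {
    metric: relative_reduction(lr[metric], rf[metric])
    for metric in ("MSE", "RMSE", "MAE")
}

print(consistent)
for metric, pct in reductions.items():
    print(f"{metric}: Random Forest lower by {pct:.1f}%")
```

The check passes for both models, and the Random Forest errors come out roughly 5.8% lower for MSE, 2.9% lower for RMSE, and 7.4% lower for MAE, which quantifies the "slight" advantage described in the text.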
While the empirical data leans toward Random Forest, the virtues of each model in varied
contexts should not be understated. With its transparency, Linear Regression elucidates clear
feature-target relationships—priceless in situations where clarity supersedes sheer accuracy.
With its nuanced handling of complex feature dynamics, Random Forest becomes the go-to
when top-tier prediction accuracy is the order of the day.
However, it’s imperative to temper these findings with the understanding of the dataset’s
scope—focused on a singular site over a 20-minute span. This dataset, albeit rich, captures a
mere moment in the vast expanse of network operations. When faced with diverse conditions
or prolonged durations, the true mettle of these models beckons further exploration.
To encapsulate, this investigation accentuates the significance of judicious model selection
in the realm of throughput forecasting. While Random Forest clinched slightly superior metrics
in this endeavor, the ultimate choice hinges on the unique demands of the task—whether
it’s model interpretability, sheer accuracy, or computational nimbleness. As the tapestry of
wireless communication grows more intricate, the tools we harness must evolve in tandem,
propelling the field to new pinnacles of innovation and service par excellence.
5 Conclusion
Wireless networks are the bedrock of our increasingly digitalized world. Ensuring their optimal
performance is more than just a technical imperative; it’s pivotal to the seamless integration
of technology into our daily lives. In this study, the endeavor to predict PDCP DL Throughput
via Linear Regression and Random Forest models cast light on the multifaceted nature of
such a task. While the Random Forest model slightly edged ahead, showcasing the prowess
of ensemble methodologies in deciphering complex data patterns, the Linear Regression’s
performance was not to be overshadowed. Its robustness, especially when bolstered with a
temporal dimension, reiterated the lasting relevance of traditional statistical approaches.
The scope of the research, limited to a dataset from a singular location within a concise
time window, serves as a snapshot—a vignette of the grander tableau of challenges in wireless
network predictions. A key takeaway is the absence of a one-size-fits-all solution. The choice
of predictive model hinges on the nuanced requirements of the task at hand, be it sheer
predictive accuracy, model transparency, or computational pragmatism.
Looking ahead, as we stand at the cusp of a 5G-dominated world with whispers of
6G innovations, the imperative for refined, accurate, and adaptable forecasting tools grows
exponentially. This study underscores the necessity for an adaptive research ethos—one that
is receptive to the swift currents of technological progress. By championing such a spirit of
relentless innovation and introspection, we pave the way for wireless networks that are not
just technically superior but also deeply resonant with the dynamic needs of their users.
References
[1] Yue, C., Jin, R., Suh, K., Qin, Y., Wang, B., Wei, W.: LinkForecast: Cellular link bandwidth
prediction in LTE networks. IEEE Transactions on Mobile Computing 17(7), 1582–1594
(2017)
[2] Raca, D., Zahran, A.H., Sreenan, C.J., Sinha, R.K., Halepovic, E., Jana, R.,
Gopalakrishnan, V.: On leveraging machine and deep learning for throughput prediction in cellular
networks: Design, performance, and challenges. IEEE Communications Magazine 58(3),
11–17 (2020)
[3] Al-Thaedan, A., Shakir, Z., Mjhool, A.Y., Alsabah, R., Al-Sabbagh, A., Salah, M., Zec,
J.: Downlink throughput prediction using machine learning models on 4G-LTE networks.
International Journal of Information Technology, 1–7 (2023)
[4] Minovski, D., Ogren, N., Ahlund, C., Mitra, K.: Throughput prediction using machine
learning in LTE and 5G networks. IEEE Transactions on Mobile Computing (2021)
[5] Zhohov, R., Palaios, A., Geuer, P.: One step further: Tunable and explainable throughput
prediction based on large-scale commercial networks. In: 2021 IEEE 4th 5G World Forum
(5GWF), pp. 430–435 (2021). IEEE
[6] Imoize, A.L., Orolu, K., Atayero, A.A.-A.: Analysis of key performance indicators of a
4G LTE network based on experimental data obtained from a densely populated smart city.
Data in Brief 29, 105304 (2020)