Deep Learning in Network-Level Performance Prediction Using Cross-Layer Information
4, JULY-AUGUST 2022
Abstract: Wireless communication networks are conventionally designed in model-based approaches through utilizing performance metrics such as spectral efficiency and bit error rate. However, from the perspectives of wireless service operators, network-level performance metrics such as the 5%-tile user data rate and network capacity are far more important. Unfortunately, it is difficult to mathematically compute such network-level performance metrics in a model-based approach. To cope with this challenge, this work proposes a data-driven machine learning approach to predict these network-level performance metrics by utilizing customized deep neural networks (DNN). More specifically, the proposed approach capitalizes on cross-layer information from both the physical (PHY) layer and the medium access control (MAC) layer to train customized DNNs, which was considered impossible for the conventional model-based approach. Furthermore, a robust training algorithm called weighted co-teaching (WCT) is devised to overcome the noise existing in the network data due to the stochastic nature of the wireless networks. Extensive simulation results show that the proposed approach can accurately predict two network-level performance metrics, namely user average throughput (UAT) and acknowledgment (ACK)/negative acknowledgment (NACK) feedback.
Index Terms: Cross-layer information, machine learning, network-level performance.

I. INTRODUCTION
Conventionally, wireless communication networks are designed based on mathematical models that are established with expert experience. Such models usually focus on a single network layer, for example, the PHY or MAC layer, as it is considered impossible to develop one model to unify information from multiple layers. For instance, it is rather challenging to mathematically characterize both the average user throughput and the packet ACK/NACK rate in one unified mathematical model as these two performance metrics measure two vastly different aspects of the network performance. As a result, most existing models can only measure the so-called link-level performance. However, it is much more desirable to investigate the network-level performance by taking into account available information from all network layers.
More specifically, it is critical for network designers to understand network-level performance such as the network capacity, the average user data rate and the 5%-tile user data rate (the data rate of the worst 5% users). The main challenges in modeling these network-level performance metrics stem from the following difficulties. First, the wireless communication network is highly complex with many components and protocols, which renders the whole system analytically intractable. Second, it is difficult to accurately characterize the channel and user behavior using channel transfer functions, user distribution and motion models. Finally, network events such as user arrival and traffic load are mostly stochastic. For these reasons, two existing approaches have been developed to evaluate network performance in the literature. The first approach is to over-simplify the system mathematical model to approximate the network-level performance. However, the performance of such approximation is far from being satisfactory. Alternatively, the other existing approach is to develop network simulators to predict or model the network-level performance. Such an approach has been widely adopted in the wireless communications industry. Despite the large discrepancy between simulation and field test results, network simulators are still the preferred choice for studying the network-level performance of a large network. However, the development of network simulators is prohibitively expensive and labor-intensive.
In the meantime, powerful machine learning techniques have been recently developed and successfully applied in many engineering areas such as image and linguistic processing. Built upon the ever-increasing computer power and the availability of Big Data, machine learning techniques are characterized by their data-driven approach that is particularly suitable for the data-rich wireless communication networks. However, to the best of our knowledge, there are only a few existing works on utilizing data from multiple network layers to understand network behaviors and subsequently optimize network design. In the following sections, we first review these related works before summarizing our main contributions.

Manuscript received June 7, 2021; revised February 17, 2022; accepted March 26, 2022. Date of publication March 29, 2022; date of current version June 27, 2022. This work was supported by the National Key Research and Development Program of China under Grant 2020YFB1807700. Recommended for acceptance by Dr. Guoliang Xing. (Corresponding author: Man-On Pun.)
Qi Cao is with the School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen, Guangdong 518172, China (e-mail: caoqi@cuhk.edu.cn).
Man-On Pun is with the School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen, Guangdong 518172, China, and with Pengcheng Laboratory, Shenzhen, Guangdong 518055, China, and also with the Shenzhen Research Institute of Big Data, Shenzhen, Guangdong 518172, China (e-mail: simonpun@cuhk.edu.cn).
Yi Chen is with the School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen, Guangdong 518172, China, and also with the Shenzhen Research Institute of Big Data, Shenzhen, Guangdong 518172, China (e-mail: yichen@cuhk.edu.cn).
Digital Object Identifier 10.1109/TNSE.2022.3163274
2327-4697 © 2022 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: University of the West Indies (UWI). Downloaded on July 03,2023 at 20:32:44 UTC from IEEE Xplore. Restrictions apply.
CAO et al.: DEEP LEARNING IN NETWORK-LEVEL PERFORMANCE PREDICTION USING CROSS-LAYER INFORMATION 2365
A. Related Works and Main Contributions
There were early studies exploiting machine learning in radio resource management (RRM) for wireless networks [1], [2]. For instance, power allocation in multi-user interference channels is a classic NP-hard problem due to its combinatorial nature. With the goal of maximizing the weighted sum rate (WSR), traditional convex optimization theory can reach solutions that are close to the global optimum using an iterative algorithm, namely weighted minimum mean-squared error (WMMSE). However, the WMMSE algorithm is of high complexity and thus time-consuming [3]. In [4], a five-layer fully connected neural network is built to learn from the resulting solutions of WMMSE via supervised learning. On this basis, the work in [5] adopts the negative WSR as the loss function to train an ensemble deep neural network (DNN) to solve the same power allocation problem, showing better performance in the high-SNR regime (>10 dB). However, a major impediment to the practical implementation of such DNNs is the dynamic number of users. It is shown from a theoretical perspective that the graph neural network (GNN) is a powerful solver for combinatorial problems as it scales adaptively with the number of entities [6]. Thus, by using a GNN to solve the same power allocation problem, the WSR is increased by more than 2% with respect to WMMSE, which always finds a local optimum.
The above power allocation methods require accurate channel state information (CSI). However, in many current communication networks, accurate CSI of each user equipment (UE) may not be available, especially in frequency division duplex (FDD) systems. A more common practice is that each UE feeds back its channel quality indicator (CQI) to the base station (BS), and the BS determines the communication scheme based on the CQI [7]. In [8], a machine learning-based solution is studied that uses only accessible communication overhead data such as CQI on the transmit side. Based on a two-cell model, the study applies reinforcement learning (RL) to allocate limited transmit power to 10 UEs working in the same frequency band. The work shows that reinforcement learning is superior to the traditional algorithm in terms of the 5%-tile and median UE data rates. Furthermore, the work in [9] shows that the UE's CQI feedback alone is sufficient for the BS to adjust the modulation and coding scheme (MCS) selection dynamically.
Apart from power allocation, a remarkable application of deep learning in the PHY layer is the end-to-end learning-based wireless communication system. The basic idea is that a communication system is similar to a neural network in the sense that both systems have an input and an output. By replacing the encoding/modulation and decoding/demodulation modules with DNNs (also known as the autoencoder and autodecoder, respectively), the whole system can be automatically optimized by unsupervised learning. Interestingly, it has been observed in [10]–[15] that the trained autoencoder works like a conventional channel coder when the BS has redundant bits to represent the messages. In contrast, the trained autoencoder behaves like a modulator when the BS lacks sufficient bits to describe the messages. However, training the neural network requires that the channel model be able to represent all non-linear properties of the system while remaining differentiable, which is difficult to achieve in real-life systems. To cope with this problem, [16] proposes to use another DNN to approximate the gradient via supervised learning. Finally, the actor-critic reinforcement learning algorithm has been applied to handle user scheduling and content caching at the same time [17], [18].
Apart from the problems in the PHY layer, [19] studies an RL-based resource block (RB) allocation scheduler, which selects the momentarily best scheduler for each transmission time interval (TTI). As conventional schedulers always focus on some particular key performance indicator (KPI), the RL-based scheduler can flexibly choose the best scheduling rule among the conventional schedulers to achieve customized goals. Alternatively, the work in [20] uses an RL-based framework to adjust the parameters of the proportional fairness (PF) scheduler, which can also allocate RBs better than conventional schedulers. Using a similar methodology, [21] improves the quality of service (QoS) for an unmanned aerial vehicle (UAV)-based immersive live system. In addition, the authors in [22] studied a dense small-cell network and proposed to capitalize on deep Q-learning (DQN) to reduce the end-to-end delay.
There are studies concerning other aspects of RRM. For instance, actor-critic reinforcement learning is utilized to solve the user allocation problem aiming at more energy-efficient strategies [23], while the work in [24] considers the user allocation problem from the handover point of view. Also, the authors in [25] effectively reduce the energy consumption in base station sleeping control with a data-driven method. Furthermore, in [26]–[28], data-driven signal recognition and modulation classification problems were investigated, showing impressive performance compared with model-driven methods.

B. Contributions
Most studies discussed above focus on utilizing information from only one network layer, and they neglect to verify whether a network is predictable or not, especially when it comes to the multi-layer architecture. In contrast with the above studies, this work considers a DNN structure to predict the network-level performance by exploiting information from both the PHY and MAC layers. In [29], we made some initial attempts to explore the feasibility of such a DNN structure and achieved some preliminary results. In this work, we rigorously define the average UE throughput specifically designed for our proposed framework. Furthermore, we extend our investigations to the prediction of UE average throughput as well as the ACK/NACK feedback. Finally, we provide an in-depth elaboration on the applications of such predictions for network parameter fine-tuning. The main contributions of this work are summarized as follows:
• To the best of our knowledge, this work is the first successful attempt to demonstrate that it is feasible to accurately
2366 IEEE TRANSACTIONS ON NETWORK SCIENCE AND ENGINEERING, VOL. 9, NO. 4, JULY-AUGUST 2022
predict network-level performance using DNNs by exploiting both PHY and MAC information derived from various network counters of vastly different natures and complex network mechanisms such as outer loop link adaptation (OLLA) and proportional fairness (PF) user scheduling. Specifically, we design two DNN structures to predict two important network-level performance metrics, namely the UE average throughput (UAT) and the ACK/NACK outcome of a transmission, respectively;
• The performance prediction is highly data-dependent, the measured real-world data always has very high randomness, and the performance is affected by complicated factors such as scheduling algorithms, user behaviors and so on. However, accurate labels are essential for training DNNs, and the stochastic nature of wireless communication systems causes noisy data. We formulate the noisy data cleaning task as a bi-level optimization problem and propose a robust weighted co-teaching algorithm to circumvent the problem;
• Predicting network-level performance is ultimately about improving QoS/QoE for all users. Leveraging the predictive capability of our trained DNNs, we can depict the MCS landscape for any user, and thus guide the decision-making process during MCS selection. This application is used as an example to demonstrate the feasibility of better utilizing the network resources to improve users' QoS by a data-driven methodology.
Note that the aim of this work is to validate the feasibility of network-level performance prediction. However, this work can be extended to other applications. For instance, the link-level model is usually over-simplified for the benefit of expediting the system-level simulation, which is known as the link-to-system mapping (L2SM). The mapping is mainly aimed at providing the outcome of a transport block (TB) transmission, i.e., whether the TB is successfully received or not [30]. In the literature, there are many reported results studying L2SM from an information theoretic point of view [31]–[35]. Unfortunately, these studies fail to consider cross-layer information as their cross-layer models become analytically intractable. In this work, we devise a new approach to replace the L2SM module for network simulation.
In the sequel, we will first introduce the wireless network simulator settings in Section II while the network data preparation and the network-level performance prediction tasks are elaborated in Section III. After that, two customized DNN structures are developed in Section IV before two training algorithms are proposed in Section V. Finally, extensive simulation results are shown in Section VI followed by the conclusion given in Section VII.

(RBG) are set to serve downlink UEs. Each RBG consists of several RBs. We consider the downlink transmission scenario in which both the BS and UEs are equipped with two antennas. In contrast to most works in the literature that studied full-buffer transmission, we consider the bursty traffic mode in which each UE has a finite amount of traffic requests. Next, we will elaborate on four basic transmission mechanisms that are adopted in our network simulator, namely the OLLA for MCS selection, the transmission block formation, the proportional fairness user scheduling scheme and the Hybrid Automatic Repeat reQuest and Retransmission (HARQ). It will become clear that it is non-trivial to model all these mechanisms mathematically using the model-based approach.

A. Outer Loop Link Adaptation (OLLA)
The LTE protocol allows the UE to suggest an appropriate MCS to be used in the next transmission, with the aim of achieving a pre-defined block error rate (BLER). To make such a suggestion, the UE indicates its desired MCS by sending back a CQI value as a quantized reference. Typically, each CQI, representing a signal-to-noise ratio (SNR) interval, is periodically measured and reported. Thus, MCSs are selected by mapping the received instantaneous SNR to its interval. The challenge in MCS selection is that it can be neither too aggressive (i.e., too high) nor too conservative (i.e., too low). A higher MCS leads to a larger transport block (TB) size while incurring a higher BLER. In the industry, the MCS achieving a BLER of 10% is commonly adopted to maximize the expected TB size while maintaining a high successful transmission rate.
However, this mapping rule is not sufficient to robustly compensate for the discrepancy between the chosen MCS and the optimal MCS for different UEs. Note that every UE may have its own preference. For instance, aged devices might require a relatively lower MCS for the same channel conditions due to their limited computational power. To cope with this problem, the OLLA algorithm is designed to enable the BS to adaptively update the CQI value $q$ as follows:

$$\tilde{q} = [\, q + \alpha \,], \tag{1}$$

where $[\cdot]$ is the rounding operator, and $\tilde{q}$ is the offset CQI used for MCS selection. Furthermore, $\alpha$ is the adjustment coefficient dynamically updated every time an ACK/NACK flag is fed back. Specifically, let $K$ denote the ACK/NACK flag with

$$K = \begin{cases} 1, & \text{an ACK received}; \\ 0, & \text{a NACK received}. \end{cases} \tag{2}$$
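The OLLA offset in (1) and the flag in (2) can be illustrated with a minimal Python sketch. The update rule for the adjustment coefficient is not given in this excerpt, so `update_alpha` below is an assumption: it uses the commonly adopted asymmetric step rule in which the NACK step is chosen so that the offset is stationary at the target BLER. The function names, the step size `step_up` and the CQI clipping range 1 to 15 are all hypothetical.

```python
import math

def offset_cqi(q, alpha, q_min=1, q_max=15):
    """Eq. (1): offset CQI = round(q + alpha).

    The clipping range [q_min, q_max] is an assumption, not from the paper.
    """
    rounded = math.floor(q + alpha + 0.5)  # round half up
    return min(max(rounded, q_min), q_max)

def update_alpha(alpha, ack, step_up=0.01, target_bler=0.1):
    """Hypothetical OLLA update driven by the flag K of Eq. (2).

    ack == 1 (ACK): raise alpha slightly (more aggressive MCS).
    ack == 0 (NACK): lower alpha by a larger step. The step ratio
    (1 - BLER) / BLER makes the expected drift zero at the target BLER.
    """
    step_down = step_up * (1.0 - target_bler) / target_bler
    return alpha + step_up if ack == 1 else alpha - step_down
```

With a 10% BLER target, nine ACKs followed by one NACK leave the offset unchanged, which is why this step ratio is a common design choice.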
B. Transmission Block (TB) Formation
On the BS side, the CQI reported by UEs is one-to-one mapped to an MCS order with a unique spectral efficiency (SE); see the Appendix for details. We denote by $M(\tilde{q})$ the mapping function from the offset CQI $\tilde{q}$ to its SE. Then, the estimated data rate of the $n$-th UE in the $k$-th RBG can be obtained by

$$R_{n,k} = |\mathcal{G}_n(k)| \cdot M\!\left( \frac{1}{|\mathcal{G}_n(k)|} \sum_{\ell \in \mathcal{G}_n(k)} \tilde{q}_{n,\ell} \right), \tag{4}$$

where $\mathcal{G}_n(k)$ is the set of all RBs in the $k$-th RBG measured by the $n$-th UE while $|\cdot|$ stands for the cardinality of the enclosed set. Furthermore, $\ell$ and $\tilde{q}_{n,\ell}$ are the RB index and the corresponding offset CQI level, respectively. The BS adopts a default MCS when the CQI is not given, which occurs at the beginning of a transmission. According to the estimated data rates, the BS will then allocate each RBG to an appropriate UE while each UE can have more than one RBG. After the allocation is completed, the BS will re-calculate an MCS for each UE and subsequently determine the size of its transmission block $T_n$ in the current TTI:

$$T_n = |\mathcal{G}_n| \cdot M\!\left( \frac{1}{|\mathcal{G}_n|} \sum_{\ell \in \mathcal{G}_n} \tilde{q}_{n,\ell} \right), \tag{5}$$

where $\mathcal{G}_n$ is the set of all RBs allocated to the $n$-th UE by the RBG allocation process.

C. Proportional Fairness (PF) User Scheduling
To balance the tradeoff between system throughput and fairness among users, an RBG allocation algorithm called proportional fairness was proposed in [36]. In the algorithm, the BS records a PF value for each UE-RBG pair, and each RBG is allocated to the UE with the largest PF value defined as follows. The PF value of the UE-RBG pair $(n, k)$ in the $i$-th transmission time interval (TTI), denoted by $\beta_{n,k}[i]$, is defined as

$$\beta_{n,k}[i] = \frac{R_{n,k}[i]}{\bar{T}_n[i]}, \tag{6}$$

where $\bar{T}_n[i]$ is the moving average of the historical throughput and given by

$$\bar{T}_n[i] = (1 - \gamma)\, \bar{T}_n[i-1] + \gamma\, T_n[i-1], \tag{7}$$

where $\gamma$ is a small moving average coefficient, and $T_n[i-1]$ is the TB size of the $n$-th UE in the $(i-1)$-th TTI given by (5). For presentational simplicity, we omit the TTI index in the sequel. Each RBG will be allocated to the UE with the highest PF value in its list, specifically,

$$n^{*}(k) = \arg\max_{n} \beta_{n,k}. \tag{8}$$

To avoid allocating more RBGs to a UE than it needs, each time an RBG is allocated to a UE, the system checks if the UE has obtained enough RBGs to convey all remaining data in its buffer. If so, the UE will be removed from the scheduling list. This usually happens when the UE's buffer is about to be emptied. On the contrary, if a UE fails to acquire any RB, then its effective TB will become zero in the current TTI. As a consequence, its moving average throughput will decrease, which will increase its priority in future RBG allocation.

D. Hybrid Automatic Repeat Request and Retransmission (HARQ)
Next, we briefly review the HARQ process in the downlink transmission. When the TB for a UE is ready to be transmitted from the BS, the data will be moved from the buffer to the HARQ buffer. The data will stay in the HARQ buffer until it is successfully transmitted or dropped. Usually, eight HARQ processes are prepared for each UE. When a UE has a new transmission task, its HARQ process with the smallest possible index is chosen. Eight TTIs after the data transmission is completed, the BS will receive an ACK/NACK flag from the corresponding UE. In the case of an ACK, the HARQ process will terminate as the TB has been successfully transmitted. In contrast, a NACK flag will trigger a retransmission. Since the RBGs used to conduct the initial transmission are delegated to conduct the retransmission, these RBGs will be temporarily unavailable for the next user scheduling. The MCS chosen for the retransmission must remain the same to ensure that the same TB can be reloaded onto the delegated RBGs. After five consecutive failed retransmissions of the same HARQ process, the TB will be dropped, which incurs the so-called packet loss.
Fig. 1 illustrates an example in which UE1 started a transmission using HARQ process 2 in TTI 2. After eight TTIs, the BS received a NACK flag, meaning that the transmission failed. Then, the BS initiated a retransmission in TTI 10 using the same HARQ process and RBGs.

III. OBJECTIVES AND DATA PREPARATION
As aforementioned, the objective of this paper is to pioneer network-level performance prediction in a highly complex wireless communication system. In this section, we introduce the network-performance prediction tasks and elaborate on the data preparation. It is worth mentioning that a proprietary network simulator has been employed to generate the raw data used in this work. However, we believe that the proposed techniques are generally applicable to field network data as well as simulated data obtained from off-the-shelf network simulators such as NS3 or OPNET.

A. Feature Collection
The state of a UE can be characterized by various features. Table I lists some key features and their definitions used in this work. In particular, a feature may have more than one counter, e.g., RB_CQI has 50 counters. In this work, we consider 268 counters in total to describe the state of a UE. In practice, all these counters are available at the BS. The first nine features in Table I are classified as PHY information (also known as Layer 1). In contrast, the last 11 features are all MAC information (also known as Layer 2). As discussed in Section I, it is considered technically impossible to fuse
information collected from these two layers using the model-based approach. In this work, we take advantage of DNNs to exploit the cross-layer information simultaneously without explicitly modeling the information.

TABLE I
COLLECTED NETWORK COUNTERS

B. Tasks and Dataset Construction
Before defining the network-performance prediction tasks, we first introduce the following terminology used in this work:
Definition I: An active UE of some TTI is a UE with a non-empty buffer in the TTI;
Definition II: A scheduled UE is an active UE that is allocated at least one RBG;
Definition III: A target UE is a scheduled UE whose performance is to be predicted;
Definition IV: A parallel UE is an active UE that is not the target UE;
Definition V: The network snapshot of some TTI consists of the features of all active UEs in the TTI.
Now we summarize the tasks as follows:
Task I: Given a network snapshot, the first task is to predict the UAT for a target UE in the next interval T. Clearly, the data rate of a target UE is determined by all the complex transmission mechanisms explained in Section II as well as the states of all active UEs. In particular, since every UE competes for the limited radio resources of the network, the UAT of the target UE heavily depends on the state of the parallel UEs. As a result, UAT prediction has to be performed at the network level rather than the link level. Finally, the 5%-tile UAT that measures the network fairness can be derived if all UEs' UATs are found.
Task II: Given a scheduled UE, the task is to predict its ACK/NACK. At first glance, this task may appear to be related to the link-level performance as the ACK/NACK of the UE is independent of the state of parallel UEs, assuming all UEs are allocated to non-overlapping RBGs. However, the ACK/NACK prediction should greatly benefit from the information on BLER and iBLER. Furthermore, if the multi-cell scenario is considered, then the inter-cell interference surely has a major impact on the ACK/NACK outcome of a given TB. Thus, it makes more sense to predict the ACK/NACK result for a UE at the network level.
A network snapshot consists of all active UEs, and we collect all the concerned counters of a UE in a vector $\mathbf{x} = [c_1, c_2, \ldots, c_M]^T$, where $M$ is the total number of counters. Thus the network snapshot is formatted as an $N \times M$ matrix denoted by $\mathbf{X} = [\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N]^T$, each row corresponding to an active UE, and each column being the same counter measured from different UEs. In particular, $N$ is set to be large enough to cover the maximum number of active UEs in a
network snapshot in most cases. For the case in which the number of active UEs in a TTI is less than $N$, virtual UEs are inserted into the snapshot to keep the total number of active users fixed at $N$. Furthermore, the counters of a virtual UE are set to some default values that make the virtual UE easily distinguishable from the real UEs. In addition, the target UE is always placed in the first row, namely $\mathbf{x}_1$, while the parallel UEs are placed below the target UE in a randomized order.
We calculate the estimated UAT for the target UE indexed by $n$ in TTI $t_0$ by

$$y_n[t_0] = \frac{\sum_{i=t_0}^{t_0+\tau} T_n[i]\, K_n[i]}{\min\{\tau, \Delta t\}}, \tag{9}$$

where $K_n[i]$ is the ACK/NACK flag of the $n$-th UE in TTI $i$ as defined in (2). Furthermore, $\Delta t$ stands for the duration from $t_0$ to the moment when the $n$-th UE successfully receives all its data, and $\tau$ is a predefined time period. If the UE successfully receives all its data within $\tau$, then the actual transmission interval is $\Delta t$ (see Case 2 in Fig. 2); otherwise, the UAT is defined with the data successfully transmitted over the time interval $\tau$ (see Case 1 in Fig. 2).

Fig. 2. Illustration of data rate calculation.

The definition in (9) is motivated by the following observations. A UE with a large data buffer to be transmitted may never be able to receive all its data before the simulation time expires. In the worst case, a UE that suffers from poor channel conditions may never be able to successfully receive any packet (i.e., NACK frequently occurs). Thus, the UAT should not be simply defined as the total data buffer size divided by the simulation time period. In contrast, (9) defines the UAT at a much finer resolution $\tau$ using the exact amount of successfully transmitted data divided by the actual time elapsed to complete such successful transmissions. For Task I, we can generate multiple sets of training samples from one simulation run by adjusting $\tau$, and the UAT label is calculated by (9).
For Task II, training samples are constructed from every scheduled UE in each TTI and the corresponding ACK/NACK flag per transmission is the training label. For the recent wireless networks employing the multi-antenna

C. Data Preprocessing
Before the samples in the dataset are fed into the DNN, they are pre-processed to improve the DNN convergence behavior.
1) Normalization: The counters collected in a network snapshot are of different natures. For instance, the ACK/NACK flag is binary whereas RSRP is a floating-point number and the CQI value is an integer. Therefore, we propose to normalize the values according to the counter type. The maximum and minimum values of each counter type are first found by inspecting a small portion of the sample dataset. After that, all counter values are normalized by their respective maximum and minimum values to the interval $[0, 1]$.
2) Virtual UE Padding: As mentioned, we assume $N$ active UEs in the system. If the actual number of active UEs in the current TTI is smaller than $N$, then virtual UEs are inserted into the network snapshot to keep the dimensionality of the DNN input constant. We propose to fill the virtual UEs' counters with $-1$ to make them distinguishable from the regular UEs.
3) UE Shuffling: The BS usually communicates with multiple UEs in the same TTI. Thus, if we switch the current target UE with another scheduled UE, we will generate a new training sample. Alternatively, we can keep the target UE intact but randomly shuffle the positions of the parallel UEs in the snapshot, which can produce more samples of the same target UE. In short, the UE shuffling process enables more efficient usage of the network simulation data.

IV. NEURAL NETWORK CONFIGURATION
Fig. 3 shows the structure of the DNN for Task I, referred to as UATNet, which consists of four groups of convolutional layers and one group of fully connected layers. Note that we first use $1 \times 5$ filters to solely extract the features from each individual UE, in lieu of the square filters commonly employed in image processing applications. The fully connected layers at the end of the DNN are designed to combine the inter-user features. To avoid gradient vanishing or exploding, a batch normalization layer with a momentum of 0.99 is placed before the activation layer, which is not shown in the figure.
The ACKNet designed for Task II, shown in Fig. 4, has a similar structure to UATNet, except that UATNet takes a matrix input while ACKNet takes a vector input. We use one-hot encoding to represent the four outcomes of the transmission of two TBs.
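The three preprocessing steps of Section III-C (normalization, virtual UE padding and UE shuffling) can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `build_snapshot`, its arguments, the clipping of out-of-range values and the fill value of -1 for virtual UEs (any value outside [0, 1] would serve the same purpose) are all hypothetical choices made here for illustration.

```python
import numpy as np

def build_snapshot(ue_counters, target_idx, n_max, c_min, c_max, rng):
    """Return an (n_max, M) DNN input with the target UE in row 0.

    ue_counters: (n_active, M) raw counters, one row per active UE.
    c_min, c_max: per-counter minima and maxima (length M), assumed to be
    estimated beforehand from a portion of the dataset.
    """
    x = np.asarray(ue_counters, dtype=float)
    c_min = np.asarray(c_min, dtype=float)
    c_max = np.asarray(c_max, dtype=float)
    if len(x) > n_max:
        raise ValueError("more active UEs than snapshot rows")

    # 1) Min-max normalization to [0, 1]; clipping is an assumption that
    #    guards against values outside the estimated range.
    span = np.where(c_max > c_min, c_max - c_min, 1.0)
    x = np.clip((x - c_min) / span, 0.0, 1.0)

    # 3) Target UE first, parallel UEs below it in random order.
    others = np.delete(np.arange(len(x)), target_idx)
    rng.shuffle(others)
    order = np.concatenate(([target_idx], others))

    # 2) Virtual UE padding: unused rows keep the fill value -1.
    snapshot = np.full((n_max, x.shape[1]), -1.0)
    snapshot[: len(x)] = x[order]
    return snapshot
```

Calling the function repeatedly with the same target UE and a fresh random generator state yields differently shuffled copies of the same snapshot, which is how the shuffling step multiplies the number of training samples.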
Fig. 4. Structure of the proposed ACKNet for the target UE’s ACK/NACK
prediction.
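The one-hot encoding of the four joint outcomes of two TBs used as the ACKNet label can be sketched as below. The class ordering and the function name are assumptions made here for illustration; the paper only states that four outcomes are one-hot encoded, and Table III refers to one of them as the (N, N) class.

```python
# Hypothetical class order: index = 2 * K1 + K2, with K = 1 for ACK, 0 for NACK.
CLASSES = [("N", "N"), ("N", "A"), ("A", "N"), ("A", "A")]

def one_hot_outcome(tb1_ack, tb2_ack):
    """Encode the ACK/NACK flags of two TBs as a length-4 one-hot list."""
    index = 2 * int(tb1_ack) + int(tb2_ack)
    label = [0.0] * len(CLASSES)
    label[index] = 1.0
    return label
```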
$$\mathrm{MREP} = \frac{1}{S} \sum_{i=1}^{S} \frac{\lvert \hat{y}_i - y_i \rvert}{\hat{y}_i + y_i}. \tag{15}$$
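As a rough illustration, (15) can be evaluated with a few lines of Python. This sketch reads (15) as the mean over S test samples of |ŷ_i − y_i| / (ŷ_i + y_i), and assumes that ŷ_i is the predicted value, y_i is the label, and both are positive so that the denominator never vanishes. The function name `mrep` is hypothetical.

```python
from typing import Sequence


def mrep(y_pred: Sequence[float], y_true: Sequence[float]) -> float:
    """Mean of |y_pred_i - y_true_i| / (y_pred_i + y_true_i) over all samples."""
    if len(y_pred) != len(y_true) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for p, t in zip(y_pred, y_true):
        # Assumes p + t > 0, which holds for positive throughput values.
        total += abs(p - t) / (p + t)
    return total / len(y_true)
```

A perfect prediction gives 0, and because each term is normalized by the sum of prediction and label, each term is bounded by 1 for non-negative values.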
TABLE II
NUMERICAL METRICS
TABLE III
CONFUSION MATRIX OF ACK/NACK PREDICTIONS
the fluctuations in the loss curve occur when the two DNNs exchange training samples and update the weight of each sample.

The resulting confusion matrix of our prediction on the test set is shown in Table III. 57.94% of the test samples are in the (N, N) class. In each box in Table III, the number on the top shows the total number of samples in the class, while the number below shows the percentage of predictions that fall into the class. For instance, there are 9829 samples whose labels and predictions are both (N, N), accounting for 94.24% of all the samples with (N, N) labels. The overall accuracy across the diagonal entries is 95.23%.

C. Application: MCS Selection

In this section, we demonstrate a potential application of Task I: a well-trained UATNet can predict the data rate for any given MCS, which facilitates MCS selection. We disable the OLLA algorithm and let each UE hold a random MCS until the end of its transmission. Based on this setting, a new dataset can be acquired and a UATNet can be trained via WCT. Then, we randomly pick a sample from the test set, vary the target UE’s MCS from its minimum value 1 to its maximum value 29, and adjust the TB size accordingly, forming 29 modified copies of the sample. Using the WCT-trained UATNet to predict the UATs for the modified snapshots gives us the MCS landscapes shown in Fig. 12. The red star on each graph is the original sample given by the network simulator. Clearly, in these four examples, the stars lie mostly on the curves of the predicted MCS landscapes.

Fig. 12. UATNet-predicted MCS landscape.

In addition, we find that the MCS landscapes drawn by our UATNet are bell-shaped. This shape is reasonable, as an excessively large MCS incurs frequent NACKs while a conservatively small MCS results in under-utilization of the
allocated RBG. Both cases incur UAT performance degradation. It is worth emphasizing that Fig. 12 is the first illustration of the MCS landscapes reported in the literature. Empowered with these MCS landscapes, we are capable of selecting the optimal MCS for UEs.

Fig. 14. MCS landscape cross validation.

Besides UATNet, ACKNet can also predict the UAT from another perspective, as it can predict the probability of the ACK/NACK outcome for different MCS orders. Let $T_n(m)$ denote the effective TB size of the n-th UE under a particular MCS of order m. It can be estimated by

$$T_n(m) = P(K = 1 \mid \mathrm{MCS} = m)\, T_n, \tag{16}$$

where $P(K = 1 \mid \mathrm{MCS} = m)$ is the conditional probability of ACK when an MCS of order m is employed, and $T_n$ is the TB

VII. CONCLUSION

In this paper, we have demonstrated the first DNN capable of predicting the network-level performance of a wireless communication system by exploiting information from both PHY and MAC layers. More specifically, we have proposed two novel DNN structures, UATNet and ACKNet, to predict two network-level performance metrics, namely the user average throughput for a target UE and the ACK/NACK feedback of a TB. In particular, a weighted co-teaching (WCT) algorithm has been developed to alleviate the impact of noisy labels. Extensive results have confirmed that UATNet can accurately predict the resulting UAT, while ACKNet achieves an accuracy of about 95%. Finally, we have demonstrated that the newly proposed UATNet and ACKNet can be utilized to find the optimal MCS value by computing the MCS landscapes for a given UE.

Source code and simulation data used in this work are available on GitHub at https://github.com/LSCSC/Network-level-Performance-Prediction.
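To make the ACKNet-based MCS selection concrete, the following minimal Python sketch evaluates (16) for each candidate MCS order and picks the order with the largest effective TB size. It assumes that the TB size entering (16) is the one associated with MCS order m, and that the ACK probabilities come from some predictor such as ACKNet. The function names and all numbers are hypothetical and chosen only to reproduce a bell-shaped trade-off.

```python
from typing import Dict


def effective_tb_sizes(ack_prob: Dict[int, float], tb_size: Dict[int, float]) -> Dict[int, float]:
    """Apply (16) per MCS order: effective size = P(ACK | MCS = m) * TB size."""
    return {m: ack_prob[m] * tb_size[m] for m in ack_prob}


def select_mcs(ack_prob: Dict[int, float], tb_size: Dict[int, float]) -> int:
    """Return the MCS order with the largest effective TB size."""
    effective = effective_tb_sizes(ack_prob, tb_size)
    return max(effective, key=lambda m: effective[m])


# Illustrative numbers only (not taken from the paper or its dataset).
ack_prob = {1: 0.99, 10: 0.95, 20: 0.60, 29: 0.05}  # hypothetical predicted ACK probabilities
tb_size = {1: 100, 10: 400, 20: 900, 29: 1500}      # hypothetical TB sizes in bits
best_mcs = select_mcs(ack_prob, tb_size)
```

In this toy example the smallest MCS is almost always acknowledged but carries little data, while the largest MCS carries the most data but is rarely acknowledged, so an intermediate order is selected.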
[32] E. Chu, J. Yoon, and B. C. Jung, “A novel link-to-system mapping technique based on machine learning for 5G/IoT wireless networks,” Sensors, vol. 19, no. 5, 2019, Art. no. 1196.
[33] D. Petrov, A. Oborina, L. Giupponi, and T. H. Stitz, “Link performance model for filter bank based multicarrier systems,” EURASIP J. Adv. Signal Process., vol. 2014, no. 1, 2014, Art. no. 169.
[34] A. Masaracchia, R. Bruno, A. Passarella, and S. Mangione, “Analysis of MAC-level throughput in LTE systems with link rate adaptation and HARQ protocols,” in Proc. IEEE 16th Int. Symp. World Wireless Mobile Multimedia Netw., 2015, pp. 1–9.
[35] S. Lagen, K. Wanuga, H. Elkotby, S. Goyal, N. Patriciello, and L. Giupponi, “New radio physical layer abstraction for system-level simulations of 5G networks,” 2020, arXiv:2001.10309.
[36] D. Tse, “Multiuser diversity in wireless networks,” in Wireless Commun. Seminar. Stanford, CA, USA: Stanford Univ., 2001.
[37] Y. Bengio, J. Louradour, R. Collobert, and J. Weston, “Curriculum learning,” in Proc. 26th Annu. Int. Conf. Mach. Learn., 2009, pp. 41–48.
[38] L. Jiang, Z. Zhou, T. Leung, L.-J. Li, and L. Fei-Fei, “MentorNet: Learning data-driven curriculum for very deep neural networks on corrupted labels,” in Proc. Int. Conf. Mach. Learn., 2018, pp. 2304–2313.
[39] C. Zhang, S. Bengio, M. Hardt, B. Recht, and O. Vinyals, “Understanding deep learning requires rethinking generalization,” 2016, arXiv:1611.03530.
[40] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, vol. 1. Cambridge, MA, USA: MIT Press, 2016.
[41] D. Arpit et al., “A closer look at memorization in deep networks,” in Proc. 34th Int. Conf. Mach. Learn., 2017, pp. 233–242.
[42] L. Jiang, Z. Zhou, T. Leung, L.-J. Li, and L. Fei-Fei, “MentorNet: Learning data-driven curriculum for very deep neural networks on corrupted labels,” 2017, arXiv:1712.05055.
[43] B. Han et al., “Co-teaching: Robust training of deep neural networks with extremely noisy labels,” in Proc. Adv. Neural Inf. Process. Syst., 2018, pp. 8527–8537.

Man-On Pun (Senior Member, IEEE) received the Ph.D. degree in electrical engineering from the University of Southern California, Los Angeles, CA, USA, in 2006. He was a Postdoctoral Research Associate with Princeton University, Princeton, NJ, USA, from 2006 to 2008. He is currently an Associate Professor with the School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen (CUHKSZ), China. Prior to joining CUHKSZ in 2015, he held research positions with Huawei (USA), Mitsubishi Electric Research Labs (MERL) in Boston and Sony in Tokyo, Japan. His research interests include AI Internet of Things (AIoT) and applications of machine learning in communications and satellite remote sensing. Prof. Pun was the recipient of best paper awards from IEEE VTC’06 Fall, IEEE ICC’08 and IEEE Infocom’09. He was an Associate Editor for IEEE Transactions on Wireless Communications in 2010–2014. He is the founding chair of the IEEE Joint SPS-ComSoc Chapter, Shenzhen.

Yi Chen received the B.S. degree in communication engineering from the Beijing University of Posts and Telecommunications, Beijing, China, in 2007 and the Ph.D. degree in information engineering from The Chinese University of Hong Kong, Hong Kong, in 2012. She is currently a Research Assistant Professor with the School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen, China. She is also a Research Scientist with the Shenzhen Research Institute of Big Data. Her research interests include wireless communication, resource allocation, and machine learning.