1. Introduction
Radio wave propagation plays an important role in the research and development of wireless communication systems. The wireless signal strength decreases as the distance between the transmitter and receiver increases. Moreover, the mechanisms of electromagnetic wave propagation are diverse and can generally be classified as reflection, diffraction, and scattering [1]. The complex propagation environment makes predicting the received signal strength a very hard problem.
Path loss describes the attenuation of an electromagnetic wave as it propagates through space [2]. An accurate, simple, and general path loss model is essential for link budget analysis, coverage prediction, system performance optimization, and the selection of base station (BS) locations. Consequently, researchers and engineers have made great efforts to develop suitable path loss models for different scenarios and frequencies. Many measurement campaigns have been conducted worldwide to collect data, which have been used to build, adjust, and evaluate these models.
The upcoming fifth-generation (5G) networks are designed to support increased throughput, wide coverage, improved connection density, reduced radio latency, and enhanced spectral efficiency. Supporting Internet of Things (IoT) applications will involve vast coverage areas and diverse terrains. In addition, new frequency bands, such as the sub-6 GHz and millimeter-wave bands, will be exploited to provide wide bandwidths. Numerous measurement campaigns and drive tests will have to be carried out to collect attenuation data at these new frequencies. The network planning of 5G mobile communication systems will therefore confront severe challenges, and better models are required for path loss prediction.
Traditionally, path loss prediction models have been built on empirical or deterministic methods [3]. Empirical models rely mainly on measurements in a given frequency range and a specific scenario. They provide statistical descriptions of the relationship between path loss and propagation parameters such as frequency, antenna-separation distance, and antenna heights. For example, the log-distance model [4] uses an empirically determined path loss exponent to characterize how the received power falls off with the antenna-separation distance. It uses a zero-mean Gaussian random variable to describe the attenuation (in decibels) caused by shadow fading. Other typical empirical models include the Bullington, Egli, Longley-Rice, Okumura, and Hata models [2]. Empirical models are simple because they require few parameters and their equations are concise. However, their parameters are extracted from data measured in a specific scenario, so their accuracy may be unsatisfactory when they are applied to more general environments [5]. Moreover, empirical models can only represent the statistics of the path loss at a given distance; they cannot give the actual received power at a specific location.
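The log-distance model with Gaussian shadowing is commonly written as PL(d) = PL(d0) + 10 n log10(d/d0) + X_sigma, where X_sigma is zero-mean Gaussian in dB. The sketch below evaluates this expression. It also shows one simple way to determine the exponent n empirically: a least-squares fit with PL(d0) held fixed. The function names, the default reference distance d0 = 1, and all numeric values are illustrative assumptions. They are not parameters from this paper.

```python
import math
import random


def log_distance_path_loss(d, pl_d0, n, d0=1.0, sigma_db=0.0, rng=None):
    """Log-distance path loss in dB at distance d (same unit as d0).

    PL(d) = PL(d0) + 10 * n * log10(d / d0) + X_sigma,
    where X_sigma ~ N(0, sigma_db^2) models shadow fading in dB.
    """
    if d <= 0 or d0 <= 0:
        raise ValueError("distances must be positive")
    pl = pl_d0 + 10.0 * n * math.log10(d / d0)
    if sigma_db > 0.0:
        rng = rng or random.Random()
        pl += rng.gauss(0.0, sigma_db)  # zero-mean Gaussian shadowing term
    return pl


def fit_path_loss_exponent(distances, losses, pl_d0, d0=1.0):
    """Least-squares estimate of n with PL(d0) held fixed.

    Minimizes sum_k (PL_k - PL(d0) - n * x_k)^2 with x_k = 10*log10(d_k/d0),
    whose closed-form solution is n = sum((PL_k - PL(d0)) * x_k) / sum(x_k^2).
    """
    xs = [10.0 * math.log10(d / d0) for d in distances]
    num = sum((pl - pl_d0) * x for pl, x in zip(losses, xs))
    den = sum(x * x for x in xs)
    return num / den
```

For example, with PL(d0) = 40 dB, n = 3, and d = 100 m, the deterministic part is 40 + 30 * 2 = 100 dB.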
Deterministic models, such as those based on ray tracing and the finite-difference time-domain (FDTD) method, apply radio-wave propagation mechanisms and numerical techniques from computational electromagnetics. In general, they achieve high accuracy and provide the path loss value at any specific position. However, they are computationally inefficient, and their computation time can be prohibitive in real environments. They also require site-specific geometry information and the dielectric properties of materials. Moreover, the time-consuming computation must be rerun whenever the propagation environment changes.
Machine learning makes predictions by fitting a flexible model architecture to an extensive dataset. Recently, machine-learning-based methods have been used in self-driving cars, data mining, computer vision, speech recognition, and many other fields. Machine learning tasks can be broadly classified into supervised learning and unsupervised learning. With labeled data, supervised learning aims to learn a general and accurate function between inputs and outputs, which makes it suitable for classification and regression problems. Unsupervised learning algorithms, on the other hand, aim to describe the hidden structure of unlabeled data. Path loss prediction is essentially a supervised regression problem, so it can be solved by supervised machine learning algorithms such as artificial neural network (ANN), support vector regression (SVR), and decision tree. It has been reported that machine-learning-based models are more accurate than empirical models and more computationally efficient than deterministic ones [3,6].
Among these machine-learning-based models, ANN, especially the back propagation neural network (BPNN), has been widely used for path loss prediction. In [3], ANNs with different numbers of hidden neurons and different training algorithms were analyzed using data measured in a rural environment. The results indicated that more complex ANNs do not considerably increase the prediction accuracy. In [7], a BPNN-based path loss prediction model was proposed for railway environments. It showed good prediction accuracy and generalization ability for similar scenarios. In [8], a pure ANN system and a hybrid prediction system were designed for urban and suburban environments. The ANN modeling approach predicted field strength loss more accurately than the COST231-Walfisch-Ikegami model. In [5], an ANN-based model was designed to predict path loss values for heterogeneous networks. It considered several frequencies and different environments, including urban, suburban, and rural scenarios. Compared with empirical models, the ANN performed well in terms of precision, with a slight increase in processing time and memory consumption. For indoor environments, a multilayer perceptron and a generalized regression neural network were proposed in [9] at 1890 MHz, and both showed good agreement with the measurements. A BPNN-based propagation model was developed for multi-wall and multi-frequency scenarios in [10]. In [11], parameters related to body shadowing and furniture effects were added to the inputs. The proposed ANN model performed well in comparison with an empirical model and the measurements. Besides BPNN, the radial basis function neural network (RBF-NN) [12], dynamic neural networks (DNN) [13], and the wavelet neural network [14] have also been employed for path loss prediction. Recently, neuro-fuzzy systems have drawn great interest in path loss prediction because of their transparency [15,16].
SVR was used to predict radio-wave path loss values in suburban environments in [17,18]. Some algorithms, including genetic algorithms (GA) and tabu search (TS), were applied to select important parameters for SVR-based predictors. In [19], an SVR-based modeling method was presented to predict in-cabin path loss values at 3520 MHz, and it outperformed the curve fitting model. In [20], a propagation loss prediction model was built on SVR and achieved good accuracy at an acceptable computational cost.
Other machine learning algorithms, such as decision tree and K-nearest neighbors (KNN), have also been employed for path loss prediction. In [21], random forest (RF) and KNN were exploited to predict the path loss in an urban environment for UAV communications. The results showed that the machine-learning-based models had high prediction accuracy and acceptable computational efficiency. In addition, feature importance was assessed using the RF algorithm. In [22], a hybrid scheme based on the ray tracing method and RF was presented for field strength prediction. Compared with the finite integral method, the proposed model achieved higher prediction accuracy with less computation time. In [23], received signal strengths in an environmental wireless sensor network were predicted with several candidate machine learning algorithms, including AdaBoost, RF, ANN, and KNN. Among these methods, RF showed the highest accuracy in the considered environment. It significantly reduced the average prediction error compared with the empirical models. From the perspective of feature reduction, the authors of [24] used a variety of manifold learning methods to reduce the original feature dimension to two dimensions. They then established a path loss model.
The diverse application scenarios of the 5G era pose a great challenge to channel models. A flexible modeling framework is needed to satisfy the requirements of applications at new frequencies and in new propagation environments. As mentioned above, machine-learning-based methods can provide a tradeoff between the accuracy and complexity of path loss models. Nevertheless, machine learning is a data-hungry technique whose performance depends heavily on the amount and quality of the training data. Because measurements are costly, path loss datasets are usually far from the “big data” that can easily be obtained from the Internet or IoT. In particular, when new scenarios or new frequencies are put into use, it is difficult to collect enough data for path loss prediction in a short time. Therefore, this paper also proposes data expansion solutions to fill this research gap.
The major contributions of this paper can be summarized as follows.
- (1) The basic principle and procedure of machine-learning-based path loss prediction are presented. Crucial issues such as data collection, data preprocessing, algorithm selection, hyperparameter setting, and performance evaluation are discussed.
- (2) To obtain enough data for machine-learning-based models, two mechanisms are proposed that enlarge the training dataset by taking full advantage of existing data and classical models. Data transfer is considered in both the scenario dimension and the frequency dimension.
- (3) Different machine learning algorithms are employed to validate the proposed methods. Both outdoor and indoor scenarios are taken into account, and measured data are used to verify the feasibility of the machine-learning-based predictors.
The remainder of this paper is structured as follows. The procedure of machine-learning-based path loss prediction is presented in Section 2. Section 3 introduces some representative machine learning methods for regression tasks, including ANN, SVR, and decision tree. In Section 4, data expansion solutions are proposed and verified with measured data in an outdoor urban scenario and an indoor aircraft cabin scenario. In Section 5, some open issues for future machine-learning-based path loss prediction are discussed. Finally, conclusions are drawn in Section 6.
2. Machine-Learning-Based Path Loss Prediction
The basic principle of machine-learning-based path loss predictors is shown in Figure 1. Given the output (path loss observations) and the corresponding input features, such as antenna-separation distance and frequency, we can employ machine learning methods to find a good estimation function for path loss prediction. This function maps input features to the output path loss value. It can be either a white box (as in decision-tree-based models) or a black box (as in SVR-based or ANN-based models). The procedure of machine-learning-based path loss prediction is shown in Figure 2 and is introduced step by step as follows.
2.1. Data Collection and Feature Extraction
The collected data are samples obtained from measurements, and each sample should include the path loss value and the corresponding input features. The input features can be divided into two categories: system-dependent parameters and environment-dependent parameters. System-dependent parameters are independent of the propagation environment. They include the carrier frequency, the heights and positions of the transmitter and receiver, and so on. From these parameters, further system-dependent features can be derived, such as the antenna-separation distance and the angle between the line-of-sight path and the horizontal plane.
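The two derived features mentioned above follow from simple geometry. The separation distance is the Euclidean distance between the antennas. The elevation angle is atan(|height difference| / horizontal distance). The sketch below assumes that GPS coordinates have already been converted into a local Cartesian frame in metres; that conversion is not shown. The function name `derived_geometry` is hypothetical.

```python
import math


def derived_geometry(tx_xyz, rx_xyz):
    """Return (3-D antenna-separation distance, elevation angle in degrees).

    Positions are (x, y, z) in metres in a local Cartesian frame; z is height.
    The elevation angle is the angle between the line-of-sight path and the
    horizontal plane.
    """
    dx, dy, dz = (r - t for t, r in zip(tx_xyz, rx_xyz))
    horizontal = math.hypot(dx, dy)
    distance = math.hypot(horizontal, dz)
    elevation_deg = math.degrees(math.atan2(abs(dz), horizontal))
    return distance, elevation_deg
```

For example, take a transmitter 40 m above ground and a receiver at ground level 30 m away horizontally. The separation distance is 50 m, and the elevation angle is about 53.1 degrees.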
Environment-dependent parameters are those determined by the geographic environment and the weather conditions. Parameters related to the geographic environment include the terrain, building conditions, and vegetation conditions. Most of them can be obtained from three-dimensional (3D) digital maps, topographic databases, and land cover databases. The weather parameters include temperature, humidity, precipitation rate, and so on.
The performance of the path loss model is closely related to the number of training samples. After obtaining enough data, these samples should be divided into the training dataset and the test dataset. The former is used to build the prediction model, whereas the latter is used to verify and further improve the model performance.
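A random, reproducible split of the collected samples into training and test datasets can be sketched as follows. The default 20% test fraction matches the split used in Section 3.4. The fixed seed and the function name are assumptions made for reproducibility.

```python
import random


def train_test_split(samples, test_fraction=0.2, seed=0):
    """Randomly split samples into (train, test) lists.

    A fixed seed makes the split reproducible; test_fraction=0.2 gives an
    80%/20% split.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be in (0, 1)")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]
```

With 1483 samples, this yields 1186 training and 297 test samples.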
2.2. Feature Selection and Scaling
In practice, the data used for machine learning may contain hundreds of features. Both omitting relevant features and retaining irrelevant ones can degrade the quality of the predictor. The goal of feature selection is to select the smallest subset of features that contributes most to learning accuracy [25].
Depending on how the feature selection process relates to model design, there are usually three alternative feature selection approaches: filter, wrapper, and embedded. The filter approach evaluates feature importance independently of the prediction model. The wrapper approach takes the prediction performance of the model into account when scoring features. The embedded approach builds feature selection into the model training procedure, so that it is combined with the optimization of prediction accuracy [26]. For different algorithms, the stopping conditions of the selection process depend on the search algorithm, the feature evaluation criteria, and the specific application requirements.
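The filter approach ranks features with a model-independent score. The sketch below assumes the absolute Pearson correlation between each feature and the path loss as that score. This is one common choice, not necessarily the criterion used by the authors. The feature names in the usage example are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient; returns 0.0 for a constant column."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0.0 or vb == 0.0:
        return 0.0
    return cov / math.sqrt(va * vb)


def filter_select(features, target, k):
    """Rank features by |Pearson r| with the target, independently of any model.

    features: dict mapping feature name -> list of values (one per sample).
    Returns the k highest-scoring names and the full score table.
    """
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores
```

Because each feature is scored in isolation, this filter cannot detect redundancy between features. Wrapper and embedded approaches address that at a higher computational cost.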
Some machine learning algorithms, such as ANN, SVR, and KNN, are sensitive to the scale of the input space. Thus, normalization should be performed before training begins. That is, all input features and path loss values should be mapped to the range from −1 to 1 or from 0 to 1. The normalization method chosen in this paper is the same as that in [7]. It can be expressed as
$$\tilde{x} = \frac{2\,(x - x_{\min})}{x_{\max} - x_{\min}} - 1,$$
where $x$ is the value to be normalized, $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of the data range, and $\tilde{x}$ is the value after normalization. The predicted values can be recovered by inverting (anti-normalizing) this mapping.
In contrast, feature scaling is not required by decision-tree-based methods.
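The sketch below maps values from [x_min, x_max] to [−1, 1] and back, so that predictions can be converted to dB. It assumes that x_min and x_max are taken from the training data and reused unchanged for the test data and for anti-normalization. The function names are hypothetical.

```python
def fit_min_max(values):
    """Return (x_min, x_max) computed from training values."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("a constant feature cannot be min-max scaled")
    return lo, hi


def normalize(x, lo, hi):
    """Map x from [lo, hi] to [-1, 1]."""
    return 2.0 * (x - lo) / (hi - lo) - 1.0


def denormalize(x_tilde, lo, hi):
    """Inverse mapping from [-1, 1] back to [lo, hi] (anti-normalization)."""
    return (x_tilde + 1.0) * (hi - lo) / 2.0 + lo
```

Computing the range only on the training set avoids leaking information from the test set into the model.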
2.3. Model Selection
Different models can be used for path loss prediction, and model selection should consider the requirements of both accuracy and complexity. As examples, we introduce ANN, SVR, and decision tree in Section 3. These algorithms have been reported to perform well in predicting path loss values [5,20,23].
2.4. Hyperparameter Setting and Model Training
Hyperparameters are parameters whose values are set before the learning process begins. Typical hyperparameters include the number of hidden layers and neurons in an ANN, the regularization coefficient and kernel function parameters of SVR, and the tree depth and ensemble size in decision-tree-based algorithms. A set of optimal hyperparameters should be carefully chosen to optimize the performance and effectiveness of path loss prediction. Hyperparameter optimization methods mainly include grid search, random search, and Bayesian optimization. In this paper, the final hyperparameter values were obtained by grid search. This exhaustive method evaluates every combination of candidate values on a predefined grid and takes the best-performing combination as the final result.
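Exhaustive grid search can be sketched in a few lines. The caller supplies a hypothetical `validation_error` function. In practice it would train the model with the given hyperparameters and return, for example, the validation RMSE. The toy error function in the usage example is purely illustrative.

```python
import itertools


def grid_search(param_grid, validation_error):
    """Exhaustive grid search.

    param_grid: dict mapping hyperparameter name -> list of candidate values.
    validation_error: callable taking a dict of hyperparameters and returning
        an error to minimize (e.g. validation RMSE of a model trained with them).
    """
    names = list(param_grid)
    best_params, best_error = None, float("inf")
    for combo in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, combo))
        err = validation_error(params)
        if err < best_error:
            best_params, best_error = params, err
    return best_params, best_error
```

The number of evaluations is the product of the grid sizes, which is why grid search becomes expensive as hyperparameters are added.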
Model parameters are those parameters learned from training samples. It is worth mentioning that different learning methods have different model parameters. During the model training process, model parameters such as weights and biases are automatically learned.
2.5. Model Evaluation and Path Loss Prediction
In general, the performance of machine-learning-based path loss models is measured on samples in the test dataset, which do not appear in the model training process. The evaluation criteria include prediction accuracy, generalization property, and complexity.
In terms of accuracy, performance indicators such as the maximum prediction error (MaxPE), mean absolute error (MAE), error standard deviation (ESD), correlation factor (CF), root mean square error (RMSE), and mean absolute percentage error (MAPE) are commonly used [3,6].
Generalization describes the reusability of the model when deployment involves new frequency bands and/or new environment types. A model may generalize better when more data are collected from diverse scenarios, such as different terrains, frequencies, and vegetation cover conditions.
The computational complexity is usually evaluated by processing time and memory cost. As an example, the number of iterations and convergence speed during the training phase are the key factors that affect the processing time of ANN.
Based on the evaluation results, we can select the machine learning algorithm, adjust the hyperparameters, and further improve the prediction model. After the optimal model has been built, it can predict path loss values for new inputs.
3. Methods for Path Loss Prediction
As mentioned above, any supervised learning algorithm can be used for the path loss prediction. In this section, we will introduce some popular models, such as ANN, SVR, and decision tree, and evaluate their prediction performance by means of the measured data.
3.1. ANN
ANN can be used to solve nonlinear regression problems and achieves low prediction errors when the sample size is large enough, making it a popular algorithm for path loss prediction [3,5,6,10]. ANNs are networks formed by interconnected neurons. Based on the neuron model, a feed-forward ANN with a multi-layer perceptron structure usually contains an input layer, one or more hidden layers, and an output layer. Neurons are fully connected to those in the next layer by different weights, whereas there are no connections between neurons in the same layer and no cross-layer connections.
The number of hidden layers and the number of neurons determine the network size and have a great impact on model complexity and accuracy. Unfortunately, finding a suitable ANN structure for path loss prediction is still an open problem. In [3], it is shown that a non-complex ANN, such as a feed-forward ANN with one hidden layer and only a few neurons, would probably provide sufficient path loss prediction accuracy for a typical rural macrocell radio network planning scenario. ANNs with several hidden layers and numerous neurons may generalize worse than non-complex structures. This phenomenon is probably caused by overtraining. The model performs very well on data similar to the training dataset but cannot adapt well to data that differ from the training data.
The back propagation algorithm is a low-complexity method commonly used to train ANNs, and this type of network is often referred to as a BPNN. The subsequent analysis in this paper is based on a 3-layer BPNN with full connections between adjacent layers. Consider a set of training samples $\{(\mathbf{x}_k, y_k)\}_{k=1}^{N}$, where $\mathbf{x}_k = [x_{k,1}, \dots, x_{k,d}]^{\mathsf{T}}$ is a feature vector and $y_k$ is the target output, i.e., the measured path loss value. In the forward propagation phase, the predicted path loss $\hat{y}_k$ can be expressed as
$$\hat{y}_k = f_2\left(\sum_{h=1}^{q} w_h\, f_1\left(\sum_{i=1}^{d} v_{ih}\, x_{k,i} - \gamma_h\right) - \theta\right),$$
where $v_{ih}$ represents the connection weights between the inputs and the $q$ neurons of the hidden layer, and $w_h$ represents the connection weights between the hidden layer and the neuron of the output layer. $\gamma_h$ and $\theta$ are the thresholds of the hidden-layer neurons and of the output-layer neuron, respectively. $f_1$ and $f_2$ are the transfer functions of the hidden-layer neurons and of the output-layer neuron, respectively.
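For a 3-layer BPNN, the forward pass computes y_hat = f2(sum_h w_h * f1(sum_i v_ih * x_i - gamma_h) - theta). The sketch below implements this with plain Python lists. The ReLU hidden activation mirrors the choice reported in Section 3.4. The identity output activation and the function names are assumptions.

```python
def relu(z):
    return max(0.0, z)


def identity(z):
    return z


def bpnn_forward(x, V, gamma, w, theta, f1=relu, f2=identity):
    """Forward pass of a 3-layer BPNN for one feature vector x.

    V[i][h]: weight from input i to hidden neuron h
    gamma[h]: threshold of hidden neuron h
    w[h]: weight from hidden neuron h to the output neuron
    theta: threshold of the output neuron
    """
    hidden = [
        f1(sum(V[i][h] * x[i] for i in range(len(x))) - gamma[h])
        for h in range(len(gamma))
    ]
    return f2(sum(w_h * b_h for w_h, b_h in zip(w, hidden)) - theta)
```

With one input x = 0.5, two hidden neurons, V = [[1, -1]], zero hidden thresholds, w = [2, 3], and theta = 0.5, the hidden outputs are 0.5 and 0. The prediction is therefore 2 * 0.5 - 0.5 = 0.5.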
The error originating at the output neuron propagates backward. During the learning phase, the network adaptively adjusts its weights based on the loss function
$$E = \frac{1}{N}\sum_{k=1}^{N}\left(\hat{y}_k - y_k\right)^2,$$
where $E$ is the mean squared error.
The back propagation algorithm is based on the gradient descent strategy. Standard gradient descent has drawbacks such as slow convergence and a tendency to become trapped in local minima. Thus, other training methods may also be considered, such as the Levenberg-Marquardt method, the Fletcher-Reeves update method, and the Powell-Beale restart method. Among them, the Levenberg-Marquardt algorithm is commonly used for path loss prediction because it converges quickly, at the expense of memory consumption [3,6].
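The sketch below shows plain batch gradient-descent back propagation for a 3-layer BPNN that minimizes the mean squared error E = (1/N) sum (y_hat - y)^2. It is not the Levenberg-Marquardt variant. The output error is propagated back through the hidden layer to update all weights and thresholds. The tanh hidden activation (chosen for smooth gradients), the linear output, the learning rate, the epoch count, and the initialization are all assumptions. The usage example uses synthetic data.

```python
import math
import random


def train_bpnn(X, y, q=8, lr=0.05, epochs=2000, seed=0):
    """Train a 1-hidden-layer BPNN (tanh hidden, linear output) by batch
    gradient descent on E = mean((y_hat - y)^2). Returns a predict function."""
    rng = random.Random(seed)
    d, N = len(X[0]), len(X)
    V = [[rng.uniform(-0.5, 0.5) for _ in range(q)] for _ in range(d)]
    gamma = [0.0] * q                      # hidden thresholds
    w = [rng.uniform(-0.5, 0.5) for _ in range(q)]
    theta = 0.0                            # output threshold

    def forward(x):
        b = [math.tanh(sum(V[i][h] * x[i] for i in range(d)) - gamma[h])
             for h in range(q)]
        return b, sum(w[h] * b[h] for h in range(q)) - theta

    for _ in range(epochs):
        gV = [[0.0] * q for _ in range(d)]
        gg, gw, gt = [0.0] * q, [0.0] * q, 0.0
        for x, t in zip(X, y):
            b, y_hat = forward(x)
            e = 2.0 * (y_hat - t) / N      # dE/dy_hat for this sample
            gt -= e                        # y_hat contains -theta
            for h in range(q):
                gw[h] += e * b[h]
                delta = e * w[h] * (1.0 - b[h] ** 2)  # back-propagated through tanh
                gg[h] -= delta             # hidden pre-activation contains -gamma_h
                for i in range(d):
                    gV[i][h] += delta * x[i]
        theta -= lr * gt                   # gradient-descent updates
        for h in range(q):
            w[h] -= lr * gw[h]
            gamma[h] -= lr * gg[h]
            for i in range(d):
                V[i][h] -= lr * gV[i][h]
    return lambda x: forward(x)[1]
```

In this synthetic usage example, the single normalized feature plays the role of a normalized log-distance. The target is a linear function of it, as in the log-distance law.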
3.2. SVR
Support vector machine (SVM) is a machine learning method based on statistical learning theory. The basic idea of SVM is to nonlinearly map data from a finite-dimensional space into a high-dimensional space in which the dataset becomes linearly separable. As an extension of SVM, SVR is designed to solve regression problems, so it can be used for path loss prediction [20].
The main idea of SVR is to find a hyperplane in the high-dimensional feature space such that the sample points lie on or close to it. The hyperplane in the feature space can be described by the linear function
$$f(\mathbf{x}) = \mathbf{w}^{\mathsf{T}}\phi(\mathbf{x}) + b,$$
where $\mathbf{w}$ is the normal vector, which determines the direction of the hyperplane, $\mathbf{x}$ is an input feature vector, $\phi(\cdot)$ is the nonlinear mapping function, and $b$ is the displacement (bias) term.
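Finding the optimal hyperplane is a constrained optimization problem, which can be written as [27]
$$\min_{\mathbf{w},\,b,\,\xi_i,\,\hat{\xi}_i}\ \frac{1}{2}\|\mathbf{w}\|^2 + C\sum_{i=1}^{N}\left(\xi_i + \hat{\xi}_i\right)$$
$$\text{s.t.}\quad f(\mathbf{x}_i) - y_i \le \epsilon + \xi_i,\quad y_i - f(\mathbf{x}_i) \le \epsilon + \hat{\xi}_i,\quad \xi_i \ge 0,\ \hat{\xi}_i \ge 0,\quad i = 1, \dots, N,$$
where $C$ is the regularization coefficient and $\epsilon$ is the insensitive loss: a predicted value is considered accurate if its deviation from the actual value is less than $\epsilon$. $\xi_i$ and $\hat{\xi}_i$ are slack variables that allow the insensitivity range on the two sides of the hyperplane to differ slightly.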
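Then, by introducing Lagrange multipliers and solving the dual problem, the approximating function can be expressed as
$$f(\mathbf{x}) = \sum_{i=1}^{N}\left(\hat{\alpha}_i - \alpha_i\right)\kappa(\mathbf{x}_i, \mathbf{x}) + b,$$
where $\alpha_i$ and $\hat{\alpha}_i$ are Lagrange multipliers, and $\kappa(\mathbf{x}_i, \mathbf{x}) = \phi(\mathbf{x}_i)^{\mathsf{T}}\phi(\mathbf{x})$ is a kernel function. The kernel implicitly performs the nonlinear mapping from the low-dimensional space to the high-dimensional space.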
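In the SVR primal problem, the optimal slack variables for a fixed (w, b) equal the epsilon-insensitive loss max(0, |f(x_i) - y_i| - epsilon) of each residual. At most one of the two slacks is nonzero. The objective can therefore be evaluated as 0.5*||w||^2 + C * sum max(0, |f(x_i) - y_i| - epsilon). The sketch below does this under two assumptions: the feature map phi is given explicitly (here the identity), and the function names are hypothetical. Practical solvers work with the dual problem instead.

```python
def eps_insensitive(residual, eps):
    """Zero inside the tube |residual| <= eps, linear outside it."""
    return max(0.0, abs(residual) - eps)


def svr_primal_objective(w, b, X_feat, y, C, eps):
    """0.5*||w||^2 + C * sum_i (xi_i + xi_hat_i) with optimal slacks.

    X_feat holds explicitly mapped feature vectors phi(x_i); the linear model
    is f(x) = w^T phi(x) + b.
    """
    reg = 0.5 * sum(wj * wj for wj in w)
    slack = 0.0
    for x, t in zip(X_feat, y):
        f = sum(wj * xj for wj, xj in zip(w, x)) + b
        slack += eps_insensitive(f - t, eps)
    return reg + C * slack
```

Residuals inside the epsilon tube cost nothing. Only samples on or outside the tube can end up with nonzero Lagrange multipliers, which is why the dual solution depends on support vectors only.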
The choice of the kernel function is key to the performance of the SVR-based predictor. At present, the most common kernel functions include the linear kernel, the polynomial kernel, the Gaussian radial basis function, the sigmoid kernel, and their combinations. In this paper, the Gaussian kernel with a tunable parameter is chosen as the kernel function, defined by
$$\kappa(\mathbf{x}_i, \mathbf{x}_j) = \exp\left(-\frac{\|\mathbf{x}_i - \mathbf{x}_j\|^2}{2\sigma^2}\right),$$
where $\sigma > 0$ is the tunable kernel width. The Gaussian kernel is commonly used [17,18,19,20] and is suitable for tasks with few feature dimensions and little prior knowledge. In this study, the regularization coefficient, the insensitive loss, and the kernel function parameter were searched using the same method as in [20].
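Once the dual problem has been solved, the SVR prediction is f(x) = sum_i (alpha_hat_i - alpha_i) * kappa(x_i, x) + b, so only samples with nonzero coefficients (support vectors) are needed. The sketch below assumes the Gaussian kernel parameterization kappa(x_i, x_j) = exp(-||x_i - x_j||^2 / (2*sigma^2)). Other conventions use gamma = 1/(2*sigma^2) instead. The multipliers and b would come from a quadratic-programming solver, which is not shown. The values in the usage example are illustrative.

```python
import math


def gaussian_kernel(xi, xj, sigma):
    """exp(-||xi - xj||^2 / (2 * sigma^2))."""
    sq = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-sq / (2.0 * sigma ** 2))


def svr_predict(x, support_vectors, coef, b, sigma):
    """f(x) = sum_i coef[i] * K(x_i, x) + b, where coef[i] = alpha_hat_i - alpha_i."""
    return sum(c * gaussian_kernel(sv, x, sigma)
               for sv, c in zip(support_vectors, coef)) + b
```

A smaller sigma makes each support vector influence only nearby inputs. This increases flexibility but also the risk of overfitting, so sigma is tuned together with C and epsilon.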
3.3. Decision Tree
A decision tree usually contains a root node, some internal nodes, and some leaf nodes. A single decision tree model is often at risk of overfitting. Thus, ensemble algorithms based on decision trees, such as AdaBoost and RF, have been proposed [23]. Here, we focus on RF.
RF is a machine learning method that combines decision trees with bagging. Each decision tree learner is trained on a bootstrap sample of the training data. Furthermore, random feature selection is introduced in tree training: only a randomly selected subset of features is considered at each node split. Therefore, RF is less affected by sample and feature perturbations and has better generalization performance.
For path loss prediction, the predicted value of a new sample is obtained by averaging the predictions of all the individual decision trees:
$$\hat{y}(\mathbf{x}) = \frac{1}{T}\sum_{t=1}^{T} h_t(\mathbf{x}),$$
where $T$ is the number of decision tree learners and $h_t(\mathbf{x})$ is the prediction of the $t$th decision tree learner.
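The following is a minimal sketch of the RF mechanics described above. Each learner is trained on a bootstrap sample, features are drawn at random for the split, and the prediction is the average (1/T) sum_t h_t(x). To stay short, each tree is an assumed depth-1 regression stump. A stump has a single node split, so drawing features per tree is the same as drawing them per split. Real RF implementations grow deeper trees. The function names are hypothetical.

```python
import random


def fit_stump(X, y, feature_ids):
    """Best single split (one feature, one threshold) minimizing squared error."""
    best = None
    for j in feature_ids:
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [t for row, t in zip(X, y) if row[j] <= thr]
            right = [t for row, t in zip(X, y) if row[j] > thr]
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = sum((t - ml) ** 2 for t in left) + sum((t - mr) ** 2 for t in right)
            if best is None or sse < best[0]:
                best = (sse, j, thr, ml, mr)
    if best is None:                       # no valid split: predict the mean
        m = sum(y) / len(y)
        return lambda row: m
    _, j, thr, ml, mr = best
    return lambda row: ml if row[j] <= thr else mr


def fit_forest(X, y, T=50, max_features=1, seed=0):
    """Bagged ensemble of T stumps; prediction is the average over learners."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    trees = []
    for _ in range(T):
        idx = [rng.randrange(n) for _ in range(n)]       # bootstrap sample
        feats = rng.sample(range(d), max_features)       # random feature subset
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return lambda row: sum(tree(row) for tree in trees) / T
```

Averaging many learners fitted on different bootstrap samples reduces the variance of a single tree. This is the source of RF's robustness to sample perturbations.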
3.4. Comparison of Different Models
The BPNN-based model can approximate complicated nonlinear relationships well, but many parameters need to be selected. In addition, its learning process and predicted results are difficult to explain.
The SVR-based model is flexible because different models can be obtained by choosing different kernel functions and parameters. In theory, this flexibility supports strong generalization ability. Besides, the complexity of the model does not depend on the dimension of the input features, which avoids the curse of dimensionality. The main weaknesses of the SVR-based method are the choice of kernel and its computational complexity.
Decision trees are easy to understand and explain. For example, the RF-based algorithm provides a natural ranking of the features in the model, which is helpful for feature selection. Nevertheless, decision trees often ignore correlations between features.
In order to evaluate the performance of these machine learning models, a measurement campaign was carried out in an urban macrocell scenario in Beijing, China.
Figure 3 shows the top view of the measurement routes. The considered scenario mostly consists of buildings lower than ten stories. Large pedestrian bridges and road signs were sparsely distributed, and trees with an average height of roughly 6 m lined both sides of the roads.
The received signals from a TD-SCDMA BS operating at 2021.4 MHz in this area were considered. The antenna of the TD-SCDMA BS was about 40 m above the ground. The measurements were made by driving a car along the roads in this urban area. An omnidirectional receive antenna was mounted on the car roof and connected to drive-test equipment. The equipment recorded the received signal power and the location information through an external GPS module. The path loss values were then calculated during offline post-processing and mapped to locations. More details of the equipment can be found in [28].
The measurement routes and the position of the TD-SCDMA BS are illustrated in Figure 3. The car moved from the south (sample index 1) to the north (sample index 517) and then turned west. In total, 1483 samples were collected along the two routes. Each sample included a path loss record and an antenna-separation distance calculated from the GPS information. The antenna-separation distance was used as the single feature. We randomly selected 80% of the samples as the training dataset and the remaining 20% as the test dataset. The three aforementioned models, BPNN, SVR, and RF, were used to predict the path loss values in the test dataset. For BPNN, the rectified linear unit function was selected as the activation function. A three-layer feed-forward structure was employed, and the optimal number of neurons in the hidden layer was 15. The Gaussian radial basis function was used as the kernel in the SVR-based model. The regularization coefficient, insensitive loss, and kernel function parameter were set to 451, 91, and 0.25, respectively. For the RF-based model, the maximum tree depth was 5 and the number of ensemble members was 150. The log-distance model was also considered for comparison.
Figure 4 illustrates the measured data and the predicted results of different models. The x-axis represents the index of the test samples, which corresponds to the positions along the driving direction of the routes shown in Figure 3. As can be observed, the path loss values of Route 1 were higher than those of Route 2. This may be because the receive antenna on Route 1 was mainly under non-line-of-sight conditions due to obstruction by buildings and trees. In contrast, the line-of-sight path played a dominant role on Route 2.
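The path loss values at all positions in the test dataset were predicted by using different models. These values were then compared with the measured data, and the prediction errors were computed. Multiple metrics, including RMSE, MAPE, MAE, MaxPE, and ESD, were used to evaluate the performance of the predictors. With the prediction error $e_q = y_q - \hat{y}_q$, they are defined as
$$\mathrm{RMSE} = \sqrt{\frac{1}{Q}\sum_{q=1}^{Q} e_q^2}, \qquad \mathrm{MAPE} = \frac{100\%}{Q}\sum_{q=1}^{Q}\left|\frac{e_q}{y_q}\right|,$$
$$\mathrm{MAE} = \frac{1}{Q}\sum_{q=1}^{Q}\left|e_q\right|, \qquad \mathrm{MaxPE} = \max_{q}\left|e_q\right|, \qquad \mathrm{ESD} = \sqrt{\frac{1}{Q-1}\sum_{q=1}^{Q}\left(e_q - \bar{e}\right)^2},$$
where $q$ is the index of the test sample, $Q$ is the total number of test samples, $y_q$ is the measured data, $\hat{y}_q$ is the predicted value of path loss, and $\bar{e}$ is the mean of $e_q$.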
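The accuracy metrics can be computed together from the measured and predicted path loss values. The sketch below assumes that ESD is the sample standard deviation (Q - 1 denominator) of the signed error e_q = y_q - y_hat_q and that MAPE is expressed in percent. Other papers sometimes define ESD on the absolute error, so this choice is an assumption. The usage numbers are illustrative.

```python
import math
import statistics


def path_loss_metrics(measured, predicted):
    """RMSE, MAPE (%), MAE, MaxPE, and ESD of e_q = y_q - y_hat_q (all in dB except MAPE)."""
    if len(measured) != len(predicted) or len(measured) < 2:
        raise ValueError("need two equal-length sequences with at least 2 samples")
    errors = [y - y_hat for y, y_hat in zip(measured, predicted)]
    Q = len(errors)
    return {
        "RMSE": math.sqrt(sum(e * e for e in errors) / Q),
        "MAPE": 100.0 * sum(abs(e) / abs(y) for e, y in zip(errors, measured)) / Q,
        "MAE": sum(abs(e) for e in errors) / Q,
        "MaxPE": max(abs(e) for e in errors),
        "ESD": statistics.stdev(errors),   # sample std, Q - 1 denominator
    }
```

Take measured values of 100, 110, 120, and 130 dB with predictions of 102, 108, 120, and 126 dB. The errors are -2, 2, 0, and 4 dB, so MAE is 2 dB, MaxPE is 4 dB, and RMSE is sqrt(6), about 2.45 dB.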
The prediction errors of the different predictors are listed in Table 1. The results show that all the machine-learning-based models perform well and outperform the log-distance model. With the selected hyperparameters, RF performs best in the measured scenario, followed by SVR, BPNN, and the log-distance model.
5. Opportunities for Further Research
5.1. Collection of Training Data
It has been noted that obtaining enough training data is crucial for the accuracy and generalization of machine-learning-based models. Considering the cost of carrying out measurement campaigns, the question is how many samples are enough for a given prediction accuracy. Evaluation metrics and tools need to be developed to answer this question.
Meanwhile, what we need may not be “bigger data”, but “better data”. The diversity and uniformity of the samples should be considered. In a single scenario, the data should be evenly distributed over the measured region. To build a model with good generalization, measurement routes should be carefully designed to acquire enough data in different scenarios. Therefore, the methodology of channel measurement deserves careful consideration.
Additionally, we have offered two schemes to make the most of existing measured data and classical models. Similar methods can also be investigated to enlarge the training dataset in the future.
5.2. Feature Selection Methodology
Too few features may limit the generalization ability of the path loss predictor. In the above analysis, only system-dependent parameters such as antenna-separation distance and frequency were selected as features. These machine-learning-based models have been shown to agree well with measured data. However, their generalization is limited, so they may only be suitable for similar urban scenarios. Moreover, using more features does not necessarily improve performance. Too many features not only increase the computational requirements but may also cause the curse of dimensionality and degrade prediction performance. Therefore, methodologies need to be developed to guide feature selection for machine-learning-based path loss predictors.
5.3. Hyperparameter Optimization Problem
Hyperparameter optimization is one of the hardest problems in machine learning. For example, the choice of kernel determines the final performance of SVR-based prediction models. For ANN-based methods, the number of hidden layers and the number of neurons are also crucial. Although approaches such as grid search can address this problem to some degree, further research is still needed.
5.4. More Machine-Learning-Based Models
With the rapid development of machine learning, new algorithms emerge to improve the model accuracy and computational efficiency. More models and parameters should be taken into consideration to solve the path loss prediction problem.
5.5. Incremental Learning
So far, machine-learning-based algorithms for path loss prediction have almost all been based on batch learning, which assumes that all training samples are available before training. After learning from these samples, the training process terminates and model building is finished. In practical applications, however, path loss training samples accumulate gradually over time. Relearning from all the data after new samples arrive takes a considerable amount of time and memory.
Incremental learning algorithms can gradually update, correct, and enhance previous knowledge so that the updated model adapts to newly arriving samples without relearning from all the data. New knowledge can be learned from the new data to build a more accurate path loss predictor, while most of the previously learned knowledge is retained. However, introducing incremental learning algorithms may degrade the accuracy of the path loss predictor, because these algorithms lack a forgetting mechanism for selecting training data.
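As a small illustration of incremental learning, the sketch below updates a fit of PL = A + n * 10*log10(d) one sample at a time using recursive least squares (RLS). It does not retrain on all previous data. RLS is a swapped-in classical technique chosen for brevity, not one of the machine learning predictors discussed in this paper. The `forgetting` factor lambda is the standard RLS forgetting factor. With lambda < 1, older samples are down-weighted by lambda^age, which is one simple form of the forgetting mechanism mentioned above. The class name and the initial covariance value are assumptions.

```python
import math


class RecursiveLogDistance:
    """Incrementally fits PL = A + n * 10*log10(d) with recursive least squares."""

    def __init__(self, forgetting=1.0, init_cov=1e6):
        if not 0.0 < forgetting <= 1.0:
            raise ValueError("forgetting factor must be in (0, 1]")
        self.lam = forgetting
        self.theta = [0.0, 0.0]                      # [A, n]
        self.P = [[init_cov, 0.0], [0.0, init_cov]]  # inverse-information matrix

    def update(self, d, pl):
        """Absorb one new (distance, path loss) sample."""
        phi = [1.0, 10.0 * math.log10(d)]
        P = self.P
        Pphi = [P[0][0] * phi[0] + P[0][1] * phi[1],
                P[1][0] * phi[0] + P[1][1] * phi[1]]
        denom = self.lam + phi[0] * Pphi[0] + phi[1] * Pphi[1]
        k = [Pphi[0] / denom, Pphi[1] / denom]       # gain vector
        err = pl - (self.theta[0] * phi[0] + self.theta[1] * phi[1])
        self.theta = [self.theta[0] + k[0] * err, self.theta[1] + k[1] * err]
        # P <- (P - k phi^T P) / lam; phi^T P equals Pphi^T because P is symmetric.
        self.P = [[(P[i][j] - k[i] * Pphi[j]) / self.lam for j in range(2)]
                  for i in range(2)]

    def predict(self, d):
        return self.theta[0] + self.theta[1] * 10.0 * math.log10(d)
```

Each update costs a fixed amount of work regardless of how many samples have already been absorbed. This is the property that makes incremental learning attractive when path loss data arrive over time.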