1. Introduction
Indoor localization, as defined in [1], is a process employed to obtain a user or device location in an indoor environment. While satellite-based global systems are available for outdoor positioning, their applicability indoors is largely limited by the high attenuation of signals penetrating buildings, numerous reflections and problematic satellite visibility, all of which lead to impossible or inaccurate positioning. Enabled by the progress in indoor localization technologies and related electronic devices [2], the demand for precise localization systems is in the spotlight of many industrial fields. The number of use cases where indoor tracking or localization of an object is essential has been continuously increasing [3]. To name a few, the applications may target healthcare (e.g., patient localization), office (e.g., access to a printer based on location), or home environments (e.g., automatic light control). In the last decade, many works have investigated different indoor technologies, methods and systems. Several more detailed surveys are available in [1,4,5].
Among the most popular are technologies based on the signals of Wireless Local Area Networks (WLANs, i.e., Wi-Fi) and Bluetooth Low Energy (BLE), a recent version of the Bluetooth standard. While WLAN signals offer more stable measurements, BLE requires far less power. BLE has become one of the most frequently used short-range wireless communication systems for indoor localization purposes [6]. Utilizing the Bluetooth system for indoor localization has been the target of numerous studies, including our own [7,8,9,10,11,12,13,14,15,16]. In these works, in general, one of three basic techniques, or their combination, is employed. The Received Signal Strength (RSS) [9] may be evaluated to determine location from signal propagation loss, the Time of Arrival (ToA) may be calculated to reveal location from the duration of signal propagation, and the Angle of Arrival (AoA) may be utilized to relate the received signal direction with the receiver location [1]. The RSS-based approach, which is easy and cheap to implement even with signals of limited bandwidth and omnidirectional antennas, is considered in this work.
RSS-fingerprinting-based localization is realized in two steps. First, in the so-called offline (training) phase, a map of the indoor environment divided into cells with finite spatial resolution is created. Subsequently, the RSS values of the Radio Frequency (RF) signals received from fixed beacons are collected at the positions of interest during a limited time period and assigned to the environment map to create an RSS fingerprint map. Second, in the online (testing) phase, the currently measured signal strength is compared to the entries in the RSS fingerprint map to estimate the location of an object. The performance of fingerprint-based localization is determined not only by the precision of the RSS measurement, but also by the algorithm used to align the measurements in the testing phase with the offline RSS fingerprint map.
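To make the two phases concrete, the following minimal Python sketch (hypothetical values and variable names, scikit-learn assumed; not the system or algorithm evaluated later in this paper) illustrates how an offline fingerprint map can be matched against an online measurement with a simple nearest-neighbor rule:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Offline (training) phase: RSS vectors collected at known cells.
# Each row holds RSS values from the fixed beacons; labels are cell IDs.
fingerprint_rss = np.array([[-60, -72, -81],   # cell 0
                            [-65, -68, -79],   # cell 1
                            [-74, -61, -70]])  # cell 2
cell_labels = np.array([0, 1, 2])

matcher = KNeighborsClassifier(n_neighbors=1)
matcher.fit(fingerprint_rss, cell_labels)

# Online (testing) phase: align a new measurement with the fingerprint map.
online_rss = np.array([[-64, -69, -80]])
estimated_cell = matcher.predict(online_rss)[0]
print(f"Estimated cell: {estimated_cell}")
```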
RSS fingerprint-based indoor positioning has received a lot of interest [9,10,11,12,13,17,18,19]. Recent studies have revealed that this positioning technology performs well in complex environments (absence of a Line of Sight (LoS) signal path), but its overall computational cost is not negligible [20]. Most importantly, fluctuations in RSS values due to small-scale fading and frequent changes in the characteristics of the indoor environment can significantly increase the position estimation error.
This paper addresses the challenge of aligning online RSS measurements with the offline RSS fingerprint map by employing Machine Learning (ML) techniques. We aim to review the approaches to improving RSS-based location fingerprinting by means of ML found in the literature. To this end, we present a detailed survey of the work published so far. We find that the published works are not directly comparable since the measurement setups and the interpretation of results vary significantly. Thus, to also make a quantitative comparison of the core techniques possible, we propose an indoor experimental scenario and create a database of RSS fingerprints and test measurements. With the use of these data, we compare how different learning algorithms can improve the location fingerprinting result. We start with the algorithms found in the literature, but also include more complex models such as Artificial Neural Networks (ANNs). The contribution can be summarized as follows:
Provide a survey of the state of the art in ML-based fingerprinting improvement (Section 2),
Provide a re-usable dataset obtained from our measurement campaign (Section 3),
Benchmark different ML-based classifiers with respect to their potential to improve location fingerprinting results (Section 4 and Section 5).
This paper is organized as follows. The Introduction is followed by Section 2, in which an overview of the ML algorithms used to improve RSS-based location fingerprinting is provided. The developed BLE-based indoor localization system and the indoor measurement campaign conducted to create a re-usable dataset are described in Section 3. A performance study of different ML-based classifiers aimed at improving fingerprinting-based location results is presented in Section 4 and Section 5. Finally, this work is briefly summarized and concluded in Section 6.
3. Indoor Measurement Campaign
To create a comprehensive database of RSS fingerprints, extensive indoor measurements have to be performed. To this end, we have set up a BLE-based indoor localization system using a single-input multiple-output (SIMO) approach. Its basic concept has been introduced in [12] and a part of the dataset used in this work has been obtained from our previous measurements [12]. In this section, the whole measurement campaign and the indoor environment are described in detail.
3.1. Remote and Self-Positioning
Before we describe our system, let us briefly introduce two main concepts of indoor positioning, namely remote-positioning and self-positioning systems [33]. The difference lies in the roles of fixed and moving elements of the system.
In the remote-positioning mode, the fixed anchors listen for reports and the movable tags transmit them. This approach benefits from the option to have more than one receiving antenna, because space limitations and power consumption are not a significant issue. Hence, it is possible to collect multiple RSS samples for a single advertisement (see Figure 1). For example, each antenna (connected to a dedicated receiver) may listen on a specific channel or be spatially placed for diversity. Ideally, the multiple uncorrelated samples offer more ways to improve the accuracy of a positioning system, especially for moving targets. Additionally, a remote-positioning system makes it possible to track not only smartphones but also very simple tags.
Since it is common to receive reports from more than 20 tags in an indoor environment even during calm work hours, the system communication layer must ensure high throughput, low latency and a highly reliable connection from the anchors to the processing server. Demanding system communication and the need for a back-channel in situations where the resolved position needs to be sent back to the tag for navigation purposes are the downsides of this mode. Another disadvantage lies in the increasing RF noise with a rising number of tags.
On the other hand, in the self-positioning mode, the fixed anchors transmit advertisements and the tags receive them, so the system communication layer has almost no requirements. A typical BLE receiver can listen on only a single channel at a time, which means that the device receives reports from all anchors on the same channel during one scan interval (see Figure 1). Further, the anchors cannot transmit their reports at the same time; otherwise, the advertisement reports can interfere with each other and be lost. This decreases the number of collected RSS samples and limits channel diversity, notably for moving targets. Although it may seem that the self-positioning system does not share the disadvantage of increasing RF noise with a rising number of tags, this is not entirely true. Tags such as smartphones, wearables and other personal devices transmit BLE advertisements on their own. For example, when users want to locate themselves, their smartphone collects the RSS samples from the anchors but also from their watch, which may introduce interference. A remote-positioning system can receive reports from both the watch and the smartphone, but is otherwise passive (no reports from anchors). Therefore, the impact of active BLE devices on RF noise is about the same as in the remote-positioning system.
3.2. Experimental Setup
A block diagram of the proposed BLE-based localization system is shown in Figure 2. From left to right, a personal computer (PC) is used to process and log the RSS samples (the data are processed offline later). For data logging (the source code for data logging is available for download at https://github.com/Standa-R/BlePositioningSystem, accessed on 4 July 2021), an msi GT72 2QD Dominator notebook with the Windows 10 operating system and the Microsoft Visual Studio integrated development environment was used. A Raspberry Pi B+ (acting as a gateway to Ethernet) exposes the application programming interface (API) to control and collect data from the anchors. The ZigBee network established with Silicon Labs ETRX357 ZigBee modules [34] provides the system communication layer, which is responsible for transferring data from the anchors to the Raspberry Pi. The positioning layer is built from Laird Connectivity BL652 BLE modules [35]. Apart from the gateway, all parts of the system are entirely tetherless, which assures high portability and a short setup time for the measurement.
The ZigBee module acting as a coordinator connects the Raspberry Pi B+ to the ZigBee network. To avoid a direct connection to the pins of the processor, the ZigBee module interacts with the Raspberry Pi via an FTDI USB/UART converter.
To support easy setup and functionality of the ZigBee network in a harsh RF environment, the RF channel used to transfer data is automatically selected from the IEEE 802.15.4 frequency band (2.4 GHz) by the ZigBee module. ZigBee channels 15 and 26 (center frequencies 2.425 and 2.480 GHz) are excluded from the automatic channel selection list due to the risk of interference with BLE advertisement channels 38 and 39 (center frequencies 2.426 and 2.480 GHz). A low data rate is one of the main drawbacks of the ZigBee network. For the remote-positioning mode, the bottleneck is narrowed further by the maximum UART baud rate of the ZigBee ETRX357 module, which is 115.2 kbps. We assume that the tags transmit an advertisement every 100 ms (see Figure 1). Therefore, we can estimate the absolute maximum number of tags tracked by the system at a time. Each advertisement report transmitted by a tag generates an anchor report, which contains 29 bytes of useful information (protocol header, advertiser ID, timestamp, 4 × RSS, 4 × channel). However, due to the protocol overhead of the ZigBee module, 49 bytes are transmitted to the Raspberry Pi B+ via UART. The system contains four anchors, which means that each advertisement report requires about 200 bytes transmitted via UART. Hence, while neglecting RF collisions, the coordinator can theoretically accept a maximum of 7 tags. In practice, the system already experiences high latency, out-of-order samples (several seconds old) and sample losses with three tags. Therefore, a configurable address filter discarding any report with an advertisement address outside the allowed list was implemented in the anchors. Even so, the ZigBee-based transport layer is not suitable for cases where a high refresh rate is required. Nevertheless, the realized system is portable and excels with its straightforward deployment in an indoor environment and reliable reproducibility of the measurements, which are the most important parameters for our experimental purposes.
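As a sanity check, the 7-tag limit follows directly from the numbers above (115.2 kbps UART rate, roughly 200 bytes per advertisement report from the four anchors, 100 ms advertising interval):
$$
N_{\mathrm{max}} = \left\lfloor \frac{R \, T_{\mathrm{adv}}}{8B} \right\rfloor
= \left\lfloor \frac{115\,200\ \mathrm{bit/s} \cdot 0.1\ \mathrm{s}}{8 \cdot 200\ \mathrm{byte}} \right\rfloor
= \lfloor 7.2 \rfloor = 7\ \text{tags},
$$
where $R$ is the UART rate, $T_{\mathrm{adv}}$ the advertising interval and $B$ the number of bytes transferred per advertisement report.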
The anchor firmware runs on an STM32F091CCT [36], the smallest 32-bit MCU from STMicroelectronics with six UART modules. Four UART modules are used for the BL652 modules, one for ZigBee and one for debugging. Four fixed anchors, each fitted with four BL652 modules and one Silicon Labs ETRX357 ZigBee module (see Figure 2), perform the localization-related communication (measuring). Since the BL652 modules are software controllable, the system supports both the remote-positioning (anchors listen while tags periodically transmit reports) and the self-positioning (anchors periodically transmit reports and tags listen) modes. Each anchor has four omnidirectional dipole antennas mounted in line with a spacing related to the wavelength λ in the center of the 2.4 GHz ISM band, i.e., 2440 MHz (channel 17). Depending on the antennas being either straight (horizontal polarization) or bent (vertical polarization), the antenna spacing takes one of two values; in this work, the half-wavelength (λ/2) spacing was used. Each anchor is powered from a single Li-ion 18650 cell, which can provide power for more than 24 h on a single charge even under full load. To minimize ZigBee traffic, the anchor sends a single packet per BLE report. Since each BL652 module listens on a different RF channel at a time (see Figure 1) and random processing delays occur, the BL652 modules (antennas) do not receive the report from the BLE advertising event of the transmitting tag at the same time. Therefore, the microcontroller buffers the reports according to the advertiser's ID and schedules them for ZigBee transmission once all four measurements are available. When not all four reports have been received within a four-millisecond time window, the anchor assumes that the BLE module was not able to receive the report and transmits the RSS values measured by the other antennas. The anchors extend each report with a timestamp, which enables further processing units to relate the reports from all anchors to each advertisement event. Considering the minimum interval between two advertisement events (20 ms), the maximum synchronization error was set to 10 ms. The time reference of the anchors is synchronized periodically.
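For illustration only, the buffering behavior described above can be sketched in Python (the actual firmware runs in C on the STM32; field names are hypothetical and the timeout is simplified to a check performed when a new report arrives):

```python
from collections import defaultdict

SYNC_WINDOW_MS = 4   # wait at most 4 ms for all four antenna reports
NUM_ANTENNAS = 4

# Per-advertiser buffer: advertiser ID -> list of (antenna, channel, rss)
pending = defaultdict(list)
first_seen_ms = {}

def on_ble_report(advertiser_id, antenna, channel, rss, now_ms, send_zigbee):
    """Buffer per-antenna reports and flush once complete or timed out."""
    if advertiser_id not in first_seen_ms:
        first_seen_ms[advertiser_id] = now_ms
    pending[advertiser_id].append((antenna, channel, rss))

    complete = len(pending[advertiser_id]) == NUM_ANTENNAS
    timed_out = now_ms - first_seen_ms[advertiser_id] >= SYNC_WINDOW_MS
    if complete or timed_out:
        # One ZigBee packet per BLE advertisement, timestamped by the anchor.
        send_zigbee({"advertiser": advertiser_id,
                     "timestamp_ms": now_ms,
                     "samples": pending.pop(advertiser_id)})
        first_seen_ms.pop(advertiser_id, None)
```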
The proposed system can work with any BLE tag, e.g., custom tags (Tag 1), smartphones or wearable devices. In Figure 2, Tag 0 represents the simplest and smallest tag. It has no link to the system communication layer, which limits its usage to the remote-positioning mode. Hence, it only transmits BLE reports while the anchors measure and send the RSS samples for processing. These tags can be placed, for instance, on the equipment we want to track. On the contrary, Tag 1 employs a ZigBee module (connection to the system communication layer), which enables its configuration and data collection via the API. Therefore, Tag 1 can work in both the remote- and the self-positioning mode. In this work, we used Tag 1 equipped with an omnidirectional dipole antenna [37] for the BLE module. Further, it is connected to the ZigBee network via the ZigBee module, which uses a second omnidirectional dipole antenna. In the remote-positioning mode, the tag periodically transmits BLE reports and only listens for messages from the ZigBee network. In the self-positioning mode, on the other hand, the tag listens for reports and sends the measured RSS (from the BLE channels) to the gateway over the ZigBee network. In the following analysis, we focus on the results obtained in the remote-positioning mode.
3.3. Indoor Environment
Measurement campaigns were conducted in the building of the Brno University of Technology (BUT), at the Department of Radio Electronics (DREL). The measurement scenario and floor plan are illustrated in Figure 3. The complete localization system was installed in a narrow corridor with dimensions of 44 × 1.8 × 2.7 m, located on the seventh floor of DREL. The floor is made of concrete, the walls are lined with plasterboard and the ceiling material is mineral wool with a paint finish. Wooden cabinets and metal chairs are located on the right side of the floor, whereas the left side is without obstructions (see Figure 4). Altogether, seven laboratories are located on this floor, each with approximate dimensions of 7 × 10 m. They contain wardrobes, operable windows, several chairs, two tables and one desk. Two of the laboratories have been used to collect measurements in our setup, as shown by the black dashed arrow on the right-hand side of Figure 3.
3.4. Measurement Scenario
As seen in Figure 3 and Figure 4, the reference point [0, 0] m of the two-dimensional coordinate system is placed at the position of Anchor 0 at a height of 1 m above the floor level, which is often considered the common height for moving tags. In general, the accuracy of indoor localization is also influenced by the placement of the anchors. It is often a trade-off between the mounting possibilities and the characteristics of the indoor environment. Based on the considered indoor environment conditions, the anchors with IDs from 0 to 3 were installed 2.2 m above the floor level. In the two-dimensional system, they have coordinates [0, 0], [14.2, 0], [23.9, 0] and [30.7, 0] m, respectively. This means that the anchors are above the tags and above the height of the people. To achieve the lowest possible fluctuations of the RSS values during the measurements, the tag was kept at a constant height of 1 m and the movement of people was minimal.
The measurement, with a step of 0.5 m in the X and Y axes (see Figure 3), was conducted in the corridor from 0 to 35 m and from 0.4 to 1.4 m, respectively. Each measurement lasted 72 s for remote positioning and 140 s for self-positioning. About 670 records were collected at each position (in total, more than 140,000 records), where each record contains the RSS samples from four antennas of four anchors. Since the tag in the self-positioning mode has only one receiver and one antenna, it can collect only one sample at a time. Therefore, it takes three scan intervals (see Figure 1) to collect samples on each channel from the anchors. During these three intervals, each antenna sends three BLE advertisements. The tag buffers the samples and, ideally, after the three scan intervals, it should send three messages for processing in the same format as the anchors send in the remote-positioning mode. However, due to interference and radio tuning, the report was sometimes missing. Thus, the measurement time was doubled and, finally, the measurement samples were trimmed. This way, samples were collected on different channels, but at different times. In the remote-positioning mode, the BLE report does not contain any user data, only the ID of the tag (a random number). In the self-positioning mode, the message contains only the IDs of the antenna and the anchor. In a transmission channel with time-independent fading, this is a usable approach for fixed tags. For tags in motion, the movement speed and the advertising period are the limiting factors. For the measurements conducted in the corridor, the tag was always in Line-of-Sight (LoS) conditions.
Further measurements were carried out in the laboratories, which represent a Non-Line-of-Sight (NLoS) condition for the signal. In Figure 3, the measurement path is marked by black arrows. The step was again 0.5 m, but only the places free to walk (i.e., not obstructed by furniture) were covered.
3.5. Dataset
The complete measurement results are provided as an open dataset to be re-used by the research community. The data are contained in a comma-separated values (.csv) text file, which has been preprocessed in the following steps (a minimal sketch of the procedure is given after the list):
The raw measurements, collected as the received signal levels at the anchors in the remote-positioning mode, contain, for each anchor, four channel numbers and four corresponding RSS levels, as measured by the four BLE modules (onboard each anchor). At this point, it is not assured that RSS levels at all three advertisement channels are recorded and that all values are valid. In the self-positioning mode, one measurement contains the same set of values, although taken in the opposite direction.
All values where RSS level of −110 dBm was recorded are considered to be missing measurements as at this level the receiver fails to measure the actual received power level.
In order to ensure there are no missing values, records from two consecutive measurements are taken as one sample. For each anchor, the first valid measurement is recorded in the sample and the remaining measurements are discarded (e.g., measurements from further antennas). This allows the dataset to be cleaned in such a way that no missing values appear in the data.
In the end, for each tag position, there are three RSS signal levels collected at the three advertisement channels at/from each anchor. This means that each position is characterized by twelve (three channels multiplied by four anchors) measured values, defining the feature space for the machine learning algorithms.
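As an illustration of these preprocessing steps, the following minimal Python sketch (pandas assumed, hypothetical file and column names; not the exact script used to produce the published dataset) performs the filtering and merging described above:

```python
import pandas as pd

MISSING_RSS = -110  # the receiver reports -110 dBm when it fails to measure

raw = pd.read_csv("raw_measurements.csv")  # hypothetical raw log

# Treat -110 dBm readings (and anything below) as missing values.
rss_cols = [c for c in raw.columns if c.startswith("rss_")]
raw[rss_cols] = raw[rss_cols].where(raw[rss_cols] > MISSING_RSS)

# Merge two consecutive measurements per position into one sample:
# for each anchor/channel column, keep the first valid (non-missing) reading.
raw["pair_id"] = raw.groupby(["pos_x", "pos_y"]).cumcount() // 2
samples = (raw.groupby(["pos_x", "pos_y", "pair_id"], as_index=False)[rss_cols]
              .first())

# Drop any samples that still contain missing values; 12 features remain
# (3 advertisement channels x 4 anchors) for each tag position.
samples = samples.dropna(subset=rss_cols)
samples.to_csv("fingerprint_dataset.csv", index=False)
```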
4. One-Dimensional Positioning
In one-dimensional positioning, we aim to estimate the location along a single axis. To this end, the four anchors parallel to this axis are used in the RSS power measurement. In the measurement data, we select a subset from the center of the corridor; the Y-axis coordinate in this case is limited to 0.9 m. In Figure 5, we visualize the measured RSS while taking into account only three of the four anchors, in order to limit ourselves to a three-dimensional space which can be well captured in a picture. The color of the points represents the distance along the considered axis. Evidently, points with the same color tend to appear in similar locations in the three-dimensional space, which is a promising starting point for any classification algorithm.
To illustrate the situation further, we present the mean RSS values for the signals belonging to each anchor and the three channel frequencies in Figure 6. Although the overall trend is that the RSS values are larger in areas close to the respective anchors, it shall be noted that the stability of the measurements is limited: the mean values tend to fluctuate in neighboring locations, which prevents the position from being determined from the signal levels directly. While the mean values are displayed in the figure, we have observed very similar behavior also for the median RSS values.
In order to evaluate the performance of different algorithms in the localization problem, we have split the whole dataset into a training set and a test set. The split is done in such a way that, from the measurements obtained at each position, the first 80% of samples in time are allocated to the training set, while the remaining 20% of samples are allocated to the test set. For the split, the samples are ordered by time stamps; however, the time stamps are removed from both sets after the split to prevent data leakage.
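A minimal sketch of this per-position, time-ordered split (pandas assumed, hypothetical file and column names) could look as follows:

```python
import pandas as pd

data = pd.read_csv("fingerprint_dataset.csv")  # hypothetical preprocessed file

def split_by_time(group, train_ratio=0.8):
    """First 80% of samples (in time) go to training, the rest to test."""
    group = group.sort_values("timestamp")
    cut = int(len(group) * train_ratio)
    return group.iloc[:cut], group.iloc[cut:]

train_parts, test_parts = [], []
for _, group in data.groupby(["pos_x", "pos_y"]):
    tr, te = split_by_time(group)
    train_parts.append(tr)
    test_parts.append(te)

train = pd.concat(train_parts).drop(columns="timestamp")  # drop time stamps
test = pd.concat(test_parts).drop(columns="timestamp")    # to avoid leakage
```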
The training of the model is done on the training set only, with no information about the testing set available to the training algorithm. Since, in general, the models considered in this work allow for different hyperparameter values (e.g., the number of neighbors k to consider in the k Nearest Neighbors classifier or the number of neurons per layer in the Multi-Layer Perceptron classifier), we have used the n-fold cross-validation approach to determine the appropriate model parameters. In principle, the training set is randomly split into n groups of samples (folds) of equal sizes. All models have been trained with n = 5 folds in the cross-validation search for model parameters. The value has been selected as a trade-off between the computational complexity (a low number of folds leads to faster model selection) and prediction error (a high number of folds leads to low prediction error), as discussed in [38,39]. Since the cross-validation in this case is only used for the primary selection of the model to be trained, the prediction error is not critical and the relatively low number of folds gives satisfactory results. Then, n − 1 folds are used for training and the remaining fold is used for validation. This procedure is repeated n times so that each fold is used for validation once. In the end, the average performance of the model (accuracy achieved on the validation fold) across the n runs is recorded. After the complete desired parameter space of the model has been explored and the validation outputs are obtained, the model structure performing best in cross-validation is used for the final training of the model on the whole training set. This final model is then evaluated on the test set to obtain the model performance in terms of accuracy. The whole procedure is displayed in detail in Figure 7. For training the models, the Amazon Web Services (AWS) platform was used, utilizing an Elastic Compute Cloud (EC2) instance of the g2.8xlarge type, selected with emphasis on system memory. The instance was running the Anaconda with Python 3 (x86_64) Amazon Machine Image, so no special software installation was required. In the whole study, we have used the Python programming language and its related libraries. The core of the training and evaluation uses scikit-learn (https://scikit-learn.org, accessed on 4 July 2021). More details can be found at https://github.com/slaninam/Loc1D (accessed on 4 July 2021).
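The procedure of Figure 7 maps naturally onto scikit-learn's grid search. The following sketch (a hypothetical helper, not the exact code in the repository linked above) shows the general flow applied to every classifier in this study; X_train, y_train, X_test and y_test refer to the time-ordered split sketched earlier:

```python
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import accuracy_score

def select_and_evaluate(estimator, param_grid, X_train, y_train, X_test, y_test):
    """Figure 7 flow: 5-fold CV model selection, refit, then test-set accuracy."""
    search = GridSearchCV(estimator, param_grid, cv=5, scoring="accuracy")
    search.fit(X_train, y_train)            # cross-validation on the training set only
    best_model = search.best_estimator_     # best structure, refit on the whole training set
    test_acc = accuracy_score(y_test, best_model.predict(X_test))
    return best_model, search.best_params_, test_acc
```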
The time correlation of RSS samples is a phenomenon which has been studied in [40]. The authors have shown that the samples obtained for the same location are correlated in general, irrespective of the time difference between the collection of separate samples; they report that the mean and variance of the RSS at one location remain the same over time and that the auto-covariance function of the RSS at one location has the same shape for separate time series. This effect not only allows the machine learning approach to be employed for location estimation, but further investigation of the RSS correlation is also reported to have the potential to improve classical localization algorithms. The temporal variation of RSS is also considered in [41], where the authors state that most temporal fluctuations of the RSS are caused by moving objects, a varying electromagnetic wave landscape, the directionality of antennas and RF interference. A combination of fingerprints from several access points brings the benefit of having the mean differences of RSS more stable over a longer period of time.
The target values are represented as categorical variables, corresponding to the discrete spatial positions at which the measurements have been taken. In total, there are 70 categories corresponding to position on a 35 m long corridor with 0.5 m spacing.
We evaluate the performance of the algorithms through accuracy, i.e., the proportion of samples for which the correct location was estimated (in the ideal case, the accuracy would be 1.0, meaning the location is estimated correctly for 100% of the samples). In the implementation, we have used the scikit-learn [42] library.
4.1. kNN Classifier
The k Nearest Neighbors classifier is a very popular approach in machine learning-based location fingerprinting, as seen in Table 1. The idea behind this technique is to (i) remember the whole set of samples in the training phase and (ii) find the majority vote among the k nearest samples in the training set to determine the estimate for a test sample.
In the design of such a classifier, the following needs to be considered:
The entire training set needs to be stored, which can lead to large memory requirements for bigger datasets.
The metric to measure the distance from the neighbors needs to be selected. A common choice is the Minkowski metric, defined for two points $\mathbf{x} = (x_1, \ldots, x_n)$ and $\mathbf{y} = (y_1, \ldots, y_n)$ in the n-dimensional space as
$$ d(\mathbf{x}, \mathbf{y}) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}, $$
with, in the simplest cases, the parameter $p = 1$ leading to the Manhattan distance and $p = 2$ leading to the Euclidean distance.
The number of neighbors to be considered is a design choice for the algorithm and depends on the characteristics of the dataset. In general, higher values of k lead to smoother decision boundaries, while smaller values of k capture data variations more faithfully.
In order to test the performance of kNN on our dataset, we vary the number of neighbors k between 1 and 20 and select the Euclidean distance (p = 2) as the distance metric. In line with the flowchart in Figure 7, we use 80% of the samples for training and the remaining 20% for testing. To tune the classifier hyperparameter, i.e., the number of neighbors in this case, we perform 5-fold cross-validation on the training set. The classifier performing best is then tested on the test set.
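Using the hypothetical select_and_evaluate helper sketched in the previous section, the kNN search can be written as:

```python
from sklearn.neighbors import KNeighborsClassifier

# Euclidean distance (Minkowski with p = 2), neighbors varied from 1 to 20.
knn_grid = {"n_neighbors": range(1, 21)}
knn_model, knn_params, knn_acc = select_and_evaluate(
    KNeighborsClassifier(metric="minkowski", p=2), knn_grid,
    X_train, y_train, X_test, y_test)
```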
The classification results are shown in Figure 8, where the real and estimated positions are plotted in a scatter plot diagram. The color of the points corresponds to the number of samples at a given position, showing that the off-diagonal points are rather isolated observations. It is also visible that some positions have fewer samples in total. This is a result of the measurement data post-processing. At some positions (at distances around 20 m), invalid values appear in relatively many measurements. Thus, after filtering, fewer valid samples remain. Although this may be unfair to the model, it reflects well the real-life situation in which the system also needs to handle invalid measurements. The best validation results have been achieved when the number of neighbors is set to just one. As the number of neighbors increases, the performance of the algorithm deteriorates gradually (see Figure 9).
The accuracy of the positioning algorithm, i.e., the proportion of distances classified correctly in the test set, was more than 99.5%. The mean and maximal errors in terms of predicted distance are shown in Table 2. The most distant prediction corresponds to an error of 4.5 m. Since a majority of the samples are classified correctly, the mean absolute error is in the order of millimeters. Still, we shall be aware that we are solving a classification problem, so the real distance error is discrete: either zero or a multiple of 0.5 m for each sample.
4.2. Support Vector Machines
Another rather popular approach to fingerprint classification is the SVM algorithm. The idea behind it is to find a separating hyperplane such that the distance between the hyperplane and the closest samples belonging to distinct classes is maximized. Naturally, the dimensionality of the hyperplane depends on the number of features in the dataset, and the description of the problem is rather simple in case the number of features is low.
In general, the Support Vector Machine classifier searches for vectors of weights $\mathbf{w}$ and biases $b$ based on the training vectors $\mathbf{x}_i$, such that the prediction given by $\operatorname{sign}(\mathbf{w}^{T}\phi(\mathbf{x}) + b)$, classifying the inputs into two classes $y \in \{-1, 1\}$, is correct for most samples. The problem can be formulated as
$$ \min_{\mathbf{w}, b, \boldsymbol{\zeta}} \; \frac{1}{2}\mathbf{w}^{T}\mathbf{w} + C \sum_{i} \zeta_i $$
subject to
$$ y_i\left(\mathbf{w}^{T}\phi(\mathbf{x}_i) + b\right) \geq 1 - \zeta_i, \qquad \zeta_i \geq 0. $$
In this expression, the search for the maximum classification margin is represented by the minimization of $\mathbf{w}^{T}\mathbf{w}$, while the penalty term allows for classification in cases where the problem is not perfectly separable with a hyperplane. Here, the parameter $C$ controls the strength of the penalty term, i.e., adjusts the regularization in the training process. The term $\phi(\mathbf{x}_i)$ represents the image of the input vector in feature space (dependent on the kernel function) and $\zeta_i$ is the loss term.
We have trained the SVM classifier using the Radial Basis Function kernel with a variable regularization parameter to see how well it performs on our dataset. Since the SVM classifier is fundamentally a binary classifier, the extension to multiple classes has been done using the one-vs-one approach. The results are shown in Figure 10 and Figure 11, where the scatter plot and the dependency on the regularization parameter are displayed, respectively. In the hyperparameter selection phase, the best results were obtained for C equal to 20, but we have found that the impact of this parameter is rather low, as shown in Figure 11.
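With the same hypothetical helper, the corresponding search could be sketched as follows (the listed C values are illustrative only):

```python
from sklearn.svm import SVC

# RBF kernel; scikit-learn's SVC handles multi-class via one-vs-one internally.
svm_grid = {"C": [1, 5, 10, 20, 30]}
svm_model, svm_params, svm_acc = select_and_evaluate(
    SVC(kernel="rbf"), svm_grid, X_train, y_train, X_test, y_test)
```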
Also in this case, the accuracy on the test set was above 0.99, the achievable performance being comparable to the kNN classifier. The advantage here is that the whole set of training samples need not be stored for the prediction; on the other hand, due to the multi-class generalization, there are in fact multiple component models forming the final prediction. The absolute error values are displayed in Table 2, showing that the model performance is comparable to kNN.
4.3. Random Forest
The Random Forest classifier is based on an ensemble of decision trees which classify the given sample by following a set of conditions determined in the training phase. We have experimented with this approach, varying the number of trees (decision elements) in the structure between 10 and 200. The results show that the Random Forest classifier is able to classify the dataset accurately (see
Figure 12). Actually, the results achieved are among the best overall, with the accuracy over 99.9% in all cases in which the number of estimators is above 40. The best results in the validation phase were obtained when the number of estimators was 180. Interestingly, in the cases where the model did not classify the position correctly, it selected the closest point so the maximal position error is 0.5 m.
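Using the same hypothetical helper, a corresponding sketch is:

```python
from sklearn.ensemble import RandomForestClassifier

# Number of trees varied between 10 and 200, as in the experiment above.
rf_grid = {"n_estimators": range(10, 201, 10)}
rf_model, rf_params, rf_acc = select_and_evaluate(
    RandomForestClassifier(), rf_grid, X_train, y_train, X_test, y_test)
```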
4.4. Multi-Layer Perceptron
All the ML approaches presented in the previous subsections have shown their potential to serve as very good location estimators in our scenario. Still, the Artificial Neural Network paradigm has been found to perform well in many classification problems. Thus, we have also applied a simple ANN in the form of a Multi-Layer Perceptron (MLP) with multiple hidden layers activated by the Rectified Linear Unit (ReLU) activation function and the logistic function as the activation of the output layer.
The network takes as many inputs as there are input features and has as many outputs as there are possible output classes. The basic structure of the MLP with two hidden layers is shown in Figure 13. In the figure, we show the dimensions of the weight matrices and bias vectors at each layer of the network for the case of two hidden layers. In a general case, for m input features, n outputs (classes) and o, p neurons in the two hidden layers, the network has $(m+1)o + (o+1)p + (p+1)n$ parameters to learn; in the case shown in the figure, the number of parameters of this simple network is 133 and grows with the number of neurons in the hidden layers.
In the hyperparameter optimization, we have varied the number of hidden layers (up to 3) and the number of neurons in each layer (varied exponentially from 5 up to 1280). We have found that there is no increase in performance when a third hidden layer is added, so the final model had two hidden layers with 640 and 320 neurons, respectively.
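A corresponding sketch with the same hypothetical helper (only a few of the tested two-layer architectures are listed, and the iteration limit is an arbitrary choice for the sketch) is:

```python
from sklearn.neural_network import MLPClassifier

# ReLU-activated hidden layers; a few illustrative two-layer architectures.
mlp_grid = {"hidden_layer_sizes": [(160, 80), (320, 160), (640, 320)]}
mlp_model, mlp_params, mlp_acc = select_and_evaluate(
    MLPClassifier(activation="relu", max_iter=500), mlp_grid,
    X_train, y_train, X_test, y_test)
```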
As evident from the results (see Figure 14), the MLP has had inferior performance to all solutions described above. In no case did the classification accuracy on the test set reach over 0.98. Given the complexity of the model and the performance of the other techniques, the MLP classifier is not well suited to the given problem.
5. Two-Dimensional Positioning
The problem gets more complicated when we aim for two-dimensional positioning with the same set of anchors, allowing the tag to travel also in the perpendicular direction through the laboratory rooms next to the corridor. Thus, we aim to estimate the location along two axes: parallel (X) and perpendicular (Y) to the axis along which the anchors are located in the corridor. A visualization of the measured data for one laboratory is available in Figure 15, where each of the subplots represents the signals received at one of the four anchors. The visualization corresponds to the rightmost room in Figure 3. We only consider one frequency (channel 37) in the visualization, the color of each area representing the mean of all measured values for the particular position.
Obviously, anchor 0, being closest to the room, shows the highest RSS values, while anchor 3, the farthest, shows the lowest. The locations have a different coordinate system compared to the one-dimensional case in Section 4, which means that the required models have to be re-trained. We have assigned each location a unique ID made up from the X and Y coordinates. In total, there are 300 possible locations in the two-dimensional model: 213 valid locations correspond to the corridor, and 18 and 69 correspond to the two laboratories in Figure 3, respectively. In the end, this means the model needs to classify each sample into one of 300 classes.
5.1. kNN Classifier
For the k Nearest Neighbors classifier, we have tested a variable number of neighbors also in the two-dimensional case. Similar to the one-dimensional case, the best results in the 5-fold cross-validation were obtained with rather few neighbors. In this case, it is possible to achieve the correct classification in more than 99% of cases, as shown in Table 3.
As shown in Figure 16, the majority of points have been classified correctly. For clarity, this plot uses a color scheme to indicate the number of occurrences of each point. In order to align the scales of the two subplots, the values are normalized with respect to the maximum. We can observe from the figure that the errors appear mainly in the Y coordinate. One reason is that the number of valid measurements is lower at the locations in the laboratory rooms than at the corridor locations. The second reason is that the signals received in the rooms are subject to higher attenuation due to wall penetration, which causes the signal levels at distant anchors to be rather low. The third reason is the geometry of the experiment, in which the distance differences from the anchors in the Y direction are lower than the distance differences in the X direction.
In the two-dimensional case, we can obtain the absolute spatial distances by evaluating the errors in the X and in the Y coordinate for each sample in the test set. Then, the location error can simply be calculated using the Pythagorean theorem. In this way, we can obtain the summary statistics of the maximal error distance and the mean error distance, as shown in Table 3. In this case, the errors are also discrete-valued, but since diagonal errors are also possible, they are in general not multiples of 0.5 m.
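A minimal sketch of this error computation (assuming the class labels have already been decoded back to their (x, y) grid coordinates; the decoding itself is hypothetical) is:

```python
import numpy as np

def location_errors(y_true_xy, y_pred_xy):
    """Euclidean (Pythagorean) error distance per test sample, in metres.

    y_true_xy, y_pred_xy: arrays of shape (n_samples, 2) holding the
    (x, y) coordinates decoded from the true and predicted class IDs.
    """
    diffs = np.asarray(y_pred_xy) - np.asarray(y_true_xy)
    dists = np.sqrt((diffs ** 2).sum(axis=1))
    return dists.mean(), dists.max()   # mean and maximal error distance
```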
5.2. SVM Classifier
For the SVM classifier, we have found that the impact of the regularization parameter is stronger in the two-dimensional case than in the simpler classification along a single axis. Larger values of the parameter C tend to improve the performance of the model, although even for C = 38 the performance on the test set is slightly worse than that experienced for kNN. Having in mind that an increased regularization parameter strongly impacts the learning time, the value 38 has been the maximum tested, with the accuracy of classification reaching 99%. The scatter plot diagram for the SVM classifier in the two-dimensional case is shown in Figure 17. Clearly, the spread of values is comparable to the case of Figure 16.
5.3. Random Forest Classifier
The Random Forest (RF) classifier outperforms all the other algorithms in this case (see Figure 18), as well as in the simpler one-dimensional scenario. Recall that in the one-dimensional case, RF was among the best options; in the two-dimensional scenario, in which the problem is more complex, the advantage of the algorithm is even more visible.
Varying the number of predictors in the RF ensemble, we see that the best performance is achieved when the number of trees is 240, which is the largest value we have tested to keep the complexity of the model maintainable. Even in this complex case with 300 output classes, the accuracy of the model on the test set is over 99.9%.
5.4. Multi-Layer Perceptron
In this case, we have taken networks with up to three hidden layers consisting of up to 1280 neurons each as the candidate architectures. The best results have been achieved with 1280 + 1280 neurons in the hidden layers, which means that for the larger dataset (compared to the one-dimensional case) a more complex network is required. Still, given the complexity of the model and the computational cost of the training, the results achieved (see Figure 19) are worse than in the other cases: the accuracy over all points (X and Y correct at the same time) is 97.6%. The neural network is also prone to highly distant misclassifications; in the worst case, the spatial distance between the correct and the predicted class was almost 30 m.
6. Summary and Conclusions
To compare the performance of the different ML techniques in both the one-dimensional and the two-dimensional localization scenario, we provide an overview of the test set accuracy in Table 2 and Table 3, respectively. In these tables, we provide three values for each approach. The accuracy achieved on the test set is computed as the ratio of correctly classified positions to the total number of samples in the test set. The maximal error is the spatial distance of the farthest misclassified position from the correct location, and the mean error is the average spatial distance over all predictions.
From the results, it is obvious that Bluetooth Low Energy can well be used for RSS-based localization, although the received signal levels tend to fluctuate over time at any given position. Machine learning algorithms provide the opportunity to achieve very high accuracy and low errors, as is particularly seen in the case of the Random Forest classifier.
In this work, we have provided an overview of the possible approaches to improving the performance of the fingerprinting technique using the supervised machine learning paradigm. We have defined a laboratory environment and performed a measurement campaign to collect a representative number of samples for localization based on RSS fingerprints in the radio environment of Bluetooth Low Energy. The collected data are freely available and ready to be re-used in further work by other researchers.
The complexity of the different models is well illustrated by the disk size needed to store the model for inference. Considering the hyperparameter adjustment and the one-dimensional scenario described in Section 4, for example, the Random Forest classifier, which is the best performing algorithm, has the largest disk footprint (175 MB). This may limit the usability of this approach in practical applications. The sizes of the other models are comparable, with kNN requiring 3.8 MB, the Multi-Layer Perceptron using 2.5 MB and the SVM model using 6.9 MB.
The results achieved in RSS fingerprinting employing machine learning show that, with a proper selection of the algorithm and a sufficient number of samples for training, the localization can be very accurate. The accuracy achieved for the three best performing algorithms is over 99%. Among these, the k Nearest Neighbors classifier has slightly lower performance and comes with the requirement to store the learning samples (neighbors) for the inference phase. The Support Vector Machine, on the other hand, requires a complex model due to the high number of output classes. The Random Forest classifier scored best in our experiment and is seen as a very promising technique for this specific application.
The work presented in this paper evaluates the usage of machine learning algorithms for positioning in fixed conditions, where not many external factors are expected to influence the position estimation. As follow-up work, we aim to extend the dataset and the models to evaluate how factors such as the presence of humans, changes in furniture position or imprecise ground truth measurements deteriorate the obtained results.