Laser Range Scanners for Enabling Zero-overhead WiFi-based Indoor Localization System

Authors:

Hamada Rizk,

Hirozumi Yamaguchi,

Moustafa Youssef,

Teruo Higashino

ACM Transactions on Spatial Algorithms and Systems, Volume 9, Issue 1

Article No.: 4, Pages 1 - 25

https://doi.org/10.1145/3539659

Published: 12 January 2023

Abstract

Robust and accurate indoor localization has been the goal of several research efforts over the past decade. Toward achieving this goal, WiFi fingerprinting-based indoor localization systems have been proposed. However, fingerprinting involves significant effort—especially when done at high density—and needs to be repeated with any change in the deployment area. While a number of recent systems have been introduced to reduce the calibration effort, these still trade overhead with accuracy. This article presents LiPhi++, an accurate system for enabling fingerprinting-based indoor localization systems without the associated data collection overhead. This is achieved by leveraging the sensing capability of transportable laser range scanners to automatically label WiFi scans, which can subsequently be used to build (and maintain) a fingerprint database. As part of its design, LiPhi++ leverages this database to train a deep long short-term memory network utilizing the signal strength history from the detected access points. LiPhi++ also has provisions for handling practical deployment issues, including the noisy wireless environment, heterogeneous devices, among others. Evaluation of LiPhi++ using Android phones in two realistic testbeds shows that it can match the performance of manual fingerprinting techniques under the same deployment conditions without the overhead associated with the traditional fingerprinting process. In addition, LiPhi++ improves upon the median localization accuracy obtained from crowdsourcing-based and fingerprinting-based systems by 284% and 418%, respectively, when tested with data collected a few months later.

1 Introduction

Recent years have witnessed the advent of indoor localization systems harnessing the capabilities of smartphones [40, 49, 53, 54, 55, 67]. Among these, WiFi-based systems build on the ubiquitous deployment of WiFi technology. To make WiFi-based localization possible at the intended accuracy level, fingerprint-based approaches have been proposed, as they have been shown to provide accurate, fine-grained positioning [1, 2, 4, 6, 7, 13, 27, 28, 30, 39, 42, 45, 51, 62, 63, 68, 70]. Fingerprinting works in two phases: offline and online. During the offline phase, the received signal strengths (RSS) from the access points (APs) installed in the area of interest are recorded by a cell phone carried by a site surveyor. This is done at specific reference points. The surveyor has to remain at each exact point for a relatively long time and tag the recorded data manually using a data collection application. The collected fingerprint is then used to build a localization model that can be either probabilistic, e.g., Reference [68], or machine learning-based, e.g., References [1, 18, 47]. During the online phase, this model is used to obtain a real-time estimate of the user’s location.

The data collection required for fingerprinting is time-consuming, laborious, vulnerable to environmental change, and unscalable, especially in large testbeds. To tackle this problem, several techniques have been proposed, including robots, additional sensors, computer vision, crowd-sensing, propagation models, and/or data interpolation techniques [14, 22, 34, 57]. These approaches either do not account for the effect of humans, reduce the ubiquity and/or accuracy of the localization system, or have other limitations (e.g., they require a complex calibration process, raise privacy concerns, or require interaction from the users).

In this article, we propose LiPhi++, a system for seamlessly enabling fingerprinting-based indoor localization without the associated data collection overhead inherent in the traditional fingerprinting method. The idea is to opportunistically leverage transportable laser-range scanners (LRSs) (or LiDARs) in a user-transparent way to tag WiFi scans collected during the normal movement of building users without human intervention. LiPhi++ can also construct the required fingerprint database and build the localization model with as few as only one LRS. In addition, the used LRSs can be temporarily deployed and then reused in other buildings (Figure 1), significantly reducing the overhead and cost of deploying a WiFi localization system.

Fig. 1.

Nevertheless, LiPhi++ needs to make provision for a number of challenges including handling instantaneous changes of the WiFi signals that could yield spurious points in the estimated traces, matching anonymous LRS traces (i.e., sequence of point labels) to the WiFi scans collected by the identified user, and constructing robust deep learning localization models. For this, we introduce a novel iterative user trace refinement approach that uses temporal and spatial location smoothing and outlier detection to ensure the validity of the initial trace shape. This estimated user trace is matched to the available LRS traces collected at the same time with the lowest cumulative pointwise distance. These automatically labeled user traces constitute a large (i.e., dense) fingerprint database that enables the effective training of deep learning models. Additionally, LiPhi++ includes provisions to ensure the generalization and robustness of the trained model against overfitting.

We implemented LiPhi++ on Android devices in two different testbeds. Our results show that LiPhi++ outperforms the state-of-the-art indoor positioning techniques in both testbeds by at least 284%. This accuracy is obtained without any fingerprinting overhead or user intervention along with robustness to temporal variations of the signals and infrastructure.

This article extends our earlier work in Reference [48]. Specifically, we propose a new localization model (a multimodal deep recurrent neural network) to provide more robust performance. This performance is achieved by training the localization model on a sequence of input scans rather than a single scan as in Reference [48]. The proposed localization model is therefore designed to learn the underlying relationship between the signals received from WiFi access points as well as the temporal correlation (i.e., historical changes) between successive scans, leading to better localization performance. Furthermore, LiPhi++ compensates for the temporal variations of the WiFi signals and for the class imbalance problem through the use of both spatial discretization and augmentation techniques. The proposed model and its associated modules enhance the accuracy by 18.5% and 37.3% compared to our earlier method in Reference [48] when tested with data collected at calibration time or a few months later, respectively.

The rest of this article is structured as follows: Section 2 gives a background on laser range scanners and discusses practical issues facing WiFi-based fingerprinting localization. In Section 3, we provide an overview of LiPhi++. Section 4 presents in detail the methodology proposed by LiPhi++. In Section 5, we describe the data collection process and provide a detailed evaluation of the system. In Section 6, we discuss the research in the literature that is most relevant to LiPhi++. Finally, we conclude the article in Section 7.

2 Background and Motivation

In this section, we start with a background on laser range scanners. Then, we discuss issues that need to be addressed by traditional fingerprinting techniques.

2.1 Laser Range Scanners

LRSs are devices that detect surrounding objects using eye-safe lasers. A laser beam scans the scene in one or two dimensions and obtains an accurate distance at each angle with sub-decimeter error and at high frequency (e.g., 20 scans per second at each angle). As LRS units have become less expensive and more popular [23, 31, 65], they can increasingly be used in large indoor environments such as malls and museums. Our team has developed a LiDAR-based tracking system (Figure 1) and has deployed it in a shopping mall for the purpose of developing LiPhi++. As shown in the figure, it is battery operated with a place-and-play feature, using a Raspberry Pi 3 and an LTE module. Therefore, the setup overhead of the LiDAR-based tracking system, which consists of placing the LiDAR in a specific location and marking its location on the map, is negligible. Using LiDARs enables the collection of large amounts of dense data, which facilitates the effective application of accurate, though data-hungry, solutions (e.g., deep learning).

2.2 Fingerprint Construction Overhead

In this section, we quantify the site survey overhead (in terms of time) that is incurred by a typical manual fingerprinting process. Assume there are \(n\) reference points in the area of interest, that collecting the information at each point takes \(\tau\) minutes, and that moving to a new point and tagging the location the surveyor is standing at takes a constant setup time of \(\alpha\) minutes. The site survey overhead is then computed as \(n \times (\tau + \alpha)\). For instance, constructing a fingerprint database for a shopping mall at 1,000 points with \(\tau = 5\) min and \(\alpha = 1\) min would require 100 hours of data collection, highlighting the massive overhead inherent in manual fingerprinting. Furthermore, this time cost has to be paid with every change in the testbed (e.g., moving an AP to a new location). In contrast, LiPhi++ requires zero extra overhead, as the fingerprint database is constructed opportunistically and transparently from the users of the building during their normal movements in the environment. Additionally, the LiDAR-based tracking system used covers a large area and is easily transportable with negligible overhead.
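
The arithmetic behind the 100-hour figure can be checked with a few lines of Python. This is only an illustrative sketch; the function name `survey_overhead_hours` is ours and does not appear in the system.

```python
def survey_overhead_hours(n_points, tau_min, alpha_min):
    """Site survey overhead n * (tau + alpha), converted from minutes to hours."""
    return n_points * (tau_min + alpha_min) / 60.0


# The shopping mall example from the text: 1,000 points, tau = 5 min, alpha = 1 min.
print(survey_overhead_hours(1000, 5, 1))
```

Because the overhead is linear in \(n\), doubling the fingerprint density doubles the survey time, and the whole cost recurs with every change in the testbed.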

2.3 Temporal Variations

The RSS values from an arbitrary AP at a given location exhibit two types of variations: short-duration variations and long-duration variations. Short-duration variations are due to occasional changes in the environment, e.g., users’ movements, and lead to drastic variations in the RSS measurements received from some APs over a short time. These fluctuations usually lead to location estimates that deviate markedly from their preceding or subsequent estimates and can be considered outliers. In contrast, long-duration variations refer to attenuation of the signals due to lasting changes in the environment that permanently affect the quality of the fingerprint database on which the localization model is built. Figure 2 shows the RSS distributions of an arbitrary AP at the same location in two different months (July and November), depicting how the distribution changes over time. This change can be seen as an inadvertent covariate shift [58] and leads to a remarkable drop in the accuracy of the trained machine learning models [16, 48, 58]. This effect is empirically quantified in Section 5. To tackle this issue, the fingerprint database would need to be rebuilt, which would incur the overhead estimated in the previous section. By leveraging the laser range scanners, LiPhi++ can keep the fingerprint up-to-date, significantly reducing the effects of the temporal variation with zero extra overhead.

Fig. 2.

Moreover, the installed APs and/or their parameters (e.g., transmission power) may change over time due to maintenance, replacement, or the addition of new APs. These changes lead to variation in the AP density in the area of interest. For example, during the development of this work, the APs in the building of our lab were maintained and replaced, which reduced the number of originally installed APs by 30%. This, in turn, negatively affects an already-trained traditional localization model, as it loses significant information and cannot take the new APs into account. This highlights the importance of continuously updating the fingerprint as in LiPhi++.

3 Problem Statement and System Overview

3.1 Problem Statement

Without loss of generality, we assume the user’s phone is tracked in a two-dimensional (2D) indoor environment \(\mathbb {L}\) containing \(m\) access points and \(q\) LRS devices. The user is at an unknown location \(l \in \mathbb {L}\) carrying a device that scans for the nearby APs. Let an arbitrary WiFi scan be represented as \(x_i=\lbrace x_{i1},\ldots , x_{in}\rbrace\), where \(n \le m\) due to noise and AP fluctuation, and the \(j{\rm th}\) entry is the RSS measurement from the \(j{\rm th}\) AP in the \(i{\rm th}\) scan. In the offline phase, the problem is formally expressed as follows: Given a signal strength vector \(x_i=\lbrace x_{i1},\ldots , x_{in}\rbrace\) and the coordinates of a subset of \(n_z \le n\) APs with known locations (called reference APs), LiPhi++ seeks a coarse-grained (i.e., relatively low-accuracy) estimate of the sequence of user locations, \(r = \lbrace r_1,r_2,\ldots , r_k\rbrace\), of length \(k\). These coarse-grained estimates are then improved using a set of LRSs that are temporarily installed in the area of interest. However, the accurate LRS-based location estimates \(l = \lbrace l_1,l_2,\ldots , l_k\rbrace\) do not include information about which location belongs to which user. Therefore, LiPhi++ must correct the WiFi-based per-user locations using the accurate but user-anonymous LRS locations. This process yields the targeted automatically constructed fingerprint database. In the online phase, the problem becomes the following: Given an RSS vector \(x_i=\lbrace x_{i1},\ldots , x_{in}\rbrace\), we aim at finding the location \(l_i\) that maximizes the probability \(P(l_i|x_i)\). To answer this query, LiPhi++ trains a deep localization model with the constructed fingerprint database.

In the next sections, we discuss the details of how LiPhi++ builds the fingerprint database and permits robust localization in continuous space.

3.2 System Overview

Figure 3 shows the architecture of the proposed system. LiPhi++ has two phases: an offline phase and an online tracking phase. In the offline phase, LiPhi++ aims to automatically construct a fingerprint database and a deep localization model. Toward achieving these goals, the system designer initializes the offline stage by feeding the floorplan layout of the environment, together with the APs’ locations, to the system.¹ To construct the required fingerprint database, the building users scan the WiFi measurements from the deployed APs in the area of interest using the WiFi Scan Collector module in a crowdsourcing manner. This module is an application installed on the user’s phone. Note that these scans are collected without any manual intervention from the users. Hence, it does not require user feedback as in previous systems, e.g., Reference [32]. The location of the collected scans is then coarsely estimated by the WiFi Trace Estimator module based on a propagation model, as verified in Reference [13]. Simultaneously, the deployed laser range scanners detect moving objects (i.e., users) in the considered environment. The LRS scans are further processed by the LRS Trace Estimator module to obtain the sequence of pedestrian positions forming a trace. Then, the Trace Matcher module matches and corrects the estimated WiFi-based trace using the location tags of the most similar LRS-based trace along the available walking paths. As a result, a fingerprint database is constructed from timestamped WiFi scans annotated with their corresponding LRS-based labels. This database is leveraged by the Localization Model Builder to train a deep neural network, which is used later in the online phase.

Fig. 3.

During the online phase, the user carrying her phone at an unknown location scans for WiFi information from the detectable APs in the area of interest. These scans are then forwarded to the LiPhi++ server. The Location Estimator module feeds the data to the localization model constructed in the offline phase to estimate the current user location.

4 The LiPhi++ System

We present the details of the offline phase that include the automatic fingerprint construction and training of a localization model in addition to how this model is queried in the online phase to provide fine-grained location estimates.

4.1 LRS-based Trace Estimator

In this article, we leverage simple, low-cost, commercial off-the-shelf 2D scanners. An LRS unit used in our experiment can scan 1,080 points horizontally over an angle of \(270^\circ\) within 25 ms. Without loss of generality, we adopt the tracking method in Reference [66] for trajectory estimation based on LRSs. Each LRS unit horizontally scans the environment, and LiPhi++ calculates the difference between the current distance and the background distance (e.g., walls and fixed objects). Then LiPhi++ detects a group of such points that corresponds to a single person’s waist (approximated by an ellipse) and estimates the location of the user as the center of the resulting arc shape (Figure 4). This method forms the user trace by connecting subsequent nearby estimated locations, as shown in Figure 5. It is worth mentioning that LiPhi++ can operate with only a single LRS by covering the whole area of interest incrementally over time, moving the LRS to different sub-areas.

Fig. 4.

Fig. 5.

4.2 WiFi-based Trace Estimator

The goal of this module is to obtain a coarse-grained estimate of the unknown phone’s location given the corresponding WiFi RSS information. The guiding principle utilized here is that the measured RSSs from different APs heard by a particular user device can ideally represent the distance between the AP and that device. In other words, the stronger the RSS received from an AP, the closer the device should be to this AP [13]. Building on this observation, we define two sub-modules, Spatial Rule Generator and WiFi Scan Annotator, to enable a transparent estimation of the user location in real time.

4.2.1 Spatial Rule Generator.

This module aims to define rules (referred to as spatial rules) that characterize every location in the environment based on its distance to the reference APs (i.e., APs whose locations are known). These rules represent the expected RSS relations between every possible pair of reference APs at each location. To enable the effective calculation of spatial rules, LiPhi++ discretizes the area by creating a grid that is virtually superimposed on the layout of the area of interest; the center of each grid cell is referred to as a reference point. The density of the grid cells is a configurable parameter that can be selected to balance computational overhead and accuracy (Section 5.2.1). As a result, several reference points (i.e., grid cell centers) are defined such that they are uniformly distributed over the area of interest. For each reference point, the Euclidean distances between that point and every detectable reference AP are calculated. Then, the spatial pairwise rules are derived based on the distances between the point in question and every possible pair of reference APs. In particular, assume there are two reference APs \(A\) and \(B\) with distances \(d_A\) and \(d_B\) from the point, respectively. A spatial rule is defined as \(d_B \lt d_A\) if \(B\) is closer to that point than \(A\). This rule is expected to carry over to the signal strengths detected by the mobile device at that point. That is, the RSS level from \(B\) is expected to be higher than the RSS level from \(A\), forming an RSS rule \(x_B \gt x_A\) (Figure 6). For \(m\) APs present in the area of interest, a total of \(\binom{m}{2}\) pairwise spatial rules are defined for each point. Note that considering relative relations between different reference APs rather than the absolute RSS values leads to device-invariant location estimates.

Fig. 6.
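
The rule generation described above can be sketched as follows. This is a minimal illustration under our own assumptions, not the system's implementation: the function names, the square-cell grid, and the dictionary representation of a rule (each AP pair mapped to the AP expected to be stronger) are all hypothetical.

```python
import math
from itertools import combinations


def grid_reference_points(width, height, cell):
    """Centers of the square grid cells covering a width x height area."""
    nx, ny = int(width // cell), int(height // cell)
    return [((i + 0.5) * cell, (j + 0.5) * cell)
            for i in range(nx) for j in range(ny)]


def spatial_rules(point, ap_locations):
    """Pairwise spatial rules for one reference point.

    ap_locations maps a reference AP id to its known (x, y) position.
    For every AP pair (a, b) the rule stores the closer AP, i.e., the one
    whose RSS is expected to be higher. Exact ties go to b (an arbitrary
    choice made for this sketch).
    """
    rules = {}
    for a, b in combinations(sorted(ap_locations), 2):
        d_a = math.dist(point, ap_locations[a])
        d_b = math.dist(point, ap_locations[b])
        rules[(a, b)] = a if d_a < d_b else b
    return rules


aps = {"A": (0.0, 0.0), "B": (10.0, 0.0), "C": (0.0, 10.0)}
point_rules = {p: spatial_rules(p, aps)
               for p in grid_reference_points(10.0, 10.0, 5.0)}
```

With three reference APs, each reference point receives \(\binom{3}{2} = 3\) rules, matching the count given in the text.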

4.2.2 WiFi Scan Annotator.

The goal of this module is to estimate the user’s most probable location, which is then used to annotate the corresponding WiFi scan. This can be done by locating the user’s mobile device based on the measured RSS signals received from the reference APs during the normal movement of the users of the building (i.e., crowdsourcing). Subsequently, the location of the user’s device is estimated using the pre-calculated spatial rules of each point. During the resolution process, short-term variations in the wireless channel (e.g., arising from link blockages between the device and the reference APs) may lead to erroneous matches. To combat this, this module employs a scoring approach, outlier removal, and smoothing techniques, described as follows.

Finding a rough user location reduces to selecting a number of best-matching reference points according to their matching scores. The rough user location is then estimated as the center of mass of the reference points with the top matching scores. More specifically, all reference points are initialized with a score of zero (i.e., \(s = 0\)). Each match between one of the observed RSS rules and the corresponding rule of a given reference point increases the score of that reference point by one. Naturally, this implies that a number of reference points (if not all) will have nonzero matching scores. Therefore, the points with the top \(\gamma\) scores are selected, and their weighted average, with each point weighted by its score, is reported as the user’s location. Here, \(\gamma\) is a system parameter; we found empirically that a \(\gamma\) of 10% is sufficient to achieve good performance, enabling better matching and location refinement (as described in Section 4.3).
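
The scoring step can be sketched as below. This is a hedged illustration: the names `rss_rules` and `rough_location` are ours, rules are represented as a dictionary from AP pairs to the AP expected to be stronger, and \(\gamma\) is interpreted as the top fraction of reference points ranked by score, which is an assumption about the text rather than a confirmed detail.

```python
def rss_rules(scan):
    """Observed pairwise rules from a scan {ap_id: rss_dbm}: the stronger AP of each pair."""
    aps = sorted(scan)
    return {(a, b): (a if scan[a] > scan[b] else b)
            for i, a in enumerate(aps) for b in aps[i + 1:]}


def rough_location(scan, point_rules, gamma=0.10):
    """Score-weighted centroid of the top-gamma fraction of reference points.

    point_rules maps each reference point (x, y) to its expected rules
    {(a, b): closer_ap}. Returns (x, y), or None if no rule matched.
    """
    observed = rss_rules(scan)
    # One point per observed rule that agrees with the point's spatial rule.
    scores = {p: sum(1 for pair, winner in observed.items()
                     if rules.get(pair) == winner)
              for p, rules in point_rules.items()}
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    top = ranked[:max(1, round(gamma * len(ranked)))]
    total = sum(s for _, s in top)
    if total == 0:
        return None
    x = sum(p[0] * s for p, s in top) / total
    y = sum(p[1] * s for p, s in top) / total
    return (x, y)


demo_rules = {(0.0, 0.0): {("A", "B"): "A"}, (10.0, 0.0): {("A", "B"): "B"}}
print(rough_location({"A": -40.0, "B": -70.0}, demo_rules, gamma=0.5))
```

Because only the ordering of RSS values within a scan is used, a constant offset added by a different device leaves the scores unchanged.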

To further boost the robustness against the occasional environmental dynamics, LiPhi++ leverages multiple consecutive location estimates in its operation, which constitutes a window of \(v\) locations. In particular, given a window of \(v\) locations (i.e., consecutive estimates in time), LiPhi++ removes the anomalous location estimates that may deviate from the majority of scans in that window. As a result, the predicted consecutive locations are ensured to be correlated. To do that, it calculates the distance between each estimated location and the remaining \(v-1\) reported locations, a value we term inter-location distance and define as [15, 41]

\(\begin{equation} d_i=\sum _{j\ne i}\frac{d(r_i,r_j)}{v-1}, \end{equation}\)

(1)

where \(d(r_i,r_j)\) is the Euclidean distance between locations \(r_i\) and \(r_j\). Then, we calculate the average \(d_{\textrm {avg}}\) of the inter-location distances of all locations in the window. This module rejects the location estimates that do not satisfy \(d_i\le d_{\textrm {avg}}\) [44]. Finally, a robust estimate of the user location \(r^*\) is calculated as the center of mass (i.e., weighted average) of the \(u\) locations remaining after removing outliers as

\(\begin{equation} r^* =\frac{\sum _{i=1}^{u} s_{i} r_i}{\sum _{i=1}^{u} s_{i}}, \end{equation}\)

(2)

where \(r_i\) is the \(i{\textrm {th}}\) reported location after avoiding outliers and \(s_{i}\) is its corresponding confidence score. In calculating this average, the weight used for a given location is its corresponding normalized matching score.
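
Equations (1) and (2) can be sketched together as follows. The function name `robust_location` and the small floating-point tolerance are our own additions for illustration; the scores are assumed to be positive.

```python
import math


def robust_location(locations, scores, eps=1e-9):
    """Outlier rejection (Eq. 1) followed by a score-weighted average (Eq. 2).

    locations: list of v (x, y) estimates in one window.
    scores: matching score s_i of each estimate (assumed positive).
    eps: tolerance so that equal distances are not rejected by rounding
         (an addition for this sketch).
    """
    v = len(locations)
    if v == 1:
        return locations[0]
    # Eq. (1): mean distance from each estimate to the other v - 1 estimates.
    d = [sum(math.dist(locations[i], locations[j])
             for j in range(v) if j != i) / (v - 1)
         for i in range(v)]
    d_avg = sum(d) / v
    kept = [(r, s) for r, s, d_i in zip(locations, scores, d)
            if d_i <= d_avg + eps]
    # Eq. (2): center of mass of the remaining u estimates.
    total = sum(s for _, s in kept)
    x = sum(r[0] * s for r, s in kept) / total
    y = sum(r[1] * s for r, s in kept) / total
    return (x, y)


window = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (50.0, 50.0)]
print(robust_location(window, [1, 1, 1, 1]))
```

In this example the estimate at (50, 50) has an inter-location distance far above the window average, so it is dropped and the result is the centroid of the three remaining points.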

4.3 Trace Matcher

We perform trace matching to refine the labels obtained by WiFi location query resolution. LRS location estimates and WiFi predicted labels are synchronized using the same clock. Therefore, LiPhi++ defines a matching window over which WiFi predicted labels are matched to the LRS-based estimated locations along the walking path. Matching over windows helps the system work well even with low LRS coverage, as the matching window can be made longer, thereby leveraging the LRS estimates at the start and the end of the trace. Even though the location estimated from WiFi at each timestep is not perfectly accurate, the general shape of the trace can uniquely identify the correct walking path in the area of interest reported by the LRS units, especially after the rejection of anomalous location estimates. This can be seen in Figure 7. LiPhi++ starts with maximum uncertainty about the correct trace and reduces ambiguity incrementally over time by buffering estimates in a window of length \(k\). The system continuously matches the timestamped WiFi-based labels inside that buffer to the corresponding labels generated by LRS traces. When a trace ambiguity is detected, LiPhi++ searches the set \(\mathcal {C}\) of candidate LRS traces collected during the same window for a trace \(c = [l_1,\ldots , l_{k}]\) such that

\(\begin{equation} c^* = \text{arg min}_{c \in \mathcal {C}} \sum _{i=1}^{k} d(l_i, r_i), \end{equation}\)

(3)

where \(d(l_i, r_i)\) is the Euclidean distance between the LRS-derived estimate and the WiFi-derived estimate at the \(i{\rm th}\) timestep stored in a buffer of length \(k\) . Thereafter, all the buffered WiFi scans are annotated with corresponding timestamped locations from the LRS-derived labels. The output of this step is an accurately labeled fingerprint database that can be used to construct a localization model.
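
Equation (3) amounts to a nearest-trace search under a cumulative pointwise distance. The sketch below is illustrative only: the name `match_trace`, the dictionary of candidate traces, and the assumption that traces are already time-aligned and of equal length \(k\) are ours.

```python
import math


def match_trace(wifi_trace, lrs_traces):
    """Pick the LRS trace with the lowest cumulative pointwise distance (Eq. 3).

    wifi_trace: list of k WiFi-derived (x, y) estimates r_1..r_k.
    lrs_traces: {trace_id: list of k LRS-derived (x, y) estimates l_1..l_k},
                assumed time-aligned with wifi_trace.
    Returns (best_trace_id, cumulative_distance).
    """
    best_id, best_cost = None, math.inf
    for trace_id, lrs in lrs_traces.items():
        if len(lrs) != len(wifi_trace):
            continue  # skip candidates that do not span the whole buffer
        cost = sum(math.dist(l, r) for l, r in zip(lrs, wifi_trace))
        if cost < best_cost:
            best_id, best_cost = trace_id, cost
    return best_id, best_cost


wifi = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
candidates = {
    "t1": [(0.0, 0.5), (1.0, 1.5), (2.0, 2.5)],
    "t2": [(5.0, 5.0), (6.0, 6.0), (7.0, 7.0)],
}
print(match_trace(wifi, candidates))
```

Once the best trace is selected, each buffered WiFi scan would take the LRS location at the same timestep as its label.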

Fig. 7.

4.4 Spatial Discretizer

Although the LRS-based labels are accurate, the corresponding WiFi (RSS) measurements are prone to short-term variations, which may associate a collected RSS sample with an incorrect location and thus affect the system’s accuracy. Moreover, the LRS-based labels are coordinates in continuous space, and the WiFi scans are unevenly distributed across those labels. Therefore, smoothing the assigned labels is expected to boost the localization model’s generalization ability and mitigate the class imbalance problem. This is done by a discretization process that maps the coordinates of each LRS-based label to the closest cell coordinates in the virtual grid. Figure 8 shows an example. This gives the trained RSS-to-location function of the deep model more flexibility, as its provided classes (locations) are somewhat distorted.

Fig. 8.

Note that the class imbalance problem is further addressed by the proposed data augmentation methods (Section 4.5).
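
The discretization can be sketched in a few lines, assuming a square grid anchored at the origin; the function name `snap_to_grid` and the cell size used in the example are hypothetical.

```python
def snap_to_grid(x, y, cell):
    """Map a continuous (x, y) label to the center of its enclosing grid cell.

    For a square grid anchored at the origin, the enclosing cell's center
    is also the closest cell center.
    """
    col, row = int(x // cell), int(y // cell)
    return ((col + 0.5) * cell, (row + 0.5) * cell)


# An LRS label at (3.2, 7.9) on a grid with 2 m cells.
print(snap_to_grid(3.2, 7.9, 2.0))
```

All scans whose labels fall inside the same cell then share one class, which is what turns the continuous labels into a finite set of reference points.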

4.5 Data Augmenter

The goal of this module is to automatically generate synthetic training scans increasing the available training data and balancing the amount of data among different classes (reference points).

LiPhi++ employs two data augmenters: AP Dropping and Signal-shifting. Both techniques are described next.

4.5.1 AP Dropping.

The number of APs detected at a fixed location varies with time as discussed in Section 2.3. LiPhi++ handles this variation of APs’ density problem by randomly switching off some of the APs (i.e., setting their RSS to \(-100\) ) in the training samples and generating new synthetic samples reflecting this behavior. This has the effect of making the training process noisy, forcing the model to learn and generalize from incomplete yet plausible input vectors. This ensures that the model can generalize well in real-world conditions that are, in practice, noisy. Specifically, LiPhi++ generates a binary mask that can be multiplied by the RSS input vector to selectively drop certain APs in a scan (see Figure 9). This trick helps the model generalize and take account of such practical phenomena and also provides the required data for efficient training of the model as discussed in Section 5.2.6.

Fig. 9.
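
A minimal sketch of AP dropping is given below. The per-AP drop probability, the function name `drop_aps`, and the choice to write \(-100\) dBm directly for masked entries are assumptions made for illustration; the text only specifies that a random binary mask switches off some APs.

```python
import random

NOT_HEARD_DBM = -100.0  # value the text assigns to a switched-off AP


def drop_aps(scan, drop_prob, rng):
    """Return a synthetic copy of an RSS vector with some APs switched off.

    scan: list of RSS values in dBm, one entry per AP.
    drop_prob: probability of dropping each AP independently (hypothetical knob).
    rng: a random.Random instance, passed in for reproducibility.
    """
    mask = [0 if rng.random() < drop_prob else 1 for _ in scan]
    return [rss if keep else NOT_HEARD_DBM for rss, keep in zip(scan, mask)]


rng = random.Random(7)
original = [-48.0, -63.0, -71.0, -85.0]
synthetic = [drop_aps(original, 0.25, rng) for _ in range(3)]
```

Each synthetic scan keeps the label of the scan it was derived from, so the technique adds training data without any extra collection effort.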

4.5.2 Signal-shifting.

Due to the noise in the wireless channels and the diversity of the WiFi chips in different devices, the magnitude of the RSS may be shifted with some variance as discussed in Section 2.3 and justified in References [36, 38, 69]. This module generates new synthetic samples to emulate this behavior. Specifically, LiPhi++ adds white Gaussian noise to the different entries of the RSS vectors as shown in Figure 10. The generated data are then combined with the original real data at hand to train the localization model. Signal-shifting yields improved performance relative to the base case. This is due to the variation added to the underlying data distribution for every AP that further enhances the generalization ability of the considered deep model.

Fig. 10.
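
Signal-shifting can be sketched as additive zero-mean Gaussian noise on each entry. The noise standard deviation, the function name `shift_signal`, and the decision to leave not-heard entries untouched and to clip at \(-100\) dBm are our assumptions and are not stated in the text.

```python
import random


def shift_signal(scan, sigma_db, rng, not_heard=-100.0):
    """Return a synthetic copy of an RSS vector with white Gaussian noise added.

    scan: list of RSS values in dBm, one entry per AP.
    sigma_db: noise standard deviation in dB (hypothetical parameter).
    Entries equal to not_heard are kept as-is, and noisy values are
    clipped so they never fall below not_heard (choices for this sketch).
    """
    return [rss if rss <= not_heard
            else max(not_heard, rss + rng.gauss(0.0, sigma_db))
            for rss in scan]


rng = random.Random(11)
original = [-52.0, -67.0, -100.0]
synthetic = [shift_signal(original, 2.0, rng) for _ in range(3)]
```

The synthetic scans would then be pooled with the original scans before training, as described above.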

4.6 Localization Model Constructor

This module is responsible for building a localization model using the WiFi fingerprints constructed in the previous module. The trained model is harnessed during the online phase to estimate the user location given the received WiFi scan(s). A recurrent neural network (RNN) is used at the core of this module as LiPhi++ has to learn the complex dynamics of input sequences as well as use internal memory to capture the temporal correlation across long and short input sequences. Long short-term memory (LSTM) is a commonly used RNN unit designed to handle sequential (time series) data with long and short-range dependency. This can be justified due to its internal memory used to remember information across consecutive input sequences. Additionally, it can overcome the vanishing gradient problem [21].

4.6.1 The Network Architecture.

The structure of the deep neural network considered by the LiPhi++ system is shown in Figure 11. Specifically, the network consists of multiple cascaded LSTM layers (typically two) followed by three fully connected (i.e., dense) layers. Stacking multiple layers enhances the network’s ability to learn more complex patterns. Each LSTM layer consists of a number of LSTM blocks, each of which has a built-in memory to memorize a sequence of data over time. This memory is controlled by three non-linear gate units that determine the action to be performed on the content stored in memory. An LSTM block operates on the input sequence fed to it: the gates within the block compute a new state and output from the supplied input. The memory of the block characterizes the sequential nature of the input data, which in this case is the historical RSS input. Therefore, it is a good model to capture the evolution of the RSS information over time per user location.

Fig. 11.

4.6.2 The Input Layer.

The input is a sequence of \(u\) timesteps, where each timestep consists of an \(n\)-dimensional RSS input vector, as demonstrated in Figure 11. This input is fed to the recurrent neural network during the offline and online phases. Constructing such an input structure involves segmenting the RSS stream into sequences of a predefined fixed length \(u\), such that the beginning of each RSS sequence overlaps with the end of the preceding RSS sequence. Specifically, each RSS sequence consists of the \(u-1\) preceding scans in addition to the current one (i.e., successive RSS sequences are shifted by one timestep).

Each entry in the sequence consists of the RSS observations from all \(n\) covering WiFi access points at an arbitrary time instant. Access points that are not heard in a given scan are set to \(-100\) dBm. Therefore, the input to the deep localization model has a fixed size even in the presence of fluctuating access points (Section 4.6.4 handles this problem). To speed up the model convergence, the RSS sequences are additionally re-scaled such that the data spans the range \([0,1]\).
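
The input construction can be sketched as follows. The helper names are ours, and the re-scaling bounds (\(-100\) dBm mapped to 0 and 0 dBm mapped to 1) are an assumption, since the text only states the target range \([0,1]\).

```python
def rescale(rss, lo=-100.0, hi=0.0):
    """Min-max scale an RSS value in dBm to [0, 1] (bounds are assumed)."""
    return (min(max(rss, lo), hi) - lo) / (hi - lo)


def to_vector(scan, ap_ids, not_heard=-100.0):
    """Fixed-size, re-scaled vector; APs missing from the scan get not_heard."""
    return [rescale(scan.get(ap, not_heard)) for ap in ap_ids]


def make_sequences(scans, u):
    """Overlapping windows of u consecutive scans, shifted by one timestep."""
    return [scans[t - u + 1:t + 1] for t in range(u - 1, len(scans))]


ap_ids = ["A", "B", "C"]
stream = [{"A": -50.0, "B": -70.0}, {"A": -55.0}, {"B": -65.0, "C": -80.0}]
vectors = [to_vector(s, ap_ids) for s in stream]
sequences = make_sequences(vectors, u=2)
```

Here three scans with \(u = 2\) yield two sequences, each sharing one scan with its neighbor.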

4.6.3 The Output Layer.

The output layer consists of a number of neurons equal to the number of considered grid cells (each represented by its center). The network is trained to operate as a multinomial (multi-class) classifier by leveraging a softmax activation function in the output layer. This leads to a probability distribution over the reference fingerprint locations given an input scan.

More formally, each sequence of WiFi scans \(c_i = (c_{i1}, c_{i2}, ..c_{in})\) is fed to the network. The corresponding discrete outputs (i.e., logits) is \(a_i = (a_{i1},a_{i2},\ldots ,a_{im})\) capture the score for each reference points from the possible \(m\) reference points to be the estimated point. The softmax function converts the logit score \(a_{ij}\) (for sequence \(i\) to be from reference point \(j\) ) into a probability as

\(\begin{equation} p(a_{ij})= \frac{e^{a_{ij}}}{\sum _{j=1}^{j=n}{e^{a_{ij}}}}. \end{equation}\)

(4)

During the offline phase, the label of each training sequence is represented using one-hot encoding to obtain the ground-truth vector \(g_i\), against which the output probability vector \(P(a_i) = [p(a_{i1}), p(a_{i2}), \ldots , p(a_{im})]\) is compared. This encoding has a probability of one for the correct reference point and zero for the others.

The optimal model is captured through a repeated application of the Backpropagation Through Time algorithm [20]. The model is trained using the Adaptive Moment Estimation (Adam optimizer [25]) to minimize the average cross-entropy between the estimated output probability distribution \(P(a_i)\) and the one-hot-encoded vector \(g_i\) . The loss function is defined as follows:

\(\begin{equation} \mathop {\mathcal {L}}= \frac{1}{M_s} \sum _{i=1}^{M_s} D(P(a_i),g_i), \end{equation}\)

(5)

where \(P(a_i)\) is obtained using the softmax function, \(g_i\) is the one-hot encoded vector of the \(i{\rm th}\) input sequence, \(M_s\) is the number of input sequences available for training, and \(D(P(a_i),g_i)\) is the cross-entropy function defined as

\(\begin{equation} D(P(a_i),g_i) = - \sum _{j=1}^{m} g_{ij} \log (p(a_{ij})). \end{equation}\)

(6)
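As a concrete illustration of Equations (4) to (6), the following sketch computes the softmax probabilities and the average cross-entropy loss for a small batch. It is a plain-Python illustration of the formulas only; the function names and the small constant `eps` (added to avoid taking the logarithm of zero) are assumptions of this example, and an actual training pipeline would rely on a deep learning library instead.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Equation (4): convert logits a_i1..a_im into probabilities."""
    shift = max(logits)  # subtracting the maximum avoids overflow; the result is unchanged
    exps = [math.exp(a - shift) for a in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs: List[float], one_hot: List[float], eps: float = 1e-12) -> float:
    """Equation (6): cross-entropy between predicted probabilities and one-hot label."""
    return -sum(g * math.log(p + eps) for p, g in zip(probs, one_hot))


def average_loss(batch_logits: List[List[float]], batch_labels: List[int]) -> float:
    """Equation (5): mean cross-entropy over the M_s sequences of the batch."""
    m = len(batch_logits[0])
    losses = []
    for logits, label in zip(batch_logits, batch_labels):
        one_hot = [1.0 if j == label else 0.0 for j in range(m)]
        losses.append(cross_entropy(softmax(logits), one_hot))
    return sum(losses) / len(losses)
```

Because \(g_i\) is one-hot, only the term of the correct reference point contributes to each cross-entropy, so the loss reduces to the negative log-probability assigned to the true grid cell.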

4.6.4 Ensuring Model Robustness.

The deep RNN architecture is adopted due to its ability to capture the underlying relationship between the signals received from the detectable access points as well as the temporal correlation (i.e., historical changes) between successive scans, leading to better positioning performance. However, it builds undesirable co-dependence between the model’s inputs (i.e., different APs), which may degrade LiPhi++ performance when the set of detected APs fluctuates heavily. To increase the model’s robustness against over-fitting and variation in AP density, and also to accelerate the training process, LiPhi++ employs two regularization techniques and batch normalization. First, we use dropout regularization [56], which has been proven effective in training deep networks [26, 43, 44]. It is used during training to prevent the neurons of a network from developing co-dependencies among each other. In this technique, neurons are stochastically removed from the network architecture during the training phase at different epochs. As a result, these neurons contribute to neither the forward nor the back-propagation passes. This technique can be viewed as sampling different architectures during training, all of which share weights. Second, we leverage early stopping so that training stops once performance improvements are no longer achieved. In practice, the network is trained until the performance starts to degrade, and the model that achieved the best performance during the training process is then adopted. Therefore, the number of epochs is automatically selected to avoid under- and overfitting [10].

Finally, batch normalization is adopted to increase the stability of the neural network and reduce the time required for convergence. It also allows each layer of the network to learn somewhat more independently of the other layers. This is done by normalizing the output of the previous activation layer by subtracting the batch mean and dividing by the batch standard deviation.
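Two of the mechanisms above can be sketched in a few lines. The first function normalizes a batch of activations by its mean and standard deviation (the learnable scale and shift that batch normalization layers usually add are omitted here); the second selects the epoch to keep under early stopping. The `patience` parameter, the `eps` constant, and the function names are assumptions of this sketch rather than values reported for LiPhi++.

```python
import math
from typing import List, Tuple


def batch_normalize(values: List[float], eps: float = 1e-5) -> List[float]:
    """Subtract the batch mean and divide by the batch standard deviation."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]


def early_stopping_epoch(val_losses: List[float], patience: int = 3) -> Tuple[int, float]:
    """Return (best_epoch, best_loss) under early stopping.

    Training stops once `patience` consecutive epochs pass without improving
    on the best validation loss; the best model seen so far is the one kept.
    """
    best_epoch, best_loss = 0, float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs: stop training
    return best_epoch, best_loss
```

In the second function, validation losses after the stopping point are never inspected, which mirrors the fact that training has already ended by then.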

4.7 Online Phase

4.7.1 The Location Estimator.

The goal of this phase is to locate the user in real time using the WiFi signals received from the detectable access points in the area of interest. This is done by processing the scanned WiFi information and extracting the corresponding RSS sequence as described previously. This sequence is then fed to the trained localization model to get a location estimate as one of the defined reference points (i.e., grid cell centers). The point \(a^*\) with the maximum probability given the RSS input sequence (\(c\)) can be selected as the estimated location. That is, we want to find

\(\begin{equation} a^* = \text{argmax}_a [P(a|c)]. \end{equation}\)

(7)

The challenge here is that the built model can predict the user location only at a few discrete locations. As such, the estimated locations, even with a very accurate localization model, will be spaced out, leading to a bad user experience. Therefore, this phase aims to track the user in the continuous spatial space (i.e., anywhere, even at locations different from the reference points). To do so, LiPhi++ reports the center of mass of all reference points, i.e., a spatial weighted average over the reference points, where the weight of each point is its corresponding likelihood as output by the classifier network [24]. More formally,

\(\begin{equation} l_x =\frac{\sum _{i=1}^{m} P_i a_{ix}}{\sum _{i=1}^{m} P_i}, \end{equation}\)

(8)

\(\begin{equation} l_y =\frac{\sum _{i=1}^{m} P_i a_{iy}}{\sum _{i=1}^{m} P_i}, \end{equation}\)

(9)

where \(a_{ix}\) and \(a_{iy}\) are the spatial coordinates of reference point \(a_i\) , and \(P_i\) is its corresponding softmax likelihood.
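The difference between the discrete estimate of Equation (7) and the center-of-mass estimate of Equations (8) and (9) can be illustrated with a short sketch. The function names and the toy coordinates used in the usage note are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def most_likely_point(probs: List[float], points: List[Point]) -> Point:
    """Equation (7): the reference point with the maximum softmax probability."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return points[best]


def center_of_mass(probs: List[float], points: List[Point]) -> Point:
    """Equations (8) and (9): probability-weighted average of the reference points."""
    total = sum(probs)  # equals 1 for a softmax output; kept for generality
    x = sum(p * px for p, (px, _) in zip(probs, points)) / total
    y = sum(p * py for p, (_, py) in zip(probs, points)) / total
    return x, y
```

For example, with reference points (0, 0), (2, 0), and (0, 2) and probabilities 0.5, 0.25, and 0.25, the discrete estimate is (0, 0), whereas the center of mass is (0.5, 0.5), a location that lies between the grid cell centers.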

4.7.2 Model Updater.

This part of the module is optional and operates only when the localization model becomes outdated due to long-term variations, e.g., changes in the building structure, furniture placement, the available APs, and so on. LiPhi++ handles this in two steps: (1) updating the fingerprint database and (2) generating a new model. In such cases, transportable LRSs can be temporarily added to the area of interest so that a new fingerprint database is collected by the system’s users, with labels transparently assigned in the same way as discussed previously. This database is then used to train the deep model. To minimize the training time and the amount of data needed relative to the initial model creation, transfer learning is leveraged. Specifically, LiPhi++ adopts fine-tuning to update the learned RSS-to-location mapping when the RSS levels change. It is worth mentioning that the hyperparameters do not change in the model update process. Additionally, the time taken for re-training the model depends on the amount of newly collected data and the convergence time of the model. The fresh data are usually small (as low as one sample per location [29]) if the environment does not significantly change and the model is frequently updated. The convergence time of the model is, in general, short (around 13 epochs), since it is initialized with a well-trained prior state (from the previous training runs), as confirmed in References [1, 29].

5 Evaluation

In this section, we evaluate the performance of LiPhi++ in two real-world indoor testbeds whose details are presented in Table 1. The first one (denoted as Office) is a big office of 240 m \(^2\) area at a service building in our university (Figure 12). It is a cluttered indoor environment that contains desks, whiteboards, and bookcases. The second one (denoted as Floor), shown in Figure 13, is a larger indoor testbed spanning a whole floor in another university campus with a 629 m \(^2\) area containing several labs of different sizes and furniture placements, meeting rooms, offices as well as corridors.

Fig. 12.

Fig. 13.

Table 1.

Criteria	Office	Floor
Area ( \(m^2\) )	10 \(\times\) 24	17 \(\times\) 37
Number of virtual fingerprint points	128	576
Building Material	Brick	Brick & Wood
Total number of APs	52	136
Number of reference APs	5	8
Number of LRSs	4	6
Sampling rate (scan/sec)	1	1
LRS height (m)	1.4	1.4
Number of collected samples per location	1,500	5,200k

Table 1. Summary of the Testbeds Considered in Evaluating LiPhi++

First, we describe how the data are collected and the software used. Next, we study the effect of the different system parameters on LiPhi++’s accuracy. Finally, we compare the performance of LiPhi++ to three state-of-the-art localization systems.

5.1 Data Collection Setup and Tools

The data are collected with an Android application designed especially for this task. This application continuously scans for the nearby APs in the area of interest and records the information of each one, including the current time, the MAC address (ID), and the corresponding signal strength (i.e., timestamp, ID, RSS). The scanning rate is set to 1 scan per second. Even though a total of 136 and 52 APs are detected in the Floor and the Office testbeds, respectively, we use only eight and five reference APs, respectively (Table 1), since those are the ones with a priori known locations. LRS units are uniformly distributed over the area of interest and they cover around six rooms of the Floor environment.² Four LRSs are deployed along the periphery of the walls in the Office testbed. All LRSs are installed at the same height of 1.40 m, and user traces can be detected as visualized in Figure 5.

Test points were collected on a uniform grid with a 1-m spacing using the traditional fingerprinting approach for evaluation only. Note that LiPhi++ does not require any calibration or collection of data in the traditional fingerprinting manner to build the fingerprint database. The data were collected using several Android phones, including Samsung Note8, HTC One X9, and Motorola Moto G5, among others. This is done with a view to capturing the device-variant characteristics of the WiFi measurements. The total number of samples that are transparently collected via crowdsourcing and automatically labeled at the Office and the Floor testbeds is 13 K and 208 K samples, respectively. This number of samples is tripled by the data augmentation module. Then, 80% of the samples are used for training, and 20% are dedicated to validation. Holdout test scans are collected on different days to show how the system performs in the presence of environmental changes over time. The test data were collected at 128 and 567 fingerprint points in the Office and the Floor testbeds, respectively. The number of test samples per fingerprint point is 100 and 120 in the Office and the Floor testbeds, respectively. We implemented our deep localization model using the Keras learning library, which is a high-level neural network API running on top of the Google TensorFlow framework.

5.2 Effect of Changing LiPhi++ Parameters

In this section, we evaluate the effect of the different parameters and factors that affect LiPhi++ performance. In the following subsections, we show the effects of changing these parameters only on the Floor testbed for clarity of presentation. We report the optimal obtained parameters in Table 2. However, we report how LiPhi++ performs in both testbeds in Subsection 5.2.8.

Table 2.

Parameter	Range	Default
Smoothing Window length	1–20	5
Virtual grid spacing (m)	0.1–5	0.5 m
Number of DNN layers	1–100	5
Learning rate	0–1	0.001
Number of hidden units per layer	20–1000	100

Table 2. Default Parameter Values Used in the Evaluation

5.2.1 Effect of Virtual Grid Spacing.

Figure 14 shows the effect of changing the virtual cell spacing (i.e., reference point density) on the overall accuracy of the system and the corresponding time required by the WiFi-based Trace Estimator module to provide a location estimate. The run-time is calculated using a Lenovo Thinkpad X1 laptop running a 2.2-GHz Intel i7-8750H processor with 64 GB RAM. The figure shows that, as expected, a smaller spacing between reference points yields higher localization accuracy with a negligible increase in the calculation time. Moreover, this computation is performed only during the offline phase and thus does not affect the real-time performance of LiPhi++. Additionally, the figure shows that a reference point spacing of up to 0.5 m is enough to maintain the high accuracy of LiPhi++. It is worth mentioning that at a grid spacing of 1.5 m, a remarkable relative drop in the system accuracy is observed, as virtual reference points are then defined over some non-accessible locations.
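To make the relationship between grid spacing and the number of virtual reference points concrete, the sketch below enumerates grid cell centers over a rectangular area. It is only an illustration under simplifying assumptions: the area is treated as an axis-aligned rectangle, and the optional `accessible` predicate (a hypothetical hook, not part of the described system) stands in for whatever floorplan knowledge is available to exclude non-accessible cells.

```python
import math
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, float]


def grid_centers(
    width_m: float,
    height_m: float,
    spacing_m: float,
    accessible: Optional[Callable[[Point], bool]] = None,
) -> List[Point]:
    """Return the centers of square grid cells of side `spacing_m` covering the area."""
    # The small tolerance guards against floating-point noise in the division.
    nx = math.ceil(width_m / spacing_m - 1e-9)
    ny = math.ceil(height_m / spacing_m - 1e-9)
    centers = [
        ((i + 0.5) * spacing_m, (j + 0.5) * spacing_m)
        for j in range(ny)
        for i in range(nx)
    ]
    if accessible is not None:
        centers = [c for c in centers if accessible(c)]  # drop non-accessible cells
    return centers
```

Halving the spacing roughly quadruples the number of reference points (and hence output classes), which is why the spacing trades accuracy against computation.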

Fig. 14.

5.2.2 Effect of Varying Density of Reference APs.

Figure 15 shows the effect of changing the AP density on the system accuracy. For this, we uniformly and incrementally removed APs from the eight total reference APs present in the area of interest. The figure shows that the accuracy of the WiFi trace estimator module degrades as the number of available APs is reduced. However, even with as few as five reference APs, LiPhi++ maintains a steady localization error of around 1 m. This high accuracy with a relatively low number of APs can be explained by two factors: first, the location resetting using the accurate LRS trace estimator; second, the data augmentation techniques, which help the model maintain its localization accuracy even with low AP densities. This highlights the robustness of LiPhi++.

Fig. 15.

5.2.3 Effect of LRS-based Labeling.

In this section, we study the influence of training the localization model of LiPhi++ using WiFi scans labeled by LRS as compared to labeling using the WiFi-based Trace Estimator module only. Figure 16 shows boxplots of the localization error of the system in both cases. The figure depicts that, as expected, leveraging LRS gives a drastic improvement in median error (273.8%), compared to the case of relying only on the coarse-grained WiFi-based labels. This can be attributed to the LRSs’ refinement of the training data, which significantly enhances the learning of the localization model and justifies the impact of using the place-and-play LRSs on the LiPhi++ system.

Fig. 16.

5.2.4 Number of Layers in the Network.

Deep learning is designed to provide a hierarchical learning ability that can be achieved through cascading different layers. Therefore the number of layers of the deep network is one of the effective hyperparameters to boost the system performance. Figure 17 shows the effect of changing the number of layers on LiPhi++ accuracy. Empirically, the figure shows that increasing the number of layers increases the accuracy. This can be justified as the deeper models have more parameters and better learning ability. The figure also shows that, beyond an optimal value of five layers, the model tends to overfit the training data, leading to an accuracy drop.

Fig. 17.

5.2.5 WiFi Estimator Smoothing Window.

Figure 18 shows the effect of varying the number of WiFi scans utilized in estimating a location by the WiFi-based estimator module (Section 4.2). The figure shows that the more scans are fed to the module, the better the localization accuracy, until it reaches an optimum at \(v = 5\), beyond which degradation occurs. This can be explained by two opposing factors. (1) Increasing \(v\) results in more information for location smoothing and outlier avoidance. (2) However, as \(v\) increases, more time is spent collecting these samples, which may lead to locating the user at a preceding location (i.e., latency in response). A balance is achieved at a window size of five scans, which leads to the best performance.

Fig. 18.

5.2.6 Effect of Data Augmentation.

Initially, 13 K and 208 K samples are automatically collected and labeled at the Office and the Floor testbeds, respectively. Then, the data augmentation module increases the amount of training data multiple times, enabling efficient utilization of deep learning models. Figure 19 shows the effect of leveraging the augmented data on the localization performance. The figure shows that data augmentation improves the LiPhi++ performance by 66.2% compared to augmentation-free training. The figure also confirms that the more training samples are generated by the augmentation technique, the better the system copes with real-world deployments, as the augmentation implicitly simulates the inherent variation of the noisy wireless channel. Beyond three times the size of the original data, LiPhi++ performance tends to saturate and then deteriorate, as the noisy synthetic data becomes dominant relative to the original samples.
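The idea of enlarging the training set with synthetic variants can be sketched as follows, using the two augmentation methods named in this article (AP dropping and signal shifting). The details here are assumptions made for illustration and are not taken from the LiPhi++ implementation: the drop probability, the uniform shift range in dB, the choice to apply one common offset to all heard APs, and all function names are hypothetical.

```python
import random
from typing import List, Tuple

MISSING_RSS_DBM = -100.0  # value used for access points not heard in a scan
Sample = Tuple[List[float], int]  # (RSS vector in dBm, reference point label)


def drop_aps(scan: List[float], drop_prob: float, rng: random.Random) -> List[float]:
    """Randomly mark APs as unheard to mimic fluctuating access points."""
    return [MISSING_RSS_DBM if rng.random() < drop_prob else rss for rss in scan]


def shift_signal(scan: List[float], max_shift_db: float, rng: random.Random) -> List[float]:
    """Add one common random offset to all heard APs (e.g., a device-dependent bias)."""
    offset = rng.uniform(-max_shift_db, max_shift_db)
    return [
        rss if rss == MISSING_RSS_DBM else max(rss + offset, MISSING_RSS_DBM)
        for rss in scan
    ]


def augment(
    dataset: List[Sample],
    copies: int = 2,
    drop_prob: float = 0.1,
    max_shift_db: float = 5.0,
    seed: int = 0,
) -> List[Sample]:
    """Return the original samples plus `copies` synthetic variants of each one."""
    rng = random.Random(seed)  # seeded for reproducibility
    augmented = list(dataset)
    for scan, label in dataset:
        for _ in range(copies):
            noisy = shift_signal(drop_aps(scan, drop_prob, rng), max_shift_db, rng)
            augmented.append((noisy, label))  # the location label is unchanged
    return augmented
```

With `copies=2`, the output is three times the size of the input, which corresponds to the threefold increase discussed above.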

Fig. 19.

5.2.7 Sequence Length.

Figure 20 shows the performance of LiPhi++ when varying the number of timesteps of the sequence that is fed to the RNN as an input. The figure shows that as the input sequence gets longer, the positioning accuracy improves. This is because the localization model has more information (multiple scans) over time, which helps the model avoid spurious samples generated by temporal signal variations. The model saturates at an optimal value of five timesteps (i.e., 5 seconds), which yields the best performance. Note that since LiPhi++ works with overlapping sequences, it provides an estimate every second (the scanning rate), enabling real-time tracking. Sequences exceeding this length may lead to a drop in the system performance, as the sequence will cover multiple user locations.

Fig. 20.

5.2.8 Performance in Different Testbeds.

In this section, we evaluate how the system performs in two different testbeds: the Office testbed and the Floor testbed (Figures 12 and 13). The Floor testbed is larger (629 m \(^2\) ) with many rooms and therefore requires wider coverage by LRSs, while the Office testbed is smaller (240 m \(^2\) ) and is an open area without inner walls. In the Floor testbed, six LRSs are leveraged to cover the whole area of 12 rooms incrementally, and the Office testbed has four LRSs covering the entire area of interest. Figure 21 shows that LiPhi++ obtains better performance in the Floor testbed than in the Office testbed. This can be explained by two reasons: (1) the number of considered APs (input vector) is larger in the Floor testbed (136 and 52 in the Floor and the Office testbeds, respectively), which favors its performance as the model learns more information, and (2) WiFi signatures are more location discriminative in the Floor testbed due to the presence of walls and the richer multi-path environment.

Fig. 21.

5.2.9 Performance of Fine-tuning.

Figure 22 shows the localization performance of LiPhi++ when training the localization model from scratch with the whole dataset compared to fine-tuning the already trained model with a few new samples (typically 1,000) collected after some changes in the environment. The figure shows that LiPhi++ achieves the same localization accuracy in the two cases. However, fine-tuning provides tremendous savings in training time in the online phase, as it takes as few as 13 epochs to converge, whereas training the model from scratch requires 1,225 epochs. This is because fine-tuning starts with good initial values of the parameters, as opposed to starting from a random set of parameters in the other case.

Fig. 22.

5.3 Comparative Evaluation

In this section, we compare the accuracy of LiPhi++ to one traditional fingerprinting technique that builds a denoising autoencoder for localization (WiDeep [1]) and another probabilistic WiFi-based localization technique that automatically constructs the fingerprint database using labels obtained from BLE-devices (iBeacons), HybridLoc [52]. For a fair comparison, all techniques are deployed in the same environment and trained on the same data (i.e., RSS data) collected from a total of 136 APs. Additionally, 10 iBeacons are installed in the environment, which is required for HybridLoc’s operation, as reported in Reference [52].

5.3.1 Location Accuracy.

Figures 23 and 24 show the CDF of distance error for the three techniques with temporal variations. Figure 23 shows the accuracy of the three systems when tested with a fresh fingerprint. Specifically, LiPhi++ and WiDeep [1] are matched in performance even though WiDeep [1] is trained with manually fingerprinted data while LiPhi++ is trained with automatically constructed fingerprints. This can be attributed to the accurate labeling obtained from the LRS-based trace estimator module and the well-designed deep localization model that considers the evolution of the data among consecutive WiFi scans. HybridLoc [52] obtains the lowest performance, as BLE-based tracking provides coarse-grained labeling, which is not enough for annotating WiFi scans. Additionally, it uses a probabilistic method that does not benefit from the advantages of deep learning methods.

Fig. 23.

Fig. 24.

Figure 24 shows how all systems perform when tested four months later. The figure illustrates that, under this condition, LiPhi++ improves the localization performance by 284.7% and 418% as compared to HybridLoc and WiDeep, respectively. This can be explained by the combination of the data augmentation methods (AP dropping and signal-shifting) in the training data and the adoption of different regularization techniques, which gives LiPhi++ greater flexibility and generalization ability than the other systems. Additionally, LiPhi++ and HybridLoc [52] obtain better performance than WiDeep [1], as they have provisions to update the fingerprint database and the localization model. Despite the high overhead spent to collect data for WiDeep [1] as a traditional fingerprinting technique, its accuracy cannot be maintained without re-doing the arduous calibration process.

LiPhi++ outperforms the earlier work in Reference [48] by 18.5% and 37.3% when tested with fresh and time-variant test data, respectively. This can be attributed to the ability of the new localization model (a multimodal deep recurrent neural network) to learn/estimate the user location from a sequence of input scans rather than a single scan as in Reference [48]. As a result, the localization model learns the underlying relationship between the RSSs received from APs and the temporal correlation (i.e., signal evolution) between successive scans, leading to better localization performance.

In summary, as shown in Table 3, LiPhi++ is more robust to variation over time than the other techniques. This highlights the promise of LiPhi++ in enabling an accurate localization model with zero calibration overhead.

Table 3.

Mode	Technique	25 \({\rm th}\) Percentile	50 \({\rm th}\) Percentile	75 \({\rm th}\) Percentile	90 \({\rm th}\) Percentile
Fresh fingerprint	*LiPhi++*	0.28 m	0.67 m	1.17 m	2.02 m
	Liphi [48]	0.38 m ( \(-\) 33.5%)	0.79 m ( \(-\) 18.5%)	1.41 m ( \(-\) 20.3%)	2.62 m ( \(-\) 29.7%)
	HybridLoc [52]	1.15 m ( \(-\) 304%)	2.41 m ( \(-\) 261.6%)	3.82 m ( \(-\) 225.8%)	4.89 m ( \(-\) 142%)
	WiDeep [1]	0.38 m ( \(-\) 33.5%)	0.72 m ( \(-\) 8%)	1.40 m ( \(-\) 19.4%)	1.99 m (1.5%)
Four months later	*LiPhi++*	0.47 m	0.94 m	1.98 m	3.33 m
	LiPhi [48]	0.80 m ( \(-\) 70.5%)	1.29 m ( \(-\) 37.3%)	2.49 m ( \(-\) 26.2%)	4.28 m ( \(-\) 28.7%)
	HybridLoc [52]	1.77 m ( \(-\) 279.3%)	3.63 m ( \(-\) 284.7%)	6.23 m ( \(-\) 215%)	7.35 m ( \(-\) 120.9%)
	WiDeep [1]	3.21 m ( \(-\) 274.3%)	5.12 m ( \(-\) 418%)	7.12 m ( \(-\) 514.5%)	8.75 m ( \(-\) 541.9%)

Table 3. Summary of the Localization Error Percentiles of Different Techniques

5.3.2 Time per Location Estimate.

We used a Lenovo Thinkpad X1 laptop running a 2.2-GHz Intel i7-8750H processor with 64 GB RAM for evaluating the end-to-end running time of the different techniques. Figure 25 shows the results. The figure shows that, as LiPhi++, LiPhi (the earlier work in Reference [48]), and WiDeep [1] are all deep neural network-based systems, they need to pass the data through all the layers of the network. This takes more time than the traditional probabilistic technique proposed in HybridLoc [52]. LiPhi++ needs less location-inference time than LiPhi and WiDeep, as LiPhi++ has fewer layers and neurons and, by extension, requires fewer calculations. Nevertheless, since the sampling rate is set to 1 scan per second, the running time of all techniques allows them to provide real-time location tracking.

Fig. 25.

5.3.3 Device Heterogeneity.

In this section, we evaluate the robustness of all systems to device heterogeneity. Initially, all systems are trained and tested with data collected by the same set of devices (i.e., Samsung Note 8, HTC One X9, Motorola Moto G5), which is shown in Figure 26. The figure also shows the performance of all systems when tested with data collected by a Google Pixel XL (i.e., a device not included in the training set), which has a completely different form factor and WiFi chip. HybridLoc [52] provides acceptable adaptability to device heterogeneity, as it utilizes probabilistic techniques, which are known to perform well in the presence of uncertainty. WiDeep [1] leverages denoising autoencoders and models the device heterogeneity effect as additive noise, leading to remarkable robustness. The figure confirms that LiPhi++ provides superior robustness to the device heterogeneity problem (approximately the same accuracy when testing with the unseen device as when testing with the training devices). This is due to the combination of data augmentation (i.e., signal-shifting and the spatial discretizer) in the training data and the adoption of a recurrent neural network. The RNN implicitly learns the location from relative RSS values in the input sequence rather than the absolute RSS amplitudes in a single scan, which gives LiPhi++ greater flexibility than the other systems.

Fig. 26.

6 Related Work

In this section, we discuss the most relevant literature.

6.1 Fingerprinting Systems

Fingerprinting systems [9, 68] represent the most popular localization technique due to their high accuracy. In particular, the system in Reference [9] employs deterministic matching using K-nearest neighbor, so that the unknown user location is assigned to the fingerprint location whose average RSS signature is closest to the observed one. However, deterministic techniques cannot handle the inherent noise and variations in the WiFi signal. In contrast, probabilistic techniques such as Reference [68] have better adaptability to noise, as noise is usually modeled as an uncertainty phenomenon. In this case, the recorded fingerprints are the RSS histogram of each AP at each reference location, and the user location is estimated based on Bayesian inference. In these techniques, the signals from different APs are considered to be independent to avoid the curse of dimensionality problem. This leads to a loss of useful information, which in turn results in coarse-grained localization accuracy.

Cameras have also been used to improve WiFi fingerprinting-based indoor positioning in Reference [34]. However, this solution usually requires a complicated calibration process to adjust for camera scaling and perspective and may necessitate the presence of many permanently installed cameras to cover the whole area. Unlike camera-based solutions, a LiDAR-based solution is transportable, so it can seamlessly be placed and work without any tedious calibration. Additionally, camera-based solutions are not suitable for environments where privacy is at a premium, which is not the case with LiDARs.

Recently, different deep learning-based localization systems, e.g., References [1, 5, 17, 36, 37, 38, 46, 49, 50, 63, 64], have shown better localization performance due to their ability to learn complex patterns and automatically extract discriminative features. Several deep learning architectures have been proposed in indoor positioning, including Restricted Boltzmann Machines in DeepFi [63], a deep convolutional neural network for CSI-based localization in Reference [64], and stacked denoising autoencoders for each fingerprint reference point in Reference [1]. The commonality between these techniques is that they depend on traditional fingerprinting and do not have provisions to reduce the data collection overhead. This is a major problem in deep learning-based systems, as they require large amounts of data to be properly trained, which directly translates to extra fingerprinting overhead to satisfy this requirement.
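The deterministic matching idea can be illustrated with a small nearest-neighbor sketch in signal space. This is a generic illustration of K-nearest-neighbor fingerprint matching, not the exact procedure of Reference [9]; the data layout (a list of locations paired with their average RSS vectors), the Euclidean distance, and the unweighted averaging of the \(k\) nearest locations are assumptions of this example.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Fingerprint = Tuple[Point, List[float]]  # (location, average RSS vector in dBm)


def knn_locate(observed: List[float], fingerprints: List[Fingerprint], k: int = 1) -> Point:
    """Average the k fingerprint locations whose RSS signatures are closest to `observed`."""
    ranked = sorted(fingerprints, key=lambda fp: math.dist(observed, fp[1]))
    nearest = ranked[:k]
    x = sum(loc[0] for loc, _ in nearest) / len(nearest)
    y = sum(loc[1] for loc, _ in nearest) / len(nearest)
    return x, y
```

Because the match is driven by a single distance in signal space, a noisy scan can flip the ranking of nearby fingerprints, which is the sensitivity to signal variation noted above.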

In contrast, LiPhi++ builds a deep learning-based localization model relying on a fingerprint database transparently constructed without explicit user participation. Additionally, LiPhi++ has provisions to boost the model’s robustness to noise.

6.2 Crowdsourcing Systems

Another line of research aims to mitigate the calibration overhead required for constructing fingerprint databases using crowdsourcing. This can be done explicitly with user intervention [33] or implicitly without user intervention [3, 35, 61]. The system in Reference [33] increases the fingerprint coverage by periodically asking the user to provide her current location. Although, in theory, this method can provide accurate fingerprints, it is annoying to users and is not a practical solution. Hence, the systems in References [3, 35, 61] implicitly estimate the user location coarsely using dead-reckoning based on the inertial sensors of the users’ smartphones. The corresponding WiFi scan is then associated with a fingerprint. Thereafter, the location estimate can be opportunistically refined using sensor-based landmarks or map-matching. However, inertial sensors in smartphones are noisy, leading to an increasing error over time and missed opportunities to correct estimations. To avoid the noisy inertial sensors, the system in Reference [52] proposes a method to tag WiFi scans with locations obtained from BLE-enabled high-end smartphones. Therefore, the method requires the area to be well covered with BLE beacons (e.g., iBeacons), which should be sensed by high-end phones to estimate the user location and therefore build a WiFi fingerprint database. While this method is feasible, its application requires the site surveyors to be equipped with high-end phones and the area to be well covered by BLE beacons. Therefore, it cannot be considered a ubiquitous solution for every environment.

LiPhi++, on the contrary, requires neither user intervention nor high-end devices with permanently installed devices. It only uses temporarily installed LRSs for constructing the fingerprint database.

6.3 Propagation-based Systems

The basic idea of propagation models is the use of signal strength measurements received from the APs at the user device to calculate the distance between those APs and the device [12]. In particular, the stronger the RSS overheard from an arbitrary AP, the shorter the distance between the device and that AP. For example, the system in Reference [9] proposes a free-space propagation model that is then extended by calculating the signal attenuation in complex indoor environments caused by different objects such as walls and furniture. The systems in References [14, 22] synthetically build the radio maps of the different locations in 2D and 3D areas, respectively. To do that, these systems use some WiFi scans collected from the environment to calibrate the propagation model. This process usually incurs a high computational cost and cannot generalize well, since the model parameters are tightly coupled to the phone used for measurements. To handle the hardware dependency problem, IncVoronoi [13] constructs a Voronoi diagram of the area of interest relative to the different AP locations. Therefore, IncVoronoi incrementally enhances the confidence of the user region by refining the Voronoi tessellation of the area of interest as well as handling hardware diversity.
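The inverse relationship between RSS and distance can be made concrete with the widely used log-distance path loss model, \(RSS(d) = P_0 - 10\,\eta \log _{10}(d/d_0)\). The sketch below inverts this model to obtain a distance estimate. It is a generic textbook model and not the specific model of any system cited above; the reference power \(P_0 = -40\) dBm at \(d_0 = 1\) m and the path loss exponent \(\eta = 3\) are hypothetical values that would have to be calibrated for a given environment and device.

```python
def rss_to_distance(
    rss_dbm: float,
    p0_dbm: float = -40.0,          # assumed RSS at the reference distance (hypothetical)
    path_loss_exponent: float = 3.0,  # assumed indoor exponent (hypothetical)
    d0_m: float = 1.0,              # reference distance in meters
) -> float:
    """Invert RSS(d) = P0 - 10 * eta * log10(d / d0) to estimate the distance d in meters."""
    return d0_m * 10 ** ((p0_dbm - rss_dbm) / (10.0 * path_loss_exponent))
```

Under these assumed parameters, an RSS of \(-40\) dBm maps to 1 m and an RSS of \(-70\) dBm maps to 10 m. The dependence on \(P_0\) and \(\eta\) also illustrates why such models are sensitive to the phone used for calibration.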

Although propagation-based techniques do not, in general, require a site survey, they provide coarse-grained accuracy as compared to fingerprinting-based techniques.

6.4 LRS-based Systems

Several systems have been proposed leveraging LRSs in many indoor applications. In Reference [11], an indoor navigational system for a robot is built based on WiFi for localization in addition to an embedded LRS for enhancing the position estimates and avoiding obstacles. LiPhi++, on the contrary, does not assume that every user’s phone is equipped with LRSs. Another research direction aims to track pedestrians in indoor environments based on LRS as proposed in References [19, 60]. Although LiDAR is a promising technology for accurate user tracking, it cannot identify the tracked person. To handle this issue, the system in Reference [59] leverages mobile phone inertial sensors to estimate the user trajectory using the dead-reckoning (DR) approach. Then, the system matches the LRS-based trajectory to the DR-based trajectory to identify the user. However, depending on the noisy onboard inertial sensors in the consumers’ phones lead to large position errors and random estimated trajectories that cannot be easily matched. Additionally, the system leverages LRSs for tracking purposes and requires inertial sensors (which exist only in high-end smartphones) for identification purposes. This, therefore, requires extreme deployment expenses and limits its ubiquitous adoption.

In contrast, LiPhi++ requires neither noisy inertial sensors nor permanently deployed LRSs. LiPhi++ uses LRSs only temporarily, during fingerprint database construction and maintenance, which substantially reduces deployment cost. Additionally, it provides a WiFi localization system whose accuracy is similar to that of traditional (i.e., manual) fingerprinting techniques with virtually zero data collection overhead.

7 Conclusion

We presented LiPhi++, a robust system that can automatically construct (and update) a fingerprint database for indoor localization systems without requiring traditional data collection overhead. The system leverages crowd-sourced WiFi signals to roughly estimate the user location, whose accuracy is then improved using an LRS-derived estimate. The resulting fine-grained location estimates are used to tag WiFi scans that are then used to train a deep localization model. We also presented data augmentation techniques to improve the localization model’s robustness against temporal signal variations.
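The tagging step summarized above can be sketched as follows. This is a simplified illustration under stated assumptions, not the LiPhi++ implementation: it assumes the LRS trace has already been matched to the user carrying the phone, and it labels each WiFi scan with the LRS position closest in time. The function name `label_scans`, the tuple layouts, and the `max_gap_s` threshold are hypothetical.

```python
from bisect import bisect_left


def label_scans(lrs_trace, wifi_scans, max_gap_s=0.5):
    """Attach to each WiFi scan the LRS position closest to it in time.

    lrs_trace:  list of (timestamp_s, x_m, y_m), sorted by timestamp.
    wifi_scans: list of (timestamp_s, {ap_id: rss_dbm}).
    Scans with no LRS sample within max_gap_s seconds are skipped.
    """
    times = [t for t, _, _ in lrs_trace]
    labeled = []
    for t_scan, rss in wifi_scans:
        i = bisect_left(times, t_scan)
        # Candidates: the LRS samples just before and just after the scan time.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(times)]
        if not candidates:
            continue
        j = min(candidates, key=lambda k: abs(times[k] - t_scan))
        if abs(times[j] - t_scan) <= max_gap_s:
            _, x, y = lrs_trace[j]
            labeled.append(((x, y), rss))
    return labeled


trace = [(0.0, 1.0, 2.0), (1.0, 1.5, 2.0), (2.0, 2.0, 2.0)]
scans = [(0.9, {"ap1": -55}), (5.0, {"ap1": -70})]
print(label_scans(trace, scans))  # [((1.5, 2.0), {'ap1': -55})]
```

The second scan is dropped because no LRS sample lies within the assumed time gap; discarding such scans keeps poorly aligned labels out of the training data.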

We evaluated LiPhi++ in two challenging environments, a full floor and a large office in the university, using different Android devices. The results show that, when tested with data collected a few months later, LiPhi++ improves on the median localization accuracy of state-of-the-art crowdsourcing-based and fingerprinting-based systems by 284% and 418%, respectively.

Acknowledgment

The first author thanks Nvidia for the hardware gift.

Footnotes

This information can be obtained from the building CAD information or automatically determined using crowdsourcing techniques as in Reference [8].

This can be done incrementally with a single LRS unit only.

References

[1] Moustafa Abbas, Moustafa Elhamshary, Hamada Rizk, Marwan Torki, and Moustafa Youssef. 2019. WiDeep: WiFi-based accurate and robust indoor localization system using deep learning. In Proceedings of the International Conference on Pervasive Computing and Communications (PerCom’19). IEEE.
