1. Introduction
In developed nations, the demographic change in the population is creating many challenges both from a societal and an economic standpoint. Research into aging, age-related conditions, and the means to support an aging population has therefore become a priority for many governments around the world [
1]. Ambient assisted living (AAL) is the European Union’s funding program that aims at developing “information and communication technologies (ICT) in a person’s daily living and working environment to enable them to stay active for longer duration, remain socially connected and live independently into old age” (
www.aal-europe.eu, accessed on 17 August 2021). This can be achieved by developing both preventive and monitoring systems for aging safely at home, in the community, and at work. In this regard, radio frequency (RF) sensing is particularly suitable for realizing AAL systems. The main advantages of the technology are four-fold:
it is passive and does not require users to carry obtrusive and uncomfortable sensors [
2];
it is suitable for a wide range of monitoring purposes such as localization and tracking [
3], fall detection [
4], vital sign monitoring [
5], gesture recognition [
6] and behavioral sensing [
7];
RF signals can penetrate walls, clutter and other occlusions, unlike many other sensors that have a limited field of view [
8]; and
it is privacy preserving which increases acceptance of the monitoring technology, unlike vision-based systems that are intrusive [
9].
The focus of this paper is on device-free localization and tracking (DFLT), because position and motion embed a wealth of information about people and can be used to develop various AAL applications.
DFLT methods use received signal strength (RSS) measurements between static wireless nodes to estimate the location of a person inside the monitored area. DFLT poses two fundamental challenges: first, obtaining a model of the RSS as a function of the person’s location and, second, keeping that model accurate over time. Fingerprint-based and data-driven DFLT methods use a supervised training period to collect RSS measurements that are labeled with a person’s known locations [
10,
11]. The labeled data are used for characterizing the unique propagation properties of the environment. During run time, the non-parametric models can be used to accurately localize people even in challenging indoor deployments. As a drawback, the training process is laborious, and the performance degrades drastically as the environment is altered [
12]. Parametric model-based DFLT approaches use physical models to describe the changes in RSS with respect to the locations of the sensors and person [
2,
13]. Typically, these methods only require a short empty-room calibration period when the area is not occupied by people and hence, they are easier to deploy than fingerprint systems. A downside is that the model inaccuracies limit the localization accuracy and these systems require recalibration and retraining if the environment changes. Both systems can be realized with the Calibration Data unit and Localization-and-Tracking unit shown in
Figure 1, and the main difference is how the system is calibrated during the calibration period
. Model-based systems only require the RSS (
), whereas fingerprinting methods also require the person’s location (
). In an AAL application, empty-room calibration periods and training procedures that require human effort are very inconvenient, since they can take from several minutes [
2,
14] up to half an hour [
11,
15]. This paper presents a parametric model-based DFLT system that requires no Calibration Data unit, streamlining the requirements for deployment. The proposed system is presented in
Figure 1 and the system can be realized using the Localization-and-Tracking unit, the Smoother unit and the Parameter Estimator unit.
The work in this paper addresses three major problems associated with the calibration of typical DFLT systems. First, most DFLT methods require an empty-room calibration period when the area is not occupied by people to calculate the mean RSS level
[
2,
16]. Typically, the RSS changes are defined with respect to
and an accurate estimate is a strict requirement of DFLT. Second, it is a common presumption that the model parameters are the same for all links [
17,
18], i.e., for all transmitter, receiver and frequency channel combinations. Third, model-based DFLT methods that take into account the unique model parameters for each link, require a calibration period when a person moves along a known training trajectory to estimate them [
13,
19]. If
is constant and the model parameters are the same for all the links, a short empty-room calibration period suffices. However, neither assumption is valid in general. In
Figure 2, the measured and modeled RSS as a function of excess path length (see Equation (
6)) for two example links is illustrated. As shown, the model parameters of the two links differ significantly from one another, which invalidates the common assumption. The model used is the exponential model [
20] (see Equation (
26)) with parameters,
, which are estimated using nonlinear least squares over the measurements and known trajectory.
This paper aims to address the limitations and drawbacks of typical DFLT methods by developing a system that does not require separate empty-room calibration periods and that can learn the unique model parameters for each link during the time of operation. The development efforts of this paper are validated using an open indoor deployment and a residential apartment deployment, and it is shown that the proposed system can achieve an average tracking accuracy as low as in the open environment and in the apartment. This paper makes the following contributions:
A Gaussian filter is presented to estimate the state of the target and a novel Measurement Selection unit is developed to select and combine the measurement models of two DFLT methods into one filtering algorithm. The developed system is demonstrated to outperform a state-of-the-art adaptive DFLT system and reduce the tracking error by .
A Gaussian smoother is implemented, and it is used to evaluate the expectations involved in the expectation step of the Expectation-Maximization (EM) algorithm. Moreover, we show how the maximization step of the EM algorithm is available in closed form for the considered measurement model. The presented EM algorithm is computationally very efficient, up to 18 times faster than current solutions used in the literature.
An EM algorithm is presented for estimating the unknown RSS model parameters, liberating the system from the need for supervised training and calibration periods. It is demonstrated that the EM algorithm not only improves the accuracy of the introduced system, but also other DFLT systems.
The experimental data collected for this paper, together with Matlab code to run the presented filtering, smoothing and EM algorithms, are publicly available in [
21]. The aim is to lower the threshold to start research in the area and advance the field of DFLT in general.
The rest of the paper is organized as follows. Related work is discussed in
Section 2.
Section 3 formulates the problem and presents the localization-and-tracking system. The parameter estimation framework is presented in
Section 4. The experiments that were conducted are introduced in
Section 5 and the results are presented in
Section 6. Thereafter, conclusions are drawn.
2. Related Work
In this section, related works that use calibration and training are summarized. We begin by introducing the most common method in DFLT, namely the use of an empty-room calibration period. Systems that use online training are discussed thereafter. Lastly, works that calibrate the model using supervised or unsupervised training are presented.
Empty-room calibration—Most DFLT systems define the RSS changes with respect to
, often referred to as the reference or baseline RSS. The system performance depends on an accurate estimate of
and therefore, it is typically calculated over several minutes when the area is not occupied by people [
2,
18,
22]. In this paper,
is one of the parameters estimated by the EM algorithm and it is shown that the system can be initialized without having an empty-room calibration period.
It is worth noting that methods that do not define the RSS changes with respect to
have also been proposed [
8,
23]. Instead, these methods calculate the sample RSS variance over a fixed time window and do not require calibration. A downside is that variance-based methods cannot locate stationary targets because in such scenarios, the variance drops close to the noise floor and does not show up in the estimated image.
Online calibration—Several works have proposed calibrating the reference RSS
[
14,
24], or measurement noise variance
[
25,
26] online. The system can be deployed without a calibration period when the reference RSS is estimated on-the-fly. Moreover, improved estimators can be developed when better noise models are available [
23,
25,
26]. These approaches first estimate the target state and then the parameters in a sequential order. However, such a decoupling is not always possible, or degrades the system performance if the state estimates are inaccurate. Unlike online calibration methods, this work uses the EM algorithm to estimate
and
from batches of data collected while the person is inside the monitored area.
Estimating the RSS distribution (defined by
and
) in two states, when a person is crossing or not crossing the imaginary link line between the transceivers, has been explored in [
27,
28,
29,
30,
31]. Not only can this information be used to localize people, but also to determine if the monitored area is occupied—a non-trivial task in RF sensing. The aforementioned works model the target location as a binary state: either the person is in between the transmitter (TX) and receiver (RX), or they are not. In this paper, we use a continuous measurement model and there is no need to explicitly determine whether a target is crossing the link or not.
Model calibration—The works in [
13,
15,
19] use an offline calibration phase for estimating parameters of the measurement model. During calibration of the parameters, a person moves along a known training trajectory and visits locations of interest. The RSS is recorded between the static wireless nodes and the measurement model parameters are estimated. During the online phase, the calibrated measurement model is used in the tracking algorithm. The works cited above demonstrate high tracking accuracy, but the calibration phase is inconvenient since it can take up to 30 min as in [
15]. In this paper, the estimated trajectory and EM algorithm are used for unsupervised learning. The position estimates can be inaccurate in the beginning, but as the person moves in the area, the model parameters can be estimated more accurately resulting in improved tracking performance. The proposed parameter estimation method does not require human intervention other than normal movement in the area.
Parameter estimation—This present work is most closely related to the developments in [
22,
32], which have an online calibration module as well as a batch-estimation module for tuning the model parameters. The aforementioned works use an imaging solution, and the accuracy is inevitably affected by the binary measurement model and resolution of the pixels. Instead, we use a continuous measurement model, and the implemented Bayesian filter estimates the kinematic state of the target directly from the RSS, so higher tracking accuracy is expected [
20]. Furthermore, the existing approaches use a nonlinear least-squares solution to estimate the model parameters [
22,
32], whereas the proposed method solves the problem using a maximum likelihood approach based on a computationally efficient EM algorithm. Since unsupervised learning depends on accurate position estimates, the higher tracking accuracy of the introduced system is an advantage over [
22,
32]. The experimental results demonstrate that the proposed system can reduce the tracking error by
or more and the EM algorithm is computationally up to 18 times faster than the nonlinear optimization method used in [
22].
Expectation-maximization has also been used for RSS-based DFLT in [
20,
33]. However, these approaches use a mini-batch- and particle-filtering-based online EM approach. Although the online EM approach is attractive for on-the-fly estimation of the parameters and rapid adaptation to changing environments, the method also suffers from some important drawbacks. First, since the expectation step is based on particle filtering, which yields a degenerate approximation of the smoothing posterior density required by EM [
34,
35], it is computationally heavy, and the estimation of the marginal log-likelihood may be poor. Second, in [
20,
33] the maximization step cannot be done in closed form, and it is implemented by propagating a set of sufficient statistics and the numerical integration is carried out using importance sampling. In contrast, in this paper, the expectation step is calculated using a Gaussian approximation for the smoothing distribution which can be computed efficiently and does not suffer from trajectory degeneration. Furthermore, we show that the maximization step of the EM algorithm is available in closed form for the considered measurement model and implemented Gaussian smoother. Hence, with respect to [
20,
33], the solution presented in this paper is more tractable in terms of the approximation of the expectation and maximization steps and computational complexity. In addition, the EM algorithm used in [
20,
33] is only evaluated with simulations, whereas we validate our proposed method using experimental data. The EM algorithm is widely used in different applications and the readers are referred to [
36] for an introduction to parameter estimation and to [
37,
38] for a more general treatment of parameter estimation in nonlinear dynamical systems using Gaussian filtering and smoothing.
3. Localization and Tracking
This work aims to track the kinematic state of a target using the RSS measured between static wireless nodes. The components of the introduced Localization-and-Tracking unit and their relations are visualized in
Figure 3 and presented in the following. We begin by presenting the models used for localization and tracking. Thereafter, the estimation tasks are presented, and they are performed by two complementary blocks: (i) the Radio Tomographic Imaging (RTI) unit summarized in
Section 3.2, and (ii) the Extended Kalman Filter (EKF) unit presented in
Section 3.3. The EKF uses the combination of RSS and RTI position estimates in the measurement update, and the Measurement Selection unit selects and combines the measurements as described in
Section 3.4.
The idea to augment the EKF with RTI position estimates was originally presented in [
3]. The localization-and-tracking system presented in this paper further improves the filter by introducing a Measurement Selection unit, which selects and combines the measurements in a way that enhances the tracking accuracy. In addition, we propose a novel RTI positioning scheme that also estimates the covariance of the position estimates. Furthermore, in contrast with the system presented in [3], the developed localization-and-tracking system performs no low-pass filtering of the RTI images; see also [
3]. The reason is that image filtering has a negative impact on the parameter estimation algorithm since it introduces a lag in the state estimates and causes correlated position errors.
3.1. Models
3.1.1. Dynamic Model
For DFLT, the state of the system in two-dimensional Euclidean space can be defined as
where
and
are the
x- and
y-coordinates, and the velocity components are denoted as
and
. This state representation is particularly suitable for DFLT because the position and velocity define the temporal and spectral properties of the RSS [
39]. This state evolves at time
k in accordance with
where
is the state transition matrix of the dynamic model and
is Gaussian process noise. As a person is not expected to change velocity very rapidly and unexpectedly, a common choice of
in RSS-based DFLT [
18,
22] is the second-order kinematic model [
40], given by
where
is an identity matrix, ⊗ the Kronecker product,
q the power spectral density of the process noise and
the sampling period.
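The structure of the dynamic model described above can be illustrated with a short sketch. It assumes the standard discretized white-noise-acceleration form of the kinematic model, with the state ordered as position followed by velocity; the function names and this exact parameterization are illustrative assumptions rather than the authors' published implementation.

```python
def kron(a, b):
    """Kronecker product of two matrices stored as lists of rows."""
    return [
        [a_ij * b_kl for a_ij in a_row for b_kl in b_row]
        for a_row in a
        for b_row in b
    ]


def kinematic_model(tau, q):
    """Return (F, Q) for a state ordered as [x, y, vx, vy].

    Assumption: discretized white-noise-acceleration model with sampling
    period tau and process noise power spectral density q.
    """
    identity2 = [[1.0, 0.0], [0.0, 1.0]]
    f_1d = [[1.0, tau], [0.0, 1.0]]
    q_1d = [
        [q * tau**3 / 3.0, q * tau**2 / 2.0],
        [q * tau**2 / 2.0, q * tau],
    ]
    return kron(f_1d, identity2), kron(q_1d, identity2)


def propagate(F, state):
    """Noise-free propagation of the state by one time step."""
    return [sum(f * s for f, s in zip(row, state)) for row in F]
```

For example, `propagate(kinematic_model(0.5, 1.0)[0], [0.0, 0.0, 1.0, 2.0])` moves the position by half of the velocity in each coordinate and leaves the velocity unchanged.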
3.1.2. Measurement Model
Consider a wireless network, where each of the
S nodes can communicate with the other
nodes. Moreover, the wireless devices can communicate on
C different frequency channels. Each transmitter, receiver and channel combination is a unique link and the total number of measured links is
. Note that full connectivity is not mandatory for DFLT. Note also that we do not assume channel reciprocity: although the radio channel is reciprocal, measurements of the radio channel are not, and the parameters of the reciprocal link can differ [
41].
The nodes communicate in round-robin fashion and at time k, one node transmits and the other nodes receive. At the next time instant, k + 1, the transmission turn is assigned to the next node in the schedule. The nodes transmit sequentially, and one communication cycle consists of one transmission by every node. At the end of the communication cycle, the nodes switch simultaneously to the next frequency channel in a predefined list. Thereafter, a new communication cycle is initiated. Once each node has transmitted on every frequency channel, the schedule is restarted from the beginning.
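The round-robin schedule described above can be sketched as a mapping from the time index to the transmitting node and the active channel. The zero-based indexing and the function names are assumptions made for illustration.

```python
def schedule(k, num_nodes, num_channels):
    """Return (tx_index, channel_index) for time step k.

    Assumption: zero-based indices, nodes transmit in ascending order, and
    the channel advances after every full communication cycle.
    """
    tx_index = k % num_nodes
    channel_index = (k // num_nodes) % num_channels
    return tx_index, channel_index


def receivers(tx_index, num_nodes):
    """All nodes except the transmitter listen during a transmission."""
    return [n for n in range(num_nodes) if n != tx_index]
```

With 20 nodes and four channels, `schedule(20, 20, 4)` gives `(0, 1)`: the first node transmits again, now on the second channel in the list, and the whole schedule repeats after 80 transmissions.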
For the considered problem, the measurement system at time
k can be defined as
where
is the measured RSS,
a deterministic link indicator matrix defined by the schedule (see
Section 3.3.1),
the linear-in-parameters measurement model and
is Gaussian measurement noise. The human-induced RSS changes are modeled using an exponential model [
20] and the complete linear-in-parameters model is defined as
where
is the decay rate,
the reference RSS and
the measurement gain,
and
. In (
5), the excess path length
defines an ellipse with the foci at the TX and RX and it relates the person’s location
to link
l with TX
m and RX
n by
where
and
denote the TX and RX positions in respective order. Lastly, the measurement noise covariance is assumed diagonal and it is defined as
. It is to be noted that the RSS can be measured at most for
links simultaneously because only one node transmits at a time. Moreover, to measure the
L links takes
transmissions and
duration of time.
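To make the measurement model more concrete, the following sketch evaluates the excess path length of a link and the corresponding expected RSS. The excess path length follows from the ellipse geometry described above. The exponential form `mu + phi * exp(-delta / lam)` and the parameter names are assumptions based on the verbal description of the reference RSS, measurement gain and decay rate, not a verbatim copy of the paper's equations.

```python
import math


def excess_path_length(p, p_tx, p_rx):
    """Extra distance travelled via the person compared with the direct path."""
    return math.dist(p_tx, p) + math.dist(p, p_rx) - math.dist(p_tx, p_rx)


def expected_rss(p, p_tx, p_rx, mu, phi, lam):
    """Expected RSS of one link for a person at position p.

    Assumed model: reference RSS mu plus a human-induced change that decays
    exponentially with the excess path length (gain phi, decay rate lam).
    """
    delta = excess_path_length(p, p_tx, p_rx)
    return mu + phi * math.exp(-delta / lam)
```

A person standing on the line between the transceivers has zero excess path length, so the expected RSS equals `mu + phi`; far from the link, the exponential term vanishes and the expected RSS approaches the reference level `mu`.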
3.2. Radio Tomographic Imaging
3.2.1. Image Estimation
RTI estimates a discretized RSS change field, denoted by
, using the RSS of
links measured on frequency channel
c. As in [
42], the RSS is assumed to be a linear combination of voxel changes plus noise
where
is the mean-removed RSS,
a weight matrix that relates the spatial change field
to the RSS,
N the voxel number and
the measurement noise. The measurement vector and noise covariance in (
4) can be decomposed as
and
where
and
denote the RSS and measurement noise covariance on channel
c. Now, the RTI measurement and noise vectors are related to the model in (
4) via
and
.
The minimum mean square error estimate for the model in (
7), with zero-mean Gaussian image prior
, is
The covariance matrix
for pixels
m and
n is [
42]
where
is the variance of each pixel and
is a user-defined space constant. For link
l and pixel
n, we define the elements of
as
where
and
are the measurement gain and decay rate of the model defined in (
5),
is a normalization term and
the excess path length. In the literature,
has taken many forms, and the reader is referred to [
16,
43] for further details.
The projection matrix is channel dependent and it is computed independently for each of the channels. However, must be computed only once at the beginning of the experiment and the real-time computation of the image requires only one matrix multiplication, of multiplications and additions. The spatial change field is estimated at the end of each communication cycle when and contains measurements from time instant to k. The image estimate on channel c is denoted from now on as , to prevent using two time notations.
3.2.2. RTI Positioning
For a single target, localizing the person can be postulated as finding the mode of
since the pixels with the highest intensity are expected to lie near the target [
44]. The mode is in the set of pixels with intensity higher than
, where
denotes the maximum component of
and
is a threshold. The threshold is a tuning parameter between two extremes: if
all pixels are taken into account and if
only a single pixel is accounted for, and we have empirically found that
provides a good overall performance. Let us define
where
. Now, the position estimate and sample covariance are given by:
where
are the pixel coordinates and
the weight for pixel
n.
An example is illustrated in
Figure 4, in which two RTI images are shown together with the estimates given in Equations (12) and (13). The image on the left is an ideal RTI image: pixels with
are centered around the target location and the image has very little noise resulting in an accurate position estimate and small covariance. The image on the right is very noisy and the image is multimodal. As a result, the estimated position does not accurately indicate the target location and the estimated covariance is significantly higher than in the other image. However, estimating the covariance allows taking such uncertainties into account and the tracking filter developed in the next section gives less weight to position estimates that are estimated from noisy images, such as the one on the right of
Figure 4.
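The positioning scheme of this section can be sketched as an intensity-weighted mean and sample covariance over the pixels that pass the threshold. The normalization of the weights, the use of a greater-or-equal comparison, and the names `image`, `coords` and `alpha` are assumptions for illustration; the sketch also assumes that the peak intensity is positive.

```python
def rti_position(image, coords, alpha):
    """Weighted mean and covariance of the pixels above alpha * max(image).

    image:  list of pixel intensities (peak assumed positive)
    coords: list of (x, y) pixel centre coordinates
    alpha:  threshold in [0, 1]; 1 keeps only the peak pixel
    """
    peak = max(image)
    selected = [n for n, value in enumerate(image) if value >= alpha * peak]
    total = sum(image[n] for n in selected)
    weights = [image[n] / total for n in selected]

    mean_x = sum(w * coords[n][0] for w, n in zip(weights, selected))
    mean_y = sum(w * coords[n][1] for w, n in zip(weights, selected))

    cov = [[0.0, 0.0], [0.0, 0.0]]
    for w, n in zip(weights, selected):
        dx = coords[n][0] - mean_x
        dy = coords[n][1] - mean_y
        cov[0][0] += w * dx * dx
        cov[0][1] += w * dx * dy
        cov[1][0] += w * dy * dx
        cov[1][1] += w * dy * dy
    return (mean_x, mean_y), cov
```

A compact, unimodal image yields a small covariance, whereas a multimodal image spreads the selected pixels and inflates the covariance, which is the behavior the tracking filter relies on when weighting the RTI estimate.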
3.3. Tracking Filter
The extended Kalman filter (EKF) computes the marginal posterior distribution of
for each time step
k using the data
and assuming Gaussian approximations for the filtering densities so that
. Unlike conventional Bayesian filtering implementations for DFLT [
13,
18,
20], in this work, the measurement model of the filter is augmented with the position estimates from RTI as in [
3]. This bounds the filter’s measurement residuals by the position errors of the imaging approach. Therefore, the developed filter has the robustness of an imaging method and the tracking accuracy of a Bayesian filter. The filtering algorithm consists of three steps: (i) prediction step, (ii) model selection, and (iii) measurement update step. We simply refer to the introduced filter as the EKF, although it is more complex than a first-order filter that would solely use RSS. In the following, we first present the observation model of the EKF and thereafter, the prediction and update steps of the filter.
3.3.1. EKF Observation Model
Recall that at a given time instant
k, at most
links are measured. Instead of using the complete model defined in Equation (
5), the EKF operates on a subset of the measurement model. We refer to the subset as the
observation model and essentially, it contains the measurements and associated models sampled at time
k. Thus, the observation model is defined by the TX and channel identifiers, and it changes with time. To explicitly define the observation model, consider the set of nodes
and the set of channels
. Then, the link index
l corresponding to the transmission by node
on frequency channel
and received by node
is
The RXs that measure the RSS at time k are
and
. Now if
, we can define the indices of the link selection matrix, measurement model, and noise covariance as follows:
In addition, the EKF requires the Jacobian of
, and the elements of this matrix are given by
where
and
denote the positions of nodes
i and
j. The Jacobian for
m is
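As an illustration of how one row of such a Jacobian can be obtained, the sketch below differentiates an exponential RSS model with respect to the state. It assumes the model form `mu + phi * exp(-delta / lam)` and a state ordered as position followed by velocity; both are assumptions, and all names are hypothetical. The expression is undefined when the person is located exactly at a node.

```python
import math


def excess_path_length(p, p_tx, p_rx):
    """Extra distance travelled via the person compared with the direct path."""
    return math.dist(p_tx, p) + math.dist(p, p_rx) - math.dist(p_tx, p_rx)


def excess_path_gradient(p, p_tx, p_rx):
    """Gradient of the excess path length with respect to the position p."""
    d_tx = math.dist(p, p_tx)
    d_rx = math.dist(p, p_rx)
    return [
        (p[0] - p_tx[0]) / d_tx + (p[0] - p_rx[0]) / d_rx,
        (p[1] - p_tx[1]) / d_tx + (p[1] - p_rx[1]) / d_rx,
    ]


def rss_jacobian_row(p, p_tx, p_rx, phi, lam):
    """Derivative of the assumed RSS model w.r.t. the state [x, y, vx, vy]."""
    delta = excess_path_length(p, p_tx, p_rx)
    scale = -(phi / lam) * math.exp(-delta / lam)
    grad_x, grad_y = excess_path_gradient(p, p_tx, p_rx)
    # The RSS model does not depend on velocity, so those entries are zero.
    return [scale * grad_x, scale * grad_y, 0.0, 0.0]
```

A finite-difference comparison against the model itself is a convenient way to check such analytic derivatives before using them in a filter.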
3.3.2. Prediction Step
Given that the dynamic model in (
2) is linear, the prediction step of the first-order additive noise EKF can be expressed as [
36]
where
and
denote the state estimate and covariance in respective order, and
and
are the predicted mean and covariance.
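For a linear dynamic model with additive noise, the prediction step reduces to the standard Kalman filter prediction. The following sketch uses plain lists of rows as matrices; the helper names are illustrative and not taken from the published code.

```python
def mat_mul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def ekf_predict(mean, cov, F, Q):
    """Predicted mean F m and predicted covariance F P F^T + Q."""
    mean_pred = [sum(f * m for f, m in zip(row, mean)) for row in F]
    fpf = mat_mul(mat_mul(F, cov), transpose(F))
    cov_pred = [
        [fpf[i][j] + Q[i][j] for j in range(len(Q))] for i in range(len(Q))
    ]
    return mean_pred, cov_pred
```

The position uncertainty grows in the prediction both through the process noise and through the velocity uncertainty that the transition matrix maps into the position.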
3.3.3. Measurement Update
The
measurement selection unit presented in
Section 3.4 calculates the measurement residual
and forms the associated measurement noise covariance matrix
and measurement model matrix
. Using these, the mean
and covariance
can be updated using [
36]
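The update can be sketched with the usual Kalman gain equations. To keep the example free of a general matrix solver, it is restricted to a two-dimensional measurement, such as an RTI position estimate used on its own; the full filter would stack RSS measurements as well and needs a general linear solve. The function names are assumptions for illustration.

```python
def mat_mul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def inverse_2x2(s):
    """Closed-form inverse of a 2x2 matrix."""
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    return [[s[1][1] / det, -s[0][1] / det], [-s[1][0] / det, s[0][0] / det]]


def ekf_update(mean_pred, cov_pred, residual, H, R):
    """Measurement update for a 2-dimensional measurement residual."""
    Ht = transpose(H)
    hph = mat_mul(mat_mul(H, cov_pred), Ht)
    S = [[hph[i][j] + R[i][j] for j in range(2)] for i in range(2)]
    K = mat_mul(mat_mul(cov_pred, Ht), inverse_2x2(S))  # Kalman gain
    n = len(mean_pred)
    mean = [mean_pred[i] + sum(K[i][j] * residual[j] for j in range(2)) for i in range(n)]
    ksk = mat_mul(mat_mul(K, S), transpose(K))
    cov = [[cov_pred[i][j] - ksk[i][j] for j in range(n)] for i in range(n)]
    return mean, cov
```

When the measurement noise covariance `R` is large, the gain shrinks and the update changes the predicted state only slightly, which is how noisy RTI estimates receive less weight.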
3.4. Measurement Selection
The DFLT system implementations using Bayesian filtering or imaging (in particular RTI) have different characteristics. Depending on the target’s position and system deployment, the performance of the introduced filter can be improved by enabling or disabling certain measurements. For example, the RTI position estimate can be biased while its estimated covariance is small. On the other hand, the filter can converge to an incorrect trajectory, in which case the estimated covariance is not able to account for the uncertainties in the state estimate. To solve these issues, a logic to select the effective measurements is introduced. The procedure is based on the normalized innovation squared (a.k.a. square of the Mahalanobis distance) [
40]
where
is the linear measurement model. The test statistic has a
chi-squared (χ²) distribution with two degrees of freedom and it can be used to assess whether the realized RTI estimate is unexpectedly far from the prior predictive distribution. In addition, the square of the Mahalanobis distance between two successive RTI estimates is calculated
where
and
denote the previous RTI position estimate and covariance. The test statistic can be used to assess whether the prior predictive distribution has converged to an incorrect trajectory.
For simplicity, the index notation is dropped and the measurement model, measurement noise covariance matrix, and Jacobian are simply denoted as: , and , respectively. The resulting logic to select the measurement models is presented below in which denotes the confidence interval of the distribution with two degrees of freedom.
if and —It is likely that the filter has diverged. Use only the output of RTI, i.e., , and .
else if—Normal operation, concatenate the models: , and
else—The RTI position estimate is likely inaccurate, use only the RSS measurements, i.e., , and .
The measurement residual
, measurement noise covariance matrix
and measurement model matrix
are used by the EKF update step presented in
Section 3.3.3.
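The selection logic can be sketched as a three-way decision on the two test statistics. The exact inequalities are not reproduced in the text above, so the conditions in `select_measurements` are an assumption inferred from the verbal description of the three cases. The threshold function uses the fact that a chi-squared variable with two degrees of freedom has the cumulative distribution 1 - exp(-x/2).

```python
import math


def chi2_threshold_2dof(confidence):
    """Quantile of the chi-squared distribution with two degrees of freedom."""
    return -2.0 * math.log(1.0 - confidence)


def mahalanobis_sq(diff, cov):
    """Squared Mahalanobis distance of a 2-vector for a 2x2 covariance."""
    det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
    a = (cov[1][1] * diff[0] - cov[0][1] * diff[1]) / det
    b = (-cov[1][0] * diff[0] + cov[0][0] * diff[1]) / det
    return diff[0] * a + diff[1] * b


def select_measurements(nis_prior, nis_successive, gamma):
    """Choose which measurements enter the update (assumed conditions).

    nis_prior:      RTI estimate tested against the prior predictive density
    nis_successive: distance between two successive RTI estimates
    """
    if nis_prior > gamma and nis_successive <= gamma:
        return "rti"       # filter has likely diverged, trust RTI only
    elif nis_prior <= gamma:
        return "rss+rti"   # normal operation, concatenate both models
    else:
        return "rss"       # RTI estimate is likely inaccurate
```

For a 95% confidence level, `chi2_threshold_2dof(0.95)` is approximately 5.99.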
5. Experiments
The development efforts of this paper are demonstrated using Texas Instruments CC2531 USB dongle nodes [
45]. The nodes operate on the
2.4 GHz ISM band and communicate on a set of frequency channels
defined by the IEEE 802.15.4 standard [
46]. The wireless nodes follow a round-robin schedule as discussed in
Section 3.1.2. In each transmitted packet, a node includes the most recent RSS measurements associated with the transmissions of the other nodes. The time interval between consecutive transmissions is approximately
, defining the sampling period for the system. A base station that overhears all the traffic extracts the RSS from the packets and relays the measurements to a computer through UART for centralized processing. The readers are referred to [
47] for a detailed description of the communication protocol. It is to be noted that the method of this paper can be generalized to any device capable of measuring the RSS including Wi-Fi, Bluetooth and RFID.
The experiments are conducted in an open indoor environment and in a downtown residential apartment. In both experiments, 20 nodes are deployed as illustrated in
Figure 5. In the open environment, the nodes are set on top of podiums (≈0.9 m) and deployed around a
area. The size of the apartment is
and the nodes are deployed next to electrical sockets so that they can be powered from the mains. The walk-in closets have no electrical sockets on the exterior walls, so one battery-powered node is deployed in each closet to ensure coverage of the entire apartment. These two nodes are located at
and
.
Before the experiment, reference positions were defined and marked. During the experiment, the person’s trajectory follows the imaginary lines between the markers. Once the target reaches a reference position, they stop, remain stationary for a few seconds, and then walk to the next reference position. During the experiment, the person is carrying a video camera. In post-processing, the RSS and video streams are synchronized, and the video is used to define the ground truth trajectory. In
Section 6.4, the statistical significance of the tracking error is tested to ensure that the generated trajectory is close to the ground truth.
The experiments in both environments are conducted with one, four and 16 frequency channels and the sets of channels used are: , and . In addition, three different trials are conducted with each number of channels. The trials are approximately three minutes long and every reference position is visited at least once in each trial. In the following section, the experiments are referred to as Ex, where i indicates the experiment number and j the trial. Experiments 1–3 are conducted in the open environment and experiments 4–6 in the apartment. Furthermore, Ex1 and Ex4 use one channel, Ex2 and Ex5 four channels, and Ex3 and Ex6 all 16 frequency channels. In the apartment experiment, several co-existing Wi-Fi networks are located in the coverage area, but the presented system remains operational. The system is not particularly sensitive to occasional packet drops, and frequency channel diversity partly mitigates interference issues. As an example, the packet reception rate is below on the most congested channel and above on channels that do not share the frequency band with Wi-Fi.
The imaging parameters used in the experiments are given in
Table 1, whereas the parameters of the tracking algorithm are defined by the measurement model
. In the experiments,
is assumed to be constant unless otherwise stated, the measurement gain and variance are initialized using
and
. In
Section 6, the initialization of
is discussed. The only user-defined parameter in the EKF is the process noise value and it should be tuned to the actual motion. In this paper,
which corresponds to an acceleration of
for the considered system. The tracking filter is initialized when the person has reached the first reference position and is stationary. The filter is initialized using
, where
and
are the center coordinates of the monitored area and
. Note that the ground truth position is never at
when the filter is initialized. Occupancy assessment is an important problem in DFLT [
32] but for simplicity, we assume we know the time instances when the person enters and exits the monitored area.
The filters are evaluated using the root-mean-square error (RMSE) which is defined as
. The mean-squared error (MSE) is
where
62,000 is the total number of estimates in one trial,
denotes the ground truth position, the hat accent indicates the estimate and
the square of the Euclidean norm.
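The error metric can be computed directly from the ground truth and estimated positions. The following minimal sketch assumes two-dimensional positions given as pairs; the function name is illustrative.

```python
import math


def rmse(truth, estimates):
    """Root-mean-square position error over one trial.

    truth, estimates: equally long sequences of (x, y) positions.
    """
    squared_errors = [
        (t[0] - e[0]) ** 2 + (t[1] - e[1]) ** 2
        for t, e in zip(truth, estimates)
    ]
    mse = sum(squared_errors) / len(squared_errors)
    return math.sqrt(mse)
```

Because the errors are squared before averaging, a few large deviations, for example during a temporary filter divergence, influence the RMSE more than many small ones.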
6. Experimental Results
Matlab code to run the tracking and localization, smoothing and parameter estimation algorithms presented in
Section 3 and
Section 4, and the experimental data presented in
Section 5 are available in [
21]. The reader is referred to the associated readme file for an overall algorithmic description of the derivations presented in this paper and to the Matlab files for the actual implementation of the algorithms. The development efforts of this paper are experimentally validated in the following and benchmarked against existing solutions from the literature. For now,
is calculated using a two-minute empty-room calibration period. From
Section 6.3 onward,
is initialized without an empty-room calibration period.
6.1. EM with Existing DFLT Methods
In this section, it is shown that the EM algorithm can be used to enhance not only the performance of the proposed system, but also two de facto standard DFLT methods from the literature. The first is RTI. The target is positioned as presented in
Section 3.2, a standard Kalman filter (KF) is used for tracking and the Rauch-Tung-Striebel smoother (RTSS) is used for smoothing. The second method is a particle filter (PF). The implemented PF is a sequential importance resampling (SIR) filter with 1000 particles. The state estimate and covariance are calculated from the filtering distribution, which is approximated by the set of particles and associated weights. A particle smoother could be implemented for approximating the smoothing distributions but for simplicity, the RTSS presented in
Section 4.1 is used instead. A re-initialization procedure is required by the PF, because it is prone to diverge when the measurement model is inaccurate [
3]. The PF is re-initialized, if the position error is larger than two meters, by drawing new particles from a uniform distribution within the monitored area and with zero velocity.
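The re-initialization rule can be sketched as follows. The state layout `[x, y, vx, vy]`, the function names and the area dimensions are assumptions made for illustration; only the two-meter threshold, the uniform draw within the monitored area and the zero velocity come from the description above.

```python
import math
import random

def reinitialize_particles(n, x_range, y_range, rng=random):
    """Draw n particles [x, y, vx, vy] uniformly in the area with zero velocity."""
    particles = [
        [rng.uniform(*x_range), rng.uniform(*y_range), 0.0, 0.0]
        for _ in range(n)
    ]
    weights = [1.0 / n] * n  # equal weights after re-initialization
    return particles, weights

def maybe_reinitialize(particles, weights, est_xy, ref_xy,
                       x_range, y_range, threshold=2.0, rng=random):
    """Re-initialize the particle set if the position error exceeds the threshold."""
    error = math.hypot(est_xy[0] - ref_xy[0], est_xy[1] - ref_xy[1])
    if error > threshold:
        return reinitialize_particles(len(particles), x_range, y_range, rng)
    return particles, weights

# Toy usage: 1000 particles in a hypothetical 5 m x 4 m area
particles, weights = reinitialize_particles(1000, (0.0, 5.0), (0.0, 4.0))
```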
In
Figure 6, the RMSE of the different filters is illustrated as a function of EM iteration number. As shown, the tracking performance is unsatisfactory with the initial parameter estimates (EM iteration 0): all filters have an RMSE above one meter. It is to be noted that most DFLT systems are implemented similarly, i.e., an empty-room calibration period is used to estimate
and an educated guess is used for the other parameters. The empty-room data cannot be used to estimate the other parameters since they depend on the location of the person. However, they can be estimated using the state estimates after the person has entered the monitored area and moves around. In this paper, the EM algorithm based on Gaussian smoothing is used and with better parameter estimates, the tracking accuracy can be improved as we demonstrate in the following.
After the filtering recursion, the RTSS recursion starts from the last time step and proceeds backwards to the first time step. Thereafter, the E-step of the EM algorithm can be approximated using the smoothing distribution and the parameter estimates are obtained from the M-step in closed form. Using the new parameter estimates, the filtering recursion is started from the beginning. This iterative process improves the model parameter estimates and results in enhanced tracking accuracy. As shown in
Figure 6, the RMSE decreases by 46–67%, depending on the filter. The results demonstrate that the implemented smoother and EM algorithm can also be used with other DFLT methods, and it is an effective method to improve system performance.
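The filter, smoother and M-step cycle described above can be illustrated with a deliberately simplified toy model. The sketch below is not the measurement model or the EKF of this paper: it assumes a scalar random-walk state observed in additive Gaussian noise and estimates only the measurement noise variance `r`. All names and values are hypothetical.

```python
def kalman_filter(ys, q, r, m0, p0):
    """Forward pass: scalar KF for x_k = x_{k-1} + w, y_k = x_k + v."""
    ms, ps, m_preds, p_preds = [], [], [], []
    m, p = m0, p0
    for y in ys:
        m_pred, p_pred = m, p + q      # prediction step
        k = p_pred / (p_pred + r)      # Kalman gain
        m = m_pred + k * (y - m_pred)  # update step
        p = (1.0 - k) * p_pred
        ms.append(m); ps.append(p)
        m_preds.append(m_pred); p_preds.append(p_pred)
    return ms, ps, m_preds, p_preds

def rts_smoother(ms, ps, m_preds, p_preds):
    """Backward pass: starts from the last time step and proceeds to the first."""
    ms_s, ps_s = ms[:], ps[:]
    for k in range(len(ms) - 2, -1, -1):
        g = ps[k] / p_preds[k + 1]     # smoother gain
        ms_s[k] = ms[k] + g * (ms_s[k + 1] - m_preds[k + 1])
        ps_s[k] = ps[k] + g * g * (ps_s[k + 1] - p_preds[k + 1])
    return ms_s, ps_s

def em_measurement_noise(ys, q, r_init, iterations=10, m0=0.0, p0=10.0):
    """Iterate filtering, smoothing and a closed-form M-step for r."""
    r = r_init
    for _ in range(iterations):
        ms, ps, m_preds, p_preds = kalman_filter(ys, q, r, m0, p0)
        ms_s, ps_s = rts_smoother(ms, ps, m_preds, p_preds)
        # M-step: expected squared residual under the smoothing distribution
        r = sum((y - m) ** 2 + p for y, m, p in zip(ys, ms_s, ps_s)) / len(ys)
    return r
```

Each pass restarts the filtering recursion from the beginning with the updated parameter, which mirrors the iterative procedure described above, although here only a single parameter is re-estimated.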
6.2. Parameter Estimation Algorithms
In this section, the EM algorithm is compared to the nonlinear least-squares (NLS) approach proposed in [
22]. The parameter estimates are obtained by minimizing the cost function
where
is the nonlinear exponential model and
is now a parameter to be estimated. In this paper, a nonlinear least-squares solver based on the interior-reflective Newton method described in [
48] is used to find the minimum of
and thereafter, the ML estimate of
is computed. The NLS approach provides freedom in the set of parameters that are estimated. In the following, we evaluate the NLS approach that estimates the following parameters
: (i)
, as proposed in this paper; (ii)
, as proposed in [
22]; and (iii)
, a system that estimates all measurement model parameters. The results are compared to the EM algorithm that estimates
. We denote the parameter estimation algorithm and set of parameters simply as NLS(
).
The RMSEs are illustrated in
Figure 7 and the results imply that estimating
yields the highest tracking accuracy whereas estimating
the lowest. To examine this difference more closely, we concentrate on Ex3 and the NLS and calculate the average
statistic, defined as
and
. The
statistic measures how much of the observed variation in the mean can be explained by the model. For the two cases,
and
for NLS(
) and NLS(
) in corresponding order, meaning that estimating
explains the mean of the data more accurately, but the difference is only
. Calculating the Kullback–Leibler Divergence (KLD) yields
and
for
and
in respective order. As the KLD indicates,
is unable to account for the noise in the data and improved estimators can be developed when better noise models are available. Thus, estimating
rather than
has a significantly higher impact on tracking accuracy. It is to be noted that EM(
) and NLS(
) yield comparable performance and small differences are expected, for example, due to the termination rule of the optimization method. Interestingly, NLS(
) has a higher RMSE than NLS(
). This is caused either by overfitting the model or by the optimization algorithm converging to a local minimum. In the measurement model,
and
are coupled and the optimization algorithm must solve for these simultaneously which can be problematic.
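For readers who wish to reproduce this kind of comparison, the two diagnostics can be sketched as below. The exact definitions used in this paper are not reproduced here, so the sketch assumes the standard coefficient of determination and, for the KLD, the closed-form divergence between two univariate Gaussians. Both choices and all names are assumptions.

```python
import math

def r_squared(observed, predicted):
    """Standard coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def kld_gaussian(mu0, sigma0, mu1, sigma1):
    """KL divergence KL(N(mu0, sigma0^2) || N(mu1, sigma1^2))."""
    return (math.log(sigma1 / sigma0)
            + (sigma0 ** 2 + (mu0 - mu1) ** 2) / (2.0 * sigma1 ** 2)
            - 0.5)

# Toy usage: a model that predicts only the mean explains no variation
print(r_squared([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))
```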
The main benefit of the proposed EM algorithm is that it can be solved in closed form using simple arithmetic operations, whereas the NLS approach requires a solver for the nonlinear optimization problem. In practice, the estimates only require computing two vector products ( and ) with complexity , and calculating with complexity , where n is the state dimension and K the number of measurements. As an example, for three minutes of experimental data and using the initial parameter estimates, the computation time of the parameter estimation algorithms in experiment Ex3 are: for EM(), NLS(), NLS() and NLS() in respective order. The results are obtained using a Matlab implementation and a computer equipped with a 2.60 GHz Intel Core i7-8850H processor and 32 GB of RAM. As demonstrated by the results, the EM algorithm is computationally very efficient, up to 18 times faster than NLS. It is to be noted that the computation time of NLS has a significant dependence on the parameter values that are used to initialize the optimization algorithm. For example, the computation time of NLS() is during the last parameter estimation iteration. Additionally, the link number has an impact since it defines the number of times NLS is called. As an example, the computation time in Ex and NLS() is , which is significantly shorter than in Ex because the experiment only uses one frequency channel.
6.3. System Comparison
In this section, the proposed system is benchmarked against an adaptive radio tomographic imaging (ARTI) system [
22]. ARTI is an imaging method that estimates
and
online, smoothing is used to enhance the image and state estimates, and NLS is used for estimating
and
. In the experiments, both systems are initialized without any prior information of the RSS, model parameters or location of the person. ARTI has an online calibration unit to estimate the reference RSS and the system is functional from the very beginning. The proposed system does not have such a feature and we use the online calibration unit of ARTI to estimate
during the first filtering recursion and then the unit is disabled during the subsequent iterations. It is to be noted that
could be initialized in various ways, but for the sake of fairness we use the same method as ARTI.
The results are summarized in
Table 2 and for each experiment, the results are averaged over the three trials. As shown, the proposed algorithm results in superior performance, an average decrease of
in the RMSE with respect to ARTI. For ARTI, the estimates are accurate most of the time, but in certain positions the location estimate is far off (the skewness is 5 and the kurtosis is 37, indicating that the error distribution has a positive skew and is heavy tailed). The main reason for the large position errors is that a link can measure a very large RSS change when the person is not on the link line, the straight imaginary line between the TX and RX. When using imaging methods, this one link dominates over the other links and the person is localized in between the wrong TX-RX pair. The proposed system is not as vulnerable to such outliers (skewness is 2 and kurtosis is 10) because of the implemented measurement selection logic, which discards RTI position estimates with abnormally large errors. The performance difference between ARTI and the EKF can also be explained by the set of parameters that are estimated by the systems. ARTI estimates
and this limits the achievable accuracy of the system as discussed in the previous section. To support this claim, in
Figure 6 it is shown that an RTI solution together with EM almost achieves the same accuracy as the proposed solution.
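The skewness and kurtosis quoted above summarize the shape of the position error distribution. A minimal sketch of how such statistics can be computed from a list of errors is given below. It assumes the moment-based definitions with non-excess kurtosis (a Gaussian has kurtosis 3); whether the paper uses this or the excess convention is not stated.

```python
def skewness_kurtosis(samples):
    """Moment-based sample skewness and (non-excess) kurtosis."""
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((x - mean) ** 2 for x in samples) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in samples) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in samples) / n  # fourth central moment
    return m3 / m2 ** 1.5, m4 / m2 ** 2

# Toy usage: a few large errors produce a positive skew
errors = [0.1, 0.2, 0.2, 0.3, 0.3, 0.4, 3.5]
skew, kurt = skewness_kurtosis(errors)
print(skew > 0.0)
```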
6.4. System Performance over Time
Next, we demonstrate that the proposed system can maintain its high accuracy over time. The conducted trials are actually snippets from a longer experiment. The entire experiment contains the three trials explained before and a five-minute period when the person randomly walks inside the monitored area. In between the occupancy time periods, the person leaves the area for two minutes at a time. The five-minute period takes place before the trials, and we will run ten iterations of the EM algorithm using data from this period. Then, the obtained parameter estimates are used the next time the person enters the area. After each trial, the EM algorithm is used once to recalculate the model parameter estimates. The results are summarized in
Table 3 and in every experiment, the tracking accuracy remains high throughout the different trials and there is no indication that the RMSE increases. This implies that the proposed system is suitable for estimating the model parameters without requiring human intervention and for maintaining high tracking accuracy over time. The ground truth trajectory together with the coordinate estimates is illustrated in
Figure 8 for Ex
. Please note that the covariance of RTI position estimates changes from frame to frame and, therefore, the pink area which illustrates the
confidence interval is not constant.
The described procedure is one possible way the proposed system could be used in practice, i.e., the parameters would be estimated at regular intervals or once the person has covered enough distance. However, there is a downside to the EM algorithm. It does not account for prior information, and it computes the ML estimates of the parameters only from the data that is used as input. As an example, the data from the five-minute period is forgotten when the parameters are re-estimated after the first trial. This is an issue that must be solved for systems that are deployed over an extended period of time. One alternative is to compute the maximum a posteriori (MAP) estimates which can be done in practice by maximizing
at the M-step instead of the plain
[
36]. The prior information is included in the MAP estimate via the additional term
.
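In generic notation, the difference between the two M-steps can be written as follows. The symbols are assumptions introduced for illustration and may differ from those used elsewhere in the paper: \theta denotes the parameter vector, \theta^{(n)} the estimate at iteration n, \mathcal{Q} the expected complete-data log-likelihood computed at the E-step, and p(\theta) the prior.

```latex
\hat{\theta}^{(n+1)}_{\mathrm{ML}}
  = \arg\max_{\theta} \; \mathcal{Q}\big(\theta, \theta^{(n)}\big),
\qquad
\hat{\theta}^{(n+1)}_{\mathrm{MAP}}
  = \arg\max_{\theta} \; \Big[ \mathcal{Q}\big(\theta, \theta^{(n)}\big) + \ln p(\theta) \Big]
```

With a prior built from earlier parameter estimates, the additional log-prior term would keep information from previous occupancy periods instead of discarding it at each re-estimation.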
The ground truth trajectory is reconstructed from the video recording. When the person is stationary, the ground truth locations are accurate because the reference positions were measured precisely with a laser rangefinder, and it is easy to extract these time instances from the video. When the person is moving, the ground truth can contain small errors because the video and measurements cannot be perfectly synchronized. Furthermore, the person does not move exactly with constant velocity. Let us assume that the ground truth trajectory is reconstructed accurately and let the null hypothesis be that the RMSE is the same when the person is stationary and moving. Then, we can test the statistical significance of the result to determine whether the null hypothesis should be rejected or retained. The RMSE is when the person is standing still and when moving. The t-test statistic of the independent two-sample t-test equals and the critical value is with a significance level. Since the statistic is lower than the critical value, the null hypothesis cannot be rejected and the RMSE for stationary and moving periods can be considered the same. The result indicates that the ground truth trajectory has been accurately reconstructed.
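The test statistic of an independent two-sample t-test can be computed as in the sketch below. It assumes the pooled-variance (equal variance) form of the test and takes two lists of per-sample errors as input. Which per-sample quantity and which variant of the test were used in the experiments is not specified here, so both are assumptions.

```python
import math

def two_sample_t(a, b):
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    na, nb = len(a), len(b)
    mean_a, mean_b = sum(a) / na, sum(b) / nb
    var_a = sum((x - mean_a) ** 2 for x in a) / (na - 1)  # unbiased variances
    var_b = sum((x - mean_b) ** 2 for x in b) / (nb - 1)
    pooled = ((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2)
    t = (mean_a - mean_b) / math.sqrt(pooled * (1.0 / na + 1.0 / nb))
    return t, na + nb - 2

# Toy usage with hypothetical error samples (stationary vs. moving)
t_stat, dof = two_sample_t([0.20, 0.25, 0.30], [0.22, 0.27, 0.35])
```

The null hypothesis is retained when the absolute value of the statistic is below the critical value of the t-distribution for the chosen significance level and the returned degrees of freedom.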
6.5. Simulations
Lastly, we want to validate the development efforts of this paper numerically using simulations. Thus, the performance of the proposed system and ARTI are numerically analyzed using a simulation scenario which replicates Experiment 3. In total, 100 Monte Carlo simulations are performed and for each run, the model parameters are randomly drawn. The model parameters used in the simulation are drawn from: a Gaussian distribution,
, non-standardized Student’s t-distribution
, a uniform distribution
and a log-normal distribution
. The exact distributions of the model parameters are unknown, but the used ones provide a functional fit and they resemble the empirical distributions obtained using data from the open environment experiments. In the following, the RMSE is evaluated with respect to the posterior Cramér-Rao bound (PCRB) of RSS-based DFLT [
3]. In addition, the RMSE of the parameter estimates are examined.
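Drawing the model parameters for each Monte Carlo run can be sketched as follows. The distribution families match those listed above, but every numerical value and the mapping from distribution to model parameter are placeholders chosen for illustration, not the values used in the simulations.

```python
import math
import random

def student_t(rng, loc, scale, dof):
    """Non-standardized Student's t draw via a normal / chi-square ratio."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(dof / 2.0, 2.0)  # chi-square with dof degrees of freedom
    return loc + scale * z / math.sqrt(chi2 / dof)

def draw_parameters(rng):
    """One set of model parameters for a single Monte Carlo run (placeholder values)."""
    return {
        "gaussian_param": rng.gauss(-50.0, 5.0),           # mean, std
        "student_t_param": student_t(rng, 0.0, 1.0, 4.0),  # loc, scale, dof
        "uniform_param": rng.uniform(0.01, 0.05),          # lower, upper
        "lognormal_param": rng.lognormvariate(0.0, 0.5),   # mu, sigma of the log
    }

# 100 Monte Carlo runs, each with its own seeded generator for reproducibility
runs = [draw_parameters(random.Random(seed)) for seed in range(100)]
```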
In
Figure 9, RMSE of the two systems as a function of parameter estimation iteration number is illustrated. With the initial parameter estimates, see iteration number zero in
Figure 9, ARTI achieves a lower RMSE because
and
are estimated online during the filtering recursion. However, the proposed system outperforms ARTI after the parameters have been estimated by the EM and NLS algorithms. As illustrated in the figure, the EKF converges much closer to the PCRB than ARTI. More quantitatively, at iteration number five, the PCRB is
whereas the RMSE of the EKF and ARTI are
and
in respective order, a
decrease in tracking error in favor of the EKF. The EKF achieves higher tracking accuracy for two reasons. First, the EKF-based tracking algorithm is more accurate than the KF-based tracking algorithm of ARTI. With improved tracking performance, the parameter estimates are more accurate, which improves the tracking performance even further. As shown in
Table 4, the RMSE of the parameter estimates for the EKF are significantly lower for
,
and
. The second reason is that an accurate estimate of
rather than
has a significantly higher impact on tracking accuracy as discussed in
Section 6.2. As tabulated in
Table 4, the RMSE of
for ARTI and the EKF are
and
in respective order, only a
increase in RMSE when it is not estimated by the proposed system. By comparison, the RMSE of
decreases by
when estimated by the EM algorithm and as a result, improved estimators can be developed when better noise models are available.