1. Introduction
Driven by the demands of applications such as video surveillance and space target monitoring, multi-target tracking based on infrared images has made great progress [1,2,3]. In these applications, the target of interest often coexists with other interfering targets or with complex image backgrounds. Existing target detection and tracking algorithms fall mainly into two categories: Track-Before-Detect (TBD) [4] and Detect-Before-Track (DBT) [5]. Particle filtering is the main technology in the TBD category [6]. Because it requires sampling of the probability density function, its computational complexity is high, and its performance degrades seriously when the background is complex. Technologies in the DBT category include transform-domain methods [7], deep learning [8], image separation [9,10], and others. Current research focuses mainly on the latter two because of their superior performance. However, deep learning methods require a large number of labeled training samples [8], and existing methods based on low-rank and sparse matrix separation require joint spatial–temporal processing of multiple frames [11,12,13,14,15,16], which leads to high computational complexity. In addition, owing to improvements in imaging sensors, targets often show “extended” characteristics; i.e., they occupy several pixels in the image rather than appearing as “point” targets [17]. Exploiting this feature appropriately can further improve the accuracy of target tracking.
The main innovations of this work are as follows:
- (1)
Low-rank and sparse matrix separation and multi-target filtering based on random finite sets (RFSs) are integrated into one framework. The framework exploits the physical characteristics of the target and background in the image and avoids the purely data-driven nature of deep learning methods; at the same time, filtering substantially improves efficiency compared with traditional spatial–temporal, multi-frame, high-dimensional processing.
- (2)
Targets obtained by low-rank and sparse separation naturally carry position and scale information, which is modeled as an RFS measurement; meanwhile, the filtering process also uses RFSs to describe the state parameters. The whole tracking process is thus realized within a conjugate Bayesian framework.
- (3)
In the filtering process, the continuity of target motion is introduced through the state equation, so that false alarms arising in the detection process can be effectively suppressed; in addition, the filtering result is used as prior information for detection in the next frame. Detection and filtering are thus deeply fused, which makes the entire framework more accurate.
2. Differential Low-Rank and Sparse Matrix Separation
In target detection for infrared images, the different physical properties of the target and background, namely sparsity and low rank, respectively, are often used to construct a separation model, as shown in (1) [18]:
where $D$ is a “patch image” matrix whose columns are formed by a series of fixed-size sliding windows moved from the top left to the bottom right of the original image (see Figure 1). Here, $L$ and $S$ are the low-rank and sparse matrices, which represent the background and targets, respectively.
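The patch-image construction can be sketched as follows. This is a minimal illustration, not the authors' implementation: the 16 × 16 window matches the setting reported in the experiments, while the stride of 8 pixels, the function names `patch_image` and `from_patch_image`, and averaging of overlapping pixels in the inverse mapping are assumptions.

```python
import numpy as np


def patch_image(img: np.ndarray, win: int = 16, stride: int = 8) -> np.ndarray:
    """Stack vectorized win x win sliding windows as columns, scanning from the
    top-left corner to the bottom-right corner (row by row)."""
    h, w = img.shape
    cols = [
        img[r:r + win, c:c + win].reshape(-1)
        for r in range(0, h - win + 1, stride)
        for c in range(0, w - win + 1, stride)
    ]
    return np.stack(cols, axis=1)


def from_patch_image(patches: np.ndarray, shape: tuple, win: int = 16, stride: int = 8) -> np.ndarray:
    """Inverse mapping (assumed): put each column back at its window position and
    average the pixels covered by several overlapping windows."""
    acc = np.zeros(shape)
    cnt = np.zeros(shape)
    k = 0
    for r in range(0, shape[0] - win + 1, stride):
        for c in range(0, shape[1] - win + 1, stride):
            acc[r:r + win, c:c + win] += patches[:, k].reshape(win, win)
            cnt[r:r + win, c:c + win] += 1
            k += 1
    return acc / np.maximum(cnt, 1)


img = np.random.default_rng(0).random((256, 256))
D = patch_image(img)          # 256 rows (16*16 pixels) x 961 columns (31*31 windows)
restored = from_patch_image(D, img.shape)
```

After separation, the same inverse mapping would turn the sparse matrix back into a target image.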
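$\|L\|_{*}$, the nuclear norm of the low-rank matrix $L$, enforces the low-rank constraint, while $\|S\|_{1}$, the $\ell_{1}$-norm of the sparse matrix $S$, describes sparsity. The parameters $\lambda_{1}$ and $\lambda_{2}$ adjust the weights of the low-rank and sparse terms. Equivalently, the matrix $L$ and its nuclear norm in (1) can be further decomposed as
$$
\|L\|_{*} = \min_{U,V:\ L = UV} \tfrac{1}{2}\left(\|U\|_{F}^{2} + \|V\|_{F}^{2}\right), \qquad (2)
$$
where $L = UV$ and $\|\cdot\|_{F}$ is the Frobenius norm of the matrix. Writing the singular value decomposition as $L = U_{L}\Sigma V_{L}^{\top}$, the minimum is attained at $U = U_{L}\Sigma^{1/2}$ and $V = \Sigma^{1/2}V_{L}^{\top}$.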
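The factorization identity behind (2) can be checked numerically. The sketch below is our own illustration on an arbitrary random rank-3 matrix: it builds the balanced factors from the SVD and compares them with another factorization of the same matrix, which should never give a smaller value.

```python
import numpy as np

rng = np.random.default_rng(0)
L = rng.standard_normal((40, 3)) @ rng.standard_normal((3, 30))   # rank-3 matrix

U_L, s, Vt_L = np.linalg.svd(L, full_matrices=False)
nuclear = s.sum()                       # ||L||_* = sum of singular values

# Balanced factors from the SVD: U = U_L Sigma^{1/2}, V = Sigma^{1/2} V_L^T (rank-3 part).
r = 3
U = U_L[:, :r] * np.sqrt(s[:r])
V = np.sqrt(s[:r])[:, None] * Vt_L[:r]
balanced = 0.5 * (np.linalg.norm(U, "fro") ** 2 + np.linalg.norm(V, "fro") ** 2)

# Any other factorization L = (U A)(A^{-1} V) gives a value that is at least ||L||_*.
A = rng.standard_normal((r, r)) + 3.0 * np.eye(r)
U2, V2 = U @ A, np.linalg.solve(A, V)
other = 0.5 * (np.linalg.norm(U2, "fro") ** 2 + np.linalg.norm(V2, "fro") ** 2)

print(f"||L||_* = {nuclear:.6f}, balanced = {balanced:.6f}, other = {other:.6f}")
```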
Inspired by (2), the low-rank matrix is decomposed into two parts: the dictionary, $U$, and the coefficient matrix, $V$. Thus, (1) becomes
where $L = UV$, and $U$ and $V$ have compatible dimensions. A unified fast algorithm can be used to solve (3); the procedure is given in Algorithm 1. In this algorithm, one of the update terms is multiplied by a matrix that arises from the differentiability of the objective function in (3), which greatly improves computational efficiency. At the same time, the shrinkage operator acts on each component of the vector separately, so that its $i$-th component is obtained by a simple threshold operation. Reference [19] proves that if a solution obtained by Algorithm 1 satisfies the condition given there, then it is globally optimal.
Algorithm 1 Algorithm for solving the differential form of low-rank and sparse matrix separation.
Input: data $D$, dictionary $U$, parameters $\lambda_{1}$ and $\lambda_{2}$, and a step-size controller.
Output: low-rank matrix $L$, sparse matrix $S$.
Define:
Initialize:
For k = 1, 2, … (until convergence)
End
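To illustrate how a fixed dictionary makes the separation cheap, the sketch below assumes that (3) takes the form $\tfrac{\lambda_1}{2}\|V\|_F^2 + \lambda_2\|S\|_1 + \tfrac{1}{2}\|D - UV - S\|_F^2$ with $U$ held fixed. This form, and the use of plain block-coordinate descent rather than the step-size-controlled iteration of Algorithm 1, are our assumptions; the function names are hypothetical. Each block is minimized exactly: $V$ by ridge regression and $S$ by componentwise soft thresholding.

```python
import numpy as np


def soft_threshold(x: np.ndarray, tau: float) -> np.ndarray:
    """Componentwise shrinkage: sign(x_i) * max(|x_i| - tau, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def separate(D, U, lam1, lam2, n_iter=200):
    """Block-coordinate descent on the ASSUMED objective
        lam1/2 * ||V||_F^2 + lam2 * ||S||_1 + 1/2 * ||D - U V - S||_F^2
    with the dictionary U held fixed. Returns L = U V, S, and the objective history."""
    r = U.shape[1]
    # The V-subproblem is ridge regression; its solution operator does not change
    # between iterations, so it is computed once.
    P = np.linalg.solve(U.T @ U + lam1 * np.eye(r), U.T)
    S = np.zeros_like(D)
    history = []
    for _ in range(n_iter):
        V = P @ (D - S)                       # exact minimizer over V for fixed S
        S = soft_threshold(D - U @ V, lam2)   # exact minimizer over S for fixed V
        R = D - U @ V - S
        history.append(0.5 * lam1 * np.sum(V ** 2) + lam2 * np.abs(S).sum() + 0.5 * np.sum(R ** 2))
    return U @ V, S, history


rng = np.random.default_rng(1)
U0 = rng.standard_normal((50, 2))
B = U0 @ rng.standard_normal((2, 40))          # low-rank "background"
S0 = np.zeros_like(B)
rows, cols = [3, 10, 20, 30, 45], [0, 7, 15, 22, 35]
S0[rows, cols] = 20.0                          # a few bright "targets"
L_hat, S_hat, hist = separate(B + S0, U0, lam1=1e-3, lam2=1.0)
```

Because every block update is an exact minimizer, the objective history is non-increasing, and the bright entries end up in `S_hat`.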
3. PMBM for Extended Multi-Target Tracking
With the development of infrared sensors, imaging resolution has steadily increased, so even small targets occupy several pixels in an image and exhibit the characteristics of “extended” targets. To exploit this information, extended target filtering augments the target state with shape (scale) information in addition to the usual kinematic information (such as position and velocity).
3.1. PMBM for State Modeling
Extended target tracking requires appropriate mathematical tools to model both the kinematic and shape information of the target. In this paper, the Bernoulli RFS in (6) is first introduced to describe a single extended target:
$$
f(X)=\begin{cases}1-r, & X=\emptyset,\\ r\,p(x), & X=\{x\},\\ 0, & \text{otherwise}.\end{cases}\qquad (6)
$$
In (6), the target state is the empty set with probability $1-r$, which means that the target “disappears”; with probability $r$, the target “exists” as a single-element set $\{x\}$. The function $p(x)$ is a probability density function describing the position (through its mean) and the extension (through its standard deviation) of the target. Generally, $p(x)$ can be chosen to be a Gaussian distribution.
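A Bernoulli RFS density as in (6) can be evaluated directly. The sketch assumes a scalar state and a Gaussian $p(x)$ whose mean encodes position and whose standard deviation encodes extent; the function names and numbers are hypothetical. The final check confirms that the empty-set mass plus the integral over singletons equals 1.

```python
import math


def gaussian_pdf(x: float, mean: float, std: float) -> float:
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def bernoulli_density(X: tuple, r: float, mean: float, std: float) -> float:
    """Set density of a Bernoulli RFS with existence probability r and
    single-target density p(x) = N(x; mean, std^2), evaluated at the finite set X."""
    if len(X) == 0:
        return 1.0 - r                                  # target does not exist
    if len(X) == 1:
        return r * gaussian_pdf(X[0], mean, std)        # exactly one target at X[0]
    return 0.0                                          # a Bernoulli RFS never has 2+ elements


# Normalization: f(empty set) + integral of f({x}) dx should be 1.
r, mean, std = 0.7, 10.0, 2.0
dx = 0.01
singletons = sum(bernoulli_density((mean + (k - 5000) * dx,), r, mean, std) * dx for k in range(10001))
total = bernoulli_density((), r, mean, std) + singletons
```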
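Furthermore, in order to represent multiple targets, an index set, $\mathbb{I}$, is introduced, and each index $i \in \mathbb{I}$ is associated with an independent Bernoulli RFS $X^{i}$. The number of Bernoulli components therefore equals the cardinality of $\mathbb{I}$, which is the maximum number of targets in $X$. This yields the multi-Bernoulli (MB) RFS:
$$
f(X) = \sum_{\biguplus_{i\in\mathbb{I}} X^{i} = X}\ \prod_{i\in\mathbb{I}} f^{i}(X^{i}), \qquad (7)
$$
where the sum runs over all decompositions of $X$ into disjoint subsets $X^{i}$, $\uplus$ denotes the union of disjoint sets (i.e., the union of the individual targets), and $f^{i}$ is the Bernoulli density (6) with parameters $r^{i}$ and $p^{i}(\cdot)$. From (6) and (7), one can conclude that the key parameters of the MB model are $\{r^{i}, p^{i}(\cdot)\}_{i\in\mathbb{I}}$, which represent the existence probability of the $i$-th target and the probability density of its “extended” state, respectively.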
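One consequence of the MB model is that its cardinality (the number of targets) depends only on the existence probabilities: it follows a Poisson-binomial distribution, with at most one target per Bernoulli component. A short sketch (function name and values are hypothetical):

```python
def mb_cardinality(r):
    """P(|X| = n) for n = 0..len(r) for a multi-Bernoulli RFS whose components have
    existence probabilities r (a Poisson-binomial distribution)."""
    dist = [1.0]
    for ri in r:
        new = [0.0] * (len(dist) + 1)
        for n, p in enumerate(dist):
            new[n] += p * (1.0 - ri)      # component absent
            new[n + 1] += p * ri          # component present
        dist = new
    return dist


dist = mb_cardinality([0.9, 0.5, 0.2])
mean = sum(n * p for n, p in enumerate(dist))   # equals 0.9 + 0.5 + 0.2
```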
Data association is one of the key steps in multi-target tracking; its purpose is to establish the connection between the measurements and the target states. Its mathematical representation is therefore also considered in state modeling. Let $\{w^{j}\}_{j\in\mathbb{J}}$ be the weights of several MB RFSs; i.e., for a fixed $j$, there is an MB RFS with parameters $\{r^{j,i}, p^{j,i}(\cdot)\}_{i\in\mathbb{I}^{j}}$, and their weighted combination forms a multi-Bernoulli mixture (MBM) RFS. The key parameters of the MBM RFS are thus $\{w^{j}, \{r^{j,i}, p^{j,i}(\cdot)\}_{i\in\mathbb{I}^{j}}\}_{j\in\mathbb{J}}$. The weight $w^{j}$ represents the probability of the $j$-th data association hypothesis, i.e., of one particular way of assigning the measurements to the target states.
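For an MBM, the cardinality distribution is the weight-averaged mixture of the per-hypothesis MB cardinalities. The sketch below normalizes possibly unnormalized hypothesis weights; the function names and example values are illustrative assumptions.

```python
def mb_cardinality(r):
    """Poisson-binomial cardinality distribution of one MB (index n = number of targets)."""
    dist = [1.0]
    for ri in r:
        new = [0.0] * (len(dist) + 1)
        for n, p in enumerate(dist):
            new[n] += p * (1.0 - ri)
            new[n + 1] += p * ri
        dist = new
    return dist


def mbm_cardinality(weights, hypotheses):
    """Cardinality distribution of an MBM: a mixture over MB hypotheses.
    `weights` may be unnormalized; `hypotheses[j]` lists the existence
    probabilities of the Bernoulli components in hypothesis j."""
    total = sum(weights)
    dist = [0.0] * (max(len(rs) for rs in hypotheses) + 1)
    for w, rs in zip(weights, hypotheses):
        for n, p in enumerate(mb_cardinality(rs)):
            dist[n] += (w / total) * p
    return dist


dist = mbm_cardinality([3.0, 2.0], [[0.9, 0.5], [0.8]])   # [0.11, 0.62, 0.27] up to rounding
expected = sum(n * p for n, p in enumerate(dist))           # 0.6*1.4 + 0.4*0.8 = 1.16
```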
When multiple targets exist, some targets may go undetected at certain time steps. In (8), these undetected targets, $X^{u}$, are described by a Poisson point process (PPP):
where the scalar coefficients are constants, $p^{u}(x)$ is the probability density of undetected targets, and $\langle\cdot,\cdot\rangle$ denotes the inner product.
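A PPP such as the one describing undetected targets can be simulated by drawing a Poisson number of points and then sampling each point independently from the normalized density. The sketch assumes a scalar state and an intensity of the form $\mu\,\mathcal{N}(x;\,\text{mean},\,\text{std}^2)$, so that the inner product $\langle D, 1\rangle$ equals $\mu$; these choices and the function names are ours.

```python
import math
import random


def sample_poisson(mu: float, rng: random.Random) -> int:
    """Knuth's multiplication method (fine for small means)."""
    threshold, k, p = math.exp(-mu), 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def sample_ppp(mu, mean, std, rng):
    """One realization of a PPP with intensity D(x) = mu * N(x; mean, std^2):
    a Poisson(mu) number of points, each drawn i.i.d. from the normalized density."""
    return [rng.gauss(mean, std) for _ in range(sample_poisson(mu, rng))]


def ppp_density(X, mu, mean, std):
    """Set density exp(-<D, 1>) * prod_{x in X} D(x), with <D, 1> = mu."""
    val = math.exp(-mu)
    for x in X:
        val *= mu * math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))
    return val


rng = random.Random(42)
counts = [len(sample_ppp(2.5, 0.0, 1.0, rng)) for _ in range(20000)]
avg = sum(counts) / len(counts)      # close to mu = 2.5
```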
By combining (7) and (8), the state model for extended multi-target tracking, the Poisson multi-Bernoulli mixture (PMBM), is obtained:
where $X^{d}$ and $X^{u}$ ($X = X^{d} \uplus X^{u}$) are the detected and undetected targets, respectively.
Once the state model is obtained, the state transition is represented by the probability density function $f(x_{k}\mid x_{k-1})$. In addition, the survival probability $p_{S}$ represents the probability that a target still “exists” from time $k-1$ to time $k$, and a PPP with intensity $\lambda^{b}(x)$ represents the appearance of new targets.
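Because the undetected (PPP) and detected (MBM) parts of a PMBM are independent, the distribution of the total number of targets is the convolution of their cardinality distributions. A sketch with hypothetical parameter values (function names are ours):

```python
import math


def mb_cardinality(r):
    """Poisson-binomial cardinality distribution of one MB."""
    dist = [1.0]
    for ri in r:
        new = [0.0] * (len(dist) + 1)
        for n, p in enumerate(dist):
            new[n] += p * (1.0 - ri)
            new[n + 1] += p * ri
        dist = new
    return dist


def pmbm_cardinality(mu_u, weights, hypotheses, n_max):
    """P(|X| = n), n = 0..n_max, for a PMBM whose undetected part is a PPP with
    expected count mu_u and whose detected part is an MBM. The two parts are
    independent, so the distribution is the convolution of their cardinalities."""
    total = sum(weights)
    mbm = [0.0] * (n_max + 1)
    for w, rs in zip(weights, hypotheses):
        for n, p in enumerate(mb_cardinality(rs)):
            if n <= n_max:
                mbm[n] += (w / total) * p
    pois = [math.exp(-mu_u) * mu_u ** n / math.factorial(n) for n in range(n_max + 1)]
    return [sum(pois[k] * mbm[n - k] for k in range(n + 1)) for n in range(n_max + 1)]


dist = pmbm_cardinality(0.5, [3.0, 2.0], [[0.9, 0.5], [0.8]], n_max=30)
mean = sum(n * p for n, p in enumerate(dist))    # 0.5 (undetected) + 1.16 (detected)
```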
3.2. Measurement Modeling
The measurement refers to the “target part” of the infrared image obtained through the detection process. Let $p_{D}$ be the probability that a target is detected. Given detection, the measurements generated by the target are modeled as
which is a PPP with intensity $\gamma\,\phi(z\mid x)$. In (11), $\gamma$ is the Poisson rate, i.e., the expected number of measurements generated by the target, and $\phi(z\mid x)$ is the spatial distribution of the measurements. On the other hand, the probability that the target is not measured is
Assume that the measurement set is $Z_{k} = \{z^{m}\}_{m\in\mathbb{M}}$, and let $\mathbb{A} = \mathbb{M} \cup \mathbb{I}$ be the union of the measurement index set $\mathbb{M}$ and the target index set $\mathbb{I}$. Data association is then essentially a partition of $\mathbb{A}$, and the set of all possible partitions is denoted by $\mathcal{P}$. As a simple example, if $\mathbb{M} = \{m_{1}, m_{2}, m_{3}\}$ and $\mathbb{I} = \{1, 2\}$, data association assigns these three measurements to two targets, where each target can receive multiple measurements or none. One possible result is $\{\{m_{1}, m_{2}, 1\}, \{2\}, \{m_{3}\}\}$: measurements $m_{1}$ and $m_{2}$ are associated with target 1, target 2 is not measured, and measurement $m_{3}$ is not associated with any detected target. The cell $\{m_{3}\}$ may therefore originate from a new target or from clutter. In multi-target tracking, clutter is also modeled as a PPP, with intensity $\lambda_{C}(z)$.
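The partitions described above can be enumerated by brute force for small examples. The sketch reads an admissible partition as one in which every cell contains at most one target index, which matches the example above; this reading, the function names, and the string labels for measurements are our assumptions. The number of partitions grows very quickly (Bell numbers), which is why practical implementations usually restrict the enumeration, e.g., by gating and pruning.

```python
def set_partitions(items):
    """Yield every partition of the list `items` as a list of cells (lists)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        for i in range(len(part)):                    # add `first` to an existing cell
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part                        # or give it its own cell


def association_hypotheses(measurements, targets):
    """Partitions of A = M ∪ I in which every cell contains at most one target index.
    A cell with a target and no measurements = missed target; a cell with only
    measurements = new target or clutter."""
    target_set = set(targets)
    for part in set_partitions(list(measurements) + list(targets)):
        if all(sum(a in target_set for a in cell) <= 1 for cell in part):
            yield part


hyps = list(association_hypotheses(["m1", "m2", "m3"], [1, 2]))
example = {frozenset({"m1", "m2", 1}), frozenset({2}), frozenset({"m3"})}
```

For three measurements and two targets this yields 37 of the 52 partitions of the five-element set (the excluded 15 put both targets in one cell).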
3.3. Conjugate Bayesian Filtering
Filtering in multi-target tracking means obtaining the target state at time $k$ by combining the state transition model with the target state at time $k-1$ and the measurements at time $k$. Since the states at time $k-1$ and time $k$ are described by the prior and posterior distributions, respectively, this is a Bayesian filtering process.
As discussed above, the target state is modeled as a PMBM distribution. It has been proven that in Bayesian filtering, if the prior distribution is a PMBM, then the posterior distribution is also a PMBM; that is, the PMBM is a conjugate prior. Therefore, the filtering process only needs to update the parameters of the model rather than compute the density function itself, which significantly improves efficiency.
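The benefit of conjugacy can be illustrated with its simplest single-target analogue: a Gaussian prior combined with a Gaussian likelihood yields a Gaussian posterior, so only the mean and variance need updating. This scalar example is an analogy, not the PMBM update itself; the grid computation merely confirms the closed form numerically.

```python
import math


def gaussian_update(m: float, P: float, z: float, R: float):
    """Posterior of x ~ N(m, P) after observing z = x + v, v ~ N(0, R).
    Gaussian prior x Gaussian likelihood -> Gaussian posterior, so the update only
    changes the two parameters (mean, variance)."""
    K = P / (P + R)                      # gain
    return m + K * (z - m), (1.0 - K) * P


# Brute-force check on a grid: normalize prior * likelihood numerically.
m, P, z, R = 0.0, 1.0, 2.0, 1.0
xs = [-10 + 0.001 * i for i in range(20001)]
post = [math.exp(-0.5 * (x - m) ** 2 / P - 0.5 * (z - x) ** 2 / R) for x in xs]
Z = sum(post)
grid_mean = sum(x * p for x, p in zip(xs, post)) / Z
grid_var = sum((x - grid_mean) ** 2 * p for x, p in zip(xs, post)) / Z
m_post, P_post = gaussian_update(m, P, z, R)    # (1.0, 0.5)
```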
The filtering process consists of two steps: prediction and update. Prediction uses the state at time $k-1$ to “predict” the state at time $k$ through the state transition function $f(x_{k}\mid x_{k-1})$. Since the form of the distributions in a PMBM is fixed (Gaussian single-target densities and a Poisson process), the model parameters, namely the PPP intensity of undetected targets and, for each hypothesis, its weight and Bernoulli parameters, are sufficient to describe the target state at each time step. Furthermore, the predicted state is also a PMBM, and the corresponding parameters are [20,21]:
Since prediction does not depend on the measurements, the predicted hypothesis weights equal those at time $k-1$.
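Under common linear-Gaussian assumptions, the PMBM prediction reduces to parameter updates: Gaussian components are propagated through the motion model, existence probabilities and PPP weights are scaled by the survival probability, birth components are appended to the PPP, and hypothesis weights stay unchanged. The constant-velocity model, the constant survival probability, the function names, and all values below are assumptions, and the extent (shape) part of the state is omitted.

```python
import numpy as np


def predict_gaussian(m, P, F, Q):
    return F @ m, F @ P @ F.T + Q


def predict_pmbm(ppp, hypotheses, F, Q, p_s, birth):
    """One PMBM prediction step under linear-Gaussian motion x_k = F x_{k-1} + w,
    w ~ N(0, Q), with a constant survival probability p_s.
    ppp:        list of (weight, mean, cov) Gaussian-mixture terms of the undetected intensity
    hypotheses: list of (weight, [(r, mean, cov), ...]) MB hypotheses
    birth:      list of (weight, mean, cov) terms of the birth PPP intensity"""
    new_ppp = [(p_s * w,) + predict_gaussian(m, P, F, Q) for w, m, P in ppp] + list(birth)
    new_hyp = [
        (w, [(p_s * r,) + predict_gaussian(m, P, F, Q) for r, m, P in bern])
        for w, bern in hypotheses          # hypothesis weights are unchanged
    ]
    return new_ppp, new_hyp


F = np.array([[1.0, 1.0], [0.0, 1.0]])          # constant velocity, unit time step
Q = 0.1 * np.eye(2)
ppp = [(0.5, np.zeros(2), np.eye(2))]
hyps = [(1.0, [(0.8, np.array([0.0, 1.0]), np.eye(2))])]
birth = [(0.1, np.array([5.0, 0.0]), 4.0 * np.eye(2))]
new_ppp, new_hyp = predict_pmbm(ppp, hyps, F, Q, p_s=0.99, birth=birth)
```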
The update step combines the predicted state in (14) with the measurement set $Z_{k}$ to obtain the final target state at time $k$. It has been shown that under the PPP measurement model, the updated state is also a PMBM [20,21]:
Here, the updated PPP intensity of undetected targets equals the predicted intensity multiplied by the probability that a target is not measured.
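Two ingredients of the Bernoulli update can be sketched under simplifying assumptions: a fixed Poisson measurement rate $\gamma$, so that the probability of not being measured is $1 - p_D + p_D e^{-\gamma}$, and i.i.d. Gaussian point measurements of the kinematic state, ignoring the extent. $p_D = 0.95$ matches the experimental setting; $\gamma$, the function names, and the other values are hypothetical.

```python
import numpy as np


def missed_detection_update(r, p_d, gamma):
    """Existence probability of a Bernoulli after it receives no measurement.
    q_D = 1 - p_d + p_d * exp(-gamma): either detection fails, or it succeeds
    but the Poisson(gamma) measurement count is zero."""
    q = 1.0 - p_d + p_d * np.exp(-gamma)
    return r * q / (1.0 - r + r * q)


def detection_update(m, P, W, H, R):
    """Kinematic update of a Bernoulli assigned a cell W (one measurement per row).
    With i.i.d. measurements z = H x + v, v ~ N(0, R), the cell mean with covariance
    R / |W| is a sufficient statistic, so one Kalman update suffices.
    Under the association hypothesis the target exists, so r becomes 1."""
    W = np.atleast_2d(W)
    z_bar = W.mean(axis=0)
    S = H @ P @ H.T + R / len(W)
    K = np.linalg.solve(S, H @ P).T            # K = P H^T S^{-1} (P, S symmetric)
    return 1.0, m + K @ (z_bar - H @ m), P - K @ S @ K.T


r_miss = missed_detection_update(0.5, p_d=0.95, gamma=3.0)
r_det, m_det, P_det = detection_update(np.zeros(2), np.eye(2),
                                       np.array([[1.0, 1.0], [3.0, 3.0]]),
                                       np.eye(2), np.eye(2))
```

A missed detection lowers the existence probability (here from 0.5 to roughly 0.089), while a detection sets it to 1 and refines the kinematic estimate.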
5. Experiments
In this section, an infrared image dataset of an unmanned aerial vehicle (UAV) is used to verify the performance of the proposed algorithm. This dataset has the following characteristics: (1) Because the distance between the sensor and the UAV varies greatly, the scale of the target in the images also varies greatly. At close range, the UAV appears as an extended target, and at long range, it appears as a point target. The dataset therefore matches the features of the extended target filtering algorithm discussed in this paper. (2) The dataset contains multiple flights, each of which forms an image sequence. These sequences include both single-UAV flights and multiple UAVs flying simultaneously. Moreover, there are both sky backgrounds and complex ground backgrounds, which increases the difficulty of detection and tracking.
After a comprehensive analysis, three representative state-of-the-art methods were selected for comparison: four-dimensional spatial–temporal tensor decomposition with a block term decomposition-based norm and multidirectional derivative-based priors (4DST-BTMD) [14]; a weighted adaptive Schatten p-norm and spatial–temporal tensor transpose variability (WASpN-STTTV) model [13]; and the Spatial–Temporal Tensor Modeling with Saliency Filter Regularization (STTM-SFR) algorithm [10]. These algorithms have been shown to outperform multi-frame sequential detection methods such as the spatial–temporal local difference measure (STLDM) [22] and edge and corner awareness-based spatial–temporal tensors (ECASTTs) [23]. In addition, the advantages of PMBM filtering over other RFS filters have been verified in [24].
There are several control parameters in the low-rank and sparse matrix decomposition, and their values are chosen as follows. In the construction of the patch image matrix, the sliding window is 16 × 16 pixels. The balance parameters in (3) are chosen according to the recommendations in [13,14], with fine-tuning for the infrared dataset. Specifically, one of these weights is computed from a constant, the 2D patch sizes, and the number of patches extracted from the original image. The value of this constant is initially set to 2.2 and adjusted adaptively in real time by the PMBM filter, based on the result at the previous time step. A further parameter is set to balance the measurements and clutter in the filtering. As for PMBM tracking, the detection probability is 0.95, and a fixed false alarm probability is used.
The metrics used for the performance comparison were also adopted from studies of the above methods; they include the Background Suppression Factor (BSF) and the Signal-to-Clutter Ratio Gain (SCRG) [13]. The BSF is a commonly used indicator for infrared image target detection, defined as
$$
\mathrm{BSF} = \frac{\sigma_{\mathrm{in}}}{\sigma_{\mathrm{out}}},
$$
where $\sigma_{\mathrm{in}}$ and $\sigma_{\mathrm{out}}$ are the standard deviations of the input image and the detected image, respectively. The SCRG is defined as follows:
where the quantities involved are the maximum, minimum, and standard deviation of the image, respectively. The receiver operating characteristic (ROC) curve [13] is also used to compare the performance of the different algorithms, with the false-alarm rate $F_{a}$ on the x-axis and the detection probability $P_{d}$ on the y-axis.
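The BSF and the $P_d$–$F_a$ pairs that make up an ROC curve can be computed as follows. The target-level detection rule (a target counts as detected if any pixel in its mask exceeds the threshold) and the pixel-level false-alarm rate are common conventions that we assume here; the function names and toy data are hypothetical.

```python
import numpy as np


def bsf(inp: np.ndarray, out: np.ndarray) -> float:
    """Background suppression factor: std of the input over std of the detection image."""
    return float(np.std(inp) / np.std(out))


def pd_fa(score, target_masks, thr):
    """Detection probability and false-alarm rate at one threshold.
    A target counts as detected if any pixel inside its mask exceeds thr (ASSUMED
    target-level rule); F_a is the fraction of all pixels that exceed thr outside every mask."""
    det = score > thr
    union = np.zeros(score.shape, dtype=bool)
    hits = 0
    for mask in target_masks:
        union |= mask
        hits += bool(np.any(det & mask))
    return hits / len(target_masks), np.count_nonzero(det & ~union) / det.size


def roc_points(score, target_masks, thresholds):
    return [pd_fa(score, target_masks, t) for t in thresholds]


score = np.zeros((8, 8))
score[2, 2] = 5.0                    # true target
score[6, 6] = 3.0                    # bright clutter
mask = np.zeros((8, 8), dtype=bool)
mask[2, 2] = True
points = roc_points(score, [mask], thresholds=[6.0, 4.0, 2.0])
# -> [(0.0, 0.0), (1.0, 0.0), (1.0, 0.015625)]
```

Sweeping the threshold over a fine grid and plotting $P_d$ against $\log_{10} F_a$ gives an ROC curve of the kind compared below.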
From the dataset [11], six image sequences (each containing about 300 frames of 256 × 256 pixels) were selected for performance verification. Sequences 1 and 2 contain multiple extended targets against a sky background (see the two frames in Figure 2a,b, respectively). The target trajectories overlap, which makes tracking difficult. Sequences 3 to 6 contain complex backgrounds (see the four frames in Figure 2c,g–i, respectively). These backgrounds contain objects with various characteristics that can be confused with targets, which makes detection difficult. The corresponding results of low-rank and sparse matrix separation are shown below each original frame (see Figure 2d–f,j–l). The targets are detected effectively in these examples.
For a quantitative comparison, the mean BSF and SCRG values are calculated for each of the six image sequences and listed in Table 1. Higher values indicate better background suppression and more accurate target detection. The results show that the proposed algorithm outperforms the state-of-the-art algorithms STTM-SFR, WASpN-STTTV, and 4DST-BTMD in all cases except one (Seq. 4). It is worth mentioning that the performance gain comes not only from the low-rank and sparse separation algorithm itself but also from the accurate prior information provided by the filtering process.
Finally, ROC curves are used to compare $P_{d}$ and $F_{a}$. In Figure 3, the six panels (a) to (f) correspond to image sequences 1 to 6, respectively. The horizontal axis represents $F_{a}$ on a logarithmic (log10) scale, and the vertical axis represents $P_{d}$. Each panel contains four curves, one for each of the four methods compared. The red curve, representing the proposed method, lies above the other curves in almost all cases.
6. Conclusions
This work proposes a unified framework for small-target tracking in infrared images. The framework deeply combines a fast low-rank and sparse matrix separation algorithm with PMBM-based Bayesian multi-target filtering. The low-rank and sparse matrix separation performs target detection on a single image, and it yields a fast algorithm thanks to its differentiable form with global convergence. Extended multi-target filtering, in turn, matches the characteristics of targets in high-resolution infrared images. During filtering, the multi-target state is modeled by the PMBM distribution. Owing to the conjugacy of the PMBM in the Bayesian framework, only the parameters need to be updated rather than the whole distribution, which significantly reduces computational complexity. Unlike the simultaneous spatial–temporal processing of multiple frames in the existing literature, the accuracy gain comes from the sequential use of detection and filtering: the former provides measurements for the latter, and the latter provides precise prior information for the former. A theoretical complexity analysis is given in this work, showing that the complexity is lower than that of existing 3D or 4D low-rank and sparse decompositions of multiple frames, which explains the efficiency of the whole framework. Finally, verification on a practical infrared dataset was carried out, and the quantitative metrics (BSF and SCRG) and ROC curves demonstrate the accuracy of the proposed method in clutter suppression and target detection. One limitation of the unified framework is that its parameters require fine-tuning based on theoretical criteria.
Although the proposed framework improves computational efficiency and accuracy, there is still room for improvement. For example, the current combination operates at the feature level. Future work will consider a direct integration of matrix decomposition and PMBM filtering; that is, we will integrate the matrix decomposition process into the state modeling, transition, and update of the PMBM and develop a framework that advances from the current feature-level fusion to signal-level fusion.